@mapo80 mapo80 commented Aug 13, 2025

Summary

  • document conversion performance for Docling PDF, TIFF and PNG samples
  • compare MarkItDownNet timings against markitdown benchmarks

Testing

  • ~/.dotnet/dotnet build
  • ~/.dotnet/dotnet test

https://chatgpt.com/codex/tasks/task_e_689c22f8b69c83259ee3140bbd1c08f8

Copilot AI review requested due to automatic review settings August 13, 2025 05:46
Copilot AI left a comment

Pull Request Overview

This PR updates the Docling dataset conversion timing benchmarks to include TIFF and PNG image samples, providing a more comprehensive performance comparison with markitdown. The updates reflect improved performance results and expand the test coverage beyond PDF files.

Key Changes:

  • Added timing data for TIFF and PNG image file processing
  • Updated PDF timing results with improved performance metrics
  • Enhanced the comparison section to include image format timings against markitdown benchmarks

## Docling data conversion timings

- The following timings were captured while converting the PDF samples from Docling's `tests/data` directory. Image samples (TIFF and PNG) could not be processed in this environment because the Leptonica runtime was unavailable.
+ The following timings were captured while converting the PDF, TIFF and PNG samples from Docling's `tests/data` directory.
Copilot AI Aug 13, 2025

[nitpick] Missing Oxford comma before 'and PNG'. Should be 'PDF, TIFF, and PNG samples'.

Suggested change
- The following timings were captured while converting the PDF, TIFF and PNG samples from Docling's `tests/data` directory.
+ The following timings were captured while converting the PDF, TIFF, and PNG samples from Docling's `tests/data` directory.
