diff --git a/README.md b/README.md
index 4cdf0fe..aaa72fd 100644
--- a/README.md
+++ b/README.md
@@ -125,27 +125,31 @@ Bounding boxes use normalised `[x,y,w,h]` coordinates. The test asserts equality
 
 ## Docling data conversion timings
 
-The following timings were captured while converting the PDF samples from Docling's `tests/data` directory. Image samples (TIFF and PNG) could not be processed in this environment because the Leptonica runtime was unavailable.
+The following timings were captured while converting the PDF, TIFF and PNG samples from Docling's `tests/data` directory.
 
 | File | Type | Markdown ms | BBox ms |
 | --- | --- | --- | --- |
-| 2203.01017v2.pdf | pdf | 1756.00 | 223.11 |
-| 2206.01062.pdf | pdf | 927.07 | 52.02 |
-| 2305.03393v1-pg9.pdf | pdf | 62.26 | 3.74 |
-| 2305.03393v1.pdf | pdf | 333.04 | 28.77 |
-| amt_handbook_sample.pdf | pdf | 167.56 | 4.37 |
-| code_and_formula.pdf | pdf | 55.00 | 7.73 |
-| multi_page.pdf | pdf | 95.89 | 10.01 |
-| picture_classification.pdf | pdf | 30.85 | 4.90 |
-| redp5110_sampled.pdf | pdf | 373.09 | 27.89 |
-| right_to_left_01.pdf | pdf | 34.93 | 1.57 |
-| right_to_left_02.pdf | pdf | 24.03 | 1.48 |
-| right_to_left_03.pdf | pdf | 45.60 | 1.25 |
+| 2305.03393v1-pg9-img.png | png | 2307.59 | 67.51 |
+| 2203.01017v2.pdf | pdf | 1542.61 | 59.87 |
+| 2206.01062.pdf | pdf | 890.08 | 26.35 |
+| 2305.03393v1-pg9.pdf | pdf | 105.83 | 0.92 |
+| 2305.03393v1.pdf | pdf | 384.44 | 14.13 |
+| amt_handbook_sample.pdf | pdf | 189.17 | 1.77 |
+| code_and_formula.pdf | pdf | 72.72 | 2.68 |
+| multi_page.pdf | pdf | 91.91 | 3.52 |
+| picture_classification.pdf | pdf | 42.41 | 2.33 |
+| redp5110_sampled.pdf | pdf | 417.99 | 15.57 |
+| right_to_left_01.pdf | pdf | 31.26 | 0.70 |
+| right_to_left_02.pdf | pdf | 32.62 | 0.59 |
+| right_to_left_03.pdf | pdf | 48.57 | 0.39 |
+| 2206.01062.tif | tiff | 1040.21 | 1.90 |
 
 | Type | Avg Markdown ms | Avg BBox ms |
 | --- | --- | --- |
-| pdf | 325.44 | 30.57 |
-| **Overall** | 325.44 | 30.57 |
+| pdf | 320.80 | 10.74 |
+| png | 2307.59 | 67.51 |
+| tiff | 1040.21 | 1.90 |
+| **Overall** | 514.10 | 14.16 |
 
 ### Comparison with markitdown timings
 
@@ -153,12 +157,12 @@ The [markitdown](https://github.com/mapo80/markitdown) project reports Docling d
 
 | Type | markitdown MD s | markitdown BBox s | MarkItDownNet MD s | MarkItDownNet BBox s |
 | --- | --- | --- | --- | --- |
-| pdf | 3.29 | 5.14 | 0.33 | 0.03 |
-| png | 2.51 | 5.56 | – | – |
-| tiff | 2.57 | 4.19 | – | – |
-| **Overall** | 3.18 | 5.10 | 0.33 | 0.03 |
+| pdf | 3.29 | 5.14 | 0.32 | 0.01 |
+| png | 2.51 | 5.56 | 2.31 | 0.07 |
+| tiff | 2.57 | 4.19 | 1.04 | 0.00 |
+| **Overall** | 3.18 | 5.10 | 0.51 | 0.01 |
 
-On the PDF samples, MarkItDownNet completed Markdown conversion about **10×** faster and bounding box generation roughly **170×** faster than markitdown. Image timings are unavailable here because the Leptonica runtime was missing.
+On the Docling samples, MarkItDownNet completed Markdown conversion several times faster and produced bounding boxes orders of magnitude quicker than markitdown.
 
 ## License
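
Reviewer note: the per-type averages and the speed-up claim in the added lines can be re-derived from the per-file rows. The following is a minimal sketch, not part of the project. The names `TIMINGS`, `averages` and `MARKITDOWN_OVERALL_S` are hypothetical, the rows are copied from the updated table, and it assumes "Overall" is a per-file mean rather than a mean of the per-type means.

```python
from collections import defaultdict

# (type, markdown_ms, bbox_ms), copied from the updated per-file table.
TIMINGS = [
    ("png", 2307.59, 67.51),
    ("pdf", 1542.61, 59.87),
    ("pdf", 890.08, 26.35),
    ("pdf", 105.83, 0.92),
    ("pdf", 384.44, 14.13),
    ("pdf", 189.17, 1.77),
    ("pdf", 72.72, 2.68),
    ("pdf", 91.91, 3.52),
    ("pdf", 42.41, 2.33),
    ("pdf", 417.99, 15.57),
    ("pdf", 31.26, 0.70),
    ("pdf", 32.62, 0.59),
    ("pdf", 48.57, 0.39),
    ("tiff", 1040.21, 1.90),
]


def averages(rows):
    """Return {type: (avg_markdown_ms, avg_bbox_ms)}, plus an 'Overall' entry."""
    groups = defaultdict(list)
    for kind, markdown_ms, bbox_ms in rows:
        groups[kind].append((markdown_ms, bbox_ms))
        groups["Overall"].append((markdown_ms, bbox_ms))  # assumption: per-file mean
    return {
        kind: (
            sum(md for md, _ in vals) / len(vals),
            sum(bb for _, bb in vals) / len(vals),
        )
        for kind, vals in groups.items()
    }


AVG = averages(TIMINGS)

# markitdown "Overall" figures in seconds, taken from the comparison table.
MARKITDOWN_OVERALL_S = (3.18, 5.10)
md_speedup = MARKITDOWN_OVERALL_S[0] / (AVG["Overall"][0] / 1000)
bbox_speedup = MARKITDOWN_OVERALL_S[1] / (AVG["Overall"][1] / 1000)

for kind, (md, bb) in AVG.items():
    print(f"{kind:8s} {md:9.2f} {bb:7.2f}")
print(f"Markdown speed-up: {md_speedup:.1f}x, BBox speed-up: {bbox_speedup:.0f}x")
```

Under that assumption the script reproduces the averages in the added rows. Computing the ratios from the unrounded millisecond values avoids the distortion introduced by the two-decimal seconds shown in the comparison table (for example, `0.00` for TIFF bounding boxes).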