
Commit 766c194

Model Inference Speeding up support by TensorRT-LLM (#5)
1 parent db16fc0 commit 766c194

File tree

15 files changed (+3876, -22 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ dist/
 build/
 **.egg-info/
 **__pycache__/
+**.cache
 ckpts/
 **version.py

README.md

Lines changed: 27 additions & 15 deletions
@@ -12,44 +12,55 @@
 Welcome to the official repository of StructEqTable-Deploy, a solution that converts images of Table into LaTeX, powered by scalable data from [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/).
 
 
-## Abstract
+## Overview
 Table is an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a visual table image and performing the downstream reasoning tasks according to the extracted data is challenging, mainly due to that tables often present complicated column and row headers with spanning cell operation. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality Image-LaTeX pair data covering 156 disciplinary classes. Besides, benefiting from such large-scale data, we train an end-to-end model, StructEqTable, which provides the capability to precisely obtain the corresponding LaTeX description from a visual table image and perform multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.
 
-## Release
-- [2024/7/30] 🔥 We have released the first version of StructEqTable. (Current version of StructEqTable is able to process table images from scientific documents such as arXiv, Scihub papers. Times New Roman And Songti(宋体) are main fonts used in table image, other fonts may decrease the accuracy of the model's output.)
+## Changelog
+Tip: The current version of StructEqTable can process table images from scientific documents such as arXiv and SciHub papers. Times New Roman and Songti (宋体) are the main fonts used in the table images; other fonts may decrease the accuracy of the model's output.
+- [2024/8/08] 🔥 We have released the TensorRT-accelerated version, which takes only about 1 second for most images on an A100 GPU. Please follow the tutorial to install the environment and compile the model weights.
+- [2024/7/30] We have released the first version of StructEqTable.
 
 ## TODO
 
 - [x] Release inference code and checkpoints of StructEqTable.
 - [x] Support Chinese version of StructEqTable.
-- [ ] Accelerated version of StructEqTable using TensorRT-LLM.
+- [x] Accelerated version of StructEqTable using TensorRT-LLM.
 - [ ] Expand more domains of table image to improve the model's general capabilities.
 - [ ] Release our table pre-training and fine-tuning code
 
+## Efficient Inference
+Our model now supports TensorRT-LLM deployment, achieving a 10x or greater speedup during inference.
+Please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md) to learn how to deploy it.
 
-### Installation
-
+## Installation
 ``` bash
-conda create -n structeqtable python=3.9
-
+conda create -n structeqtable "python>=3.10"
 conda activate structeqtable
 
+# Install from Source code (Suggested)
+git clone https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git
+cd StructEqTable-Deploy
+python setup.py develop
+
+# or Install from GitHub repo
 pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
 
+# or Install from PyPI
+pip install struct-eqtable==0.1.0
 ```
 
 ## Quick Demo
 - Run the demo/demo.py
 ```shell script
-cd demo
+cd tools/demo
 
 python demo.py \
   --image_path ./demo.png \
   --ckpt_path ${CKPT_PATH} \
   --output_format latex
 ```
 
-- Obtain other format output
+- HTML or Markdown format output
 
 Our model output Latex format code by default.
 If you want to get other format like HTML or Markdown,
@@ -59,7 +70,7 @@ python demo.py \
 sudo apt install pandoc
 pip install pypandoc
 
-cd demo
+cd tools/demo
 
 python demo.py \
   --image_path ./demo.png \
@@ -71,9 +82,9 @@ python demo.py \
 - Visualization Results
   - The input data are sampled from SciHub domain.
 
-![](demo/demo_1.png)
+![](docs/demo_1.png)
 
-![](demo/demo_2.png)
+![](docs/demo_2.png)
 
 
 ## Acknowledgements
@@ -82,11 +93,12 @@ python demo.py \
 - [Pix2Struct](https://github.com/google-research/pix2struct). Screenshot Parsing as Pretraining for Visual Language Understanding.
 - [UniMERNet](https://github.com/opendatalab/UniMERNet). A Universal Network for Real-World Mathematical Expression Recognition.
 - [Donut](https://huggingface.co/naver-clova-ix/donut-base). The UniMERNet's Transformer Encoder-Decoder are referenced from Donut.
-- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer uses Nougat.
+- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer uses Nougat.
+- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Model inference acceleration uses TensorRT-LLM.
 
 
 ## License
-[Apache License 2.0](LICENSE)
+StructEqTable is released under the [Apache License 2.0](LICENSE).
 
 ## Citation
 If you find our models / code / papers useful in your research, please consider giving ⭐ and citations 📝, thx :)

docs/GETTING_STARTED.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Getting Started
[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) is used to speed up model inference.


### 1. Conda or Python Environment Preparation
* Please follow steps 1 and 2 of the [official tutorial](https://nvidia.github.io/TensorRT-LLM/installation/linux.html) of TensorRT-LLM (Installing on Linux) to install the environment.

Step 1. Retrieve and launch the docker container (optional).

You can pre-install the environment using the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit) to avoid manual environment configuration.

```bash
# Obtain and start the basic docker image environment (optional).
docker run --rm --ipc=host --runtime=nvidia --gpus all --entrypoint /bin/bash -it nvidia/cuda:12.4.1-devel-ubuntu22.04
```
Note: please make sure to set `--ipc=host` as a docker run argument to avoid `Bus error (core dumped)`.

Step 2. Install TensorRT-LLM.

```bash
# Install dependencies; TensorRT-LLM requires Python 3.10.
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
# If you want to install the stable version (corresponding to the release branch), please
# remove the `--pre` option.
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

# Check the installation.
python3 -c "import tensorrt_llm"
```

Please note that TensorRT-LLM depends on TensorRT. In earlier versions that include TensorRT 8, upgrading to a new version may require explicitly running `pip uninstall tensorrt` to remove the old version first.

* Once `python3 -c "import tensorrt_llm"` runs successfully, the environment preparation is complete.

Tip: If you want to set up the environment manually, note that Python >= 3.10 is required.


### 2. Model Compilation
You can refer to the [official tutorial](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html) to complete the model compilation, or follow our instructions and use the provided scripts.

#### 2.1 Download [StructEqTable checkpoints](https://huggingface.co/U4R/StructTable-base/tree/main)
```bash
cd StructEqTable-Deploy

# use huggingface-cli to download the checkpoint
huggingface-cli download --resume-download --local-dir-use-symlinks False U4R/StructTable-base --local-dir ckpts/StructTable-base
```
After the above steps, the directory structure of StructEqTable-Deploy should be as follows:
```
StructEqTable-Deploy
├── ckpts
│   ├── StructTable-base
├── docs
├── struct_eqtable
├── tools
```

#### 2.2 Convert Checkpoint and Build Engine
We provide a script to help users quickly compile the model.

```bash
cd StructEqTable-Deploy/tools
# execute the script to quickly compile the model.
bash scripts/build_tensorrt.sh
```
After the script runs successfully, the built models can be found in `ckpts/StructTable-base-TensorRT`.
The file structure in the path `ckpts/StructTable-base-TensorRT` should be as follows:
```
ckpts
├── StructTable-base
├── StructTable-base-TensorRT
│   ├── trt_engines
│   ├── trt_models
│   ├── visual_engiens
```

#### 2.3 Run the Quick Demo
Run demo.py in TensorRT mode.

```bash
cd StructEqTable-Deploy/tools/demo

python demo.py \
  --image_path ./demo.png \
  --ckpt_path ../../ckpts/StructTable-base \
  --output_format latex \
  --tensorrt ../../ckpts/StructTable-base-TensorRT
```

You may get output as follows:
```
total cost time: 0.88s
Table 0 LATEX format output:
\begin{tabular}{l@{\hskip 0.4in}ccc@{\hskip 0.3in}ccc}\hline \multicolumn{1}{c}{\multirow{2}{*}{Model}} & \multicolumn{2}{c}{\bf MCQA} & \multicolumn{2}{c}{\bf NSP} & \multicolumn{2}{c}{\bf PI} \\\multicolumn{1}{c}{} & Accuracy & F1 & Accuracy & F1 & Accuracy & F1 \\ \hline FastText & 0.318 & 0.317 & 0.496 & 0.496 & 0.762 & 0.806 \\ELMo & 0.318 & 0.318 & 0.691 & 0.691 & 0.807 & 0.867 \\BERT & 0.346 & 0.346 & 0.514 & 0.514 & 0.801 & 0.857 \\ \hline \end{tabular}
```
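A small sanity-check sketch (not part of the commit) that verifies the engine layout documented above before running the TensorRT demo; the directory names are copied verbatim from the listing in section 2.2.

```python
# Sanity-check sketch (not from the commit): confirm the directories that
# scripts/build_tensorrt.sh is documented to produce exist before running
# demo.py with --tensorrt. Directory names follow the listing above.
from pathlib import Path
import sys

trt_root = Path("ckpts/StructTable-base-TensorRT")
expected = ["trt_engines", "trt_models", "visual_engiens"]

missing = [name for name in expected if not (trt_root / name).is_dir()]
if missing:
    sys.exit(f"TensorRT build output incomplete, missing: {missing}")
print(f"Found compiled engines under {trt_root}, ready to run the demo.")
```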
2 files renamed without changes.

struct_eqtable/__init__.py

Lines changed: 7 additions & 3 deletions
@@ -1,7 +1,11 @@
 from .model import StructTable
 
-
 def build_model(model_ckpt, **kwargs):
-
-    model = StructTable(model_ckpt, **kwargs)
+    tensorrt_path = kwargs.get('tensorrt_path', None)
+    if tensorrt_path is not None:
+        from .model_trt import StructTableTensorRT
+        model = StructTableTensorRT(model_ckpt, **kwargs)
+    else:
+        model = StructTable(model_ckpt, **kwargs)
+
     return model
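For reference, a minimal usage sketch of the updated factory (not part of the commit): the checkpoint and engine paths assume the layout from docs/GETTING_STARTED.md, and only model construction is shown, since the inference interface is not touched by this diff.

```python
# Minimal sketch (assumption: paths follow docs/GETTING_STARTED.md).
from struct_eqtable import build_model

# Plain PyTorch / Hugging Face backend: no extra kwargs needed.
model = build_model('ckpts/StructTable-base')

# TensorRT-LLM backend: passing `tensorrt_path` makes build_model return a
# StructTableTensorRT instance (see struct_eqtable/model_trt.py) that uses
# the compiled engines instead of the default StructTable.
trt_model = build_model(
    'ckpts/StructTable-base',
    tensorrt_path='ckpts/StructTable-base-TensorRT',
)
```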

struct_eqtable/model.py

Lines changed: 1 addition & 2 deletions
@@ -1,12 +1,11 @@
-import re
 import torch
 
 from torch import nn
 from transformers import AutoModelForVision2Seq, AutoProcessor
 
 
 class StructTable(nn.Module):
-    def __init__(self, model_path='U4R/StructTable-base', max_new_tokens=4096, max_time=60):
+    def __init__(self, model_path='U4R/StructTable-base', max_new_tokens=4096, max_time=60, **kwargs):
         super().__init__()
         self.model_path = model_path
         self.max_new_tokens = max_new_tokens
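A brief sketch (not part of the commit) of why `**kwargs` was added: backend-selection options forwarded by `build_model`, such as `tensorrt_path`, are now accepted and ignored by the plain PyTorch model instead of raising a TypeError. The values below are illustrative only.

```python
# Sketch: extra, backend-specific kwargs no longer break the default model.
from struct_eqtable.model import StructTable

# Before this change, an unexpected keyword like tensorrt_path would have
# raised "TypeError: __init__() got an unexpected keyword argument".
model = StructTable(
    model_path='U4R/StructTable-base',
    max_new_tokens=4096,
    max_time=60,
    tensorrt_path=None,  # illustrative; only used by the TensorRT backend
)
```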
