
Commit 766c194

Model Inference Speeding up support by TensorRT-LLM (#5)
1 parent db16fc0 commit 766c194

File tree

15 files changed (+3876, -22 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ dist/
 build/
 **.egg-info/
 **__pycache__/
+**.cache
 ckpts/
 **version.py

README.md

Lines changed: 27 additions & 15 deletions
@@ -12,44 +12,55 @@
 Welcome to the official repository of StructEqTable-Deploy, a solution that converts images of Table into LaTeX, powered by scalable data from [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/).
 
 
-## Abstract
+## Overview
 Table is an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a visual table image and performing the downstream reasoning tasks according to the extracted data is challenging, mainly due to that tables often present complicated column and row headers with spanning cell operation. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality Image-LaTeX pair data covering 156 disciplinary classes. Besides, benefiting from such large-scale data, we train an end-to-end model, StructEqTable, which provides the capability to precisely obtain the corresponding LaTeX description from a visual table image and perform multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.
 
-## Release
-- [2024/7/30] 🔥 We have released the first version of StructEqTable. (Current version of StructEqTable is able to process table images from scientific documents such as arXiv, Scihub papers. Times New Roman And Songti(宋体) are main fonts used in table image, other fonts may decrease the accuracy of the model's output.)
+## Changelog
+Tip: The current version of StructEqTable can process table images from scientific documents such as arXiv and SciHub papers. Times New Roman and Songti (宋体) are the main fonts used in the table images; other fonts may decrease the accuracy of the model's output.
+- [2024/8/08] 🔥 We have released the TensorRT-accelerated version, which takes only about 1 second for most images on an A100 GPU. Please follow the tutorial to install the environment and compile the model weights.
+- [2024/7/30] We have released the first version of StructEqTable.
 
 ## TODO
 
 - [x] Release inference code and checkpoints of StructEqTable.
 - [x] Support Chinese version of StructEqTable.
-- [ ] Accelerated version of StructEqTable using TensorRT-LLM.
+- [x] Accelerated version of StructEqTable using TensorRT-LLM.
 - [ ] Expand more domains of table image to improve the model's general capabilities.
 - [ ] Release our table pre-training and fine-tuning code
 
+## Efficient Inference
+Our model now supports TensorRT-LLM deployment, achieving a 10x or greater speedup during inference.
+Please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md) to learn how to deploy it.
 
-### Installation
-
+## Installation
 ``` bash
-conda create -n structeqtable python=3.9
-
+conda create -n structeqtable "python>=3.10"
 conda activate structeqtable
 
+# Install from Source code (Suggested)
+git clone https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git
+cd StructEqTable-Deploy
+python setup.py develop
+
+# or Install from GitHub repo
 pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
 
+# or Install from PyPI
+pip install struct-eqtable==0.1.0
 ```
 
 ## Quick Demo
 - Run the demo/demo.py
 ```shell script
-cd demo
+cd tools/demo
 
 python demo.py \
   --image_path ./demo.png \
   --ckpt_path ${CKPT_PATH} \
   --output_format latex
 ```
 
-- Obtain other format output
+- HTML or Markdown format output
 
 Our model output Latex format code by default.
 If you want to get other format like HTML or Markdown,
@@ -59,7 +70,7 @@ python demo.py \
 sudo apt install pandoc
 pip install pypandoc
 
-cd demo
+cd tools/demo
 
 python demo.py \
   --image_path ./demo.png \
@@ -71,9 +82,9 @@ python demo.py \
 - Visualization Results
   - The input data are sampled from SciHub domain.
 
-![](demo/demo_1.png)
+![](docs/demo_1.png)
 
-![](demo/demo_2.png)
+![](docs/demo_2.png)
 
 
 ## Acknowledgements
@@ -82,11 +93,12 @@ python demo.py \
 - [Pix2Struct](https://github.com/google-research/pix2struct). Screenshot Parsing as Pretraining for Visual Language Understanding.
 - [UniMERNet](https://github.com/opendatalab/UniMERNet). A Universal Network for Real-World Mathematical Expression Recognition.
 - [Donut](https://huggingface.co/naver-clova-ix/donut-base). The UniMERNet's Transformer Encoder-Decoder are referenced from Donut.
-- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer uses Nougat.
+- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer uses Nougat.
+- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Model inference acceleration uses TensorRT-LLM.
 
 
 ## License
-[Apache License 2.0](LICENSE)
+StructEqTable is released under the [Apache License 2.0](LICENSE).
 
 ## Citation
 If you find our models / code / papers useful in your research, please consider giving ⭐ and citations 📝, thx :)

docs/GETTING_STARTED.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Getting Started
[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) is used to speed up model inference.


### 1. Conda or Python Environment Preparation
* Please follow steps 1 and 2 of the [official tutorial](https://nvidia.github.io/TensorRT-LLM/installation/linux.html) of TensorRT-LLM (Installing on Linux) to install the environment.

Step 1. Retrieve and launch the docker container (optional).

You can pre-install the environment using the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit) to avoid manual environment configuration.

```bash
# Obtain and start the basic docker image environment (optional).
docker run --rm --ipc=host --runtime=nvidia --gpus all --entrypoint /bin/bash -it nvidia/cuda:12.4.1-devel-ubuntu22.04
```
Note: please make sure to set `--ipc=host` as a docker run argument to avoid `Bus error (core dumped)`.

Step 2. Install TensorRT-LLM.

```bash
# Install dependencies; TensorRT-LLM requires Python 3.10.
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
# If you want to install the stable version (corresponding to the release branch), please
# remove the `--pre` option.
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

# Check the installation.
python3 -c "import tensorrt_llm"
```

Please note that TensorRT-LLM depends on TensorRT. In earlier versions that include TensorRT 8, upgrading to a new version may require explicitly running `pip uninstall tensorrt` to remove the old version first.

* Once `python3 -c "import tensorrt_llm"` runs successfully, the environment preparation is complete.

Tip: If you want to set up the environment manually, note that Python >= 3.10 is required.


### 2. Model Compilation
You can refer to the [official tutorial](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html) to complete the model compilation, or follow our instructions and use the provided scripts.

#### 2.1 Download [StructEqTable checkpoints](https://huggingface.co/U4R/StructTable-base/tree/main)
```bash
cd StructEqTable-Deploy

# use huggingface-cli to download the checkpoint
huggingface-cli download --resume-download --local-dir-use-symlinks False U4R/StructTable-base --local-dir ckpts/StructTable-base
```
After the above steps, the directory structure of StructEqTable-Deploy should be as follows:
```
StructEqTable-Deploy
├── ckpts
│   ├── StructTable-base
├── docs
├── struct_eqtable
├── tools
```

#### 2.2 Convert Checkpoint and Build Engine
We provide a script to help users quickly compile the model.

```bash
cd StructEqTable-Deploy/tools
# execute the script to quickly compile the model.
bash scripts/build_tensorrt.sh
```
After the script runs successfully, the built models can be found in `ckpts/StructTable-base-TensorRT`.
The file structure in the path `ckpts/StructTable-base-TensorRT` should be as follows:
```
ckpts
├── StructTable-base
├── StructTable-base-TensorRT
│   ├── trt_engines
│   ├── trt_models
│   ├── visual_engiens
```

#### 2.3 Run the Quick Demo
Run demo.py in TensorRT mode.

```bash
cd StructEqTable-Deploy/tools/demo

python demo.py \
  --image_path ./demo.png \
  --ckpt_path ../../ckpts/StructTable-base \
  --output_format latex \
  --tensorrt ../../ckpts/StructTable-base-TensorRT
```

You may get output as follows:
```
total cost time: 0.88s
Table 0 LATEX format output:
\begin{tabular}{l@{\hskip 0.4in}ccc@{\hskip 0.3in}ccc}\hline \multicolumn{1}{c}{\multirow{2}{*}{Model}} & \multicolumn{2}{c}{\bf MCQA} & \multicolumn{2}{c}{\bf NSP} & \multicolumn{2}{c}{\bf PI} \\\multicolumn{1}{c}{} & Accuracy & F1 & Accuracy & F1 & Accuracy & F1 \\ \hline FastText & 0.318 & 0.317 & 0.496 & 0.496 & 0.762 & 0.806 \\ELMo & 0.318 & 0.318 & 0.691 & 0.691 & 0.807 & 0.867 \\BERT & 0.346 & 0.346 & 0.514 & 0.514 & 0.801 & 0.857 \\ \hline \end{tabular}
```
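A small sanity-check sketch (not part of the commit) that verifies the engine layout documented above before running the TensorRT demo; the directory names are copied verbatim from the listing in section 2.2.

```python
# Sanity-check sketch (not from the commit): confirm the directories that
# scripts/build_tensorrt.sh is documented to produce exist before running
# demo.py with --tensorrt. Directory names follow the listing above.
from pathlib import Path
import sys

trt_root = Path("ckpts/StructTable-base-TensorRT")
expected = ["trt_engines", "trt_models", "visual_engiens"]

missing = [name for name in expected if not (trt_root / name).is_dir()]
if missing:
    sys.exit(f"TensorRT build output incomplete, missing: {missing}")
print(f"Found compiled engines under {trt_root}, ready to run the demo.")
```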
2 files renamed without changes.

struct_eqtable/__init__.py

Lines changed: 7 additions & 3 deletions
@@ -1,7 +1,11 @@
 from .model import StructTable
 
-
 def build_model(model_ckpt, **kwargs):
-
-    model = StructTable(model_ckpt, **kwargs)
+    tensorrt_path = kwargs.get('tensorrt_path', None)
+    if tensorrt_path is not None:
+        from .model_trt import StructTableTensorRT
+        model = StructTableTensorRT(model_ckpt, **kwargs)
+    else:
+        model = StructTable(model_ckpt, **kwargs)
+
     return model
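For reference, a minimal usage sketch of the updated factory (not part of the commit): the checkpoint and engine paths assume the layout from docs/GETTING_STARTED.md, and only model construction is shown, since the inference interface is not touched by this diff.

```python
# Minimal sketch (assumption: paths follow docs/GETTING_STARTED.md).
from struct_eqtable import build_model

# Plain PyTorch / Hugging Face backend: no extra kwargs needed.
model = build_model('ckpts/StructTable-base')

# TensorRT-LLM backend: passing `tensorrt_path` makes build_model return a
# StructTableTensorRT instance (see struct_eqtable/model_trt.py) that uses
# the compiled engines instead of the default StructTable.
trt_model = build_model(
    'ckpts/StructTable-base',
    tensorrt_path='ckpts/StructTable-base-TensorRT',
)
```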

struct_eqtable/model.py

Lines changed: 1 addition & 2 deletions
@@ -1,12 +1,11 @@
-import re
 import torch
 
 from torch import nn
 from transformers import AutoModelForVision2Seq, AutoProcessor
 
 
 class StructTable(nn.Module):
-    def __init__(self, model_path='U4R/StructTable-base', max_new_tokens=4096, max_time=60):
+    def __init__(self, model_path='U4R/StructTable-base', max_new_tokens=4096, max_time=60, **kwargs):
         super().__init__()
         self.model_path = model_path
         self.max_new_tokens = max_new_tokens
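A brief sketch (not part of the commit) of why `**kwargs` was added: backend-selection options forwarded by `build_model`, such as `tensorrt_path`, are now accepted and ignored by the plain PyTorch model instead of raising a TypeError. The values below are illustrative only.

```python
# Sketch: extra, backend-specific kwargs no longer break the default model.
from struct_eqtable.model import StructTable

# Before this change, an unexpected keyword like tensorrt_path would have
# raised "TypeError: __init__() got an unexpected keyword argument".
model = StructTable(
    model_path='U4R/StructTable-base',
    max_new_tokens=4096,
    max_time=60,
    tensorrt_path=None,  # illustrative; only used by the TensorRT backend
)
```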
