Commit 13f29fd

Add new InternVL2-1B model with multi-format output (LaTeX, HTML, and Markdown) and efficient inference via LMDeploy
1 parent fd06078 commit 13f29fd

15 files changed (+813, -56 lines)


README.md

Lines changed: 35 additions & 23 deletions
@@ -4,20 +4,22 @@
 
 [[ Related Paper ]](https://arxiv.org/abs/2406.11633) [[ Website ]](https://unimodal4reasoning.github.io/DocGenome_page/) [[ Dataset (Google Drive)]](https://drive.google.com/drive/folders/1OIhnuQdIjuSSDc_QL2nP4NwugVDgtItD) [[ Dataset (Hugging Face) ]](https://huggingface.co/datasets/U4R/DocGenome/tree/main)
 
-[[Models 🤗(Hugging Face)]](https://huggingface.co/U4R/StructTable-base/tree/main)
+[[Models 🤗(Hugging Face)]](https://huggingface.co/U4R/StructTable-InternVL-1B/tree/main)
 
 
 </div>
 
-Welcome to the official repository of StructEqTable-Deploy, a solution that converts images of Table into LaTeX, powered by scalable data from [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/).
+Welcome to the official repository of StructEqTable-Deploy, a solution that converts table images into LaTeX, HTML, and Markdown, powered by scalable data from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/).
 
 
 ## Overview
 Tables are an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a table image and performing downstream reasoning on the extracted data is challenging, mainly because tables often have complicated column and row headers with spanning cells. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes. Building on this large-scale data, we train an end-to-end model, StructEqTable, which precisely recovers the LaTeX description of a table image and performs multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.
 
 ## Changelog
-Tips: Current version of StructEqTable is able to process table images from scientific documents such as arXiv, Scihub papers. Times New Roman And Songti(宋体) are main fonts used in table image, other fonts may decrease the accuracy of the model's output.
-- **[2024/8/22] 🔥 We have released our [latest model](https://huggingface.co/U4R/StructTable-base/tree/v0.2), fine-tuned on the DocGenome dataset. This version features improved inference speed and robustness, achieved through data augmentation and reduced image token num.**
+- [2024/10/19] 🔥 We have released our **latest model [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL-1B/tree/main)**!
+
+Thanks to the powerful foundational capabilities of InternVL2, and through fine-tuning on synthetic tabular data and the DocGenome dataset, StructTable can convert table images into common table formats including LaTeX, HTML, and Markdown. Inference is also significantly faster than with v0.2.
+- [2024/8/22] We have released StructTable-base-v0.2, fine-tuned on the DocGenome dataset. This version improves inference speed and robustness through data augmentation and a reduced number of image tokens.
 - [2024/8/08] We have released the TensorRT-accelerated version, which takes only about 1 second for most images on an A100 GPU. Please follow the tutorial to install the environment and compile the model weights.
 - [2024/7/30] We have released the first version of StructEqTable.
 
@@ -26,12 +28,10 @@ Tips: Current version of StructEqTable is able to process table images from scie
 - [x] Release inference code and checkpoints of StructEqTable.
 - [x] Support Chinese version of StructEqTable.
 - [x] Accelerated version of StructEqTable using TensorRT-LLM.
-- [ ] Expand more domains of table image to improve the model's general capabilities.
+- [x] Expand to more table-image domains to improve the model's general capabilities.
+- [x] Efficient inference of StructTable-InternVL2-1B with the [LMDeploy](https://github.com/InternLM/lmdeploy) toolkit.
 - [ ] Release our table pre-training and fine-tuning code
 
-## Efficient Inference
-Our model now supports TensorRT-LLM deployment, achieving a 10x or more speedup in during inference.
-Please refer to [GETTING_STARTED.md](docs/GETTING_STARTED.md) to learn how to depoly.
 
 ## Installation
 ``` bash
@@ -47,15 +47,17 @@ python setup develop
 pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
 
 # or Install from PyPI
-pip install struct-eqtable==0.1.0
+pip install struct-eqtable==0.3.0
 ```
 
 ## Model Zoo
 
-| Model | Image Token Num | Model Size | Training Data | Data Augmentation | TensorRT | HuggingFace |
-|---------------------|---------------------|------------|------------------|-------------------|----------|-------------------|
-| StructEqTable-base | 4096 | ~300M | DocGenome | | ☑️ | [v0.1](https://huggingface.co/U4R/StructTable-base/tree/v0.1) |
-| StructEqTable-base | 2048 | ~300M | DocGenome | ☑️ | ☑️ | [v0.2](https://huggingface.co/U4R/StructTable-base/tree/v0.2) |
+| Model | Model Size | Training Data | Data Augmentation | LMDeploy | TensorRT | HuggingFace |
+|---------------------|------------|------------------|-------------------|----------|----------|-------------------|
+| StructEqTable-InternVL | ~1B | DocGenome and Synthetic Data | ☑️ | ☑️ | | [v0.3](https://huggingface.co/U4R/StructTable-InternVL-1B/tree/main) |
+| StructEqTable-base | ~300M | DocGenome | ☑️ | | ☑️ | [v0.2](https://huggingface.co/U4R/StructTable-base/tree/v0.2) |
+| StructEqTable-base | ~300M | DocGenome | | | ☑️ | [v0.1](https://huggingface.co/U4R/StructTable-base/tree/v0.1) |
+
 
 
 ## Quick Demo
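The Model Zoo table can also be read as data. The sketch below encodes its three rows as plain Python records and filters them by requested output formats and acceleration backend. It is a hypothetical helper, not part of `struct_eqtable`: the `Checkpoint` record, `pick_checkpoints`, and the backend strings `"lmdeploy"` and `"tensorrt"` are assumptions. The format flags follow the README note that HTML and Markdown output is only supported by StructTable-InternVL2-1B, and the backend flags follow the `build_model` dispatch in this commit, which only offers LMDeploy for InternVL and TensorRT for Pix2Struct.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical record mirroring one row of the Model Zoo table.
    repo: str
    revision: str
    size: str
    formats: Tuple[str, ...]
    lmdeploy: bool
    tensorrt: bool


MODEL_ZOO = [
    Checkpoint("U4R/StructTable-InternVL-1B", "main", "~1B",
               ("latex", "html", "markdown"), lmdeploy=True, tensorrt=False),
    Checkpoint("U4R/StructTable-base", "v0.2", "~300M",
               ("latex",), lmdeploy=False, tensorrt=True),
    Checkpoint("U4R/StructTable-base", "v0.1", "~300M",
               ("latex",), lmdeploy=False, tensorrt=True),
]


def pick_checkpoints(formats: List[str],
                     backend: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (repo, revision) pairs supporting every requested format and the backend."""
    if backend not in (None, "lmdeploy", "tensorrt"):
        raise ValueError(f"unknown backend: {backend}")
    wanted = set(formats)
    picked = []
    for ckpt in MODEL_ZOO:
        if not wanted.issubset(ckpt.formats):
            continue  # e.g. HTML/Markdown requested from a LaTeX-only checkpoint
        if backend == "lmdeploy" and not ckpt.lmdeploy:
            continue
        if backend == "tensorrt" and not ckpt.tensorrt:
            continue
        picked.append((ckpt.repo, ckpt.revision))
    return picked
```

For example, `pick_checkpoints(["html", "markdown"])` leaves only the InternVL checkpoint, while asking for HTML together with the TensorRT backend returns an empty list, since no row in the table offers both.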
@@ -65,26 +67,34 @@ cd tools/demo
 
 python demo.py \
 --image_path ./demo.png \
---ckpt_path ${CKPT_PATH} \
+--ckpt_path U4R/StructTable-InternVL-1B \
 --output_format latex
 ```
 
-- HTML or Markdown format output
+- HTML or Markdown format output (only supported by StructTable-InternVL2-1B)
 
-Our model output Latex format code by default.
-If you want to get other format like HTML or Markdown,
-`pypandoc` support convert latex format code into HTML and Markdown format for simple table (table has no merge cell ).
+```shell script
+python demo.py \
+--image_path ./demo.png \
+--ckpt_path U4R/StructTable-InternVL-1B \
+--output_format html markdown
+```
 
+## Efficient Inference
+- Install the LMDeploy toolkit
 ```shell script
-sudo apt install pandoc
-pip install pypandoc
+pip install lmdeploy
+```
 
+- Run the demo (`tools/demo/demo.py`)
+```shell script
 cd tools/demo
 
 python demo.py \
 --image_path ./demo.png \
---ckpt_path ${CKPT_PATH} \
---output_format html markdown
+--ckpt_path U4R/StructTable-InternVL-1B \
+--output_format latex \
+--lmdeploy
 ```
 
 
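The demo commands in this hunk pass `--output_format` one or more values (`latex`, or `html markdown`) and turn on LMDeploy with a bare `--lmdeploy` switch. A minimal `argparse` sketch of a parser that accepts exactly these invocations is shown below. It is an assumption about how `tools/demo/demo.py` might parse its arguments, not its actual code; the `latex` default is assumed from the earlier README note that the model outputs LaTeX by default, and `build_parser`/`parse` are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names come from the README commands;
    # defaults and choices are assumptions.
    parser = argparse.ArgumentParser(
        description="Table image -> LaTeX/HTML/Markdown demo (sketch)")
    parser.add_argument("--image_path", required=True,
                        help="path to a table image")
    parser.add_argument("--ckpt_path", required=True,
                        help="local checkpoint or Hugging Face repo id")
    parser.add_argument("--output_format", nargs="+", default=["latex"],
                        choices=["latex", "html", "markdown"],
                        help="one or more output formats")
    parser.add_argument("--lmdeploy", action="store_true",
                        help="use the LMDeploy backend (InternVL checkpoints only)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

`nargs="+"` is what lets `--output_format html markdown` collect both values into one list, and `action="store_true"` makes `--lmdeploy` a switch that takes no value, so it is not swallowed as a third format. The parsed flag could then be forwarded as the `lmdeploy` keyword that `build_model` checks.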
@@ -100,9 +110,11 @@ python demo.py \
 - [DocGenome](https://github.com/UniModal4Reasoning/DocGenome). An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models.
 - [ChartVLM](https://github.com/UniModal4Reasoning/ChartVLM). A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
 - [Pix2Struct](https://github.com/google-research/pix2struct). Screenshot Parsing as Pretraining for Visual Language Understanding.
+- [InternVL Family](https://github.com/OpenGVLab/InternVL). A Series of Powerful Foundational Vision-Language Models.
+- [LMDeploy](https://github.com/InternLM/lmdeploy). A toolkit for compressing, deploying, and serving LLMs and MLLMs.
 - [UniMERNet](https://github.com/opendatalab/UniMERNet). A Universal Network for Real-World Mathematical Expression Recognition.
 - [Donut](https://huggingface.co/naver-clova-ix/donut-base). UniMERNet's Transformer encoder-decoder is based on Donut.
-- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer uses Nougat.
+- [Nougat](https://github.com/facebookresearch/nougat). Our data augmentation follows Nougat.
 - [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). Model inference acceleration uses TensorRT-LLM.
 
 
Lines changed: 27 additions & 1 deletion
@@ -107,7 +107,33 @@ You may get output as follows:
 ```
 total cost time: 0.88s
 Table 0 LATEX format output:
-\begin{tabular}{l@{\hskip 0.4in}ccc@{\hskip 0.3in}ccc}\hline \multicolumn{1}{c}{\multirow{2}{*}{Model}} & \multicolumn{2}{c}{\bf MCQA} & \multicolumn{2}{c}{\bf NSP} & \multicolumn{2}{c}{\bf PI} \\\multicolumn{1}{c}{} & Accuracy & F1 & Accuracy & F1 & Accuracy & F1 \\ \hline FastText & 0.318 & 0.317 & 0.496 & 0.496 & 0.762 & 0.806 \\ELMo & 0.318 & 0.318 & 0.691 & 0.691 & 0.807 & 0.867 \\BERT & 0.346 & 0.346 & 0.514 & 0.514 & 0.801 & 0.857 \\ \hline \end{tabular}
+\begin{tabular}{|c|c|c|c|}
+\hline
+Quantity $\backslash$ Unit System & International System SI (kg-m-s) & Traditional aeronautical (lb-ft-s) & Traditional structural (lb-inch-s) \\
+\hline
+Mass (translational inertia), $m$ & kilogram mass (kg) & slug = lb-s$^2$/f & lb-s$^2$/inch \\
+\hline
+Length, translational motion & meter (m) & foot (ft) & inch (in.) \\
+\hline
+Time, $t$ & second (s) & second (s) & second (s) \\
+\hline
+Force, translational action & newton (N) = kg-m/s$^2$ & pound force (lb) & pound force (lb) \\
+\hline
+Translational stiffness constant, $k$ & N/m & lb/ft & lb/inch \\
+\hline
+Translational damping constant, $c$ & N/(m/s) = N-s/m & lb/(ft/s) = lb-s/ft & lb/(inch/s) = lb-s/inch \\
+\hline
+Angle, rotational motion & radial (rad), which is dimensionless & radial (rad), which is dimensionless & radial (rad), which is dimensionless \\
+\hline
+Rotational inertia, $J$ & kg-m$^2$ & slug-ft$^2$ = lb-s$^2$ - ft & lb-s$^2$ - inch \\
+\hline
+Moment or torque, rotational action & N-m & lb-ft & lb-inch \\
+\hline
+Rotational stiffness constant, $k_\theta$ & (N-m)/rad = N-m & (lb-ft)/rad = lb-ft & (lb-inch)/rad = lb-inch \\
+\hline
+Rotational damping constant, $c_\theta$ & (N-m)/(rad/s) = N-m-s & (lb-ft)/(rad/s) = lb-ft-s & (lb-inch)/(rad/s) = lb-inch-s \\
+\hline
+\end{tabular}
 ```
 
 
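The new sample output is a plain grid: every row ends in `\\`, rows are separated by `\hline`, and no cell uses `\multicolumn` or `\multirow`. For tables of that shape the mapping to Markdown is mechanical, which is the same limitation the removed README text stated for `pypandoc` (simple tables without merged cells). A minimal pure-Python sketch of that mapping follows. It is illustrative only: StructTable-InternVL2-1B emits HTML and Markdown itself via `--output_format`, and `tabular_to_markdown` is a hypothetical helper that assumes a column spec without nested braces and rejects merged cells.

```python
import re

# Captures the body of \begin{tabular}{<spec>} ... \end{tabular}; assumes no nested braces in <spec>.
_TABULAR = re.compile(r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", re.S)


def tabular_to_markdown(latex: str) -> str:
    """Convert a simple LaTeX tabular (no merged cells) into a Markdown table."""
    match = _TABULAR.search(latex)
    if match is None:
        raise ValueError("no tabular environment found")
    body = match.group(1)
    if "\\multicolumn" in body or "\\multirow" in body:
        raise ValueError("merged cells are not supported by this sketch")

    rows = []
    for raw_row in body.split("\\\\"):  # LaTeX rows end with a double backslash
        text = raw_row.replace("\\hline", "").strip()
        if not text:
            continue  # skip the empty remainder after the last row
        # Split on unescaped '&' and escape '|' so it cannot break the Markdown grid.
        cells = [c.strip().replace("|", "\\|") for c in re.split(r"(?<!\\)&", text)]
        rows.append(cells)
    if not rows:
        raise ValueError("tabular has no rows")

    header, *data = rows
    width = len(header)
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * width]
    for row in data:
        row = row + [""] * (width - len(row))  # pad short rows to the header width
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```

For example, a tabular with the rows `Quantity & SI \\` and `Time, $t$ & second (s) \\` becomes a two-column Markdown table whose header row is `| Quantity | SI |`. Treating the first LaTeX row as the header is itself an assumption; Markdown tables require one, whereas LaTeX does not mark it.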
setup.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ def write_version_to_file(version, target_file):
         print('__version__ = "%s"' % version, file=f)
 
 if __name__ == '__main__':
-    version = '0.1.0'
+    version = '0.3.0'
     write_version_to_file(version, 'struct_eqtable/version.py')
     with Path(Path(__file__).parent,
               'README.md').open(encoding='utf-8') as file:
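This hunk bumps the package version from 0.1.0 to 0.3.0, matching the `pip install struct-eqtable==0.3.0` line in the README, and writes it to `struct_eqtable/version.py`. Only the `print(..., file=f)` line of `write_version_to_file` is visible here, so the sketch below assumes the surrounding `open(target_file, "w")` block; `read_version` is a hypothetical helper added only to show what the generated file contains.

```python
import re
import tempfile
from pathlib import Path


def write_version_to_file(version: str, target_file) -> None:
    # Assumed wrapper around the visible line: open the target for writing
    # and emit a single `__version__ = "X.Y.Z"` assignment.
    with open(target_file, "w") as f:
        print('__version__ = "%s"' % version, file=f)


def read_version(target_file) -> str:
    """Hypothetical helper: parse __version__ back out without importing the file."""
    text = Path(target_file).read_text()
    match = re.search(r'^__version__\s*=\s*"([^"]+)"', text, re.M)
    if match is None:
        raise ValueError(f"no __version__ in {target_file}")
    return match.group(1)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "version.py"
        write_version_to_file("0.3.0", target)
        print(target.read_text(), end="")  # __version__ = "0.3.0"
        print(read_version(target))        # 0.3.0
```

Reading the value with a regex rather than importing the module keeps `setup.py`-style tooling from executing package code just to learn the version.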

struct_eqtable/__init__.py

Lines changed: 39 additions & 7 deletions
@@ -1,11 +1,43 @@
-from .model import StructTable
+from .pix2s import Pix2Struct, Pix2StructTensorRT
+from .internvl import InternVL, InternVL_LMDeploy
 
-def build_model(model_ckpt, **kwargs):
-    tensorrt_path = kwargs.get('tensorrt_path', None)
-    if tensorrt_path is not None:
-        from .model_trt import StructTableTensorRT
-        model = StructTableTensorRT(model_ckpt, **kwargs)
+from transformers import AutoConfig
+
+
+__ALL_MODELS__ = {
+    'Pix2Struct': Pix2Struct,
+    'Pix2StructTensorRT': Pix2StructTensorRT,
+    'InternVL': InternVL,
+    'InternVL_LMDeploy': InternVL_LMDeploy,
+}
+
+
+def get_model_name(model_path):
+    model_config = AutoConfig.from_pretrained(
+        model_path,
+        trust_remote_code=True,
+    )
+
+    if 'Pix2Struct' in model_config.architectures[0]:
+        model_name = 'Pix2Struct'
+    elif 'InternVL' in model_config.architectures[0]:
+        model_name = 'InternVL'
     else:
-        model = StructTable(model_ckpt, **kwargs)
+        raise ValueError(f"Unsupported model type: {model_config.architectures[0]}")
+
+    return model_name
+
+
+def build_model(model_ckpt, **kwargs):
+    model_name = get_model_name(model_ckpt)
+    if model_name == 'InternVL' and kwargs.get('lmdeploy', False):
+        model_name = 'InternVL_LMDeploy'
+    elif model_name == 'Pix2Struct' and kwargs.get('tensorrt_path', None):
+        model_name = 'Pix2StructTensorRT'
+
+    model = __ALL_MODELS__[model_name](
+        model_ckpt,
+        **kwargs
+    )
 
     return model
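The new `build_model` replaces the old if/else with a name-to-class registry: it reads the checkpoint's `config.architectures[0]`, maps it to a model family, and upgrades to an accelerated backend only when the matching keyword is set (`lmdeploy=True` for InternVL, a truthy `tensorrt_path` for Pix2Struct). The self-contained sketch below reproduces that dispatch without `transformers`. The architecture string is passed in directly instead of being loaded with `AutoConfig.from_pretrained`, the four classes are empty stand-ins that only record their arguments, and example architecture names such as `InternVLChatModel` are assumptions.

```python
from typing import Any, Dict, Type


class Backend:
    """Stand-in for the real model wrappers; just records its arguments."""

    def __init__(self, model_ckpt: str, **kwargs: Any) -> None:
        self.model_ckpt = model_ckpt
        self.kwargs = kwargs


# Empty stand-ins named after the classes exported in struct_eqtable/__init__.py.
class Pix2Struct(Backend): ...
class Pix2StructTensorRT(Backend): ...
class InternVL(Backend): ...
class InternVL_LMDeploy(Backend): ...


ALL_MODELS: Dict[str, Type[Backend]] = {
    "Pix2Struct": Pix2Struct,
    "Pix2StructTensorRT": Pix2StructTensorRT,
    "InternVL": InternVL,
    "InternVL_LMDeploy": InternVL_LMDeploy,
}


def get_model_name(architecture: str) -> str:
    # Same substring checks as the commit, applied to config.architectures[0].
    if "Pix2Struct" in architecture:
        return "Pix2Struct"
    if "InternVL" in architecture:
        return "InternVL"
    raise ValueError(f"Unsupported model type: {architecture}")


def build_model(model_ckpt: str, architecture: str, **kwargs: Any) -> Backend:
    model_name = get_model_name(architecture)
    # Accelerated backends are opt-in and only exist for their own family.
    if model_name == "InternVL" and kwargs.get("lmdeploy", False):
        model_name = "InternVL_LMDeploy"
    elif model_name == "Pix2Struct" and kwargs.get("tensorrt_path", None):
        model_name = "Pix2StructTensorRT"
    return ALL_MODELS[model_name](model_ckpt, **kwargs)
```

For example, `build_model("U4R/StructTable-InternVL-1B", "InternVLChatModel", lmdeploy=True)` returns an `InternVL_LMDeploy` instance. One consequence of this design: an accelerator flag aimed at the other family (such as `lmdeploy=True` with a Pix2Struct checkpoint) does not raise in the dispatch; it falls back to the plain backend, and the flag is still forwarded to the constructor through `**kwargs`.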
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+from .internvl import InternVL
+from .internvl_lmdeploy import InternVL_LMDeploy
