Commit fd06078

upload latest model checkpoint link
2 parents e865c1f + 8504607 commit fd06078

5 files changed, with 20 additions and 7 deletions

README.md

Lines changed: 10 additions & 1 deletion
@@ -17,7 +17,8 @@ Table is an effective way to represent structured data in scientific publication

 ## Changelog
 Tips: Current version of StructEqTable is able to process table images from scientific documents such as arXiv, Scihub papers. Times New Roman And Songti(宋体) are main fonts used in table image, other fonts may decrease the accuracy of the model's output.
-- [2024/8/08] 🔥 We have released the TensorRT accelerated version, which only takes about 1 second for most images on GPU A100. Please follow the tutorial to install the environment and compile the model weights.
+- **[2024/8/22] 🔥 We have released our [latest model](https://huggingface.co/U4R/StructTable-base/tree/v0.2), fine-tuned on the DocGenome dataset. This version features improved inference speed and robustness, achieved through data augmentation and reduced image token num.**
+- [2024/8/08] We have released the TensorRT accelerated version, which only takes about 1 second for most images on GPU A100. Please follow the tutorial to install the environment and compile the model weights.
 - [2024/7/30] We have released the first version of StructEqTable.

 ## TODO
@@ -49,6 +50,14 @@ pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
 pip install struct-eqtable==0.1.0
 ```

+## Model Zoo
+
+| Model | Image Token Num | Model Size | Training Data | Data Augmentation | TensorRT | HuggingFace |
+|---------------------|---------------------|------------|------------------|-------------------|----------|-------------------|
+| StructEqTable-base | 4096 | ~300M | DocGenome | | ☑️ | [v0.1](https://huggingface.co/U4R/StructTable-base/tree/v0.1) |
+| StructEqTable-base | 2048 | ~300M | DocGenome | ☑️ | ☑️ | [v0.2](https://huggingface.co/U4R/StructTable-base/tree/v0.2) |
+
+
 ## Quick Demo
 - Run the demo/demo.py
 ```shell script

docs/GETTING_STARTED.md

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ Tips: If you want to install the environment manually, please note that the vers
 ### 2. Model Compilation
 You can refer to the [official tutorial](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html) to complete the model compilation, or follow our instructions and use the provided scripts to implement it.

-#### 2.1 Download [StructEqTable checkpoints](https://huggingface.co/U4R/StructTable-base/tree/main)
+#### 2.1 Download [StructEqTable checkpoints](https://huggingface.co/U4R/StructTable-base/tree/v0.2)
 ```
 cd StructEqTable-Deploy

struct_eqtable/model.py

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ def forward(self, image):
             max_new_tokens=self.max_new_tokens,
             max_time=self.max_generate_time
         )
+
         latex_codes = self.data_processor.batch_decode(model_output, skip_special_tokens=True)
         # postprocess
         for i, code in enumerate(latex_codes):

tools/demo/demo.py

Lines changed: 2 additions & 1 deletion
@@ -10,6 +10,7 @@ def parse_config():
     parser = argparse.ArgumentParser(description='arg parser')
     parser.add_argument('--image_path', type=str, default='demo.png', help='data path for table image')
     parser.add_argument('--ckpt_path', type=str, default='U4R/StructTable-base', help='ckpt path for table model, which can be downloaded from huggingface')
+    parser.add_argument('--max_new_tokens', type=int, default=2048, help='maximum output tokens of model inference')
     parser.add_argument('-t', '--max_waiting_time', type=int, default=60, help='maximum waiting time of model inference')
     parser.add_argument('--cpu', action='store_true', default=False, help='using cpu for inference')
     parser.add_argument('-f', '--output_format', type=str, nargs='+', default=['latex'],
@@ -26,7 +27,7 @@ def main():
     # build model
     model = build_model(
         args.ckpt_path,
-        max_new_tokens=4096,
+        max_new_tokens=args.max_new_tokens,
         max_time=args.max_waiting_time,
         tensorrt_path=args.tensorrt_path
     )
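
The effect of this hunk can be sketched in isolation. The snippet below is a minimal, self-contained illustration rather than the repository's code: `build_model` is a hypothetical stub that only records its arguments in place of the real builder, and only the two flags relevant to the change are reproduced.

```python
import argparse


def parse_config(argv=None):
    """Parse the subset of demo flags touched by this commit."""
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--max_new_tokens', type=int, default=2048,
                        help='maximum output tokens of model inference')
    parser.add_argument('-t', '--max_waiting_time', type=int, default=60,
                        help='maximum waiting time of model inference')
    return parser.parse_args(argv)


def build_model(ckpt_path, max_new_tokens, max_time):
    """Hypothetical stub: returns its arguments instead of loading a model."""
    return {
        'ckpt_path': ckpt_path,
        'max_new_tokens': max_new_tokens,
        'max_time': max_time,
    }


def main(argv=None):
    args = parse_config(argv)
    # The output budget now comes from the CLI instead of a hard-coded 4096.
    return build_model(
        'U4R/StructTable-base',
        max_new_tokens=args.max_new_tokens,
        max_time=args.max_waiting_time,
    )
```

With no flags the sketch uses the new default of 2048 output tokens; passing `--max_new_tokens 4096` reproduces the value that was previously hard-coded.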

tools/scripts/build_tensorrt.sh

Lines changed: 6 additions & 4 deletions
@@ -2,7 +2,9 @@ set -x

 HF_CKPT_PATH=${1:-"../ckpts/StructTable-base"}
 MODEL_OUTPUT=${2:-"../ckpts/StructTable-base-TensorRT"}
-MODEL_TYPE=${3:-"StructEqTable"}
+MAX_IMAGE_TOKEN_NUM=${3:-2048}
+MAX_OUPTPUT_TOKEN_NUM=${4:-2048}
+MODEL_TYPE=${5:-"StructEqTable"}

 if [ ! -d $MODEL_OUTPUT ]; then
     mkdir -p $MODEL_OUTPUT
@@ -34,9 +36,9 @@ trtllm-build --checkpoint_dir $MODEL_OUTPUT/trt_models/float16/decoder \
     --remove_input_padding enable \
     --context_fmha disable \
     --max_beam_width 1 \
-    --max_batch_size 8 \
-    --max_seq_len 4096 \
-    --max_encoder_input_len 4096 \
+    --max_batch_size 1 \
+    --max_seq_len $MAX_OUPTPUT_TOKEN_NUM \
+    --max_encoder_input_len $MAX_IMAGE_TOKEN_NUM \
     --max_input_len 1

 # Step3 build visual engine
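
One consequence of the script change is that `MODEL_TYPE` moves from the third to the fifth positional argument, so an invocation written for the old script that passes a model type in position 3 would now have that value read as `MAX_IMAGE_TOKEN_NUM`. The sketch below emulates the script's `${N:-default}` expansions in Python to make the shift concrete. `resolve_args` and the two default tables are hypothetical helpers written for this illustration and are not part of the repository.

```python
# Positional parameters and defaults before and after this commit.
DEFAULTS_BEFORE = [
    ("HF_CKPT_PATH", "../ckpts/StructTable-base"),
    ("MODEL_OUTPUT", "../ckpts/StructTable-base-TensorRT"),
    ("MODEL_TYPE", "StructEqTable"),
]
DEFAULTS_AFTER = [
    ("HF_CKPT_PATH", "../ckpts/StructTable-base"),
    ("MODEL_OUTPUT", "../ckpts/StructTable-base-TensorRT"),
    ("MAX_IMAGE_TOKEN_NUM", "2048"),
    ("MAX_OUPTPUT_TOKEN_NUM", "2048"),  # spelled as in the script
    ("MODEL_TYPE", "StructEqTable"),
]


def resolve_args(argv, defaults):
    """Emulate ${N:-default}: fall back when an argument is unset or empty."""
    resolved = {}
    for index, (name, default) in enumerate(defaults):
        value = argv[index] if index < len(argv) else ""
        resolved[name] = value if value != "" else default
    return resolved


# A call written for the old script: model type in position 3.
old_style_call = ["../ckpts/StructTable-base", "../out", "StructEqTable"]
before = resolve_args(old_style_call, DEFAULTS_BEFORE)
after = resolve_args(old_style_call, DEFAULTS_AFTER)
```

Under these assumptions, `after["MAX_IMAGE_TOKEN_NUM"]` holds the string `StructEqTable`, which would then be forwarded to `--max_encoder_input_len`. Callers that pass a third argument may need to be updated to the new order.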
