upload latest model checkpoint link

PrinceVictor · web-flow · commit fd06078bfa93 · 2024-08-22T15:36:40.000+08:00
diff --git a/README.md b/README.md
@@ -17,7 +17,8 @@ Table is an effective way to represent structured data in scientific publication
 
 ## Changelog
 Tips: Current version of StructEqTable is able to process table images from scientific documents such as arXiv, Scihub papers. Times New Roman And Songti(宋体) are main fonts used in table image, other fonts may decrease the accuracy of the model's output.
-- [2024/8/08] 🔥 We have released the TensorRT accelerated version, which only takes about 1 second for most images on GPU A100. Please follow the tutorial to install the environment and compile the model weights.
+- **[2024/8/22] 🔥 We have released our [latest model](https://huggingface.co/U4R/StructTable-base/tree/v0.2), fine-tuned on the DocGenome dataset. This version features improved inference speed and robustness, achieved through data augmentation and reduced image token num.** 
+- [2024/8/08] We have released the TensorRT accelerated version, which only takes about 1 second for most images on GPU A100. Please follow the tutorial to install the environment and compile the model weights.
 - [2024/7/30] We have released the first version of StructEqTable. 
 
 ## TODO
@@ -49,6 +50,14 @@ pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
 pip install struct-eqtable==0.1.0
 ```
 
+## Model Zoo
+
+| Model | Image Token Num | Model Size | Training Data | Data Augmentation | TensorRT | HuggingFace |
+|---------------------|---------------------|------------|------------------|-------------------|----------|-------------------|
+| StructEqTable-base | 4096 | ~300M | DocGenome |  | ☑️ | [v0.1](https://huggingface.co/U4R/StructTable-base/tree/v0.1) |
+| StructEqTable-base | 2048 | ~300M | DocGenome | ☑️ | ☑️ | [v0.2](https://huggingface.co/U4R/StructTable-base/tree/v0.2) |
+
+
 ## Quick Demo
 - Run the demo/demo.py
 ```shell script
diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md
@@ -53,7 +53,7 @@ Tips: If you want to install the environment manually, please note that the vers
 ### 2. Model Compilation
 You can refer to the [official tutorial](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html) to complete the model compilation, or follow our instructions and use the provided scripts to implement it.
 
-#### 2.1 Download [StructEqTable checkpoints](https://huggingface.co/U4R/StructTable-base/tree/main)
+#### 2.1 Download [StructEqTable checkpoints](https://huggingface.co/U4R/StructTable-base/tree/v0.2)
 ```
 cd StructEqTable-Deploy
 
diff --git a/struct_eqtable/model.py b/struct_eqtable/model.py
@@ -47,6 +47,7 @@ def forward(self, image):
             max_new_tokens=self.max_new_tokens,
             max_time=self.max_generate_time
         )
+
         latex_codes = self.data_processor.batch_decode(model_output, skip_special_tokens=True)
         # postprocess
         for i, code in enumerate(latex_codes):
diff --git a/tools/demo/demo.py b/tools/demo/demo.py
@@ -10,6 +10,7 @@ def parse_config():
     parser = argparse.ArgumentParser(description='arg parser')
     parser.add_argument('--image_path', type=str, default='demo.png', help='data path for table image')
     parser.add_argument('--ckpt_path', type=str, default='U4R/StructTable-base', help='ckpt path for table model, which can be downloaded from huggingface')
+    parser.add_argument('--max_new_tokens', type=int, default=2048, help='maximum output tokens of model inference')
     parser.add_argument('-t', '--max_waiting_time', type=int, default=60, help='maximum waiting time of model inference')
     parser.add_argument('--cpu', action='store_true', default=False, help='using cpu for inference')
     parser.add_argument('-f', '--output_format', type=str, nargs='+', default=['latex'], 
@@ -26,7 +27,7 @@ def main():
     # build model
     model = build_model(
         args.ckpt_path, 
-        max_new_tokens=4096, 
+        max_new_tokens=args.max_new_tokens, 
         max_time=args.max_waiting_time,
         tensorrt_path=args.tensorrt_path
     )
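Review note: the `tools/demo/demo.py` hunks above replace the hard-coded `max_new_tokens=4096` with a `--max_new_tokens` flag that defaults to 2048, so running the demo without the flag now yields a smaller output budget than before. The sketch below reproduces only that wiring. The `build_model` here is a hypothetical stand-in that records its arguments, not the project's implementation, and the `argv` parameters are added purely so the example can be called without a command line.

```python
import argparse


def build_model(ckpt_path, max_new_tokens, max_time):
    # Hypothetical stand-in for the project's build_model: it only records
    # the arguments it receives so the forwarding can be inspected.
    return {
        "ckpt_path": ckpt_path,
        "max_new_tokens": max_new_tokens,
        "max_time": max_time,
    }


def parse_config(argv=None):
    # Subset of the demo's arguments, with the defaults shown in the diff.
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--ckpt_path', type=str, default='U4R/StructTable-base')
    parser.add_argument('--max_new_tokens', type=int, default=2048)
    parser.add_argument('-t', '--max_waiting_time', type=int, default=60)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_config(argv)
    # The parsed value is forwarded instead of the former literal 4096.
    return build_model(
        args.ckpt_path,
        max_new_tokens=args.max_new_tokens,
        max_time=args.max_waiting_time,
    )
```

Calling `main([])` forwards the new default of 2048, while `main(['--max_new_tokens', '4096'])` restores the value that was previously hard-coded.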
diff --git a/tools/scripts/build_tensorrt.sh b/tools/scripts/build_tensorrt.sh
@@ -2,7 +2,9 @@ set -x
 
 HF_CKPT_PATH=${1:-"../ckpts/StructTable-base"}
 MODEL_OUTPUT=${2:-"../ckpts/StructTable-base-TensorRT"}
-MODEL_TYPE=${3:-"StructEqTable"}
+MAX_IMAGE_TOKEN_NUM=${3:-2048}
+MAX_OUPTPUT_TOKEN_NUM=${4:-2048}
+MODEL_TYPE=${5:-"StructEqTable"}
 
 if [ ! -d $MODEL_OUTPUT ]; then
     mkdir -p $MODEL_OUTPUT
@@ -34,9 +36,9 @@ trtllm-build --checkpoint_dir $MODEL_OUTPUT/trt_models/float16/decoder \
     --remove_input_padding enable \
     --context_fmha disable \
     --max_beam_width 1 \
-    --max_batch_size 8 \
-    --max_seq_len 4096 \
-    --max_encoder_input_len 4096 \
+    --max_batch_size 1 \
+    --max_seq_len $MAX_OUPTPUT_TOKEN_NUM \
+    --max_encoder_input_len $MAX_IMAGE_TOKEN_NUM \
     --max_input_len 1
 
 # Step3 build visual engine
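
Review note: the `tools/scripts/build_tensorrt.sh` hunk above moves `MODEL_TYPE` from the third positional argument to the fifth, so an existing invocation that passes the model type as `$3` would now set `MAX_IMAGE_TOKEN_NUM` instead. The Python sketch below only imitates the shell's `${N:-default}` expansion (use the default when the argument is unset or empty) to make that shift visible. The function name `resolve_build_args` is hypothetical, and the variable names, including the `MAX_OUPTPUT_TOKEN_NUM` spelling, are copied from the script as-is.

```python
def resolve_build_args(argv):
    """Imitate `${N:-default}` for the script's five positional arguments."""
    defaults = [
        ("HF_CKPT_PATH", "../ckpts/StructTable-base"),
        ("MODEL_OUTPUT", "../ckpts/StructTable-base-TensorRT"),
        ("MAX_IMAGE_TOKEN_NUM", "2048"),
        ("MAX_OUPTPUT_TOKEN_NUM", "2048"),
        ("MODEL_TYPE", "StructEqTable"),
    ]
    resolved = {}
    for index, (name, default) in enumerate(defaults):
        # A missing or empty positional argument falls back to the default,
        # matching the `:-` form of shell parameter expansion.
        value = argv[index] if index < len(argv) else ""
        resolved[name] = value if value else default
    return resolved


# A call written for the old three-argument layout:
old_style = resolve_build_args(["ckpt", "out", "StructEqTable"])
# old_style["MAX_IMAGE_TOKEN_NUM"] now holds the string "StructEqTable".
```

If that reading is right, callers written for the old layout would need to pass the two token limits before the model type, or drop the third argument entirely and rely on the defaults.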
