
Commit 9a8bb53

Merge pull request #2 from UniModal4Reasoning/dev
update README.md and demo.py
2 parents 1b2f817 + b7d5d1d commit 9a8bb53

2 files changed, 43 additions, 12 deletions

README.md

Lines changed: 28 additions & 4 deletions
````diff
@@ -15,12 +15,15 @@ Welcome to the official repository of StructEqTable-Deploy, a solution that conv
 ## Abstract
 Tables are an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a visual table image and performing downstream reasoning tasks on the extracted data is challenging, mainly because tables often have complicated column and row headers with spanning cells. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes. In addition, benefiting from such large-scale data, we train an end-to-end model, StructEqTable, which can precisely obtain the corresponding LaTeX description from a visual table image and perform multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.
 
+## Release
+- [2024/7/30] 🔥 We have released the first version of StructEqTable. (The current version can process table images from scientific documents such as arXiv and SciHub papers. Times New Roman and Songti (宋体) are the main fonts used in these table images; other fonts may reduce the accuracy of the model's output.)
 
 ## TODO
 
 - [x] Release inference code and checkpoints of StructEqTable.
 - [x] Support Chinese version of StructEqTable.
-- [ ] Improve the inference speed of StructEqTable.
+- [ ] Expand to more table-image domains to improve the model's general capabilities.
+- [ ] Accelerated version of StructEqTable using TensorRT-LLM.
 
 
 ### Installation
````
````diff
@@ -35,14 +38,35 @@ pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
 ```
 
 ## Quick Demo
-- run the demo/demo.py
+- Run the demo/demo.py
 ```shell script
 cd demo
 
-python demo.py \ --image_path ./demo.png \
---ckpt_path ${CKPT_PATH}
+python demo.py \
+--image_path ./demo.png \
+--ckpt_path ${CKPT_PATH} \
+--output_format latex
 ```
 
+- Obtain output in other formats
+
+Our model outputs LaTeX code by default.
+If you want other formats such as HTML or Markdown,
+`pypandoc` can convert the LaTeX code into HTML or Markdown for simple tables (tables without merged cells).
+
+```shell script
+sudo apt install pandoc
+pip install pypandoc
+
+cd demo
+
+python demo.py \
+--image_path ./demo.png \
+--ckpt_path ${CKPT_PATH} \
+--output_format html markdown
+```
+
+
 - Visualization Results
 - The input data are sampled from the SciHub domain.
 
````
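Because `pypandoc` only handles simple tables and relies on the `pandoc` binary, it can help to check both before requesting `html` or `markdown` output. The following is a minimal sketch, not part of StructEqTable-Deploy: `has_merged_cells`, `pandoc_available` and `can_convert_with_pandoc` are hypothetical helpers, and the sketch assumes that merged cells appear as `\multicolumn` or `\multirow` in the model's LaTeX output and that `pandoc` is installed on `PATH` (as with `sudo apt install pandoc`).

```python
import re
import shutil

# Assumption: merged cells in the model's LaTeX output use \multicolumn or \multirow.
MERGED_CELL_PATTERN = re.compile(r"\\multi(?:column|row)\b")


def has_merged_cells(latex_code: str) -> bool:
    """Return True if the LaTeX table appears to contain merged cells."""
    return MERGED_CELL_PATTERN.search(latex_code) is not None


def pandoc_available() -> bool:
    """Return True if a system-wide `pandoc` executable is on PATH.

    Assumption: pandoc was installed system-wide; a pandoc bundled by other
    means (e.g. a pip wheel) may not be found by this check.
    """
    return shutil.which("pandoc") is not None


def can_convert_with_pandoc(latex_code: str) -> bool:
    """Heuristic: only attempt HTML/Markdown conversion for simple tables."""
    return pandoc_available() and not has_merged_cells(latex_code)


if __name__ == "__main__":
    simple = r"\begin{tabular}{cc} a & b \\ 1 & 2 \\ \end{tabular}"
    merged = r"\begin{tabular}{cc} \multicolumn{2}{c}{ab} \\ 1 & 2 \\ \end{tabular}"
    print(has_merged_cells(simple))  # False
    print(has_merged_cells(merged))  # True
```

A regex scan is only a heuristic: it will not catch every construct pandoc mishandles, but it flags the merged-cell case the README calls out.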
demo/demo.py

Lines changed: 15 additions & 8 deletions
````diff
@@ -10,18 +10,20 @@ def parse_config():
     parser = argparse.ArgumentParser(description='arg parser')
     parser.add_argument('--image_path', type=str, default='demo.png', help='data path for table image')
     parser.add_argument('--ckpt_path', type=str, default='U4R/StructTable-base', help='ckpt path for table model, which can be downloaded from huggingface')
+    parser.add_argument('-t', '--max_waiting_time', type=int, default=60, help='maximum waiting time of model inference')
     parser.add_argument('--cpu', action='store_true', default=False, help='using cpu for inference')
-    parser.add_argument('--html', action='store_true', default=False, help='output html format table code')
+    parser.add_argument('-f', '--output_format', type=str, nargs='+', default=['latex'],
+                        help='The model outputs LaTeX format code by default. Simple structured table LaTeX code can be converted to HTML or Markdown format using pypandoc.')
     args = parser.parse_args()
     return args
 
 def main():
     args = parse_config()
-    if args.html:
+    if 'html' in args.output_format or 'markdown' in args.output_format:
         from pypandoc import convert_text
 
     # build model
-    model = build_model(args.ckpt_path, max_new_tokens=4096, max_time=60)
+    model = build_model(args.ckpt_path, max_new_tokens=4096, max_time=args.max_waiting_time)
     if not args.cpu:
         model = model.cuda()
 
````
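The new `-f/--output_format` option uses `nargs='+'`, so several formats can follow a single flag and the parsed value is always a list. The sketch below rebuilds only the two options added in this commit (the other arguments are omitted) and parses sample command lines. `needs_pypandoc` is a hypothetical helper that generalizes the commit's `'html' in ... or 'markdown' in ...` check to any non-LaTeX target; with the committed check, requesting some other pandoc format on its own (for example `-f rst`) would reach `convert_text` without it having been imported and raise a `NameError`.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Rebuild only the two options added in this commit (others omitted)."""
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('-t', '--max_waiting_time', type=int, default=60,
                        help='maximum waiting time of model inference')
    parser.add_argument('-f', '--output_format', type=str, nargs='+', default=['latex'],
                        help='output formats; non-LaTeX formats are converted with pypandoc')
    return parser


def needs_pypandoc(output_format: List[str]) -> bool:
    """Hypothetical helper: pypandoc is needed for any target other than LaTeX."""
    return any(fmt != 'latex' for fmt in output_format)


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse an explicit argv list (None falls back to sys.argv)."""
    return build_parser().parse_args(argv)


if __name__ == '__main__':
    # nargs='+' keeps consuming values until the next option flag (-t).
    args = parse(['-f', 'html', 'markdown', '-t', '120'])
    print(args.output_format)                  # ['html', 'markdown']
    print(args.max_waiting_time)               # 120
    print(needs_pypandoc(args.output_format))  # True
    print(parse([]).output_format)             # ['latex']
```

In `demo.py`, the equivalent guard would wrap the `from pypandoc import convert_text` line in `if needs_pypandoc(args.output_format):`.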
````diff
@@ -35,12 +37,17 @@ def main():
     # show output latex code of table
     cost_time = time.time() - start_time
     print(f"total cost time: {cost_time:.2f}s")
+
+    if cost_time >= args.max_waiting_time:
+        warn_log = f"\033[93mThe model inference time exceeds the maximum waiting time {args.max_waiting_time} seconds, the result may be incomplete.\n" \
+                   "Please increase the maximum waiting time with argument --max_waiting_time or Model may not support the type of input table image \033[0m"
+        print(warn_log)
+
+
     for i, latex_code in enumerate(output):
-        if args.html:
-            html_code = convert_text(latex_code, 'html', format='latex')
-            print(f"Table {i} HTML code:\n{html_code}")
-        else:
-            print(f"Table {i} LaTex code:\n{latex_code}")
+        for tgt_fmt in args.output_format:
+            tgt_code = convert_text(latex_code, tgt_fmt, format='latex') if tgt_fmt != 'latex' else latex_code
+            print(f"Table {i} {tgt_fmt.upper()} format output:\n{tgt_code}")
 
 
 if __name__ == '__main__':
````
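The reworked loop prints each table once per requested format, passing LaTeX through unchanged and converting every other format. Below is a minimal, testable sketch of that logic plus the new time-budget check. It assumes the converter is passed in as a callable (in `demo.py` it is `pypandoc.convert_text`, called as `convert_text(latex_code, tgt_fmt, format='latex')`); `format_outputs`, `timeout_warning` and the stub converter are hypothetical names for illustration, and the warning wording is ours.

```python
from typing import Callable, Dict, List, Optional

# A converter takes (source_text, target_format, format=source_format) -> str,
# mirroring how demo.py calls pypandoc.convert_text.
Converter = Callable[..., str]


def format_outputs(latex_codes: List[str], output_format: List[str],
                   convert: Optional[Converter] = None) -> List[Dict[str, str]]:
    """Return one {format: code} dict per table; LaTeX is passed through as-is."""
    results = []
    for latex_code in latex_codes:
        per_table = {}
        for tgt_fmt in output_format:
            if tgt_fmt == 'latex':
                per_table[tgt_fmt] = latex_code
            elif convert is None:
                raise ValueError(f"a converter is required for format {tgt_fmt!r}")
            else:
                per_table[tgt_fmt] = convert(latex_code, tgt_fmt, format='latex')
        results.append(per_table)
    return results


def timeout_warning(cost_time: float, max_waiting_time: int) -> Optional[str]:
    """Mirror the commit's check: reaching the time budget may mean truncated output."""
    if cost_time >= max_waiting_time:
        return (f"Inference took {cost_time:.2f}s, reaching the {max_waiting_time}s limit; "
                "the result may be incomplete. Increase --max_waiting_time, "
                "or the input table image may not be supported.")
    return None


if __name__ == '__main__':
    # Stub converter for illustration only; it just wraps the input in tags.
    def fake_convert(text: str, to: str, format: str = 'latex') -> str:
        return f"<{to}>{text}</{to}>"

    tables = format_outputs([r"a & b \\"], ['latex', 'html'], convert=fake_convert)
    for i, table in enumerate(tables):
        for fmt, code in table.items():
            print(f"Table {i} {fmt.upper()} format output:\n{code}")
    print(timeout_warning(61.0, 60) is not None)  # True
```

Injecting the converter keeps the formatting logic testable without a `pandoc` install. The time check is a heuristic, as in the commit: reaching the budget suggests, but does not prove, that the output was cut short.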
