Commit ba54691

update readme
1 parent 12363fa commit ba54691

2 files changed, +67 -54 lines changed

README.md

Lines changed: 40 additions & 31 deletions
@@ -15,22 +15,22 @@ English | [中文](README_CN.md)
 [Introduction](#introduction) |
 [Installation](#installation) |
 [Quick Start](#quick-start) |
-[Model List](#supported-models-and-performance) |
+[Model List](#model-list) |
 [Notes](#notes)
 
 </div>
 
 
 ## Introduction
-MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfuill image-text understanding need.
+MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
 
 
 <details open>
 <summary> Major Features </summary>
 
-- **Modulation design**: We decouple the ocr task into serveral configurable modules. Users can setup the training and evaluation pipeline easily for customized data and models with a few line of modification.
+- **Modulation design**: We decouple the OCR task into several configurable modules. Users can set up the training and evaluation pipeline easily for customized data and models with a few lines of modification.
 - **High-performance**: MindOCR provides pretrained weights and the used training recipes that reach competitive performance on OCR tasks.
-- **Low-cost-to-apply**: We provide easy-to-use inference tools to perform text detection and recogintion tasks.
+- **Low-cost-to-apply**: We provide easy-to-use inference tools to perform text detection and recognition tasks.
 </details>
 
 
@@ -43,7 +43,7 @@ To install the dependency, please run
 pip install -r requirements.txt
 ```
 
-Additionally, please install MindSpore(>=1.9) following the official [instructions](https://www.mindspore.cn/install) for the best fit of your machine.
+Additionally, please install MindSpore(>=1.9) following the official [installation instructions](https://www.mindspore.cn/install) for the best fit of your machine.
 
 For distributed training, please install [openmpi 4.0.3](https://www.open-mpi.org/software/ompi/v4.0/).
 
@@ -63,59 +63,68 @@ pip install git+https://github.com/mindspore-lab/mindocr.git
 
 ## Quick Start
 
-### Text Detection Model Training
+### 1. Model Training and Evaluation
 
-We will use **DBNet** model and **ICDAR2015** dataset for demonstration, although other models and datasets are also supported. Please refer to [DBNet model README](configs/det/dbnet/README.md).
+#### 1.1 Text Detection
 
+We will take **DBNet** model and **ICDAR2015** dataset as an example to illustrate how to configure the training process with a few lines of modification on the yaml file.
 
-### Text Recognition Model Training
+Please refer to [DBNet readme](configs/det/dbnet/README.md#3-quick-start) for detailed instructions.
 
-We will use **CRNN** model and **LMDB** dataset for demonstration, although other models and datasets are also supported. Please refer to [CRNN model README](configs/rec/crnn/README.md).
 
+#### 1.2 Text Recognition
 
-### Inference and Deployment
+We will take **CRNN** model and **LMDB** dataset as an illustration on how to configure and launch the training process easily.
 
-#### Inference with MX Engine
+Detailed instructions can be viewed in [CRNN readme](configs/rec/crnn/README.md#3-quick-start).
 
-Please refer to [mx_infer](docs/cn/inference_cn.md).
+**Note:**
+The training pipeline is fully extendable. To train other text detection/recognition models on a new dataset, please configure the model architecture (backbone, neck, head) and data pipeline in the yaml file and launch the training script with `python tools/train.py -c /path/to/yaml_config`.
 
-#### Inference with Lite
+### 2. Inference and Deployment
 
-Coming soon
+#### 2.1 Inference with MX Engine
+
+MX, which is short for [MindX](https://www.hiascend.com/zh/software/mindx-sdk), allows efficient model inference and deployment on Ascend devices.
 
-#### Inference with native MindSpore
+MindOCR supports OCR model inference with MX Engine. Please refer to [mx_infer](docs/cn/inference_cn.md) for detailed illustrations.
+
+#### 2.2 Inference with MS Lite
 
 Coming soon
 
-## Supported Models and Performance
+#### 2.3 Inference with native MindSpore
+
+Coming soon
 
-### Text Detection
+## Model List
 
-The supported detection models and their performance on the test set of ICDAR2015 are as follow.
+<details open>
+<summary>Text Detection</summary>
 
-| **Model** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Config** |
-|-----------|--------------|----------------|------------|---------------|-------------|---------------------------------------------------|
-| DBNet | ResNet-50 | ImageNet | 81.97% | 86.05% | 83.96% | [YAML](configs/det/dbnet/db_r50_icdar15.yaml) |
-| DBNet++ | ResNet-50 | ImageNet | 82.02% | 87.38% | 84.62% | [YAML](configs/det/dbnet++/db++_r50_icdar15.yaml) |
+- [x] [DBNet](https://arxiv.org/abs/1911.08947) (AAAI'2020)
+- [x] [DBNet++](https://arxiv.org/abs/2202.10304) (TPAMI'2022)
+- [ ] [FCENet](https://arxiv.org/abs/2104.10442) (CVPR'2021) [dev]
 
-### Text Recognition
+</details>
 
-The supported recognition models and their overall performance on the public benchmarking datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follow
+<details open>
+<summary>Text Recognition</summary>
 
+- [x] [CRNN](https://arxiv.org/abs/1507.05717) (TPAMI'2016)
+- [ ] [ABINet](https://arxiv.org/abs/2103.06495) (CVPR'2021) [dev]
+- [ ] [SVTR](https://arxiv.org/abs/2205.00159) (IJCAI'2022) [infer only]
 
-| **Model** | **Backbone** | **Avg Acc**| **Config** |
-|-----------|--------------|----------------|------------|
-| CRNN | VGG7 | 82.03% | [YAML](configs/rec/crnn/crnn_vgg7.yaml) |
-| CRNN | Resnet34_vd | 84.45% | [YAML](configs/rec/crnn/crnn_resnet34.yaml) |
 
+For the detailed performance of the trained models, please refer to [configs](./configs).
 
-For more details, please refer to [configs](./configs).
+For detailed inference performance using MX engine, please refer to [mx inference performance](docs/cn/inference_models_cn.md)
 
 ## Notes
 
 ### Change Log
 - 2023/03/23
-1. Add dynamic loss scaler support, compatiable with drop overflow update. To enable dynamic loss scaler, please set `type` of `loss_scale` as `dynamic`. A yaml example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`
+1. Add dynamic loss scaler support, compatible with drop overflow update. To enable dynamic loss scaler, please set `type` of `loss_scale` as `dynamic`. A YAML example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`
 
 - 2023/03/20
 1. Arg names changed: `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
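
The 2023/03/20 change log entry renames two arguments, so configs written before that change would need the same rename. The following is a minimal Python sketch of applying it to an already-loaded config, not part of MindOCR: `migrate_config` and `RENAMED_ARGS` are hypothetical names, and the nesting of the sample config (a `train.dataset` section, `loss_scale` at the top level) is an assumption made for illustration. Only the old and new argument names and the `loss_scale` setting with `type: dynamic` come from the change log.

```python
# Hypothetical helper, not part of MindOCR: apply the argument renames from
# the 2023/03/20 change log entry to an already-loaded config.
RENAMED_ARGS = {
    "output_keys": "output_columns",
    "num_keys_to_net": "num_columns_to_net",
}


def migrate_config(node):
    """Return a copy of a nested config with the old argument names replaced."""
    if isinstance(node, dict):
        return {
            RENAMED_ARGS.get(key, key): migrate_config(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [migrate_config(item) for item in node]
    return node


# Illustrative fragment only; the surrounding keys and nesting are assumptions.
old_config = {
    "loss_scale": {"type": "dynamic"},
    "train": {
        "dataset": {"output_keys": ["image", "label"], "num_keys_to_net": 1},
    },
}
new_config = migrate_config(old_config)
```

The helper returns a new structure rather than editing in place, so the original config stays available for comparison.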
@@ -139,7 +148,7 @@ For more details, please refer to [configs](./configs).
 
 ### How to Contribute
 
-We appreciate all kind of contributions including issues and PRs to make MindOCR better.
+We appreciate all kinds of contributions including issues and PRs to make MindOCR better.
 
 Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contributing guideline. Please follow the [Model Template and Guideline](mindocr/models/README.md) for contributing a model that fits the overall interface :)
 
README_CN.md

Lines changed: 27 additions & 23 deletions
@@ -11,11 +11,11 @@
 
 [English](README.md) | 中文
 
-[概述](#introduction) |
-[安装](#installation) |
-[快速上手](#quick-start) |
-[模型列表](#supported-models-and-performance) |
-[注释](#notes)
+[概述](#概述) |
+[安装](#安装) |
+[快速上手](#快速上手) |
+[模型列表](#模型列表) |
+[重要信息](#重要信息)
 
 </div>
 
@@ -59,12 +59,14 @@ pip install git+https://github.com/mindspore-lab/mindocr.git
 
 ## 快速上手
 
-### 训练文本检测模型
+### 模型训练评估
+
+#### 文本检测
 
 MindOCR支持多种文本检测模型及数据集,在此我们使用**DBNet**模型和**ICDAR2015**数据集进行演示。请参考[DBNet模型文档](configs/det/dbnet/README_CN.md)
 
 
-### 训练文本识别模型
+### 文本识别
 
 MindOCR支持多种文本识别模型及数据集,在此我们使用**CRNN**模型和**LMDB**数据集进行演示。请参考[CRNN模型文档](configs/rec/crnn/README_CN.md)
 
@@ -73,7 +75,9 @@ MindOCR支持多种文本识别模型及数据集,在此我们使用**CRNN**
 
 #### 使用MX Engine推理
 
-请参考[mx_infer](docs/cn/inference_cn.md)
+MX ([MindX](https://www.hiascend.com/zh/software/mindx-sdk)的缩写) 是一个支持昇腾设备高效推理与部署的工具。
+
+MindOCR集成了MX推理引擎,支持文本检测识别任务,请参考[mx_infer](docs/cn/inference_cn.md).
 
 #### 使用Lite推理
 
@@ -83,30 +87,30 @@ MindOCR支持多种文本识别模型及数据集,在此我们使用**CRNN**
 
 敬请期待
 
-## 支持模型及性能
+## 模型列表
 
-### 文本检测
+<details open>
+<summary>文本检测</summary>
 
-下表是目前支持的文本检测模型和它们在ICDAR2015测试数据集上的精度数据:
+- [x] [DBNet](https://arxiv.org/abs/1911.08947) (AAAI'2020)
+- [x] [DBNet++](https://arxiv.org/abs/2202.10304) (TPAMI'2022)
+- [ ] [FCENet](https://arxiv.org/abs/2104.10442) (CVPR'2021) [开发中]
 
-| **模型** | **骨干网络** | **预训练** | **Recall** | **Precision** | **F-score** | **配置文件** |
-|-----------|--------------|----------------|------------|---------------|-------------|-----------------------------------------------------|
-| DBNet | ResNet-50 | ImageNet | 81.97% | 86.05% | 83.96% | [YAML](configs/det/dbnet/dbnet/db_r50_icdar15.yaml) |
-| DBNet++ | ResNet-50 | ImageNet | 82.02% | 87.38% | 84.62% | [YAML](configs/det/dbnet++/db++_r50_icdar15.yaml) |
+</details>
 
-### 文本识别
+<details open>
+<summary>文本识别</summary>
 
-下表是目前支持的文本识别模型和它们在公开测评数据集 (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) 上的精度数据:
+- [x] [CRNN](https://arxiv.org/abs/1507.05717) (TPAMI'2016)
+- [ ] [ABINet](https://arxiv.org/abs/2103.06495) (CVPR'2021) [开发中]
+- [ ] [SVTR](https://arxiv.org/abs/2205.00159) (IJCAI'2022) [仅推理]
 
 
-| **模型** | **骨干网络** | **平均准确率**| **配置文件** |
-|-----------|--------------|----------------|------------|
-| CRNN | VGG7 | 82.03% | [YAML](configs/rec/crnn/crnn_vgg7.yaml) |
-| CRNN | Resnet34_vd | 84.45% | [YAML](configs/rec/crnn/crnn_resnet34.yaml) |
+模型训练的配置及性能结果请见[configs](./configs).
 
-模型配置及性能详细介绍请见[configs](./configs).
+基于MX Engine推理的模型性能结果请见[mx inference performance](docs/cn/inference_models_cn.md)
 
-## 注释
+## 重要信息
 
 ### 变更日志
 - 2023/03/23
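
The note added to README.md in this commit says training is launched with `python tools/train.py -c /path/to/yaml_config`. As a rough sketch, the same command can be assembled from Python. `build_train_command` is a hypothetical helper, not a MindOCR API, and the config path is taken from the table removed in this commit, so it is assumed to still exist in the repository.

```python
import shlex
import sys


def build_train_command(config_path):
    """Assemble the argument list for `python tools/train.py -c <config>`."""
    # sys.executable keeps the launch on the same interpreter (and therefore
    # the same installed MindSpore) as the calling process.
    return [sys.executable, "tools/train.py", "-c", str(config_path)]


# Config path as it appears in this diff; assumed to still be present.
command = build_train_command("configs/det/dbnet/db_r50_icdar15.yaml")
print(shlex.join(command))

# To actually start training, run this from the repository root:
# import subprocess
# subprocess.run(command, check=True)
```

Building the command as a list avoids shell quoting problems when the config path contains spaces; `shlex.join` is used only to print a copy-pasteable form.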
