Commit 9ad0771

Merge pull request #165 from SamitHuang/docs
rearrange and update reamde
2 parents e119858 + c4ad503 commit 9ad0771

2 files changed: 76 additions and 57 deletions

README.md

Lines changed: 39 additions & 32 deletions
@@ -15,22 +15,22 @@ English | [中文](README_CN.md)
 [Introduction](#introduction) |
 [Installation](#installation) |
 [Quick Start](#quick-start) |
-[Model List](#supported-models-and-performance) |
+[Model List](#model-list) |
 [Notes](#notes)

 </div>


 ## Introduction
-MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfuill image-text understanding need.
+MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.


 <details open>
 <summary> Major Features </summary>

-- **Modulation design**: We decouple the ocr task into serveral configurable modules. Users can setup the training and evaluation pipeline easily for customized data and models with a few line of modification.
+- **Modulation design**: We decouple the OCR task into several configurable modules. Users can set up the training and evaluation pipeline easily for customized data and models with a few lines of modification.
 - **High-performance**: MindOCR provides pretrained weights and the used training recipes that reach competitive performance on OCR tasks.
-- **Low-cost-to-apply**: We provide easy-to-use inference tools to perform text detection and recogintion tasks.
+- **Low-cost-to-apply**: We provide easy-to-use inference tools to perform text detection and recognition tasks.
 </details>


@@ -43,7 +43,7 @@ To install the dependency, please run
 pip install -r requirements.txt
 ```

-Additionally, please install MindSpore(>=1.9) following the official [instructions](https://www.mindspore.cn/install) for the best fit of your machine.
+Additionally, please install MindSpore(>=1.9) following the official [installation instructions](https://www.mindspore.cn/install) for the best fit of your machine.

 For distributed training, please install [openmpi 4.0.3](https://www.open-mpi.org/software/ompi/v4.0/).

@@ -63,61 +63,68 @@ pip install git+https://github.com/mindspore-lab/mindocr.git

 ## Quick Start

-### Text Detection Model Training
+### 1. Model Training and Evaluation

-We will use **DBNet** model and **ICDAR2015** dataset for demonstration, although other models and datasets are also supported. Please refer to [DBNet model README](configs/det/dbnet/README.md).
+#### 1.1 Text Detection

+We will take **DBNet** model and **ICDAR2015** dataset as an example to illustrate how to configure the training process with a few lines of modification on the yaml file.

-### Text Recognition Model Training
+Please refer to [DBNet readme](configs/det/dbnet/README.md#3-quick-start) for detailed instructions.

-We will use **CRNN** model and **LMDB** dataset for demonstration, although other models and datasets are also supported. Please refer to [CRNN model README](configs/rec/crnn/README.md).

+#### 1.2 Text Recognition

-### Inference and Deployment
+We will take **CRNN** model and **LMDB** dataset as an illustration on how to configure and launch the training process easily.

-#### Inference with MX Engine
+Detailed instructions can be viewed in [CRNN readme](configs/rec/crnn/README.md#3-quick-start).

-Please refer to [mx_infer tutorial](docs/cn/inference_tutorial_cn.md) for detailed inference tutorial.
+**Note:**
+The training pipeline is fully extendable. To train other text detection/recognition models on a new dataset, please configure the model architecture (backbone, neck, head) and data pipeline in the yaml file and launch the training script with `python tools/train.py -c /path/to/yaml_config`.

-Please refer to [mx_infer results](docs/cn/inference_models_cn.md) for detailed performance of the supported inference models.
+### 2. Inference and Deployment

-#### Inference with Lite
+#### 2.1 Inference with MX Engine

-Coming soon
+MX, which is short for [MindX](https://www.hiascend.com/zh/software/mindx-sdk), allows efficient model inference and deployment on Ascend devices.
+
+MindOCR supports OCR model inference with MX Engine. Please refer to [mx_infer](docs/cn/inference_cn.md) for detailed illustrations.

-#### Inference with native MindSpore
+#### 2.2 Inference with MS Lite

 Coming soon

-## Supported Models and Performance
+#### 2.3 Inference with native MindSpore

-### Text Detection
+Coming soon

-The supported detection models and their performance on the test set of ICDAR2015 are as follow.
+## Model List

-| **Model** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Config** |
-|-----------|--------------|----------------|------------|---------------|-------------|---------------------------------------------------|
-| DBNet | ResNet-50 | ImageNet | 81.97% | 86.05% | 83.96% | [YAML](configs/det/dbnet/db_r50_icdar15.yaml) |
-| DBNet++ | ResNet-50 | ImageNet | 82.02% | 87.38% | 84.62% | [YAML](configs/det/dbnet++/db++_r50_icdar15.yaml) |
+<details open>
+<summary>Text Detection</summary>

-### Text Recognition
+- [x] [DBNet](https://arxiv.org/abs/1911.08947) (AAAI'2020)
+- [x] [DBNet++](https://arxiv.org/abs/2202.10304) (TPAMI'2022)
+- [ ] [FCENet](https://arxiv.org/abs/2104.10442) (CVPR'2021) [dev]

-The supported recognition models and their overall performance on the public benchmarking datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follow
+</details>
+
+<details open>
+<summary>Text Recognition</summary>

+- [x] [CRNN](https://arxiv.org/abs/1507.05717) (TPAMI'2016)
+- [ ] [ABINet](https://arxiv.org/abs/2103.06495) (CVPR'2021) [dev]
+- [ ] [SVTR](https://arxiv.org/abs/2205.00159) (IJCAI'2022) [infer only]

-| **Model** | **Backbone** | **Avg Acc**| **Config** |
-|-----------|--------------|----------------|------------|
-| CRNN | VGG7 | 82.03% | [YAML](configs/rec/crnn/crnn_vgg7.yaml) |
-| CRNN | Resnet34_vd | 84.45% | [YAML](configs/rec/crnn/crnn_resnet34.yaml) |

+For the detailed performance of the trained models, please refer to [configs](./configs).

-For more details, please refer to [configs](./configs).
+For detailed inference performance using MX engine, please refer to [mx inference performance](docs/cn/inference_models_cn.md)

 ## Notes

 ### Change Log
 - 2023/03/23
-1. Add dynamic loss scaler support, compatiable with drop overflow update. To enable dynamic loss scaler, please set `type` of `loss_scale` as `dynamic`. A yaml example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`
+1. Add dynamic loss scaler support, compatible with drop overflow update. To enable dynamic loss scaler, please set `type` of `loss_scale` as `dynamic`. A YAML example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`

 - 2023/03/20
 1. Arg names changed: `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
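
Note on the 2023/03/20 change-log entry: configs written before the rename would still carry the old argument names. Below is a minimal sketch of how such a config could be migrated once it has been parsed into plain Python dicts and lists. The helper `rename_args` and the layout of `old_cfg` are hypothetical and not part of MindOCR; only the two old/new name pairs come from the change log.

```python
from typing import Any

# Old -> new argument names, taken from the 2023/03/20 change-log entry.
RENAMED_ARGS = {
    "output_keys": "output_columns",
    "num_keys_to_net": "num_columns_to_net",
}


def rename_args(node: Any) -> Any:
    """Return a copy of a parsed config with the old argument names replaced.

    Hypothetical helper: walks nested dicts and lists and renames matching keys.
    """
    if isinstance(node, dict):
        return {RENAMED_ARGS.get(key, key): rename_args(value) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_args(item) for item in node]
    return node


# Illustrative config fragment; the nesting is an assumption, not MindOCR's schema.
old_cfg = {"train": {"dataset": {"output_keys": ["image", "label"], "num_keys_to_net": 1}}}
new_cfg = rename_args(old_cfg)
print(new_cfg)
```

The function returns a new structure instead of editing in place, so the original config stays available for comparison.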
@@ -141,7 +148,7 @@ For more details, please refer to [configs](./configs).

 ### How to Contribute

-We appreciate all kind of contributions including issues and PRs to make MindOCR better.
+We appreciate all kinds of contributions including issues and PRs to make MindOCR better.

 Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contributing guideline. Please follow the [Model Template and Guideline](mindocr/models/README.md) for contributing a model that fits the overall interface :)
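
Note on the training command added to the Quick Start section of README.md: the new text launches training with `python tools/train.py -c /path/to/yaml_config`. As a rough sketch, the same command could be assembled from Python as shown below. The function `build_train_command` is hypothetical, and the example config path is only a path that appears elsewhere in this diff; whether that file exists at this commit is an assumption.

```python
import shlex
import sys
from typing import List


def build_train_command(config_path: str) -> List[str]:
    """Build the argv list for the training script named in the README note.

    `tools/train.py` and the `-c` flag come from the README text. The script
    path is relative, so the command is assumed to run from the repository root.
    """
    return [sys.executable, "tools/train.py", "-c", config_path]


# Example config path copied from this diff (assumed to exist in the repository).
cmd = build_train_command("configs/det/dbnet/db_r50_icdar15.yaml")
print(shlex.join(cmd))
# To launch training, one could call subprocess.run(cmd, check=True)
# from the repository root.
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.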
README_CN.md

Lines changed: 37 additions & 25 deletions
@@ -1,19 +1,23 @@
+<div align="center">

 # MindOCR

-<!--
+[![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
 [![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
 [![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
 [![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
--->
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+
 [English](README.md) | 中文

-[Overview](#introduction) |
-[Installation](#installation) |
-[Quick Start](#quick-start) |
-[Model List](#supported-models-and-performance) |
-[Notes](#notes)
+[Overview](#概述) |
+[Installation](#安装) |
+[Quick Start](#快速上手) |
+[Model List](#模型列表) |
+[Important Information](#重要信息)

+</div>

 ## Overview
 MindOCR is an open-source toolbox for OCR development and application based on the [MindSpore](https://www.mindspore.cn/en) framework. It helps users train and apply the best text detection and text recognition models in the industry, such as DBNet/DBNet++ and CRNN/SVTR, to meet image-text understanding needs.
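@@ -55,12 +59,14 @@ pip install git+https://github.com/mindspore-lab/mindocr.git

 ## Quick Start

-### Training a Text Detection Model
+### Model Training and Evaluation
+
+#### Text Detection

 MindOCR supports a variety of text detection models and datasets. Here we use the **DBNet** model and the **ICDAR2015** dataset for demonstration. Please refer to the [DBNet model documentation](configs/det/dbnet/README_CN.md)


-### Training a Text Recognition Model
+### Text Recognition

 MindOCR supports a variety of text recognition models and datasets. Here we use the **CRNN** model and the **LMDB** dataset for demonstration. Please refer to the [CRNN model documentation](configs/rec/crnn/README_CN.md)
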
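@@ -69,9 +75,10 @@ MindOCR supports a variety of text recognition models and datasets. Here we use the **CRNN**

 #### Inference with MX Engine

-For the tutorial, please refer to [mx_infer](docs/cn/inference_tutorial_cn.md)
+MX (short for [MindX](https://www.hiascend.com/zh/software/mindx-sdk)) is a tool that supports efficient inference and deployment on Ascend devices.
+
+MindOCR integrates the MX inference engine and supports text detection and recognition tasks. Please refer to [mx_infer](docs/cn/inference_cn.md).

-For the model list and benchmarks, please refer to [mx_infer](docs/cn/inference_models_cn.md)

 #### Inference with Lite
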
@@ -81,31 +88,36 @@ MindOCR supports a variety of text recognition models and datasets. Here we use the **CRNN**

 Coming soon

-## Supported Models and Performance
+## Model List

-### Text Detection
+<details open>
+<summary>Text Detection</summary>

-The table below lists the currently supported text detection models and their accuracy on the ICDAR2015 test set:
+- [x] [DBNet](https://arxiv.org/abs/1911.08947) (AAAI'2020)
+- [x] [DBNet++](https://arxiv.org/abs/2202.10304) (TPAMI'2022)
+- [ ] [FCENet](https://arxiv.org/abs/2104.10442) (CVPR'2021) [in development]

-| **Model** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Config** |
-|-----------|--------------|----------------|------------|---------------|-------------|-----------------------------------------------------|
-| DBNet | ResNet-50 | ImageNet | 81.97% | 86.05% | 83.96% | [YAML](configs/det/dbnet/dbnet/db_r50_icdar15.yaml) |
-| DBNet++ | ResNet-50 | ImageNet | 82.02% | 87.38% | 84.62% | [YAML](configs/det/dbnet++/db++_r50_icdar15.yaml) |
+</details>

-### Text Recognition
+<details open>
+<summary>Text Recognition</summary>

-The table below lists the currently supported text recognition models and their accuracy on the public benchmark datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE):
+- [x] [CRNN](https://arxiv.org/abs/1507.05717) (TPAMI'2016)
+- [ ] [ABINet](https://arxiv.org/abs/2103.06495) (CVPR'2021) [in development]
+- [ ] [SVTR](https://arxiv.org/abs/2205.00159) (IJCAI'2022) [inference only]


-| **Model** | **Backbone** | **Avg Acc** | **Config** |
-|-----------|--------------|----------------|------------|
-| CRNN | VGG7 | 82.03% | [YAML](configs/rec/crnn/crnn_vgg7.yaml) |
-| CRNN | Resnet34_vd | 84.45% | [YAML](configs/rec/crnn/crnn_resnet34.yaml) |
+For the training configurations and performance results of the models, please see [configs](./configs).

+For inference performance results and the list of supported models based on the MX engine, please see [mx inference performance](docs/cn/inference_models_cn.md)

-## Notes
+## Important Information

 ### Change Log
+- 2023/03/23
+1. Add dynamic loss scaler support, compatible with drop overflow update. To use it, add a `loss_scale` field to the configuration file and set its `type` parameter to `dynamic`. For a reference example, see `configs/rec/crnn/crnn_icdar15.yaml`
+
+
 - 2023/03/20
 1. Argument names changed: `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
 2. Update the data pipeline.
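
Note on the 2023/03/23 change-log entry added in this hunk: it says the dynamic loss scaler is enabled by adding a `loss_scale` field to the config and setting its `type` to `dynamic`. The sketch below applies that change to a config already parsed into a Python dict. It assumes `loss_scale` sits at the top level of the config, which the diff does not confirm. The helper name and the `"static"` placeholder value are hypothetical; the referenced `configs/rec/crnn/crnn_icdar15.yaml` remains the authoritative example.

```python
import copy
from typing import Any, Dict


def enable_dynamic_loss_scale(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg whose `loss_scale` section has `type` set to `dynamic`.

    Hypothetical helper: assumes `loss_scale` is a top-level mapping and creates
    it when it is missing, as the change-log entry describes.
    """
    new_cfg = copy.deepcopy(cfg)
    new_cfg.setdefault("loss_scale", {})["type"] = "dynamic"
    return new_cfg


# "static" is a placeholder value for illustration, not a documented option.
base_cfg = {"loss_scale": {"type": "static"}}
dyn_cfg = enable_dynamic_loss_scale(base_cfg)
print(dyn_cfg["loss_scale"])
```

The deep copy keeps the input config unchanged, which makes it easy to compare runs with and without the dynamic scaler.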
