Merge pull request #165 from SamitHuang/docs

SamitHuang · web-flow · commit 9ad07713fbd1 · 2023-04-04T01:56:58.000+08:00
rearrange and update readme
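
Note for anyone updating older configs against this README: the change log kept in this diff records the renames `output_keys` -> `output_columns` and `num_keys_to_net` -> `num_columns_to_net`. The sketch below shows one way such a rename could be applied to an already loaded config. The helper `migrate_config` and the nested layout of `old_cfg` are hypothetical and used only for illustration; they are not part of MindOCR, and only the two argument names come from the change log.

```python
# Hypothetical helper, not part of MindOCR. It renames the two argument
# names listed in the 2023/03/20 change log entry inside a config that has
# already been loaded into plain dicts and lists.
RENAMED_ARGS = {
    "output_keys": "output_columns",
    "num_keys_to_net": "num_columns_to_net",
}


def migrate_config(node):
    """Return a copy of `node` with old argument names replaced.

    Recurses into dicts and lists; any other value is returned unchanged.
    """
    if isinstance(node, dict):
        return {RENAMED_ARGS.get(key, key): migrate_config(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [migrate_config(item) for item in node]
    return node


# Assumed config layout, for illustration only.
old_cfg = {
    "train": {
        "dataset": {
            "output_keys": ["image", "label"],
            "num_keys_to_net": 1,
        }
    }
}

new_cfg = migrate_config(old_cfg)
print(new_cfg)
```

The helper builds a new structure instead of editing in place, so the original config stays available for comparison.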
diff --git a/README.md b/README.md
@@ -15,22 +15,22 @@ English | [中文](README_CN.md)
 [Introduction](#introduction) |
 [Installation](#installation) |
 [Quick Start](#quick-start) |
-[Model List](#supported-models-and-performance) |
+[Model List](#model-list) |
 [Notes](#notes)
 
 </div>
 
 
 ## Introduction
-MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfuill image-text understanding need.
+MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
 
 
 <details open>
 <summary> Major Features </summary>
 
-- **Modulation design**: We decouple the ocr task into serveral configurable modules. Users can setup the training and evaluation pipeline easily for customized data and models with a few line of modification.
+- **Modular design**: We decouple the OCR task into several configurable modules. Users can set up training and evaluation pipelines for customized data and models by modifying only a few lines.
 - **High-performance**: MindOCR provides pretrained weights, along with the training recipes used to produce them, that reach competitive performance on OCR tasks.
-- **Low-cost-to-apply**: We provide easy-to-use inference tools to perform text detection and recogintion tasks. 
+- **Low-cost-to-apply**: We provide easy-to-use inference tools to perform text detection and recognition tasks. 
 </details>
 
 
@@ -43,7 +43,7 @@ To install the dependency, please run
 pip install -r requirements.txt
 ```
 
-Additionally, please install MindSpore(>=1.9) following the official [instructions](https://www.mindspore.cn/install) for the best fit of your machine. 
+Additionally, please install MindSpore (>=1.9) following the official [installation instructions](https://www.mindspore.cn/install) to best fit your machine.
 
 For distributed training, please install [openmpi 4.0.3](https://www.open-mpi.org/software/ompi/v4.0/).
 
@@ -63,61 +63,68 @@ pip install git+https://github.com/mindspore-lab/mindocr.git
 
 ## Quick Start
 
-### Text Detection Model Training
+### 1. Model Training and Evaluation
 
-We will use **DBNet** model and **ICDAR2015** dataset for demonstration, although other models and datasets are also supported. Please refer to [DBNet model README](configs/det/dbnet/README.md).
+#### 1.1 Text Detection
 
+We take the **DBNet** model and the **ICDAR2015** dataset as an example to illustrate how to configure the training process by modifying a few lines of the YAML file.
 
-### Text Recognition Model Training
+Please refer to [DBNet readme](configs/det/dbnet/README.md#3-quick-start) for detailed instructions.
 
-We will use **CRNN** model and **LMDB** dataset for demonstration, although other models and datasets are also supported. Please refer to [CRNN model README](configs/rec/crnn/README.md).
 
+#### 1.2 Text Recognition 
 
-### Inference and Deployment
+We take the **CRNN** model and the **LMDB** dataset as an example to illustrate how to configure and launch the training process.
 
-#### Inference with MX Engine
+Detailed instructions can be viewed in [CRNN readme](configs/rec/crnn/README.md#3-quick-start).
 
-Please refer to [mx_infer tutorial](docs/cn/inference_tutorial_cn.md) for detailed inference tutorial.
+**Note:**
+The training pipeline is fully extensible. To train other text detection/recognition models on a new dataset, configure the model architecture (backbone, neck, head) and the data pipeline in the YAML file, then launch the training script with `python tools/train.py -c /path/to/yaml_config`.
 
-Please refer to [mx_infer results](docs/cn/inference_models_cn.md) for detailed performance of the supported inference models.
+### 2. Inference and Deployment
 
-#### Inference with Lite 
+#### 2.1 Inference with MX Engine
 
-Coming soon
+MX, short for [MindX](https://www.hiascend.com/zh/software/mindx-sdk), enables efficient model inference and deployment on Ascend devices.
+
+MindOCR supports OCR model inference with the MX Engine. Please refer to [mx_infer](docs/cn/inference_cn.md) for detailed instructions.
 
-#### Inference with native MindSpore
+#### 2.2 Inference with MS Lite 
 
 Coming soon
 
-## Supported Models and Performance
+#### 2.3 Inference with native MindSpore
 
-### Text Detection  
+Coming soon
 
-The supported detection  models and their performance on the test set of ICDAR2015 are as follow.
+## Model List
 
-| **Model** | **Backbone** | **Pretrained** | **Recall** | **Precision** | **F-score** | **Config**                                        | 
-|-----------|--------------|----------------|------------|---------------|-------------|---------------------------------------------------|
-| DBNet     | ResNet-50    | ImageNet       | 81.97%     | 86.05%        | 83.96%      | [YAML](configs/det/dbnet/db_r50_icdar15.yaml)     | 
-| DBNet++   | ResNet-50    | ImageNet       | 82.02%     | 87.38%        | 84.62%      | [YAML](configs/det/dbnet++/db++_r50_icdar15.yaml) |
+<details open>
+<summary>Text Detection</summary>
 
-### Text Recognition
+- [x] [DBNet](https://arxiv.org/abs/1911.08947) (AAAI'2020) 
+- [x] [DBNet++](https://arxiv.org/abs/2202.10304) (TPAMI'2022)
+- [ ] [FCENet](https://arxiv.org/abs/2104.10442) (CVPR'2021) [dev]
 
-The supported recognition models and their overall performance on the public benchmarking datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follow
+</details>
+
+<details open>
+<summary>Text Recognition</summary>
 
+- [x] [CRNN](https://arxiv.org/abs/1507.05717) (TPAMI'2016)
+- [ ] [ABINet](https://arxiv.org/abs/2103.06495) (CVPR'2021) [dev]
+- [ ] [SVTR](https://arxiv.org/abs/2205.00159) (IJCAI'2022) [infer only]
 
-| **Model** | **Backbone** | **Avg Acc**| **Config** | 
-|-----------|--------------|----------------|------------|
-| CRNN     | VGG7        | 82.03% 	| [YAML](configs/rec/crnn/crnn_vgg7.yaml)    | 
-| CRNN     | Resnet34_vd    | 84.45% 	| [YAML](configs/rec/crnn/crnn_resnet34.yaml)     |
 
+For the detailed performance of the trained models, please refer to [configs](./configs).
 
-For more details, please refer to [configs](./configs).
+For detailed inference performance with the MX Engine, please refer to [mx inference performance](docs/cn/inference_models_cn.md).
 
 ## Notes
 
 ### Change Log
 - 2023/03/23
-1. Add dynamic loss scaler support, compatiable with drop overflow update. To enable dynamic loss scaler, please set `type` of `loss_scale` as `dynamic`. A yaml example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`
+1. Add dynamic loss scaler support, compatible with drop overflow update. To enable the dynamic loss scaler, set the `type` of `loss_scale` to `dynamic`. A YAML example can be found in `configs/rec/crnn/crnn_icdar15.yaml`.
 
 - 2023/03/20
 1. Arg names changed: `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
@@ -141,7 +148,7 @@ For more details, please refer to [configs](./configs).
 
 ### How to Contribute
 
-We appreciate all kind of contributions including issues and PRs to make MindOCR better.
+We appreciate all kinds of contributions including issues and PRs to make MindOCR better.
 
 Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contributing guideline. Please follow the [Model Template and Guideline](mindocr/models/README.md) for contributing a model that fits the overall interface :)
 
diff --git a/README_CN.md b/README_CN.md
@@ -1,19 +1,23 @@
+<div align="center">
 
 # MindOCR
 
-<!--
+[![CI](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml/badge.svg)](https://github.com/mindspore-lab/mindocr/actions/workflows/ci.yml)
 [![license](https://img.shields.io/github/license/mindspore-lab/mindocr.svg)](https://github.com/mindspore-lab/mindocr/blob/main/LICENSE)
 [![open issues](https://img.shields.io/github/issues/mindspore-lab/mindocr)](https://github.com/mindspore-lab/mindocr/issues)
 [![PRs](https://img.shields.io/badge/PRs-welcome-pink.svg)](https://github.com/mindspore-lab/mindocr/pulls)
- -->
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+
 [English](README.md) | 中文
 
-[概述](#introduction) |
-[安装](#installation) |
-[快速上手](#quick-start) |
-[模型列表](#supported-models-and-performance) |
-[注释](#notes)
+[概述](#概述) |
+[安装](#安装) |
+[快速上手](#快速上手) |
+[模型列表](#模型列表) |
+[重要信息](#重要信息)
 
+</div>
 
 ## 概述
 MindOCR是一个基于[MindSpore](https://www.mindspore.cn/en)框架的OCR开发及应用的开源工具箱，可以帮助用户训练、应用业界最优的文本检测、文本识别模型，例如DBNet/DBNet++和CRNN/SVTR，以实现图像文本理解的需求。
@@ -55,12 +59,14 @@ pip install git+https://github.com/mindspore-lab/mindocr.git
 
 ## 快速上手
 
-### 训练文本检测模型
+### 模型训练评估
+
+#### 文本检测
 
 MindOCR支持多种文本检测模型及数据集，在此我们使用**DBNet**模型和**ICDAR2015**数据集进行演示。请参考[DBNet模型文档](configs/det/dbnet/README_CN.md)。
 
 
-### 训练文本识别模型
+#### 文本识别
 
 MindOCR支持多种文本识别模型及数据集，在此我们使用**CRNN**模型和**LMDB**数据集进行演示。请参考[CRNN模型文档](configs/rec/crnn/README_CN.md)。
 
@@ -69,9 +75,10 @@ MindOCR支持多种文本识别模型及数据集，在此我们使用**CRNN**
 
 #### 使用MX Engine推理
 
-教程请参考[mx_infer](docs/cn/inference_tutorial_cn.md)
+MX ([MindX](https://www.hiascend.com/zh/software/mindx-sdk)的缩写) 是一个支持昇腾设备高效推理与部署的工具。
+
+MindOCR集成了MX推理引擎，支持文本检测识别任务，请参考[mx_infer](docs/cn/inference_cn.md)。
 
-模型列表和Benchmark请参考 [mx_infer](docs/cn/inference_models_cn.md)
 
 #### 使用Lite推理 
 
@@ -81,31 +88,36 @@ MindOCR支持多种文本识别模型及数据集，在此我们使用**CRNN**
 
 敬请期待
 
-## 支持模型及性能
+## 模型列表
 
-### 文本检测  
+<details open>
+<summary>文本检测</summary>
 
-下表是目前支持的文本检测模型和它们在ICDAR2015测试数据集上的精度数据：
+- [x] [DBNet](https://arxiv.org/abs/1911.08947) (AAAI'2020) 
+- [x] [DBNet++](https://arxiv.org/abs/2202.10304) (TPAMI'2022)
+- [ ] [FCENet](https://arxiv.org/abs/2104.10442) (CVPR'2021) [开发中]
 
-| **模型**  | **骨干网络**  | **预训练**      | **Recall** | **Precision** | **F-score** | **配置文件**                                            | 
-|-----------|--------------|----------------|------------|---------------|-------------|-----------------------------------------------------|
-| DBNet     | ResNet-50    | ImageNet       | 81.97%     | 86.05%        | 83.96%      | [YAML](configs/det/dbnet/dbnet/db_r50_icdar15.yaml) | 
-| DBNet++   | ResNet-50    | ImageNet       | 82.02%     | 87.38%        | 84.62%      | [YAML](configs/det/dbnet++/db++_r50_icdar15.yaml)   |
+</details>
 
-### 文本识别
+<details open>
+<summary>文本识别</summary>
 
-下表是目前支持的文本识别模型和它们在公开测评数据集 (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) 上的精度数据：
+- [x] [CRNN](https://arxiv.org/abs/1507.05717) (TPAMI'2016)
+- [ ] [ABINet](https://arxiv.org/abs/2103.06495) (CVPR'2021) [开发中]
+- [ ] [SVTR](https://arxiv.org/abs/2205.00159) (IJCAI'2022) [仅推理]
 
 
-| **模型** | **骨干网络** | **平均准确率**| **配置文件** | 
-|-----------|--------------|----------------|------------|
-| CRNN     | VGG7        | 82.03% 	| [YAML](configs/rec/crnn/crnn_vgg7.yaml)    | 
-| CRNN     | Resnet34_vd    | 84.45% 	| [YAML](configs/rec/crnn/crnn_resnet34.yaml)     |
+模型训练的配置及性能结果请见[configs](./configs)。
+
+基于MX引擎的推理性能结果及支持模型列表，请见[mx inference performance](docs/cn/inference_models_cn.md)。
 
-## 注释
+## 重要信息
 
 ### 变更日志
+- 2023/03/23
+1. 增加dynamic loss scaler支持，且与drop overflow update兼容。如需使用，请在配置文件中增加`loss_scale`字段并将`type`参数设为`dynamic`，参考例子请见`configs/rec/crnn/crnn_icdar15.yaml`。
+
+
 - 2023/03/20
 1. 参数名修改：`output_keys` -> `output_columns`；`num_keys_to_net` -> `num_columns_to_net`；
 2. 更新数据流程。