
Commit 12363fa

Merge branch 'mindspore-lab:main' into docs
2 parents 4b342a4 + ee0ae3d commit 12363fa

File tree

14 files changed (+211, -512 lines)

.github/workflows/ci.yml

Lines changed: 4 additions & 4 deletions
@@ -36,10 +36,10 @@ jobs:
           pip install pytest
           # MindSpore must be installed following the instructions from the official website, not from PyPI.
           # That's why we exclude mindspore from requirements.txt.
-          pip install "mindspore>=1.8,<=1.10"
-      #- name: Test with pytest (UT)
-      #  run: |
-      #    pytest tests/modules/*.py
+          pip install "mindspore>=1.9,<=1.10"
+      - name: Test with pytest (UT)
+        run: |
+          pytest tests/ut/*.py
       - name: Test with pytest (ST)
         run: |
           pytest tests/st/test_train_eval_dummy.py
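The pin above is a closed range (`>=1.9,<=1.10`). As an illustration of how such a dotted-version constraint behaves, here is a minimal sketch with a hypothetical helper (not part of the repo or of pip):

```python
def in_range(version, low="1.9", high="1.10"):
    """Return True if `version` satisfies >=low,<=high, comparing
    dotted version strings component by component as integers."""
    def key(v):
        return tuple(int(p) for p in v.split("."))
    return key(low) <= key(version) <= key(high)

print(in_range("1.9"))     # within the pinned range
print(in_range("1.8"))     # below the range, as the old pin allowed
print(in_range("1.10.1"))  # a patch release above the upper bound is excluded
```

Note that tuple comparison gives the same inclusion/exclusion as pip's specifier here: `(1, 10, 1) > (1, 10)`, so `1.10.1` falls outside `<=1.10`.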

README.md

Lines changed: 3 additions & 140 deletions
@@ -65,156 +65,19 @@ pip install git+https://github.com/mindspore-lab/mindocr.git
 
 ### Text Detection Model Training
 
-We will use the **DBNet** model and the **ICDAR2015** dataset for illustration, although other models and datasets are also supported. <!--ICDAR2015 is a commonly-used benchmark for scene text detection.-->
-
-#### 1. Data Preparation
-
-Please download the ICDAR2015 dataset from this [website](https://rrc.cvc.uab.es/?ch=4&com=downloads), then format the dataset annotations by referring to [dataset_convert](tools/dataset_converters/README.md).
-
-After preparation, the data structure should look like this:
-
-``` text
-.
-├── test
-│   ├── images
-│   │   ├── img_1.jpg
-│   │   ├── img_2.jpg
-│   │   └── ...
-│   └── det_gt.txt
-└── train
-    ├── images
-    │   ├── img_1.jpg
-    │   ├── img_2.jpg
-    │   └── ...
-    └── det_gt.txt
-```
-
-#### 2. Configure Yaml
-
-Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from `configs/det`. Here we choose `configs/det/dbnet/db_r50_icdar15.yaml`.
-
-Then change the data config args as follows:
-``` yaml
-train:
-  dataset:
-    data_dir: PATH/TO/TRAIN_IMAGES_DIR
-    label_file: PATH/TO/TRAIN_LABELS.txt
-eval:
-  dataset:
-    data_dir: PATH/TO/TEST_IMAGES_DIR
-    label_file: PATH/TO/TEST_LABELS.txt
-```
-
-Optionally, change `num_workers` according to the number of CPU cores, and set `distribute` to True if you want to train in distributed mode.
-
-#### 3. Training
-
-To train the model, please run
-
-``` shell
-# train dbnet on ic15 dataset
-python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
-```
-
-The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory set by the arg `ckpt_save_dir`.
-
-#### 4. Evaluation
-
-To evaluate, please set the checkpoint path in the arg `ckpt_load_path` in the yaml config file and run
-
-``` shell
-python tools/eval.py --config configs/det/dbnet/db_r50_icdar15.yaml
-```
+We will use the **DBNet** model and the **ICDAR2015** dataset for demonstration, although other models and datasets are also supported. Please refer to the [DBNet model README](configs/det/dbnet/README.md).
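A quick sanity check of the prepared ICDAR2015 layout can be sketched in Python (a hypothetical helper, not part of MindOCR; it only mirrors the directory tree documented above):

```python
from pathlib import Path
import tempfile

def check_split(root: Path) -> bool:
    """Verify a train/ or test/ split matches the documented layout:
    an images/ directory plus a det_gt.txt label file."""
    return (root / "images").is_dir() and (root / "det_gt.txt").is_file()

# Example: build a dummy train/ split and validate it.
base = Path(tempfile.mkdtemp())
(base / "train" / "images").mkdir(parents=True)
(base / "train" / "det_gt.txt").write_text("img_1.jpg\t...\n")
print(check_split(base / "train"))  # True: layout matches
print(check_split(base / "test"))   # False: this split was never prepared
```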
 
 
 ### Text Recognition Model Training
 
-We will use the **CRNN** model and the **LMDB** dataset for illustration, although other models and datasets are also supported.
-
-#### 1. Data Preparation
-
-Please download the LMDB dataset from [here](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0) (ref: [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)).
-
-There are several .zip data files:
-- `data_lmdb_release.zip` contains the entire dataset, including training, validation, and evaluation subsets.
-- `validation.zip` is the union dataset for validation.
-- `evaluation.zip` contains several benchmark datasets.
-
-After unzipping and preparation, the data structure should look like this:
-
-``` text
-.
-├── train
-│   ├── MJ
-│   │   ├── data.mdb
-│   │   └── lock.mdb
-│   └── ST
-│       ├── data.mdb
-│       └── lock.mdb
-├── validation
-│   ├── data.mdb
-│   └── lock.mdb
-└── evaluation
-    ├── IC03
-    │   ├── data.mdb
-    │   └── lock.mdb
-    ├── IC13
-    │   ├── data.mdb
-    │   └── lock.mdb
-    └── ...
-```
-
-#### 2. Configure Yaml
-
-Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from `configs/rec`. Here we choose `configs/rec/crnn/crnn_resnet34.yaml`.
+We will use the **CRNN** model and the **LMDB** dataset for demonstration, although other models and datasets are also supported. Please refer to the [CRNN model README](configs/rec/crnn/README.md).
 
-Please change the data config args accordingly, for example:
-``` yaml
-train:
-  dataset:
-    type: LMDBDataset
-    data_dir: lmdb_data/rec/train/
-eval:
-  dataset:
-    type: LMDBDataset
-    data_dir: lmdb_data/rec/validation/
-```
-
-Optionally, change `num_workers` according to the number of CPU cores, and set `distribute` to True if you want to train in distributed mode.
-
-#### 3. Training
-
-We will use distributed training for the large LMDB dataset.
-
-To train in distributed mode, please run
-
-```shell
-# Distributed training on Ascend devices
-mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
-```
-
-```shell
-# n is the number of GPUs/NPUs
-mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
-```
-> Note: please ensure the arg `distribute` in the yaml file is set to True.
-
-The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory set by the arg `ckpt_save_dir`.
-
-#### 4. Evaluation
-
-To evaluate, please set the checkpoint path in the arg `ckpt_load_path` in the yaml config file and run
-
-``` shell
-python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml
-```
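Edits like the `data_dir` changes above are nested-key overrides applied to a default config. A generic sketch of that pattern (a hypothetical helper, independent of MindOCR's actual config loader; the `system.distribute` key is assumed for illustration):

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`:
    nested dicts merge key-wise, scalar values in `override` win."""
    out = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = merge(out[k], v)
        else:
            out[k] = v
    return out

default = {
    "train": {"dataset": {"type": "LMDBDataset", "data_dir": "lmdb_data/rec/train/"}},
    "system": {"distribute": False},
}
user = {
    "train": {"dataset": {"data_dir": "/data/my_lmdb/train/"}},
    "system": {"distribute": True},
}
cfg = merge(default, user)
print(cfg["train"]["dataset"])   # type preserved, data_dir overridden
print(cfg["system"]["distribute"])  # True
```

The merge copies each level it touches, so the default config is left unchanged and can be reused across runs.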
 
 ### Inference and Deployment
 
 #### Inference with MX Engine
 
-Please refer to [mx_infer](docs/cn/inference_cn.md)
+Please refer to [mx_infer](docs/cn/inference_cn.md).
 
 #### Inference with Lite
 

README_CN.md

Lines changed: 2 additions & 147 deletions
@@ -61,158 +61,13 @@ pip install git+https://github.com/mindspore-lab/mindocr.git
 
 ### Text Detection Model Training
 
-MindOCR supports multiple text detection models and datasets; here we use the **DBNet** model and the **ICDAR2015** dataset for demonstration.
-
-#### 1. Data Preparation
-
-Please download the ICDAR2015 dataset from [this website](https://rrc.cvc.uab.es/?ch=4&com=downloads), then format the dataset annotations by referring to [dataset conversion](tools/dataset_converters/README_CN.md).
-
-After data preparation, the directory structure should look like this:
-
-``` text
-.
-├── test
-│   ├── images
-│   │   ├── img_1.jpg
-│   │   ├── img_2.jpg
-│   │   └── ...
-│   └── det_gt.txt
-└── train
-    ├── images
-    │   ├── img_1.jpg
-    │   ├── img_2.jpg
-    │   └── ...
-    └── det_gt.txt
-```
-
-#### 2. Configure the Yaml File
-
-Choose a yaml config file from `configs/det` that contains the target pre-trained model and data pipeline; here we choose `configs/det/dbnet/db_r50_icdar15.yaml`.
-
-Then change the data config args as follows:
-``` yaml
-train:
-  dataset:
-    data_dir: PATH/TO/TRAIN_IMAGES_DIR
-    label_file: PATH/TO/TRAIN_LABELS.txt
-eval:
-  dataset:
-    data_dir: PATH/TO/TEST_IMAGES_DIR
-    label_file: PATH/TO/TEST_LABELS.txt
-```
-
-[Optional] Set `num_workers` according to the number of CPU cores, and set `distribute` to True to train in distributed mode.
-
-#### 3. Training
-
-Run the following command to start training:
-
-``` shell
-# train dbnet on ic15 dataset
-python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
-```
-
-For distributed mode, run:
-
-```shell
-# n is the number of GPUs/NPUs
-mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml
-```
-> Note: please ensure the `distribute` arg in the yaml file is set to True.
-
-The training results (including checkpoints, per-epoch performance, and curves) will be saved in the path set by the `ckpt_save_dir` arg in the yaml config file, which defaults to "./tmp_det/".
-
-#### 4. Evaluation
-
-For evaluation, set the `ckpt_load_path` arg in the yaml config file to the checkpoint path, then run:
-
-``` shell
-python tools/eval.py --config configs/det/dbnet/db_r50_icdar15.yaml
-```
+MindOCR supports multiple text detection models and datasets; here we use the **DBNet** model and the **ICDAR2015** dataset for demonstration. Please refer to the [DBNet model docs](configs/det/dbnet/README_CN.md).
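The `det_gt.txt` label file above pairs each image with its annotations. Assuming the common tab-separated, JSON-encoded convention used by several OCR toolkits (an assumption; see tools/dataset_converters for the actual format), parsing one line can be sketched as:

```python
import json

def parse_det_gt(line: str):
    """Split one label line into (image name, annotation list).
    Assumed format: <image>\t<JSON list of {"transcription", "points"}>."""
    name, raw = line.rstrip("\n").split("\t", 1)
    return name, json.loads(raw)

# Hypothetical example line following the assumed convention.
line = 'img_1.jpg\t[{"transcription": "hello", "points": [[0,0],[10,0],[10,5],[0,5]]}]'
name, anns = parse_det_gt(line)
print(name)                      # img_1.jpg
print(anns[0]["transcription"])  # hello
```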
 
 
 ### Text Recognition Model Training
 
-MindOCR supports multiple text recognition models and datasets; here we use the **CRNN** model and the **LMDB** dataset for demonstration.
-
-#### 1. Data Preparation
-
-Following [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here), download the LMDB dataset from [here](https://www.dropbox.com/sh/i39abvnefllx2si/AAAbAYRvxzRp3cIE5HzqUw3ra?dl=0).
-
-There are the following .zip data files:
-- `data_lmdb_release.zip` contains the full training, validation, and evaluation data;
-- `validation.zip` is the union of the validation datasets;
-- `evaluation.zip` contains several evaluation datasets.
-
-After unzipping and preparing the data, the folder structure looks like this:
-
-``` text
-.
-├── train
-│   ├── MJ
-│   │   ├── data.mdb
-│   │   └── lock.mdb
-│   └── ST
-│       ├── data.mdb
-│       └── lock.mdb
-├── validation
-│   ├── data.mdb
-│   └── lock.mdb
-└── evaluation
-    ├── IC03
-    │   ├── data.mdb
-    │   └── lock.mdb
-    ├── IC13
-    │   ├── data.mdb
-    │   └── lock.mdb
-    └── ...
-```
-
-#### 2. Configure the Yaml File
-
-Choose a yaml config file from `configs/rec` that contains the target pre-trained model and data pipeline; here we choose `configs/rec/crnn/crnn_resnet34.yaml`.
-
-Change the data config args accordingly:
-``` yaml
-train:
-  dataset:
-    type: LMDBDataset
-    data_dir: lmdb_data/rec/train/
-eval:
-  dataset:
-    type: LMDBDataset
-    data_dir: lmdb_data/rec/validation/
-```
-[Optional] Set `num_workers` according to the number of CPU cores, and set `distribute` to True to train in distributed mode.
-
-#### 3. Training
-
-Run the following command to start training:
-
-``` shell
-# train crnn on MJ+ST dataset
-python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
-```
+MindOCR supports multiple text recognition models and datasets; here we use the **CRNN** model and the **LMDB** dataset for demonstration. Please refer to the [CRNN model docs](configs/rec/crnn/README_CN.md).
 
-For distributed mode, run:
-
-```shell
-# n is the number of GPUs/NPUs
-mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
-```
-> Note: please ensure the `distribute` arg in the yaml file is set to True.
-
-The training results (including checkpoints, per-epoch performance, and curves) will be saved in the path set by the `ckpt_save_dir` arg in the yaml config file, which defaults to "./tmp_rec/".
-
-#### 4. Evaluation
-
-For evaluation, set the `ckpt_load_path` arg in the yaml config file to the checkpoint path, then run:
-
-``` shell
-python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml
-```
 
 ### Inference and Deployment
 