
Commit a61d4d3

add hot boot arg (PaddlePaddle#980)
1 parent 1a6729e commit a61d4d3

2 files changed: +33 -3 lines changed

examples/model_compression/minilmv2/README.md

Lines changed: 22 additions & 0 deletions
@@ -34,6 +34,7 @@ python -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" general_distill.py
     --student_model_type tinybert \
     --num_relation_heads 48 \
     --student_model_name_or_path tinybert-6l-768d-zh \
+    --init_from_student False \
     --teacher_model_type bert \
     --teacher_model_name_or_path bert-base-chinese \
     --max_seq_length 128 \
@@ -51,6 +52,26 @@ python -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" general_distill.py
     --input_dir ${dataset} \

 ```
+
+The parameters are described as follows:
+
+- `student_model_type`: the type of the student model
+- `num_relation_heads`: the number of relation heads after the attention heads are recombined
+- `student_model_name_or_path`: the name of the student model (it must match the student model type), or a path to the student model
+- `init_from_student`: whether the student model for this distillation run is initialized from the parameters in `student_model_name_or_path`; a bool argument, defaults to False
+- `teacher_model_type`: the type of the teacher model
+- `teacher_model_name_or_path`: the name of the teacher model
+- `max_seq_length`: the maximum sentence length; longer sequences are truncated
+- `warmup_steps`: the number of learning rate warmup steps
+- `save_steps`: how frequently the model is saved
+- `teacher_layer_index`: the teacher layer that the student model learns from
+- `student_layer_index`: the student layer that learns from the teacher model
+- `output_dir`: the directory the model is written to
+- `device`: the device the program runs on; defaults to gpu
+- `input_dir`: the directory where the pretraining data is stored
+
+
+
 ### Evaluation method

 Assume the model produced by pretraining is stored under `${pretrained_models}`. We also provide an already-pretrained [model](https://paddlenlp.bj.bcebos.com/models/general_distill/minilmv2_6l_768d_ch.tar.gz) for reference; it has the same structure as `tinybert-6l-768d-zh`, so it can be loaded directly with `TinyBertForSequenceClassification.from_pretrained()`.
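As a loading sketch (a hypothetical snippet, not part of this commit; the path is a placeholder for wherever the tarball above was extracted):

```python
from paddlenlp.transformers import TinyBertForSequenceClassification, TinyBertTokenizer

# Placeholder path: the directory extracted from minilmv2_6l_768d_ch.tar.gz,
# assuming the archive also contains the tokenizer files.
model = TinyBertForSequenceClassification.from_pretrained("./minilmv2_6l_768d_ch")
tokenizer = TinyBertTokenizer.from_pretrained("./minilmv2_6l_768d_ch")
```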
@@ -80,6 +101,7 @@ python -u ./run_clue.py \

 ```

+
 For the different tasks we recommend different fine-tuning hyperparameters `${learning_rate}`, `${num_train_epochs}`, and `${max_seq_len}`; refer to the following configuration:

 | TASK_NAME | AFQMC | TNEWS | IFLYTEK | OCNLI | CMNLI | CLUEWSC2020 | CSL |
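Because `init_from_student` is parsed with `distutils.util.strtobool` (see the general_distill.py diff below), passing `--init_from_student False` really yields a false value. A minimal standalone sketch of that argparse pattern (hypothetical demo script, not part of this commit):

```python
import argparse
import distutils.util

# strtobool maps "y"/"yes"/"t"/"true"/"on"/"1" to 1 and
# "n"/"no"/"f"/"false"/"off"/"0" to 0; anything else raises ValueError.
# A plain type=bool would treat any non-empty string, even "False",
# as truthy, which is a likely reason the script uses strtobool.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--init_from_student", type=distutils.util.strtobool, default=False)

args = parser.parse_args(["--init_from_student", "False"])
print(bool(args.init_from_student))  # prints: False
```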

examples/model_compression/minilmv2/general_distill.py

Lines changed: 11 additions & 3 deletions
@@ -17,6 +17,7 @@
 import random
 import time
 from functools import partial
+import distutils.util
 from concurrent.futures import ThreadPoolExecutor

 import numpy as np
@@ -66,6 +67,11 @@ def parse_args():
             list(classes[-1].pretrained_init_configuration.keys())
             for classes in MODEL_CLASSES.values()
         ], [])), )
+    parser.add_argument(
+        "--init_from_student",
+        type=distutils.util.strtobool,
+        default=False,
+        help="Whether to use the parameters of student model to initialize.")
     parser.add_argument(
         "--teacher_model_name_or_path",
         default=None,
@@ -85,7 +91,6 @@ def parse_args():
         required=True,
         help="The output directory where the model predictions and checkpoints will be written.",
     )
-
     parser.add_argument(
         "--max_seq_length",
         default=128,
@@ -248,8 +253,11 @@ def do_train(args):
     # For student
     model_class, tokenizer_class = MODEL_CLASSES[args.student_model_type]
     tokenizer = tokenizer_class.from_pretrained(args.student_model_name_or_path)
-    tinybert = TinyBertModel(vocab_size=21128, num_hidden_layers=6)
-    student = model_class(tinybert)
+    if args.init_from_student:
+        student = model_class.from_pretrained(args.student_model_name_or_path)
+    else:
+        tinybert = TinyBertModel(vocab_size=21128, num_hidden_layers=6)
+        student = model_class(tinybert)

     # For teacher
     teacher_model_class, _ = MODEL_CLASSES[args.teacher_model_type]
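In summary, the "hot boot" change gives the student two initialization paths. A hedged sketch of the logic (the helper name is illustrative only; `model_class` stands for whatever `MODEL_CLASSES` maps `--student_model_type` to):

```python
from paddlenlp.transformers import TinyBertModel

def build_student(model_class, args):
    """Illustrative helper mirroring the do_train() change above."""
    if args.init_from_student:
        # Hot boot: warm-start from the pretrained weights named by
        # --student_model_name_or_path (e.g. tinybert-6l-768d-zh).
        return model_class.from_pretrained(args.student_model_name_or_path)
    # Cold start: a randomly initialized 6-layer TinyBERT backbone with
    # the Chinese BERT vocabulary size (21128), wrapped by model_class.
    tinybert = TinyBertModel(vocab_size=21128, num_hidden_layers=6)
    return model_class(tinybert)
```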
