* adding distillation shell
* add WANDB for exps and add layer perf for README
* add line to remember to remove spacemanidols's wandb stuff
* small implementations in model distillation
* cleaning up code because I realized the teacher model only needs to run once
* distillation code implemented
* cleaned up code for public usage. awaiting numbers
* remove unused variable
* adding recipes
* improve layer dropping
* improve layer dropping
* clean up layer dropping
* updating recipes
* Updating results in README
* updating readme
* fixing epoch config issue and pushing recipe
* forgot to remove exit(0)
* minor update to distillation code to remove logging of epoch
* removing unneeded print
* removing wandb
* added README updates for model perf
Co-authored-by: Mark Kurtz <mark@neuralmagic.com>
In addition to the simple QA model, we provide an implementation that can leverage teacher-student distillation. Usage of the distillation code is virtually identical to the non-distilled model; the commands are as follows.
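The exact command-line invocation is not reproduced in this excerpt; conceptually, the distillation path follows hard teacher-student distillation for extractive QA. The sketch below is only a minimal illustration of that idea, not the repository's actual training loop: the checkpoint names are placeholders, and using the teacher's argmax start/end predictions as hard labels is an assumption based on the hard-distillation setup described below.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForQuestionAnswering

# Placeholder checkpoints; the real teacher is a QA model already fine-tuned on SQuAD.
teacher = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased").eval()
student = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

def hard_distillation_loss(batch):
    """batch holds input_ids / attention_mask / token_type_ids only (no gold labels)."""
    # The teacher only needs a forward pass; its predictions can be computed once and cached.
    with torch.no_grad():
        t_out = teacher(**batch)
        start_labels = t_out.start_logits.argmax(dim=-1)
        end_labels = t_out.end_logits.argmax(dim=-1)
    s_out = student(**batch)
    # Hard distillation: train the student on the teacher's predicted answer spans.
    return (F.cross_entropy(s_out.start_logits, start_labels)
            + F.cross_entropy(s_out.end_logits, end_labels)) / 2
```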
To explore the effect of model pruning compared to layer dropping, we train models to sparsity levels that match the parameter counts of models with layers dropped. Results are reported both with and without distillation. For distillation we use hard distillation with a teacher model trained on SQuAD for 2 epochs, which achieves 88.32442/81.10690 F1/EM. A 9-layer model is roughly equivalent to 20% sparsity, 6 layers to 40%, 3 layers to 60%, and 1 layer to 72%.
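The layer-to-sparsity equivalence can be sanity-checked with rough BERT-base parameter counts. The figures below are approximations, and treating sparsity as a fraction of all parameters (embeddings included) is an assumption.

```python
# Rough BERT-base parameter accounting (hidden=768, FFN=3072, vocab=30522, 12 layers).
HIDDEN, FFN, VOCAB, MAX_POS, LAYERS = 768, 3072, 30522, 512, 12

per_layer = (
    4 * (HIDDEN * HIDDEN + HIDDEN)   # Q, K, V, and output projections
    + (HIDDEN * FFN + FFN)           # FFN up-projection
    + (FFN * HIDDEN + HIDDEN)        # FFN down-projection
    + 2 * 2 * HIDDEN                 # two LayerNorms
)                                    # ~7.1M parameters per encoder layer

embeddings = (VOCAB + MAX_POS + 2) * HIDDEN + 2 * HIDDEN  # token/position/type + LayerNorm
total = embeddings + LAYERS * per_layer                   # ~109M parameters

for kept in (9, 6, 3, 1):
    dropped_fraction = (LAYERS - kept) * per_layer / total
    print(f"{kept}-layer model ~= {dropped_fraction:.0%} sparsity")
# Prints roughly 20%, 39%, 59%, 72%, matching the equivalences quoted above.
```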
| base model name | sparsity | params | distilled | pruned | layers | pruning epochs | F1 Score | EM Score |
|---|---|---|---|---|---|---|---|---|
## Script origin and how to integrate sparseml with other Transformers projects
This script is based on the example BERT-QA implementation in transformers found [here](https://github.com/huggingface/transformers/blob/master/examples/question-answering/run_qa.py).
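To integrate sparseml into another Transformers training script, the general pattern is to load a recipe into a ScheduledModifierManager and let it wrap the optimizer. The snippet below is a minimal sketch of that pattern; the exact entry points can differ across sparseml versions, and the recipe path, model, optimizer, and steps-per-epoch value are placeholders.

```python
import torch
from transformers import AutoModelForQuestionAnswering
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholders for illustration: any Transformers model and optimizer will do.
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
steps_per_epoch = 1000  # normally len(train_dataloader)

# Load a sparsification recipe (e.g. one of the recipes in this repository) and
# wrap the optimizer so the pruning schedule steps alongside training.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=steps_per_epoch)

# ... run the usual training loop (Trainer or a manual loop) with the wrapped optimizer ...

manager.finalize(model)  # remove sparsification hooks once training finishes
```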