
Commit 2b84085

bfineran, jeanniefinks, and markurtz authored
M-FAC documentation and Tutorials (#358)
* M-FAC documentation and Tutorials
* applying suggestions from code review

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>
Co-authored-by: Mark Kurtz <mark@neuralmagic.com>
1 parent f39f04b commit 2b84085

7 files changed: +659 −0 lines changed

research/mfac/README.md

Lines changed: 113 additions & 0 deletions
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Matrix-Free Approximate Curvature (M-FAC) Pruning

The paper
[Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization](https://arxiv.org/pdf/2107.03356.pdf)
written by Elias Frantar, Eldar Kurtic, and Assistant Professor Dan Alistarh of IST Austria
introduces the Matrix-Free Approximate Curvature (M-FAC) method of pruning.
M-FAC builds on advances from the [WoodFisher](https://arxiv.org/pdf/2004.14340.pdf)
pruning paper to efficiently use first-order information (gradients) to determine the optimal weights
to prune by approximating the corresponding second-order information.
This algorithm is shown to outperform magnitude pruning as well as other second-order pruning
techniques on a variety of one-shot and gradual pruning tasks.

## Using M-FAC with SparseML

SparseML makes it easy to use the M-FAC pruning algorithm as part of sparsification
recipes to improve pruning recovery by providing an `MFACPruningModifier`.
The `MFACPruningModifier` supports the same settings as the magnitude
pruning modifiers plus extra settings for the M-FAC algorithm under the
`mfac_options` parameter. `mfac_options` should be provided as a YAML dictionary;
details of the main options are given below.

### Example M-FAC Recipe

The following is an example `MFACPruningModifier` to be used in place of other
pruning modifiers in a recipe:

```yaml
pruning_modifiers:
  - !MFACPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 1.0
    end_epoch: 61.0
    update_frequency: 4.0
    mfac_options:
      num_grads: {0.0: 256, 0.5: 512, 0.75: 1024, 0.83: 1400}
      fisher_block_size: 10000
      available_gpus: ["cuda:0"]
```

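A recipe like this is typically applied by loading it into SparseML's `ScheduledModifierManager` and letting the manager wrap the optimizer. The following is a minimal sketch rather than the authoritative workflow: `model`, `optimizer`, and `train_loader` are placeholders for your own PyTorch objects, and the exact manager API may vary between SparseML versions.

```python
# Minimal sketch of applying a pruning recipe during PyTorch training.
# `model`, `optimizer`, and `train_loader` are placeholders; the manager
# API shown here may differ slightly across SparseML versions.
import torch.nn.functional as F
from sparseml.pytorch.optim import ScheduledModifierManager

manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(int(manager.max_epochs)):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()  # the manager updates pruning masks on its schedule

manager.finalize(model)  # remove the manager's hooks once training completes
```
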
### mfac_options Parameters

The following parameters can be specified under the `mfac_options` parameter to control
how the M-FAC calculations are made. Ideal values will depend on the system
available to run on and the model to be pruned.

#### num_grads

To approximate the second-order information in the M-FAC algorithm, first-order
gradients are used. `num_grads` specifies the number of recent gradient samples of the
model to store while training.

This value can be an int, in which case that constant value will be used throughout pruning.
Alternatively, the value can be a dictionary mapping float sparsity levels (between 0.0 and 1.0)
to the number of gradients that should be stored once that sparsity level is reached.
If a dictionary is used, then 0.0 must be included as a key for the base number of gradients
to store (i.e. {0: 64, 0.5: 128, 0.75: 256}).

Storing gradients can be expensive: for a dense model, each additional gradient
sample stored requires about as much memory as the entire model. This is why
the dictionary option allows more gradients to be stored as the model gets more
sparse.

If an M-FAC pruning run is unexpectedly killed, the most likely reason is that
the gradient storage requirements exceeded the system's RAM. A safe rule of thumb is that
the initial number of gradients should be no greater than 1/4 of the
available CPU RAM divided by the model size.

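As a quick way to apply that rule of thumb, consider the following sketch; the function and the example numbers are illustrative assumptions, not part of SparseML:

```python
# Illustrative estimate of a safe `num_grads` upper bound following the
# 1/4-of-RAM rule of thumb above. Not part of SparseML; numbers are examples.

def max_safe_num_grads(num_params: int, available_ram_gb: float,
                       bytes_per_param: int = 4) -> int:
    # Each stored gradient sample is roughly the size of the dense model.
    model_size_gb = num_params * bytes_per_param / 1024**3
    return int((available_ram_gb / 4) / model_size_gb)

# Example: a 25M-parameter fp32 model (~0.09 GB) on a machine with 64 GB
# of CPU RAM gives a budget of roughly 170 gradient samples.
print(max_safe_num_grads(25_000_000, 64.0))
```
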
#### fisher_block_size

To limit the computational cost of calculating second-order information, the M-FAC
algorithm may compute a block-diagonal matrix with a certain block size that is
sufficient for generating the necessary information for pruning.

The `fisher_block_size` specifies this block size. If using GPUs to perform the
M-FAC computations, the GPUs should have `num_grads * fisher_block_size` extra
memory during training so each block can be stored and computed sequentially on a GPU.

The default block size is 2000, and block sizes between 1000 and 10000 are generally
ideal. If `None` is provided, the full matrix will be computed without blocks.

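To make that memory requirement concrete, here is a small illustrative calculation. Treating `num_grads * fisher_block_size` as a count of fp32 values is an assumption for illustration, not a documented guarantee:

```python
# Rough extra GPU memory implied by the `num_grads * fisher_block_size`
# guideline above, assuming fp32 storage (an assumption for illustration).

def mfac_extra_gpu_gb(num_grads: int, fisher_block_size: int,
                      bytes_per_value: int = 4) -> float:
    return num_grads * fisher_block_size * bytes_per_value / 1024**3

# Example recipe values: up to 1400 gradients with a block size of 10000
# imply roughly 0.05 GB of extra GPU memory for the block buffers.
print(mfac_extra_gpu_gb(1400, 10000))
```
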
#### available_gpus

`available_gpus` is a list of GPU device names on which to perform the WoodFisher computation.
If not provided, computation will be done on the CPU.

## Tutorials

Tutorials for using M-FAC with SparseML are provided in the [tutorials](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials)
directory. Currently there are tutorials available for
[one-shot](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials/one_shot_pruning_with_mfac.md)
and [gradual](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials/gradual_pruning_with_mfac.md)
pruning with M-FAC.

## Need Help?

For Neural Magic Support, sign up or log in to get help with your questions in our
**Tutorials channel:** [Discourse Forum](https://discuss.neuralmagic.com/)
and/or [Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).
Lines changed: 31 additions & 0 deletions
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
---
pruning_modifiers:
  - !GMPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.0
    final_sparsity: 0.35
    start_epoch: 0.0
    end_epoch: 1.0
    update_frequency: 1.0
---

# Pruning MNISTNet with Magnitude

This recipe prunes a model to 35% sparsity using magnitude pruning.
It is intended for use with one-shot pruning and is for example purposes only.
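
One possible way to apply such a recipe in one shot, without any training, is sketched below. This is a hedged example: it assumes SparseML's `ScheduledModifierManager.apply` one-shot path, `model` and the recipe filename are placeholders, and the M-FAC one-shot recipe that follows additionally needs gradient samples, so follow the one-shot tutorial for that flow.

```python
# Possible one-shot application of the magnitude recipe above (a sketch;
# assumes the ScheduledModifierManager.apply path, which may vary by version).
from sparseml.pytorch.optim import ScheduledModifierManager

manager = ScheduledModifierManager.from_yaml("magnitude_recipe.md")
manager.apply(model)  # `model` is a placeholder for your torch.nn.Module
```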
Lines changed: 35 additions & 0 deletions
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
---
pruning_modifiers:
  - !MFACPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.0
    final_sparsity: 0.35
    start_epoch: 0.0
    end_epoch: 1.0
    update_frequency: 1.0
    mfac_options:
      num_grads: 512
      fisher_block_size: 2000
---

# Pruning MNISTNet with M-FAC

This recipe prunes a model to 35% sparsity using the M-FAC pruning algorithm.
It is intended for use with MNISTNet but could be used to prune other models
in one shot; however, the `final_sparsity` and `mfac_options` should be adjusted
accordingly.
Lines changed: 60 additions & 0 deletions
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
---
# Epoch Variables
start_epoch: &start_epoch 0.0
end_epoch: &end_epoch 3.0
pruning_start_epoch: &pruning_start_epoch 1.0
pruning_end_epoch: &pruning_end_epoch 2.0
lr: &lr 0.0004

# Pruning Variables
pruning_update_frequency: &pruning_update_frequency 1.0
pruning_mask_type: &pruning_mask_type unstructured

target_sparsities: &target_sparsities
  0.90: ['sections.1.0.point.conv.weight', 'sections.1.1.point.conv.weight', 'sections.2.0.point.conv.weight']
  0.95: ['sections.2.1.point.conv.weight', 'sections.3.0.point.conv.weight', 'sections.3.1.point.conv.weight', 'sections.3.5.point.conv.weight']
  0.97: ['sections.3.2.point.conv.weight', 'sections.3.3.point.conv.weight', 'sections.3.4.point.conv.weight', 'sections.4.0.point.conv.weight', 'sections.4.1.point.conv.weight']

# Modifier Groups:
training_modifiers:
  - !EpochRangeModifier
    start_epoch: *start_epoch
    end_epoch: *end_epoch

  - !SetLearningRateModifier
    start_epoch: *start_epoch
    learning_rate: *lr

pruning_modifiers:
  - !GMPruningModifier
    params: []
    init_sparsity: 0.35
    final_sparsity: *target_sparsities
    start_epoch: *pruning_start_epoch
    end_epoch: *pruning_end_epoch
    update_frequency: *pruning_update_frequency
    mask_type: *pruning_mask_type

  - !SetWeightDecayModifier
    weight_decay: 0.0
    start_epoch: *pruning_end_epoch
---

# Pruning MobileNet-Imagenette with Magnitude

This recipe prunes a MobileNet model to 95% sparsity over 3 epochs using magnitude pruning.
Lines changed: 71 additions & 0 deletions
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
---
# Epoch Variables
start_epoch: &start_epoch 0.0
end_epoch: &end_epoch 3.0
pruning_start_epoch: &pruning_start_epoch 1.0
pruning_end_epoch: &pruning_end_epoch 2.0
lr: &lr 0.0004

# Pruning Variables
pruning_update_frequency: &pruning_update_frequency 1.0
pruning_mask_type: &pruning_mask_type unstructured

target_sparsities: &target_sparsities
  0.90: ['sections.1.0.point.conv.weight', 'sections.1.1.point.conv.weight', 'sections.2.0.point.conv.weight']
  0.95: ['sections.2.1.point.conv.weight', 'sections.3.0.point.conv.weight', 'sections.3.1.point.conv.weight', 'sections.3.5.point.conv.weight']
  0.97: ['sections.3.2.point.conv.weight', 'sections.3.3.point.conv.weight', 'sections.3.4.point.conv.weight', 'sections.4.0.point.conv.weight', 'sections.4.1.point.conv.weight']

# Modifier Groups:
training_modifiers:
  - !EpochRangeModifier
    start_epoch: *start_epoch
    end_epoch: *end_epoch

  - !SetLearningRateModifier
    start_epoch: *start_epoch
    learning_rate: *lr

pruning_modifiers:
  - !MFACPruningModifier
    params: []
    init_sparsity: 0.35
    final_sparsity: *target_sparsities
    start_epoch: *pruning_start_epoch
    end_epoch: *pruning_end_epoch
    update_frequency: *pruning_update_frequency
    mask_type: *pruning_mask_type
    mfac_options:
      num_grads: 256
      fisher_block_size: 2000
      available_gpus: ["cuda:0"]

  - !SetWeightDecayModifier
    weight_decay: 0.0
    start_epoch: *pruning_end_epoch
---

# Pruning MobileNet-Imagenette with M-FAC

This recipe prunes a MobileNet model to 95% sparsity over 3 epochs using the M-FAC algorithm.

* This recipe must be run at a train batch size of 32. If running at a different batch size,
  the learning rate and number of M-FAC gradients should be adjusted accordingly
  (see the sketch after this list).
* `available_gpus` should be updated based on the devices available on the system.
  If no GPU is available, it should be removed.
* This recipe is for demonstration purposes only; in practice, a larger dataset and a longer
  pruning schedule should be used to obtain the best results.
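
To see why the batch size matters for `num_grads`, consider the heuristic sketch below: M-FAC collects roughly one gradient sample per optimizer step, so the steps available between pruning updates bound a useful `num_grads`. The dataset size and the one-gradient-per-step assumption are illustrative, not guarantees from the recipe.

```python
# Heuristic check (not part of SparseML): confirm enough optimizer steps occur
# between pruning updates to fill the M-FAC gradient buffer at a given batch
# size. The Imagenette-scale dataset size is an assumption for illustration.

def steps_between_updates(dataset_size: int, batch_size: int,
                          update_frequency_epochs: float) -> int:
    steps_per_epoch = dataset_size // batch_size
    return int(steps_per_epoch * update_frequency_epochs)

# ~9,500 training images at batch size 32, pruning updates every 1.0 epochs:
steps = steps_between_updates(9_500, 32, 1.0)
print(steps, steps >= 256)  # 296 True -> the recipe's num_grads of 256 fits

# At batch size 64 only ~148 steps occur per update, so num_grads (and the
# learning rate) would need to be adjusted or the schedule lengthened.
```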
