<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Matrix-Free Approximate Curvature (M-FAC) Pruning

The paper
[Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization](https://arxiv.org/pdf/2107.03356.pdf)
written by Elias Frantar, Eldar Kurtic, and Assistant Professor Dan Alistarh of IST Austria
introduces the Matrix-Free Approximate Curvature (M-FAC) method of pruning.
M-FAC builds on advances from the [WoodFisher](https://arxiv.org/pdf/2004.14340.pdf)
pruning paper, using first-order information (gradients) to efficiently approximate
the second-order information needed to determine the optimal weights to prune.
The algorithm is shown to outperform magnitude pruning as well as other second-order
pruning techniques on a variety of one-shot and gradual pruning tasks.

## Using M-FAC with SparseML

SparseML makes it easy to use the M-FAC pruning algorithm in sparsification
recipes to improve pruning recovery by providing an `MFACPruningModifier`.
The `MFACPruningModifier` accepts the same settings as the magnitude
pruning modifiers, plus additional settings for the M-FAC algorithm under the
`mfac_options` parameter. `mfac_options` should be provided as a YAML dictionary;
the main options are detailed below.

### Example M-FAC Recipe
The following is an example `MFACPruningModifier` to be used in place of other
pruning modifiers in a recipe:

```yaml
pruning_modifiers:
  - !MFACPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 1.0
    end_epoch: 61.0
    update_frequency: 4.0
    mfac_options:
      num_grads: {0.0: 256, 0.5: 512, 0.75: 1024, 0.83: 1400}
      fisher_block_size: 10000
      available_gpus: ["cuda:0"]
```
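
The recipe can then be applied to a standard PyTorch training loop using SparseML's
`ScheduledModifierManager`. Below is a minimal sketch that assumes the recipe above is
saved as `recipe.yaml` (a placeholder path); the toy model, data, and hyperparameters
are purely for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from sparseml.pytorch.optim import ScheduledModifierManager

# toy model and data, for illustration only
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
data = TensorDataset(torch.randn(512, 128), torch.randint(0, 10, (512,)))
train_loader = DataLoader(data, batch_size=64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# load the recipe containing the MFACPruningModifier and wrap the optimizer
# so the scheduled pruning updates run during training
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

num_epochs = 62  # covers the recipe's end_epoch of 61.0
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()  # gradient collection and masking should run here

manager.finalize(model)
```

With the optimizer wrapped this way, the manager should apply the M-FAC gradient
collection and pruning steps automatically at the epochs defined in the recipe.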

### mfac_options Parameters
The following parameters can be specified under `mfac_options` to control
how the M-FAC calculations are made. Ideal values depend on the system
the pruning runs on and the model being pruned.

#### num_grads
To approximate second-order information, the M-FAC algorithm uses first-order
gradients. `num_grads` specifies the number of recent gradient samples of the
model to store while training.

This value can be an int, in which case that constant value is used throughout
pruning. Alternatively, the value can be a dictionary mapping float sparsity
levels (between 0.0 and 1.0) to the number of gradients to store once that
sparsity level is reached. If a dictionary is used, 0.0 must be included as a key
for the base number of gradients to store (i.e. {0.0: 64, 0.5: 128, 0.75: 256}).

Storing gradients can be expensive: for a dense model, each additional stored
gradient sample requires roughly as much memory as the entire model. This is
why the dictionary option allows more gradients to be stored as the model
becomes more sparse.

If an M-FAC pruning run is unexpectedly killed, the likely cause is that the
gradient storage requirements exceeded the system's RAM. A safe rule of thumb
is to keep the initial number of gradients no greater than 1/4 of the available
CPU RAM divided by the model size.
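
As a rough illustration of this rule of thumb (the parameter count and RAM figure
below are assumptions for the sketch, not recommendations):

```python
# illustrative estimate of a safe initial num_grads, following the
# rule of thumb: num_grads <= (available CPU RAM / 4) / model size
num_params = 25_000_000      # assumed model size, roughly ResNet-50 scale
bytes_per_value = 4          # fp32 weights/gradients
model_size_gb = num_params * bytes_per_value / 1024**3  # ~0.09 GB

available_ram_gb = 64.0      # assumed CPU RAM on the training machine
max_num_grads = int((available_ram_gb / 4) / model_size_gb)
print(max_num_grads)         # ~171, so a starting num_grads of 128 fits safely
```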


#### fisher_block_size
To limit the computational cost of calculating second-order information, the
M-FAC algorithm can compute a block-diagonal approximation with a fixed block
size that is sufficient for generating the information needed for pruning.

`fisher_block_size` specifies this block size. If GPUs are used for the M-FAC
computations, each GPU should have room for `num_grads * fisher_block_size`
extra values during training so that each block can be stored and computed on
the GPU sequentially.

The default block size is 2000; block sizes between 1000 and 10000 generally
work well. If `None` is provided, the full matrix is computed without blocks.
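
For a rough sense of the extra GPU memory implied by `num_grads * fisher_block_size`,
assuming 32-bit values (the numbers below are illustrative only):

```python
# per-block working set on a GPU: a num_grads x fisher_block_size slice
num_grads = 1024
fisher_block_size = 10000
bytes_per_value = 4  # fp32

extra_gpu_mb = num_grads * fisher_block_size * bytes_per_value / 1024**2
print(f"{extra_gpu_mb:.0f} MB")  # ~39 MB of extra GPU memory per block
```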


#### available_gpus
`available_gpus` is a list of GPU device names on which to perform the M-FAC
computation. If not provided, the computation will be done on the CPU.


## Tutorials

Tutorials for using M-FAC with SparseML are provided in the [tutorials](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials)
directory. Currently, tutorials are available for
[one-shot](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials/one_shot_pruning_with_mfac.md)
and [gradual](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials/gradual_pruning_with_mfac.md)
pruning with M-FAC.

## Need Help?
For Neural Magic Support, sign up or log in to get help with your questions in our
**Tutorials channel:** [Discourse Forum](https://discuss.neuralmagic.com/)
and/or [Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).