<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Matrix-Free Approximate Curvature (M-FAC) Pruning

The paper
[Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization](https://arxiv.org/pdf/2107.03356.pdf)
written by Elias Frantar, Eldar Kurtic, and Assistant Professor Dan Alistarh of IST Austria
introduces the Matrix-Free Approximate Curvature (M-FAC) method of pruning.
M-FAC builds on advances from the [WoodFisher](https://arxiv.org/pdf/2004.14340.pdf)
pruning paper, using first-order information (gradients) to efficiently approximate
the second-order information needed to determine the optimal weights to prune.
The algorithm is shown to outperform magnitude pruning as well as other second-order
pruning techniques on a variety of one-shot and gradual pruning tasks.

## Using M-FAC with SparseML

SparseML makes it easy to use the M-FAC pruning algorithm in sparsification
recipes to improve pruning recovery by providing an `MFACPruningModifier`.
The `MFACPruningModifier` accepts the same settings as the magnitude
pruning modifiers, plus additional settings for the M-FAC algorithm under the
`mfac_options` parameter. `mfac_options` should be provided as a YAML dictionary;
the main options are detailed below.

### Example M-FAC Recipe
The following is an example `MFACPruningModifier` to be used in place of other
pruning modifiers in a recipe:

```yaml
pruning_modifiers:
  - !MFACPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 1.0
    end_epoch: 61.0
    update_frequency: 4.0
    mfac_options:
      num_grads: {0.0: 256, 0.5: 512, 0.75: 1024, 0.83: 1400}
      fisher_block_size: 10000
      available_gpus: ["cuda:0"]
```
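
The recipe can then be applied to a standard PyTorch training loop using SparseML's
`ScheduledModifierManager`. Below is a minimal sketch that assumes the recipe above is
saved as `recipe.yaml` (a placeholder path); the toy model, data, and hyperparameters
are purely for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from sparseml.pytorch.optim import ScheduledModifierManager

# toy model and data, for illustration only
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
data = TensorDataset(torch.randn(512, 128), torch.randint(0, 10, (512,)))
train_loader = DataLoader(data, batch_size=64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# load the recipe containing the MFACPruningModifier and wrap the optimizer
# so the scheduled pruning updates run during training
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

num_epochs = 62  # covers the recipe's end_epoch of 61.0
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()  # gradient collection and masking should run here

manager.finalize(model)
```

With the optimizer wrapped this way, the manager should apply the M-FAC gradient
collection and pruning steps automatically at the epochs defined in the recipe.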

### mfac_options Parameters
The following parameters can be specified under `mfac_options` to control
how the M-FAC calculations are made. Ideal values depend on the system
the pruning runs on and the model being pruned.

#### num_grads
To approximate second-order information, the M-FAC algorithm uses first-order
gradients. `num_grads` specifies the number of recent gradient samples of the
model to store while training.

This value can be an int, in which case that constant value is used throughout
pruning. Alternatively, the value can be a dictionary mapping float sparsity
levels (between 0.0 and 1.0) to the number of gradients to store once that
sparsity level is reached. If a dictionary is used, 0.0 must be included as a key
for the base number of gradients to store (i.e. {0.0: 64, 0.5: 128, 0.75: 256}).

Storing gradients can be expensive: for a dense model, each additional stored
gradient sample requires roughly as much memory as the entire model. This is
why the dictionary option allows more gradients to be stored as the model
becomes more sparse.

If an M-FAC pruning run is unexpectedly killed, the likely cause is that the
gradient storage requirements exceeded the system's RAM. A safe rule of thumb
is to keep the initial number of gradients no greater than 1/4 of the available
CPU RAM divided by the model size.
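
As a rough illustration of this rule of thumb (the parameter count and RAM figure
below are assumptions for the sketch, not recommendations):

```python
# illustrative estimate of a safe initial num_grads, following the
# rule of thumb: num_grads <= (available CPU RAM / 4) / model size
num_params = 25_000_000      # assumed model size, roughly ResNet-50 scale
bytes_per_value = 4          # fp32 weights/gradients
model_size_gb = num_params * bytes_per_value / 1024**3  # ~0.09 GB

available_ram_gb = 64.0      # assumed CPU RAM on the training machine
max_num_grads = int((available_ram_gb / 4) / model_size_gb)
print(max_num_grads)         # ~171, so a starting num_grads of 128 fits safely
```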


#### fisher_block_size
To limit the computational cost of calculating second-order information, the
M-FAC algorithm can compute a block-diagonal approximation with a fixed block
size that is sufficient for generating the information needed for pruning.

`fisher_block_size` specifies this block size. If GPUs are used for the M-FAC
computations, each GPU should have room for `num_grads * fisher_block_size`
extra values during training so that each block can be stored and computed on
the GPU sequentially.

The default block size is 2000; block sizes between 1000 and 10000 generally
work well. If `None` is provided, the full matrix is computed without blocks.
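
For a rough sense of the extra GPU memory implied by `num_grads * fisher_block_size`,
assuming 32-bit values (the numbers below are illustrative only):

```python
# per-block working set on a GPU: a num_grads x fisher_block_size slice
num_grads = 1024
fisher_block_size = 10000
bytes_per_value = 4  # fp32

extra_gpu_mb = num_grads * fisher_block_size * bytes_per_value / 1024**2
print(f"{extra_gpu_mb:.0f} MB")  # ~39 MB of extra GPU memory per block
```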


#### available_gpus
`available_gpus` is a list of GPU device names on which to perform the M-FAC
computation. If not provided, the computation will be done on the CPU.


## Tutorials

Tutorials for using M-FAC with SparseML are provided in the [tutorials](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials)
directory. Currently, tutorials are available for
[one-shot](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials/one_shot_pruning_with_mfac.md)
and [gradual](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tutorials/gradual_pruning_with_mfac.md)
pruning with M-FAC.

## Need Help?
For Neural Magic Support, sign up or log in to get help with your questions in our
**Tutorials channel:** [Discourse Forum](https://discuss.neuralmagic.com/)
and/or [Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).