Commit 8f8c555

Further update documentation for 0.7.0 (#744)
Parent: 8eb29fc

File tree

10 files changed (+232 −76 lines)


CITATION.cff

Lines changed: 33 additions & 0 deletions

@@ -4,6 +4,7 @@ type: software
 authors:
   - given-names: "FastML Team"
 title: "hls4ml"
+version: "v0.7.0rc1"
 doi: 10.5281/zenodo.1201549
 repository-code: "https://github.com/fastmachinelearning/hls4ml"
 url: "https://fastmachinelearning.org/hls4ml"
@@ -21,3 +22,35 @@ abstract: |
   hls4ml is an open-source software-hardware codesign workflow
   to interpret and translate machine learning algorithms for
   implementations in hardware, including FPGAs and ASICs.
+references:
+  - type: article
+    title: "Fast inference of deep neural networks on FPGAs with hls4ml"
+    authors:
+      - family-names: "Duarte"
+        given-names: "Javier"
+      - family-names: "Han"
+        given-names: "Song"
+      - family-names: "Harris"
+        given-names: "Philip"
+      - family-names: "Jindariani"
+        given-names: "Sergo"
+      - family-names: "Kreinar"
+        given-names: "Edward"
+      - family-names: "Kreis"
+        given-names: "Benjamin"
+      - family-names: "Ngadiuba"
+        given-names: "Jennifer"
+      - family-names: "Pierini"
+        given-names: "Maurizio"
+      - family-names: "Rivera"
+        given-names: "Ryan"
+      - family-names: "Tran"
+        given-names: "Nhan"
+      - family-names: "Wu"
+        given-names: "Zhenbin"
+    journal: "JINST"
+    volume: "13"
+    start: "P07027"
+    doi: "10.1088/1748-0221/13/07/P07027"
+    year: "2018"
+    number: "07"

README.md

Lines changed: 3 additions & 2 deletions

@@ -64,11 +64,12 @@ hls4ml.report.read_vivado_report('my-hls-test')
 # Citation
 If you use this software in a publication, please cite the software
 ```bibtex
-@software{vloncar_2021_5680908,
+@software{fastml_hls4ml,
   author = {{FastML Team}},
   title = {fastmachinelearning/hls4ml},
-  year = 2021,
+  year = 2023,
   publisher = {Zenodo},
+  version = {v0.7.0rc1},
   doi = {10.5281/zenodo.1201549},
   url = {https://github.com/fastmachinelearning/hls4ml}
 }

docs/advanced/accelerator.rst

Lines changed: 77 additions & 0 deletions (new file)

=========================
VivadoAccelerator Backend
=========================

The ``VivadoAccelerator`` backend of ``hls4ml`` leverages the `PYNQ <http://pynq.io/>`_ software stack to easily deploy models on supported devices.
Currently ``hls4ml`` supports the following boards:

* `pynq-z2 <https://www.xilinx.com/support/university/xup-boards/XUPPYNQ-Z2.html>`_ (part: ``xc7z020clg400-1``)
* `zcu102 <https://www.xilinx.com/products/boards-and-kits/ek-u1-zcu102-g.html>`_ (part: ``xczu9eg-ffvb1156-2-e``)
* `alveo-u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_ (part: ``xcu50-fsvh2104-2-e``)
* `alveo-u250 <https://www.xilinx.com/products/boards-and-kits/alveo/u250.html>`_ (part: ``xcu250-figd2104-2L-e``)
* `alveo-u200 <https://www.xilinx.com/products/boards-and-kits/alveo/u200.html>`_ (part: ``xcu200-fsgd2104-2-e``)
* `alveo-u280 <https://www.xilinx.com/products/boards-and-kits/alveo/u280.html>`_ (part: ``xcu280-fsvh2892-2L-e``)

but, in principle, support can be extended to `any board supported by PYNQ <http://www.pynq.io/board.html>`_.
For the Zynq-based boards, there are two components: an ARM-based processing system (PS) and FPGA-based programmable logic (PL), with various interfaces between the two.

.. image:: ../img/zynq_interfaces.png
   :height: 300px
   :align: center
   :alt: Zynq PL/PS interfaces

Neural Network Overlay
======================

In the PYNQ project, programmable logic circuits are presented as hardware libraries called *overlays*.
The overlay can be accessed through a Python API.
In ``hls4ml``, we create a custom **neural network overlay**, which sends and receives data via AXI stream.
The target device is programmed using a bitfile that is generated by the ``VivadoAccelerator`` backend.

.. image:: ../img/pynqframe.png
   :width: 600px
   :align: center
   :alt: PYNQ software stack

Example
=======

This example is taken from `part 7 of the hls4ml tutorial <https://github.com/fastmachinelearning/hls4ml-tutorial/blob/master/part7_deployment.ipynb>`_.
Specifically, we'll deploy a model on a ``pynq-z2`` board.

First, we generate the bitfile from a Keras model ``model`` and a config.

.. code-block:: python

    import hls4ml

    config = hls4ml.utils.config_from_keras_model(model, granularity='name')
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='hls4ml_prj_pynq',
        backend='VivadoAccelerator',
        board='pynq-z2',
    )
    hls_model.build(bitfile=True)

After this command completes, we will need to package up the bitfile, hardware handoff, and Python driver to copy to the PS of the board.

.. code-block:: bash

    mkdir -p package
    cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper.bit package/hls4ml_nn.bit
    cp hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh package/hls4ml_nn.hwh
    cp hls4ml_prj_pynq/axi_stream_driver.py package/
    tar -czvf package.tar.gz -C package/ .

Then we can copy this package to the PS of the board and untar it.

Finally, on the PS in Python we can create a ``NeuralNetworkOverlay`` object, which will download the bitfile onto the PL of the board.
We also must provide the shapes of our input and output data, ``X_test.shape`` and ``y_test.shape``, respectively, to allocate the buffers for the data transfer.
The ``predict`` method will send the input data to the PL and return the output data ``y_hw``.

.. code-block:: python

    from axi_stream_driver import NeuralNetworkOverlay

    nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)
    y_hw, latency, throughput = nn.predict(X_test, profile=True)
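A natural sanity check after ``predict`` is to compare the hardware output against the CPU model's output. The snippet below is an illustrative sketch only: ``y_sw`` and ``y_hw`` are synthetic stand-ins for ``model.predict(X_test)`` and ``nn.predict(X_test)``, since the real outputs require a board; the tolerances shown are assumptions, not guarantees of any backend.

```python
import numpy as np

# Synthetic stand-ins: y_sw mimics model.predict(X_test) on the CPU, and
# y_hw mimics nn.predict(X_test) on the board, where fixed-point arithmetic
# on the PL introduces small quantization differences.
rng = np.random.default_rng(0)
y_sw = rng.random((128, 10)).astype(np.float32)
y_hw = y_sw + rng.normal(0.0, 1e-3, y_sw.shape).astype(np.float32)

# Fraction of samples whose predicted class agrees between hardware and software,
# and the worst-case elementwise deviation.
agreement = float(np.mean(np.argmax(y_hw, axis=1) == np.argmax(y_sw, axis=1)))
max_abs_err = float(np.abs(y_hw - y_sw).max())
print(f"argmax agreement: {agreement:.3f}, max abs error: {max_abs_err:.2e}")
```

In practice one would tune the acceptable deviation to the fixed-point precision chosen in the ``hls4ml`` config.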

docs/conf.py

Lines changed: 1 addition & 0 deletions

@@ -124,3 +124,4 @@ def get_pypi_version(package, url_pattern=URL_PATTERN):
     'github_version': 'main',  # Version
     'conf_py_path': '/docs/',  # Path in the checkout to the docs root
 }
+html_favicon = 'img/hls4ml_logo.svg'

docs/flows.rst

Lines changed: 45 additions & 17 deletions

@@ -2,32 +2,60 @@
 Optimizer Passes and Flows
 ==========================

-The ``hls4ml`` package internally represents the model graph with the :py:class:`~hls4ml.model.graph.ModelGraph` class.
-The nodes in this graph are represented by classes derived from the :py:class:`~hls4ml.model.layers.Layer` base class.
+The ``hls4ml`` library parses models from Keras, PyTorch or ONNX into an internal execution graph. This model graph is represented with the
+:py:class:`~hls4ml.model.graph.ModelGraph` class. The nodes in this graph, corresponding to the layers and operations of the input model, are represented
+by classes derived from the :py:class:`~hls4ml.model.layers.Layer` base class.

-Layers have only inputs, outputs and attributes.
-All information about the layer's state and configuration is stored in the attributes.
-All weights, variables and data types are attributes and there are mapping views to sort through them.
+Layers are required to have defined inputs and outputs that specify how they are connected in the graph and what the shape of their output is. All information
+about the layer's state and configuration is stored in its attributes. All weights, variables and data types are attributes, and there are mapping views to sort through them.
 Layers can define expected attributes and can be verified for correctness, or to produce a list of configurable attributes that the user can tweak.

 Optimizer passes
 ----------------

-An :py:class:`~hls4ml.model.optimizer.optimizer.OptimizerPass` transforms a model graph.
-All model/layer transformations should happen in these optimizer passes.
-There are a number of types of optimizer passes:
+To reach a state from which code can be generated, the internal model graph undergoes a series of optimizations (transformations), dubbed *optimizer passes*.
+All transformations of the model, and any modification to a layer's attributes, must be implemented through an optimizer pass. All optimizer passes derive from
+the :py:class:`~hls4ml.model.optimizer.optimizer.OptimizerPass` class. Optimizer passes are applied at the level of nodes/layers; however, a special class,
+:py:class:`~hls4ml.model.optimizer.optimizer.ModelOptimizerPass`, exists that is applied to the full model. Subclasses of
+:py:class:`~hls4ml.model.optimizer.optimizer.OptimizerPass` must provide criteria in the ``match`` function that, if satisfied, trigger the transformation in the
+``transform`` function. The boolean return value of ``transform`` indicates whether the optimizer pass made changes to the model graph, requiring the optimizers to run again.
+An example of an optimizer pass that runs on the full model is :py:class:`~hls4ml.model.optimizer.passes.stamp.MakeStamp`, while an example of a layer optimizer is
+the :py:class:`~hls4ml.model.optimizer.passes.fuse_biasadd` class that adds a bias to a :py:class:`~hls4ml.model.layers.Dense`,
+:py:class:`~hls4ml.model.layers.Conv1D`, or :py:class:`~hls4ml.model.layers.Conv2D` layer.

-* layer-specific: These are special optimizations for a given layer.
-  An example is the :py:class:`~hls4ml.model.optimizer.passes.fuse_biasadd` class that adds a bias to a :py:class:`~hls4ml.model.layers.Dense`, :py:class:`~hls4ml.model.layers.Conv1D`, or :py:class:`~hls4ml.model.layers.Conv2D` layer.
-* backend-specific: These are only used for particular backends. An example is :py:class:`~hls4ml.backends.vivado.passes.repack_stream.ReshapeStream`.
-* model-level: These model-level optimizer passes are run on every type of layer.
-* templates: These add the HLS code for a particular backend, e.g., :py:class:`~hls4ml.backends.vivado.passes.core_templates.DenseFunctionTemplate`.
-* decorators
+Optimizers can be general, independent of the backend, in which case they are located in :py:mod:`hls4ml.model.optimizer.passes`, or they may be backend-specific,
+in which case they are located in a folder dependent on the backend, e.g., :py:mod:`hls4ml.backends.vivado.passes` or
+:py:mod:`hls4ml.backends.quartus.passes`. A common set of optimizers used by the FPGA backends is located in :py:mod:`hls4ml.backends.fpga.passes`.
+
+Certain optimizers are used frequently enough that it makes sense to define special classes, which inherit from :py:class:`~hls4ml.model.optimizer.optimizer.OptimizerPass`:
+
+* :py:class:`~hls4ml.model.optimizer.optimizer.GlobalOptimizerPass`: An optimizer pass that matches each node. This is useful, for example,
+  to transform the types for a particular backend.
+* :py:class:`~hls4ml.model.optimizer.optimizer.LayerOptimizerPass`: An optimizer pass that matches each node of a particular layer type. This is
+  useful, for example, to write out the HLS code for a particular node that remains in the final graph.
+* :py:class:`~hls4ml.model.optimizer.optimizer.ConfigurableOptimizerPass`: An optimizer pass that has some configurable parameters.
+* :py:class:`~hls4ml.backends.template.Template`: An optimizer pass that populates a code template and assigns it to an attribute of a given layer. This is commonly used
+  to generate code blocks in later stages of the conversion.
+
+Note that :py:class:`~hls4ml.model.optimizer.optimizer.LayerOptimizerPass` and :py:class:`~hls4ml.model.optimizer.optimizer.ModelOptimizerPass`
+also exist as decorators that wrap a function.
+
+New optimizers can be registered with :py:func:`~hls4ml.model.optimizer.optimizer.register_pass`. Optimizers should be assigned to a flow (see below).

 Flows
 -----
-A :py:class:`~hls4ml.model.flow.flow.Flow` is an ordered set of optimizers that may depend on other flows.
+A :py:class:`~hls4ml.model.flow.flow.Flow` is an ordered set of optimizers that represents a single stage in the conversion process. The optimizers from a flow are applied
+until they no longer make changes to the model graph, after which the next flow (stage) can start. Flows may depend on other flows being applied before them,
+ensuring the model graph is in a desired state before a flow starts. The function :py:func:`~hls4ml.model.flow.flow.register_flow` is used to register a new flow. Flows
+are applied on a model graph with :py:func:`~hls4ml.model.graph.ModelGraph.apply_flow`.
+
 There are common model-level flows that can run regardless of the backend, and there are backend-specific flows.
-Each backend provides provides a default flow for processing.
-For example, the Vivado backend defaults to an `IP flow <https://github.com/fastmachinelearning/hls4ml/blob/7c0a065935904f50bd7e4c547f85354b36276092/hls4ml/backends/vivado/vivado_backend.py#L148-L160>`_ that requires additional flows and produces an IP.
+The `convert and optimize <https://github.com/fastmachinelearning/hls4ml/blob/7c0a065935904f50bd7e4c547f85354b36276092/hls4ml/model/optimizer/__init__.py#L14-L20>`_
+flows do not depend on a backend.
+
+Each backend provides a default flow that defines the default target for that backend. For example, the Vivado backend defaults to an
+`IP flow <https://github.com/fastmachinelearning/hls4ml/blob/7c0a065935904f50bd7e4c547f85354b36276092/hls4ml/backends/vivado/vivado_backend.py#L148-L160>`_
+that requires additional flows and produces an IP. It runs no optimizers itself, but it requires many other flows (sub-flows) to have run.
+The convert and optimize flows defined above are some of these required sub-flows.

 Another example is FIFO buffer depth optimization explained in the :ref:`FIFO Buffer Depth Optimization` section.
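The match/transform mechanism and the run-until-stable behavior of a flow described in this hunk can be sketched in a few lines. The sketch below is purely illustrative: nodes are plain dicts, the graph is a plain list, and the class names (``FuseBiasAdd``, ``apply_flow``) are stand-ins, not hls4ml's actual API.

```python
# Minimal, dependency-free sketch of the match/transform pattern.
class OptimizerPass:
    def match(self, node):
        """Return True if this pass applies to the given node."""
        raise NotImplementedError

    def transform(self, model, node):
        """Apply the transformation; return True if the graph changed."""
        raise NotImplementedError


class FuseBiasAdd(OptimizerPass):
    """Fold a standalone 'BiasAdd' node into the preceding 'Dense' node."""

    def match(self, node):
        return node['class'] == 'BiasAdd' and node.get('prev', {}).get('class') == 'Dense'

    def transform(self, model, node):
        node['prev']['bias'] = node['bias']  # merge the bias into the Dense layer
        model.remove(node)                   # drop the now-redundant BiasAdd node
        return True                          # graph changed -> rerun the passes


def apply_flow(model, passes):
    """Apply the passes repeatedly until none of them changes the graph."""
    changed = True
    while changed:
        changed = False
        for opt in passes:
            for node in list(model):  # iterate over a copy: transform may remove nodes
                if opt.match(node):
                    changed |= opt.transform(model, node)


# Toy graph: a Dense node followed by a separate BiasAdd node.
dense = {'class': 'Dense', 'bias': None}
bias_add = {'class': 'BiasAdd', 'bias': [0.1, 0.2], 'prev': dense}
model = [dense, bias_add]
apply_flow(model, [FuseBiasAdd()])
# After the flow stabilizes, the BiasAdd node is gone and Dense carries the bias.
```

The ``while changed`` loop mirrors the behavior described above: a ``transform`` that returns ``True`` forces another sweep, and the flow ends only once every pass reports no further changes.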

docs/img/pynqframe.png

328 KB

docs/img/zynq_interfaces.png

74.2 KB

docs/index.rst

Lines changed: 3 additions & 2 deletions

@@ -2,11 +2,11 @@
    :hidden:
    :caption: Introduction

-   release_notes
+   concepts
    status
    setup
+   release_notes
    command
-   concepts
    details
    flows
    reference
@@ -24,6 +24,7 @@

    advanced/fifo_depth
    advanced/extension
+   advanced/accelerator

 .. toctree::
    :hidden:

docs/reference.rst

Lines changed: 3 additions & 2 deletions

@@ -9,11 +9,12 @@ If you use this software in a publication, please cite the software

 .. code-block:: bibtex

-   @software{vloncar_2021_5680908,
+   @software{fastml_hls4ml,
    author = {{FastML Team}},
    title = {fastmachinelearning/hls4ml},
-   year = 2021,
+   year = 2023,
    publisher = {Zenodo},
+   version = {v0.7.0rc1},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
    }
