FIFO depth optimizer for Vitis backend (#1037)

steltze · stzelepi · JanFSchulte · web-flow · commit dced28f83739 · 2025-03-04T00:44:32.000+01:00
* Init depthwise resource implementation for streaming interface

* Init fifo optimization file for vitis backend

* Register fifo opt flow in vitis backend

* Init changes in build_prj.tcl and modification files in vitis writer

* Fix vitis writer by adding project.tcl modifier

* Fix build_prj.tcl to synthesize with the large FIFOs

* Fix if statement in cosim tcl script

* Clean the optimizer file

* Implement the optimized depths parsing

* Implement setter for new depths

* Fix csv file name parsing

* Fix name parsing, deeply hardcoded for now

* Clean documentation and files

* Remove unused function

* Add documentation and runtime checks

* Add documentation

* Include extracting optimized depths

* Fix documentation

* Add function to override Vivado test bench

* Fix hls4ml docs

* Undo changes in sepconv stream

* Format code

* Run pre-commit

* Remove unused imports

* Run pre-commit

* Remove comment

* Fix typo and documentation

* Remove commented out code

* Init unit test

* Use proper model for unit test to profile fifos

* Fix json generator to include before and after depths

* Set up full test

* Set up exception tests

* Clean test

* Fix full test

* Clean test

* Run precommit

* Force the cosimulation to execute twice

* Skip tests

* Update documentation

* Fix conflict, use built-in os function

* Setup onnx pytest

* Rebase and fix optimizer after main branch changes

* Update documentation

* Run precommit

* Fix qonnx test by optimizing away the input quantization

* Run precommit

* Address review comments

* Fix c-test for loop

* Correct comment

* Streamlining some changes to better fit the codebase (but mostly cosmetic)

---------

Co-authored-by: stzelepi <stylianos.tzelepis@cern.ch>
Co-authored-by: Jan-Frederik Schulte <jschulte@cern.ch>
Co-authored-by: Vladimir Loncar <vloncar@users.noreply.github.com>
diff --git a/docs/advanced/fifo_depth.rst b/docs/advanced/fifo_depth.rst
@@ -5,28 +5,29 @@ FIFO Buffer Depth Optimization
 With the ``io_stream`` IO type, each layer is connected with the subsequent layer through first-in first-out (FIFO) buffers.
 The implementation of the FIFO buffers contribute to the overall resource utilization of the design, impacting in particular the BRAM or LUT utilization.
 Because the neural networks can have complex architectures generally, it is hard to know a priori the correct depth of each FIFO buffer.
-By default ``hls4ml`` choses the most conservative possible depth for each FIFO buffer, which can result in a an unnecessary overutilization of resources.
+By default ``hls4ml`` chooses the most conservative possible depth for each FIFO buffer, which can result in an unnecessary over-utilization of resources.
 
-In order to reduce the impact on the resources used for FIFO buffer implementation, an optimization has been developed in `#509 <https://github.com/fastmachinelearning/hls4ml/pull/509>`_ that correctly sizes the depth of the FIFO buffers by analyzing the RTL cosimulation.
-We implemented this FIFO buffer resizing as a :py:class:`~hls4ml.backends.vivado.passes.fifo_depth_optimization` optimizer pass.
+In order to reduce the impact on the resources used for FIFO buffer implementation, an optimization flow has been developed that correctly sizes the depth
+of the FIFO buffers by analyzing the RTL co-simulation. This feature is currently available in the ``Vitis`` and ``Vivado`` backends.
+
+In the ``Vivado`` backend, FIFO buffer resizing is implemented as a :py:class:`~hls4ml.backends.vivado.passes.fifo_depth_optimization` optimizer pass.
 Through RTL simulation with large FIFO buffers (by default set to a depth of 100,000), we estimate the maximum occupation of each FIFO.
 Once the maximum depth is determined, the optimizer pass sets the FIFO buffer depth to that value plus 1.
 
-As an example, we show below how to use the optimizer pass, inspired by this `GitHub Gist <https://gist.github.com/nicologhielmetti/3a268be32755448920e9f7d5c78a76d8>`_.
-First, we can define a simple neural network in Keras
+Below we show an example of the use of the FIFO depth optimization. First, we can define a simple neural network in Keras:
 
 .. code-block:: Python
 
     from tensorflow.keras.layers import Dense
     from tensorflow.keras.models import Sequential
 
     model = Sequential()
-    model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu')
+    model.add(Dense(64, input_shape=(16,), name='fc1', activation='relu'))
     model.add(Dense(32, name='fc2', activation='relu'))
     model.add(Dense(32, name='fc3', activation='relu'))
-    model.add(Dense(5, name='fc3', activation='softmax'))
+    model.add(Dense(5, name='fc4', activation='softmax'))
 
-Then, we can convert the model, including the flow
+Then, we can convert the model, including the flow:
 
 .. code-block:: Python
 
@@ -47,3 +48,17 @@ Then, we can convert the model, including the flow
     hls_model.build(reset=False, csim=True, synth=True, cosim=True)
 
 For more details and results, see `H. Borras et al., "Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark" (2022) <https://arxiv.org/abs/2206.11791>`_.
+
+Similarly, the FIFO buffers can be optimized while using the ``Vitis`` backend with the following changes:
+
+.. code-block:: Python
+
+    config['Flows'] = ['vitis:fifo_depth_optimization']
+    hls4ml.model.optimizer.get_optimizer('vitis:fifo_depth_optimization').configure(profiling_fifo_depth=100_000)
+
+    hls_model = hls4ml.converters.convert_from_keras_model(model,
+                                                        io_type='io_stream',
+                                                        hls_config=config,
+                                                        output_dir='hls4mlprj_fifo_depth_opt',
+                                                        part='xc7z020clg400-1',
+                                                        backend='Vitis')
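
According to the pass added in this PR, the Vitis flow also writes a ``fifo_depths.json`` file to the output directory, holding an ``initial`` and an ``optimized`` depth for each profiled FIFO. The following is a minimal sketch of how a user might inspect that file after the build. The helper name ``summarize_fifo_depths`` and the sample numbers are hypothetical and only serve the illustration.

```python
import json
import tempfile
from pathlib import Path


def summarize_fifo_depths(json_path):
    """Return (name, initial, optimized, saved) tuples, largest saving first."""
    depths = json.loads(Path(json_path).read_text())
    rows = [
        (name, entry['initial'], entry['optimized'], entry['initial'] - entry['optimized'])
        for name, entry in depths.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up depths for illustration only; a real file is produced by the optimizer pass.
sample = {
    'layer2_out': {'initial': 64, 'optimized': 2},
    'layer4_out': {'initial': 32, 'optimized': 40},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'fifo_depths.json'
    path.write_text(json.dumps(sample, indent=4))
    summary = summarize_fifo_depths(path)

for name, initial, optimized, saved in summary:
    print(f'{name}: {initial} -> {optimized} (saved {saved})')
```

A negative saving is possible: as the pass docstring notes, the optimized depth can end up larger than the initial one when that is needed to avoid deadlocks.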
diff --git a/hls4ml/backends/vitis/passes/fifo_depth_optimization.py b/hls4ml/backends/vitis/passes/fifo_depth_optimization.py
@@ -0,0 +1,195 @@
+import json
+import zipfile
+
+from hls4ml.model.optimizer.optimizer import ConfigurableOptimizerPass, ModelOptimizerPass
+
+
+def initialize_large_fifos(model, profiling_fifo_depth):
+    """Set all FIFO depths equal to a large value so that they can be profiled.
+
+    Args:
+        model (ModelGraph): The model to which FIFO depth optimization is applied.
+        profiling_fifo_depth (int): A large positive integer, must be larger than the max expected depth of the FIFOs.
+
+    Returns:
+        Dict[str, int]: A dictionary containing FIFO names as keys and their initial depths as values is returned for
+        comparison with the optimized depths.
+    """
+
+    # filter all the output variables and keep only the internal FIFOs, excluding output objects that are not FIFOs and the
+    # input and output FIFOs as they can't be profiled and are implementation dependent, i.e. AXI Stream, AXI Master or
+    # connected to another IP
+    vars_to_profile = {
+        output_variable_name: output_variable
+        for output_variable_name, output_variable in model.output_vars.items()
+        if ('StreamVariable' in str(type(output_variable)))
+        and output_variable != model.get_output_variables()[0]
+        and output_variable != model.get_input_variables()[0]
+    }
+
+    # initialize all the fifos to `profiling_fifo_depth` so that they will be automatically implemented in BRAMs and so
+    # they will be profiled. Alternatively, "config_dataflow -override_user_fifo_depth profiling_fifo_depth" can be
+    # used inside build_prj.tcl to override all FIFO depths with the specified value
+    initial_fifo_depths = {}
+    for output_variable in vars_to_profile.values():
+        if output_variable.pragma:
+            initial_fifo_depths[output_variable.name] = int(output_variable.pragma[1])
+            output_variable.pragma = (output_variable.pragma[0], profiling_fifo_depth)
+    return initial_fifo_depths
+
+
+def execute_cosim_to_profile_fifos(model):
+    """Execute a co-simulation with a test-bench that calls the top function to properly profile the max FIFO depths.
+    Note that the top function needs to execute **at least twice**, so user-provided input must have at least two samples.
+
+    Args:
+        model (ModelGraph): The model to which FIFO depth optimization is applied.
+    """
+    model.write()
+
+    model.build(
+        reset=False,
+        csim=False,
+        synth=True,
+        cosim=True,
+        validation=False,
+        export=False,
+        vsynth=False,
+        fifo_opt=True,
+    )
+
+
+def get_vitis_optimized_fifo_depths(model):
+    """Parse the files generated by the co-simulation to retrieve the optimized depths for the FIFOs.
+    Attention, only the FIFOs between the layers are profiled!
+
+    Args:
+        model (ModelGraph): The model to which FIFO depth optimization is applied.
+
+    Returns:
+        Dict[str, int]: A dictionary that contains the FIFO names as keys and the optimized depths as values.
+    """
+    # channel.zip is generated after the co-simulation and contains the chan_status*.csv files
+    # in the chan_status*.csv files the max depth achieved during co-simulation can be found at the last (4th) line
+    path_to_zip_file = (
+        model.config.get_output_dir()
+        + '/'
+        + model.config.get_project_name()
+        + '_prj'
+        + '/solution1/.autopilot/db/channel_depth_info/'
+    )
+
+    with zipfile.ZipFile(f'{path_to_zip_file}channel.zip', 'r') as zip_ref:
+        zip_ref.extractall(path_to_zip_file)
+
+    # the channel_info.csv file contains the mapping of each fifo name (e.g. layer4_out_U) to the respective
+    # chan_status*.csv file
+    names_file_path = (
+        model.config.get_output_dir()
+        + '/'
+        + model.config.get_project_name()
+        + '_prj'
+        + '/solution1/.autopilot/db/channel_info.csv'
+    )
+
+    csv_fifo_depth_files = {}
+    with open(names_file_path) as names_file:
+        for line in names_file:
+            layer_name = line.split(',')[1]
+            csv_file_name = line.split(',')[3][:-1]
+            csv_fifo_depth_files[layer_name] = csv_file_name
+
+    optimized_fifo_depths = {}
+    for layer_name, file_name in csv_fifo_depth_files.items():
+        with open(path_to_zip_file + file_name) as chan_status_file:
+            lines = chan_status_file.readlines()
+            optimized_fifo_depths[layer_name[:-2]] = int(
+                lines[-1]
+            )  # remove "_U" from the layer name string and keep the last line of the file that contains the max depth
+
+    return optimized_fifo_depths
+
+
+def generate_depths_file(model, initial_fifo_depths, optimized_fifo_depths):
+    """Generate a json file with the names of the FIFOs, the initial depths set by hls4ml and their optimized depths,
+    for post-processing. The json file is not used by the rest of the pipeline, it is only produced for the user.
+
+    Args:
+        model (ModelGraph): The model to which FIFO depth optimization is applied.
+        initial_fifo_depths (Dict[str, int]): A dictionary that contains the FIFO names as keys and the initial
+        depths as values.
+        optimized_fifo_depths (Dict[str, int]): A dictionary that contains the FIFO names as keys and the optimized
+        depths as values.
+    """
+    depths = {}
+    for fifo_name in initial_fifo_depths.keys():
+        depths[fifo_name] = {}
+        depths[fifo_name]['initial'] = initial_fifo_depths[fifo_name]
+        depths[fifo_name]['optimized'] = optimized_fifo_depths[fifo_name]
+
+    with open(model.config.get_output_dir() + '/fifo_depths.json', 'w') as f:
+        json.dump(depths, f, indent=4)
+
+
+def set_optimized_fifo_depths(model, optimized_fifo_depths):
+    """Set the new optimized FIFO depths.
+
+    Args:
+        model (ModelGraph): The model to which FIFO depth optimization is applied.
+        optimized_fifo_depths (Dict[str, int]): A dictionary that contains the FIFO names as keys and the optimized
+        depths as values.
+    """
+
+    # iterate through the layer output FIFOs
+    for output_variable in model.output_vars.values():
+        if 'StreamVariable' in str(type(output_variable)):
+            if output_variable.pragma:
+
+                if output_variable.name not in optimized_fifo_depths.keys():
+                    continue
+
+                filtered_depth = optimized_fifo_depths[output_variable.name]
+                output_variable.pragma = (output_variable.pragma[0], filtered_depth)
+
+
+class FifoDepthOptimization(ConfigurableOptimizerPass, ModelOptimizerPass):
+    def __init__(self):
+        # `profiling_fifo_depth` must be a positive integer, `transform` raises a ValueError otherwise
+        # consider replacing 100_000 either with a very large value that exceeds any total BRAM storage space
+        # or with profiling via Vitis 2023.2 C-simulation
+        self.profiling_fifo_depth = 100_000
+
+    def transform(self, model):
+        """Perform FIFO depth optimization between the FIFOs of all layers to reduce resource utilization as the
+        initial FIFOs set by hls4ml might be larger than required. At the end of the optimization the FIFOs will
+        have the largest depths achieved during co-simulation without causing any deadlocks between the layers
+        (producer-consumer), thus no additional delays between the layers. In some cases, this optimization
+        might lead to bigger FIFOs than initially set by the hls4ml tool in order to prevent deadlocks.
+
+        Args:
+            model (ModelGraph): The model to which FIFO depth optimization is applied.
+
+        Raises:
+            ValueError: If the FIFO depth for profiling provided by the user is not a positive integer.
+            RuntimeError: If the IO type is not set to "io_stream".
+
+        Returns:
+            bool: The execution state of the Optimizer Pass
+        """
+
+        if not isinstance(self.profiling_fifo_depth, int) or self.profiling_fifo_depth <= 0:
+            raise ValueError('The FIFO depth for profiling (profiling_fifo_depth variable) must be a positive integer.')
+
+        # check axi-stream or io-stream
+        if not (model.config.get_config_value('IOType') == 'io_stream'):
+            raise RuntimeError('To use this optimization you have to set `IOType` field to `io_stream` in the HLS config.')
+
+        initial_fifo_depths = initialize_large_fifos(model, self.profiling_fifo_depth)
+        execute_cosim_to_profile_fifos(model)
+        optimized_fifo_depths = get_vitis_optimized_fifo_depths(model)
+        generate_depths_file(model, initial_fifo_depths, optimized_fifo_depths)
+        set_optimized_fifo_depths(model, optimized_fifo_depths)
+
+        print('FIFO optimization completed')
+
+        return False
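
The parsing step in ``get_vitis_optimized_fifo_depths`` can be tried out in isolation on synthetic files. The sketch below assumes only what the code above relies on: in ``channel_info.csv`` the FIFO name sits in column 1 and the status file name in column 3, and the last line of each ``chan_status*.csv`` file holds the maximum depth. All other file contents, and the helper name ``parse_fifo_depths``, are placeholders and not the real Vitis HLS output format.

```python
import tempfile
from pathlib import Path


def parse_fifo_depths(channel_info_path, status_dir):
    """Map FIFO names to the max depth stored on the last line of their status file."""
    depths = {}
    with open(channel_info_path) as info_file:
        for line in info_file:
            fields = line.rstrip('\n').split(',')
            fifo_name, status_file = fields[1], fields[3]
            last_line = (Path(status_dir) / status_file).read_text().splitlines()[-1]
            # Drop the trailing "_U" suffix, mirroring layer_name[:-2] in the pass
            depths[fifo_name[:-2]] = int(last_line)
    return depths


# Synthetic stand-ins for the co-simulation output (placeholder contents).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / 'channel_info.csv').write_text(
        '0,layer2_out_U,x,chan_status0.csv\n1,layer4_out_U,x,chan_status1.csv\n'
    )
    (tmp / 'chan_status0.csv').write_text('a\nb\nc\n7\n')
    (tmp / 'chan_status1.csv').write_text('a\nb\nc\n3\n')
    depths = parse_fifo_depths(tmp / 'channel_info.csv', tmp)

print(depths)
```

Unlike the pass, this sketch strips the newline with ``rstrip`` instead of slicing off the last character, so it also tolerates a final row without a trailing newline.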
diff --git a/hls4ml/backends/vitis/vitis_backend.py b/hls4ml/backends/vitis/vitis_backend.py
@@ -34,6 +34,13 @@ def _register_flows(self):
 
         self._default_flow = register_flow('ip', None, requires=ip_flow_requirements, backend=self.name)
 
+        # Register the fifo depth optimization flow which is different from the one for vivado
+        fifo_depth_opt_passes = [
+            'vitis:fifo_depth_optimization'
+        ] + writer_passes  # After optimization, a new project will be written
+
+        register_flow('fifo_depth_optimization', fifo_depth_opt_passes, requires=['vitis:ip'], backend=self.name)
+
     def create_initial_config(
         self,
         part='xcvu13p-flga2577-2-e',
@@ -76,7 +83,18 @@ def create_initial_config(
 
         return config
 
-    def build(self, model, reset=False, csim=True, synth=True, cosim=False, validation=False, export=False, vsynth=False):
+    def build(
+        self,
+        model,
+        reset=False,
+        csim=True,
+        synth=True,
+        cosim=False,
+        validation=False,
+        export=False,
+        vsynth=False,
+        fifo_opt=False,
+    ):
         if 'linux' in sys.platform:
             found = os.system('command -v vitis_hls > /dev/null')
             if found != 0:
@@ -87,8 +105,17 @@ def build(self, model, reset=False, csim=True, synth=True, cosim=False, validati
         os.system(
             (
                 'vitis_hls -f build_prj.tcl "reset={reset} csim={csim} synth={synth} cosim={cosim} '
-                'validation={validation} export={export} vsynth={vsynth}"'
-            ).format(reset=reset, csim=csim, synth=synth, cosim=cosim, validation=validation, export=export, vsynth=vsynth)
+                'validation={validation} export={export} vsynth={vsynth} fifo_opt={fifo_opt}"'
+            ).format(
+                reset=reset,
+                csim=csim,
+                synth=synth,
+                cosim=cosim,
+                validation=validation,
+                export=export,
+                vsynth=vsynth,
+                fifo_opt=fifo_opt,
+            )
         )
         os.chdir(curr_dir)
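
To see what the new ``fifo_opt`` argument changes on the command line, the string formatting above can be reproduced without invoking ``vitis_hls``. This is a sketch only: ``format_build_command`` is a hypothetical helper that copies the format string from ``build`` and is not part of the hls4ml API.

```python
def format_build_command(
    reset=False,
    csim=True,
    synth=True,
    cosim=False,
    validation=False,
    export=False,
    vsynth=False,
    fifo_opt=False,
):
    """Build the vitis_hls command string the same way the backend does, without running it."""
    return (
        'vitis_hls -f build_prj.tcl "reset={reset} csim={csim} synth={synth} cosim={cosim} '
        'validation={validation} export={export} vsynth={vsynth} fifo_opt={fifo_opt}"'
    ).format(
        reset=reset,
        csim=csim,
        synth=synth,
        cosim=cosim,
        validation=validation,
        export=export,
        vsynth=vsynth,
        fifo_opt=fifo_opt,
    )


# Same settings that execute_cosim_to_profile_fifos passes to model.build()
command = format_build_command(csim=False, synth=True, cosim=True, fifo_opt=True)
print(command)
```

The Python booleans are rendered as the strings ``True`` and ``False``, which is the form in which ``build_prj.tcl`` receives the options.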
 
diff --git a/hls4ml/templates/vivado/build_prj.tcl b/hls4ml/templates/vivado/build_prj.tcl
@@ -179,6 +179,7 @@ if {$opt(csim)} {
 
 if {$opt(synth)} {
     puts "***** C/RTL SYNTHESIS *****"
+
     set time_start [clock clicks -milliseconds]
     csynth_design
     set time_end [clock clicks -milliseconds]
@@ -195,7 +196,10 @@ if {$opt(cosim)} {
 
     if {$opt(fifo_opt)} {
         puts "\[hls4ml\] - FIFO optimization started"
-        add_vcd_instructions_tcl
+
+        if {[string equal $backend "vivado"] || [string equal $backend "vivadoaccelerator"]} {
+            add_vcd_instructions_tcl
+        }
     }
 
     remove_recursive_log_wave
diff --git a/hls4ml/templates/vivado/myproject_test.cpp b/hls4ml/templates/vivado/myproject_test.cpp
@@ -77,14 +77,16 @@ int main(int argc, char **argv) {
         fpr.close();
     } else {
         std::cout << "INFO: Unable to open input/predictions file, using default input." << std::endl;
+        const unsigned NUM_TEST_SAMPLES = 5;
+        for (unsigned i = 0; i < NUM_TEST_SAMPLES; i++) {
+            // hls-fpga-machine-learning insert zero
 
-        // hls-fpga-machine-learning insert zero
-
-        // hls-fpga-machine-learning insert top-level-function
+            // hls-fpga-machine-learning insert top-level-function
 
-        // hls-fpga-machine-learning insert output
+            // hls-fpga-machine-learning insert output
 
-        // hls-fpga-machine-learning insert tb-output
+            // hls-fpga-machine-learning insert tb-output
+        }
     }
 
     fout.close();
diff --git a/hls4ml/writer/vitis_writer.py b/hls4ml/writer/vitis_writer.py
@@ -1,5 +1,6 @@
 import glob
 import os
+from pathlib import Path
 from shutil import copy
 
 from hls4ml.writer.vivado_writer import VivadoWriter
@@ -24,10 +25,34 @@ def write_nnet_utils_overrides(self, model):
         for h in headers:
             copy(srcpath + h, dstpath + h)
 
+    def write_board_script_override(self, model):
+        '''
+        Override the backend and clock uncertainty settings in the generated project.tcl for Vitis
+        '''
+
+        ###################
+        # project.tcl
+        ###################
+
+        prj_tcl_file = Path(f'{model.config.get_output_dir()}/project.tcl')
+        with open(prj_tcl_file) as f:
+            prj_tcl_contents = f.readlines()
+            for line_num, line in enumerate(prj_tcl_contents):
+                if 'set backend' in line:
+                    prj_tcl_contents[line_num] = 'set backend "vitis"\n'
+                if 'set clock_uncertainty' in line:
+                    prj_tcl_contents[line_num] = 'set clock_uncertainty {}\n'.format(
+                        model.config.get_config_value('ClockUncertainty', '27%')
+                    )
+
+        with open(prj_tcl_file, 'w') as f:
+            f.writelines(prj_tcl_contents)
+
     def write_hls(self, model):
         """
         Write the HLS project. Calls the steps from VivadoWriter, adapted for Vitis
         """
         super().write_hls(model)
         self.write_nnet_utils_overrides(model)
+        self.write_board_script_override(model)
         self.write_tar(model)
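
The effect of ``write_board_script_override`` can be illustrated on a temporary file. The sketch below mirrors the line replacement logic of the method. The function name ``override_project_tcl`` and the sample ``project.tcl`` contents are assumptions made for this example and may not match the file hls4ml actually generates.

```python
import tempfile
from pathlib import Path


def override_project_tcl(prj_tcl_file, clock_uncertainty='27%'):
    """Rewrite the backend and clock uncertainty lines of a project.tcl file in place."""
    path = Path(prj_tcl_file)
    lines = path.read_text().splitlines(keepends=True)
    for line_num, line in enumerate(lines):
        if 'set backend' in line:
            lines[line_num] = 'set backend "vitis"\n'
        if 'set clock_uncertainty' in line:
            lines[line_num] = f'set clock_uncertainty {clock_uncertainty}\n'
    path.write_text(''.join(lines))


# Hypothetical project.tcl contents, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    prj_tcl = Path(tmp) / 'project.tcl'
    prj_tcl.write_text('set project_name "myproject"\nset backend "vivado"\nset clock_uncertainty 12.5%\n')
    override_project_tcl(prj_tcl)
    result = prj_tcl.read_text()

print(result)
```

Lines that match neither pattern are written back unchanged, so the rest of the script is preserved.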
diff --git a/hls4ml/writer/vivado_writer.py b/hls4ml/writer/vivado_writer.py
diff --git a/test/pytest/test_fifo_depth.py b/test/pytest/test_fifo_depth.py