Skip to content

Conversation

@colluca
Copy link
Contributor

@colluca colluca commented Sep 22, 2025

This PR adds multicast support to the AXI XBAR, such that write requests from a master can be multicast to multiple slaves.

To support multicast with low area overhead, we adopt the mask-based encoding proposed in https://arxiv.org/pdf/2502.19215.

In detail, this PR implements the following changes:

  • Adds the any_outstanding_trx_o port to the axi_demux_id_counters module.
  • The axi_mcast_demux_simple implements a superset of the functionality of the axi_demux_simple, including all multicast capabilities, which require a few more signals on the interface, namely mst_is_mcast_o and mst_aw_commit_o. So axi_demux_simple is rewritten as a wrapper around axi_mcast_demux_simple, tying off unused ports and parameters.
  • The same is done for the axi_mcast_mux and axi_mux modules, the axi_mcast_xbar_unmuxed and axi_xbar_unmuxed modules, and the axi_mcast_xbar and axi_xbar modules.
  • The axi_demux_id_counters module is moved to its own file, which is anyways good practice, and it would otherwise not be clear whether it should now belong to axi_demux_simple.sv or axi_mcast_demux_simple.sv.
  • Placing the spill registers downstream of the address decoders requires registering the ax_select signals from the address decoders. To eliminate this overhead (in the case of multicast it would even increase due to additional output signals from the decoders), we move the address decoders downstream of the spill registers in theaxi_mcast_demux_mapped module. The _mapped prefix is to identify a demux whose slaves are memory-mapped, i.e. that uses address decoders to generate the ax_select signals. Since several modules use the demux with a non-address-based ax_select signal generation, the axi_demux module preserves the same interface and functionality as before, and does not wrap the axi_mcast_demux_mapped module, unlike the other unicast and multicast IP pairs.

Auxiliary changes:

  • The axi_demux_id_counters is moved to its own file, as good practice.
  • Update IP levels in Bender.yml after IP refactoring (including fixes to some dependencies that were not previously tracked, e.g. axi_demux_simple->axi_burst_splitter_gran, and others)

Additional notes:

@colluca colluca force-pushed the multicast branch 5 times, most recently from d7b7105 to 8e04779 Compare September 26, 2025 17:48
@colluca colluca marked this pull request as ready for review October 6, 2025 06:58
Copy link
Member

@micprog micprog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is quite cumbersome to review, as most of the 5k line changes are code movement changes, and the code changes are not directly visible in the interface here. For some files, I added the corresponding git commands I used to check the changes. It might make sense to move some of these changes not related to multicast to dedicated PRs, simplifying the review and traceability of changes.

A further complication in reviewing this PR is the non-standard feature, as multicast is not supported in any available AXI specification. Please add documentation in the corresponding README files for what multicast is supported, how it works, and how it is intended to be used.

Also, given that this is not part of the spec, it might make sense to put this in a separate project, keeping this axi repository to what is available in the spec, but we can have that discussion offline.

Primarily, I was under the impression that this PR does not change the existing functionality when multicast is disabled or the existing modules are used, but this does not seem to be the case!

Along with comments placed to help understanding, there are a few other items that need to be addressed. Please see below.


dependencies:
common_cells: { git: "https://github.com/pulp-platform/common_cells.git", version: 1.37.0 }
common_cells: { git: "https://github.com/pulp-platform/common_cells.git", rev: "multicast-xbar" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this to use a proper version, not a branch name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is waiting on pulp-platform/common_cells#235 (>1y...).

It's been marked as v2, so it can't be accelerated much, but this should be hopefully sorted out soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it a priority to only use mainline releases for integration in main. This significantly improves compatibility in projects when there are multiple references to common_cells (and also axi). Therefore, we should wait for this to be integrated before merging this PR.

Comment on lines +42 to +43
// outstanding transactions
output logic any_outstanding_trx_o
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand correctly, the only adjustment you made to this module is adding this port, otherwise it is copied from axi_demux_simple.sv.

git diff master:src/axi_demux_simple.sv multicast:src/axi_demux_id_counters.sv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct yes, I've added it to the PR description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Maybe it makes sense to split this off as an individual PR for traceability...

);

// Account for additional error slave
localparam int unsigned NoMstPortsExt = NoMstPorts + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an error slave needed when a default port is enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it isn't, since when the default port is enabled the address decoder does not produce a decoding error.

I just kept the previous behaviour here, and I think it also makes sense, since the default port enable is a dynamic signal and not a compile-time parameter.

Comment on lines +113 to +125
spill_register #(
.T ( aw_chan_t ),
.Bypass ( ~SpillAw )
) i_aw_spill_reg (
.clk_i,
.rst_ni,
.valid_i ( slv_req_i.aw_valid ),
.ready_o ( slv_resp_o.aw_ready ),
.data_i ( slv_req_i.aw ),
.valid_o ( slv_req_cut.aw_valid ),
.ready_i ( slv_resp_cut.aw_ready ),
.data_o ( slv_req_cut.aw )
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module has a different timing behavior than the existing implementation, as outlined in the PR description. While this simplifies some things and reduces the number of registers, the address decoding can add quite a bit of timing overhead and can now no longer be pipelined out of the demux path when using any xbar configuration. Do you have an analysis of this timing impact before we integrate this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a relevant observation.

I did not do a complete AT plot analysis, and I only evaluated this as an incremental change over the multicast design (not the original unicast design), but I did do some synthesis trials at 1 GHz in GF12 quite a long time ago now.

When constrained at 1 GHz, there was no significant frequency degradation (critical path went from 0.996 to 0.993 ns, most likely tool noise). The levels of logic reported by Fusion Compiler, which increased from 30 to 32, indicate that there might be a small timing degradation if pushed to higher frequencies.

But, as mentioned, we also save spill register area, so I expect that the AT plots of the unicast design with address decoders upstream of the spill registers vs. downstream would probably look pretty much the same.

I forwarded a few old WRs where I reported these results to you for further details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about increasing the levels of logic. In a separate paper I did, the decoding logic accounted for quite a bit of timing overhead. Creating a configuration like the one you have now is fairly straightforward with the previous IPs (i.e., simply adding an axi_cut module before each xbar port), but the previous implementation is now almost impossible to access. So while your configuration might not change, others might be impacted, which worries me.

dec_aw_addr = '0;
dec_aw_mask = '0;

if (slv_req_cut.aw.user.collective_mask == '0) begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will work for all tools if the custom user field is not present in the passed struct...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the syntax in the following example is a bit different, but still uses some fields which are conditionally present in a struct type:
https://github.com/pulp-platform/snitch_cluster/blob/6248ae825d821dca81c7b9c0f3cf3fd59c0ecdaa/hw/snitch_cluster/src/snitch_cluster.sv#L1409-L1430
https://github.com/pulp-platform/snitch_cluster/blob/6248ae825d821dca81c7b9c0f3cf3fd59c0ecdaa/hw/snitch_cluster/src/snitch_cluster_pkg.sv.tpl#L114-L125

In Snitch, this code is tested and works correctly with QuestaSim, VCS, Verilator, Fusion Compiler and Yosys.

The important thing AFAIK is that the code is guarded by a generate construct, which is the case also here: NoMulticastPorts > 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see the code is guarded by a generate (line 257), which I missed previously. The second generate (line 269) is confusing in this regard.

else $fatal(1, "slv_ar_select_i illegal while ar_valid.");
w_underflow: assert property( @(posedge clk_i)
((w_open == '0) && (w_cnt_up ^ w_cnt_down) |-> !w_cnt_down)) else
$fatal(1, "W counter underflowed!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the following assertions that you removed fully no longer applicable, or is there a version we could keep?

-    aw_select: assume property( @(posedge clk_i) (slv_req_i.aw_valid |->
-                                                 (slv_aw_select_i < NoMstPorts))) else
-      $fatal(1, "slv_aw_select_i is %d: AW has selected a slave that is not defined.\
-                 NoMstPorts: %d", slv_aw_select_i, NoMstPorts);
-    internal_aw_select: assert property( @(posedge clk_i)
-        (aw_valid |-> slv_aw_select_i < NoMstPorts))
-      else $fatal(1, "slv_aw_select_i illegal while aw_valid.");

Copy link
Contributor Author

@colluca colluca Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are no longer applicable.

When aw_select is an index then it makes sense to check that it's within bounds. For example, if NoMstPorts is 3 and aw_select is a 2-bit signal, it could encode 3 which is out of bounds.

But when aw_select is a bitmask it no longer makes sense, since the number of bits is equal to the number of NoMstPorts so there is no risk of encoding illegal values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe these assertions should stick around in the IP still using index bits, namely axi_demux_simple?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants