-
Notifications
You must be signed in to change notification settings - Fork 337
Add multicast support to XBAR #398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
d7b7105 to
8e04779
Compare
micprog
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is quite cumbersome to review, as most of the 5k line changes are code movement changes, and the code changes are not directly visible in the interface here. For some files, I added the corresponding git commands I used to check the changes. It might make sense to move some of these changes not related to multicast to dedicated PRs, simplifying the review and traceability of changes.
A further complication in reviewing this PR is the non-standard feature, as multicast is not supported in any available AXI specification. Please add documentation in the corresponding README files for what multicast is supported, how it works, and how it is intended to be used.
Also, given that this is not part of the spec, it might make sense to put this in a separate project, keeping this axi repository to what is available in the spec, but we can have that discussion offline.
Primarily, I was under the impression that this PR does not change the existing functionality when multicast is disabled or the existing modules are used, but this does not seem to be the case!
Along with comments placed to help understanding, there are a few other items that need to be addressed. Please see below.
|
|
||
| dependencies: | ||
| common_cells: { git: "https://github.com/pulp-platform/common_cells.git", version: 1.37.0 } | ||
| common_cells: { git: "https://github.com/pulp-platform/common_cells.git", rev: "multicast-xbar" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this to use a proper version, not a branch name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is waiting on pulp-platform/common_cells#235 (>1y...).
It's been marked as v2, so it can't be accelerated much, but this should be hopefully sorted out soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it a priority to only use mainline releases for integration in main. This significantly improves compatibility in projects when there are multiple references to common_cells (and also axi). Therefore, we should wait for this to be integrated before merging this PR.
| // outstanding transactions | ||
| output logic any_outstanding_trx_o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand correctly, the only adjustment you made to this module is adding this port, otherwise it is copied from axi_demux_simple.sv.
git diff master:src/axi_demux_simple.sv multicast:src/axi_demux_id_counters.sv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct yes, I've added it to the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Maybe it makes sense to split this off as an individual PR for traceability...
| ); | ||
|
|
||
| // Account for additional error slave | ||
| localparam int unsigned NoMstPortsExt = NoMstPorts + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is an error slave needed when a default port is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say it isn't, since when the default port is enabled the address decoder does not produce a decoding error.
I just kept the previous behaviour here, and I think it also makes sense, since the default port enable is a dynamic signal and not a compile-time parameter.
| spill_register #( | ||
| .T ( aw_chan_t ), | ||
| .Bypass ( ~SpillAw ) | ||
| ) i_aw_spill_reg ( | ||
| .clk_i, | ||
| .rst_ni, | ||
| .valid_i ( slv_req_i.aw_valid ), | ||
| .ready_o ( slv_resp_o.aw_ready ), | ||
| .data_i ( slv_req_i.aw ), | ||
| .valid_o ( slv_req_cut.aw_valid ), | ||
| .ready_i ( slv_resp_cut.aw_ready ), | ||
| .data_o ( slv_req_cut.aw ) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module has a different timing behavior than the existing implementation, as outlined in the PR description. While this simplifies some things and reduces the number of registers, the address decoding can add quite a bit of timing overhead and can now no longer be pipelined out of the demux path when using any xbar configuration. Do you have an analysis of this timing impact before we integrate this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a relevant observation.
I did not do a complete AT plot analysis, and I only evaluated this as an incremental change over the multicast design (not the original unicast design), but I did do some synthesis trials at 1 GHz in GF12 quite a long time ago now.
When constrained at 1 GHz, there was no significant frequency degradation (critical path went from 0.996 to 0.993 ns, most likely tool noise). The levels of logic reported by Fusion Compiler, which increased from 30 to 32, indicate that there might be a small timing degradation if pushed to higher frequencies.
But, as mentioned, we also save spill register area, so I expect that the AT plots of the unicast design with address decoders upstream of the spill registers vs. downstream would probably look pretty much the same.
I forwarded a few old WRs where I reported these results to you for further details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about increasing the levels of logic. In a separate paper I did, the decoding logic accounted for quite a bit of timing overhead. Creating a configuration like the one you have now is fairly straightforward with the previous IPs (i.e., simply adding an axi_cut module before each xbar port), but the previous implementation is now almost impossible to access. So while your configuration might not change, others might be impacted, which worries me.
| dec_aw_addr = '0; | ||
| dec_aw_mask = '0; | ||
|
|
||
| if (slv_req_cut.aw.user.collective_mask == '0) begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this will work for all tools if the custom user field is not present in the passed struct...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the syntax in the following example is a bit different, but still uses some fields which are conditionally present in a struct type:
https://github.com/pulp-platform/snitch_cluster/blob/6248ae825d821dca81c7b9c0f3cf3fd59c0ecdaa/hw/snitch_cluster/src/snitch_cluster.sv#L1409-L1430
https://github.com/pulp-platform/snitch_cluster/blob/6248ae825d821dca81c7b9c0f3cf3fd59c0ecdaa/hw/snitch_cluster/src/snitch_cluster_pkg.sv.tpl#L114-L125
In Snitch, this code is tested and works correctly with QuestaSim, VCS, Verilator, Fusion Compiler and Yosys.
The important thing AFAIK is that the code is guarded by a generate construct, which is the case also here: NoMulticastPorts > 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see the code is guarded by a generate (line 257), which I missed previously. The second generate (line 269) is confusing in this regard.
| else $fatal(1, "slv_ar_select_i illegal while ar_valid."); | ||
| w_underflow: assert property( @(posedge clk_i) | ||
| ((w_open == '0) && (w_cnt_up ^ w_cnt_down) |-> !w_cnt_down)) else | ||
| $fatal(1, "W counter underflowed!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the following assertions that you removed fully no longer applicable, or is there a version we could keep?
- aw_select: assume property( @(posedge clk_i) (slv_req_i.aw_valid |->
- (slv_aw_select_i < NoMstPorts))) else
- $fatal(1, "slv_aw_select_i is %d: AW has selected a slave that is not defined.\
- NoMstPorts: %d", slv_aw_select_i, NoMstPorts);
- internal_aw_select: assert property( @(posedge clk_i)
- (aw_valid |-> slv_aw_select_i < NoMstPorts))
- else $fatal(1, "slv_aw_select_i illegal while aw_valid.");There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are no longer applicable.
When aw_select is an index then it makes sense to check that it's within bounds. For example, if NoMstPorts is 3 and aw_select is a 2-bit signal, it could encode 3 which is out of bounds.
But when aw_select is a bitmask it no longer makes sense, since the number of bits is equal to the number of NoMstPorts so there is no risk of encoding illegal values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe these assertions should stick around in the IP still using index bits, namely axi_demux_simple?
To be improved: - 50% multicast AW throughput - Potential for combinational loops in mux
Multiple outstanding multicast transactions are allowed only if the slaves selected by these transactions are the same.
--------- Co-authored-by: Luca Colagrande <luca.colagrande3@gmail.com> Co-authored-by: Raphael <raroth@student.ethz.ch>
This PR adds multicast support to the AXI XBAR, such that write requests from a master can be multicast to multiple slaves.
To support multicast with low area overhead, we adopt the mask-based encoding proposed in https://arxiv.org/pdf/2502.19215.
In detail, this PR implements the following changes:
any_outstanding_trx_oport to theaxi_demux_id_countersmodule.axi_mcast_demux_simpleimplements a superset of the functionality of theaxi_demux_simple, including all multicast capabilities, which require a few more signals on the interface, namelymst_is_mcast_oandmst_aw_commit_o. Soaxi_demux_simpleis rewritten as a wrapper aroundaxi_mcast_demux_simple, tying off unused ports and parameters.axi_mcast_muxandaxi_muxmodules, theaxi_mcast_xbar_unmuxedandaxi_xbar_unmuxedmodules, and theaxi_mcast_xbarandaxi_xbarmodules.axi_demux_id_countersmodule is moved to its own file, which is anyways good practice, and it would otherwise not be clear whether it should now belong toaxi_demux_simple.svoraxi_mcast_demux_simple.sv.ax_selectsignals from the address decoders. To eliminate this overhead (in the case of multicast it would even increase due to additional output signals from the decoders), we move the address decoders downstream of the spill registers in theaxi_mcast_demux_mappedmodule. The_mappedprefix is to identify a demux whose slaves are memory-mapped, i.e. that uses address decoders to generate theax_selectsignals. Since several modules use the demux with a non-address-basedax_selectsignal generation, theaxi_demuxmodule preserves the same interface and functionality as before, and does not wrap theaxi_mcast_demux_mappedmodule, unlike the other unicast and multicast IP pairs.Auxiliary changes:
axi_demux_id_countersis moved to its own file, as good practice.Bender.ymlafter IP refactoring (including fixes to some dependencies that were not previously tracked, e.g.axi_demux_simple->axi_burst_splitter_gran, and others)Additional notes:
Details on how the default master port rule is used are in the
multiaddr_decodemodule (see alsomultiaddr_decode: Add default port logic common_cells#235).