Introduce Affine Controller Design #266
How do the inner loops' start/end get updated?
Distributed counter units are updated through the cmds (including
Shadow registers are used in the loop delivery mode in DCU to store and deliver the outer loop indexes.
VectorCGRA/fu/single/LoopCounterRTL.py Line 134 in 61d7e56
They are updated by the affine controller through the
VectorCGRA/fu/single/LoopCounterRTL.py Line 150 in 61d7e56
Only DCUs in loop count mode will send a
VectorCGRA/fu/single/LoopCounterRTL.py Line 114 in 61d7e56
After receiving this complete signal, the affine controller will trigger outer loop counter increments and send
Use wider TileInType = mk_bits(clog2(num_tile_inports + num_fu_inports + 1)) to match the updated mk_ctrl routing_xbar_outport field width.
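The width computation in this commit message can be checked with a small plain-Python sketch. It does not use PyMTL3; the local `clog2` helper and the `tile_in_type_width` function are illustrative stand-ins, the port counts in the usage line are hypothetical, and the reading of the `+ 1` as one reserved extra encoding is an assumption.

```python
def clog2(n: int) -> int:
    """Ceiling of log2(n) for n >= 1, i.e. bits needed to index n values."""
    if n < 1:
        raise ValueError("clog2 expects a positive integer")
    return (n - 1).bit_length()


def tile_in_type_width(num_tile_inports: int, num_fu_inports: int) -> int:
    # Mirrors clog2(num_tile_inports + num_fu_inports + 1) from the commit.
    # Assumption: the "+ 1" reserves one extra encoding beyond the ports.
    return clog2(num_tile_inports + num_fu_inports + 1)


# Hypothetical tile with 4 tile inports and 4 FU inports: 9 encodings.
width = tile_in_type_width(4, 4)
```

With these hypothetical counts, nine encodings are needed, so the selector must be wide enough to hold the value 8.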
- Renamed kAttrReadRegFrom to kAttrReadRegTowards in data_struct_attr.py
- Changed field type from b1 to RegFromType (2-bit) in messages.py
- Updated RegisterBankRTL.py to check read_reg_towards value:
  - 0: towards nothing (no read)
  - 1: towards FU (reg data consumed by operation)
  - 2: towards routing_xbar (reg data routed out to outport)
  - 3: towards both FU and routing_xbar
- Updated RegisterClusterRTL.py to drive send_data_to_routing_crossbar.val based on read_reg_towards being 2 or 3
- Updated CtrlMemDynamicRTL.py field references
- Updated all test files to use read_reg_towards with b2 type
- Updated SV test files with new type names and field names
This addresses the reviewer's request to reuse RegFromType for read_reg_towards to indicate whether FU expects inputs from inports or register.
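The four read_reg_towards values listed in this commit can be summarized as a behavioral decode. This is a plain-Python sketch rather than the RTL: the `ReadRegTowards` enum and the `decode_read_reg_towards` function are hypothetical names introduced for illustration, and only the 0 to 3 value meanings come from the commit message.

```python
from enum import IntEnum
from typing import Tuple


class ReadRegTowards(IntEnum):
    # Value meanings follow the commit message; the names are illustrative.
    NOTHING = 0       # no read
    FU = 1            # reg data consumed by the operation
    ROUTING_XBAR = 2  # reg data routed out to an outport
    BOTH = 3          # FU and routing_xbar


def decode_read_reg_towards(value: int) -> Tuple[bool, bool]:
    """Return (to_fu, to_routing_xbar) for a 2-bit read_reg_towards value."""
    mode = ReadRegTowards(value)  # raises ValueError outside 0..3
    to_fu = mode in (ReadRegTowards.FU, ReadRegTowards.BOTH)
    # Matches the "being 2 or 3" condition described for RegisterClusterRTL.py.
    to_xbar = mode in (ReadRegTowards.ROUTING_XBAR, ReadRegTowards.BOTH)
    return to_fu, to_xbar
```

Because 1 and 2 occupy separate bits, value 3 is simply both destinations enabled at once, which is why a 2-bit field is sufficient.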
…end_data_to_fu to send_data, simplify conditions
Add Affine Controller (AC) for Outer Loop Management
Summary
This PR introduces the Affine Controller (AC), a programmable hardware module that manages outer loop counters in the CGRA. While the existing LoopCounterRTL (DCU) handles innermost loop counting at the tile level, the AC coordinates multi-level loop nesting and cross-CGRA loop synchronization.
Architecture
The AC contains an array of Configurable Counter Units (CCUs), each representing one level of a loop nest. CCUs form a DAG topology where:
Each CCU tracks a loop variable (lower_bound, upper_bound, step, current_value) and manages a set of targets: the tile-array DCUs it must notify when advancing iterations.
State Machine:
IDLE → RUNNING → DISPATCHING → RUNNING / COMPLETE
- RUNNING: Waiting for child completion events
- DISPATCHING: Sending commands to targets (1 cycle per target)
- COMPLETE: Loop finished, parent notified internally
Key Design Decisions
- CMD_RESET_LEAF_COUNTER (leaf-mode DCU) or CMD_UPDATE_COUNTER_SHADOW_VALUE (delivery-mode DCU). This avoids redundant messages and minimizes dispatch latency.
- received_complete_count in the same cycle; no external signaling needed.
- RUNNING, all child CCUs are automatically reset to lower_bound.
- current_value >= upper_bound, the CCU transitions directly to COMPLETE without dispatching, preventing stale completion events from the previous iteration.
- RUNNING state.
Cross-CGRA Support
CCU targets can be marked as remote (is_remote=1). Dispatch commands for remote targets are sent via send_to_remote (routed through the Controller's inter-CGRA NoC). Remote completion events arrive as CMD_AC_CHILD_COMPLETE on recv_from_remote.
Files Changed
NUM_CMDS 28→40
Test Cases
- child_count=2). Verifies barrier synchronization.
- send_to_remote/recv_from_remote paths.
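The CCU state machine and dispatch rules described under Architecture can be approximated by a small behavioral model. This is a sketch in plain Python under stated assumptions, not the PyMTL3 implementation from this PR: the `Target` and `CCU` classes, the `start`/`child_complete`/`tick` methods, the `dcu_id` field, the `send_to_tile` port name, and the exact cycle in which the counter advances are all hypothetical. Only the state names, the counter fields, the two command names, `child_count`, `received_complete_count`, `is_remote`, and `send_to_remote` are taken from the description.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    DISPATCHING = auto()
    COMPLETE = auto()


@dataclass
class Target:
    dcu_id: int            # hypothetical identifier of the target DCU
    is_leaf: bool          # True: leaf-mode DCU, False: delivery-mode DCU
    is_remote: bool = False


Msg = Tuple[str, str, int, int]  # (port, cmd, dcu_id, value)


@dataclass
class CCU:
    lower_bound: int
    upper_bound: int
    step: int
    child_count: int
    targets: List[Target]
    state: State = State.IDLE
    current_value: int = 0
    received_complete_count: int = 0
    dispatch_idx: int = 0

    def start(self) -> None:
        self.current_value = self.lower_bound
        self.received_complete_count = 0
        self.state = State.RUNNING

    def child_complete(self) -> None:
        # A child (DCU or child CCU) reports that its loop finished.
        if self.state is State.RUNNING:
            self.received_complete_count += 1

    def tick(self) -> Optional[Msg]:
        """Advance one cycle; return the command sent this cycle, if any."""
        if (self.state is State.RUNNING
                and self.received_complete_count >= self.child_count):
            # Barrier reached: all children completed this iteration.
            self.received_complete_count = 0
            self.current_value += self.step
            if self.current_value >= self.upper_bound:
                self.state = State.COMPLETE  # no dispatch on the last step
            elif self.targets:
                self.dispatch_idx = 0
                self.state = State.DISPATCHING
            return None
        if self.state is State.DISPATCHING:
            t = self.targets[self.dispatch_idx]  # one target per cycle
            cmd = ("CMD_RESET_LEAF_COUNTER" if t.is_leaf
                   else "CMD_UPDATE_COUNTER_SHADOW_VALUE")
            port = "send_to_remote" if t.is_remote else "send_to_tile"
            self.dispatch_idx += 1
            if self.dispatch_idx == len(self.targets):
                self.state = State.RUNNING
            return (port, cmd, t.dcu_id, self.current_value)
        return None
```

In this model a CCU with `child_count=2` stays in RUNNING until both completion events have arrived, then spends one cycle per target in DISPATCHING, choosing the command by target mode and the port by `is_remote`. Once `current_value` reaches `upper_bound` it moves straight to COMPLETE and sends nothing.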