
Introduce Affine Controller Design #266

Merged
ShangkunLi merged 49 commits into tancheng:master from ShangkunLi:affine-controller
Mar 20, 2026


@ShangkunLi
Collaborator

Add Affine Controller (AC) for Outer Loop Management

Summary

This PR introduces the Affine Controller (AC), a programmable hardware module that manages outer loop counters in the CGRA. While the existing LoopCounterRTL (DCU) handles innermost loop counting at the tile level, the AC coordinates multi-level loop nesting and cross-CGRA loop synchronization.

Architecture

The AC contains an array of Configurable Counter Units (CCUs), each representing one level of a loop nest. CCUs form a DAG topology where:

  • Root CCUs drive the outermost loop with no parent
  • Regular CCUs have a local parent CCU and report completion upward

Each CCU tracks a loop variable (lower_bound, upper_bound, step, current_value) and manages a set of targets — tile-array DCUs it must notify when advancing iterations.

State Machine: IDLE → RUNNING → DISPATCHING → RUNNING / COMPLETE

  • RUNNING: Waiting for child completion events
  • DISPATCHING: Sending commands to targets (1 cycle per target)
  • COMPLETE: Loop finished, parent notified internally
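The state machine above can be sketched as a small behavioral model. This is plain Python, not the PyMTL3 RTL from this PR; the class `CcuModel`, its method names, and the increment-then-compare ordering are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    DISPATCHING = auto()
    COMPLETE = auto()


@dataclass
class CcuModel:
    """Hypothetical software model of one CCU (not the RTL in this PR)."""
    lower_bound: int
    upper_bound: int
    step: int
    targets: list            # illustrative names of the DCUs to notify
    child_count: int = 1     # completion events needed before advancing
    state: State = State.IDLE
    current_value: int = 0
    received: int = 0
    dispatch_idx: int = 0
    sent: list = field(default_factory=list)

    def launch(self):
        # IDLE -> RUNNING: start at the lower bound and wait for children.
        self.current_value = self.lower_bound
        self.state = State.RUNNING

    def child_complete(self):
        # Completion events are only consumed while RUNNING.
        if self.state is not State.RUNNING:
            return False
        self.received += 1
        return True

    def tick(self):
        """Advance the model by one cycle."""
        if self.state is State.RUNNING and self.received == self.child_count:
            self.received = 0
            self.current_value += self.step
            if self.current_value >= self.upper_bound:
                # Last iteration: go straight to COMPLETE, no dispatch.
                self.state = State.COMPLETE
            else:
                self.dispatch_idx = 0
                self.state = State.DISPATCHING
        elif self.state is State.DISPATCHING:
            # One target is served per cycle.
            target = self.targets[self.dispatch_idx]
            self.sent.append((target, self.current_value))
            self.dispatch_idx += 1
            if self.dispatch_idx == len(self.targets):
                self.state = State.RUNNING
```

With two targets and bounds `[0, 2)`, the model spends two cycles in DISPATCHING after the first completion event and reaches COMPLETE after the second, without sending anything further.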

Key Design Decisions

  • Single-phase dispatch: Each target receives exactly one command per dispatch — either CMD_RESET_LEAF_COUNTER (leaf-mode DCU) or CMD_UPDATE_COUNTER_SHADOW_VALUE (delivery-mode DCU). This avoids redundant messages and minimizes dispatch latency.
  • Internal CCU→parent completion: When a child CCU completes, it directly increments its parent's received_complete_count in the same cycle — no external signaling needed.
  • Automatic child reset: When a parent CCU finishes dispatching and returns to RUNNING, all child CCUs are automatically reset to lower_bound.
  • Last-iteration optimization: When current_value >= upper_bound, the CCU transitions directly to COMPLETE without dispatching, preventing stale completion events from the previous iteration.
  • Backpressure on events: Both tile and remote completion events are only consumed when a matching CCU is in RUNNING state.
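As a rough illustration of single-phase dispatch combined with the last-iteration check, the sketch below builds one command per target. The command names come from this PR, but they are represented here as plain strings rather than the values defined in lib/cmd_type.py; the function `plan_dispatch`, the `(target_id, is_leaf)` tuple layout, and the payload choices are assumptions.

```python
# String stand-ins for the command constants named in the PR description.
CMD_RESET_LEAF_COUNTER = "CMD_RESET_LEAF_COUNTER"
CMD_UPDATE_COUNTER_SHADOW_VALUE = "CMD_UPDATE_COUNTER_SHADOW_VALUE"


def plan_dispatch(current_value, upper_bound, targets):
    """Return one (target_id, cmd, payload) tuple per target.

    `targets` is a list of (target_id, is_leaf) pairs (hypothetical layout).
    An empty plan means the loop is finished and nothing is dispatched.
    """
    if current_value >= upper_bound:
        # Last-iteration optimization: skip dispatch entirely.
        return []
    plan = []
    for target_id, is_leaf in targets:
        if is_leaf:
            # Leaf-mode DCU: restart its own counter (no payload assumed).
            plan.append((target_id, CMD_RESET_LEAF_COUNTER, None))
        else:
            # Delivery-mode DCU: assumed to receive the new loop index.
            plan.append((target_id, CMD_UPDATE_COUNTER_SHADOW_VALUE, current_value))
    return plan
```

Because every target appears exactly once in the plan, the dispatch phase in this model takes as many cycles as there are targets.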

Cross-CGRA Support

CCU targets can be marked as remote (is_remote=1). Dispatch commands for remote targets are sent via send_to_remote (routed through the Controller's inter-CGRA NoC). Remote completion events arrive as CMD_AC_CHILD_COMPLETE on recv_from_remote.
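A minimal sketch of the routing decision described above, in plain Python. `send_to_remote`, `recv_from_remote`, `is_remote`, and `CMD_AC_CHILD_COMPLETE` are names from this PR; the function names, the `local` list, and the tuple layout are hypothetical.

```python
CMD_AC_CHILD_COMPLETE = "CMD_AC_CHILD_COMPLETE"  # string stand-in for the real constant


def route_dispatch(targets, cmd, payload):
    """Split one dispatch round into local and remote message lists.

    `targets` is a list of (target_id, is_remote) pairs (assumed layout).
    """
    local = []            # hypothetical stand-in for the tile-side port
    send_to_remote = []   # mirrors the send_to_remote path in the PR text
    for target_id, is_remote in targets:
        message = (target_id, cmd, payload)
        if is_remote:
            send_to_remote.append(message)
        else:
            local.append(message)
    return local, send_to_remote


def accept_remote_event(cmd, ccu_is_running):
    """A remote event is consumed only if it is a child-complete command
    and the matching CCU is currently in RUNNING state."""
    return cmd == CMD_AC_CHILD_COMPLETE and ccu_is_running
```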

Files Changed

| File | Change |
|---|---|
| controller/AffineControllerRTL.py | [NEW] AC implementation (~400 lines) |
| controller/test/AffineControllerRTL_test.py | [NEW] 4 test cases |
| lib/cmd_type.py | Added 11 new AC command types, NUM_CMDS 28→40 |

Test Cases

| Test | Description |
|---|---|
| test_basic_2_layer_loop | 1 root CCU, 1 leaf DCU. Verifies basic dispatch and completion. |
| test_sibling_barrier | 1 root CCU, 2 leaf DCUs (child_count=2). Verifies barrier synchronization. |
| test_3_layer_loop | CCU[0]→CCU[1]→DCU chain. Verifies internal CCU completion, parent dispatch, and child reset across 2×3=6 inner iterations. |
| test_cross_cgra_2_layer_loop | 1 root CCU with remote + local targets. Verifies send_to_remote / recv_from_remote paths. |
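The 2×3=6 figure for test_3_layer_loop can be checked against a software reference nest. The sketch below assumes CCU[0] runs 2 iterations and CCU[1] runs 3 (the description does not say which level has which trip count), and `reference_nest` is a hypothetical helper, not part of the test file.

```python
def reference_nest(outer_trips, middle_trips):
    """Count events in a CCU[0] -> CCU[1] -> DCU nest using plain loops."""
    leaf_runs = 0            # times the leaf DCU runs its innermost loop
    middle_completions = 0   # times CCU[1] finishes and notifies CCU[0]
    for _ in range(outer_trips):
        for _ in range(middle_trips):
            leaf_runs += 1
        middle_completions += 1
    return leaf_runs, middle_completions


# Assumed trip counts: 2 for CCU[0], 3 for CCU[1].
leaf_runs, middle_completions = reference_nest(2, 3)
```

Under these assumptions `leaf_runs` is 6 and `middle_completions` is 2.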

@tancheng
Owner

How do the inner loops' start/end get updated?

@ShangkunLi
Collaborator Author

> How do the inner loops' start/end get updated?

Distributed counter units are updated through the cmds (including CMD_UPDATE_COUNTER_SHADOW_VALUE, CMD_RESET_LEAF_COUNTER)

@tancheng
Owner

> How do the inner loops' start/end get updated?

> Distributed counter units are updated through the cmds (including CMD_UPDATE_COUNTER_SHADOW_VALUE, CMD_RESET_LEAF_COUNTER)

  • What does "shadow" mean here?
  • How do we make sure the inner-loops are already done with their execution before sending out the updated values/cmd?

@ShangkunLi
Collaborator Author

ShangkunLi commented Feb 25, 2026

> How do the inner loops' start/end get updated?

> Distributed counter units are updated through the cmds (including CMD_UPDATE_COUNTER_SHADOW_VALUE, CMD_RESET_LEAF_COUNTER)

>   • What does "shadow" mean here?

Shadow registers are used by a DCU in loop delivery mode to store and deliver the outer loop indices.

s.send_out[0].msg @= s.shadow_regs[addr]

They are updated by the affine controller through the CMD_UPDATE_COUNTER_SHADOW_VALUE command.

elif s.recv_from_ctrl_mem.msg.cmd == CMD_UPDATE_COUNTER_SHADOW_VALUE:

>   • How do we make sure the inner-loops are already done with their execution before sending out the updated values/cmd?

Only DCUs in loop count mode send a CMD_LEAF_COUNTER_COMPLETE command to the affine controller, and they do so when they reach the upper bound.

CMD_LEAF_COUNTER_COMPLETE, s.DataType(0, 0), 0, s.recv_opt.msg, addr

After receiving this complete signal, the affine controller triggers the outer loop counter increments and sends CMD_UPDATE_COUNTER_SHADOW_VALUE and CMD_RESET_LEAF_COUNTER accordingly.
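The handshake described in this reply can be sketched from the DCU side. This is a plain-Python model, not LoopCounterRTL itself; the class `DcuModel`, its method names, the number of shadow registers, and the hold-at-upper-bound behavior are assumptions. Commands are string stand-ins for the constants in lib/cmd_type.py.

```python
class DcuModel:
    """Hypothetical model of a DCU with a leaf counter and shadow registers."""

    def __init__(self, upper_bound, num_shadow_regs=4):
        self.count = 0
        self.upper_bound = upper_bound
        self.shadow_regs = [0] * num_shadow_regs

    def handle_cmd(self, cmd, addr=0, value=0):
        # Commands arriving from the affine controller.
        if cmd == "CMD_UPDATE_COUNTER_SHADOW_VALUE":
            self.shadow_regs[addr] = value   # new outer-loop index
        elif cmd == "CMD_RESET_LEAF_COUNTER":
            self.count = 0                   # restart the innermost loop

    def step_leaf(self):
        """Advance the leaf counter by one; report completion at the bound."""
        if self.count >= self.upper_bound:
            return None                      # assumed: hold until reset
        self.count += 1
        if self.count >= self.upper_bound:
            return "CMD_LEAF_COUNTER_COMPLETE"
        return None

    def deliver(self, addr):
        # Delivery mode: forward the stored outer-loop index.
        return self.shadow_regs[addr]
```

In this model the controller never updates a DCU mid-loop, because it only reacts after `step_leaf` has returned the completion command.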

tancheng requested a review from rp15 on February 25, 2026 15:13
Jackcuii and others added 29 commits March 19, 2026 14:12
Use wider TileInType = mk_bits(clog2(num_tile_inports + num_fu_inports + 1))
to match the updated mk_ctrl routing_xbar_outport field width.
- Renamed kAttrReadRegFrom to kAttrReadRegTowards in data_struct_attr.py
- Changed field type from b1 to RegFromType (2-bit) in messages.py
- Updated RegisterBankRTL.py to check read_reg_towards value:
  - 0: towards nothing (no read)
  - 1: towards FU (reg data consumed by operation)
  - 2: towards routing_xbar (reg data routed out to outport)
  - 3: towards both FU and routing_xbar
- Updated RegisterClusterRTL.py to drive send_data_to_routing_crossbar.val
  based on read_reg_towards being 2 or 3
- Updated CtrlMemDynamicRTL.py field references
- Updated all test files to use read_reg_towards with b2 type
- Updated SV test files with new type names and field names

This addresses the reviewer's request to reuse RegFromType for
read_reg_towards to indicate whether FU expects inputs from inports
or register.
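The 2-bit read_reg_towards encoding listed in this commit message reads naturally as two independent flags. The sketch below shows that reading in plain Python; `decode_read_reg_towards` is a hypothetical helper, and the actual RTL may compare against the values 0 to 3 explicitly rather than testing bits.

```python
def decode_read_reg_towards(value):
    """Decode the 2-bit read_reg_towards field into (to_fu, to_routing_xbar)."""
    if not 0 <= value <= 3:
        raise ValueError("read_reg_towards is a 2-bit field")
    to_fu = bool(value & 0b01)             # 1 or 3: consumed by the FU
    to_routing_xbar = bool(value & 0b10)   # 2 or 3: routed out via the crossbar
    return to_fu, to_routing_xbar
```

Under this reading, the crossbar valid signal mentioned for RegisterClusterRTL.py corresponds to the second flag.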
…end_data_to_fu to send_data, simplify conditions
@ShangkunLi ShangkunLi merged commit f6d28d5 into tancheng:master Mar 20, 2026
2 checks passed