Context
P3 offers two async lifting modes:
- Stackful: Each in-flight async call gets its own stack (like goroutines). Natural for async/await languages.
- Callback: Component exports a callback function, host calls it per event. No persistent stack per async call.
For embedded targets (Synth → ARM → Gale), callback mode is strongly preferred because SRAM is limited (often 256KB-512KB total). Allocating a separate stack per in-flight async call is prohibitive.
The cFS Pattern IS Callback Mode
cFS apps already follow the callback pattern:
while (RunLoop()) {
    ReceiveBuffer(&msg, pipe, PEND_FOREVER);  // wait for event
    ProcessMessage(msg);                      // handle one message
    // return to loop: no persistent state across messages
}
In P3 callback mode, this becomes:
// Component exports:
export __callback: func(event_type: i32, payload: i32) -> i32
// Host (Kiln/Gale) calls __callback when:
// - A subscribed stream has data (EVENT_STREAM_READ=2)
// - A future resolves (EVENT_FUTURE_READ=4)
// - A subtask completes (EVENT_SUBTASK=1)
One shared stack serves all callbacks for a component. The callback runs to completion, returns, and the stack is reused for the next event.
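As a rough illustration of that run-to-completion shape, the sketch below writes the callback in C. The event codes are the ones listed above; the function name `app_callback`, the return codes, and the `app_state_t` fields are assumptions made for this example, not part of Meld, Kiln, or Gale.

```c
#include <stdint.h>

/* Event codes as listed above. */
enum {
    EVENT_SUBTASK     = 1,
    EVENT_STREAM_READ = 2,
    EVENT_FUTURE_READ = 4
};

/* Hypothetical return codes for this sketch only. */
enum { CB_OK = 0, CB_UNKNOWN_EVENT = -1 };

/* Hypothetical per-component state. It lives in static storage
 * (linear memory once compiled to Wasm), never on the stack. */
typedef struct {
    uint32_t messages_seen;
    uint32_t futures_resolved;
    uint32_t subtasks_done;
    int32_t  last_payload;
} app_state_t;

static app_state_t g_state;

/* Handles exactly one event and returns. Nothing stays on the stack
 * afterwards, so the same stack can serve the next event. */
int32_t app_callback(int32_t event_type, int32_t payload)
{
    switch (event_type) {
    case EVENT_STREAM_READ:
        g_state.messages_seen++;
        g_state.last_payload = payload;
        return CB_OK;
    case EVENT_FUTURE_READ:
        g_state.futures_resolved++;
        return CB_OK;
    case EVENT_SUBTASK:
        g_state.subtasks_done++;
        return CB_OK;
    default:
        /* Unknown codes are rejected rather than silently ignored. */
        return CB_UNKNOWN_EVENT;
    }
}
```

Each case plays the role of `ProcessMessage` in the cFS loop; the blocking `ReceiveBuffer` call disappears because the host decides when the next event is delivered.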
Implementation
Meld side (build time)
- When a component uses (canon lift ... (callback $cb)), Meld generates the callback trampoline
- The trampoline dispatches events to the appropriate handler function
- State between callbacks is stored in linear memory (not on the stack)
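To make the last point concrete, here is a minimal sketch of a two-step operation whose progress survives between callbacks because it is kept in a static struct rather than in stack frames. The phases, the `op_state_t` layout, and the choice to return the current phase are all hypothetical; the trampoline Meld actually generates may look quite different.

```c
#include <stdint.h>

enum { EVENT_STREAM_READ = 2, EVENT_FUTURE_READ = 4 };

/* Hypothetical phases of one logical async operation. */
typedef enum { WAIT_COMMAND = 0, WAIT_ACK = 1, DONE = 2 } phase_t;

/* Everything that must outlive a single callback lives here,
 * in static storage, not on the stack. */
typedef struct {
    phase_t phase;
    int32_t command;
    int32_t ack;
} op_state_t;

static op_state_t g_op = { WAIT_COMMAND, 0, 0 };

/* Advances the operation by at most one step per event and returns the
 * phase it ended in. Events that do not match the current phase are
 * ignored in this sketch. */
int32_t op_callback(int32_t event_type, int32_t payload)
{
    switch (g_op.phase) {
    case WAIT_COMMAND:
        if (event_type == EVENT_STREAM_READ) {
            g_op.command = payload;
            g_op.phase = WAIT_ACK;
        }
        break;
    case WAIT_ACK:
        if (event_type == EVENT_FUTURE_READ) {
            g_op.ack = payload;
            g_op.phase = DONE;
        }
        break;
    case DONE:
        break;
    }
    return (int32_t)g_op.phase;
}
```

In stackful mode the same logic would be straight-line code with two suspension points; here the `phase` field stands in for the saved program counter.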
Kiln side (runtime, std)
- Host maintains event queue per component
- Calls component's __callback export for each event
- Single thread per component (no concurrent callbacks)
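The host loop described above might look roughly like the following: a fixed-capacity ring buffer per component, drained one event at a time so that callbacks never overlap. Kiln's real data structures are not shown in this issue, so `component_queue_t`, `queue_push`, `queue_drain`, `QUEUE_CAPACITY`, and the demo callback `sum_callback` are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8u  /* hypothetical fixed capacity */

typedef int32_t (*callback_fn)(int32_t event_type, int32_t payload);

typedef struct {
    int32_t event_type;
    int32_t payload;
} event_t;

/* One queue per component. */
typedef struct {
    event_t     slots[QUEUE_CAPACITY];
    size_t      head;      /* index of the oldest queued event */
    size_t      count;     /* number of queued events */
    callback_fn callback;  /* the component's callback export */
} component_queue_t;

/* Enqueues an event; returns false if the queue is full. */
bool queue_push(component_queue_t *q, event_t ev)
{
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = ev;
    q->count++;
    return true;
}

/* Delivers queued events in FIFO order, one callback at a time.
 * Each callback returns before the next starts, so there are no
 * concurrent callbacks. Returns the number of events delivered. */
size_t queue_drain(component_queue_t *q)
{
    size_t delivered = 0;
    while (q->count > 0) {
        event_t ev = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        (void)q->callback(ev.event_type, ev.payload);
        delivered++;
    }
    return delivered;
}

/* Demo callback for this sketch: accumulates payloads. */
static int32_t g_payload_sum;

int32_t sum_callback(int32_t event_type, int32_t payload)
{
    (void)event_type;
    g_payload_sum += payload;
    return 0;
}
```

The slot is copied out and the queue indices updated before the callback runs, so a callback that causes new events to be pushed does not corrupt the event currently being delivered.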
Gale side (runtime, embedded)
- Each component is a Gale task with a single stack
- Gale scheduler calls component's callback entry point
- Stack size = maximum depth of any single callback (deterministic, analyzable)
- WCET analysis possible: each callback has bounded execution time
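The stack sizing rule above reduces to taking a maximum over per-callback bounds. The sketch below assumes those bounds come from some static stack analysis that is not specified here; `stack_bound_t`, `task_stack_size`, and the safety margin parameter are hypothetical, and the byte counts in the usage note are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of a per-handler stack depth analysis. */
typedef struct {
    const char *handler;
    uint32_t    worst_case_bytes;
} stack_bound_t;

/* Task stack size = deepest single callback + a fixed margin
 * (for example, interrupt frames). Callbacks run one at a time,
 * so their depths are never summed. */
uint32_t task_stack_size(const stack_bound_t *bounds, size_t n,
                         uint32_t margin_bytes)
{
    uint32_t max_depth = 0;
    for (size_t i = 0; i < n; i++) {
        if (bounds[i].worst_case_bytes > max_depth) {
            max_depth = bounds[i].worst_case_bytes;
        }
    }
    return max_depth + margin_bytes;
}
```

With made-up bounds of 1536, 896, and 640 bytes for three handlers and a 512-byte margin, the function returns 2048 bytes, not the 3584 bytes a per-call-stack scheme would need for the same three handlers in flight.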
Memory Savings
For a cFS system with 10 app components:
- Stackful mode: 10 × (base stack + max async depth × per-call stack) = potentially 10 × 16KB = 160KB of stacks
- Callback mode: 10 × (single shared stack) = 10 × 4KB = 40KB of stacks
- Savings: 75% stack memory reduction
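On a Cortex-M4 with 256KB of SRAM, 160KB of stacks would consume 62.5% of memory before any linear memory or kernel data is allocated, while 40KB consumes about 16%. This can be the difference between fitting and not fitting.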
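The arithmetic behind these figures can be reproduced with a small helper, which also makes it easy to rerun the comparison with other per-component sizes. The 16KB and 4KB inputs are the illustrative figures from the list above, not measurements, and `stack_budget_t` and `compare_stack_budgets` are names invented for this sketch.

```c
#include <stdint.h>

typedef struct {
    uint32_t stackful_kb;    /* total stack memory in stackful mode */
    uint32_t callback_kb;    /* total stack memory in callback mode */
    uint32_t saved_kb;       /* absolute saving */
    uint32_t saved_percent;  /* saving relative to stackful mode */
} stack_budget_t;

/* Assumes callback_kb_each <= stackful_kb_each. */
stack_budget_t compare_stack_budgets(uint32_t components,
                                     uint32_t stackful_kb_each,
                                     uint32_t callback_kb_each)
{
    stack_budget_t b;
    b.stackful_kb = components * stackful_kb_each;
    b.callback_kb = components * callback_kb_each;
    b.saved_kb = b.stackful_kb - b.callback_kb;
    b.saved_percent = (b.stackful_kb != 0)
        ? (100u * b.saved_kb) / b.stackful_kb
        : 0u;
    return b;
}
```

Calling `compare_stack_budgets(10, 16, 4)` gives 160KB versus 40KB, a saving of 120KB, or 75%.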
Verification
- Prove callback dispatch is exhaustive (all event types handled)
- Prove no state leaks between callbacks (stack is clean on entry)
- WCET analysis: bound maximum callback execution time
- Stack depth analysis: bound maximum call depth within any callback
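Short of a formal proof, the exhaustiveness property can at least be checked mechanically when dispatch is table-driven. The following is a sketch of such a check, not the verification approach this issue commits to; the handler table layout, `EVENT_CODE_LIMIT`, and `dispatch_is_exhaustive` are assumptions for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*handler_fn)(int32_t payload);

/* Event codes as listed earlier; EVENT_CODE_LIMIT is a hypothetical
 * table size, one past the largest code. */
enum {
    EVENT_SUBTASK     = 1,
    EVENT_STREAM_READ = 2,
    EVENT_FUTURE_READ = 4,
    EVENT_CODE_LIMIT  = 5
};

/* Every event code the host may deliver to this component. */
static const int32_t k_known_events[] = {
    EVENT_SUBTASK, EVENT_STREAM_READ, EVENT_FUTURE_READ
};

static int32_t on_subtask(int32_t payload)     { (void)payload; return 0; }
static int32_t on_stream_read(int32_t payload) { (void)payload; return 0; }
static int32_t on_future_read(int32_t payload) { (void)payload; return 0; }

/* Dispatch table indexed by event code; unused codes stay NULL. */
static const handler_fn k_handlers[EVENT_CODE_LIMIT] = {
    [EVENT_SUBTASK]     = on_subtask,
    [EVENT_STREAM_READ] = on_stream_read,
    [EVENT_FUTURE_READ] = on_future_read
};

/* Returns true only if every known event code has a handler. */
bool dispatch_is_exhaustive(const handler_fn *table, size_t table_len)
{
    size_t n = sizeof k_known_events / sizeof k_known_events[0];
    for (size_t i = 0; i < n; i++) {
        size_t code = (size_t)k_known_events[i];
        if (code >= table_len || table[code] == NULL) {
            return false;
        }
    }
    return true;
}
```

Running this check at build or boot time catches a missing handler early, though it says nothing about the other properties in the list, such as stack cleanliness or execution time bounds.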
Connects to
- kiln#230: P3 async runtime
- meld#94: P3 async lowering (generates callback trampolines)
- gale#13: P3 extensions (scheduler drives callbacks)
- synth#80: ARM code for callback entry points
Priority
High — callback mode is the key enabler for P3 async on embedded targets.