
P3 callback lifting mode for embedded: single-stack component execution on Gale #232

@avrabe


Context

P3 offers two async lifting modes:

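  1. Stackful: each in-flight async call gets its own stack (like goroutines). Natural for languages with green threads or coroutines that suspend mid-call.
  2. Callback: the component exports a callback function, and the host calls it once per event. No persistent stack per async call. Natural for languages that compile async/await to state machines or run their own event loop.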

For embedded targets (Synth → ARM → Gale), callback mode is strongly preferred because SRAM is limited (often 256KB-512KB total). Allocating a separate stack per in-flight async call is prohibitive.

The cFS Pattern IS Callback Mode

cFS apps already follow the callback pattern:

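while (CFE_ES_RunLoop(&RunStatus)) {
    // block until the next message arrives on the app's pipe
    Status = CFE_SB_ReceiveBuffer(&SBBufPtr, PipeId, CFE_SB_PEND_FOREVER);
    if (Status == CFE_SUCCESS) {
        ProcessMessage(SBBufPtr);  // handle exactly one message
    }
    // back to the loop: no stack frame survives between messages;
    // persistent app state lives in global app data, not on the stack
}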

In P3 callback mode, this becomes:

// Component exports:
export __callback: func(event_type: i32, payload: i32) -> i32

// Host (Kiln/Gale) calls __callback when:
// - A subscribed stream has data (EVENT_STREAM_READ=2)
// - A future resolves (EVENT_FUTURE_READ=4)
// - A subtask completes (EVENT_SUBTASK=1)
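To make the dispatch concrete, here is a minimal C sketch of a component-side callback, assuming the two-argument signature above. The event codes are the ones listed in this issue; the return codes `CB_EXIT`/`CB_WAIT`, the `app_state` struct, and the name `component_callback` (C reserves identifiers starting with `__`) are hypothetical placeholders, not the Canonical ABI's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Event codes as listed in this issue. */
enum {
    EVENT_SUBTASK     = 1,
    EVENT_STREAM_READ = 2,
    EVENT_FUTURE_READ = 4,
};

/* Hypothetical return codes: tell the host whether to keep delivering events. */
enum { CB_EXIT = 0, CB_WAIT = 1 };

/* Hypothetical per-component state: static storage (linear memory), not the stack. */
typedef struct {
    uint32_t bytes_read;
    uint32_t futures_done;
    uint32_t subtasks_done;
    uint32_t unknown_events;
} app_state;

static app_state g_state;

/* One invocation handles exactly one event and returns; nothing stays on the stack. */
int32_t component_callback(int32_t event_type, int32_t payload) {
    switch (event_type) {
    case EVENT_STREAM_READ:
        g_state.bytes_read += (uint32_t)payload;  /* payload assumed to be a byte count */
        return CB_WAIT;
    case EVENT_FUTURE_READ:
        g_state.futures_done++;
        return CB_WAIT;
    case EVENT_SUBTASK:
        g_state.subtasks_done++;
        /* Hypothetical policy: finish once the awaited subtask completes. */
        return CB_EXIT;
    default:
        g_state.unknown_events++;  /* unexpected code: record it and keep waiting */
        return CB_WAIT;
    }
}
```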

One shared stack serves all callbacks for a component. The callback runs to completion, returns, and the stack is reused for the next event.
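The run-to-completion property can be sketched from the host side. This assumes a hypothetical `run_to_completion` loop and a convention that a return value of 0 means the component is finished; it is not Kiln's or Gale's actual API. Because each call returns before the next event is delivered, every callback starts from the same stack base and callbacks never nest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t type; int32_t payload; } event_t;
typedef int32_t (*callback_fn)(int32_t event_type, int32_t payload);

/* Deliver events one at a time. Each call returns before the next starts,
   so every callback reuses the same stack region from the same base. */
size_t run_to_completion(callback_fn cb, const event_t *events, size_t n) {
    size_t delivered = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t rc = cb(events[i].type, events[i].payload);
        delivered++;
        if (rc == 0) break;  /* hypothetical: 0 means the component is done */
    }
    return delivered;
}

/* Demo callback instrumented to record the deepest nesting ever observed. */
static int g_active = 0;
static int g_max_active = 0;
static int32_t g_sum = 0;

static int32_t demo_callback(int32_t type, int32_t payload) {
    g_active++;
    if (g_active > g_max_active) g_max_active = g_active;
    g_sum += payload;
    g_active--;
    return type == 1 ? 0 : 1;  /* stop after an EVENT_SUBTASK (1) */
}
```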

Implementation

Meld side (build time)

  • When a component uses (canon lift ... (callback $cb)), Meld generates the callback trampoline
  • The trampoline dispatches events to the appropriate handler function
  • State between callbacks is stored in linear memory (not on the stack)
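Keeping state in linear memory turns a multi-step async operation into an explicit state machine. The sketch below assumes a hypothetical operation that waits for one stream read and then for one subtask; in stackful mode the "where am I" information would sit on a suspended stack, whereas here it is the `step` field of a struct the component can keep in static storage. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

enum { EVENT_SUBTASK = 1, EVENT_STREAM_READ = 2, EVENT_FUTURE_READ = 4 };

/* Suspension point of the operation, stored as data instead of a saved stack. */
typedef enum { ST_WAIT_READ, ST_WAIT_SUBTASK, ST_DONE } step_t;

typedef struct {
    step_t  step;
    int32_t last_read;  /* value carried from step 1 to step 2 */
} task_state;

/* Advance the state machine by one event. Returns 1 when the operation is complete. */
int task_on_event(task_state *s, int32_t event_type, int32_t payload) {
    switch (s->step) {
    case ST_WAIT_READ:
        if (event_type == EVENT_STREAM_READ) {
            s->last_read = payload;
            s->step = ST_WAIT_SUBTASK;
        }
        break;
    case ST_WAIT_SUBTASK:
        if (event_type == EVENT_SUBTASK) {
            s->step = ST_DONE;
        }
        break;
    case ST_DONE:
        break;  /* already finished: ignore further events */
    }
    return s->step == ST_DONE;
}
```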

Kiln side (runtime, std)

  • Host maintains event queue per component
  • Calls component's __callback export for each event
  • Single thread per component (no concurrent callbacks)
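A per-component event queue can be as simple as a fixed-capacity FIFO. This is an illustrative C sketch; the names, the 8-entry capacity, and the reject-when-full policy are assumptions. A std host like Kiln could use a growable queue, but a fixed ring buffer like this one is also usable on Gale without a heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAP 8u  /* hypothetical fixed capacity: no heap allocation */

typedef struct { int32_t type; int32_t payload; } event_t;

typedef struct {
    event_t  buf[QUEUE_CAP];
    uint32_t head;   /* index of the next event to deliver */
    uint32_t count;  /* number of queued events */
} event_queue;

/* Returns false when full, so the host can apply back-pressure instead of allocating. */
bool eq_push(event_queue *q, event_t ev) {
    if (q->count == QUEUE_CAP) return false;
    q->buf[(q->head + q->count) % QUEUE_CAP] = ev;
    q->count++;
    return true;
}

/* Removes the oldest event; returns false when the queue is empty. */
bool eq_pop(event_queue *q, event_t *out) {
    if (q->count == 0) return false;
    *out = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```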

Gale side (runtime, embedded)

  • Each component is a Gale task with a single stack
  • Gale scheduler calls the component's callback entry point
  • Stack size = worst-case stack usage of any single callback (deterministic, statically analyzable)
  • WCET analysis becomes tractable: each callback runs to completion, so its execution time can be bounded per event
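Because a callback always returns before the next one starts, the stack size can be computed as the worst-case stack usage over all callback entry points. A minimal sketch of that analysis, assuming a hypothetical, acyclic static call graph with known per-function frame sizes (in practice these would come from compiler or binary analysis output):

```c
#include <assert.h>
#include <stddef.h>

#define NO_CALLEE (-1)

/* Hypothetical static call graph node: frame size plus up to 3 callees.
   Assumes the graph is acyclic (no recursion), as bounded stack analysis requires. */
typedef struct {
    unsigned frame_bytes;
    int callees[3];  /* NO_CALLEE marks unused slots */
} fn_info;

/* Worst-case stack bytes when entering f: its own frame plus the deepest callee path. */
unsigned worst_stack(const fn_info *fns, int f) {
    unsigned deepest = 0;
    for (int i = 0; i < 3; i++) {
        int c = fns[f].callees[i];
        if (c == NO_CALLEE) continue;
        unsigned d = worst_stack(fns, c);
        if (d > deepest) deepest = d;
    }
    return fns[f].frame_bytes + deepest;
}

/* Component stack size = maximum over all callback entry points. */
unsigned component_stack_size(const fn_info *fns, const int *entries, size_t n_entries) {
    unsigned max = 0;
    for (size_t i = 0; i < n_entries; i++) {
        unsigned s = worst_stack(fns, entries[i]);
        if (s > max) max = s;
    }
    return max;
}
```

In this sketch the entry points are the handlers reachable from the callback, so one number sizes the component's single Gale stack.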

Memory Savings

For a cFS system with 10 app components (illustrative sizes):

  • Stackful mode: 10 × (base stack + max in-flight async calls × per-call stack), potentially 10 × 16KB = 160KB of stacks
  • Callback mode: 10 × (single shared stack) = 10 × 4KB = 40KB of stacks
  • Savings: 75% stack memory reduction ((160KB - 40KB) / 160KB)

On a 256KB SRAM Cortex-M4, this is the difference between fitting and not fitting: 160KB of stacks alone would consume 62.5% of SRAM, versus about 16% for 40KB.
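The arithmetic behind these figures, as a small C helper. The split of the 16KB stackful budget into a 4KB base stack plus 3 in-flight calls × 4KB is an assumption chosen to reproduce the numbers above, not a measured value.

```c
#include <assert.h>

/* Stackful: each component needs a base stack plus one stack per in-flight async call. */
unsigned long stackful_bytes(unsigned n_components, unsigned long base,
                             unsigned max_in_flight, unsigned long per_call) {
    return n_components * (base + max_in_flight * per_call);
}

/* Callback: each component needs exactly one stack, sized for its deepest callback. */
unsigned long callback_bytes(unsigned n_components, unsigned long stack) {
    return n_components * stack;
}

/* Integer percentage of stack memory saved by switching to callback mode. */
unsigned long savings_percent(unsigned long stackful, unsigned long callback) {
    if (stackful == 0 || callback >= stackful) return 0;  /* no savings */
    return 100UL * (stackful - callback) / stackful;
}
```

With those inputs, `stackful_bytes(10, 4096, 3, 4096)` is 163840 bytes (160KB), `callback_bytes(10, 4096)` is 40960 bytes (40KB), and `savings_percent` returns 75.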

Verification

  • Prove callback dispatch is exhaustive (all event types handled)
  • Prove no state leaks between callbacks (stack is clean on entry)
  • WCET analysis: bound maximum callback execution time
  • Stack depth analysis: bound maximum call depth within any callback
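For the first obligation, a runtime check over a dispatch table is a lightweight stand-in until a formal proof exists. This sketch assumes a hypothetical table-driven trampoline indexed by event code, with placeholder handlers; a formal proof would establish the same property statically for every generated trampoline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t (*handler_fn)(int32_t payload);

/* Event codes from this issue; EVENT_MAX bounds the dispatch table. */
enum { EVENT_SUBTASK = 1, EVENT_STREAM_READ = 2, EVENT_FUTURE_READ = 4, EVENT_MAX = 4 };

/* Placeholder handlers (hypothetical): real ones would update component state. */
static int32_t on_subtask(int32_t p)     { return p; }
static int32_t on_stream_read(int32_t p) { return p; }
static int32_t on_future_read(int32_t p) { return p; }

/* Dispatch table indexed by event code; codes without a handler stay NULL. */
static const handler_fn dispatch[EVENT_MAX + 1] = {
    [EVENT_SUBTASK]     = on_subtask,
    [EVENT_STREAM_READ] = on_stream_read,
    [EVENT_FUTURE_READ] = on_future_read,
};

/* Codes the component subscribes to and must therefore handle. */
static const int32_t required_events[] = { EVENT_SUBTASK, EVENT_STREAM_READ, EVENT_FUTURE_READ };

/* Returns the number of required event codes with no handler (0 = exhaustive). */
size_t missing_handlers(void) {
    size_t missing = 0;
    for (size_t i = 0; i < sizeof required_events / sizeof required_events[0]; i++) {
        int32_t code = required_events[i];
        if (code < 0 || code > EVENT_MAX || dispatch[code] == NULL) missing++;
    }
    return missing;
}
```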

Connects to

  • kiln#230: P3 async runtime
  • meld#94: P3 async lowering (generates callback trampolines)
  • gale#13: P3 extensions (scheduler drives callbacks)
  • synth#80: ARM code for callback entry points

Priority

High — callback mode is the key enabler for P3 async on embedded targets.
