P3 callback lifting mode for embedded: single-stack component execution on Gale

## Context

P3 offers two async lifting modes:
1. **Stackful**: Each in-flight async call gets its own stack (like goroutines). Natural for async/await languages.
2. **Callback**: The component exports a callback function, and the host calls it once per event. No stack persists per async call.

For embedded targets (Synth → ARM → Gale), callback mode is strongly preferred because SRAM is limited (often 256 KB to 512 KB total). Allocating a separate stack for every in-flight async call is prohibitive at that scale.

## The cFS Pattern IS Callback Mode

cFS apps already follow the callback pattern:
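```c
// Simplified cFS app main loop (cFE API names; declarations omitted)
while (CFE_ES_RunLoop(&RunStatus)) {
    // Block until a message arrives on the app's pipe
    status = CFE_SB_ReceiveBuffer(&BufPtr, PipeId, CFE_SB_PEND_FOREVER);
    if (status == CFE_SUCCESS) {
        ProcessMessage(BufPtr);  // handle one message, run to completion
    }
    // back to the top of the loop; nothing stays on the stack between messages
}
```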

In P3 callback mode, this becomes:
```
// Component exports:
export __callback: func(event_type: i32, payload: i32) -> i32

// Host (Kiln/Gale) calls __callback when:
// - A subscribed stream has data (EVENT_STREAM_READ=2)
// - A future resolves (EVENT_FUTURE_READ=4)
// - A subtask completes (EVENT_SUBTASK=1)
```
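To make the dispatch concrete, here is a minimal C sketch of what the body of such a callback could look like. The event codes are the ones listed above; the function name `component_callback`, the `CB_*` return codes, and the three handlers are hypothetical placeholders, since this issue does not define them.

```c
#include <stdint.h>

/* Event codes as listed in this issue. */
enum {
    EVENT_SUBTASK     = 1,
    EVENT_STREAM_READ = 2,
    EVENT_FUTURE_READ = 4
};

/* Hypothetical return codes; the real encoding is not specified here. */
enum { CB_DONE = 0, CB_WAIT = 1, CB_UNHANDLED = -1 };

/* Hypothetical handlers. Each one runs to completion and returns. */
static int32_t on_subtask(int32_t payload)     { (void)payload; return CB_WAIT; }
static int32_t on_stream_read(int32_t payload) { (void)payload; return CB_WAIT; }
static int32_t on_future_read(int32_t payload) { (void)payload; return CB_DONE; }

/* Single entry point the host calls once per event. */
int32_t component_callback(int32_t event_type, int32_t payload)
{
    switch (event_type) {
    case EVENT_SUBTASK:     return on_subtask(payload);
    case EVENT_STREAM_READ: return on_stream_read(payload);
    case EVENT_FUTURE_READ: return on_future_read(payload);
    default:                return CB_UNHANDLED;  /* unknown event code */
    }
}
```

Every path returns to the host, so no frame from one event is still live when the next event arrives.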

**One shared stack** serves all callbacks for a component. The callback runs to completion, returns, and the stack is reused for the next event.

## Implementation

### Meld side (build time)
- When a component uses `(canon lift ... (callback $cb))`, Meld generates the callback trampoline
- The trampoline dispatches events to the appropriate handler function
- State between callbacks is stored in linear memory (not on the stack)
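As an illustration of the last point, the following sketch keeps progress for a multi-event operation in static storage (which lives in linear memory) rather than in stack frames. The struct, the 64-byte target, the return codes, and the interpretation of `payload` as a byte count are all assumptions made for the example, not part of Meld's actual output.

```c
#include <stdint.h>

enum { EVENT_STREAM_READ = 2 };      /* event code from this issue */
enum { CB_DONE = 0, CB_WAIT = 1 };   /* hypothetical return codes */

/* Progress lives in static storage, not in any stack frame. */
typedef struct {
    uint32_t bytes_seen;    /* running total across callbacks */
    uint32_t bytes_needed;  /* target for this hypothetical transfer */
} transfer_state_t;

static transfer_state_t g_state = { 0u, 64u };

/* Called once per event; payload is assumed to be a byte count. */
int32_t on_stream_event(int32_t event_type, int32_t payload)
{
    if (event_type != EVENT_STREAM_READ || payload < 0) {
        return CB_WAIT;              /* ignore unrelated or invalid events */
    }
    g_state.bytes_seen += (uint32_t)payload;
    if (g_state.bytes_seen >= g_state.bytes_needed) {
        g_state.bytes_seen = 0u;     /* reset for the next transfer */
        return CB_DONE;
    }
    return CB_WAIT;                  /* return to host; the stack unwinds */
}
```

Two calls with 40 and 24 bytes complete the transfer on the second call, even though the stack was fully unwound in between.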

### Kiln side (runtime, std)
- Host maintains event queue per component
- Calls component's `__callback` export for each event
- Single thread per component (no concurrent callbacks)
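A rough sketch of the host-side loop, written in C for consistency with the other examples here (Kiln's actual implementation and types are not described in this issue). It uses a fixed-capacity ring buffer per component and delivers events strictly one at a time. All names, the capacity of 8, and the summing demo callback are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8u  /* hypothetical fixed depth */

typedef struct { int32_t event_type; int32_t payload; } event_t;
typedef int32_t (*callback_fn)(int32_t event_type, int32_t payload);

typedef struct {
    event_t  slots[QUEUE_CAPACITY];
    uint32_t head;   /* index of the next event to deliver */
    uint32_t count;  /* number of queued events */
} event_queue_t;

/* Returns false when the queue is full so the caller can apply backpressure. */
bool queue_push(event_queue_t *q, event_t ev)
{
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = ev;
    q->count++;
    return true;
}

/* Delivers queued events in FIFO order; returns how many were delivered. */
uint32_t queue_drain(event_queue_t *q, callback_fn cb)
{
    uint32_t delivered = 0u;
    while (q->count > 0u) {
        event_t ev = q->slots[q->head];
        q->head = (q->head + 1u) % QUEUE_CAPACITY;
        q->count--;
        (void)cb(ev.event_type, ev.payload);  /* returns before the next event */
        delivered++;
    }
    return delivered;
}

/* Demo callback for illustration only: accumulates payloads. */
static int32_t g_payload_sum = 0;
int32_t sum_payload_callback(int32_t event_type, int32_t payload)
{
    (void)event_type;
    g_payload_sum += payload;
    return 0;
}
```

Because `queue_drain` only calls the callback from one loop, the "no concurrent callbacks" property holds by construction in this sketch.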

### Gale side (runtime, embedded)
- Each component is a Gale task with a single stack
- Gale scheduler calls component's callback entry point
- Stack size = maximum stack depth of any single callback (deterministic, analyzable)
- WCET analysis becomes tractable: each callback runs to completion, so its execution time can be bounded
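The stack-sizing rule above can be sketched as a small helper. The per-callback depths, the 512-byte margin, and the function name are assumptions for illustration; real numbers would come from a stack analyzer run over the Synth output. The 8-byte rounding reflects the stack alignment the ARM procedure call standard requires at public interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worst-case stack usage per callback path, in bytes. */
static const uint32_t callback_stack_bytes[] = { 1024u, 2304u, 1792u };

#define STACK_MARGIN_BYTES 512u  /* assumed headroom, e.g. exception frames */
#define STACK_ALIGN_BYTES  8u    /* AAPCS stack alignment */

/* Task stack = deepest single callback + margin, rounded up to alignment. */
uint32_t required_stack_bytes(const uint32_t *depths, size_t n)
{
    uint32_t max = 0u;
    for (size_t i = 0; i < n; i++) {
        if (depths[i] > max) {
            max = depths[i];
        }
    }
    uint32_t total = max + STACK_MARGIN_BYTES;
    return (total + STACK_ALIGN_BYTES - 1u) & ~(STACK_ALIGN_BYTES - 1u);
}
```

The key point is that the size is a maximum over callbacks, not a sum: only one callback is ever on the stack at a time.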

### Memory Savings

For a cFS system with 10 app components:
- **Stackful mode**: 10 × (base stack + max async depth × per-call stack), potentially 10 × 16 KB = 160 KB of stacks
- **Callback mode**: 10 × (single shared stack), for example 10 × 4 KB = 40 KB of stacks
- **Savings**: 120 KB, a 75% reduction in stack memory under these illustrative sizes

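On a Cortex-M4 with 256 KB of SRAM, 160 KB of stacks would consume 62.5% of memory before any data, heap, or buffers are placed, while 40 KB takes under 16%. That can be the difference between fitting and not fitting.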
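The arithmetic behind these figures can be checked with a few lines of C. The 16 KB and 4 KB per-component sizes are the illustrative values from the list above, not measurements, and the function names are made up for this example.

```c
#include <stdint.h>

#define NUM_COMPONENTS     10u
#define STACKFUL_PER_COMP  (16u * 1024u)   /* illustrative, from the estimate above */
#define CALLBACK_PER_COMP  (4u * 1024u)    /* illustrative, from the estimate above */
#define SRAM_BYTES         (256u * 1024u)  /* 256 KB Cortex-M4 example */

uint32_t stackful_total(void) { return NUM_COMPONENTS * STACKFUL_PER_COMP; }
uint32_t callback_total(void) { return NUM_COMPONENTS * CALLBACK_PER_COMP; }

/* Stack memory saved by callback mode, as a whole percentage. */
uint32_t percent_saved(void)
{
    return 100u * (stackful_total() - callback_total()) / stackful_total();
}

/* Share of SRAM used, in parts per thousand (integer math, rounds down). */
uint32_t permille_of_sram(uint32_t bytes)
{
    return 1000u * bytes / SRAM_BYTES;
}
```

With these inputs the stackful total is 163840 bytes (625 per mille of SRAM) and the callback total is 40960 bytes (156 per mille), which gives the 75% reduction quoted above.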

## Verification

- Prove callback dispatch is exhaustive (all event types handled)
- Prove no state leaks between callbacks (stack is clean on entry)
- WCET analysis: bound maximum callback execution time
- Stack depth analysis: bound maximum call depth within any callback
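Static stack-depth bounds are often cross-checked at runtime with stack painting. The sketch below is one common way to do that and is offered only as a complement to the proofs listed above, not as part of the proposed verification plan. It assumes a descending stack (as on ARM), so usage grows from the end of the region toward index 0. The pattern value and function names are arbitrary choices.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5A5A5A5u  /* arbitrary fill pattern */

/* Fill the stack region with a known pattern before the task first runs. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_PAINT;
    }
}

/* Count untouched words from the low end; the rest is the high-water mark. */
size_t stack_high_water_bytes(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_PAINT) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

If the measured high-water mark after a test campaign ever exceeds the statically derived bound, the bound (or the analysis assumptions) is wrong. A measurement below the bound proves nothing on its own, since tests may not have exercised the worst-case path.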

## Connects to

- kiln#230: P3 async runtime
- meld#94: P3 async lowering (generates callback trampolines)
- gale#13: P3 extensions (scheduler drives callbacks)
- synth#80: ARM code for callback entry points

## Priority

High: callback mode is the key enabler for P3 async on embedded targets.

P3 callback lifting mode for embedded: single-stack component execution on Gale #232
