perf: reuse unchanged viewport rows in boo ui#58

Merged
kylecarbs merged 2 commits into main from perf/ui-viewport-row-cache on Jun 13, 2026

@kylecarbs

Member

What

In boo ui, every ~15ms frame re-serialized every viewport row through libghostty's ScreenFormatter (allocating a fresh writer per row), then diffed the result against the row cache and discarded unchanged rows. That cost O(rows) VT serializations plus O(rows) heap allocations per frame even when only a single cell changed, so the UI did far more work than a plain attach and felt slower than raw SSH, especially for interactive, localized updates (typing echo, progress bars, status lines).

Changes

  • appendTermRow now formats straight into the caller's buffer via Allocating.fromArrayList, reusing capacity instead of allocating a writer per row.
  • A per-row viewport cache stores each row's serialized bytes and reuses them when libghostty reports the row unchanged. A row is reused only when its row identity (page node + offset) and its dirty bit are both unchanged; the dirty bits are then cleared once per frame. boo never used libghostty's dirty tracking before this.

Why identity, not just the dirty bit

Scrolling the active screen relocates a visual row onto a different libghostty row while the row itself stays clean. Probe after a single scroll on a settled screen:

row 0: dirty=false moved=true   <- clean but relocated
row 1: dirty=false moved=true
row 2: dirty=true  moved=true
row 3: dirty=true  moved=true

A dirty-only cache would reuse stale bytes for rows 0-1. Keying reuse on identity + dirty fixes that. Supporting invariants: the existing full-row row_cache still gates what is actually written; the cursor and text selection are drawn as separate overlays so they do not invalidate the base cache; and every viewport-structural change (active scroll, browse-scroll, resize, view switch, C-a l) already forces a full repaint.
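The probe above can be reproduced with a toy model. This is a minimal sketch, not boo's Zig code: `Row.ident` is a hypothetical stand-in for libghostty's (page node, offset) pair, `serialize` stands in for the ScreenFormatter, and the scroll is assumed to bring two new lines into a four-row viewport so that the dirty/moved pattern matches the probe.

```python
from dataclasses import dataclass


@dataclass
class Row:
    ident: int          # hypothetical stand-in for (page node, offset)
    text: str
    dirty: bool = False


@dataclass
class CacheEntry:
    ident: int
    data: str


def serialize(row):
    # Stand-in for the real VT serialization of a row.
    return f"<{row.text}>"


def compose(viewport, cache, use_identity):
    """Render one frame, reusing cached output where the policy allows."""
    out = []
    for i, row in enumerate(viewport):
        entry = cache[i]
        reusable = entry is not None and not row.dirty
        if use_identity:
            reusable = reusable and entry.ident == row.ident
        if not reusable:
            entry = CacheEntry(row.ident, serialize(row))
            cache[i] = entry
        out.append(entry.data)
    for row in viewport:
        row.dirty = False  # clear dirty bits once per frame
    return out


def run(use_identity):
    rows = [Row(i, t) for i, t in enumerate("ABCD")]
    cache = [None] * 4
    compose(rows, cache, use_identity)  # settle the screen
    # Scroll: C and D move up (clean but relocated), E and F are new (dirty).
    viewport = rows[2:] + [Row(4, "E", True), Row(5, "F", True)]
    return compose(viewport, cache, use_identity)


dirty_only = run(use_identity=False)
with_identity = run(use_identity=True)
print(dirty_only)     # ['<A>', '<B>', '<E>', '<F>']  (rows 0-1 are stale)
print(with_identity)  # ['<C>', '<D>', '<E>', '<F>']
```

In this model the dirty-only policy keeps showing the old contents of visual rows 0-1, which is the failure the identity check is meant to rule out.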

Benchmark

zig build bench (bench/render.zig, 50x200, c_allocator, same allocator boo uses):

frame                                  ns/frame  allocs/frame
status quo (all rows, per-row alloc)      80267           150
reused buffer (all rows)                  76639             0
1 changed row, rest cached                 1760             0

Localized-update frames drop ~46x (80.3us -> 1.8us); full repaints (bulk scroll, TUI redraw) keep the same serialization cost minus the 150 allocs/frame.
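For reference, the ratios quoted above follow directly from the benchmark figures. The snippet below only redoes that arithmetic; the variable names are illustrative.

```python
# ns/frame figures from the benchmark above.
status_quo_ns = 80267
reused_buffer_ns = 76639
one_changed_row_ns = 1760

# Speedup for a frame where only one row changed.
localized_speedup = status_quo_ns / one_changed_row_ns

# Fraction of full-repaint time saved by dropping the per-row allocations.
full_repaint_saving = 1 - reused_buffer_ns / status_quo_ns

print(f"{localized_speedup:.1f}x")     # 45.6x
print(f"{full_repaint_saving:.1%}")    # 4.5%
```

So the ~46x figure applies to localized updates, while a full repaint gains roughly 4.5% from the buffer reuse alone.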

Testing

  • zig build test, zig build test-integration, and zig build test-all -Doptimize=ReleaseSafe all pass; zig fmt --check clean.
  • New unit test "viewportRowReusable re-serializes a clean row that scrolled away" pins the identity contract and fails if the identity check is removed (verified).
  • New integration test "ui: scrolling output keeps the viewport in sync with the session" drives heavy scrolling through a real PTY and asserts that the rendered tail is strictly monotonic (a stale reused row would break the ordering).

Out of scope

The ~15ms render coalescing interval and the architectural double VT parse (daemon + client) are left untouched.

Implementation plan & decision log

Root cause (src/ui.zig composeFrame)

Every frame, for every viewport row, appendTermRow runs the ScreenFormatter to re-serialize the row, allocating a fresh Allocating writer, then the bytes are compared to row_cache and discarded if unchanged. PageList.clearDirty was never called, so dirty tracking was unused.

Approach (two independent wins)

  1. Kill the per-row allocation: format directly into the caller's buffer with Allocating.fromArrayList.
  2. Skip re-serializing unchanged rows using libghostty dirty bits + a per-row serialized-bytes cache; reuse iff !full_render && live && identity matches && !pin.isDirty(); clearDirty() once per frame.

Why it is correct

  • Pin.isDirty() is page.dirty or row.dirty. Clearing dirty each frame makes "clean this frame" mean "unchanged since last frame", chaining back to the last serialization.
  • Active-screen scroll-at-bottom is not a full_render trigger; it is caught by the pin-identity check (each visual row maps to a new (node, offset)). In-place edits set row.dirty (Screen.zig cursor write / clears).
  • Selection and cursor are separate overlays; the full-row row_cache still gates writes.
  • boo is single-threaded: output accumulates dirty bits across loop iterations and clearDirty() runs synchronously after compose, so no dirty bit is dropped.
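The chaining argument can be illustrated with a small frame-loop model. It is a sketch under simplifying assumptions, not the code in src/ui.zig: `Screen` and `Renderer` are hypothetical names, rows never move (so the identity check is omitted), and "serialization" is just encoding a string.

```python
class Screen:
    """Toy stand-in for a terminal screen with per-row dirty bits."""

    def __init__(self, rows):
        self.text = [""] * rows
        self.dirty = [False] * rows

    def write(self, row, text):
        self.text[row] = text
        self.dirty[row] = True  # in-place edits mark the row dirty

    def clear_dirty(self):
        self.dirty = [False] * len(self.dirty)


class Renderer:
    def __init__(self, rows):
        self.cache = [None] * rows
        self.serializations = 0

    def compose_frame(self, screen, full_render=False):
        frame = []
        for i, text in enumerate(screen.text):
            reusable = (
                not full_render
                and self.cache[i] is not None
                and not screen.dirty[i]
            )
            if not reusable:
                self.cache[i] = text.encode()
                self.serializations += 1
            frame.append(self.cache[i])
        # Runs synchronously after compose, so "clean" in the next frame
        # means "unchanged since this serialization".
        screen.clear_dirty()
        return frame


screen = Screen(4)
renderer = Renderer(4)
renderer.compose_frame(screen, full_render=True)
after_full = renderer.serializations      # 4

# Two separate output events arrive before the next frame is composed.
screen.write(1, "a")
screen.write(3, "b")
frame = renderer.compose_frame(screen)
after_update = renderer.serializations    # 6: only rows 1 and 3 redone

renderer.compose_frame(screen)
after_idle = renderer.serializations      # still 6: nothing changed
```

Because the dirty bits accumulate between frames and are cleared only after composing, both writes are picked up by the next frame and an idle frame serializes nothing.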

Risks

  • Relies on libghostty marking every in-place visible mutation dirty (its own renderer depends on this); integration tests validate. Scrollback page pruning during heavy streaming does not affect active-screen row nodes, and resize forces a full repaint.

Generated by Coder Agents on behalf of @kylecarbs.

In `boo ui` every ~15ms frame re-serialized every viewport row through
libghostty's ScreenFormatter, allocating a fresh writer per row, then
diffed the result against the row cache and discarded unchanged rows.
The per-frame cost was O(rows) VT serializations plus O(rows) heap
allocations even when a single cell changed, so the UI did far more work
than a plain attach and felt slower than raw SSH.

Two changes:

- `appendTermRow` now formats straight into the caller's buffer via
  `Allocating.fromArrayList`, reusing capacity instead of allocating a
  writer per row.
- A per-row viewport cache stores each row's serialized bytes and reuses
  them when libghostty reports the row unchanged. A row is reused only
  when its row identity (page node and offset) and its dirty bit are
  both unchanged, then the dirty bits are cleared once per frame.

The identity check is required, not just the dirty bit: scrolling the
active screen relocates a visual row onto a different libghostty row
while the row itself stays clean, so a dirty-only cache would reuse
stale bytes. A unit test pins this contract and an integration test
covers heavy scrolling end to end.

Microbenchmark (bench/render.zig, 50x200, c_allocator):

  status quo, all rows:        80267 ns/frame, 150 allocs/frame
  reused buffer, all rows:     76639 ns/frame,   0 allocs/frame
  1 changed row, rest cached:   1760 ns/frame,   0 allocs/frame

Localized updates (typing echo, progress bars, status lines) drop ~46x
per frame; full repaints keep the same serialization cost minus the
per-row allocations.

waitFor("DONE-MARK") could match the echoed command line, which contains
the literal marker, before any loop output rendered. On macOS the wait
returned immediately while the screen held no LINE-N yet, so the
monotonic check saw nothing (expected 200, found -1). Wait on "LINE-200"
instead: the command does not contain that string literally, so the wait
only matches real output.
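
The race can be shown with a plain substring match. This is only an illustration: the shell command below is a hypothetical reconstruction of the kind of loop the test drives, and `wait_for` is a simplified stand-in for the test helper that checks the screen once instead of polling.

```python
def wait_for(screen: str, needle: str) -> bool:
    # Simplified: the real helper presumably polls until the needle appears.
    return needle in screen


# Hypothetical command; note that it contains DONE-MARK literally,
# but LINE-200 only appears once the loop has actually run.
command = 'for i in $(seq 1 200); do echo "LINE-$i"; done; echo DONE-MARK'

# Screen right after the shell echoes the typed command, before any output.
screen_after_echo = "$ " + command + "\n"
premature_match = wait_for(screen_after_echo, "DONE-MARK")
safe_before_output = wait_for(screen_after_echo, "LINE-200")

# Screen once the loop output has rendered.
output = "".join(f"LINE-{i}\n" for i in range(1, 201)) + "DONE-MARK\n"
safe_after_output = wait_for(screen_after_echo + output, "LINE-200")

print(premature_match, safe_before_output, safe_after_output)
# True False True
```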
@kylecarbs kylecarbs merged commit 42c880f into main Jun 13, 2026
5 checks passed
@kylecarbs kylecarbs deleted the perf/ui-viewport-row-cache branch June 13, 2026 20:23
@kylecarbs kylecarbs mentioned this pull request Jun 13, 2026