perf: reuse unchanged viewport rows in boo ui by kylecarbs · Pull Request #58 · coder/boo

kylecarbs · 2026-06-13T20:07:21Z

What

Before this change, every ~15ms frame in boo ui re-serialized every viewport row through libghostty's ScreenFormatter (allocating a fresh writer per row), then diffed the result against the row cache and discarded unchanged rows. That cost O(rows) VT serializations plus O(rows) heap allocations per frame even when a single cell changed, so the UI did much more work than a plain attach and felt slower than raw SSH, especially for interactive or localized updates (typing echo, progress bars, status lines).

Changes

appendTermRow now formats straight into the caller's buffer via Allocating.fromArrayList, reusing capacity instead of allocating a writer per row.
A per-row viewport cache stores each row's serialized bytes and reuses them when libghostty reports the row unchanged. A row is reused only when its identity (page node + offset) is unchanged and libghostty has not marked it dirty; the dirty bits are then cleared once per frame. boo never used libghostty's dirty tracking before this.

Why identity, not just the dirty bit

Scrolling the active screen relocates a visual row onto a different libghostty row while the row itself stays clean. Probe after a single scroll on a settled screen:

row 0: dirty=false moved=true   <- clean but relocated
row 1: dirty=false moved=true
row 2: dirty=true  moved=true
row 3: dirty=true  moved=true

A dirty-only cache would reuse stale bytes for rows 0-1. Keying reuse on identity + dirty fixes that. Supporting invariants: the existing full-row row_cache still gates what is actually written; the cursor and text selection are drawn as separate overlays so they do not invalidate the base cache; and every viewport-structural change (active scroll, browse-scroll, resize, view switch, C-a l) already forces a full repaint.
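The stale-row case can be reproduced in a small model. The sketch below is Python rather than boo's Zig, and every name in it (`Row`, `Screen`, `compose`, `frame_after_scroll`) is hypothetical. It assumes a storage index can stand in for the (page node, offset) identity, and that a scroll dirties the old bottom row and the new row, which matches the clean/clean/dirty/dirty pattern of the probe.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One stored row; a hypothetical stand-in for a libghostty row."""
    text: str
    dirty: bool = False


class Screen:
    """Stored rows plus a viewport window that slides down on scroll."""

    def __init__(self, lines):
        self.rows = [Row(t) for t in lines]
        self.height = len(lines)
        self.top = 0

    def viewport(self):
        # Return (identity, row) per visual row. The storage index stands in
        # for the (page node, offset) identity described above.
        return [(self.top + y, self.rows[self.top + y]) for y in range(self.height)]

    def scroll_one(self, new_line):
        # Assumption: the old bottom row and the new row are marked dirty,
        # which reproduces the clean/clean/dirty/dirty pattern of the probe.
        self.rows[-1].dirty = True
        self.rows.append(Row(new_line, dirty=True))
        self.top += 1

    def clear_dirty(self):
        for row in self.rows:
            row.dirty = False


def compose(screen, cache, use_identity):
    """Return the serialized viewport, reusing cached bytes where allowed."""
    out = []
    for y, (identity, row) in enumerate(screen.viewport()):
        entry = cache.get(y)
        reusable = (
            entry is not None
            and not row.dirty
            and (not use_identity or entry[0] == identity)
        )
        if not reusable:
            cache[y] = (identity, row.text.encode())
        out.append(cache[y][1])
    screen.clear_dirty()  # once per frame
    return out


def frame_after_scroll(use_identity):
    screen = Screen(["a", "b", "c", "d"])
    cache = {}
    compose(screen, cache, use_identity)  # settle the screen
    screen.scroll_one("e")
    return compose(screen, cache, use_identity)


print(frame_after_scroll(use_identity=False))  # [b'a', b'b', b'd', b'e'] (rows 0-1 stale)
print(frame_after_scroll(use_identity=True))   # [b'b', b'c', b'd', b'e']
```

With the dirty bit alone, visual rows 0 and 1 keep the bytes of the rows that used to sit there. Adding the identity comparison forces them to be re-serialized.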

Benchmark

zig build bench (bench/render.zig, 50x200, c_allocator, same allocator boo uses):

| frame | ns/frame | allocs/frame |
| --- | --- | --- |
| status quo (all rows, per-row alloc) | 80267 | 150 |
| reused buffer (all rows) | 76639 | 0 |
| 1 changed row, rest cached | 1760 | 0 |

Localized-update frames get ~46x cheaper (80.3us -> 1.8us); full repaints (bulk scroll, TUI redraw) keep the same serialization cost minus the 150 allocs/frame.
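The headline ratios follow directly from the benchmark numbers. A short check of the arithmetic, using only the three ns/frame figures reported above (the constant names are made up for this sketch):

```python
# ns/frame figures copied from the benchmark results above.
STATUS_QUO_NS = 80267
REUSED_BUFFER_NS = 76639
ONE_CHANGED_ROW_NS = 1760

# Speedup for a frame where one row changed and the rest are cached.
speedup = STATUS_QUO_NS / ONE_CHANGED_ROW_NS

# Fraction of a full-repaint frame saved by dropping the per-row allocation.
alloc_saving = 1 - REUSED_BUFFER_NS / STATUS_QUO_NS

print(f"localized update: {speedup:.1f}x faster")       # 45.6x
print(f"full repaint: {alloc_saving * 100:.1f}% faster")  # 4.5%
print(f"{STATUS_QUO_NS / 1000:.1f}us -> {ONE_CHANGED_ROW_NS / 1000:.1f}us")  # 80.3us -> 1.8us
```

So the buffer reuse alone is worth a few percent on full repaints, and nearly all of the localized-update gain comes from skipping serialization of unchanged rows.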

Testing

zig build test, zig build test-integration, and zig build test-all -Doptimize=ReleaseSafe all pass; zig fmt --check clean.
New unit test "viewportRowReusable re-serializes a clean row that scrolled away" pins the identity contract and fails if the identity check is removed (verified).
New integration test "ui: scrolling output keeps the viewport in sync with the session" drives heavy scrolling through a real PTY and asserts that the rendered tail is strictly monotonic (a stale reused row would break the ordering).
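The monotonic assertion in the integration test can be pictured roughly as follows. This is a Python sketch, not the test's actual Zig code; the helper names are hypothetical and the `LINE-N` row format is assumed from the markers the test output uses.

```python
import re


def line_numbers(screen_rows):
    """Extract N from every rendered row that contains a LINE-N marker."""
    numbers = []
    for row in screen_rows:
        match = re.search(r"LINE-(\d+)", row)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def strictly_monotonic(numbers):
    """True when each number is strictly greater than the one before it."""
    return all(a < b for a, b in zip(numbers, numbers[1:]))


fresh_tail = ["LINE-197", "LINE-198", "LINE-199", "LINE-200"]
# A stale reused row repeats bytes from the previous frame.
stale_tail = ["LINE-197", "LINE-197", "LINE-199", "LINE-200"]

print(strictly_monotonic(line_numbers(fresh_tail)))  # True
print(strictly_monotonic(line_numbers(stale_tail)))  # False
```

A row that kept its old bytes after a scroll shows up as a repeated or out-of-order number, so a strict ordering check catches it without knowing which row went stale.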

Out of scope

The ~15ms render coalescing interval and the architectural double VT parse (daemon + client) are left untouched.

Implementation plan & decision log

Root cause (`src/ui.zig` `composeFrame`)

Every frame, for every viewport row, appendTermRow runs the ScreenFormatter to re-serialize the row, allocating a fresh Allocating writer, then the bytes are compared to row_cache and discarded if unchanged. PageList.clearDirty was never called, so dirty tracking was unused.

Approach (two independent wins)

Kill the per-row allocation: format directly into the caller's buffer with Allocating.fromArrayList.
Skip re-serializing unchanged rows using libghostty dirty bits + a per-row serialized-bytes cache; reuse iff !full_render && live && identity matches && !pin.isDirty(); clearDirty() once per frame.
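The reuse condition reads naturally as a single predicate. The following is a Python sketch of that condition, not the Zig implementation: `RowIdentity`, `CachedRow`, and `row_reusable` are hypothetical names, and `live` is taken as a plain boolean input because its exact meaning in boo is not spelled out here.

```python
from typing import NamedTuple, Optional


class RowIdentity(NamedTuple):
    """Stand-in for libghostty's (page node, offset) row identity."""
    node: int
    offset: int


class CachedRow(NamedTuple):
    identity: RowIdentity
    data: bytes  # serialized bytes from the last time the row was formatted


def row_reusable(
    full_render: bool,
    live: bool,
    cached: Optional[CachedRow],
    identity: RowIdentity,
    dirty: bool,
) -> bool:
    """reuse iff !full_render && live && identity matches && !dirty."""
    return (
        not full_render
        and live
        and cached is not None
        and cached.identity == identity
        and not dirty
    )


cached = CachedRow(RowIdentity(node=1, offset=7), b"hello")
print(row_reusable(False, True, cached, RowIdentity(1, 7), dirty=False))  # True
print(row_reusable(False, True, cached, RowIdentity(1, 8), dirty=False))  # False: row moved
print(row_reusable(False, True, cached, RowIdentity(1, 7), dirty=True))   # False: row edited
```

Every term is a reason to fall back to re-serializing, so a missing or mismatched cache entry can only cost time, never correctness.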

Why it is correct

Pin.isDirty() is page.dirty or row.dirty. Clearing dirty each frame makes "clean this frame" mean "unchanged since last frame", chaining back to the last serialization.
Active-screen scroll-at-bottom is not a full_render trigger; it is caught by the pin-identity check (each visual row maps to a new (node, offset)). In-place edits set row.dirty (Screen.zig cursor write / clears).
Selection and cursor are separate overlays; the full-row row_cache still gates writes.
boo is single-threaded: output accumulates dirty bits across loop iterations and clearDirty() runs synchronously after compose, so no dirty bit is dropped.
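The accumulate-then-clear argument can be illustrated with a toy single-threaded loop. This is a Python model with hypothetical names (`Terminal`, `compose_frame`); rows never move in it, so the identity check is left out and only the dirty-bit chaining is shown.

```python
class Terminal:
    """Toy terminal: fixed rows, one dirty bit per row."""

    def __init__(self, height):
        self.text = [""] * height
        self.dirty = [False] * height

    def write(self, y, s):
        # In-place edits set the row's dirty bit; repeated writes just keep it set.
        self.text[y] = s
        self.dirty[y] = True

    def clear_dirty(self):
        self.dirty = [False] * len(self.dirty)


def compose_frame(term, cache, stats):
    """Serialize rows that are dirty or uncached, then clear dirty bits once."""
    for y, text in enumerate(term.text):
        if y not in cache or term.dirty[y]:
            cache[y] = text.encode()
            stats["serialized"] += 1
    # Runs synchronously after compose, so no write can land in between.
    term.clear_dirty()
    return [cache[y] for y in range(len(term.text))]


term, cache, stats = Terminal(3), {}, {"serialized": 0}
compose_frame(term, cache, stats)          # first frame: all 3 rows
term.write(1, "50%")                       # two writes between frames...
term.write(1, "100%")
frame = compose_frame(term, cache, stats)  # ...cost one serialization
compose_frame(term, cache, stats)          # idle frame: nothing to do

print(frame)                # [b'', b'100%', b'']
print(stats["serialized"])  # 4
```

Because the bit is only cleared right after the row has been read, "clean this frame" always means "identical to the bytes already in the cache".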

Risks

Relies on libghostty marking every in-place visible mutation dirty (its own renderer depends on this); the integration tests validate this. Scrollback page pruning during heavy streaming does not affect active-screen row nodes, and resize forces a full repaint.

Generated by Coder Agents on behalf of @kylecarbs.

In `boo ui` every ~15ms frame re-serialized every viewport row through libghostty's ScreenFormatter, allocating a fresh writer per row, then diffed the result against the row cache and discarded unchanged rows. The per-frame cost was O(rows) VT serializations plus O(rows) heap allocations even when a single cell changed, so the UI did far more work than a plain attach and felt slower than raw SSH.

Two changes:

- `appendTermRow` now formats straight into the caller's buffer via `Allocating.fromArrayList`, reusing capacity instead of allocating a writer per row.
- A per-row viewport cache stores each row's serialized bytes and reuses them when libghostty reports the row unchanged. A row is reused only when its row identity (page node and offset) and its dirty bit are both unchanged, then the dirty bits are cleared once per frame.

The identity check is required, not just the dirty bit: scrolling the active screen relocates a visual row onto a different libghostty row while the row itself stays clean, so a dirty-only cache would reuse stale bytes. A unit test pins this contract and an integration test covers heavy scrolling end to end.

Microbenchmark (bench/render.zig, 50x200, c_allocator):

  status quo, all rows: 80267 ns/frame, 150 allocs/frame
  reused buffer, all rows: 76639 ns/frame, 0 allocs/frame
  1 changed row, rest cached: 1760 ns/frame

Localized updates (typing echo, progress bars, status lines) drop ~46x per frame; full repaints keep the same serialization cost minus the per-row allocations.

waitFor("DONE-MARK") could match the echoed command line, which contains the literal marker, before any loop output rendered. On macOS the wait returned immediately and the screen held no LINE-N yet, so the monotonic check saw nothing (expected 200, found -1). Wait on "LINE-200" instead, which the command does not contain literally, so the wait only matches real output.
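The race is easy to see once the echoed command line is treated as part of the screen. In this Python sketch the shell command is hypothetical (the real test's command is not shown here) and `wait_would_return` is a stand-in for a substring-based `waitFor`.

```python
# Hypothetical command; it contains DONE-MARK literally but builds each
# LINE-N at runtime, so "LINE-200" never appears in the command text.
COMMAND = "for i in $(seq 1 200); do echo LINE-$i; done; echo DONE-MARK"


def screen_after_echo():
    """Screen contents once the terminal has echoed the typed command."""
    return "$ " + COMMAND + "\n"


def screen_after_output():
    """Screen contents once the loop has actually run."""
    lines = [f"LINE-{i}" for i in range(1, 201)] + ["DONE-MARK"]
    return screen_after_echo() + "\n".join(lines) + "\n"


def wait_would_return(screen_text, needle):
    """A substring-based wait returns as soon as the needle is visible."""
    return needle in screen_text


print(wait_would_return(screen_after_echo(), "DONE-MARK"))   # True: matches the echo
print(wait_would_return(screen_after_echo(), "LINE-200"))    # False: no output yet
print(wait_would_return(screen_after_output(), "LINE-200"))  # True: real output
```

Waiting on text that only the program's output can produce removes the dependence on how quickly the echo and the output arrive.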

kylecarbs added 2 commits June 13, 2026 20:06

kylecarbs merged commit 42c880f into main Jun 13, 2026
5 checks passed

kylecarbs deleted the perf/ui-viewport-row-cache branch June 13, 2026 20:23

kylecarbs mentioned this pull request Jun 13, 2026

chore: release v0.5.18 #59

Merged
