perf: reuse unchanged viewport rows in boo ui #58
Merged
In `boo ui`, every ~15ms frame re-serialized every viewport row through libghostty's ScreenFormatter, allocating a fresh writer per row, then diffed the result against the row cache and discarded unchanged rows. The per-frame cost was O(rows) VT serializations plus O(rows) heap allocations even when a single cell changed, so the UI did far more work than a plain attach and felt slower than raw SSH.

Two changes:

- `appendTermRow` now formats straight into the caller's buffer via `Allocating.fromArrayList`, reusing capacity instead of allocating a writer per row.
- A per-row viewport cache stores each row's serialized bytes and reuses them when libghostty reports the row unchanged. A row is reused only when its row identity (page node and offset) and its dirty bit are both unchanged; the dirty bits are then cleared once per frame.

The identity check is required, not just the dirty bit: scrolling the active screen relocates a visual row onto a different libghostty row while the row itself stays clean, so a dirty-only cache would reuse stale bytes. A unit test pins this contract, and an integration test covers heavy scrolling end to end.

Microbenchmark (bench/render.zig, 50x200, c_allocator):

- status quo, all rows: 80267 ns/frame, 150 allocs/frame
- reused buffer, all rows: 76639 ns/frame, 0 allocs/frame
- 1 changed row, rest cached: 1760 ns/frame

Localized updates (typing echo, progress bars, status lines) drop ~46x per frame; full repaints keep the same serialization cost minus the per-row allocations.
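The reuse rule above can be sketched in a few lines. This is a toy Python model under stated assumptions, not boo's Zig code: `Row`, `ViewportCache`, and the `(node, offset)` tuples are hypothetical stand-ins for libghostty rows and pins, and the PR's `live` condition is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical stand-in for one libghostty row: text plus a dirty bit."""
    text: str
    dirty: bool = True  # rows start dirty, as if just written


@dataclass
class ViewportCache:
    """Per-visual-row cache: visual index -> (row identity, serialized bytes)."""
    entries: dict = field(default_factory=dict)
    serializations: int = 0  # how many times the "formatter" actually ran

    def serialize(self, row: Row) -> bytes:
        # Stand-in for running ScreenFormatter over a single row.
        self.serializations += 1
        return row.text.encode()

    def compose(self, viewport, full_render=False):
        """viewport: list of (identity, Row) pairs, top to bottom.

        identity is a hypothetical (node, offset) tuple naming the
        underlying row that this visual row currently shows.
        """
        frame = []
        for i, (ident, row) in enumerate(viewport):
            cached = self.entries.get(i)
            reusable = (
                not full_render
                and cached is not None
                and cached[0] == ident  # same underlying row as last frame
                and not row.dirty  # and not modified since last frame
            )
            data = cached[1] if reusable else self.serialize(row)
            self.entries[i] = (ident, data)
            frame.append(data)
        # Clear dirty bits once, after the whole frame is composed, so that
        # "clean" next frame means "unchanged since this serialization".
        for _, row in viewport:
            row.dirty = False
        return frame


# Usage: three rows, then a single-cell edit on the middle row.
rows = [Row(f"line {k}") for k in range(3)]
viewport = [((0, k), rows[k]) for k in range(3)]
cache = ViewportCache()
cache.compose(viewport)  # first frame: every row is serialized
rows[1].text, rows[1].dirty = "line 1 edited", True
frame = cache.compose(viewport)  # second frame: only row 1 is serialized
```

After the edit, only one of the three rows goes through the formatter again; a `full_render` frame bypasses the cache entirely, matching the "full repaints keep the same serialization cost" note.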
waitFor("DONE-MARK") could match the echoed command line, which contains
the literal marker, before any loop output rendered. On macOS the wait
returned immediately and the screen held no LINE-N yet, so the monotonic
check saw nothing (expected 200, found -1). Wait on "LINE-200" instead,
which the command does not contain literally, so the wait only matches
real output.
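A small Python model of the race, under assumptions: `COMMAND` is a hypothetical loop (the actual test command is not shown), and `wait_matches` / `monotonic_tail` are illustrative stand-ins for the test's `waitFor` and monotonic check, not the harness's real helpers.

```python
import re

# Hypothetical command: the PR does not show the test's exact command line.
COMMAND = "for i in $(seq 1 200); do echo LINE-$i; done; echo DONE-MARK"


def wait_matches(screen, marker):
    """Model of waitFor(marker): true once any rendered row contains marker."""
    return any(marker in row for row in screen)


def monotonic_tail(screen):
    """Return the last N among rendered "LINE-N" rows, or -1 if none are
    rendered yet; raise if N does not strictly increase down the screen."""
    nums = []
    for row in screen:
        m = re.fullmatch(r"LINE-(\d+)", row.strip())
        if m:
            nums.append(int(m.group(1)))
    if not nums:
        return -1
    if any(b <= a for a, b in zip(nums, nums[1:])):
        raise AssertionError(f"out of order: {nums}")
    return nums[-1]


# Right after the command is typed, the screen holds only its echo.
echoed = ["$ " + COMMAND]
old_wait_done = wait_matches(echoed, "DONE-MARK")  # True: the echo contains it
tail_at_old_wait = monotonic_tail(echoed)          # -1: no LINE-N rendered yet
new_wait_done = wait_matches(echoed, "LINE-200")   # False: not in the echo

# Once the loop's output has rendered, the new wait matches real output.
done = echoed + [f"LINE-{i}" for i in range(160, 201)] + ["DONE-MARK", "$ "]
```

The echoed line contains `LINE-$i` and `seq 1 200`, but never the literal `LINE-200`, which is why the new marker can only be satisfied by loop output.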
What
In `boo ui`, every ~15ms frame re-serialized every viewport row through libghostty's `ScreenFormatter` (allocating a fresh writer per row), then diffed the result against the row cache and discarded unchanged rows. That is `O(rows)` VT serializations + `O(rows)` heap allocations per frame even when a single cell changed, so the UI does much more work than a plain `attach` and feels slower than raw SSH, especially for interactive/localized updates (typing echo, progress bars, status lines).

Changes

- `appendTermRow` now formats straight into the caller's buffer via `Allocating.fromArrayList`, reusing capacity instead of allocating a writer per row.
- A per-row viewport cache stores each row's serialized bytes and reuses them only when the row's identity (page node and offset) and its dirty bit are both unchanged; dirty bits are cleared once per frame.

Why identity, not just the dirty bit
Scrolling the active screen relocates a visual row onto a different libghostty row while the row itself stays clean. Probe after a single scroll on a settled screen:
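A toy Python model of the same situation shows the failure mode. Assumptions: rows are plain dicts, and the hypothetical `"id"` key stands in for libghostty's `(node, offset)` row identity; this is not the real `Pin`/`PageList` API.

```python
def render(cache, rows, check_identity):
    """Compose one frame from rows (dicts with hypothetical "id", "text",
    "dirty" keys), reusing cache[i] = (id, text) for visual row i when allowed."""
    frame = []
    for i, row in enumerate(rows):
        hit = cache.get(i)
        reuse = (hit is not None and not row["dirty"]
                 and (not check_identity or hit[0] == row["id"]))
        text = hit[1] if reuse else row["text"]  # "serialize" on a miss
        cache[i] = (row["id"], text)
        frame.append(text)
    for row in rows:
        row["dirty"] = False  # clear dirty bits once per frame
    return frame


def scroll_by_one(check_identity):
    # A settled 3-row viewport whose rows are all clean.
    screen = [{"id": (0, k), "text": f"LINE-{k}", "dirty": False} for k in range(3)]
    cache = {}
    render(cache, screen[0:3], check_identity)
    # New output appends one dirty row and the viewport scrolls by one.
    screen.append({"id": (0, 3), "text": "LINE-3", "dirty": True})
    return render(cache, screen[1:4], check_identity)


print(scroll_by_one(check_identity=False))  # ['LINE-0', 'LINE-1', 'LINE-3']
print(scroll_by_one(check_identity=True))   # ['LINE-1', 'LINE-2', 'LINE-3']
```

Only the newly written row is dirty, so the dirty-only variant keeps the old bytes for visual rows 0 and 1; comparing identities detects that those visual rows now show different underlying rows.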
A dirty-only cache would reuse stale bytes for rows 0-1. Keying reuse on identity + dirty fixes that. Supporting invariants: the existing full-row `row_cache` still gates what is actually written; the cursor and text selection are drawn as separate overlays, so they do not invalidate the base cache; and every viewport-structural change (active scroll, browse-scroll, resize, view switch, `C-a l`) already forces a full repaint.

Benchmark
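`zig build bench` (bench/render.zig, 50x200, `c_allocator`, the same allocator boo uses):

| Scenario | ns/frame | allocs/frame |
|---|---:|---:|
| status quo, all rows | 80267 | 150 |
| reused buffer, all rows | 76639 | 0 |
| 1 changed row, rest cached | 1760 | |

Localized-update frames drop ~46x (80.3us -> 1.8us); full repaints (bulk scroll, TUI redraw) keep the same serialization cost minus the 150 allocs/frame.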
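The headline ratios follow directly from the per-frame numbers reported for bench/render.zig; this short Python check recomputes them (the constant names are just labels for those reported figures).

```python
# Per-frame figures from bench/render.zig (50x200, c_allocator).
STATUS_QUO_NS = 80267     # all rows serialized, fresh writer per row
REUSED_BUFFER_NS = 76639  # all rows serialized into one reused buffer
ONE_ROW_NS = 1760         # one changed row serialized, the rest cached

localized_speedup = STATUS_QUO_NS / ONE_ROW_NS
full_repaint_saving = 1 - REUSED_BUFFER_NS / STATUS_QUO_NS

print(f"localized update: {localized_speedup:.1f}x")        # 45.6x
print(f"full repaint: {full_repaint_saving:.1%} faster")     # 4.5% faster
```

So "~46x" is the ratio of the status-quo frame to a one-changed-row frame, while dropping the 150 allocations alone buys a few percent on full repaints.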
Testing
`zig build test`, `zig build test-integration`, and `zig build test-all -Doptimize=ReleaseSafe` all pass; `zig fmt --check` is clean.

- `viewportRowReusable re-serializes a clean row that scrolled away` pins the identity contract and fails if the identity check is removed (verified).
- `ui: scrolling output keeps the viewport in sync with the session` drives heavy scrolling through a real PTY and asserts the rendered tail is strictly monotonic (a stale reused row would break ordering).

Out of scope
The ~15ms render coalescing interval and the architectural double VT parse (daemon + client) are left untouched.
Implementation plan & decision log
Root cause (`src/ui.zig`, `composeFrame`)

Every frame, for every viewport row, `appendTermRow` runs the `ScreenFormatter` to re-serialize the row, allocating a fresh `Allocating` writer; the bytes are then compared to `row_cache` and discarded if unchanged. `PageList.clearDirty` was never called, so dirty tracking was unused.

Approach (two independent wins)
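- Format each row directly into the caller's buffer via `Allocating.fromArrayList`.
- Reuse a cached row only when `!full_render && live && identity matches && !pin.isDirty()`; call `clearDirty()` once per frame.

Why it is correct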
- `Pin.isDirty()` is `page.dirty or row.dirty`. Clearing dirty each frame makes "clean this frame" mean "unchanged since last frame", chaining back to the last serialization.
- `full_render` trigger; it is caught by the pin-identity check (each visual row maps to a new `(node, offset)`).
- In-place edits set `row.dirty` (`Screen.zig` cursor write / clears).
- `row_cache` still gates writes.
- `clearDirty()` runs synchronously after compose, so no dirty bit is dropped.

Risks
Generated by Coder Agents on behalf of @kylecarbs.