8 changes: 8 additions & 0 deletions Makefile.toml
@@ -152,6 +152,10 @@ args = [
 "native/macos-host/Sources",
 "scripts/smoke/lib/live-hud-mouse-path.swift",
 "scripts/smoke/lib/mask-probe-capture.swift",
+"scripts/smoke/lib/pasteboard-image-info.swift",
+"scripts/smoke/lib/scroll-background-window.swift",
+"scripts/smoke/lib/scroll-background-command.swift",
+"scripts/smoke/lib/scroll-wheel-burst.swift",
 "scripts/smoke/lib/visual-background-window.swift",
 ]

@@ -167,6 +171,10 @@ args = [
 "native/macos-host/Sources",
 "scripts/smoke/lib/live-hud-mouse-path.swift",
 "scripts/smoke/lib/mask-probe-capture.swift",
+"scripts/smoke/lib/pasteboard-image-info.swift",
+"scripts/smoke/lib/scroll-background-window.swift",
+"scripts/smoke/lib/scroll-background-command.swift",
+"scripts/smoke/lib/scroll-wheel-burst.swift",
 "scripts/smoke/lib/visual-background-window.swift",
 ]

27 changes: 19 additions & 8 deletions README.md
@@ -29,8 +29,8 @@ https://github.com/user-attachments/assets/ff2fe84f-f551-40e8-919c-66ae8a61f8e7
 - In Frozen mode, `Space` copies the current frozen PNG to the clipboard and exits.
 - In Frozen mode, Cmd+S (macOS) / Ctrl+S saves the current PNG to disk and exits.
 - On macOS, Frozen mode can recognize text from the current capture and copy the result to the clipboard from the toolbar.
-- Frozen toolbar tools include pointer, pen, arrow, text, mosaic, spotlight, undo, redo, auto-center,
-OCR, copy, and save.
+- Frozen toolbar tools include pointer, pen, arrow, text, mosaic, spotlight, undo, redo,
+auto-center, OCR, Scroll Capture for dragged-region freezes, copy, and save.
 - `Esc` cancels capture.
 - Glass HUD with Classic Glass fallback and Liquid Glass in release builds on supported macOS.
 - Tab-triggered loupe sample and frozen-mode toolbar for quick action access.
@@ -74,7 +74,10 @@ Prototype / in active development.
 - Menubar and Dock are not included in live window-outline targeting.
 - Windows support is planned (minimum Windows 10), but not implemented yet.
 - The scroll-capture engine, deterministic replay, and benchmark surfaces remain in the repository,
-but the v0.2.1 native-host release does not expose scroll capture in the toolbar.
+but the v0.2.1 native-host release does not expose scroll capture in the toolbar. On this
+development branch, scroll capture uses ordered ScreenCaptureKit region frames, overlay-local
+wheel forwarding, and Rust-owned fail-closed stitching on macOS. Release readiness for broader
+target apps is governed by `docs/runbook/scroll-capture-recovery-plan.md`.

 ## Usage

@@ -122,13 +125,15 @@ After Gatekeeper allows the app to open, continue with Screen Recording permissi

 ### macOS permissions

-Rsnap currently relies on **Screen Recording** permission to capture other apps/windows.
+Rsnap requires **Screen Recording** permission to capture other apps/windows.
 - ScreenCaptureKit live sampling on macOS requires macOS 12.3+ and Screen Recording permission.
 - Normal region/window/monitor capture does not require Accessibility or Input Monitoring.
-- The retained scroll-capture path uses Screen Recording-backed screenshots plus forwarded wheel
-input, but the v0.2.1 native-host release does not expose scroll capture in the toolbar.
+- The retained scroll-capture path uses Screen Recording-backed screenshots plus overlay-local
+wheel forwarding; it does not require Accessibility, Input Monitoring, Accessibility target
+acquisition, app scripting, or browser/DOM access. The v0.2.1 native-host release does not expose
+scroll capture in the toolbar.
 - macOS may describe Screen Recording as `Screen & System Audio Recording` or as direct screen/audio access when Rsnap bypasses the system picker.
-- Settings -> Permissions shows Screen Recording as the only required permission.
+- Settings -> Permissions shows Screen Recording as the required capture permission.
 - Normal native capture depends on Screen Recording; if access is missing, Rsnap opens the Screen Recording page in System Settings and shows a floating drag-to-grant guide.
 - You can reopen the Permissions section from `Settings…` in the tray or menubar menu at any time.
 - Base capture path: `System Settings` -> `Privacy & Security` -> `Screen Recording`.
@@ -163,7 +168,13 @@ Rsnap currently relies on **Screen Recording** permission to capture other apps/

 Scroll capture is temporarily hidden in the v0.2.1 native-host release. The retained Rust
 scroll-capture session, deterministic replay, and benchmark surfaces remain for validation and
-future re-enablement, but users should not expect a `Scroll Capture` toolbar item in this release.
+future re-enablement, but users should not expect a `Scroll Capture` toolbar item in that release.
+
+On this development branch, scroll capture targets dragged-region Frozen capture on macOS. The
+implementation commits downward growth only after ordered-frame pairwise registration plus overlap
+proof, fails closed on weak registration or rewind, and forwards wheel input to target apps through
+one universal path. Follow `docs/runbook/scroll-capture-recovery-plan.md` for release-scope
+validation.

 ## Development

2 changes: 2 additions & 0 deletions docs/decisions/index.md
@@ -49,3 +49,5 @@ Then keep the body decision-oriented:
 screenshot annotation over faithful pointer-path reproduction
 - `docs/decisions/frozen-toolbar-anchor.md` for the stable-anchor layout choice that prevents
 style-capsule expansion from moving the primary Frozen toolbar
+- `docs/decisions/scroll-capture-architecture.md` for the accepted layered scroll-capture target
+architecture based on CleanShot/Xnip/Snagit/Shottr/ScrollSnap prior art and Rsnap live failures
140 changes: 140 additions & 0 deletions docs/decisions/scroll-capture-architecture.md
@@ -0,0 +1,140 @@
# Scroll Capture Architecture

Status: accepted and implemented for the current macOS validation path

Date: 2026-05-10

Context: Earlier scroll-capture attempts passed deterministic tests but failed live use: tearing,
sparse appends after the page had already reached the bottom, false joins after rewind, and a first
frozen toolbar frame dominated by tint. The product target is closer to CleanShot/Xnip behavior:
start from any dragged region over a scrollable app, let scrolling feel native, append only proven
content, and fail closed instead of creating a bad stitched image.

Decision: Rsnap uses one generic product path for macOS Scroll Capture: the focused overlay receives
wheel input inside the selected viewport, forwards that input to the underlying target with the
original wheel magnitude through short all-overlay passthrough windows, samples ordered
ScreenCaptureKit region frames only around input bursts, falls back to a below-overlay region capture
when the live stream does not provide a fresh region, and Rust owns monotonic registration/commit.
Accessibility is not part of the product path; Rsnap does not use Accessibility target acquisition,
settable AX scroll bars, app scripting, browser/DOM access, or a cancellable CGEvent tap.

Consequences: Scroll Capture is no longer limited to apps exposing settable AX scroll bars. It needs
Screen Recording because frames come from ScreenCaptureKit. Rsnap observes wheel input only as an
input, forwarding, and sampling signal. Wheel deltas are not treated as content movement authority,
because trackpad/mouse deltas do not reliably map to viewport pixels.

Supporting research run: `docs/research/scroll-capture-prior-art-2026-05-10.json`.

## Prior-Art Findings

| Source | What matters | Rsnap decision |
| --- | --- | --- |
| ScrollSnap | Open-source Swift uses ScreenCaptureKit region screenshots, overlay mouse passthrough, a repeating capture timer, and Vision translational registration. | Copy the macOS shape: region capture, temporary overlay passthrough, and Vision as an offset proposal. Do not copy its correctness model: it advances the previous frame after failed registration and crops committed output on upward motion. |
| wayscrollshot | Captures the selected region continuously while the user scrolls, skips duplicate signatures, and appends only the new bottom slice after overlap proof. | Copy the universal loop: user scrolls naturally; capture runs continuously; stitching is fail-closed and append-only. |
| ShareX | Uses repeated rectangle captures, configurable scroll methods, duplicate-image stopping, best-match overlap, and bottom-edge ignore for sticky chrome. | Copy the idea that scroll capture is a loop with explicit stop/failure status and overlap search that ignores unstable edges. Do not make a platform-specific message/scrollbar path the only product path. |
| Xnip public guide | Documents same-portion matching and pausing when matching fails; warns about fast, dynamic, nonvertical, and upward scrolls. | Bad input must pause/no-op, not guess. Downward growth is the only committed direction. |
| CleanShot URL API | Exposes scrolling capture as a first-class mode, including auto-scroll parameters. | Product entry should be direct and obvious, but proprietary internals are not evidence for a required AX design. |

## Architecture

### Entry

Scroll Capture starts only from an editable dragged-region frozen capture. The toolbar button and
plain `s` both call the same native entry point. On start:

- freeze the selected region and create the Rust scroll stitch session from that exact first frame;
- switch the frozen toolbar to non-editing scroll-capture state;
- forward overlay-local wheel events through short all-overlay passthrough windows;
- keep a global scroll-wheel observer only as diagnostics/fallback telemetry;
- start the ordered ScreenCaptureKit region-frame sampling loop.

### Sampling

The capture loop drains ordered live region frames by sequence number. It does not debounce to a
single latest frame and does not sample from a stale cache as commit authority. When the live stream
has no ordered frame for the selected region, Swift captures the same region below the overlay and
still sends it through the Rust overlap gate instead of appending blindly. Sampling is bounded to
short windows after scroll input instead of running forever on the main actor, so toolbar clicks and
cancel remain responsive while intermediate repaint states are still observed and rejected or
committed in order.
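
The ordered drain described above can be sketched in Rust as follows. This is a minimal illustration, not the repository's implementation: `RegionFrame`, `OrderedFrameQueue`, and both method names are hypothetical, and the pixel payload is a placeholder. The point it shows is that every buffered frame is handed out in ascending sequence order, rather than collapsing to the newest frame, and that a frame older than anything already drained is refused.

```rust
use std::collections::BTreeMap;

/// Hypothetical region frame; only the sequence number matters for ordering.
#[derive(Debug, Clone, PartialEq)]
pub struct RegionFrame {
    pub seq: u64,
    pub pixels: Vec<u8>,
}

/// Buffers frames that may arrive out of order and releases them strictly by sequence number.
#[derive(Default)]
pub struct OrderedFrameQueue {
    pending: BTreeMap<u64, RegionFrame>,
    last_drained: Option<u64>,
}

impl OrderedFrameQueue {
    /// Accepts a frame unless it is not newer than something already drained.
    pub fn push(&mut self, frame: RegionFrame) -> bool {
        if self.last_drained.map_or(false, |last| frame.seq <= last) {
            return false; // stale: must never become commit authority
        }
        self.pending.insert(frame.seq, frame);
        true
    }

    /// Returns every pending frame in ascending sequence order, not just the newest one.
    pub fn drain_ordered(&mut self) -> Vec<RegionFrame> {
        let frames: Vec<RegionFrame> = std::mem::take(&mut self.pending).into_values().collect();
        if let Some(last) = frames.last() {
            self.last_drained = Some(last.seq);
        }
        frames
    }
}
```

In this sketch a caller would push frames as they arrive during a sampling window and pass each drained frame, in order, to the overlap gate, so intermediate repaint states are still seen and either rejected or committed.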

Wheel events are not movement authority. Overlay-local wheel handling treats each event as an input
signal, forwards the real wheel magnitude to the target while all overlay windows temporarily ignore
mouse events, and samples the resulting ScreenCaptureKit frames. Reverse/upward input may move the
underlying viewport, but it cannot mutate or crop the committed canvas. Forwarded synthetic events
carry a marker so Rsnap can recognize its own events, which prevents feedback loops.
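
A minimal Rust sketch of the marker idea, under stated assumptions: the real forwarding lives in the native host, and `WheelEvent`, `WheelAction`, `handle_wheel`, and the `SYNTHETIC_MARKER` value are all hypothetical names invented for this illustration. It shows a real wheel event being forwarded with its original magnitude and a marker, and the marked copy being ignored if it comes back to the handler.

```rust
/// Hypothetical marker value stamped on events that the overlay itself forwards.
pub const SYNTHETIC_MARKER: i64 = 0x5253_4E50;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct WheelEvent {
    pub delta_y: f64,
    /// User-data style field; real wheel events carry 0 in this sketch.
    pub marker: i64,
}

#[derive(Debug, PartialEq)]
pub enum WheelAction {
    /// Forward a marked copy to the target and open a sampling window.
    ForwardAndSample(WheelEvent),
    /// The event is our own forwarded copy; do nothing.
    Ignore,
}

pub fn handle_wheel(event: WheelEvent) -> WheelAction {
    if event.marker == SYNTHETIC_MARKER {
        return WheelAction::Ignore;
    }
    // Preserve the original magnitude: it is input intent, not pixel motion.
    WheelAction::ForwardAndSample(WheelEvent {
        delta_y: event.delta_y,
        marker: SYNTHETIC_MARKER,
    })
}
```

Note that the handler never converts `delta_y` into a pixel offset; content movement is decided later from captured frames.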

### Registration And Commit

Rust owns all commit decisions:

- Vision pairwise registration can propose downward motion between adjacent ordered frames.
- Pixel overlap corroboration must confirm the proposal before append.
- Growth is monotonic and downward-only.
- A blocked overshot frame does not become the committed frontier.
- Upward motion records rewind/observation state but never crops or mutates committed output.
- After rewind, growth resumes only after the previous committed frontier is reacquired and the
viewport advances beyond it.
- Ambiguous overlap, low-information content, sticky/changing bands, large skipped gaps, and dynamic
repaint states become no-commit states.

Copy/save always exports the same committed canvas shown in the minimap preview.
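
The commit rules above can be illustrated with a simplified Rust sketch. It is an assumption-laden toy, not the worker-pairwise implementation: frames are vectors of grayscale rows, `proposed_dy` stands in for a Vision registration result, and the proposal is taken relative to the last committed frame rather than between adjacent frames. `StitchSession`, `Outcome`, and `offer` are hypothetical names. The sketch shows exact-match overlap corroboration, downward-only growth, and an anchor that never advances on a rejected frame.

```rust
/// One viewport frame as grayscale rows; a stand-in for real pixel buffers.
pub type Frame = Vec<Vec<u8>>;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Committed { new_rows: usize },
    NoCommit(&'static str),
}

pub struct StitchSession {
    anchor: Frame,        // last committed viewport (the committed frontier)
    canvas: Vec<Vec<u8>>, // committed output; it only ever grows
}

impl StitchSession {
    pub fn new(first: Frame) -> Self {
        Self { canvas: first.clone(), anchor: first }
    }

    pub fn canvas_height(&self) -> usize {
        self.canvas.len()
    }

    /// `proposed_dy`: rows the viewport moved down relative to the anchor (hypothetical input).
    pub fn offer(&mut self, frame: Frame, proposed_dy: i64) -> Outcome {
        let h = self.anchor.len();
        if frame.len() != h {
            return Outcome::NoCommit("size mismatch");
        }
        if proposed_dy <= 0 {
            return Outcome::NoCommit("no downward motion"); // rewind is observed, never applied
        }
        let dy = proposed_dy as usize;
        if dy >= h {
            return Outcome::NoCommit("gap too large, no overlap");
        }
        let overlap = h - dy;
        // Pixel corroboration: anchor rows [dy..h] must equal frame rows [0..overlap].
        if self.anchor[dy..] != frame[..overlap] {
            return Outcome::NoCommit("overlap not proven");
        }
        // Low-information guard: an overlap made of identical rows proves nothing.
        if frame[..overlap].iter().all(|row| row == &frame[0]) {
            return Outcome::NoCommit("low information");
        }
        self.canvas.extend_from_slice(&frame[overlap..]);
        self.anchor = frame; // the anchor advances only on commit
        Outcome::Committed { new_rows: dy }
    }
}
```

Because every rejected frame leaves both `anchor` and `canvas` untouched, growth after a rewind can resume only once a frame overlaps the committed frontier again and extends past it. A real implementation would use a tolerance-based overlap score instead of exact row equality.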

### Permissions

Required:

- Screen Recording, because ScreenCaptureKit supplies the capture frames.

Not required for Scroll Capture:

- Accessibility or Input Monitoring;
- AX target acquisition or settable scroll bars;
- target app scripting or browser/DOM access.

## Rejected Options

| Option | Why rejected |
| --- | --- |
| Latest-frame passive sampling | It can drop intermediate frames and append only a tiny tail after the page already reached the bottom. |
| AX-controlled product path | It only works for targets exposing controllable accessibility scroll bars, which violates the goal of one generic mode across arbitrary apps. |
| Permanent overlay passthrough | It makes the toolbar unreachable and loses control of the capture UI. Rsnap instead uses short all-overlay passthrough windows only while forwarding one wheel event. |
| ScrollSnap-style timer as commit clock | A timer is useful as a sampling mechanism, but it must not drive commits: a correct stitcher never advances the comparison anchor after failed registration. |
| Wheel delta as motion authority | Wheel/trackpad delta is input intent, not content motion. It can differ by device, target app, acceleration, rubber-banding, and scroll position. |
| Browser/DOM full-page capture | Useful as a future specialized browser path, but it does not cover arbitrary macOS apps/windows. |

## Implementation Status

- The native host starts Scroll Capture in `manual_universal` mode and emits
`capture.scroll_capture_mode outcome=manual_universal`.
- The native host emits `capture.scroll_input_tap outcome=not_used` and uses overlay-local wheel
forwarding as the release-quality input path.
- Synthetic wheel forwarding preserves the real wheel magnitude for target app feel; wheel magnitude
is never used as pixel motion authority, and reverse/upward viewport motion is observed without
append.
- Rust's worker-pairwise path handles ordered frames, overlap corroboration, overshot blocking, and
rewind/reacquire behavior.
- Settings exposes Screen Recording as the only required permission for Scroll Capture.
- The first frozen toolbar frame is covered by the native visual contract tint check.
- Active scroll-capture toolbar glass keeps the configured Settings tint/material but leaves the
toolbar backing in live HUD-style transparency so Liquid Glass samples the real live content
instead of a frozen/tinted surrogate.

## Validation Contract

Before calling Scroll Capture fixed:

- `scripts/smoke/native-scroll-capture-macos.sh` must pass with `SCROLL_DRIVER=wheel` for keyboard
start and toolbar start.
- `scripts/smoke/native-visual-contract-macos.sh` must pass and keep toolbar tint dominance below
its threshold.
- Rust scroll-capture tests must pass, including overshot and rewind/reacquire cases.
- `cargo make checks` must pass before handoff.

## Sources

- ScrollSnap: https://github.com/Brkgng/ScrollSnap
- wayscrollshot: https://github.com/jswysnemc/wayscrollshot
- ShareX scrolling capture: https://github.com/ShareX/ShareX
- Xnip scrolling capture guide: https://www.xnipapp.com/scrolling-capture/
- CleanShot URL API: https://cleanshot.com/docs-api
6 changes: 6 additions & 0 deletions docs/index.md
@@ -37,6 +37,8 @@ The active split below is by question type, not by human-versus-agent audience.
 `docs/runbook/`
 - Need native-host or Rust telemetry collection and summary steps ->
 `docs/runbook/telemetry-debugging.md`
+- Need to recover macOS scroll capture after live tearing, sparse stitching, or rollback failures
+-> `docs/runbook/scroll-capture-recovery-plan.md`
 - Need the step-by-step execution sequence for a host/core reset slice ->
 `docs/runbook/architecture-reset-implementation.md`
 - Need the active architecture-reset target and migration posture ->
@@ -47,6 +49,10 @@ The active split below is by question type, not by human-versus-agent audience.
 `docs/reference/workspace-layout.md`
 - Need durable rationale for the architecture reset ->
 `docs/decisions/native-host-rust-core-reset.md`
+- Need the accepted layered scroll-capture architecture and prior-art analysis ->
+`docs/decisions/scroll-capture-architecture.md`
+- Need the supporting machine-readable research run for scroll-capture prior-art analysis ->
+`docs/research/scroll-capture-prior-art-2026-05-10.json`
 - Need generic repo gate names -> `Makefile.toml`
 - Need smoke or perf validation entrypoints -> `scripts/smoke/` and `scripts/perf/`
 - Need documentation placement or authoring rules -> `docs/policy.md`
8 changes: 6 additions & 2 deletions docs/policy.md
@@ -4,8 +4,8 @@ Purpose: Define how agent-facing documentation is organized, updated, and kept c
 across this repository.

 Audience: All documentation under `docs/` is written for AI agents and LLM workflows.
-The active split between `spec`, `runbook`, `reference`, and `decisions` is by task shape, not by
-reader type.
+The active split between `spec`, `runbook`, `reference`, `decisions`, and supporting `research`
+artifacts is by task shape, not by reader type.

 ## Principles

@@ -25,6 +25,7 @@ reader type.
 | Runbook | `docs/runbook/` | Which sequence should I execute? | Runbooks, migrations, validation, troubleshooting | Any procedure or operational change |
 | Reference | `docs/reference/` | How is it currently organized or implemented? | Ownership maps, implementation-model notes, non-normative technical context | Any layout, ownership, or current-implementation explanation change |
 | Decisions | `docs/decisions/` | Why was this tradeoff accepted? | Durable rationale for accepted technical or product choices | Any accepted decision with long-lived consequences |
+| Research | `docs/research/` | What evidence supported a bounded investigation? | Machine-readable supporting research runs, not primary behavior authority | Any evidence-backed research run that must remain replayable |

 ## Placement rules

@@ -34,6 +35,9 @@ reader type.
 defining correctness, it belongs in `docs/reference/`.
 - If a document records why a durable tradeoff was accepted, which alternatives were considered,
 and what consequences follow from that choice, it belongs in `docs/decisions/`.
+- If a document records a bounded research method, evidence inventory, challenge, or decision
+finalization, it belongs in `docs/research/` and must link to the authoritative spec, runbook,
+reference, or decision it supports.
 - If a document becomes historical-only and no longer helps execute current work, remove it from
 `docs/` instead of keeping it in the active routing surface.
 - Do not duplicate the same authoritative content across documents. Link to the source