| 1 | +# Humanize All Computer Interaction Endpoints |
| 2 | + |
| 3 | +> Add human-like behavior to all computer interaction API endpoints using fast, pre-computed algorithms that add zero additional xdotool process spawns. |
| 4 | + |
| 5 | +## Performance-First Design Principle |
| 6 | + |
| 7 | +**The bottleneck is xdotool process spawns** (fork+exec per call), not Go-side computation. Every algorithm below is designed around two rules: |
| 8 | + |
| 9 | +1. **One xdotool call per API request** -- pre-compute all timing in Go and bake it into a single chained xdotool command with inline `sleep` directives. This is the same pattern already used by `doDragMouse` (see lines 911-951 of `computer.go`). |
| 10 | +2. **O(1) or O(n) math only** -- uniform random (`rand.Intn`), simple easing polynomials (2-3 multiplies), no lookup tables, no transcendental functions beyond what `mousetrajectory.go` already uses. |
| 11 | + |
| 12 | +```mermaid |
| 13 | +flowchart LR |
| 14 | + Go["Go: pre-compute timing array O(n)"] --> Args["Build xdotool arg slice"] |
| 15 | + Args --> OneExec["Single fork+exec"] |
| 16 | + OneExec --> Done["Done"] |
| 17 | +``` |
| 18 | + |
| 19 | + |
| 20 | + |
| 21 | +### Existing proof this works |
| 22 | + |
| 23 | +`doDragMouse` already chains `mousemove_relative dx dy sleep 0.050 mousemove_relative dx dy sleep 0.050 ...` in a single xdotool invocation. Every strategy below follows this exact pattern. |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## 0. Move Mouse -- Bezier Curve Trajectory (Already Implemented) |
| 28 | + |
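| 29 | +**Status:** Complete. This is the reference implementation for humanized behavior; the other endpoints reuse its jitter and easing ideas but, unlike it, bake their timing into a single xdotool call. |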
| 30 | + |
| 31 | +**Cost:** N xdotool calls (one `mousemove_relative` per trajectory point) with Go-side sleeps. Typically 5-80 steps depending on distance. |
| 32 | + |
| 33 | +**Algorithm:** Bezier curve with randomized control points, distortion, and easing. Ported from Camoufox/HumanCursor. |
| 34 | + |
| 35 | +- **Bezier curve**: 2 random internal knots within an 80px-padded bounding box around start/end. Bernstein polynomial evaluation produces smooth curved path. O(n) computation. |
| 36 | +- **Distortion**: 50% chance per interior point to apply Gaussian jitter (mean=1, stdev=1 via Box-Muller transform). Adds micro-imperfections. |
| 37 | +- **Easing**: `easeOutQuad(t) = -t*(t-2)` -- cursor decelerates as it approaches the target, matching natural human behavior. |
| 38 | +- **Point count**: Auto-computed from path length (`pathLength^0.25 * 20`), clamped to [5, 80]. Override via `Options.MaxPoints`. |
| 39 | +- **Per-step timing**: ~10ms default step delay with +/-2ms uniform jitter. When `duration_ms` is specified, delay is computed as `duration_ms / numSteps`. |
| 40 | +- **Screen clamping**: Trajectory points clamped to screen bounds to prevent X11 delta accumulation errors. |
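The point-count and easing rules above can be sketched in a few lines. This is an illustrative sketch, not the code in `mousetrajectory.go`: `pointCount` and `easedIndices` are hypothetical names, the path length is approximated by the straight-line distance, and the easing is assumed to pick sample indices along an already generated curve, in the style of HumanCursor.

```go
package main

import (
	"fmt"
	"math"
)

// easeOutQuad is the easing from the list above: -t*(t-2), i.e. t*(2-t).
func easeOutQuad(t float64) float64 { return -t * (t - 2) }

// pointCount applies pathLength^0.25 * 20, clamped to [5, 80].
// Assumption: pathLength is approximated by the straight-line distance.
func pointCount(fromX, fromY, toX, toY float64) int {
	length := math.Hypot(toX-fromX, toY-fromY)
	n := int(math.Pow(length, 0.25) * 20)
	if n < 5 {
		return 5
	}
	if n > 80 {
		return 80
	}
	return n
}

// easedIndices picks target indices into a curve of curveLen samples.
// Indices are spread out at the start and bunch up near the end, so the
// cursor covers less distance per step as it approaches the target.
// target must be at least 2 (pointCount never returns less than 5).
func easedIndices(curveLen, target int) []int {
	idx := make([]int, target)
	for i := range idx {
		t := float64(i) / float64(target-1)
		idx[i] = int(easeOutQuad(t) * float64(curveLen-1))
	}
	return idx
}

func main() {
	n := pointCount(0, 0, 40, 30) // straight-line distance 50
	fmt.Println(n, easedIndices(200, n)[:5])
}
```

Because the exponent is 0.25, the count saturates quickly: any move longer than a few hundred pixels already hits the upper clamp of 80.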
| 41 | + |
| 42 | +**Key files:** |
| 43 | + |
| 44 | +- `[server/lib/mousetrajectory/mousetrajectory.go](kernel-images/server/lib/mousetrajectory/mousetrajectory.go)` -- Bezier curve generation (~230 lines) |
| 45 | +- `[server/cmd/api/api/computer.go](kernel-images/server/cmd/api/api/computer.go)` lines 104-206 -- `doMoveMouseSmooth` integration |
| 46 | + |
| 47 | +**API (existing):** `MoveMouseRequest` has `smooth: boolean` (default `true`) and optional `duration_ms` (50-5000ms). |
| 48 | + |
| 49 | +**Implementation in `doMoveMouseSmooth`:** |
| 50 | + |
| 51 | +1. Get current mouse position via `xdotool getmouselocation` |
| 52 | +2. Generate Bezier trajectory: `mousetrajectory.NewHumanizeMouseTrajectoryWithOptions(fromX, fromY, toX, toY, opts)` |
| 53 | +3. Clamp points to screen bounds |
| 54 | +4. For each point: `xdotool mousemove_relative -- dx dy`, then `sleepWithContext` with jittered delay |
| 55 | +5. Modifier keys held via `keydown`/`keyup` wrapper |
| 56 | + |
| 57 | +**Note:** This endpoint uses per-step Go-side sleeps (not xdotool inline `sleep`) because the trajectory includes screen-clamping logic that adjusts deltas at runtime. The other endpoints below use inline `sleep` since their timing can be fully pre-computed. |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## Shared Library: `server/lib/humanize/humanize.go` |
| 62 | + |
| 63 | +Tiny utility package (no external deps, no data structures) providing: |
| 64 | + |
| 65 | +```go |
| 66 | +// UniformJitter returns a random duration in [base-jitter, base+jitter], clamped to min. |
| 67 | +func UniformJitter(rng *rand.Rand, baseMs, jitterMs, minMs int) time.Duration |
| 68 | + |
| 69 | +// EaseOutQuad computes t*(2-t) for t in [0,1]. Two multiplies. |
| 70 | +func EaseOutQuad(t float64) float64 |
| 71 | + |
| 72 | +// SmoothStepDelay maps position i/n through a smoothstep curve to produce |
| 73 | +// a delay in [fastMs, slowMs]. Used for scroll and drag easing. |
| 74 | +// smoothstep(t) = 3t^2 - 2t^3. Three multiplies. |
| 75 | +func SmoothStepDelay(i, n, slowMs, fastMs int) time.Duration |
| 76 | + |
| 77 | +// FormatSleepArg formats a duration as a string suitable for xdotool's |
| 78 | +// inline sleep command (e.g. "0.085"). Avoids fmt.Sprintf per call. |
| 79 | +func FormatSleepArg(d time.Duration) string |
| 80 | +``` |
| 81 | + |
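| 82 | +All functions are side-effect free apart from advancing the caller-supplied `rng`, cost a few arithmetic ops each, and allocate nothing except the short string returned by `FormatSleepArg`. Tested with table-driven tests and deterministic seeds. |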
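A minimal sketch of how these four signatures might be filled in. The bodies are assumptions, not the planned implementation. In particular, `SmoothStepDelay` is assumed to apply smoothstep to the distance from the midpoint so that the ends are slow and the middle is fast, and `newRNG` is a hypothetical helper for deterministic seeding.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"time"
)

// newRNG is a hypothetical helper that returns a deterministically seeded RNG.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// UniformJitter returns a duration drawn uniformly from
// [baseMs-jitterMs, baseMs+jitterMs] milliseconds, never below minMs.
func UniformJitter(rng *rand.Rand, baseMs, jitterMs, minMs int) time.Duration {
	ms := baseMs
	if jitterMs > 0 {
		ms += rng.Intn(2*jitterMs+1) - jitterMs
	}
	if ms < minMs {
		ms = minMs
	}
	return time.Duration(ms) * time.Millisecond
}

// EaseOutQuad computes t*(2-t) for t in [0,1].
func EaseOutQuad(t float64) float64 { return t * (2 - t) }

// SmoothStepDelay returns slowMs at both ends of the sequence and fastMs in
// the middle. Assumption: smoothstep is applied to the distance from the
// midpoint, since smoothstep on its own is monotonic.
func SmoothStepDelay(i, n, slowMs, fastMs int) time.Duration {
	if n <= 1 {
		return time.Duration(slowMs) * time.Millisecond
	}
	d := 2*float64(i)/float64(n-1) - 1 // -1 at the start, 0 mid, +1 at the end
	if d < 0 {
		d = -d
	}
	s := d * d * (3 - 2*d) // smoothstep: 3d^2 - 2d^3
	ms := float64(fastMs) + s*float64(slowMs-fastMs)
	return time.Duration(ms+0.5) * time.Millisecond
}

// FormatSleepArg renders a duration as seconds with three decimals
// (85ms -> "0.085") without going through fmt.
func FormatSleepArg(d time.Duration) string {
	ms := d.Milliseconds()
	if ms < 0 {
		ms = 0
	}
	var buf [16]byte
	b := strconv.AppendInt(buf[:0], ms/1000, 10)
	frac := ms % 1000
	b = append(b, '.', byte('0'+frac/100), byte('0'+frac/10%10), byte('0'+frac%10))
	return string(b)
}

func main() {
	rng := newRNG(42)
	fmt.Println("dwell:", FormatSleepArg(UniformJitter(rng, 90, 30, 50)))
	for i := 0; i < 5; i++ {
		fmt.Print(FormatSleepArg(SmoothStepDelay(i, 5, 80, 15)), " ")
	}
	fmt.Println()
}
```

Passing the `*rand.Rand` in explicitly, rather than using the global source, is what makes deterministic-seed tests possible.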
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +## 1. Click Mouse -- Single-Call Down/Sleep/Up |
| 87 | + |
| 88 | +**Cost:** 1 xdotool call (same as current). Pre-computation: 1-2 `rand.Intn` calls. |
| 89 | + |
| 90 | +**Algorithm:** Replace `click` with `mousedown <btn> sleep <dwell> mouseup <btn>` in the same xdotool arg slice. No separate process spawns. |
| 91 | + |
| 92 | +- **Dwell time**: `UniformJitter(rng, 90, 30, 50)` -> range [60, 120]ms. This matches measured human click dwell without needing lognormal sampling. |
| 93 | +- **Micro-drift**: Append `mousemove_relative <dx> <dy>` between mousedown and mouseup, where dx/dy are `rand.Intn(3)-1` (range [-1, 1] pixels). Trivially cheap. |
| 94 | +- **Multi-click**: For `num_clicks > 1`, loop and insert inter-click gaps via `UniformJitter(rng, 100, 30, 60)` -> [70, 130]ms. |
| 95 | + |
| 96 | +**Single xdotool call example:** |
| 97 | + |
| 98 | +``` |
| 99 | +xdotool mousemove 500 300 mousedown 1 sleep 0.085 mousemove_relative -- 1 0 mouseup 1 |
| 100 | +``` |
| 101 | + |
| 102 | +**API change:** Add `smooth: boolean` (default `true`) to `ClickMouseRequest`. |
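The argument slice for a humanized click could be assembled roughly as follows. This is a sketch under assumptions: `buildClickArgs`, `jitterMs`, `sleepArg`, and `newRNG` are hypothetical names, and the drift move is always emitted, even when it is `0 0`.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

// newRNG is a hypothetical helper that returns a deterministically seeded RNG.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// sleepArg formats milliseconds as seconds with three decimals, e.g. 85 -> "0.085".
func sleepArg(ms int) string { return fmt.Sprintf("%d.%03d", ms/1000, ms%1000) }

// jitterMs draws uniformly from [base-jitter, base+jitter], never below floor.
func jitterMs(rng *rand.Rand, base, jitter, floor int) int {
	ms := base + rng.Intn(2*jitter+1) - jitter
	if ms < floor {
		ms = floor
	}
	return ms
}

// buildClickArgs returns the arguments for one chained xdotool invocation:
// move, then per click: mousedown, dwell, 1px drift, mouseup.
func buildClickArgs(rng *rand.Rand, x, y, button, numClicks int) []string {
	btn := strconv.Itoa(button)
	args := []string{"mousemove", strconv.Itoa(x), strconv.Itoa(y)}
	for i := 0; i < numClicks; i++ {
		if i > 0 {
			// Inter-click gap in [70, 130] ms.
			args = append(args, "sleep", sleepArg(jitterMs(rng, 100, 30, 60)))
		}
		// Dwell in [60, 120] ms while the button is held.
		args = append(args, "mousedown", btn, "sleep", sleepArg(jitterMs(rng, 90, 30, 50)))
		dx, dy := rng.Intn(3)-1, rng.Intn(3)-1 // micro-drift in [-1, 1] px
		args = append(args, "mousemove_relative", "--", strconv.Itoa(dx), strconv.Itoa(dy))
		args = append(args, "mouseup", btn)
	}
	return args
}

func main() {
	args := buildClickArgs(newRNG(1), 500, 300, 1, 2)
	fmt.Println("xdotool " + strings.Join(args, " "))
}
```

The returned slice would be passed to a single `exec.Command("xdotool", args...)`, so a double click still costs one fork+exec.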
| 103 | + |
| 104 | +--- |
| 105 | + |
| 106 | +## 2. Type Text -- Chunked Type with Inter-Word Pauses |
| 107 | + |
| 108 | +**Cost:** 1 xdotool call (same as current). Pre-computation: O(words) random samples. |
| 109 | + |
| 110 | +**Algorithm:** Instead of per-character keysym mapping (which is complex and fragile for Unicode), split text by whitespace/punctuation into chunks and chain `xdotool type --delay <intra> "chunk" sleep <inter>` commands. |
| 111 | + |
| 112 | +- **Intra-word delay**: Per-chunk, pick `rand.Intn(70) + 50` -> [50, 119]ms. Varies per chunk to simulate burst-pause rhythm. |
| 113 | +- **Inter-word pause**: Between chunks, insert `sleep` with `UniformJitter(rng, 140, 60, 60)` -> [80, 200]ms. Longer pauses at sentence boundaries (after `.!?`): multiply by 1.5x. |
| 114 | +- **No bigram tables**: The per-word delay variation is sufficient for convincing humanization. Bigram-level precision adds complexity with diminishing returns for bot detection evasion. |
| 115 | + |
| 116 | +**Single xdotool call example:** |
| 117 | + |
| 118 | +``` |
| 119 | +xdotool type --delay 80 -- "Hello" sleep 0.150 type --delay 65 -- " world." sleep 0.300 type --delay 95 -- " How" sleep 0.120 type --delay 70 -- " are" sleep 0.140 type --delay 85 -- " you?" |
| 120 | +``` |
| 121 | + |
| 122 | +**API change:** Add `smooth: boolean` (default `false`) to `TypeTextRequest`. When `smooth=true`, the existing `delay` field is ignored. |
| 123 | + |
| 124 | +**Why this is fast:** We never leave the `xdotool type` mechanism (which handles Unicode, XKB keymaps, etc. internally). We just break it into chunks with sleeps between them. One fork+exec total. |
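One way the chunking and pause rules could be turned into an argument slice is sketched here. The names `splitChunks`, `buildTypeArgs`, `sleepArg`, and `newRNG` are hypothetical, and the chunk boundary rule (each chunk is a word plus its leading whitespace and trailing punctuation) is an assumption. The sketch only builds the slice; it assumes xdotool accepts chained `type` commands in the form used by the example above, which is worth verifying against the installed xdotool version.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
	"unicode"
)

// newRNG is a hypothetical helper that returns a deterministically seeded RNG.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// sleepArg formats milliseconds as seconds with three decimals, e.g. 150 -> "0.150".
func sleepArg(ms int) string { return fmt.Sprintf("%d.%03d", ms/1000, ms%1000) }

// splitChunks cuts text before each run of whitespace that follows a word,
// so every chunk is a word with its leading whitespace and trailing punctuation.
func splitChunks(text string) []string {
	var chunks []string
	start := 0
	prevSpace := true
	for i, r := range text {
		isSpace := unicode.IsSpace(r)
		if isSpace && !prevSpace {
			chunks = append(chunks, text[start:i])
			start = i
		}
		prevSpace = isSpace
	}
	if start < len(text) {
		chunks = append(chunks, text[start:])
	}
	return chunks
}

// buildTypeArgs chains one `type --delay` per chunk with a sleep between chunks.
func buildTypeArgs(rng *rand.Rand, text string) []string {
	chunks := splitChunks(text)
	args := make([]string, 0, len(chunks)*7)
	for i, c := range chunks {
		intra := rng.Intn(70) + 50 // per-chunk key delay
		args = append(args, "type", "--delay", strconv.Itoa(intra), "--", c)
		if i == len(chunks)-1 {
			break
		}
		pause := 80 + rng.Intn(121) // inter-word pause in [80, 200] ms
		if strings.ContainsRune(".!?", rune(c[len(c)-1])) {
			pause = pause * 3 / 2 // 1.5x after a sentence boundary
		}
		args = append(args, "sleep", sleepArg(pause))
	}
	return args
}

func main() {
	args := buildTypeArgs(newRNG(7), "Hello world. How are you?")
	fmt.Println(strings.Join(args, " | "))
}
```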
| 125 | + |
| 126 | +--- |
| 127 | + |
| 128 | +## 3. Press Key -- Dwell via Inline Sleep |
| 129 | + |
| 130 | +**Cost:** 1 xdotool call (same as current). Pre-computation: 1 `rand.Intn` call. |
| 131 | + |
| 132 | +**Algorithm:** Replace `key <keysym>` with `keydown <keysym> sleep <dwell> keyup <keysym>`. |
| 133 | + |
| 134 | +- **Tap dwell**: `UniformJitter(rng, 95, 30, 50)` -> [65, 125]ms. |
| 135 | +- **Modifier stagger**: When `hold_keys` are present, insert a small `sleep 0.025` between each `keydown` for modifiers, then the primary key sequence. Release in reverse order with the same stagger. This costs zero extra xdotool calls -- it's all in the same arg slice. |
| 136 | + |
| 137 | +**Single xdotool call example (Ctrl+C):** |
| 138 | + |
| 139 | +``` |
| 140 | +xdotool keydown ctrl sleep 0.025 keydown c sleep 0.095 keyup c sleep 0.025 keyup ctrl |
| 141 | +``` |
| 142 | + |
| 143 | +**API change:** Add `smooth: boolean` (default `false`) to `PressKeyRequest`. |
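The modifier stagger and reverse-order release can be sketched as a small builder. `buildKeyArgs`, `jitterMs`, `sleepArg`, and `newRNG` are hypothetical names; the fixed 25 ms stagger follows the description above.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// newRNG is a hypothetical helper that returns a deterministically seeded RNG.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// sleepArg formats milliseconds as seconds with three decimals, e.g. 25 -> "0.025".
func sleepArg(ms int) string { return fmt.Sprintf("%d.%03d", ms/1000, ms%1000) }

// jitterMs draws uniformly from [base-jitter, base+jitter], never below floor.
func jitterMs(rng *rand.Rand, base, jitter, floor int) int {
	ms := base + rng.Intn(2*jitter+1) - jitter
	if ms < floor {
		ms = floor
	}
	return ms
}

// buildKeyArgs presses the modifiers in order with a short stagger, taps the
// primary key with a jittered dwell, then releases the modifiers in reverse.
func buildKeyArgs(rng *rand.Rand, key string, holdKeys []string) []string {
	const staggerMs = 25
	var args []string
	for _, m := range holdKeys {
		args = append(args, "keydown", m, "sleep", sleepArg(staggerMs))
	}
	dwell := jitterMs(rng, 95, 30, 50) // [65, 125] ms
	args = append(args, "keydown", key, "sleep", sleepArg(dwell), "keyup", key)
	for i := len(holdKeys) - 1; i >= 0; i-- {
		args = append(args, "sleep", sleepArg(staggerMs), "keyup", holdKeys[i])
	}
	return args
}

func main() {
	args := buildKeyArgs(newRNG(3), "c", []string{"ctrl"})
	fmt.Println("xdotool " + strings.Join(args, " "))
}
```

Releasing in reverse order keeps the modifiers held for the whole primary key press, which is what an application expects from a shortcut.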
| 144 | + |
| 145 | +--- |
| 146 | + |
| 147 | +## 4. Scroll -- Eased Tick Intervals in One Call |
| 148 | + |
| 149 | +**Cost:** 1 xdotool call (same as current). Pre-computation: O(ticks) easing function evaluations (3 multiplies each). |
| 150 | + |
| 151 | +**Algorithm:** Replace `click --repeat N --delay 0 <btn>` with N individual `click <btn>` commands separated by pre-computed `sleep` values following a **smoothstep easing curve**. |
| 152 | + |
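| 153 | +- **Easing**: `SmoothStepDelay(i, N, slowMs=80, fastMs=15)` for each tick i. Because the smoothstep `3t^2 - 2t^3` is monotonic on [0, 1], it has to be applied to each tick's normalized distance from the midpoint (not to `i/N` directly) to create natural momentum: slow start, fast middle, slow end. |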
| 154 | +- **Jitter**: Add `rand.Intn(10) - 5` ms to each delay. Trivially cheap. |
| 155 | +- **Small scrolls (1-3 ticks)**: Skip easing, use uniform delay of `rand.Intn(40) + 30` ms. |
| 156 | + |
| 157 | +**Single xdotool call example (5 ticks down):** |
| 158 | + |
| 159 | +``` |
| 160 | +xdotool mousemove 500 300 click 5 sleep 0.075 click 5 sleep 0.035 click 5 sleep 0.018 click 5 sleep 0.040 click 5 |
| 161 | +``` |
| 162 | + |
| 163 | +**API change:** Add `smooth: boolean` (default `false`) to `ScrollRequest`. |
| 164 | + |
| 165 | +**Why not per-tick Go-side sleeps?** That would require N separate xdotool calls (N fork+execs). Inline `sleep` achieves the same timing in one process. |
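A sketch of the eased scroll chain under stated assumptions: `buildScrollArgs`, `smoothStepMs`, `sleepArg`, and `newRNG` are hypothetical names, smoothstep is applied to the distance from the midpoint, and for small scrolls the delay is sampled once per gap.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"strconv"
	"strings"
)

// newRNG is a hypothetical helper that returns a deterministically seeded RNG.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// sleepArg formats milliseconds as seconds with three decimals, e.g. 75 -> "0.075".
func sleepArg(ms int) string { return fmt.Sprintf("%d.%03d", ms/1000, ms%1000) }

// smoothStepMs returns slowMs at both ends of the sequence and fastMs in the
// middle by applying smoothstep to the distance from the midpoint.
func smoothStepMs(i, n, slowMs, fastMs int) int {
	if n <= 1 {
		return slowMs
	}
	d := math.Abs(2*float64(i)/float64(n-1) - 1)
	s := d * d * (3 - 2*d)
	return fastMs + int(s*float64(slowMs-fastMs)+0.5)
}

// buildScrollArgs emits one click per tick with an eased sleep between ticks.
// Buttons 4 and 5 are scroll up and down in X11.
func buildScrollArgs(rng *rand.Rand, x, y, button, ticks int) []string {
	btn := strconv.Itoa(button)
	args := []string{"mousemove", strconv.Itoa(x), strconv.Itoa(y)}
	gaps := ticks - 1
	for i := 0; i < ticks; i++ {
		args = append(args, "click", btn)
		if i == gaps {
			break // no sleep after the final tick
		}
		var delay int
		if ticks <= 3 {
			delay = rng.Intn(40) + 30 // small scroll: no easing
		} else {
			delay = smoothStepMs(i, gaps, 80, 15) + rng.Intn(10) - 5
		}
		args = append(args, "sleep", sleepArg(delay))
	}
	return args
}

func main() {
	args := buildScrollArgs(newRNG(5), 500, 300, 5, 5)
	fmt.Println("xdotool " + strings.Join(args, " "))
}
```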
| 166 | + |
| 167 | +--- |
| 168 | + |
| 169 | +## 5. Drag Mouse -- Bezier Path + Eased Delays |
| 170 | + |
| 171 | +**Cost:** Same as current (1-3 xdotool calls for the 3 phases). Pre-computation: Bezier generation (already proven fast in `mousetrajectory.go`). |
| 172 | + |
| 173 | +**Algorithm:** When `smooth=true`, auto-generate the drag path using the existing `mousetrajectory.HumanizeMouseTrajectory` Bezier library, then apply eased step delays (instead of the current fixed `step_delay_ms`). |
| 174 | + |
| 175 | +- **Path generation**: `mousetrajectory.NewHumanizeMouseTrajectoryWithOptions(startX, startY, endX, endY, opts)` -- already O(n) with Bernstein polynomial evaluation. Proven fast. |
| 176 | +- **Eased step delays**: Replace the fixed `stepDelaySeconds` in the Phase 2 xdotool chain with per-step delays from `SmoothStepDelay`. Slow at start (pickup) and end (placement), fast in middle. These are already baked into the single xdotool arg slice, so zero extra process spawns. |
| 177 | +- **Jitter**: Same `rand.Intn(5) - 2` ms pattern already used by `doMoveMouseSmooth`. |
| 178 | + |
| 179 | +**API change:** Add `smooth: boolean` (default `false`) to `DragMouseRequest`. When `smooth=true` and `path` has exactly 2 points (start + end), the server generates a Bezier curve between them and replaces `path` with the generated waypoints. |
| 180 | + |
| 181 | +**No new `start`/`end` fields needed** -- the caller simply provides `path: [[startX, startY], [endX, endY]]` and the server expands it. |
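The Phase 2 chain with eased delays might look like the following sketch. The names `point`, `buildDragMoveArgs`, `smoothStepMs`, `sleepArg`, and `newRNG` are hypothetical, the slow and fast delays are illustrative parameters (the plan does not fix them for drags), and the waypoints stand in for the output of the Bezier generator.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"strconv"
	"strings"
)

// point is a hypothetical waypoint type standing in for the generated path.
type point struct{ X, Y int }

// newRNG is a hypothetical helper that returns a deterministically seeded RNG.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// sleepArg formats milliseconds as seconds with three decimals, e.g. 40 -> "0.040".
func sleepArg(ms int) string { return fmt.Sprintf("%d.%03d", ms/1000, ms%1000) }

// smoothStepMs returns slowMs at both ends and fastMs in the middle
// (smoothstep applied to the distance from the midpoint).
func smoothStepMs(i, n, slowMs, fastMs int) int {
	if n <= 1 {
		return slowMs
	}
	d := math.Abs(2*float64(i)/float64(n-1) - 1)
	s := d * d * (3 - 2*d)
	return fastMs + int(s*float64(slowMs-fastMs)+0.5)
}

// buildDragMoveArgs converts waypoints into relative moves, each followed by
// an eased sleep with +/-2ms jitter: slow at pickup and placement, fast between.
func buildDragMoveArgs(rng *rand.Rand, path []point, slowMs, fastMs int) []string {
	steps := len(path) - 1
	if steps < 1 {
		return nil
	}
	args := make([]string, 0, steps*6)
	for i := 1; i < len(path); i++ {
		dx, dy := path[i].X-path[i-1].X, path[i].Y-path[i-1].Y
		delay := smoothStepMs(i-1, steps, slowMs, fastMs) + rng.Intn(5) - 2
		if delay < 1 {
			delay = 1
		}
		args = append(args, "mousemove_relative", "--",
			strconv.Itoa(dx), strconv.Itoa(dy), "sleep", sleepArg(delay))
	}
	return args
}

func main() {
	path := []point{{100, 100}, {110, 104}, {125, 110}, {140, 113}, {150, 115}}
	fmt.Println(strings.Join(buildDragMoveArgs(newRNG(9), path, 40, 10), " "))
}
```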
| 182 | + |
| 183 | +--- |
| 184 | + |
| 185 | +## Computational Cost Summary |
| 186 | + |
| 187 | + |
| 188 | +| Endpoint | xdotool calls | Pre-computation | Algorithm | |
| 189 | +| ------------- | ---------------- | --------------------------------- | ------------------------------------ | |
| 190 | +| `move_mouse` | O(points) (done) | O(points) Bezier + Box-Muller | Bezier curve + easeOutQuad + jitter | |
| 191 | +| `click_mouse` | 1 (same) | 1-2x `rand.Intn` | Uniform random dwell | |
| 192 | +| `type_text` | 1 (same) | O(words) `rand.Intn` | Chunked type + inter-word sleep | |
| 193 | +| `press_key` | 1 (same) | 1x `rand.Intn` | Inline keydown/sleep/keyup | |
| 194 | +| `scroll` | 1 (same) | O(ticks) smoothstep (3 muls each) | Eased inter-tick sleep | |
| 195 | +| `drag_mouse` | 1-3 (same) | O(points) Bezier (existing) | Bezier path + smoothstep step delays | |
| 196 | + |
| 197 | + |
| 198 | +No additional process spawns. No heap allocations beyond the xdotool arg slice and the short formatted strings it holds. No lookup tables. Every random sample is a single `rand.Intn` or `rand.Float64` call. |
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +## Files to Create/Modify |
| 203 | + |
| 204 | +- **Modify:** `[server/openapi.yaml](kernel-images/server/openapi.yaml)` -- Add `smooth` boolean to 5 request schemas |
| 205 | +- **Modify:** `[server/cmd/api/api/computer.go](kernel-images/server/cmd/api/api/computer.go)` -- Add humanized code paths (branching on `smooth` flag) |
| 206 | +- **Create:** `server/lib/humanize/humanize.go` -- Shared primitives (~50 lines) |
| 207 | +- **Create:** `server/lib/humanize/humanize_test.go` -- Table-driven tests |
| 208 | +- **Regenerate:** OpenAPI-generated types (run code generation after schema changes) |
| 209 | + |
| 210 | +No separate per-endpoint library packages needed. The shared `humanize` package plus the existing `mousetrajectory` package cover everything. |