Commit 5c3cae4
authored
brotli-compress .socket.facts.json on upload (#1291)
* refactor: brotli-compress .socket.facts.json on upload
Compress `.socket.facts.json` with brotli at upload time, just before
`fetchCreateOrgFullScan` POSTs the multipart `/full-scans` body. Coana
keeps writing plain JSON; the local readers
(`extractTier1ReachabilityScanId`, `extractReachabilityErrors`) keep
reading plain JSON; only the on-the-wire bytes between socket-cli and
depscan change. depscan transparently decompresses at the api-v0
multipart ingest boundary in a coordinated change.
Why:
* Mono-repo `.socket.facts.json` files routinely exceed 60MB. On a
representative simple-npm fixture, brotli compresses 150,748 bytes
to 19,971 (~86.8% reduction); production mono-repos see a much
larger absolute saving.
* Coana has a second consumer downstream - teaching coana to write
`.br` would break it. Brotli on the wire-only path keeps coana's
contract invariant.
* `tier1ReachabilityScanId` and reachability-error reporting still
read plain JSON locally; no brotli round-trip on those paths.
* Compression is a transport detail owned by the upload site; cleanup
is one `finally` block.
Implementation:
* `src/utils/coana.mts` - new exported
`compressSocketFactsForUpload(scanPaths)` returning `{ paths, cleanup }`.
Per `.socket.facts.json` entry, `mkdtempSync` a fresh `.socket-br-XXX/`
directory inside the source's parent dir (NOT under `os.tmpdir()` -
see below), `brotliCompressSync` bytes to
`${td}/.socket.facts.json.br`, swap the path. Other paths pass through.
Missing facts paths pass through. Cleanup is idempotent with
`force: true`.
* `src/commands/scan/handle-create-new-scan.mts` - wraps the
`fetchCreateOrgFullScan` call in
`try { compress; upload; } finally { cleanup(); }`. Cleanup runs on
success, throw, or any abort path.
Why the temp lives next to the source:
The SDK computes `path.relative(cwd, brPath)` for the multipart name
field. depscan's multipart ingest (`addStreamEntry`) checks
`containsTraversal(unixified)` and bails on any `..` segment. A
tmpdir-based path resolves to `../../../var/folders/...`, gets dropped
to `unmatchedFiles`, and the SocketFacts content never lands in the
scan. Putting the temp inside `path.dirname(originalFactsPath)`
produces a relative path like `.socket-br-XXX/.socket.facts.json.br` -
traversal-free, ingests cleanly.
Tests:
* `src/utils/coana.test.mts` - 16 cases.
- `compressSocketFactsForUpload` x 5: round-trip JSON via
`brotliDecompressSync`, basename swap to `.socket.facts.json.br`,
non-facts paths pass through, missing facts paths pass through,
cleanup idempotent, mixed-entry order. Pins the contract that the
temp lives in a subdir of the source's parent (traversal-free).
- `extractTier1ReachabilityScanId` x 7: plain JSON, missing file,
missing field, null, empty/blank, trim, numeric coercion.
- `extractReachabilityErrors` x 4: extraction, missing file,
no-components, no-inner-reachability.
This change requires the matching depscan multipart decompression
patch on the receiving side; that change ships first.
* fix: write brotli .br as sibling of source, not in temp subdir
Previous form wrapped each `.br` in a per-scan `mkdtempSync` subdir
under the source dir for concurrency isolation. That created a
directory-handling asymmetry on depscan's side: a wire path of
`dirA/.socket.facts.json.br` flattened to `.socket.facts.json` at
root via depscan's basename-strip, while plain `dirA/.socket.facts.json`
preserved the dir.
depscan dropped the basename-strip in the corresponding PR; switch
to writing `.br` as a sibling of the source so wire and storage
paths match for both branches. Net effect: a brotli upload from
`<source>/.socket.facts.json` lands at `<source>/.socket.facts.json`
in the manifest tarball - identical to plain.
Concurrent-scan note: coana already writes to a single source path,
so the source `.socket.facts.json` itself is racy under concurrent
runs against the same dir; the sibling `.br` doesn't introduce a
new race that wasn't already there.
* `src/utils/coana.mts`: `compressSocketFactsForUpload` writes
`${p}.br` next to the source; `cleanup()` does
`rm(brPath, { force: true })` per file. Drops `mkdtemp` import
that's no longer used.
* `src/utils/coana.test.mts`: directory-shape assertion replaced
with `swappedPath === ${input}.br` (sibling). First test now
also asserts the source survives `cleanup()`. 16 cases.
* fix: clean up partial brotli siblings on compress failure
Per Cursor Bugbot finding on src/utils/coana.mts: if any brotli
pipeline in the `Promise.all` batch rejects, the helper rejected
before returning its `cleanup` callback, so already-completed
sibling `.br` files were orphaned next to their source.
Three changes that together make the helper failure-safe:
* Track the intended `.br` path BEFORE the pipeline starts. Previous
code pushed to `brPaths` after `await pipeline(...)`, so a sibling
that the pipeline started writing but didn't finish was invisible
to cleanup. `rm({ force: true })` no-ops on missing paths, so
tracking the intent is safe.
* Switch the parallel batch from `Promise.all` to `Promise.allSettled`.
With `all`, the first rejection bubbled out while sibling pipelines
were still writing bytes; calling `cleanup()` then would race the
live writes and could `rm` a sibling only to have it re-created
immediately. `allSettled` waits for every pipeline to settle, so
cleanup sees a stable set of files to remove.
* On batch failure, run `cleanup()` internally before re-throwing so
the caller (which only gets a chance to call cleanup when the
helper resolves) doesn't have to. Add `recursive: true` to the rm
call so a defensively-occupied directory at a `.br` path doesn't
halt cleanup partway through the list.
Test added: `removes partial .br siblings if compression fails
mid-batch` — pre-occupies entry B's `.br` destination with a
directory so its `createWriteStream` rejects with EISDIR, then
asserts that entry A's `.br` is not orphaned on disk after the
batch rejects and both source files survive untouched.
* chore(release): 1.1.98
Bump version to 1.1.98 and add changelog entry for the brotli
upload-compression refactor in this PR.1 parent 4c49d6e commit 5c3cae4
5 files changed
Lines changed: 446 additions & 23 deletions
File tree
- src
- commands/scan
- utils
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
13 | 18 | | |
14 | 19 | | |
15 | 20 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
| 18 | + | |
18 | 19 | | |
19 | 20 | | |
20 | 21 | | |
| |||
279 | 280 | | |
280 | 281 | | |
281 | 282 | | |
282 | | - | |
283 | | - | |
284 | | - | |
285 | | - | |
286 | | - | |
287 | | - | |
288 | | - | |
289 | | - | |
290 | | - | |
291 | | - | |
292 | | - | |
293 | | - | |
294 | | - | |
295 | | - | |
296 | | - | |
297 | | - | |
298 | | - | |
299 | | - | |
300 | | - | |
301 | | - | |
302 | | - | |
303 | | - | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
304 | 317 | | |
305 | 318 | | |
306 | 319 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
6 | 11 | | |
7 | 12 | | |
8 | 13 | | |
| |||
11 | 16 | | |
12 | 17 | | |
13 | 18 | | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
14 | 25 | | |
15 | 26 | | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
16 | 118 | | |
17 | 119 | | |
18 | 120 | | |
| |||
0 commit comments