36 changes: 36 additions & 0 deletions .changeset/linear-cache-and-cwd.md
@@ -0,0 +1,36 @@
---
"@maastrich/hashup": minor
---

Linear-memory cache + `--cwd` CLI flag.

**Linear-memory cache.** `HashupCache.hashes` now stores each file's own
sha256 content hash (one 64-char string) instead of the flattened
transitive hash list. The transitive contribution is reconstructed at
combine time by walking `cache.deps`. Memory drops from
O(files × avg closure) to O(unique files) — on a real-world run that
previously needed 9 GB of heap, peak RSS is now ~125 MB and wall time
drops from minutes to ~1 s.

**Hash output changes.** The final digest is now
`sha256(concat of each reachable file's content hash, sorted by path)`.
Each unique file contributes exactly once regardless of how many import
paths reach it. Any stored 0.6.x hashes must be re-baselined. As a
welcome side effect: cycles now hash the same regardless of which
member was the entry point.

**`--cwd <dir>` CLI flag.** Run `hashup` as if invoked from the given
directory. Changes where `hashup.json` is discovered, where relative
entry/extras paths resolve, and where `--out` writes. Defaults to
`process.cwd()`.

```bash
hashup --cwd ./packages/app
hashup --cwd ./packages/app src/index.ts -o ../dist/app.hash
```

**Targeted break for direct `hashFile` callers.** Return type is now
`Promise<string | null>` (the file's own hash, or `null` on failure)
instead of `Promise<string[]>`. Callers should use `collectReachable`
to enumerate the transitive set and read each file's hash from
`cache.hashes` at combine time. `hashup()` itself is unchanged.
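
A migration sketch for direct `hashFile` callers (import names taken from
this package's documented API; illustrative, not normative):

```ts
import {
  collectReachable,
  combineHashes,
  createHashupCache,
  createResolver,
  hashFile,
} from "@maastrich/hashup";

const cache = createHashupCache();
const resolver = createResolver();
const entry = "./src/index.ts"; // illustrative path

// Previously `hashFile` returned the flattened transitive hash list.
// Now it returns only the file's own hash (or null on failure), so
// rebuild the transitive set from the cache at combine time.
const self = await hashFile(entry, cache, resolver);
if (self !== null) {
  const files = collectReachable([entry], cache).sort();
  const selfHashes = files
    .map((f) => cache.hashes.get(f))
    .filter((h): h is string => h !== undefined);
  const combined = combineHashes(selfHashes);
}
```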
2 changes: 2 additions & 0 deletions docs/api/hashup.md
@@ -6,6 +6,8 @@ function hashup(entryFile: string, options?: HashupOptions): Promise<HashupResul

Resolves every import reachable from `entryFile` and returns a deterministic
SHA-256 hash covering the entry, its transitive dependencies, and any `extras`.
Each unique file contributes exactly once; the final digest is
`sha256(sorted concat of each reachable file's content hash)`.
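
A minimal usage sketch (the `extras` option and the `hash` field name are
assumptions based on the surrounding docs):

```ts
import { hashup } from "@maastrich/hashup";

// Hash the entry's import graph plus one extra file.
const result = await hashup("./src/index.ts", { extras: ["./package.json"] });
console.log(result.hash); // 64-char sha256 hex digest (field name assumed)
console.log(result.files); // every unique reachable file
```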

## Parameters

49 changes: 30 additions & 19 deletions docs/api/utilities.md
@@ -46,7 +46,7 @@ are excluded.

```ts
interface HashupCache {
hashes: Map<string, string[]>;
hashes: Map<string, string>;
deps: Map<string, string[]>;
}

@@ -55,12 +55,14 @@ function collectReachable(roots: readonly string[], cache: HashupCache): string[
```

An in-memory cache scoped to one consumer's lifetime — not persisted,
not shared across processes. `hashes` stores each file's flattened
hash list; `deps` stores each file's direct resolved dependency paths.
Pass the same `HashupCache` to multiple `hashup()` or `hashFile()` calls
to dedupe work. `collectReachable` walks `deps` iteratively to rebuild
a per-call file list (used internally by `hashup()` to produce
`result.files`).
not shared across processes. `hashes` stores each file's own content
hash (one 64-char sha256 string per file); `deps` stores each file's
direct resolved dependency paths. Memory is linear in the number of
unique files. Pass the same `HashupCache` to multiple `hashup()` or
`hashFile()` calls to dedupe work. `collectReachable` walks `deps`
iteratively (no recursion) to enumerate the transitive closure — used
internally by `hashup()` to produce `result.files` and to fold each
file's content hash into the final digest.
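
A short usage sketch (paths illustrative):

```ts
const cache = createHashupCache();
const resolver = createResolver();
await hashFile("./src/index.ts", cache, resolver);

// Walks cache.deps from the given roots, visiting each file once.
const files = collectReachable(["./src/index.ts"], cache);
```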

## hashFile

@@ -70,15 +72,18 @@ function hashFile(
cache: HashupCache,
resolver: Resolver,
logger?: Logger,
): Promise<string[]>;
): Promise<string | null>;
```

Hashes a file and all its transitive static imports. Results are memoized in
`cache` — pass the same `HashupCache` across multiple calls to dedupe work.
On error (file read or parse failure) the failure is sent through
`logger.warn` and an empty array is returned. `logger` defaults to a silent
logger; build one with [`createLogger`](#createlogger) when you want
diagnostics on stderr.
Hashes a file and recursively populates `cache.hashes` and `cache.deps`
for every non-`node_modules` transitive import. Returns the file's own
content hash on success, or `null` if the file could not be read or
parsed. The transitive contribution is reconstructed at combine time by
walking `cache.deps` — `hashFile` never returns the flattened list.
Results are memoized in `cache` — pass the same `HashupCache` across
multiple calls to dedupe work. `logger` defaults to a silent logger;
build one with [`createLogger`](#createlogger) when you want diagnostics
on stderr.
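
For example (a sketch; the `"warn"` level name is an assumption, `"silent"`
is the documented default):

```ts
const cache = createHashupCache();
const resolver = createResolver();
const logger = createLogger("warn"); // diagnostics on stderr; level name assumed

const hash = await hashFile("./src/app.ts", cache, resolver, logger);
if (hash === null) {
  // Read or parse failure: already reported through logger.warn; skip it.
}
```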

Imports that resolve into `node_modules` are treated as opaque: the resolved
path is skipped, its files are never read, and the dependency's own imports
@@ -132,17 +137,23 @@ and hashing the result. Order-sensitive — pass hashes in a stable order.
## Composing Your Own Pipeline

```ts
import { combineHashes, createHashupCache, createResolver, hashFile } from "@maastrich/hashup";
import {
collectReachable,
combineHashes,
createHashupCache,
createResolver,
hashFile,
} from "@maastrich/hashup";

const resolver = createResolver();
const cache = createHashupCache();

const entries = ["./src/a.ts", "./src/b.ts"];
const allHashes: string[] = [];

for (const entry of entries) {
allHashes.push(...(await hashFile(entry, cache, resolver)));
await hashFile(entry, cache, resolver);
}

const combined = combineHashes(allHashes);
const files = collectReachable(entries, cache).sort();
const selfHashes = files.map((f) => cache.hashes.get(f)).filter((h) => h !== undefined);
const combined = combineHashes(selfHashes);
```
3 changes: 3 additions & 0 deletions docs/guide/cli.md
@@ -24,6 +24,9 @@ hashup src/index.ts
Prints the hash of `src/index.ts` and its transitive import graph. Flags:

- `-e, --extra <file>` — include an additional file in the hash (repeatable)
- `--cwd <dir>` — run as if invoked from this directory. Changes where
`hashup.json` is discovered and where relative paths resolve. Defaults
to `process.cwd()`.
- `-b, --base-dir <dir>` — base directory for resolution (default: cwd)
- `--json` — emit `{ "hash": "…" }` instead of plain text
- `--files` — include the resolved file list in the JSON output
15 changes: 8 additions & 7 deletions docs/guide/how-it-works.md
@@ -11,8 +11,10 @@
conditional exports, and extension resolution.
3. **Hash each file's content** (SHA-256). Results are cached per absolute path
so a file reachable through multiple paths is hashed once.
4. **Combine all hashes** — the entry's graph plus any `extras` — into a single
deterministic SHA-256 digest.
4. **Combine the unique file hashes**, in sorted-path order, into a single
SHA-256 digest. Every file in the transitive closure contributes exactly
once, regardless of how many import paths reach it — memory stays linear
in the number of unique files, independent of graph width or diamond count
(see the sketch below).
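
A minimal sketch of that combine step, assuming a map from absolute path to
content hash (the helper name is hypothetical):

```ts
import { createHash } from "node:crypto";

// One entry per unique reachable file: path -> sha256 of its bytes.
function combineDigest(contentHashes: Map<string, string>): string {
  const digest = createHash("sha256");
  for (const path of [...contentHashes.keys()].sort()) {
    digest.update(contentHashes.get(path)!);
  }
  return digest.digest("hex");
}
```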

## Determinism

@@ -60,8 +62,7 @@ top-level `"logLevel"` field. The CLI flag wins when both are set.

## Caveats

- **Circular imports** terminate deterministically, but the exact hash of a
cycle depends on which member was the entry point — the cache is seeded
with the entry's content hash first, so cycle re-visits return that
placeholder. Entering the same cycle from a different file produces a
different (still deterministic) hash.
- **Circular imports** terminate deterministically. The cache is seeded with
the file's own content hash before recursing, and each unique file
contributes exactly once to the final digest, so entering the same
cycle from any of its members produces the same hash.
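
For example (a sketch; file names illustrative, and the `hash` result field
is an assumed name): if `a.ts` and `b.ts` import each other, either entry
reaches the same two-file closure, so the digests match.

```ts
import { hashup } from "@maastrich/hashup";

// a.ts and b.ts form a cycle; both entries see the closure {a.ts, b.ts}.
const fromA = await hashup("./src/a.ts");
const fromB = await hashup("./src/b.ts");
// fromA.hash === fromB.hash; entry choice no longer matters
```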
17 changes: 12 additions & 5 deletions src/cli/main.ts
@@ -1,3 +1,4 @@
import { resolve } from "node:path";
import { configJsonSchema } from "../config/json-schema.js";
import { die } from "./die.js";
import { parseCliArgs } from "./parse-args.js";
@@ -19,8 +20,14 @@ export async function main(argv: string[]): Promise<void> {
return;
}

// --cwd is resolved against the real process.cwd() so that relative
// values on the command line behave predictably. Everything else
// (config path, baseDir, output path) resolves against this effective
// cwd, letting a single `--cwd ./packages/app` move the whole run.
const cwd = args.cwd !== undefined ? resolve(process.cwd(), args.cwd) : process.cwd();

if (args.printSchema) {
await writeOutput(process.cwd(), args.out, `${JSON.stringify(configJsonSchema, null, 2)}\n`);
await writeOutput(cwd, args.out, `${JSON.stringify(configJsonSchema, null, 2)}\n`);
return;
}

@@ -30,20 +37,20 @@ export async function main(argv: string[]): Promise<void> {

if (args.positionals.length === 1) {
const output = await runSingleFileMode({
cwd: process.cwd(),
cwd,
file: args.positionals[0]!,
extras: args.extras,
baseDirOverride: args.baseDir,
json: args.json,
files: args.files,
logLevel: args.logLevel,
});
await writeOutput(process.cwd(), args.out, output);
await writeOutput(cwd, args.out, output);
return;
}

const result = await runConfigMode({
cwd: process.cwd(),
cwd,
configPath: args.config,
baseDirOverride: args.baseDir,
json: args.json,
@@ -53,5 +60,5 @@ export async function main(argv: string[]): Promise<void> {
if (!result.ok) {
die(result.error);
}
await writeOutput(process.cwd(), args.out, result.output);
await writeOutput(cwd, args.out, result.output);
}
3 changes: 3 additions & 0 deletions src/cli/parse-args.ts
@@ -5,6 +5,7 @@ export interface CliArgs {
config: string | undefined;
extras: string[];
baseDir: string | undefined;
cwd: string | undefined;
json: boolean;
files: boolean;
help: boolean;
@@ -28,6 +29,7 @@ export function parseCliArgs(argv: string[]): CliArgs {
"print-schema": { type: "boolean", default: false },
out: { type: "string", short: "o" },
"log-level": { type: "string", short: "l" },
cwd: { type: "string" },
},
});

@@ -42,6 +44,7 @@
config: values.config as string | undefined,
extras: (values.extra as string[] | undefined) ?? [],
baseDir: values["base-dir"] as string | undefined,
cwd: values.cwd as string | undefined,
json: values.json === true,
files: values.files === true,
help: values.help === true,
1 change: 1 addition & 0 deletions src/cli/usage.ts
@@ -5,6 +5,7 @@ export const USAGE = `Usage:
Options:
-c, --config <path> Path to config file (default: hashup.json)
-e, --extra <file> Extra file to include (repeatable, single-file mode)
--cwd <dir> Run as if from this directory (default: process.cwd())
-b, --base-dir <dir> Base directory for resolution (default: cwd)
--json Output JSON instead of plain text
--files Include resolved file list in JSON output
14 changes: 8 additions & 6 deletions src/lib/cache.ts
@@ -7,14 +7,16 @@
* computation (the file's content hash is recomputed at most once).
*
* Two parallel maps keyed by absolute file path:
* - `hashes`: the flattened hash list (self + transitive deps).
* Returned directly to callers and combined into the final digest.
* - `deps`: the file's direct resolved dependency paths. Used by
* `collectReachable` to rebuild the per-call file list without
* re-walking the graph.
* - `hashes`: the file's own content hash (sha256 of its bytes).
* One 64-char string per file — not a flattened transitive list,
* because that was O(files × avg closure) and blew out the heap
* on large monorepos. See `hashup()` for how the transitive
* contribution is reconstructed at combine time.
* - `deps`: the file's direct resolved dependency paths. Walked by
* `collectReachable` to enumerate the transitive closure.
*/
export interface HashupCache {
hashes: Map<string, string[]>;
hashes: Map<string, string>;
deps: Map<string, string[]>;
}

41 changes: 21 additions & 20 deletions src/lib/hash-file.ts
@@ -4,54 +4,58 @@ import { createContentHash } from "./create-content-hash.js";
import { extractImports } from "./extract-imports.js";
import { isInNodeModules } from "./is-in-node-modules.js";
import { createLogger, type Logger } from "./logger.js";
import { pushAll } from "./push-all.js";
import { readFileContent } from "./read-file-content.js";
import { resolveImport } from "./resolve-import.js";

/**
* Ensure `file` and every file reachable from it are present in the
* cache. Returns the file's own content hash (sha256 hex) on success,
* or `null` if the file could not be read or parsed — in which case
* callers should skip it. The transitive contribution is reconstructed
* at combine time by walking `cache.deps`.
*
* Terminates deterministically on circular imports: the cache entry is
* seeded with the self hash before recursing, so a cycle A → B → A
* short-circuits on the revisit.
*/
export async function hashFile(
file: string,
cache: HashupCache,
resolver: Resolver,
logger: Logger = createLogger("silent"),
): Promise<string[]> {
): Promise<string | null> {
const cached = cache.hashes.get(file);
if (cached) {
if (cached !== undefined) {
return cached;
}

try {
const content = await readFileContent(file);
const hashes = [createContentHash(content)];
const selfHash = createContentHash(content);
const deps: string[] = [];
// Seed both caches before recursing so circular imports terminate:
// on a cycle A → B → A, the revisit of A hits `cache.hashes` and
// returns the placeholder instead of walking forever.
cache.hashes.set(file, hashes);
cache.hashes.set(file, selfHash);
cache.deps.set(file, deps);

const imports = await extractImports(file, content);
const dependencyHashes = await hashDependencies(imports, file, cache, resolver, logger, deps);
pushAll(hashes, dependencyHashes);
await walkDependencies(imports, file, cache, resolver, logger, deps);

return hashes;
return selfHash;
} catch (error) {
logger.warn(`Failed to hash file ${file}:`, error);
cache.hashes.delete(file);
cache.deps.delete(file);
return [];
return null;
}
}

async function hashDependencies(
async function walkDependencies(
imports: string[],
sourceFile: string,
cache: HashupCache,
resolver: Resolver,
logger: Logger,
deps: string[],
): Promise<string[]> {
const hashes: string[] = [];

): Promise<void> {
for (const imported of imports) {
const resolved = await resolveImport(resolver, sourceFile, imported);
if (!resolved) continue;
@@ -65,9 +69,6 @@ async function hashDependencies(
continue;
}
deps.push(resolved);
const resolvedHashes = await hashFile(resolved, cache, resolver, logger);
pushAll(hashes, resolvedHashes);
await hashFile(resolved, cache, resolver, logger);
}

return hashes;
}