fix: make LCM inline content retrievable instead of producing file-not-found errors by belisarius222 · Pull Request #6 · Martian-Engineering/volt

belisarius222 · 2026-02-25T04:50:37Z

What

When LCM stores large tool outputs inline (in PostgreSQL rather than on disk), the tool output label (e.g. tool_output_tasks_toolu_019gvY64AV8m8iDyAiyUTkY4) was being stored as original_path. The model then tried to Read that label as a file path, hitting "File not found" errors and losing access to the stored content entirely.

This PR introduces storage_kind as a first-class discriminant on LCM's large_files table, adds an lcm_read tool so the model can retrieve stored content by file ID, and hardens the surrounding infrastructure.

Why

Users were seeing the model fail to read its own stored context — it would lcm_describe a file, get back what looked like a path, then Read it and get a file-not-found error. The root cause was that inline content had no way to distinguish itself from path-based content, so the model treated everything as disk files.

Changes

- Add storage_kind column (path | inline_text | inline_binary) to large_files with full migration
- Add lcm_read tool for retrieving stored LCM content by file ID
- Mirror storage_kind migration to per-user tenant schemas
- Make check-constraint migration atomic and idempotent
- Add typed LcmToolMetadata interface for lcm_read ↔ processor communication
- Guard against processor re-storing lcm_read output back into LCM
- Replace CJS require("crypto") with ESM import (Bun transpiler workaround)
- Cap lcm_read max_bytes at 100MB
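
As a rough illustration of why a storage_kind discriminant helps, here is a minimal TypeScript sketch. Only the three storage_kind values come from this PR; the `LargeFileRow` shape, its field names, and `describeStorage` are hypothetical and are not the actual code in db.ts.

```typescript
// Hypothetical row shapes. Only the three storageKind values come from the PR.
type LargeFileRow =
  | { fileId: string; storageKind: "path"; originalPath: string }
  | { fileId: string; storageKind: "inline_text"; text: string }
  | { fileId: string; storageKind: "inline_binary"; bytes: Uint8Array };

// Describe a row without ever presenting inline content as a readable path.
function describeStorage(row: LargeFileRow): string {
  switch (row.storageKind) {
    case "path":
      return `stored on disk at ${row.originalPath}`;
    case "inline_text":
      return `stored inline as text (${row.text.length} chars); retrieve with lcm_read`;
    case "inline_binary":
      return `stored inline as binary (${row.bytes.byteLength} bytes)`;
  }
}
```

Because the union is keyed on storageKind, the compiler only exposes `originalPath` on the path variant, so an inline label cannot be mistaken for a file path by accident.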

Testing

- bun typecheck passes across all 12 packages
- generateFileId and generateBinaryFileId produce correct hashes at runtime
- Inline content round-trips through lcm_read without "File not found" errors

When large tool outputs were stored inline in LCM, metadata labels like "tool_output_tasks_toolu_*" were saved as original_path in the database. This caused the AI to attempt reading them as files, resolving against the project CWD and producing "File not found" errors. Set original_path to null for inline content, clarify lcm_describe output for inline entries, and remove misleading Read tool instructions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-25T04:50:47Z

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

Open an issue describing the bug/feature (if one doesn't exist)
Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

vishaltandale00 · 2026-02-25T05:02:00Z

packages/voltcode/src/session/lcm/db.ts

What's the effect of this being null?

Good question — nulling original_path was the initial fix but it was too blunt. The real issue was that LCM had no way to distinguish inline content (stored in PostgreSQL) from path-based content (stored on disk). Setting the path to null stopped the model from trying to Read a nonsense path, but it also left the model with no way to retrieve the content at all.

This PR supersedes that approach: we added a storage_kind column (path | inline_text | inline_binary) as a proper discriminant, plus an lcm_read tool that lets the model retrieve stored content by file ID regardless of storage kind. So original_path is no longer nulled — it's just not used for inline content routing anymore.

- Add a storage_kind model for large_files (path, inline_text, inline_binary) and migrate existing rows safely.
- Relax original_path nullability for inline payloads and enforce row shape via a storage constraint.
- Update insert/retrieval paths to write and read by storage_kind instead of fake path semantics.
- Improve lcm_describe output to show storage mode explicitly and avoid path confusion.
- Update large-user-text integration expectations for inline storage metadata.
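
The commit does not spell out the storage constraint, so the following is only a sketch of the kind of row-shape rule it might enforce. The column names `inline_text` and `inline_binary` and the per-kind rules are assumptions; only `storage_kind`, `original_path`, and the three kind values appear in the PR.

```typescript
// Hypothetical column set; the real large_files schema may differ.
interface LargeFileColumns {
  storage_kind: "path" | "inline_text" | "inline_binary";
  original_path: string | null; // nullable for inline payloads, per the commit
  inline_text: string | null;
  inline_binary: Uint8Array | null;
}

// Assumed rule: each storage_kind requires its own payload column to be set.
function hasValidShape(row: LargeFileColumns): boolean {
  switch (row.storage_kind) {
    case "path":
      return row.original_path !== null;
    case "inline_text":
      return row.inline_text !== null;
    case "inline_binary":
      return row.inline_binary !== null;
  }
}
```

In the database the same rule would live in a CHECK constraint, so that a row claiming to be `path` without a path is rejected at write time rather than discovered at read time.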

- Large tool outputs and pasted user text stored in LCM had no retrieval path exposed to the model. The model would try Read (a filesystem tool) with a database ID, get "file not found", and re-analyze from scratch, burning tokens.
- lcm_read calls the existing getLargeFileContent internal function and returns the actual stored payload (inline DB text or disk-backed file).
- Sub-agent gated (same pattern as lcm_expand) to keep main context lean.
- Added lcm_read to explore agent permission allowlist.
- Updated reference messages in large-tool-output, lcm-describe, and lcm-expand to direct the model toward lcm_read via Task sub-agents.
- Added TUI render component and hidden-tool registration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

lcm_read retrieved content correctly from the LCM database, but the processor's large-output handler immediately re-ingested it (>10k tokens) and replaced it with a 2k reference stub. The model never saw the content no matter how many times it called lcm_read.

- lcm_read now sets metadata.lcm.storedInLcm on successful retrieval
- processor skips handleLargeToolOutput when content already came from LCM
- Updated all guidance messages to direct models to explore sub-agents (which cannot spawn tasks) instead of general sub-agents (which recurse)

Verified end-to-end: tasks tool output stored as inline_text, explore sub-agent called lcm_read, got full 50,934 bytes back, no re-storage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
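
The re-storage guard can be sketched roughly as below. `metadata.lcm.storedInLcm` and the 10k-token threshold come from the commit message; the `ToolResult` shape, the `estimateTokens` heuristic, and `shouldStoreInLcm` are hypothetical stand-ins for whatever the processor actually does before calling handleLargeToolOutput.

```typescript
// Assumed metadata shape; only the metadata.lcm.storedInLcm flag is from the commit.
interface LcmToolMetadata {
  lcm?: { storedInLcm?: boolean };
}

interface ToolResult {
  output: string;
  metadata?: LcmToolMetadata;
}

const LARGE_OUTPUT_TOKEN_THRESHOLD = 10_000; // ">10k tokens" in the commit message

// Hypothetical heuristic (about 4 characters per token), for illustration only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Decide whether the processor should move this output into LCM.
function shouldStoreInLcm(result: ToolResult): boolean {
  // Content that was just read out of LCM must not be stored again,
  // otherwise the model only ever sees a reference stub.
  if (result.metadata?.lcm?.storedInLcm === true) return false;
  return estimateTokens(result.output) > LARGE_OUTPUT_TOKEN_THRESHOLD;
}
```

Checking the flag before the size test keeps the guard cheap and makes the exemption independent of how large the retrieved payload is.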

- Epic volt-156: PR review fixes for lcm-inline-content-path-resolution
- 9 subtasks covering DRY violations, type safety, migration atomicity, backward-compat removal, multi-tenant schema gap, and inline_binary handling
- volt-640 blocked on volt-d7e (dependency)
- volt-44e (upgrade revert) assigned to coordinator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Bun 1.3.6 has a transpiler bug where adding code to a large namespace file can cause `require()` to become undefined in sibling functions
- Replaced both require("crypto") calls in generateFileId and generateBinaryFileId with a top-level `import { createHash } from "crypto"`
- This is the correct ESM pattern regardless of the Bun bug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
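
For reference, the top-level import pattern looks roughly like this. The body of `generateFileId` here is a hypothetical sketch: the SHA-256 choice, the `file_` prefix, and the 16-character truncation are assumptions, not the repository's actual ID scheme. `node:crypto` is the prefixed name of the same built-in module as `crypto`.

```typescript
// Top-level ESM import instead of calling require("crypto") inside the function.
import { createHash } from "node:crypto";

// Hypothetical ID scheme: a prefix plus the first 16 hex chars of a SHA-256 digest.
function generateFileId(content: string): string {
  const digest = createHash("sha256").update(content, "utf8").digest("hex");
  return `file_${digest.slice(0, 16)}`;
}
```

Resolving the module once at load time means the function no longer depends on `require` being defined in its enclosing scope, which is the part the Bun bug was breaking.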

github-actions bot added the needs:issue label Feb 25, 2026

vishaltandale00 reviewed Feb 25, 2026

vishaltandale00 approved these changes Feb 25, 2026

rabsef-bicrym and others added 17 commits February 25, 2026 12:49

Revert "refactor: replace upstream upgrade sources with voltropy install script"

b9e12ed

This reverts commit 70a5534.

bump: track upgrade refactor restoration as volt-da4

92ac39a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

volt-156.1: DRY LCM internal tools constant in session TUI

2d3a293

bump: track LongMemEval benchmark as volt-pebble

da5541b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

volt-156.4: remove legacy getLargeFileContent fallback

64d2f17

volt-156.5: make large_files check-constraint migration atomic

3b50403

volt-156.5: skip redundant large_files check rewrites

0be0e1a
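
One way to picture the "atomic" and "skip redundant rewrites" behaviour from the two volt-156.5 commits is a planner that emits the constraint swap as a single transaction and emits nothing when the definition is unchanged. This is a sketch under assumptions: the constraint name, the `planCheckMigration` helper, and the whitespace-only comparison are all hypothetical, and the real migration may detect an up-to-date constraint differently.

```typescript
// Hypothetical constraint name; the PR does not state the real one.
const CONSTRAINT_NAME = "large_files_storage_check";

// Collapse whitespace so formatting differences do not trigger a rewrite.
function normalizeSql(sql: string): string {
  return sql.replace(/\s+/g, " ").trim();
}

// Return the statements to run, or an empty list if nothing needs to change.
function planCheckMigration(existing: string | null, desired: string): string[] {
  if (existing !== null && normalizeSql(existing) === normalizeSql(desired)) {
    return []; // idempotent: the constraint is already up to date
  }
  // Drop and add inside one transaction so the table is never left unconstrained.
  return [
    "BEGIN",
    `ALTER TABLE large_files DROP CONSTRAINT IF EXISTS ${CONSTRAINT_NAME}`,
    `ALTER TABLE large_files ADD CONSTRAINT ${CONSTRAINT_NAME} CHECK (${normalizeSql(desired)})`,
    "COMMIT",
  ];
}
```

PostgreSQL runs DDL transactionally, so if the ADD CONSTRAINT step fails the DROP is rolled back with it.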

volt-156.3: type metadata.lcm escape hatch between lcm_read and processor

64383c7

volt-156.8: add storage_kind migration for tenant schemas

1981723

volt-156.2: remove dead storageKind field from lcm_read metadata

57d2d88

- Remove the unused storageKind property from LcmReadMetadata
- Keep the metadata interface aligned with actual lcm_read payloads

volt-156.7: cap lcm_read max_bytes schema at 100MB

5ad77e0

- Add a .max(100_000_000) guard to the max_bytes Zod schema
- Keep default byte behavior unchanged while enforcing upper limit
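
The effect of the cap can be shown without Zod in a dependency-free sketch. The 100_000_000 limit is from the commit; `DEFAULT_MAX_BYTES` and `resolveMaxBytes` are hypothetical, and the actual default is not stated in this PR.

```typescript
const MAX_BYTES_LIMIT = 100_000_000; // upper bound from the commit
const DEFAULT_MAX_BYTES = 1_000_000; // hypothetical default, not stated in the PR

// Reject out-of-range requests instead of silently clamping them,
// which is how a schema-level maximum behaves.
function resolveMaxBytes(requested?: number): number {
  if (requested === undefined) return DEFAULT_MAX_BYTES;
  if (!Number.isInteger(requested) || requested <= 0) {
    throw new RangeError("max_bytes must be a positive integer");
  }
  if (requested > MAX_BYTES_LIMIT) {
    throw new RangeError(`max_bytes must be at most ${MAX_BYTES_LIMIT}`);
  }
  return requested;
}
```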

volt-156.9: handle inline_binary reads explicitly

2cb9210

- Add an explicit inline_binary branch in getLargeFileContent
- Use getLargeFile in lcm_read to distinguish binary vs missing path
- Return a binary-specific user message instead of a moved-file hint
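
A rough sketch of the read path with an explicit inline_binary branch follows. The `StoredFile` shape, the `readDisk` callback, and all message wording are hypothetical; the point is only that binary content, a missing record, and a missing disk file each get a distinct outcome.

```typescript
type StorageKind = "path" | "inline_text" | "inline_binary";

// Hypothetical record shape standing in for what getLargeFile might return.
interface StoredFile {
  storageKind: StorageKind;
  originalPath?: string;
  inlineText?: string;
}

// readDisk is an injected stand-in for a filesystem read; it returns undefined if the file is gone.
function readStored(
  file: StoredFile | undefined,
  readDisk: (path: string) => string | undefined,
): string {
  if (file === undefined) return "No stored file with that ID.";
  switch (file.storageKind) {
    case "inline_text":
      return file.inlineText ?? "";
    case "inline_binary":
      // Binary gets its own message rather than falling through to the moved-file hint.
      return "This file is stored as binary content and cannot be shown as text.";
    case "path": {
      const content = file.originalPath === undefined ? undefined : readDisk(file.originalPath);
      return content ?? "The file may have been moved or deleted from its original path.";
    }
  }
}
```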

rabsef-bicrym changed the title ~~fix: prevent LCM inline content labels from being treated as file paths~~ fix: make LCM inline content retrievable instead of producing file-not-found errors Feb 26, 2026

rabsef-bicrym merged commit f93ed61 into dev Feb 26, 2026
3 of 5 checks passed

rabsef-bicrym merged 18 commits into dev from fix/lcm-inline-content-path-resolution

belisarius222 commented Feb 25, 2026 • edited by rabsef-bicrym
