
fix: make LCM inline content retrievable instead of producing file-not-found errors #6

Merged
rabsef-bicrym merged 18 commits into dev from fix/lcm-inline-content-path-resolution
Feb 26, 2026

@belisarius222 commented Feb 25, 2026

What

When LCM stores large tool outputs inline (in PostgreSQL rather than on disk), the tool output label (e.g. tool_output_tasks_toolu_019gvY64AV8m8iDyAiyUTkY4) was being stored as original_path. The model then tried to Read that label as a file path, hitting "File not found" errors and losing access to the stored content entirely.

This PR introduces storage_kind as a first-class discriminant on LCM's large_files table, adds an lcm_read tool so the model can retrieve stored content by file ID, and hardens the surrounding infrastructure.

Why

Users were seeing the model fail to read its own stored context — it would lcm_describe a file, get back what looked like a path, then Read it and get a file-not-found error. The root cause was that inline content had no way to distinguish itself from path-based content, so the model treated everything as disk files.

Changes

  • Add storage_kind column (path | inline_text | inline_binary) to large_files with full migration
  • Add lcm_read tool for retrieving stored LCM content by file ID
  • Mirror storage_kind migration to per-user tenant schemas
  • Make check-constraint migration atomic and idempotent
  • Add typed LcmToolMetadata interface for lcm_read ↔ processor communication
  • Guard against processor re-storing lcm_read output back into LCM
  • Replace CJS require("crypto") with ESM import (Bun transpiler workaround)
  • Cap lcm_read max_bytes at 100MB
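A minimal sketch of how the storage_kind discriminant might look on the TypeScript side, assuming rows are mapped to a discriminated union. The three kind values come from this PR; the field names (fileId, originalPath, contentText, contentBytes) and the describeStorage helper are hypothetical, chosen only to show how lcm_describe can avoid presenting a label as a disk path.

```typescript
// Hypothetical row shapes keyed on storage_kind (kind values from this PR).
interface PathFile {
  storageKind: "path";
  fileId: string;
  originalPath: string; // a real on-disk path
}

interface InlineTextFile {
  storageKind: "inline_text";
  fileId: string;
  originalPath: string | null; // at most a label, never resolved against the CWD
  contentText: string;
}

interface InlineBinaryFile {
  storageKind: "inline_binary";
  fileId: string;
  originalPath: string | null;
  contentBytes: Uint8Array;
}

type LargeFile = PathFile | InlineTextFile | InlineBinaryFile;

// Hypothetical helper: phrase each row so the model only sees a path when
// there is actually a file on disk behind it.
function describeStorage(file: LargeFile): string {
  switch (file.storageKind) {
    case "path":
      return `stored on disk at ${file.originalPath}`;
    case "inline_text":
      return `stored inline as text (${file.contentText.length} chars); retrieve with lcm_read using file ID ${file.fileId}`;
    case "inline_binary":
      return `stored inline as binary (${file.contentBytes.byteLength} bytes); retrieve with lcm_read using file ID ${file.fileId}`;
    default: {
      // Compile-time exhaustiveness check: a new kind must be handled above.
      const unreachable: never = file;
      return unreachable;
    }
  }
}
```

Because the switch is exhaustive over storageKind, adding a fourth kind later becomes a compile error at every routing site rather than a silent fallthrough to path handling.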

Testing

  • bun typecheck passes across all 12 packages
  • generateFileId and generateBinaryFileId produce correct hashes at runtime
  • Inline content round-trips through lcm_read without "File not found" errors

When large tool outputs were stored inline in LCM, metadata labels like
"tool_output_tasks_toolu_*" were saved as original_path in the database.
This caused the AI to attempt reading them as files, resolving against
the project CWD and producing "File not found" errors.

Set original_path to null for inline content, clarify lcm_describe
output for inline entries, and remove misleading Read tool instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

What's the effect of this being null?

Good question — nulling original_path was the initial fix but it was too blunt. The real issue was that LCM had no way to distinguish inline content (stored in PostgreSQL) from path-based content (stored on disk). Setting the path to null stopped the model from trying to Read a nonsense path, but it also left the model with no way to retrieve the content at all.

This PR supersedes that approach: we added a storage_kind column (path | inline_text | inline_binary) as a proper discriminant, plus an lcm_read tool that lets the model retrieve stored content by file ID regardless of storage kind. So original_path is no longer nulled — it's just not used for inline content routing anymore.

rabsef-bicrym and others added 17 commits February 25, 2026 12:49
- Add a storage_kind model for large_files (path, inline_text, inline_binary) and migrate existing rows safely.

- Relax original_path nullability for inline payloads and enforce row shape via a storage constraint.

- Update insert/retrieval paths to write and read by storage_kind instead of fake path semantics.

- Improve lcm_describe output to show storage mode explicitly and avoid path confusion.

- Update large-user-text integration expectations for inline storage metadata.
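The migration steps in this commit, together with the later "atomic and idempotent" and tenant-schema changes, could be expressed roughly as below. This is a sketch assuming PostgreSQL transactional DDL; the constraint name large_files_storage_shape, the NULL-path backfill heuristic, and the schema names used in the test are hypothetical, and the real constraint may also check the inline payload columns, which this PR does not name.

```typescript
// Quote a PostgreSQL identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build one transactional, re-runnable migration for a single schema.
function buildStorageKindMigration(schema: string): string {
  const table = `${quoteIdent(schema)}.large_files`;
  const constraint = "large_files_storage_shape"; // hypothetical name
  return [
    "BEGIN;",
    // The new column defaults existing rows to 'path'; legacy inline rows need a backfill.
    `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS storage_kind text NOT NULL DEFAULT 'path';`,
    // Inline payloads have no real path, so original_path must allow NULL.
    `ALTER TABLE ${table} ALTER COLUMN original_path DROP NOT NULL;`,
    // Hypothetical backfill heuristic: a row without a path cannot be path-backed.
    `UPDATE ${table} SET storage_kind = 'inline_text' WHERE storage_kind = 'path' AND original_path IS NULL;`,
    // Drop-then-add inside the transaction keeps the constraint swap atomic.
    `ALTER TABLE ${table} DROP CONSTRAINT IF EXISTS ${constraint};`,
    `ALTER TABLE ${table} ADD CONSTRAINT ${constraint} CHECK (` +
      "storage_kind IN ('path', 'inline_text', 'inline_binary') " +
      "AND (storage_kind <> 'path' OR original_path IS NOT NULL));",
    "COMMIT;",
  ].join("\n");
}

// Apply the same migration to the shared schema and every tenant schema.
function buildAllMigrations(schemas: string[]): string[] {
  return schemas.map((schema) => buildStorageKindMigration(schema));
}
```

Running the same script once per schema keeps tenant copies in step, and the IF NOT EXISTS / IF EXISTS guards let a re-run succeed instead of failing on an already-present column or constraint.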
- Large tool outputs and pasted user text stored in LCM had no retrieval
  path exposed to the model. The model would try Read (a filesystem tool)
  with a database ID, get "file not found", and re-analyze from scratch,
  burning tokens.
- lcm_read calls the existing getLargeFileContent internal function and
  returns the actual stored payload (inline DB text or disk-backed file).
- Sub-agent gated (same pattern as lcm_expand) to keep main context lean.
- Added lcm_read to explore agent permission allowlist.
- Updated reference messages in large-tool-output, lcm-describe, and
  lcm-expand to direct the model toward lcm_read via Task sub-agents.
- Added TUI render component and hidden-tool registration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
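A simplified sketch of the sub-agent gate described in this commit. The ToolContext shape, the isSubAgent flag, the default byte limit, and the injected lookup function are all assumptions; the lookup stands in for getLargeFileContent and is synchronous here only to keep the example short, whereas a PostgreSQL-backed lookup would normally be async.

```typescript
// Hypothetical context shape; the PR only says lcm_read is sub-agent gated
// like lcm_expand and allowlisted for the explore agent.
interface ToolContext {
  isSubAgent: boolean;
}

interface LcmReadArgs {
  file_id: string;
  max_bytes?: number;
}

// Stand-in for getLargeFileContent: the stored payload (truncated to
// maxBytes), or null when nothing retrievable exists for the ID.
type ContentLookup = (fileId: string, maxBytes: number) => string | null;

const DEFAULT_MAX_BYTES = 1_000_000; // assumption; the PR does not state the default

function lcmRead(ctx: ToolContext, args: LcmReadArgs, lookup: ContentLookup): string {
  // Gate: only sub-agents may pull large payloads, keeping the main context lean.
  if (!ctx.isSubAgent) {
    return "lcm_read is only available inside sub-agents; delegate the read to an explore sub-agent via Task.";
  }
  const content = lookup(args.file_id, args.max_bytes ?? DEFAULT_MAX_BYTES);
  if (content === null) {
    return `No retrievable LCM content for file ID ${args.file_id}.`;
  }
  return content;
}
```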
lcm_read retrieved content correctly from the LCM database, but the
processor's large-output handler immediately re-ingested it (>10k tokens)
and replaced it with a 2k reference stub. The model never saw the content
no matter how many times it called lcm_read.

- lcm_read now sets metadata.lcm.storedInLcm on successful retrieval
- processor skips handleLargeToolOutput when content already came from LCM
- Updated all guidance messages to direct models to explore sub-agents
  (which cannot spawn tasks) instead of general sub-agents (which recurse)

Verified end-to-end: tasks tool output stored as inline_text, explore
sub-agent called lcm_read, got full 50,934 bytes back, no re-storage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
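The re-ingestion guard can be sketched as a single predicate the processor consults before calling its large-output handler. The metadata.lcm.storedInLcm path and the 10k-token threshold come from this commit; the ToolResult shape and the four-characters-per-token estimate are assumptions for illustration.

```typescript
// Metadata set by lcm_read on successful retrieval (field path from this PR;
// the rest of the interface is illustrative).
interface LcmToolMetadata {
  lcm?: { storedInLcm?: boolean };
}

// Hypothetical shape of a finished tool call as the processor sees it.
interface ToolResult {
  output: string;
  metadata: LcmToolMetadata;
}

const LARGE_OUTPUT_TOKENS = 10_000;

// Rough estimate of about 4 characters per token; an assumption, not LCM's counter.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Should the processor hand this output to its large-output handler, which
// stores it in LCM and substitutes a reference stub?
function shouldStoreInLcm(result: ToolResult): boolean {
  if (result.metadata.lcm?.storedInLcm === true) {
    // Already came from LCM: re-storing would replace it with a stub again.
    return false;
  }
  return estimateTokens(result.output) > LARGE_OUTPUT_TOKENS;
}
```

Under this estimate a 50,934-byte payload lands well above the threshold, so without the flag it would be stubbed out on every lcm_read call, which matches the loop described above.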
- Epic volt-156: PR review fixes for lcm-inline-content-path-resolution
- 9 subtasks covering DRY violations, type safety, migration atomicity,
  backward-compat removal, multi-tenant schema gap, and inline_binary handling
- volt-640 blocked on volt-d7e (dependency)
- volt-44e (upgrade revert) assigned to coordinator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove the unused storageKind property from LcmReadMetadata
- Keep the metadata interface aligned with actual lcm_read payloads
- Add a .max(100_000_000) guard to the max_bytes Zod schema
- Keep default byte behavior unchanged while enforcing upper limit
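The PR enforces the cap with Zod's .max(100_000_000) on the max_bytes schema. To keep the example dependency-free, the sketch below hand-rolls an equivalent check; validateMaxBytes and its result type are hypothetical names. As with Zod's max, the bound is inclusive, and leaving max_bytes undefined still falls through to the tool's default.

```typescript
const MAX_BYTES_CAP = 100_000_000; // 100 MB cap from this commit

type MaxBytesResult =
  | { ok: true; value: number | undefined }
  | { ok: false; error: string };

// Hypothetical stand-in for a Zod chain such as
// z.number().max(100_000_000).optional(): undefined passes through so the
// tool's default byte limit still applies.
function validateMaxBytes(input: unknown): MaxBytesResult {
  if (input === undefined) return { ok: true, value: undefined };
  if (typeof input !== "number" || Number.isNaN(input)) {
    return { ok: false, error: "max_bytes must be a number" };
  }
  if (input > MAX_BYTES_CAP) {
    return { ok: false, error: `max_bytes must be at most ${MAX_BYTES_CAP}` };
  }
  return { ok: true, value: input };
}
```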
- Add an explicit inline_binary branch in getLargeFileContent
- Use getLargeFile in lcm_read to distinguish binary vs missing path
- Return a binary-specific user message instead of a moved-file hint
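One way to picture the split this commit describes: getLargeFileContent returns null both for inline_binary rows and for path rows whose file is gone, and lcm_read then looks at the row's storage kind to choose the message. In this sketch the row is passed in directly instead of being fetched by file ID via getLargeFile, and the message wording and field names are hypothetical.

```typescript
import { existsSync, readFileSync } from "fs";

// Hypothetical row shape; field names are illustrative only.
type StoredFile =
  | { storageKind: "path"; fileId: string; originalPath: string }
  | { storageKind: "inline_text"; fileId: string; contentText: string }
  | { storageKind: "inline_binary"; fileId: string; byteLength: number };

// Returns text content, or null when there is nothing readable as text.
function getLargeFileContent(file: StoredFile): string | null {
  if (file.storageKind === "inline_text") return file.contentText;
  // Explicit branch: binary payloads are never decoded as text.
  if (file.storageKind === "inline_binary") return null;
  // Remaining case: path-backed content on disk.
  return existsSync(file.originalPath) ? readFileSync(file.originalPath, "utf8") : null;
}

// lcm_read's reply: on null, inspect the storage kind to pick the message.
function lcmReadReply(file: StoredFile): string {
  const content = getLargeFileContent(file);
  if (content !== null) return content;
  if (file.storageKind === "inline_binary") {
    return `File ${file.fileId} is stored inline as binary data (${file.byteLength} bytes) and cannot be returned as text.`;
  }
  return `File ${file.fileId} could not be read; it may have been moved or deleted from its original path.`;
}
```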
- Bun 1.3.6 has a transpiler bug where adding code to a large namespace
  file can cause `require()` to become undefined in sibling functions
- Replaced both require("crypto") calls in generateFileId and
  generateBinaryFileId with a top-level `import { createHash } from "crypto"`
- This is the correct ESM pattern regardless of the Bun bug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
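The ESM pattern from this commit, applied to a sketch of the two ID helpers. The real digest algorithm, truncation, and input encoding are not stated in the PR; SHA-256 truncated to 16 hex characters is an assumption made only so the example runs.

```typescript
// Top-level ESM import instead of require("crypto") inside each function.
import { createHash } from "crypto";

// Assumption: SHA-256 over the UTF-8 text, truncated to 16 hex chars.
function generateFileId(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 16);
}

// Assumption: same digest over the raw bytes.
function generateBinaryFileId(content: Uint8Array): string {
  return createHash("sha256").update(content).digest("hex").slice(0, 16);
}
```

With these assumptions, hashing the UTF-8 bytes of a string through generateBinaryFileId yields the same ID as generateFileId on that string, since both feed identical bytes to the hash.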
@rabsef-bicrym rabsef-bicrym changed the title fix: prevent LCM inline content labels from being treated as file paths fix: make LCM inline content retrievable instead of producing file-not-found errors Feb 26, 2026
@rabsef-bicrym rabsef-bicrym merged commit f93ed61 into dev Feb 26, 2026
3 of 5 checks passed

3 participants