fix: make LCM inline content retrievable instead of producing file-not-found errors #6
When large tool outputs were stored inline in LCM, metadata labels like "tool_output_tasks_toolu_*" were saved as original_path in the database. This caused the AI to attempt reading them as files, resolving against the project CWD and producing "File not found" errors. Set original_path to null for inline content, clarify lcm_describe output for inline entries, and remove misleading Read tool instructions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
What's the effect of this being null?
Good question — nulling original_path was the initial fix but it was too blunt. The real issue was that LCM had no way to distinguish inline content (stored in PostgreSQL) from path-based content (stored on disk). Setting the path to null stopped the model from trying to Read a nonsense path, but it also left the model with no way to retrieve the content at all.
This PR supersedes that approach: we added a storage_kind column (path | inline_text | inline_binary) as a proper discriminant, plus an lcm_read tool that lets the model retrieve stored content by file ID regardless of storage kind. So original_path is no longer nulled — it's just not used for inline content routing anymore.
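To make the discriminant idea concrete, here is a minimal TypeScript sketch of routing retrieval on `storage_kind` rather than on `original_path`. It assumes a camel-cased row type in which every field name beyond the `storage_kind`/`original_path` concepts (for example `fileId`, `text`, `bytes`) is hypothetical, and the function only describes a retrieval plan instead of touching a real database or disk.

```typescript
// Hypothetical row shape: fileId, text and bytes are illustrative names,
// not the actual large_files schema.
type LargeFileRow =
  | { fileId: string; storageKind: "path"; originalPath: string }
  | { fileId: string; storageKind: "inline_text"; originalPath: string | null; text: string }
  | { fileId: string; storageKind: "inline_binary"; originalPath: string | null; bytes: Uint8Array };

// Route on the discriminant. originalPath is only consulted for "path" rows,
// so a label stored there for an inline row can never be mistaken for a file.
function retrievalPlan(row: LargeFileRow): string {
  switch (row.storageKind) {
    case "path":
      return `read from disk: ${row.originalPath}`;
    case "inline_text":
      return `return ${row.text.length} chars stored in the database`;
    case "inline_binary":
      return `return ${row.bytes.byteLength} bytes stored in the database`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a case fails here.
      const unreachable: never = row;
      return unreachable;
    }
  }
}
```

An inline row can still carry a label such as `tool_output_tasks_toolu_*` in `originalPath`; with this shape that label is simply never read for inline kinds.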
- Add a storage_kind model for large_files (path, inline_text, inline_binary) and migrate existing rows safely.
- Relax original_path nullability for inline payloads and enforce row shape via a storage constraint.
- Update insert/retrieval paths to write and read by storage_kind instead of fake path semantics.
- Improve lcm_describe output to show storage mode explicitly and avoid path confusion.
- Update large-user-text integration expectations for inline storage metadata.
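The "enforce row shape via a storage constraint" bullet can be pictured as a per-kind invariant: each `storage_kind` has exactly one populated payload location. The sketch below expresses one plausible version of that rule in TypeScript; the payload columns `content_text` and `content_bytes` are assumptions, and the real database constraint in the migration may be shaped differently.

```typescript
// Hypothetical flat row as read from the database. content_text and
// content_bytes are assumed column names, not the real schema.
interface RawLargeFileRow {
  storage_kind: string;
  original_path: string | null;
  content_text: string | null;
  content_bytes: Uint8Array | null;
}

// Returns null when the row is well-formed, otherwise a reason string.
// Note that original_path may be null for inline kinds (nullability relaxed).
function violatesStorageShape(row: RawLargeFileRow): string | null {
  switch (row.storage_kind) {
    case "path":
      return row.original_path !== null && row.content_text === null && row.content_bytes === null
        ? null
        : "path rows need original_path and no inline payload";
    case "inline_text":
      return row.content_text !== null && row.content_bytes === null
        ? null
        : "inline_text rows need content_text only";
    case "inline_binary":
      return row.content_bytes !== null && row.content_text === null
        ? null
        : "inline_binary rows need content_bytes only";
    default:
      return `unknown storage_kind: ${row.storage_kind}`;
  }
}
```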
- Large tool outputs and pasted user text stored in LCM had no retrieval path exposed to the model. The model would try Read (a filesystem tool) with a database ID, get "file not found", and re-analyze from scratch, burning tokens.
- lcm_read calls the existing getLargeFileContent internal function and returns the actual stored payload (inline DB text or disk-backed file).
- Sub-agent gated (same pattern as lcm_expand) to keep main context lean.
- Added lcm_read to explore agent permission allowlist.
- Updated reference messages in large-tool-output, lcm-describe, and lcm-expand to direct the model toward lcm_read via Task sub-agents.
- Added TUI render component and hidden-tool registration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
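Sub-agent gating can be thought of as a per-agent tool allowlist in which the main agent simply does not have `lcm_read`. This sketch uses hypothetical agent kinds and tool sets; it illustrates the idea and is not the project's actual permission configuration.

```typescript
// Hypothetical agent kinds and allowlists (illustrative only).
type AgentKind = "main" | "explore" | "general";

const ALLOWLIST: Record<AgentKind, ReadonlySet<string>> = {
  // Main agent: no lcm_read, so large retrieved payloads never enter its context.
  main: new Set(["read", "task", "lcm_describe"]),
  // Explore sub-agent: may read LCM content but cannot spawn further tasks.
  explore: new Set(["read", "grep", "lcm_describe", "lcm_expand", "lcm_read"]),
  general: new Set(["read", "task", "lcm_describe"]),
};

function canUseTool(agent: AgentKind, tool: string): boolean {
  return ALLOWLIST[agent].has(tool);
}
```

Leaving `task` out of the explore list matches the rationale given in this PR for routing retrieval through explore sub-agents: they cannot spawn tasks, so retrieval does not recurse.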
lcm_read retrieved content correctly from the LCM database, but the processor's large-output handler immediately re-ingested it (>10k tokens) and replaced it with a 2k reference stub. The model never saw the content no matter how many times it called lcm_read.
- lcm_read now sets metadata.lcm.storedInLcm on successful retrieval
- processor skips handleLargeToolOutput when content already came from LCM
- Updated all guidance messages to direct models to explore sub-agents (which cannot spawn tasks) instead of general sub-agents (which recurse)
Verified end-to-end: tasks tool output stored as inline_text, explore sub-agent called lcm_read, got full 50,934 bytes back, no re-storage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
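The re-ingestion fix boils down to one early return in the large-output check. A minimal sketch, assuming a simplified tool-result type and a rough 4-characters-per-token estimate; only the `metadata.lcm.storedInLcm` flag and the roughly 10k-token threshold come from the description above, and `shouldStoreInLcm` is a hypothetical name.

```typescript
// Simplified tool result; only metadata.lcm.storedInLcm is taken from the PR.
interface ToolResult {
  tool: string;
  output: string;
  metadata?: { lcm?: { storedInLcm?: boolean } };
}

// Threshold from the description: outputs over ~10k tokens get stored in LCM.
const LARGE_OUTPUT_TOKENS = 10_000;

// Crude estimate (~4 chars per token); an assumption, not LCM's real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function shouldStoreInLcm(result: ToolResult): boolean {
  // Content that was just read out of LCM must pass through untouched;
  // otherwise it is swapped for a reference stub and the model never sees it.
  if (result.metadata?.lcm?.storedInLcm === true) return false;
  return estimateTokens(result.output) > LARGE_OUTPUT_TOKENS;
}
```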
- Epic volt-156: PR review fixes for lcm-inline-content-path-resolution
- 9 subtasks covering DRY violations, type safety, migration atomicity, backward-compat removal, multi-tenant schema gap, and inline_binary handling
- volt-640 blocked on volt-d7e (dependency)
- volt-44e (upgrade revert) assigned to coordinator
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…all script" This reverts commit 70a5534.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove the unused storageKind property from LcmReadMetadata
- Keep the metadata interface aligned with actual lcm_read payloads
- Add a .max(100_000_000) guard to the max_bytes Zod schema
- Keep default byte behavior unchanged while enforcing upper limit
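Zod's `.max()` on a number is inclusive and rejects out-of-range input rather than clamping it. The plain-TypeScript stand-in below mimics that behavior without depending on Zod; `resolveMaxBytes` and its `defaultBytes` parameter are hypothetical, since the commit only states that default behavior is unchanged.

```typescript
// Upper bound from the commit: .max(100_000_000) on the max_bytes schema.
const MAX_BYTES_LIMIT = 100_000_000;

// Hypothetical helper: falls back to the default when max_bytes is omitted,
// and rejects (rather than clamps) values above the inclusive limit.
function resolveMaxBytes(requested: number | undefined, defaultBytes: number): number {
  if (requested === undefined) return defaultBytes;
  if (requested > MAX_BYTES_LIMIT) {
    throw new RangeError(`max_bytes must be <= ${MAX_BYTES_LIMIT}, got ${requested}`);
  }
  return requested;
}
```

Note that 100,000,000 bytes is 100 MB in decimal units, or roughly 95.4 MiB.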
- Add an explicit inline_binary branch in getLargeFileContent
- Use getLargeFile in lcm_read to distinguish binary vs missing path
- Return a binary-specific user message instead of a moved-file hint
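The explicit `inline_binary` branch matters because "no text came back" can mean different things. A minimal sketch of choosing the message, assuming a hypothetical `LargeFileInfo` shape standing in for whatever `getLargeFile` actually returns; the message wording is illustrative.

```typescript
// Hypothetical stand-in for the record returned by getLargeFile.
type LargeFileInfo =
  | { storageKind: "path"; originalPath: string }
  | { storageKind: "inline_text" }
  | { storageKind: "inline_binary" };

// Pick what lcm_read reports when no text payload could be returned.
function missingContentMessage(fileId: string, info: LargeFileInfo | undefined): string {
  if (info === undefined) {
    return `No LCM entry found for ${fileId}.`;
  }
  switch (info.storageKind) {
    case "inline_binary":
      // Binary payloads are not returned as text; a "file moved" hint would mislead.
      return `${fileId} is stored as inline binary data and cannot be returned as text.`;
    case "path":
      return `${fileId} pointed to ${info.originalPath}, which may have been moved or deleted.`;
    case "inline_text":
      return `${fileId} is inline text but its content could not be loaded.`;
    default: {
      const unreachable: never = info;
      return unreachable;
    }
  }
}
```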
- Bun 1.3.6 has a transpiler bug where adding code to a large namespace
file can cause `require()` to become undefined in sibling functions
- Replaced both require("crypto") calls in generateFileId and
generateBinaryFileId with a top-level `import { createHash } from "crypto"`
- This is the correct ESM pattern regardless of the Bun bug
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
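For reference, the ESM pattern from this commit looks like the following. The hashing scheme (SHA-256, a `file_`/`bin_` prefix, 16 hex characters) is a hypothetical stand-in, because the PR does not show how `generateFileId` and `generateBinaryFileId` actually build their IDs; only the top-level `createHash` import reflects the change.

```typescript
import { createHash } from "crypto";

// Hypothetical ID schemes; the real prefix, algorithm and length are not shown in the PR.
function generateFileIdSketch(content: string): string {
  const digest = createHash("sha256").update(content, "utf8").digest("hex");
  return `file_${digest.slice(0, 16)}`;
}

// Uint8Array is accepted by Hash.update, so binary payloads hash directly.
function generateBinaryFileIdSketch(bytes: Uint8Array): string {
  const digest = createHash("sha256").update(bytes).digest("hex");
  return `bin_${digest.slice(0, 16)}`;
}
```

Because the import is resolved once at module load, neither function depends on `require` being defined at call time.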
What
When LCM stores large tool outputs inline (in PostgreSQL rather than on disk), the tool output label (e.g. `tool_output_tasks_toolu_019gvY64AV8m8iDyAiyUTkY4`) was being stored as `original_path`. The model then tried to `Read` that label as a file path, hitting "File not found" errors and losing access to the stored content entirely.
This PR introduces `storage_kind` as a first-class discriminant on LCM's `large_files` table, adds an `lcm_read` tool so the model can retrieve stored content by file ID, and hardens the surrounding infrastructure.
Why
Users were seeing the model fail to read its own stored context: it would `lcm_describe` a file, get back what looked like a path, then `Read` it and get a file-not-found error. The root cause was that inline content had no way to distinguish itself from path-based content, so the model treated everything as disk files.
Changes
- Add `storage_kind` column (`path` | `inline_text` | `inline_binary`) to `large_files` with full migration
- Add `lcm_read` tool for retrieving stored LCM content by file ID
- Apply `storage_kind` migration to per-user tenant schemas
- Add `LcmToolMetadata` interface for `lcm_read` ↔ processor communication
- Stop the processor from re-ingesting `lcm_read` output back into LCM
- Replace `require("crypto")` with ESM import (Bun transpiler workaround)
- Cap `lcm_read` `max_bytes` at 100MB
Testing
- `bun typecheck` passes across all 12 packages
- `generateFileId` and `generateBinaryFileId` produce correct hashes at runtime
- Stored content is retrievable via `lcm_read` without "File not found" errors