- Expand user messages to contain content blocks instead of strings
- Move the shared buffers into the content blocks of the first user message
- Sort buffers by modification time into old (>2min) and recent (<2min)
- Apply cache_control breakpoints to old buffers, recent buffers, and the most recent conversation message
- Update the cost reporting message with pricing for the new cached prompts
- extract pricing and usage categories into a config var
- Previous version broke tool results and diffs
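
To illustrate the cost-reporting change, here is a minimal Python sketch (not the plugin's actual code) of keeping pricing and usage categories in one config variable. The category names follow the `usage` fields the Anthropic Messages API returns; the `PRICING` name, both function names, and the per-million-token rates are assumptions for illustration and should be checked against the current price list.

```python
# Hypothetical pricing config, in USD per million tokens.
# The rates are illustrative placeholders, not authoritative prices.
PRICING = {
    "input_tokens": 3.00,
    "cache_creation_input_tokens": 3.75,  # cache writes cost more than plain input
    "cache_read_input_tokens": 0.30,      # cache reads cost much less
    "output_tokens": 15.00,
}


def estimate_cost(usage, pricing=PRICING):
    """Return the estimated cost in USD for one response's usage counts."""
    return sum(usage.get(cat, 0) * rate / 1_000_000 for cat, rate in pricing.items())


def format_cost_report(usage, pricing=PRICING):
    """Build a one-line report listing every configured usage category."""
    parts = [f"{cat}={usage.get(cat, 0)}" for cat in pricing]
    return f"{', '.join(parts)} (~${estimate_cost(usage, pricing):.4f})"
```

Because the categories live in the config, adding or repricing a category does not require touching the reporting code.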
Oh, nice ideas there! I'll test the Bedrock mode once I get a moment. In the example at https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching the context is part of the system prompt, but you have it as part of messages (and also inflate all other messages to content blocks). Does it matter? (Keeping it as part of the system prompt would feel a bit cleaner to me, unless there's a reason not to.) Instead of the busy/quiet buffer distinction, what about simply sorting buffers by last-modified time?
I'm trying to remember. I think I tried to use the system prompt first, but the system prompt needs to be inflated into content blocks in order to apply the cache breakpoints. Later, it's passed as a string to the Bedrock helper, and when I tried to modify the Bedrock helper I got scared. I'm not sure if it makes a difference. It's hard to tell from the Anthropic documentation whether it really matters; different examples show both.
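
For reference, the system-prompt alternative would look roughly like the Python sketch below. The block shape follows the Anthropic prompt-caching examples; `inflate_system_prompt` is a hypothetical name, and whether the Bedrock helper would accept a list instead of a string is exactly the open question, so treat this as an assumption.

```python
def inflate_system_prompt(instructions, context):
    """Turn a plain-string system prompt into a list of text blocks.

    Only the large, stable context block carries a cache breakpoint.
    """
    return [
        {"type": "text", "text": instructions},
        {
            "type": "text",
            "text": context,
            "cache_control": {"type": "ephemeral"},
        },
    ]
```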
I inflate all user messages to content blocks for the same reason: the most recent user message needs to have a cache breakpoint. Actually, the multi-turn example says that the second-to-last user message also needs one, but it seems to work fine without.
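
A rough Python sketch of that inflation step, under the assumption that messages are dicts with `role` and `content`. The helper names are hypothetical and this is not the plugin's code. Content that is already a list (for example tool results) is left as is, and only the latest user message gets a breakpoint.

```python
import copy


def inflate(message):
    """Return a copy of `message` whose content is a list of content blocks."""
    message = copy.deepcopy(message)
    if isinstance(message["content"], str):
        message["content"] = [{"type": "text", "text": message["content"]}]
    return message


def mark_last_user_message(messages):
    """Inflate user messages and put a cache breakpoint on the latest one."""
    out = [inflate(m) if m["role"] == "user" else copy.deepcopy(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user" and m["content"]:
            # The breakpoint goes on the final block of the latest user message.
            m["content"][-1]["cache_control"] = {"type": "ephemeral"}
            break
    return out
```

The input list is copied rather than mutated, so the stored conversation history keeps its original string form.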
I'm really not sure what is best here. I think sorting alone is not sufficient. The problem is that we only have 4 cache breakpoints, and I'm using one for the final message. Even if we sort, we still need to decide, somewhat arbitrarily, where to put the cache breakpoints and how many to use. I decided on 2 breakpoints: one for quiet buffers (documentation / background code) and one for busy buffers that Claude will edit. It might be helpful to go one step further and separate the single most recently edited buffer into its own block. It's hard to optimize this stuff for unknown use cases! Mostly I think it is useful to cache documentation and read-only source files that Claude will not edit.
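
The quiet/busy split can be sketched in Python as follows. This is an illustration, not the plugin's implementation: the buffer dict keys (`name`, `text`, `mtime`), the function name, and the constant are hypothetical, and the 120-second threshold is the 2-minute cutoff from the PR description. Quiet buffers go first because caching works on the prompt prefix, so edits to busy buffers then leave the quiet block's cache entry intact.

```python
import time

QUIET_AFTER_SECONDS = 120  # 2-minute cutoff between busy and quiet buffers


def buffer_blocks(buffers, now=None):
    """Build content blocks for shared buffers: quiet group first, then busy.

    Each non-empty group becomes one text block ending in a cache breakpoint,
    so at most two breakpoints are spent on buffers.
    """
    now = time.time() if now is None else now
    ordered = sorted(buffers, key=lambda b: b["mtime"])  # oldest first
    quiet = [b for b in ordered if now - b["mtime"] > QUIET_AFTER_SECONDS]
    busy = [b for b in ordered if now - b["mtime"] <= QUIET_AFTER_SECONDS]

    blocks = []
    for group in (quiet, busy):
        if not group:
            continue
        text = "\n\n".join(f"# {b['name']}\n{b['text']}" for b in group)
        blocks.append(
            {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
        )
    return blocks
```

Giving the single most recently edited buffer its own block would mean peeling the last element off `busy` and spending a third breakpoint on it.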
This commit adds prompt caching to claude.nvim.
Hope this is helpful.