Feat/anthropic extended ttl #6205
base: main
Conversation
Added extra optional parameters to `cacheBehaviorSchema`:
```typescript
useExtendedCacheTtlBeta: z.boolean().optional(),
cacheTtl: z.enum(["5m", "1h"]).optional(),
```
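For illustration, here is a minimal sketch of how these options might appear in a user's model configuration. The `cacheSystemMessage` and `cacheConversation` flags are Continue's existing `cacheBehavior` options; the exact final shape is the PR diff, so treat this fragment as an assumption:

```typescript
// Hypothetical Continue config fragment (shown as a TypeScript object).
// cacheSystemMessage / cacheConversation are the existing cacheBehavior
// flags; the last two fields are the ones added by this PR.
const modelConfig = {
  title: "Claude Sonnet",
  provider: "anthropic",
  model: "claude-3-5-sonnet-latest",
  cacheBehavior: {
    cacheSystemMessage: true,
    cacheConversation: true,
    // New in this PR: opt in to Anthropic's extended-TTL beta
    // and choose between the "5m" and "1h" cache TTLs.
    useExtendedCacheTtlBeta: true,
    cacheTtl: "1h" as const,
  },
};
```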
I have read the CLA Document and I hereby sign the CLA. — 1 out of 2 committers have signed the CLA.
This is a great PR as far as the code goes. I want to step back, though, to better understand whether you think this could be a sensible default rather than a configuration option. I'm wary of adding too many options, and if everyone would benefit from the way you are configuring your Anthropic models, maybe we should just ship that as the default. (I repeated all this in a comment below.)
Agreed @chezsmithy! Thanks for linking the PR here, that's what I had in mind.
@chezsmithy thanks for the link, I will take a look and re-align my PR.
Due to some time constraints this week, I will work through the PR closer to the end of the week or over the weekend.
Sounds good!
@chezsmithy I looked into the PR for AWS Bedrock; at the moment its support is limited to the 5-minute cache.
@sestinj, I went with a simpler version that has only the following settings.
The reasoning: while this is still a beta feature, I would rather keep it as an opt-in enabler, and since it costs 2x tokens, I also allow disabling the caching of tools messages. But if you have any suggestions, please let me know.
I found some issues in the caching logic where we can exceed the maximum of 4 cache breakpoints allowed by Anthropic, so I am implementing logic that prioritizes which cache settings get applied.
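A minimal sketch of the kind of prioritization this implies, assuming hypothetical names (`CacheCandidate` and the priority scheme are illustrative, not the PR's actual API); Anthropic allows at most four `cache_control` breakpoints per request, so lower-priority candidates must be dropped:

```typescript
// Illustrative only: choose at most 4 cache breakpoints by priority.
interface CacheCandidate {
  kind: "tools" | "system" | "conversation";
  blockIndex: number;
  priority: number; // lower = more important to keep cached
}

const MAX_ANTHROPIC_BREAKPOINTS = 4;

function selectBreakpoints(candidates: CacheCandidate[]): CacheCandidate[] {
  return [...candidates]
    .sort((a, b) => a.priority - b.priority)
    .slice(0, MAX_ANTHROPIC_BREAKPOINTS);
}
```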
I found my problem 😀 I made it too complicated. I will try rolling back to my initial code and simplifying it. I spent a couple of hours reading through all of Anthropic's documentation and examples to understand how their breakpoints work and where and how to use them. Things were much easier than I thought. So, hopefully, I will have the final code in a few days. Then we can benefit from a cache hit rate close to 90-95%.
@sestinj I tested the current implementation and it looks good. The user can enable/disable the tools/system/conversation cache breakpoints and control the 5-minute/1-hour TTL when enabling the beta feature. I also added cacheDebug, which helps a lot in understanding what is going on inside, how things are cached, and how efficient the caching is.
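For context, a sketch of how these settings map onto Anthropic's API. The beta header name and the `cache_control` TTL field follow Anthropic's extended-cache-TTL documentation; the surrounding wiring is assumed, not taken from the PR:

```typescript
// Sketch of the request-side mapping (illustrative, not the PR's code).
const opts = { useExtendedCacheTtlBeta: true, cacheTtl: "1h" as const };

const headers: Record<string, string> = {
  "anthropic-version": "2023-06-01",
};
if (opts.useExtendedCacheTtlBeta) {
  // Beta opt-in header, per Anthropic's extended-cache-TTL docs.
  headers["anthropic-beta"] = "extended-cache-ttl-2025-04-11";
}

// Attached to a system/tools/message content block to mark a breakpoint.
const cacheControl = opts.useExtendedCacheTtlBeta
  ? { type: "ephemeral", ttl: opts.cacheTtl }
  : { type: "ephemeral" }; // standard 5-minute cache
```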
I also found a problem with the CLA. It looks like it will be cleaner to redo the PR entirely and apply my changes to the latest main branch, instead of trying to solve the puzzles with the CLA and merge conflicts.
It seems that for the most cost-efficient cache ratio, Apply/Edit models should disable cacheConversations, because apply/edit does not include history. While that makes sense, I am not sure why rules/system messages are stripped out as well (I would prefer to include them, to ensure that during edit/apply the AI follows the same guidelines as during conversation). The main point is that conversation caching must be disabled for Edit/Apply so we don't waste money and pay just the vanilla costs: by caching a single input message we are literally wasting 1.25x or 2.0x (depending on cache type) on tokens that are written to the cache but never used afterwards, since each edit/apply is a new mini-session.

In any case, I think I did the best I could to achieve good caching logic and support Anthropic's extended TTL. It works perfectly. One small issue with the 1-hour TTL is that it is a strictly limited TTL rather than LRU-style, so invalidation will always happen after 1 hour. It is still a good option anyway. I have an idea to maybe implement a sort of "ping": run a timer after the last message and automatically send the context plus a ping, which would cost only a few tokens but would refresh the 5-minute TTL. The problem is that this needs more testing and investigation into how long we can keep the 5-minute TTL alive with the ping approach, plus some fail-safe feature so we don't abuse Anthropic's API when a user forgets to close a session and the ping keeps running for hours or days. So this is only an idea.

The code is also tested against the current main branch with v1.1.56 of the plugin. Unfortunately I can't sign the CLA from another account, so maybe it's easier to create a new PR rebased from current main and without the leftover committers :D
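As a concrete illustration of that recommendation, here is a hypothetical config for an Apply/Edit model that keeps system-prompt caching but turns conversation caching off (field names mirror the sketch above and are assumptions):

```typescript
// Hypothetical Apply/Edit model config: each apply/edit is a fresh
// mini-session, so a conversation breakpoint only writes to the cache
// (1.25x or 2x token cost) and is never read back.
const applyModelConfig = {
  title: "Claude (apply/edit)",
  provider: "anthropic",
  model: "claude-3-5-sonnet-latest",
  cacheBehavior: {
    cacheSystemMessage: true, // system/rules are stable across calls
    cacheConversation: false, // skip cache writes for one-shot input
  },
};
```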
Description
Implements granular per-message-type caching for Anthropic models to improve token efficiency in Agent mode. Adds new `CacheBehavior` options to specify how many of each message type to cache (user messages, tool results, assistant tool calls, etc.) instead of only caching the last 2 user messages.

This is related to issue #6135.
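A sketch of what such per-type options might look like; the field names for the counts are illustrative assumptions, and the authoritative shape is the `cacheBehaviorSchema` diff in this PR:

```typescript
// Illustrative only: per-message-type cache counts as described above.
interface CacheBehaviorSketch {
  cacheSystemMessage?: boolean;
  cacheConversation?: boolean;
  // How many of the most recent messages of each type to cache.
  numUserMessagesToCache?: number;
  numToolResultsToCache?: number;
  numAssistantToolCallsToCache?: number;
  // Extended-TTL beta controls from this PR.
  useExtendedCacheTtlBeta?: boolean;
  cacheTtl?: "5m" | "1h";
}
```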
Checklist
Screenshots
N/A - Backend caching enhancement with no visual changes.
Tests
Added a comprehensive test suite, `core/llm/llms/Anthropic.enhanced-caching.test.ts`, with 6 test cases covering the `shouldCacheMessage` logic.

All tests pass and validate the new per-type caching functionality while maintaining backward compatibility.
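A minimal sketch of what one such test might look like. The `shouldCacheMessage` stand-in below is reimplemented locally from the description above so the sketch is self-contained; its real signature and behavior live in the PR's test file:

```typescript
import { describe, expect, it } from "vitest";

// Local stand-in for the PR's shouldCacheMessage logic, written from
// the description above; the real suite is in
// core/llm/llms/Anthropic.enhanced-caching.test.ts.
type Msg = { role: "user" | "assistant"; content: string };

function shouldCacheMessage(
  index: number,
  messages: Msg[],
  numUserMessagesToCache: number,
): boolean {
  const userIndexes = messages
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i >= 0);
  return userIndexes.slice(-numUserMessagesToCache).includes(index);
}

describe("shouldCacheMessage (sketch)", () => {
  it("caches only the most recent N user messages", () => {
    const messages: Msg[] = [
      { role: "user", content: "first" },
      { role: "assistant", content: "reply" },
      { role: "user", content: "second" },
    ];
    expect(shouldCacheMessage(2, messages, 1)).toBe(true);
    expect(shouldCacheMessage(0, messages, 1)).toBe(false);
  });
});
```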