Feat/anthropic extended ttl #6205


Open

md2k wants to merge 18 commits into main

Conversation

md2k

@md2k md2k commented Jun 19, 2025

Description

Implements granular per-message-type caching for Anthropic models to improve token efficiency in Agent mode. Adds new CacheBehavior options to specify how many of each message type to cache (user messages, tool results, assistant tool calls, etc.) instead of only caching the last 2 user messages.
This is related to issue #6135
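
For illustration only, the per-message-type options described above might look roughly like this; the field names here are illustrative, not necessarily the exact ones in the diff:

```ts
// Illustrative only: these field names are assumptions, not necessarily the exact ones in this PR.
interface CacheBehavior {
  cacheSystemMessage?: boolean; // existing option
  cacheConversation?: boolean;  // existing option
  // New granular, per-message-type limits described above:
  maxUserMessagesToCache?: number; // e.g. 2 reproduces the old behavior
  maxToolResultsToCache?: number;
  maxAssistantToolCallsToCache?: number;
}
```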

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screenshots

N/A - Backend caching enhancement with no visual changes.

Tests

Added a comprehensive test suite, core/llm/llms/Anthropic.enhanced-caching.test.ts, with 6 test cases covering:

  • Tool result message caching
  • Assistant tool call message caching
  • Per-type caching limits validation
  • Disabled caching behavior
  • Fallback TTL handling
  • Core shouldCacheMessage logic

All tests pass and validate the new per-type caching functionality while maintaining backward compatibility.
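
For a rough idea of the shape of these tests, here is a hypothetical sketch of one of the cases above; the real suite may use different helpers, option names, and a different shouldCacheMessage signature:

```ts
import { shouldCacheMessage } from "./Anthropic"; // assumed export name and location

// Hypothetical shape of one of the six cases above; the real suite may differ.
test("only the most recent tool result is cached when the per-type limit is 1", () => {
  const behavior = { maxToolResultsToCache: 1 }; // illustrative option name
  const messages = [
    { role: "tool", content: "older tool result" },
    { role: "user", content: "follow-up question" },
    { role: "tool", content: "newest tool result" },
  ];
  // assumed signature: (message, index, allMessages, cacheBehavior) => boolean
  expect(shouldCacheMessage(messages[2], 2, messages, behavior)).toBe(true);
  expect(shouldCacheMessage(messages[0], 0, messages, behavior)).toBe(false);
});
```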

[screenshot]

md2k added 5 commits June 19, 2025 22:52
Added extra optional parameters to `cacheBehaviorSchema`:
```
  useExtendedCacheTtlBeta: z.boolean().optional(),
  cacheTtl: z.enum(["5m", "1h"]).optional(), 
```
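
For context, a rough sketch (not the PR's actual code) of how these two options could map onto the Anthropic request; the "ttl" field and the beta header value follow Anthropic's extended cache TTL beta docs at the time of writing, so treat both as assumptions to verify:

```ts
// Sketch only: maps the new options onto cache_control and request headers.
function cacheControlFor(opts: { useExtendedCacheTtlBeta?: boolean; cacheTtl?: "5m" | "1h" }) {
  return opts.useExtendedCacheTtlBeta
    ? { type: "ephemeral", ttl: opts.cacheTtl ?? "5m" }
    : { type: "ephemeral" };
}

function extraHeadersFor(opts: { useExtendedCacheTtlBeta?: boolean }): Record<string, string> {
  return opts.useExtendedCacheTtlBeta
    ? { "anthropic-beta": "extended-cache-ttl-2025-04-11" } // beta flag per Anthropic docs; verify
    : {};
}
```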
@md2k md2k requested a review from a team as a code owner June 19, 2025 22:54
@md2k md2k requested review from sestinj and removed request for a team June 19, 2025 22:54

netlify bot commented Jun 19, 2025

👷 Deploy request for continuedev pending review.

Visit the deploys page to approve it

🔨 Latest commit: ad24f59

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 19, 2025

github-actions bot commented Jun 19, 2025


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
✅ [md2k](https://github.com/md2k)
@Yevgen Flerko (OSV)
Yevgen Flerko (OSV) seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.


recurseml bot commented Jun 19, 2025

😱 Found 3 issues. Time to roll up your sleeves! 😱

@md2k
Author

md2k commented Jun 19, 2025

I have read the CLA Document and I hereby sign the CLA

@md2k
Author

md2k commented Jun 19, 2025

Some details on how a long session with a big context looks from a cost perspective, with the 5-minute cache vs. the 1-hour cache:
5-minute cache:
[screenshot]
[screenshot]

@md2k
Author

md2k commented Jun 19, 2025

1-hour TTL:
[screenshot]
[screenshot]

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 19, 2025
Contributor

@sestinj sestinj left a comment


This is a great PR as far as the code goes. I want to step back, though, to better understand whether you think this could be a sensible default rather than a configuration option. I'm wary of too many options, and if everyone would benefit from the way you are configuring your Anthropic models, maybe we should just ship that as the default (I repeated all this in a comment below).

@chezsmithy
Contributor

@sestinj maybe we should align with this previous PR: #5371

It introduced a single caching setting that controls all the options. Whatever we do here, I should likely bring to Bedrock as well.

Watching.

@sestinj
Contributor

sestinj commented Jun 23, 2025

Agreed @chezsmithy! Thanks for linking the PR here; that's what I had in mind.

@md2k
Author

md2k commented Jun 23, 2025

@chezsmithy thanks for the link, I will take a look and re-align my PR.

@md2k
Author

md2k commented Jun 24, 2025

Due to some time constraints this week, I will work through the PR closer to the end of the week or over the weekend.

@sestinj
Contributor

sestinj commented Jun 24, 2025

Sounds good!

@md2k
Author

md2k commented Jul 3, 2025

@chezsmithy I looked into the PR for AWS Bedrock. At the moment its support is limited to the 5-minute cache (cachePoint: { type: "default" }), and you have already implemented caching for system messages, user messages, and tools. The only improvement I can see here is to maybe extend the internal logic to ensure we cover all message types, similar to my Anthropic PR. So Bedrock probably does not require any changes right now. I also need to look into its API because some property names differ from Anthropic's.
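
For reference, a minimal sketch of a Bedrock Converse cache point, per the AWS docs (5-minute TTL only; the placement here is illustrative):

```ts
// Everything before the cachePoint block becomes cacheable on Bedrock Converse.
const converseMessages = [
  {
    role: "user",
    content: [
      { text: "long shared context goes here..." },
      { cachePoint: { type: "default" } }, // sole supported cache point type today
    ],
  },
];
```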

@md2k
Author

md2k commented Jul 3, 2025

@sestinj, I did a simpler version with only the following settings:

        "cacheSystemMessage": true, -- original
        "cacheConversation": true,  -- original
        "cacheToolMessages": true, - new
        "useExtendedCacheTtlBeta": true,  -new
        "cacheTtl": "5m|1h" - new

The reason: while this is still a beta feature, I prefer to keep it behind an enabler, and it also costs 2x tokens; for the same reason I allow disabling tool-message caching.
I'm going to work with this patch myself for a couple of days to understand how it feels with various settings enabled/disabled and what the token/$$ consumption looks like.

But if you have any suggestions, please let me know.
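
For illustration, this is roughly how the settings could look on a model entry in config.json; the title/model values are placeholders, and whether the new keys live under cacheBehavior alongside the existing ones is an assumption:

```json
{
  "models": [
    {
      "title": "Claude Sonnet (cached)",
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-latest",
      "cacheBehavior": {
        "cacheSystemMessage": true,
        "cacheConversation": true,
        "cacheToolMessages": true,
        "useExtendedCacheTtlBeta": true,
        "cacheTtl": "1h"
      }
    }
  ]
}
```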

@md2k
Author

md2k commented Jul 3, 2025

I found some issues in the caching logic where we can hit the maximum of 4 cache slots that Anthropic allows, so I'm implementing some logic to prioritize which cache settings get applied.
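
Roughly what such prioritization could look like (names are illustrative, this is not the committed code):

```ts
// Anthropic allows at most 4 cache breakpoints per request, so spend them in
// priority order and stop when the budget runs out.
const MAX_ANTHROPIC_CACHE_BREAKPOINTS = 4;

type BreakpointCandidate = {
  kind: "tools" | "system" | "conversation";
  apply: () => void; // marks the corresponding block with cache_control
};

function applyCacheBreakpoints(candidatesByPriority: BreakpointCandidate[]): void {
  let remaining = MAX_ANTHROPIC_CACHE_BREAKPOINTS;
  for (const candidate of candidatesByPriority) {
    if (remaining === 0) break;
    candidate.apply();
    remaining--;
  }
}
```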

@md2k
Author

md2k commented Jul 3, 2025

While testing the logic, I was able to achieve pretty nice results:
[screenshot]
520k and 620k input/cached tokens... unfortunately I screwed up the logic and burned around 400k input tokens (no cache at all) and invalidated the cache for another 300k tokens, so the real input/cache writes are around 200k/200k +- plus 6.6 million cache read hits.

[screenshot]

And the costs (it was supposed to be cheaper, but due to a small mistake I burned close to $4 for nothing because of the broken logic :D):
[screenshot]

@md2k
Author

md2k commented Jul 3, 2025

The new logic is not committed yet; I'm going to test it a bit more. I also added some debug output to the console. I'm curious whether it's possible to keep this debug, or maybe add an option for stats (it can be handy to see).

[screenshot]
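
One possible way the debug/stats output could be gated behind an option (sketch only; names are illustrative, not the committed code):

```ts
interface CacheDebugOptions {
  cacheDebug?: boolean;
}

// Logs a one-line summary of cache usage when cacheDebug is enabled.
function logCacheStats(
  opts: CacheDebugOptions,
  stats: { inputTokens: number; cacheReadTokens: number; cacheWriteTokens: number },
): void {
  if (!opts.cacheDebug) return;
  const hitRate = stats.cacheReadTokens / (stats.cacheReadTokens + stats.inputTokens || 1);
  console.debug(
    `[anthropic-cache] input=${stats.inputTokens} cacheWrite=${stats.cacheWriteTokens} ` +
      `cacheRead=${stats.cacheReadTokens} hitRate=${(hitRate * 100).toFixed(1)}%`,
  );
}
```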

@md2k
Author

md2k commented Jul 4, 2025

I found my problem 😀 I made it too complicated. I will try rolling back to my previous initial code and simplifying it. I spent a couple of hours reading through all of Anthropic's documentation and examples to understand how their breakpoints work and where and how to use them. Things were much easier than I thought. So, hopefully, I will have the final code in a few days. Then we can benefit from a cache hit rate close to 90-95%.
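
For reference, a minimal sketch of the breakpoint placement described above, following Anthropic's prompt caching docs (model, tools, and text are placeholders):

```ts
// Anthropic caches the prefix up to each block marked with cache_control, so marking the
// last tool, the last system block, and the last conversation message covers the whole
// reusable prefix.
const request = {
  model: "claude-3-5-sonnet-latest",
  max_tokens: 1024,
  tools: [
    // ...earlier tool definitions...
    {
      name: "read_file",
      description: "Read a file from the workspace",
      input_schema: { type: "object", properties: {} },
      cache_control: { type: "ephemeral" }, // breakpoint 1: all tool definitions
    },
  ],
  system: [
    {
      type: "text",
      text: "You are a helpful coding agent.",
      cache_control: { type: "ephemeral" }, // breakpoint 2: system prompt and rules
    },
  ],
  messages: [
    { role: "user", content: [{ type: "text", text: "Refactor this module..." }] },
    { role: "assistant", content: [{ type: "text", text: "Done. Anything else?" }] },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Now add tests.",
          cache_control: { type: "ephemeral" }, // breakpoint 3: conversation so far
        },
      ],
    },
  ],
};
```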

@md2k
Author

md2k commented Jul 4, 2025

@sestinj I tested the current implementation and it looks good. The user can enable/disable the tools/system/conversation cache breakpoints, and control the 5-minute / 1-hour TTL when enabling the beta feature. I also added cacheDebug; it helps a lot to understand what is going on inside, how things are cached, and how efficient it is.
If you do not want to keep the debug output, I can remove it.
Also, I'm asking for help because I can't resolve the mdx conflict :(

@md2k
Author

md2k commented Jul 4, 2025

I also found a problem with the CLA. It looks like it will be cleaner to redo the PR entirely and apply my changes onto the latest main branch instead of trying to solve the puzzles with the CLA and merge conflicts.

@md2k
Author

md2k commented Jul 4, 2025

It seems that for the best cost/cache ratio, Apply/Edit models are better off with cacheConversation disabled, because apply/edit does not include history. While that makes sense, I'm not sure why rules / system messages are stripped out as well (I would prefer to include them to ensure that during edit/apply the AI follows the same guidelines as during conversation). But the main point is that conversation caching needs to be disabled for Edit/Apply so we don't waste $$$ and just pay vanilla costs: by caching a single input message we are literally wasting x1.25 or x2.0 (depending on cache type) on tokens that are written to the cache but never read afterwards, since each edit/apply is a new mini-session.

[screenshot]

In any case, I think I did the best I can to achieve good caching logic and support Anthropic's extended TTL. It works great. One small issue with the 1-hour TTL is that it is strictly time-limited and does not use an LRU style, so invalidation will always happen after 1 hour. It is still a good option anyway. I have an idea: maybe implement a sort of "ping" where we run a timer after the last message and automatically resend the context plus a ping, which would cost only a few tokens but would refresh the 5-minute TTL. The problem is that it needs more testing and investigation into how long we can keep the 5-minute TTL alive using the ping approach, and it also needs a fail-safe so we don't abuse Anthropic's functionality when a user forgets to close a session and the ping keeps running for hours or days. So this is only an idea.
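
Just to make the ping idea concrete (not implemented in this PR; names and numbers are illustrative):

```ts
// Re-send a tiny request shortly before the 5-minute TTL lapses, with a hard cap so an
// abandoned session cannot keep pinging for hours.
const PING_INTERVAL_MS = 4 * 60 * 1000; // refresh before the 5m TTL expires
const MAX_PINGS = 6; // fail-safe: stop refreshing after ~24 minutes of inactivity

function startCacheKeepAlive(sendPing: () => Promise<void>): () => void {
  let pings = 0;
  const timer = setInterval(async () => {
    if (++pings > MAX_PINGS) {
      clearInterval(timer);
      return;
    }
    await sendPing(); // e.g. resend the cached prefix plus a trivial message
  }, PING_INTERVAL_MS);
  return () => clearInterval(timer); // call on new user activity or when the session closes
}
```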

Also, the code was tested against the current main branch with v1.1.56 of the plugin. Unfortunately I can't sign the CLA from the other account, so maybe it is easier to create a new PR rebased from current main and without the leftover committers :D

Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
Status: Todo
3 participants