Adding CountToken to Gemini #2137
base: main
Conversation
Gemini provides an endpoint to count tokens before sending a request: https://ai.google.dev/api/tokens#method:-models.counttokens
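For reference, this is what the endpoint looks like through the google-genai SDK (a minimal sketch; the model name is illustrative):

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# countTokens is a separate endpoint: it counts the prompt's tokens
# without generating a response.
response = client.models.count_tokens(
    model='gemini-2.0-flash',  # illustrative model name
    contents='Why is the sky blue?',
)
print(response.total_tokens)
```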
added type adaptor
Removed extra assignment
Linting
Linting
Linting
Removed White Space
@kauabh I agree that if a model API has a method to count tokens, it would be nice to expose that on the model class. But I don't think we should automatically use it whenever usage limits are set; that check would need to be implemented here, just before we call the model:

pydantic-ai/pydantic_ai_slim/pydantic_ai/_agent_graph.py, lines 379 to 393 in b31c77d

This would require a method that exists on every model, so it'd be implemented as an abstract method on the base model class. As for the concrete implementation, I recommend adding it to the Google model.
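A rough sketch of the abstract method being suggested, assuming the `BaseCountTokensResponse` type this PR introduces:

```python
from abc import ABC, abstractmethod

from pydantic_ai.messages import ModelMessage


class Model(ABC):
    """Existing abstract base class for models; only the new method is shown."""

    @abstractmethod
    async def count_tokens(
        self,
        messages: list[ModelMessage],
    ) -> BaseCountTokensResponse:  # response type added in this PR
        """Count the tokens the given messages would use, without generating a response."""
        raise NotImplementedError()
```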
@DouweM Makes sense, let me rework this. Thanks for the detailed input, I appreciate your time.
* adding count token for google * resolved conflicts
Hey @DouweM, I have made the changes as per your comments; quite a few files got touched, so it would be great if you could provide some feedback on the changes so far. Also, could you share some thoughts on updating "instrumented.py" with count_tokens?
@kauabh Thanks! We're almost there :)
```
@@ -382,6 +390,14 @@ async def request(
        """Make a request to the model."""
        raise NotImplementedError()

    @abstractmethod
```
Since this method is not required to be implemented, we can do the same thing we do in `request_stream`: not mark it as `@abstractmethod` (so we can drop all the empty implementations from the model classes) and put the `Token counting is not supported by <X>` error message here.
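Following the `request_stream` pattern described above, the non-abstract default might look like this (a sketch):

```python
async def count_tokens(
    self,
    messages: list[ModelMessage],
) -> BaseCountTokensResponse:
    """Count the tokens a request would use; overridden by providers that support it."""
    raise NotImplementedError(f'Token counting is not supported by {self.__class__.__name__}')
```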
```
        self,
        messages: list[ModelMessage],
    ) -> BaseCountTokensResponse:
        """Make a request to the model."""
```
This needs an update!
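Perhaps something along these lines (the wording is only a suggestion):

```python
) -> BaseCountTokensResponse:
    """Count the number of tokens the given messages would use, without generating a response."""
```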
```
@@ -77,6 +77,13 @@ async def request(

        raise FallbackExceptionGroup('All models from FallbackModel failed', exceptions)

    async def count_tokens(
```
We'll want to forward this to the models in question.
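A sketch of what that forwarding could look like, mirroring how `request` collects exceptions (assuming FallbackModel's existing `self.models` list and the `FallbackExceptionGroup` shown above):

```python
async def count_tokens(self, messages: list[ModelMessage]) -> BaseCountTokensResponse:
    """Try each model in turn and return the first successful token count."""
    exceptions: list[Exception] = []
    for model in self.models:
        try:
            return await model.count_tokens(messages)
        except Exception as exc:  # includes NotImplementedError from models without support
            exceptions.append(exc)
    raise FallbackExceptionGroup('All models from FallbackModel failed', exceptions)
```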
```
        messages: list[ModelMessage],
    ) -> BaseCountTokensResponse:
        """Token counting is not supported by the CohereModel."""
        raise NotImplementedError('Token counting is not supported by CohereModel')
```
Wrong model name, but this should be fixed anyway by making the super method non-abstract and dropping the definition here.
```
        _, contents = await self._map_messages(messages)
        response = self.client.models.count_tokens(
            model=self._model_name,
            contents=contents,
```
We should include not just the messages but the entire `generateContentRequest`, as function definitions etc. also count as tokens.
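The REST endpoint accepts a full `generateContentRequest`; in the Python SDK that roughly corresponds to passing a `CountTokensConfig`. A sketch (field support varies between the Gemini API and Vertex AI backends, so treat this as an assumption to verify):

```python
from google.genai import types

# Count tokens for the whole request: the same system instruction and tool
# definitions we would pass to generate_content, not just the message contents.
response = self.client.models.count_tokens(
    model=self._model_name,
    contents=contents,
    config=types.CountTokensConfig(
        system_instruction=system_instruction,
        tools=tools,
    ),
)
```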
```
@dataclass(repr=False)
class BaseCountTokensResponse:
```
Do we need this entire response? Or could `count_tokens` just return the total tokens counted, or even better, a `Usage` object we can use with `Usage.incr`?
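For illustration, mapping the provider's count to the existing `Usage` type might look like this (a sketch; `_map_count_tokens_response` is a hypothetical helper name):

```python
from pydantic_ai.usage import Usage


def _map_count_tokens_response(total_tokens: int) -> Usage:
    # Only the prompt side is known before the request, so record the
    # counted tokens as request tokens.
    return Usage(request_tokens=total_tokens, total_tokens=total_tokens)
```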
```
        if ctx.deps.usage_limits and ctx.deps.usage_limits.pre_request_token_check_with_overhead:
            token_count = await ctx.deps.model.count_tokens(message_history)

            ctx.deps.usage_limits.check_tokens(_usage.Usage(request_tokens=token_count.total_tokens))
```
I think we should use `check_before_request` instead of `check_tokens`.

Also, I think this should live in `prepare_request`, where we currently call `check_before_request`. I'd likely move the `model_request_parameters` and `message_history` stuff there as well, and reduce the duplication between this method and `_stream`.
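A sketch of where that could sit, using the flag name proposed later in this review (the exact structure of `_agent_graph` may differ):

```python
# Inside the request-preparation step, where check_before_request is called today:
if ctx.deps.usage_limits:
    ctx.deps.usage_limits.check_before_request(ctx.state.usage)
    if ctx.deps.usage_limits.count_tokens_before_request:
        token_count = await ctx.deps.model.count_tokens(message_history)
        # See the copy-and-increment check sketched below.
```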
Also, instead of checking whether this particular request's total tokens exceed the limit, shouldn't we check all the tokens so far plus the newly counted tokens? That'd be consistent with what `_finish_handling` currently does:

```python
ctx.state.usage.incr(response.usage)
if ctx.deps.usage_limits:  # pragma: no branch
    ctx.deps.usage_limits.check_tokens(ctx.state.usage)
```

So we'd want to copy `ctx.state.usage`, call `incr` with the new usage, and then run the check against that.
```
@@ -96,6 +96,10 @@ class UsageLimits:
    """The maximum number of tokens allowed in responses from the model."""
    total_tokens_limit: int | None = None
    """The maximum number of tokens allowed in requests and responses combined."""
    pre_request_token_check_with_overhead: bool = False
```
Let's call it `count_tokens_before_request`, and clarify the description slightly to say this typically requires an extra API call.
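Something like this, with the docstring wording as a suggestion:

```python
count_tokens_before_request: bool = False
"""If `True`, check token limits before each model request by counting the request's tokens first.

This typically requires an extra API call to the provider's token-counting endpoint.
"""
```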
```
        ),
    ]
    result = await model.count_tokens(messages)
    assert result.total_tokens == snapshot(7)
```
We'll want to test not just the `count_tokens` method but the actual usage limit enforcement!
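For example, a test roughly along these lines (a sketch: the `google_model` fixture, prompt, and limit values are illustrative, and the async test setup should follow the existing suite):

```python
import pytest

from pydantic_ai import Agent
from pydantic_ai.exceptions import UsageLimitExceeded
from pydantic_ai.usage import UsageLimits


async def test_count_tokens_before_request_enforces_limit(google_model):
    agent = Agent(google_model)
    # The pre-request token count should exceed the limit and abort before generation.
    with pytest.raises(UsageLimitExceeded):
        await agent.run(
            'A prompt that is longer than five tokens',
            usage_limits=UsageLimits(total_tokens_limit=5, count_tokens_before_request=True),
        )
```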
Gemini provides an endpoint to count tokens: https://ai.google.dev/api/tokens#method:-models.counttokens.

I think it will be useful and addresses some of the concerns in issue #1794 (at least for Gemini).

@DouweM Wanted to check whether this would be helpful. If yes, and if the approach is right, I'd appreciate some pointers around adding it to usage_limits for Gemini. Happy to work on other models too, if this one makes it through.