
The LiteLLM wrapper does not propagate finish_reason from LiteLLM responses #3109

@aperepel

Describe the bug

The LiteLLM wrapper does not propagate finish_reason from LiteLLM responses to LlmResponse.finish_reason. As a result, after_model_callback functions cannot detect when the max_tokens limit has been hit or handle other completion conditions.

When using LiteLLM models, llm_response.finish_reason is always None, even when LiteLLM returns a valid finish_reason value (e.g., "length" for max_tokens truncation).

To Reproduce

Minimal reproduction code:

import asyncio
from google.adk import Agent, Runner
from google.adk.agents.callback_context import CallbackContext
from google.adk.models.lite_llm import LiteLlm
from google.adk.models.llm_response import LlmResponse
from google.adk.sessions import InMemorySessionService
from google.genai import types


def create_inspector():
    """Callback to capture finish_reason."""
    captured = {"finish_reason": None}

    def inspector(ctx: CallbackContext, resp: LlmResponse) -> LlmResponse:
        captured["finish_reason"] = resp.finish_reason
        return resp

    inspector.captured = captured
    return inspector


async def test():
    # Create model with low max_tokens to trigger truncation
    model = LiteLlm(
        model="gpt-3.5-turbo",
        api_key="your-key",
        max_tokens=50,  # Intentionally low
    )

    inspector = create_inspector()

    agent = Agent(
        model=model,
        name="test",
        instruction="Provide detailed explanations.",
        after_model_callback=inspector,
    )

    session_service = InMemorySessionService()
    runner = Runner(
        app_name="test",
        agent=agent,
        session_service=session_service
    )

    await session_service.create_session(
        app_name="test",
        user_id="user",
        session_id="session",
        state={},
    )

    message = types.Content(
        role="user",
        parts=[types.Part(text="Explain quantum computing in detail.")]
    )

    async for _ in runner.run_async(
        user_id="user",
        session_id="session",
        new_message=message
    ):
        pass

    print(f"finish_reason: {inspector.captured['finish_reason']}")
    # Output: finish_reason: None (BUG - should be "length")


asyncio.run(test())

Steps to reproduce:

  1. Install: pip install google-adk litellm openai
  2. Run the code above with a valid API key
  3. Observe that finish_reason is None even though LiteLLM returned "length"

Expected behavior

llm_response.finish_reason should contain the finish_reason value from LiteLLM:

  • "stop" for natural completion
  • "length" for max_tokens limit reached
  • "tool_calls" for tool invocations
  • "content_filter" for filtered content

This matches the behavior of native Gemini models, where finish_reason is properly populated.

Root Cause

In google/adk/models/lite_llm.py, the _model_response_to_generate_content_response() function (lines 473-499) extracts usage_metadata from the LiteLLM response but does not extract finish_reason:

def _model_response_to_generate_content_response(response):
    message = None
    if response.get("choices", None):
        message = response["choices"][0].get("message", None)
        # Missing: finish_reason = response["choices"][0].get("finish_reason", None)

    llm_response = _message_to_generate_content_response(message)
    # Missing: llm_response.finish_reason = finish_reason

    if response.get("usage", None):
        llm_response.usage_metadata = types.GenerateContentResponseUsageMetadata(...)

    return llm_response

The LiteLLM response contains response["choices"][0]["finish_reason"], but this is never extracted or set on the LlmResponse object.
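
For reference, the relevant slice of a LiteLLM completion response follows the OpenAI chat-completions schema. The payload below is an illustrative sketch with made-up values, not copied from LiteLLM's source:

# Illustrative OpenAI-style payload returned by LiteLLM (values made up).
example_response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Quantum computing ..."},
            "finish_reason": "length",  # present in the response, dropped by the wrapper
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 50, "total_tokens": 62},
}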

Proposed Fix

Add four lines (marked # ADD) to extract and set finish_reason:

def _model_response_to_generate_content_response(response):
    message = None
    finish_reason = None  # ADD

    if response.get("choices", None):
        message = response["choices"][0].get("message", None)
        finish_reason = response["choices"][0].get("finish_reason", None)  # ADD

    if not message:
        raise ValueError("No message in response")

    llm_response = _message_to_generate_content_response(message)
    if finish_reason:  # ADD
        llm_response.finish_reason = finish_reason  # ADD

    if response.get("usage", None):
        llm_response.usage_metadata = types.GenerateContentResponseUsageMetadata(...)

    return llm_response

A complete patch file is available here: litellm_finish_reason.patch
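
To sanity-check the change, the patched helper can be exercised directly with a synthetic response. This is a sketch that assumes litellm.ModelResponse accepts OpenAI-style dict choices (LiteLLM normalizes such payloads when constructing responses):

from litellm import ModelResponse

from google.adk.models.lite_llm import _model_response_to_generate_content_response

# Synthetic response simulating max_tokens truncation (values made up).
fake = ModelResponse(
    choices=[{
        "message": {"role": "assistant", "content": "Quantum computing uses ..."},
        "finish_reason": "length",
    }]
)

llm_response = _model_response_to_generate_content_response(fake)
assert llm_response.finish_reason == "length"  # None without the fix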

Desktop (please complete the following information):

  • OS: macOS (also reproduced on Linux)
  • Python version: 3.12.0
  • ADK version: 1.11.0

Model Information:

  • Are you using LiteLLM: Yes
  • Which model is being used: gpt-3.5-turbo, gpt-4o (any LiteLLM-supported model)

Additional context

Impact: This bug prevents callbacks from:

  • Detecting when responses are truncated due to max_tokens limits (see the sketch after this list)
  • Implementing retry logic for incomplete responses
  • Logging completion statistics
  • Handling different completion conditions appropriately
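
As an example of the first two points above, once the fix is in place a callback along these lines could react to truncation (a minimal sketch; the handling shown is a placeholder, not part of the proposed patch):

from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_response import LlmResponse


def handle_truncation(ctx: CallbackContext, resp: LlmResponse) -> Optional[LlmResponse]:
    # With the fix, LiteLLM-backed models report "length" when generation
    # stops at the max_tokens limit.
    if resp.finish_reason == "length":
        # Placeholder handling: log, alert, or retry with a larger budget.
        print("Response was truncated at max_tokens.")
    return None  # returning None keeps the original response unchanged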

Validation: A standalone reproduction script with automated validation is available at litellm_finish_reason_bug.py. This script demonstrates both the bug and validates the proposed fix.

Note on Tracing: After this fix is applied, google/adk/telemetry/tracing.py:222 will need updating to handle both enum (Gemini) and string (LiteLLM) finish_reason values:

if llm_response.finish_reason:
    if hasattr(llm_response.finish_reason, 'value'):
        finish_reason_str = llm_response.finish_reason.value.lower()
    else:
        finish_reason_str = str(llm_response.finish_reason)
    span.set_attribute('gen_ai.response.finish_reasons', [finish_reason_str])

Without this, tracing will raise AttributeError: 'str' object has no attribute 'value'.
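
Alternatively, the wrapper could normalize LiteLLM's strings onto the google.genai enum so that downstream code, including tracing, always sees one type. The correspondence below is an assumed mapping of equivalent members, not part of the proposed patch:

from google.genai import types

# Assumed mapping from LiteLLM finish_reason strings to the Gemini enum
# (sketch; "tool_calls" and other unmapped values fall back to OTHER).
_FINISH_REASON_MAP = {
    "stop": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "content_filter": types.FinishReason.SAFETY,
}


def _to_finish_reason_enum(value: str) -> types.FinishReason:
    return _FINISH_REASON_MAP.get(value, types.FinishReason.OTHER)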
