Describe the bug
The LiteLLM wrapper does not propagate finish_reason from LiteLLM responses to LlmResponse.finish_reason. This makes it impossible for after_model_callback functions to detect when the max_tokens limit is hit or when other completion conditions occur.
When using LiteLLM models, llm_response.finish_reason is always None, even when LiteLLM returns a valid finish_reason value (e.g., "length" for max_tokens truncation).
To Reproduce
Minimal reproduction code:
import asyncio

from google.adk import Agent, Runner
from google.adk.agents.callback_context import CallbackContext
from google.adk.models.lite_llm import LiteLlm
from google.adk.models.llm_response import LlmResponse
from google.adk.sessions import InMemorySessionService
from google.genai import types


def create_inspector():
    """Callback to capture finish_reason."""
    captured = {"finish_reason": None}

    def inspector(ctx: CallbackContext, resp: LlmResponse) -> LlmResponse:
        captured["finish_reason"] = resp.finish_reason
        return resp

    inspector.captured = captured
    return inspector


async def test():
    # Create model with low max_tokens to trigger truncation
    model = LiteLlm(
        model="gpt-3.5-turbo",
        api_key="your-key",
        max_tokens=50,  # Intentionally low
    )

    inspector = create_inspector()
    agent = Agent(
        model=model,
        name="test",
        instruction="Provide detailed explanations.",
        after_model_callback=inspector,
    )

    session_service = InMemorySessionService()
    runner = Runner(
        app_name="test",
        agent=agent,
        session_service=session_service,
    )
    await session_service.create_session(
        app_name="test",
        user_id="user",
        session_id="session",
        state={},
    )

    message = types.Content(
        role="user",
        parts=[types.Part(text="Explain quantum computing in detail.")],
    )
    async for _ in runner.run_async(
        user_id="user",
        session_id="session",
        new_message=message,
    ):
        pass

    print(f"finish_reason: {inspector.captured['finish_reason']}")
    # Output: finish_reason: None (BUG - should be "length")


asyncio.run(test())
Steps to reproduce:
- Install: pip install google-adk litellm openai
- Run the code above with a valid API key
- Observe that finish_reason is None even though LiteLLM returned "length"
Expected behavior
llm_response.finish_reason should contain the finish_reason value from LiteLLM:
- "stop" for natural completion
- "length" for max_tokens limit reached
- "tool_calls" for tool invocations
- "content_filter" for filtered content
This matches the behavior of native Gemini models, where finish_reason is properly populated.
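As an illustration of what this unblocks, here is a minimal after_model_callback sketch (hypothetical, not part of the reproduction) that reacts to truncation once finish_reason is propagated; the comparison assumes LiteLLM's raw string values rather than the Gemini enum:
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_response import LlmResponse


def warn_on_truncation(ctx: CallbackContext, resp: LlmResponse) -> Optional[LlmResponse]:
    # Only meaningful once finish_reason is actually populated for LiteLLM models.
    if resp.finish_reason == "length":
        # The model stopped because max_tokens was reached, not because it was
        # done answering; surface that instead of failing silently.
        print("warning: LLM response truncated by max_tokens")
    return resp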
Root Cause
In google/adk/models/lite_llm.py, the _model_response_to_generate_content_response() function (lines 473-499) extracts usage_metadata from the LiteLLM response but does not extract finish_reason:
def _model_response_to_generate_content_response(response):
  message = None
  if response.get("choices", None):
    message = response["choices"][0].get("message", None)
    # Missing: finish_reason = response["choices"][0].get("finish_reason", None)

  llm_response = _message_to_generate_content_response(message)
  # Missing: llm_response.finish_reason = finish_reason

  if response.get("usage", None):
    llm_response.usage_metadata = types.GenerateContentResponseUsageMetadata(...)
  return llm_response
The LiteLLM response contains response["choices"][0]["finish_reason"], but this value is never extracted or set on the LlmResponse object.
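For reference, the relevant part of a LiteLLM response follows the OpenAI chat-completion shape; the snippet below only illustrates that structure (the values are made up):
# Illustrative response shape; the wrapper reads "message" and "usage" today
# but ignores "finish_reason".
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Quantum computing uses..."},
            "finish_reason": "length",  # set by LiteLLM, never propagated
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 50, "total_tokens": 62},
}

print(response["choices"][0].get("finish_reason"))  # "length"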
Proposed Fix
Add four lines (marked # ADD below) to extract and set finish_reason:
def _model_response_to_generate_content_response(response):
  message = None
  finish_reason = None  # ADD
  if response.get("choices", None):
    message = response["choices"][0].get("message", None)
    finish_reason = response["choices"][0].get("finish_reason", None)  # ADD
  if not message:
    raise ValueError("No message in response")

  llm_response = _message_to_generate_content_response(message)
  if finish_reason:  # ADD
    llm_response.finish_reason = finish_reason  # ADD

  if response.get("usage", None):
    llm_response.usage_metadata = types.GenerateContentResponseUsageMetadata(...)
  return llm_response
A complete patch file is available here: litellm_finish_reason.patch
Desktop (please complete the following information):
- OS: macOS (also reproduced on Linux)
- Python version: 3.12.0
- ADK version: 1.11.0
Model Information:
- Are you using LiteLLM: Yes
- Which model is being used: gpt-3.5-turbo, gpt-4o (any LiteLLM-supported model)
Additional context
Impact: This bug prevents callbacks from:
- Detecting when responses are truncated due to max_tokens limits
- Implementing retry logic for incomplete responses
- Logging completion statistics
- Handling different completion conditions appropriately
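For example, retry logic could be layered on top of the inspector pattern from the reproduction above once finish_reason is available; the sketch below is hypothetical, and the run_once callable and token budgets are illustrative only:
async def run_with_retry(run_once, budgets=(50, 200, 800)):
    """Re-run a query with a larger max_tokens budget while it gets truncated.

    run_once(max_tokens) is expected to execute the agent and return the
    finish_reason captured by an after_model_callback.
    """
    finish_reason = None
    for max_tokens in budgets:
        finish_reason = await run_once(max_tokens)
        if finish_reason != "length":  # not truncated; accept this response
            break
    return finish_reason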
Validation: A standalone reproduction script with automated validation is available at litellm_finish_reason_bug.py. This script demonstrates both the bug and validates the proposed fix.
Note on Tracing: After this fix is applied, google/adk/telemetry/tracing.py:222 will need updating to handle both enum (Gemini) and string (LiteLLM) finish_reason values:
if llm_response.finish_reason:
  if hasattr(llm_response.finish_reason, 'value'):
    finish_reason_str = llm_response.finish_reason.value.lower()
  else:
    finish_reason_str = str(llm_response.finish_reason)
  span.set_attribute('gen_ai.response.finish_reasons', [finish_reason_str])
Without this change, tracing will raise AttributeError: 'str' object has no attribute 'value'.
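For illustration, the same normalization as a standalone helper (assuming, as the snippet above does, that the Gemini enum exposes its name via .value):
from google.genai import types


def normalize_finish_reason(finish_reason) -> str:
    # Accept both the Gemini FinishReason enum and the plain string LiteLLM returns.
    if hasattr(finish_reason, "value"):
        return finish_reason.value.lower()
    return str(finish_reason)


print(normalize_finish_reason(types.FinishReason.STOP))  # "stop"
print(normalize_finish_reason("length"))  # "length"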