Support prefill by ending history with ModelResponse #2778

@yf-yang

Description

I am using the response prefill technique (the last message in the history is an assistant message, which constrains the subsequent LLM-generated assistant response), but pydantic-ai fails to make the actual LLM call and terminates immediately.

https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-structured-data-extraction-with-prefilling
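For context, a minimal sketch of what prefill means at the raw Anthropic Messages API level (the model name and prompt text here are illustrative): the conversation ends with a partial assistant turn, and the model continues from that prefix rather than starting a fresh reply.

```python
# Prefill sketch: the final message has role "assistant" with a partial
# reply, so the model's completion continues from "<json>{".
messages = [
  {
    "role": "user",
    "content": "Generate a JSON object with key a and value b, wrapped in <json></json> tags.",
  },
  # Prefill: the model resumes generation from this prefix.
  {"role": "assistant", "content": "<json>{"},
]

# With an API key configured, the call would look roughly like:
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(
#     model="claude-sonnet-4-0",
#     max_tokens=256,
#     messages=messages,
#   )
# and the completion would begin mid-object rather than with a new sentence.
```

The issue below is that pydantic-ai does not support passing such a trailing `ModelResponse` through to the provider.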

Example Code

import asyncio
import pprint

import dotenv
from pydantic_ai import Agent
from pydantic_ai.messages import (
  ModelMessage,
  ModelRequest,
  ModelResponse,
  TextPart,
  UserPromptPart,
)
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

dotenv.load_dotenv()

# --- Agent setup (the second assignment below overrides the OpenAI model) ---
agent = Agent(
  model=OpenAIResponsesModel("gpt-5-mini"),
  model_settings=OpenAIResponsesModelSettings(
    openai_service_tier="flex",
    max_tokens=8192,
    timeout=15,
  ),
)
agent = Agent("anthropic:claude-sonnet-4-0")


async def main():
  user_prompt = "Now generate a JSON object with key a and value b, wrapped in <json></json> tags. You must stop immediately after the </json> tag."
  print(f'👤 User Prompt: "{user_prompt}"')

  history: list[ModelMessage] = [
    ModelRequest(parts=[UserPromptPart(content=user_prompt)]),
    ModelResponse(parts=[TextPart(content="<json>{")]),
  ]

  # Run the agent with the prefilled history; no new user prompt is passed
  final_result = await agent.run(message_history=history)

  print("-" * 20)
  print(f"🤖 Final Answer: {final_result.output}")
  print("-" * 20)
  print("history")
  pprint.pprint(final_result.all_messages())


if __name__ == "__main__":
  asyncio.run(main())

Python, Pydantic AI & LLM client version

python 3.12
pydantic-ai 0.8.1
