50 changes: 50 additions & 0 deletions docs/models/openai.md
@@ -143,6 +143,56 @@ As of 7:48 AM on Wednesday, April 2, 2025, in Tokyo, Japan, the weather is cloud

You can learn more about the differences between the Responses API and Chat Completions API in the [OpenAI API docs](https://platform.openai.com/docs/guides/responses-vs-chat-completions).

#### Referencing earlier responses

The Responses API supports referencing an earlier model response in a new request using the `previous_response_id` parameter, so that the full [conversation state](https://platform.openai.com/docs/guides/conversation-state?api-mode=responses#passing-context-from-the-previous-response), including [reasoning items](https://platform.openai.com/docs/guides/reasoning#keeping-reasoning-items-in-context), is kept in context. This is available through the `openai_previous_response_id` field in
[`OpenAIResponsesModelSettings`][pydantic_ai.models.openai.OpenAIResponsesModelSettings].

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

model = OpenAIResponsesModel('gpt-5')
agent = Agent(model=model)

result = agent.run_sync('The secret is 1234')
model_settings = OpenAIResponsesModelSettings(
openai_previous_response_id=result.all_messages()[-1].provider_response_id
)
result = agent.run_sync('What is the secret code?', model_settings=model_settings)
print(result.output)
#> 1234
```

By passing the `provider_response_id` from an earlier run, you can allow the model to build on its own prior reasoning without needing to resend the full message history.
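
If a conversation spans more than two runs, the same pattern can be applied turn by turn. The loop below is a minimal sketch (the prompts are illustrative placeholders): each iteration threads the previous run's `provider_response_id` forward, so each request only carries the new user message.

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

model = OpenAIResponsesModel('gpt-5')
agent = Agent(model=model)

previous_response_id: str | None = None
for prompt in ['Pick a secret number.', 'Double it.', 'What is the result?']:
    model_settings = (
        OpenAIResponsesModelSettings(openai_previous_response_id=previous_response_id)
        if previous_response_id is not None
        else None
    )
    result = agent.run_sync(prompt, model_settings=model_settings)
    # Carry the latest response ID into the next turn so the server-side
    # conversation state (including any reasoning items) is reused.
    previous_response_id = result.all_messages()[-1].provider_response_id
```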

##### Automatically referencing earlier responses

When the `openai_previous_response_id` field is set to `'auto'`, Pydantic AI automatically selects the most recent `provider_response_id` from the message history and omits the messages that came before it, letting the OpenAI API use its server-side conversation state instead, for improved efficiency.

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

model = OpenAIResponsesModel('gpt-5')
agent = Agent(model=model)

result1 = agent.run_sync('Tell me a joke.')
print(result1.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

# When set to 'auto', only the most recent provider_response_id and the
# messages recorded after it are sent with the request.
model_settings = OpenAIResponsesModelSettings(openai_previous_response_id='auto')
result2 = agent.run_sync(
'Explain?',
message_history=result1.new_messages(),
model_settings=model_settings
)
print(result2.output)
#> This is an excellent joke invented by Samuel Colvin, it needs no explanation.
```
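
In a longer conversation, `'auto'` keeps the loop simple: you keep passing the full local message history and Pydantic AI trims what is actually sent. A minimal sketch (the prompts are illustrative placeholders):

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

model = OpenAIResponsesModel('gpt-5')
agent = Agent(model=model)
model_settings = OpenAIResponsesModelSettings(openai_previous_response_id='auto')

messages = []
for prompt in ['Tell me a joke.', 'Explain it.', 'Tell me a better one.']:
    result = agent.run_sync(prompt, message_history=messages, model_settings=model_settings)
    # Keep the full history locally; with 'auto', each request only sends the
    # messages recorded after the most recent provider_response_id. On the
    # first turn there is no response ID yet, so the full history is sent.
    messages = result.all_messages()
```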

## OpenAI-compatible Models

Many providers and models are compatible with the OpenAI API, and can be used with `OpenAIChatModel` in Pydantic AI.
38 changes: 38 additions & 0 deletions pydantic_ai_slim/pydantic_ai/models/openai.py
@@ -222,6 +222,17 @@ class OpenAIResponsesModelSettings(OpenAIChatModelSettings, total=False):
`medium`, and `high`.
"""

openai_previous_response_id: Literal['auto'] | str
"""The ID of a previous response from the model to use as the starting point for a continued conversation.

When set to `'auto'`, the request automatically uses the most recent
`provider_response_id` from the message history and omits earlier messages.

This enables the model to use server-side conversation state and faithfully reference previous reasoning.
See the [OpenAI Responses API documentation](https://platform.openai.com/docs/guides/reasoning#keeping-reasoning-items-in-context)
for more information.
"""


@dataclass(init=False)
class OpenAIChatModel(Model):
@@ -977,6 +988,10 @@ async def _responses_create(
else:
tool_choice = 'auto'

previous_response_id = model_settings.get('openai_previous_response_id')
if previous_response_id == 'auto':
previous_response_id, messages = self._get_previous_response_id_and_new_messages(messages)

instructions, openai_messages = await self._map_messages(messages, model_settings)
reasoning = self._get_reasoning(model_settings)

@@ -1027,6 +1042,7 @@ async def _responses_create(
truncation=model_settings.get('openai_truncation', NOT_GIVEN),
timeout=model_settings.get('timeout', NOT_GIVEN),
service_tier=model_settings.get('openai_service_tier', NOT_GIVEN),
previous_response_id=previous_response_id,
reasoning=reasoning,
user=model_settings.get('openai_user', NOT_GIVEN),
text=text or NOT_GIVEN,
@@ -1092,6 +1108,28 @@ def _map_tool_definition(self, f: ToolDefinition) -> responses.FunctionToolParam
),
}

def _get_previous_response_id_and_new_messages(
self, messages: list[ModelMessage]
) -> tuple[str | None, list[ModelMessage]]:
# When `openai_previous_response_id` is set to 'auto', the most recent
# `provider_response_id` from the message history is selected and all
# earlier messages are omitted. This allows the OpenAI SDK to reuse
# server-side history for efficiency. The returned tuple contains the
# `previous_response_id` (if found) and the trimmed list of messages.
previous_response_id = None
trimmed_messages: list[ModelMessage] = []
for m in reversed(messages):
if isinstance(m, ModelResponse) and m.provider_name == self.system:
previous_response_id = m.provider_response_id
break
else:
trimmed_messages.append(m)

if previous_response_id and trimmed_messages:
return previous_response_id, list(reversed(trimmed_messages))
else:
return None, messages

async def _map_messages( # noqa: C901
self, messages: list[ModelMessage], model_settings: OpenAIResponsesModelSettings
) -> tuple[str | NotGiven, list[responses.ResponseInputItemParam]]:
@@ -0,0 +1,131 @@
interactions:
- request:
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-type:
- application/json
host:
- api.openai.com
method: POST
parsed_body:
input:
- content: The secret key is sesame
role: user
instructions: ''
model: gpt-5
text:
format:
type: text
uri: https://api.openai.com/v1/responses
response:
headers:
content-type:
- application/json
parsed_body:
created_at: 1743075629
error: null
id: resp_1234
incomplete_details: null
instructions: ''
max_output_tokens: null
metadata: {}
model: gpt-5
object: response
output:
- content:
- annotations: []
text: "Open sesame! What would you like to unlock?"
type: output_text
id: msg_test_previous_response_id
role: assistant
status: completed
type: message
parallel_tool_calls: true
previous_response_id: null
reasoning: null
status: complete
status_details: null
tool_calls: null
total_tokens: 15
usage:
input_tokens: 10
input_tokens_details:
cached_tokens: 0
output_tokens: 1
output_tokens_details:
reasoning_tokens: 0
total_tokens: 11
status:
code: 200
message: OK
- request:
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-type:
- application/json
host:
- api.openai.com
method: POST
parsed_body:
input:
- content: What is the secret key again?
role: user
instructions: ''
model: gpt-5
text:
format:
type: text
previous_response_id: resp_1234
uri: https://api.openai.com/v1/responses
response:
headers:
content-type:
- application/json
parsed_body:
created_at: 1743075630
error: null
id: resp_5678
incomplete_details: null
instructions: ''
max_output_tokens: null
metadata: {}
model: gpt-5
object: response
output:
- content:
- annotations: []
text: "sesame"
type: output_text
id: msg_test_previous_response_id
role: assistant
status: completed
type: message
parallel_tool_calls: true
previous_response_id: resp_1234
reasoning: null
status: complete
status_details: null
tool_calls: null
total_tokens: 15
usage:
input_tokens: 10
input_tokens_details:
cached_tokens: 0
output_tokens: 1
output_tokens_details:
reasoning_tokens: 0
total_tokens: 11
status:
code: 200
message: OK
version: 1
@@ -0,0 +1,67 @@
interactions:
- request:
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-type:
- application/json
host:
- api.openai.com
method: POST
parsed_body:
input:
- content: What is the first secret key?
role: user
instructions: ''
model: gpt-5
text:
format:
type: text
previous_response_id: resp_68b9bda81f5c8197a5a51a20a9f4150a000497db2a4c777b
uri: https://api.openai.com/v1/responses
response:
headers:
content-type:
- application/json
parsed_body:
created_at: 1743075630
error: null
id: resp_a4168b9bda81f5c8197a5a51a20a9f4150a000497db2a4c5
incomplete_details: null
instructions: ''
max_output_tokens: null
metadata: {}
model: gpt-5
object: response
output:
- content:
- annotations: []
text: "sesame"
type: output_text
id: msg_test_previous_response_id_auto
role: assistant
status: completed
type: message
parallel_tool_calls: true
previous_response_id: resp_68b9bda81f5c8197a5a51a20a9f4150a000497db2a4c777b
reasoning: null
status: complete
status_details: null
tool_calls: null
total_tokens: 15
usage:
input_tokens: 10
input_tokens_details:
cached_tokens: 0
output_tokens: 1
output_tokens_details:
reasoning_tokens: 0
total_tokens: 11
status:
code: 200
message: OK
version: 1