ericstj (Member) commented Aug 26, 2025

Fixes #6639

This adds a prototype implementation of ImageGenerationTool, with support in the OpenAI provider.

Using this requires the OpenAI Responses chat client with the OpenAI provider (not Azure OpenAI).

Still required:

  • Iterate on the behavior of the middleware to better support edits and remove workarounds.
  • More tests (currently testing with a sample).
  • Polish and document the API.
  • Update to the latest OpenAI package and remove private reflection.

@Copilot Copilot AI review requested due to automatic review settings August 26, 2025 15:41
@ericstj ericstj requested a review from a team as a code owner August 26, 2025 15:41
@github-actions github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Aug 26, 2025
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds a prototype implementation of ImageGenerationTool with support in the OpenAI provider, enabling AI services to perform image generation when specified as a tool.

  • Introduces ImageGenerationTool as a marker tool class that can be configured with ImageGenerationOptions
  • Adds experimental APIs for applying chat response updates to existing responses with configurable coalescing options
  • Implements reflection-based handling of OpenAI's internal image generation response types in the OpenAI provider

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| ImageGenerationTool.cs | New tool class for enabling image generation capabilities |
| ChatResponseUpdateCoalescingOptions.cs | Configuration options for how response updates are merged |
| ChatResponseExtensions.cs | Extension methods for applying updates to chat responses |
| OpenAIResponsesChatClient.cs | OpenAI provider implementation with image generation support via reflection |
| OpenAIJsonContext.cs | Added JSON serialization support for new types |
| ChatResponseUpdate.cs | Fixed documentation reference |
| ChatResponseUpdateExtensionsTests.cs | Comprehensive tests for new update coalescing functionality |


namespace Microsoft.Extensions.AI;

/// <summary>Represents a hosted tool that can be specified to an AI service to enable it to perform image generation.</summary>
There's other hosted tools, like OpenAI's Code Interpreter. Other providers have something similar. Anthropic for example has Web Search, Fetch and Code Interpreter as "Server Tools". Maybe out of scope for this change, but a generalized abstraction for these would be great. AdditionalProperties seems to be common for all of them - including the Anthropic ones. So I think this would fit beautifully as a more general abstraction than ImageGenerationTool. ServerTool or HostedTool?

Member

We already have HostedWebSearchTool, HostedCodeInterpreterTool, HostedFileSearchTool, and HostedMcpServerTool. Having a HostedImageGenerationTool makes sense to me. Different providers have different ways of exposing the same fundamental information, so being able to write e.g. HostedWebSearchTool, and have that map to the right thing for Gemini and Anthropic and OpenAI makes sense to me. AdditionalProperties can be used in each when there's some setting that's not exposed in a strongly-typed fashion on the tool type.
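
For readers unfamiliar with the existing hosted tools, here is a minimal sketch of how they are passed to a chat client through ChatOptions.Tools. It assumes the Microsoft.Extensions.AI.Abstractions package is referenced and a C# 12 compiler for the collection expression; whether a given provider honors a particular hosted tool is provider-specific and not shown here.

```csharp
using Microsoft.Extensions.AI;

// Hosted tools are marker objects: the provider maps each one to its own
// service-side tool, and the caller supplies no implementation.
var options = new ChatOptions
{
    Tools = [new HostedWebSearchTool(), new HostedCodeInterpreterTool()],
};

// The options would then be passed to IChatClient.GetResponseAsync(messages, options).
```

A HostedImageGenerationTool, as proposed above, would presumably be added to the same list.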


ericstj (Member, Author) commented Sep 16, 2025

Forgot to mention that https://github.com/ericstj/imageGeneratorSample/tree/imageGeneratorTool is the sample that uses this.

@ericstj ericstj marked this pull request as draft October 1, 2025 15:07
ericstj (Member, Author) commented Oct 1, 2025

I'm moving this to draft. I have private changes that pick up a self-built copy of OpenAI that has these types public (the change is in main of OpenAI, but not yet on NuGet), which we won't be able to merge. I'm also still not happy with the middleware and need to experiment more with that.

ericstj (Member, Author) commented Oct 1, 2025

I'll also add that I know this middleware is not thread-safe at the moment, and it needs to be. I'm currently experimenting with how to manage the state. Once I settle on that I'll make it thread-safe, since we document that all IChatClient implementations need to be.

_ = Throw.IfNull(response);
_ = Throw.IfNull(updates);

if (updates is ICollection<ChatResponseUpdate> { Count: 0 })
Member

Is the existing ToChatResponse then just:

ChatResponse response = new();
response.ApplyUpdates(updates);
return response;

or is there a meaningful behavioral or performance difference with that?

case DataContent dataContent when
!string.IsNullOrEmpty(dataContent.Name):
// Check if there's an existing DataContent with the same name to replace
for (int i = 0; i < message.Contents.Count; i++)
Member

Does this change the algorithm to be O(N^2)?
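
To illustrate the concern: scanning message.Contents for a matching name on every incoming DataContent costs O(N) per update, so O(N^2) overall. One possible way to avoid that, sketched below with a hypothetical NamedContent record standing in for DataContent, is to keep a name-to-index dictionary alongside the list. This is not the PR's implementation, only a sketch of the technique.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for DataContent; only Name matters for this sketch.
public sealed record NamedContent(string Name, string Payload);

public static class NamedContentCoalescer
{
    // Applies updates in order. A later item replaces an earlier item with the
    // same name in place; items with new names are appended.
    public static List<NamedContent> Coalesce(IEnumerable<NamedContent> updates)
    {
        var result = new List<NamedContent>();

        // Maps a name to its index in result, so each lookup is O(1) on
        // average instead of a linear scan over the list.
        var indexByName = new Dictionary<string, int>(StringComparer.Ordinal);

        foreach (var update in updates)
        {
            if (indexByName.TryGetValue(update.Name, out int index))
            {
                result[index] = update;
            }
            else
            {
                indexByName[update.Name] = result.Count;
                result.Add(update);
            }
        }

        return result;
    }
}
```

The trade-off is one extra dictionary per message being coalesced; for small content lists the linear scan may still be cheaper in practice.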

Comment on lines +58 to +61
// the following fields all have scope per-request. They are cleared at the start of each request.
private readonly Dictionary<string, List<AIContent>> _imageContentByCallId = [];
private readonly Dictionary<string, AIContent> _imageContentById = new(StringComparer.OrdinalIgnoreCase);
private ImageGenerationOptions? _imageGenerationOptions;
Member

I would expect instead you'd create an object in Get{Streaming}ResponseAsync to hold the state for that request, and whatever tools you want to send would reference that object (e.g. they'd be instance methods on it).
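
A rough sketch of that suggestion, using hypothetical types (ImageRequestState and SketchClient are illustrative names, not types from the PR or from Microsoft.Extensions.AI): the client holds no mutable per-request fields, and each call creates its own state object whose instance methods the request's tools would use.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical per-request state holder.
public sealed class ImageRequestState
{
    private readonly Dictionary<string, List<string>> _imageIdsByCallId = new();

    // Instance method that a tool callback created for this request could bind to.
    public void RecordImage(string callId, string imageId)
    {
        if (!_imageIdsByCallId.TryGetValue(callId, out var ids))
        {
            ids = new List<string>();
            _imageIdsByCallId[callId] = ids;
        }

        ids.Add(imageId);
    }

    public IReadOnlyList<string> GetImages(string callId) =>
        _imageIdsByCallId.TryGetValue(callId, out var ids)
            ? ids
            : (IReadOnlyList<string>)Array.Empty<string>();
}

public sealed class SketchClient
{
    // Each call builds its own state, so concurrent calls on the same client
    // instance never share the dictionary.
    public Task<IReadOnlyList<string>> GetResponseAsync(string callId, IEnumerable<string> imageIds)
    {
        var state = new ImageRequestState();
        foreach (var id in imageIds)
        {
            state.RecordImage(callId, id);
        }

        return Task.FromResult(state.GetImages(callId));
    }
}
```

Because nothing mutable lives on the client, no clearing step is needed at the start of a request.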

for (int contentIndex = 0; contentIndex < message.Contents.Count; contentIndex++)
{
var content = message.Contents[contentIndex];
if (content is DataContent dataContent && dataContent.MediaType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
Member

DataContent.HasTopLevelMediaType is intended for this
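
For reference, a small sketch of what the check could look like with that method. It assumes the Microsoft.Extensions.AI.Abstractions package is referenced; the IsImage helper name is made up for illustration.

```csharp
using Microsoft.Extensions.AI;

// Replaces the manual MediaType.StartsWith("image/", ...) comparison.
static bool IsImage(AIContent content) =>
    content is DataContent dataContent && dataContent.HasTopLevelMediaType("image");

var png = new DataContent(new byte[] { 1, 2, 3 }, "image/png");
var text = new TextContent("hello");

bool pngIsImage = IsImage(png);
bool textIsImage = IsImage(text);
```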

Labels
area-ai Microsoft.Extensions.AI libraries
Development

Successfully merging this pull request may close these issues.

Text to Image: Support image generation as an AITool
3 participants