Image generation tool #6749
Pull Request Overview
This PR adds a prototype implementation of ImageGenerationTool with support in the OpenAI provider, enabling AI services to perform image generation when specified as a tool.
- Introduces ImageGenerationTool as a marker tool class that can be configured with ImageGenerationOptions
- Adds experimental APIs for applying chat response updates to existing responses with configurable coalescing options
- Implements reflection-based handling of OpenAI's internal image generation response types in the OpenAI provider
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
ImageGenerationTool.cs | New tool class for enabling image generation capabilities |
ChatResponseUpdateCoalescingOptions.cs | Configuration options for how response updates are merged |
ChatResponseExtensions.cs | Extension methods for applying updates to chat responses |
OpenAIResponsesChatClient.cs | OpenAI provider implementation with image generation support via reflection |
OpenAIJsonContext.cs | Added JSON serialization support for new types |
ChatResponseUpdate.cs | Fixed documentation reference |
ChatResponseUpdateExtensionsTests.cs | Comprehensive tests for new update coalescing functionality |
namespace Microsoft.Extensions.AI;

/// <summary>Represents a hosted tool that can be specified to an AI service to enable it to perform image generation.</summary>
There are other hosted tools, like OpenAI's Code Interpreter, and other providers have something similar. Anthropic, for example, has Web Search, Fetch, and Code Interpreter as "Server Tools". Maybe out of scope for this change, but a generalized abstraction for these would be great. AdditionalProperties seems to be common to all of them, including the Anthropic ones, so I think this would fit beautifully as a more general abstraction than ImageGenerationTool. ServerTool or HostedTool?
We already have HostedWebSearchTool, HostedCodeInterpreterTool, HostedFileSearchTool, and HostedMcpServerTool. Having a HostedImageGenerationTool makes sense to me. Different providers have different ways of exposing the same fundamental information, so being able to write e.g. HostedWebSearchTool, and have that map to the right thing for Gemini and Anthropic and OpenAI makes sense to me. AdditionalProperties can be used in each when there's some setting that's not exposed in a strongly-typed fashion on the tool type.
Forgot to mention that https://github.com/ericstj/imageGeneratorSample/tree/imageGeneratorTool is the sample that uses this.
I'm moving this to draft. I have private changes that pick up a self-built copy of OpenAI that has these types public (the change is in main of OpenAI, but not yet on NuGet), which we won't be able to merge. I'm also still not happy with the middleware and need to experiment more with that.
I'll also add that I know this middleware is not thread-safe at the moment, and it needs to be. I'm currently experimenting with how to manage the state. Once I settle on that I'll make it thread-safe, since we document that all IChatClient implementations need to be.
_ = Throw.IfNull(response);
_ = Throw.IfNull(updates);

if (updates is ICollection<ChatResponseUpdate> { Count: 0 })
Is the existing ToChatResponse then just:
ChatResponse response = new();
response.ApplyUpdates(updates);
return response;
or is there a meaningful behavioral or performance difference with that?
case DataContent dataContent when
    !string.IsNullOrEmpty(dataContent.Name):
    // Check if there's an existing DataContent with the same name to replace
    for (int i = 0; i < message.Contents.Count; i++)
Does this change the algorithm to be O(N^2)?
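To make the concern concrete: scanning the contents list once per named item costs O(N) per update and O(N^2) overall. One possible way to avoid that, sketched here under assumptions, is to track each name's index in a dictionary. `NamedContent` and `NamedContentCoalescer` are hypothetical stand-ins for illustration, not types from the PR or from Microsoft.Extensions.AI.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a named DataContent; not the library type.
public sealed record NamedContent(string? Name, string Payload);

public static class NamedContentCoalescer
{
    // Appends each update, except that a named update replaces the earlier
    // item with the same name. The dictionary maps a name to its index in
    // the result list, so each update costs O(1) on average instead of O(N).
    public static List<NamedContent> Coalesce(IEnumerable<NamedContent> updates)
    {
        var result = new List<NamedContent>();
        var indexByName = new Dictionary<string, int>(StringComparer.Ordinal);

        foreach (var update in updates)
        {
            if (!string.IsNullOrEmpty(update.Name))
            {
                if (indexByName.TryGetValue(update.Name, out int index))
                {
                    // Replace in place, keeping the original position.
                    result[index] = update;
                    continue;
                }

                // First time this name is seen: remember where it will land.
                indexByName[update.Name] = result.Count;
            }

            result.Add(update);
        }

        return result;
    }
}
```

The trade-off is an extra dictionary per coalescing pass, which only pays off when a message carries many named items.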
// the following fields all have scope per-request. They are cleared at the start of each request.
private readonly Dictionary<string, List<AIContent>> _imageContentByCallId = [];
private readonly Dictionary<string, AIContent> _imageContentById = new(StringComparer.OrdinalIgnoreCase);
private ImageGenerationOptions? _imageGenerationOptions;
I would expect instead you'd create an object in Get{Streaming}ResponseAsync to hold the state for that request, and whatever tools you want to send would reference that object (e.g. they'd be instance methods on it).
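A minimal sketch of that shape, assuming nothing about the real middleware beyond what the comment describes: the client keeps no mutable per-request fields, and each call creates its own state object whose instance methods act as the tools. `ImageRequestState`, `SketchClient`, and the simplified `GetResponseAsync` signature are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical per-request state holder; names are illustrative, not from the PR.
public sealed class ImageRequestState
{
    private readonly Dictionary<string, string> _imageById =
        new(StringComparer.OrdinalIgnoreCase);

    // A "tool" implemented as an instance method, so it only ever
    // touches the state that belongs to this one request.
    public string StoreImage(string id, string data)
    {
        _imageById[id] = data;
        return id;
    }

    public int Count => _imageById.Count;
}

public sealed class SketchClient
{
    // The client has no mutable fields; every call gets a fresh state object,
    // so concurrent calls cannot observe each other's images.
    public Task<int> GetResponseAsync(IEnumerable<(string Id, string Data)> generatedImages)
    {
        var state = new ImageRequestState();
        Func<string, string, string> storeTool = state.StoreImage;

        foreach (var (id, data) in generatedImages)
        {
            _ = storeTool(id, data);
        }

        return Task.FromResult(state.Count);
    }
}
```

Because nothing is shared between calls, this layout would not need clearing at the start of each request or locking around the dictionaries.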
for (int contentIndex = 0; contentIndex < message.Contents.Count; contentIndex++)
{
    var content = message.Contents[contentIndex];
    if (content is DataContent dataContent && dataContent.MediaType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
DataContent.HasTopLevelMediaType is intended for this
Fixes #6639
This adds a prototype implementation of ImageGenerationTool with support in the OpenAI provider.
Using this requires the OpenAI Responses chat client with the OpenAI provider (not Azure OpenAI).
Still required: