Outline proposed architecture based on requirements #2
docs/ARCHITECTURE.md
Outdated
```
+generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$
+streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$
+generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$
+generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$
+generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$
+generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$
```
Personally I'd probably consolidate these into fewer methods to reduce the public API surface and make them more composable. For example, you can get a result easily via `generateOperation()`, so there's no need for `generateResult()`.

Then, if starting with a `GenerativeAiOperation` or `GenerativeAiResult`, there could be `toText()`, `toImage()`, or `stream()` methods to transform the result into the desired shape.

Just thinking out loud though. Best to get some feedback out in the wild from developers building with it :)
A few thoughts on this; I'm partially in agreement, partially not, partially not sure.

- The `generate*Result()` vs `generate*Operation()` methods need to remain separate because they fundamentally differ in how they invoke an operation. Technically, you can wrap everything in an operation of course, but the implication of triggering an operation is very different from wanting a result right away: an operation may take longer than you can wait for in this request, whereas requesting a result means explicitly waiting to get it right away.
- The same applies to streaming vs. not streaming; it triggers a fundamentally different kind of request handling chain, so that needs to remain separate at the root too.
- For the methods `generateText()`, `generateImage()` etc., which technically would simply wrap `generateTextResult()`, `generateImageResult()` etc., I can see how what you're saying would make sense. At this point the question is what is more intuitive and/or convenient for developers: `generateText()` or `generateTextResult()->text()`?
- Overall, most of these methods will be very brief wrappers of other methods. Pretty much all the heavy lifting will happen in `generateResult()` and `generateOperation()`, given the SDK is built with a multimodal-first mindset. For example, `generateTextResult()` basically just forwards to `generateResult()` with an `outputModalities: [ 'text' ]` config arg injected. But passing that in manually would be very verbose, and while the API needs to be multimodal-first to be flexible, it would make usage unnecessarily complex if you always had to think that way. So calling `generateTextResult()` or `generateText()` feels way more intuitive if you only want to generate text, for example.
- One exception is `streamGenerateTextResult()`, which lives separately (not using `generateResult()` or `generateOperation()`), although we could even think there about how this could be abstracted to support multimodal streaming. That part goes a bit above my head right now, so it's not in here, but it certainly could be, if we want to support streaming beyond just text output.

TL;DR: For all the wrapper methods (which almost all of these are), we could consider handling them in another way. But I wouldn't call the current approach bad just because it's a large list of methods on the entrypoint object; it depends on what API developers consider more intuitive.
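The wrapper layering described above can be sketched with stubbed-out types. To be clear, everything below is an illustrative sketch under assumed semantics; the real SDK's class and method signatures may differ.

```php
<?php
// Hypothetical sketch of how a thin declarative wrapper like
// generateTextResult() could forward to a multimodal-first base method.
// All classes here are illustrative stubs, not the actual SDK.

class GenerativeAiResult
{
    public function __construct(private string $text) {}

    public function text(): string
    {
        return $this->text;
    }
}

class AiClient
{
    // Multimodal-first base method: all the heavy lifting happens here.
    public function generateResult(string $prompt, array $config = []): GenerativeAiResult
    {
        // A real implementation would dispatch to a provider/model.
        $modality = ($config['outputModalities'] ?? ['text'])[0];
        return new GenerativeAiResult("[$modality] response to: $prompt");
    }

    // Thin declarative wrapper: injects the text-only output modality.
    public function generateTextResult(string $prompt): GenerativeAiResult
    {
        return $this->generateResult($prompt, ['outputModalities' => ['text']]);
    }

    // Even thinner convenience wrapper returning the string directly.
    public function generateText(string $prompt): string
    {
        return $this->generateTextResult($prompt)->text();
    }
}
```

With these stubs, `generateText('Hi')` and `generateTextResult('Hi')->text()` produce the same string, which is exactly the convenience-vs-surface trade-off the discussion is weighing.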
Obviously very limited due to character limit, but I just created https://x.com/felixarntz/status/1936116496658579717, maybe it'll give us at least a rough idea.
Weighing in here, I'll focus on @swissspidy's comment with regards to reducing the public API surface.
Personally, I prefer it when an API has a combination of robust and declarative methods. There's often a base public method with more imperative parameters which is used by a series of declarative methods. When possible, in my code, I'll prefer the declarative methods for readability and simplicity; when necessary, I'll use the base methods to construct my own declarative methods.
So I like the idea of having the declarative methods that can be used to simplify the implementing code for patterns that we know will be highly used. I like `generateTextResult()`, which gives me the option to work with the object, and I like `generateText()` (which I would likely use more often), which reduces the number of times I'd need to do `generateTextResult()->text()`.
If a class has too many methods doing too many different things, then that's violating single responsibility. But if a class has many declarative methods for the same thing, that's useful, in my friendly opinion. 😄
As a note, when writing documentation, you don't have to detail every declarative method (or, conversely, the base method); just document whichever is the more intended API.
Thanks for the feedback @JasonTheAdams, I personally agree with it completely.
That said, it's always easier to add additional API surface later than to remove it (if it proves undesirable), and the outcome of the high-level https://x.com/felixarntz/status/1936116496658579717 survey shows a preference for calling a nested method to "transform" the overall result object.

Based on this, for now I'm leaning towards following that hint and removing the additional declarative methods in favor of "transform" methods on the result objects. Again, this isn't set in stone and could be changed in the future, but I think it's reasonable to start with what the majority of the survey respondents favored.
I've just updated this in 42c1a83; that diff shows the changes clearly. I'm personally not in love with this as it feels more verbose, but let's start with that.

Follow-up question: The new utility methods are all named `to...()`, e.g. `toText()`. This is in line with what was in the Twitter survey, and it provides a separation between simple getters that return a direct child object of `GenerativeAiResult` and these methods, which get more deeply nested pieces of data, including bulk-extracting from arrays in some cases (when there are multiple AI response candidates returned by the model). Now my question is: Does that separation help, or is it confusing?

I raise this because of e.g. `generateTextResult()->getCandidates()` but `generateTextResult()->toTexts()`, or, if you go the very verbose route, `generateTextResult()->getCandidates()->toTexts()`.

Should all methods be called `get...()` instead, e.g. `getText()`, or is the current naming good?
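For illustration, the getter vs. `to...()` split could look roughly like this. These are stub types with assumed behavior, not the actual SDK classes:

```php
<?php
// Hypothetical sketch of the getter vs. to...() naming split discussed
// above. Stub types only; not the actual SDK classes.

class Candidate
{
    public function __construct(private string $text) {}

    // Simple getter: returns a direct piece of this object's data.
    public function getText(): string
    {
        return $this->text;
    }
}

class GenerativeAiResult
{
    /** @param Candidate[] $candidates */
    public function __construct(private array $candidates) {}

    // Simple getter: returns direct child objects of the result.
    public function getCandidates(): array
    {
        return $this->candidates;
    }

    // Transform method: bulk-extracts deeply nested data across all
    // candidates, hence the to...() rather than get...() prefix.
    public function toTexts(): array
    {
        return array_map(
            fn (Candidate $c): string => $c->getText(),
            $this->candidates
        );
    }
}
```

Here `getCandidates()` hands back the raw child objects, while `toTexts()` flattens across all candidates in one call, which is the distinction the naming is meant to signal.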
A lot of good stuff here! I don't think I've fully wrapped my head around the entire API, but I've got a good idea. I left some notes. Also, this PR notes that it's imperative that this satisfies every must-have of the requirements doc, but it doesn't address the REST API. Perhaps that shouldn't be a must-have requirement based on our conversations surrounding that?
Maybe this is not clear from the current |
You're totally right. It's mentioned, but not in the requirements. Apologies. 😆

The architecture proposed here includes the concept of long-running operations, which leads to the distinction between "Operation" objects and "Result" objects for the various generative AI tasks. @JasonTheAdams raised this on Slack; please see this thread for additional context about this approach.
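The Operation vs. Result distinction could be illustrated roughly like this. This is a stub sketch under assumed semantics (the polling behavior and names here are illustrative; the actual SDK may model it differently):

```php
<?php
// Hypothetical sketch of the Operation vs. Result distinction: a result
// is awaited immediately, while an operation may finish later and is
// polled by the caller. Stub implementation, not the actual SDK.

class GenerativeAiOperation
{
    private ?string $output = null;

    public function __construct(private \Closure $work) {}

    // Poll the underlying job; a real SDK would query a provider API
    // and might return false while the job is still running.
    public function poll(): bool
    {
        if ($this->output === null) {
            $this->output = ($this->work)();
        }
        return true; // done
    }

    public function getOutput(): ?string
    {
        return $this->output;
    }
}

// The caller decides when (and whether) to wait for completion.
$operation = new GenerativeAiOperation(fn (): string => 'generated text');
while (!$operation->poll()) {
    usleep(100_000); // back off between polls
}
```

The point of the separate entry methods is that triggering an operation signals "this may outlive the current request", whereas a `*Result()` call signals "block until the output is here".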
…ex use-cases, with additional explanations.
Per my conversation with @felixarntz, here are the examples reflected in a fluent API:

**Generate text using any suitable model from any provider (most basic example)**

```php
$text = Ai::prompt('Write a 2-verse poem about PHP.')->generateText();
```

**Generate text using a Google model**

```php
$text = Ai::prompt('Write a 2-verse poem about PHP.')
    ->usingModel('gemini-2.5-flash')
    ->generateText();
```

**Generate multiple text candidates using an Anthropic model**

```php
$texts = Ai::prompt('Write a 2-verse poem about PHP.')
    ->usingModel('claude-3.7-sonnet')
    ->generateTexts(4);
```

**Generate an image using any suitable OpenAI model**

```php
$image = Ai::prompt('Generate an illustration of the PHP elephant in the Caribbean sea.')
    ->usingProvider('openai')
    ->usingModelSupportingImages() // optional
    ->generateImageResult()
    ->toFile();
```

**Generate an image using any suitable model from any provider**

```php
$image = Ai::prompt('Generate an illustration of the PHP elephant in the Caribbean sea.')
    ->usingModelSupportingImages() // optional
    ->generateImageResult()
    ->toFile();
```

**Generate text using any suitable model from any provider**

```php
$text = Ai::prompt('Write a 2-verse poem about PHP.')
    ->usingModelSupportingText()
    ->generateText();
```

**Generate text with an image as additional input using any suitable model from any provider**

```php
$text = Ai::prompt('Generate alternative text for this image.')
    ->withImage('image/png', $base64blob)
    ->generateText();
```

**Generate text with chat history using any suitable model from any provider**

```php
$text = Ai::prompt('Can you repeat that please?')
    ->withHistory(
        new UserMessage('Do you spell it WordPress or Wordpress?'),
        new AgentMessage('The correct spelling is WordPress.')
    )
    ->generateText();
```

**Generate text with JSON output using any suitable model from any provider**

```php
$text = Ai::prompt('Transform the following CSV content into a JSON array of row data.')
    ->asJsonResponse()
    ->withOutputSchema(['name' => 'string', 'age' => 'integer'])
    ->generateText();
```

**Generate embeddings using any suitable model from any provider**

```php
$embeddings = Ai::prompt('A very long text.', 'Another very long text.', 'More long text.')
    ->generateEmbeddings();
```
…anize overview diagrams for easier understanding.
…te not formally using the builder pattern.
…nterfaces to use `TextToSpeechConversion`.
Revise proposed architecture PR with fluent API class diagrams
Note to self: This PR so far talks about "API for consumption" vs "API for provider registration and implementation". Based on the target audiences clearly outlined in #11, we should probably update the wording here to align with that, e.g. "Implementer API" vs "Extender API".
…el discovery via AiModelRequirements objects that can be automatically inferred from message and config objects.
…ration methods have a singular return type.
Add fluent API code examples to architecture documentation
@JasonTheAdams @borkweb FYI 639d627 is required as a workaround because Mermaid doesn't like namespaces that have the same names as a class. The actual namespace will of course be |
@JasonTheAdams @borkweb f7e8f69 adds the child classes for

I'm quite happy with this as is now, so I'd say please give it another review to see whether you are too! I already spotted #21, and I like this direction quite a bit, but let's get this merged independently of the namespacing/directory structure and create a follow-up PR to revise based on the discussion in #21.
Looks great to me - excellent catch on the `UserMessage` and `ModelMessage` objects 🍸
LGTM! 🎉
This proposed architecture is based on the current requirements (see #1).
For easier review, please see the Markdown version in the PR branch.
For reference: An older architectural outline and related ideas were discussed in felixarntz/ai-services#22, and to some degree in felixarntz/ai-services#21.