Skip to content

Outline proposed architecture based on requirements #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Jul 14, 2025

Conversation

felixarntz
Copy link
Member

@felixarntz felixarntz commented Jun 18, 2025

This proposed architecture is based on the current requirements (see #1).

For easier review, please see the Markdown version in the PR branch.

For reference: An older architectural outline and related ideas were discussed in felixarntz/ai-services#22, and to some degree in felixarntz/ai-services#21.

Comment on lines 209 to 227
+generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$
+streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$
+generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$
+generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$
+generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$
+generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd probably consolidate these into fewer methods to reduce the public API surface and make them more composable. For example, you can get a result easily via generateOperation(), no need for generateResult().

Then, if starting with GenerativeAiOperation or GenerativeAiResult, there could be some toText, toImage, or stream() methods or so to transform the result into the desired shape.

Just thinking out loud though. Best to get some feedback out in the wild from developers building with it :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts on this, I'm partially in agreement, partially not, partially not sure.

  • The generate*Result() vs generate*Operation() need to remain separate because they can fundamentally how they invoke an operation. Technically, you can wrap everything in an operation of course, but the implication of triggering an operation is very different from wanting a result right away - it's clear that an operation may take longer than you can wait for in this request, whereas wanting a result is explicitly waiting to get it right away.
  • The same applies for streaming vs not streaming, it triggers a fundamentally different kind of request handling chain, so that needs to remain separate at the root too.
  • For the methods generateText(), generateImage() etc., which technically would simply wrap generateTextResult(), generateImageResult() etc., I can see what you're saying would make sense. At this point the question is what is more intuitive and/or convenient for developers: generateText() or generateTextResult()->text()?
  • Overall, most of these methods will be very brief wrappers of other methods. Pretty much all the heavy lifting will happen in generateResult() and generateOperation(), given the SDK is built with a multimodal first mindset. For example, generateTextResult() is basically just forwarding to generateResult() with an outputModalities: [ 'text' ] config arg injected. But passing that manually in would be very verbose, and while the API needs to be multimodal-first to be flexible, this would make usage unnecessary complex if you always have to think in that way - so calling generateTextResult() or generateText() feels way more intuitive if you only want to generate text for example.
  • One exception is streamGenerateTextResult() which lives separately (not using generateResult() or generateOperation()), although we could even think there about how this could be abstracted to support multimodal streaming. That part goes a bit above my head right now, so it's not in here, but it certainly could be, if we want to support streaming beyond just text output.

TL;DR: For all the wrapper methods (which almost all of these are), we could consider handling them in another way. But I wouldn't say the current approach isn't ideal just because it's a large list of methods on the entrypoint object - it depends on what API developers consider more intuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously very limited due to character limit, but I just created https://x.com/felixarntz/status/1936116496658579717, maybe it'll give us at least a rough idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weighing in here, I'll focus on @swissspidy's comment with regards to reducing the public API surface.

Personally, I prefer it when an API has a combination of robust and declarative methods. There's often a base public method with more imperative parameters which is used by a series of declarative methods. When possible, in my code, I'll prefer the declarative methods for readability and simplicity; when necessary, I'll use the base methods to construct my own declarative methods.

So I like the idea of having the declarative methods that can be used to simplify the implementing code for patterns that we know will be highly used. I like generateTextResult() which gives me the option to work with the object and I like generateText() (and would likely use more often), which reduces the amount of times I'd need to do generateTextResult()->text().

If a class has too many methods doing too many different things, then that's violating single responsibility. But if a class has many declarative methods for the same thing, that's useful, in my friendly opinion. 😄

As a note, when writing documentation, you don't have to detail out every declarative method, or vice versa the base method — whichever is the more intended API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @JasonTheAdams, I personally agree with it completely.

That said, it's always easier to add additional API surfaces later than remove them (if they show to be undesirable), and the outcome of the high-level https://x.com/felixarntz/status/1936116496658579717 survey shows a preference to calling the nested method to "transform" the overall result object.

Based on this, for now I'm leaning towards following that hint and remove the additional declarative methods in favor of "transform" methods on the result objects. Again, this isn't set in stone and could be changed in the future, but I think it's reasonable to go with what the majority of the survey respondents favored as a starting point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just updated this in 42c1a83, that diff shows the changes clearly. I'm personally not in love with this as it feels more verbose, but let's start with that.

Follow-up question: The new utility methods are all named to...(), e.g. toText(). This is in line with what was in the Twitter survey, and it provides a separation between simple getters that return a direct child object of GenerativeAiResult and these methods, which get more deeply nested pieces of data, including bulk-extracting from arrays in some cases (when there are multiple AI response candidates returned by the model). Now my question is: Does that separation help, or is it confusing?

I raise this because e.g. generateTextResult()->getCandidates() but generateTextResult()->toTexts(), or, if you go the very verbose route, generateTextResult()->getCandidates()->toTexts().

Should all methods be called get...(), e.g. getText() instead, or is the current naming good?

@felixarntz felixarntz added [Type] Developer Documentation Documentation for developers and removed documentation labels Jun 18, 2025
@felixarntz felixarntz marked this pull request as ready for review June 20, 2025 17:13
Copy link
Member

@JasonTheAdams JasonTheAdams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of good stuff here! I don't think I've fully wrapped my head around the entire API, but I've got a good idea. I left some notes. Also, this PR notes that it's imperative that this satisfies every must-have of the requirements doc, but it doesn't address the REST API. Perhaps that shouldn't be a must-have requirement based on our conversations surrounding that?

Comment on lines 209 to 227
+generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$
+streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$
+generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$
+generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$
+generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$
+generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weighing in here, I'll focus on @swissspidy's comment with regards to reducing the public API surface.

Personally, I prefer it when an API has a combination of robust and declarative methods. There's often a base public method with more imperative parameters which is used by a series of declarative methods. When possible, in my code, I'll prefer the declarative methods for readability and simplicity; when necessary, I'll use the base methods to construct my own declarative methods.

So I like the idea of having the declarative methods that can be used to simplify the implementing code for patterns that we know will be highly used. I like generateTextResult() which gives me the option to work with the object and I like generateText() (and would likely use more often), which reduces the amount of times I'd need to do generateTextResult()->text().

If a class has too many methods doing too many different things, then that's violating single responsibility. But if a class has many declarative methods for the same thing, that's useful, in my friendly opinion. 😄

As a note, when writing documentation, you don't have to detail out every declarative method, or vice versa the base method — whichever is the more intended API.

@felixarntz felixarntz changed the base branch from add/requirements-definition to trunk June 30, 2025 19:07
@felixarntz
Copy link
Member Author

@JasonTheAdams

Also, this PR notes that it's imperative that this satisfies every must-have of the requirements doc, but it doesn't address the REST API. Perhaps that shouldn't be a must-have requirement based on our conversations surrounding that?

Maybe this is not clear from the current REQUIREMENTS.md, but I just double checked and didn't see the REST API as a must-have there. Can you point to what made you think it was? Probably something we can clarify in that doc.

@JasonTheAdams
Copy link
Member

Maybe this is not clear from the current REQUIREMENTS.md, but I just double checked and didn't see the REST API as a must-have there. Can you point to what made you think it was? Probably something we can clarify in that doc.

You're totally right. It's mentioned, but not in the requirements. Apologies. 😆

@felixarntz
Copy link
Member Author

The architecture proposed here includes the concept of long-running operations, which leads to the distinction between "Operation" objects and "Result" objects for the various generative AI tasks.

@JasonTheAdams raised this on Slack, please see this thread for additional context about this approach.

@JasonTheAdams
Copy link
Member

JasonTheAdams commented Jul 9, 2025

Per my conversation @felixarntz, here are the examples reflected in a Fluent API method:

Generate text using any suitable model from any provider (most basic example)

$text = Ai::prompt('Write a 2-verse poem about PHP.')->generateText();

Generate text using a Google model

$text = Ai::prompt('Write a 2-verse poem about PHP.')
  ->usingModel('gemini-2.5-flash')
  ->generateText();

Generate multiple text candidates using an Anthropic model

$text = Ai::prompt('Write a 2-verse poem about PHP.')
  ->usingModel('claude-3.7-sonnet')
  ->generateTexts(4);

Generate an image using any suitable OpenAI model

$text = Ai::prompt('Generate an illustration of the PHP elephant in the Carribean sea.')
  ->usingProvider('openai')
  ->usingModelSupportingImages() // optional
  ->generateImageResult()
  ->toFile();

Generate an image using any suitable model from any provider

$text = Ai::prompt('Generate an illustration of the PHP elephant in the Carribean sea.')
  ->usingModelSupportingImages() // optional
  ->generateImageResult()
  ->toFile();

Generate text using any suitable model from any provider

$text = Ai::prompt('Write a 2-verse poem about PHP.')
  ->usingModelSupportingText()
  ->generateText();

Generate text with an image as additional input using any suitable model from any provider

$text = Ai::prompt('Generate alternative text for this image.')
  ->withImage('image/png', $base64blob)
  ->generateText();

Generate text with chat history using any suitable model from any provider

$text = Ai::prompt('Can you repeat that please?')
  ->withHistory(
    new UserMessage('Do you spell it WordPress or Wordpress?'),
    new AgentMessage('The correct spelling is WordPress.')
  )
  ->generateText();

Generate text with JSON output using any suitable model from any provider

$text = Ai::prompt('Transform the following CSV content into a JSON array of row data.')
  ->asJsonResponse
  ->withOutputSchema(['name' => 'string', 'age' => 'integer'])
  ->generateText();

Generate embeddings using any suitable model from any provider

$embeddings = Ai::prompt('A very long text.', 'Another very long text.', 'More long text.')
  ->generateEmbeddings();

@felixarntz
Copy link
Member Author

Note to self: This PR so far talks about "API for consumption" vs "API for provider registration and implementation".

Based on the target audiences clearly outlined in #11, we should probably update the wording here to align with that, e.g. "Implementer API" vs "Extender API".

cc @JasonTheAdams

@felixarntz felixarntz mentioned this pull request Jul 13, 2025
16 tasks
@felixarntz
Copy link
Member Author

@JasonTheAdams @borkweb FYI 639d627 is required as a workaround because Mermaid doesn't like namespaces that have the same names as a class.

The actual namespace will of course be AiClient, as already defined in composer.json.

@felixarntz
Copy link
Member Author

@JasonTheAdams @borkweb f7e8f69 adds the child classes for Message that we had previously discussed to have for ease of use, plus they're already referenced in the examples (also see related fix 3ff376e).

I'm quite happy with this as is now, so I'd say please give it another review to see whether you are too!

I already spotted #21, and I like this direction quite a bit, but I'd say let's get this merged independently of the namespacing/directory structure and create a follow up PR to revise based on the discussion in #21.

Copy link
Contributor

@borkweb borkweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me - excellent catch on the UserMessage and ModelMessage objects 🍸

Copy link
Member

@JasonTheAdams JasonTheAdams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉

@felixarntz felixarntz merged commit 4c9e57b into trunk Jul 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Developer Documentation Documentation for developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants