Outline proposed architecture based on requirements #2
docs/ARCHITECTURE.md
Outdated
```
+generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$
+streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$
+generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$
+generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$
+generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$
+generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$
```
Personally I'd probably consolidate these into fewer methods to reduce the public API surface and make them more composable. For example, you can get a result easily via `generateOperation()`, so there's no need for `generateResult()`.

Then, if starting with a `GenerativeAiOperation` or `GenerativeAiResult`, there could be `toText()`, `toImage()`, or `stream()` methods to transform the result into the desired shape.

Just thinking out loud though. Best to get some feedback out in the wild from developers building with it :)
A few thoughts on this; I'm partially in agreement, partially not, partially not sure.

- The `generate*Result()` vs `generate*Operation()` methods need to remain separate because they fundamentally differ in how they invoke an operation. Technically, you can wrap everything in an operation of course, but the implication of triggering an operation is very different from wanting a result right away: an operation may take longer than you can wait for in this request, whereas requesting a result means explicitly waiting to get it right away.
- The same applies to streaming vs. not streaming; it triggers a fundamentally different kind of request handling chain, so that needs to remain separate at the root too.
- For the methods `generateText()`, `generateImage()` etc., which technically would simply wrap `generateTextResult()`, `generateImageResult()` etc., I can see how what you're saying would make sense. At this point the question is what is more intuitive and/or convenient for developers: `generateText()` or `generateTextResult()->text()`?
- Overall, most of these methods will be very brief wrappers of other methods. Pretty much all the heavy lifting will happen in `generateResult()` and `generateOperation()`, given the SDK is built with a multimodal-first mindset. For example, `generateTextResult()` basically just forwards to `generateResult()` with an `outputModalities: [ 'text' ]` config arg injected. But passing that in manually would be very verbose, and while the API needs to be multimodal-first to be flexible, it would make usage unnecessarily complex if you always had to think that way. So calling `generateTextResult()` or `generateText()` feels way more intuitive if you only want to generate text, for example.
- One exception is `streamGenerateTextResult()`, which lives separately (not using `generateResult()` or `generateOperation()`), although we could even think there about how this could be abstracted to support multimodal streaming. That part goes a bit above my head right now, so it's not in here, but it certainly could be, if we want to support streaming beyond just text output.

TL;DR: For all the wrapper methods (which almost all of these are), we could consider handling them in another way. But I wouldn't call the current approach bad just because it's a large list of methods on the entrypoint object; it depends on what API developers consider more intuitive.
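The wrapper layering described above can be sketched with stubbed-out types. To be clear, everything below is an illustrative sketch under assumed semantics; the real SDK's class and method signatures may differ.

```php
<?php
// Hypothetical sketch of how a thin declarative wrapper like
// generateTextResult() could forward to a multimodal-first base method.
// All classes here are illustrative stubs, not the actual SDK.

class GenerativeAiResult
{
    public function __construct(private string $text) {}

    public function text(): string
    {
        return $this->text;
    }
}

class AiClient
{
    // Multimodal-first base method: all the heavy lifting happens here.
    public function generateResult(string $prompt, array $config = []): GenerativeAiResult
    {
        // A real implementation would dispatch to a provider/model.
        $modality = ($config['outputModalities'] ?? ['text'])[0];
        return new GenerativeAiResult("[$modality] response to: $prompt");
    }

    // Thin declarative wrapper: injects the text-only output modality.
    public function generateTextResult(string $prompt): GenerativeAiResult
    {
        return $this->generateResult($prompt, ['outputModalities' => ['text']]);
    }

    // Even thinner convenience wrapper returning the string directly.
    public function generateText(string $prompt): string
    {
        return $this->generateTextResult($prompt)->text();
    }
}
```

With these stubs, `generateText('Hi')` and `generateTextResult('Hi')->text()` produce the same string, which is exactly the convenience-vs-surface trade-off the discussion is weighing.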
Obviously very limited due to character limit, but I just created https://x.com/felixarntz/status/1936116496658579717, maybe it'll give us at least a rough idea.
Weighing in here, I'll focus on @swissspidy's comment with regards to reducing the public API surface.
Personally, I prefer it when an API has a combination of robust and declarative methods. There's often a base public method with more imperative parameters which is used by a series of declarative methods. When possible, in my code, I'll prefer the declarative methods for readability and simplicity; when necessary, I'll use the base methods to construct my own declarative methods.
So I like the idea of having the declarative methods that can be used to simplify the implementing code for patterns that we know will be highly used. I like `generateTextResult()`, which gives me the option to work with the object, and I like `generateText()` (which I would likely use more often), which reduces the number of times I'd need to do `generateTextResult()->text()`.
If a class has too many methods doing too many different things, then that's violating single responsibility. But if a class has many declarative methods for the same thing, that's useful, in my friendly opinion. 😄
As a note, when writing documentation, you don't have to detail every declarative method (or, conversely, the base method); just document whichever is the more intended API.
Thanks for the feedback @JasonTheAdams, I personally agree with it completely.
That said, it's always easier to add additional API surface later than to remove it (if it proves undesirable), and the outcome of the high-level https://x.com/felixarntz/status/1936116496658579717 survey shows a preference for calling a nested method to "transform" the overall result object.

Based on this, for now I'm leaning towards following that hint and removing the additional declarative methods in favor of "transform" methods on the result objects. Again, this isn't set in stone and could be changed in the future, but I think it's reasonable to start with what the majority of the survey respondents favored.
I've just updated this in 42c1a83; that diff shows the changes clearly. I'm personally not in love with this as it feels more verbose, but let's start with that.

Follow-up question: The new utility methods are all named `to...()`, e.g. `toText()`. This is in line with what was in the Twitter survey, and it provides a separation between simple getters that return a direct child object of `GenerativeAiResult` and these methods, which get more deeply nested pieces of data, including bulk-extracting from arrays in some cases (when there are multiple AI response candidates returned by the model). Now my question is: Does that separation help, or is it confusing?

I raise this because of e.g. `generateTextResult()->getCandidates()` but `generateTextResult()->toTexts()`, or, if you go the very verbose route, `generateTextResult()->getCandidates()->toTexts()`.

Should all methods be called `get...()` instead, e.g. `getText()`, or is the current naming good?
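For illustration, the getter vs. `to...()` split could look roughly like this. These are stub types with assumed behavior, not the actual SDK classes:

```php
<?php
// Hypothetical sketch of the getter vs. to...() naming split discussed
// above. Stub types only; not the actual SDK classes.

class Candidate
{
    public function __construct(private string $text) {}

    // Simple getter: returns a direct piece of this object's data.
    public function getText(): string
    {
        return $this->text;
    }
}

class GenerativeAiResult
{
    /** @param Candidate[] $candidates */
    public function __construct(private array $candidates) {}

    // Simple getter: returns direct child objects of the result.
    public function getCandidates(): array
    {
        return $this->candidates;
    }

    // Transform method: bulk-extracts deeply nested data across all
    // candidates, hence the to...() rather than get...() prefix.
    public function toTexts(): array
    {
        return array_map(
            fn (Candidate $c): string => $c->getText(),
            $this->candidates
        );
    }
}
```

Here `getCandidates()` hands back the raw child objects, while `toTexts()` flattens across all candidates in one call, which is the distinction the naming is meant to signal.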
A lot of good stuff here! I don't think I've fully wrapped my head around the entire API, but I've got a good idea. I left some notes. Also, this PR notes that it's imperative that this satisfies every must-have of the requirements doc, but it doesn't address the REST API. Perhaps that shouldn't be a must-have requirement based on our conversations surrounding that?
Maybe this is not clear from the current |
You're totally right. It's mentioned, but not in the requirements. Apologies. 😆

The architecture proposed here includes the concept of long-running operations, which leads to the distinction between "Operation" objects and "Result" objects for the various generative AI tasks. @JasonTheAdams raised this on Slack; please see this thread for additional context about this approach.
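The Operation vs. Result distinction could be illustrated roughly like this. This is a stub sketch under assumed semantics (the polling behavior and names here are illustrative; the actual SDK may model it differently):

```php
<?php
// Hypothetical sketch of the Operation vs. Result distinction: a result
// is awaited immediately, while an operation may finish later and is
// polled by the caller. Stub implementation, not the actual SDK.

class GenerativeAiOperation
{
    private ?string $output = null;

    public function __construct(private \Closure $work) {}

    // Poll the underlying job; a real SDK would query a provider API
    // and might return false while the job is still running.
    public function poll(): bool
    {
        if ($this->output === null) {
            $this->output = ($this->work)();
        }
        return true; // done
    }

    public function getOutput(): ?string
    {
        return $this->output;
    }
}

// The caller decides when (and whether) to wait for completion.
$operation = new GenerativeAiOperation(fn (): string => 'generated text');
while (!$operation->poll()) {
    usleep(100_000); // back off between polls
}
```

The point of the separate entry methods is that triggering an operation signals "this may outlive the current request", whereas a `*Result()` call signals "block until the output is here".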
…ex use-cases, with additional explanations.
Per my conversation with @felixarntz, here are the examples reflected in a fluent API:

**Generate text using any suitable model from any provider (most basic example)**

```php
$text = Ai::prompt('Write a 2-verse poem about PHP.')->generateText();
```

**Generate text using a Google model**

```php
$text = Ai::prompt('Write a 2-verse poem about PHP.')
    ->usingModel('gemini-2.5-flash')
    ->generateText();
```

**Generate multiple text candidates using an Anthropic model**

```php
$texts = Ai::prompt('Write a 2-verse poem about PHP.')
    ->usingModel('claude-3.7-sonnet')
    ->generateTexts(4);
```

**Generate an image using any suitable OpenAI model**

```php
$image = Ai::prompt('Generate an illustration of the PHP elephant in the Caribbean sea.')
    ->usingProvider('openai')
    ->usingModelSupportingImages() // optional
    ->generateImageResult()
    ->toFile();
```

**Generate an image using any suitable model from any provider**

```php
$image = Ai::prompt('Generate an illustration of the PHP elephant in the Caribbean sea.')
    ->usingModelSupportingImages() // optional
    ->generateImageResult()
    ->toFile();
```

**Generate text using any suitable model from any provider**

```php
$text = Ai::prompt('Write a 2-verse poem about PHP.')
    ->usingModelSupportingText()
    ->generateText();
```

**Generate text with an image as additional input using any suitable model from any provider**

```php
$text = Ai::prompt('Generate alternative text for this image.')
    ->withImage('image/png', $base64blob)
    ->generateText();
```

**Generate text with chat history using any suitable model from any provider**

```php
$text = Ai::prompt('Can you repeat that please?')
    ->withHistory(
        new UserMessage('Do you spell it WordPress or Wordpress?'),
        new AgentMessage('The correct spelling is WordPress.')
    )
    ->generateText();
```

**Generate text with JSON output using any suitable model from any provider**

```php
$text = Ai::prompt('Transform the following CSV content into a JSON array of row data.')
    ->asJsonResponse()
    ->withOutputSchema(['name' => 'string', 'age' => 'integer'])
    ->generateText();
```

**Generate embeddings using any suitable model from any provider**

```php
$embeddings = Ai::prompt('A very long text.', 'Another very long text.', 'More long text.')
    ->generateEmbeddings();
```
…anize overview diagrams for easier understanding.
…te not formally using the builder pattern.
…nterfaces to use `TextToSpeechConversion`.
Revise proposed architecture PR with fluent API class diagrams
Note to self: This PR so far talks about "API for consumption" vs "API for provider registration and implementation". Based on the target audiences clearly outlined in #11, we should probably update the wording here to align with that, e.g. "Implementer API" vs "Extender API".
…el discovery via AiModelRequirements objects that can be automatically inferred from message and config objects.
…ration methods have a singular return type.
Add fluent API code examples to architecture documentation
@JasonTheAdams @borkweb FYI 639d627 is required as a workaround because Mermaid doesn't like namespaces that have the same names as a class. The actual namespace will of course be |
@JasonTheAdams @borkweb f7e8f69 adds the child classes for

I'm quite happy with this as is now, so I'd say please give it another review to see whether you are too! I already spotted #21, and I like this direction quite a bit, but let's get this merged independently of the namespacing/directory structure and create a follow-up PR to revise based on the discussion in #21.
Looks great to me - excellent catch on the `UserMessage` and `ModelMessage` objects 🍸
LGTM! 🎉
This proposed architecture is based on the current requirements (see #1).
For easier review, please see the Markdown version in the PR branch.
For reference: An older architectural outline and related ideas were discussed in felixarntz/ai-services#22, and to some degree in felixarntz/ai-services#21.