feat(input_schema): Enable sub-schemas in input-schema #519
Hey @MFori, good job, this proposal makes sense. I like that we're only expanding the input schema to support more of the JSON Schema features, without creating more custom extensions. This is great for calling Actors via MCP or OpenAPI. A few notes/questions:
Yes, we have a use case from the Store team: they want to have a list of filters in the Actor input, which would look like this. It would be a good fit for an array field with object-typed items and a sub-schema with three fields (string enum, string enum, string). I think it wouldn't be that bad because under the sub-schema with
Makes sense. The question is whether to solve this at the schema level, which would make the schema definition quite complex, or "manually" as an additional validation.
Yeah, just the limited set of properties. The enhanced one is supported only at the root level.
Good job, I'm not 100% sure about the schemaBased editor, maybe subfields as editor name would work better?
Also, I think we should lock it to only allow one level of recursion, otherwise the UI might be pretty painful if someone misuses it.
Thanks for the clarification. I agree with @gippy - let's keep it simple and limited at the start; we can always extend it in the future (but not easily the other way around). We should pick an appropriate editor name for that,
I think we can have this as a hard-coded validation in Apify CLI and on the platform, if there's no easy way to squeeze it into our meta-schema.
Cool, please let's just mention that in the docs.
And one more note: now that we have this, I think we should deprecate
I'm not sure about the implicit So I would vote for choosing some name for this new I can think of:
Fair enough, "subschema" sounds best to me, and if not used, we render the JSON editor
@gippy, @MFori, please add me to these proposals next time. I see you reached some conclusion, but I want to raise one thing: people who are using our UI are not our core persona. People who are integrating us are, so a good question is whether this type of input will work for those people.
Building a UI internally for this is easy, but that is not how people use Apify, and we need to make sure that all the features here will work in all 3 (and more) cases above.
My thought was that if currently some Actor accepts
I think documenting the shape of input field in Actor's page should be quite straightforward.
Again, the integration would still fail if the Actor expects some shape, but with a sub-schema it would fail even before execution, not inside the Actor. It would be nice if we could render a proper UI here too, but providing the JSON input should IMHO still be fine. But adding @drobnikj - do you think it would be possible to render sub-schema fields in integration platforms?
I was thinking that, in theory, this should add more context for LLMs, so they should be able to fill in the input even better. But adding @MQ37 - do you think my assumption is correct, and can we prove it somehow?
@MFori it would actually be great if the Actor input schema adhered to the JSON Schema spec. Currently in the MCP server we infer the "sub-schemas" (array items and object properties) based on either the
I agree with the above - we already accept Just last small change - please let's call the editor
# Conflicts: # packages/input_schema/src/schema.json # test/utilities.client.test.ts
// remove leading and trailing slashes and replace remaining slashes with dots
const cleanPropertyName = (name: string) => {
    return name.replace(/^\/|\/$/g, '').replace(/\//g, '.');
};
This is needed because an error can now relate to a sub-property, and we want to show a nice path.
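To illustrate, here is a minimal sketch of what the helper does with slash-separated paths. The helper body is copied from the diff; the sample paths are made up and only assume that error paths look like JSON-pointer-style strings such as /myField/key/key1.

```typescript
// Same helper as in the diff: strip one leading and one trailing slash,
// then turn the remaining slashes into dots.
const cleanPropertyName = (name: string): string => {
    return name.replace(/^\/|\/$/g, '').replace(/\//g, '.');
};

// Hypothetical paths, shaped like JSON-pointer-style error paths.
const examplePaths = ['/myField', '/myField/key/key1', 'filters/0/value/'];

for (const path of examplePaths) {
    console.log(`${path} -> ${cleanPropertyName(path)}`);
}
```

With a nested path, the user sees myField.key.key1 rather than the raw slash-separated form.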
const matchingDefinitions = Object
    .values<any>(definitions) // cast as any, as the code in first branch seems to be invalid
    .filter((definition) => {
        // Because the definitions contains not only the root properties definitions, but also sub-schema definitions
        // and utility definitions, we need to filter them out and validate only against the appropriate ones.
        // We do this by checking prefix of the definition title (Utils: or Sub-schema:)
I haven't found a better approach than grouping the definitions by their title; it's not that robust, but it works well.
Another approach would be to define definitions, sub_definitions and util_definitions, but those are not valid JSON Schema keywords and wouldn't pass Ajv strict validation.
Or put them as sub-objects like definitions.root, definitions.sub, definitions.utils, but I haven't tested it.
But since this is used just here, I think it's OK to go with this solution.
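A minimal sketch of the title-prefix grouping described above, not the PR's actual code. The "Utils:" and "Sub-schema:" prefixes come from the code comment in the diff; the Definition type, the concrete titles, and the helper names groupOf and definitionsFor are hypothetical.

```typescript
// Minimal shape of a meta-schema definition; only `title` matters here.
interface Definition {
    title?: string;
    [key: string]: unknown;
}

// Hypothetical definitions map; the titles are invented for illustration.
const definitions: Record<string, Definition> = {
    stringProperty: { title: 'String property' },
    subSchemaStringProperty: { title: 'Sub-schema: String property' },
    arrayItemsSelect: { title: 'Utils: Array items for select editor' },
};

type DefinitionGroup = 'root' | 'sub' | 'utils';

// Classify a definition by the prefix of its title.
const groupOf = (definition: Definition): DefinitionGroup => {
    const title = definition.title ?? '';
    if (title.startsWith('Utils:')) return 'utils';
    if (title.startsWith('Sub-schema:')) return 'sub';
    return 'root';
};

// Pick only the definitions relevant for the level being validated.
const definitionsFor = (group: DefinitionGroup): Definition[] => {
    return Object.values(definitions).filter((definition) => groupOf(definition) === group);
};

console.log(definitionsFor('root').map((definition) => definition.title));
console.log(definitionsFor('sub').map((definition) => definition.title));
```

The trade-off mentioned in the comment is visible here: renaming a title silently moves a definition to another group, which is why the approach is described as not that robust.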
@@ -168,6 +227,10 @@ export function validateExistenceOfRequiredFields(inputSchema: InputSchema) {
 * then checks that all required fields are present and finally checks fully against the whole schema.
 *
 * This way we get the most accurate error message for user.
 *
 * @param validator An instance of AJV validator. Important: The JSON Schema that the passed input schema is validated against
 * is using features from JSON Schema 2019 draft, so the AJV instance must support it.
This PR has already more than doubled its length, but without utilising unevaluatedProperties (https://json-schema.org/understanding-json-schema/reference/object#unevaluatedproperties) from draft 2019-09 it would be much longer, with a lot of duplicated definitions (there is room to update the existing definitions, replacing additionalProperties with unevaluatedProperties, to make the schema smaller and cleaner).
But to support this, the Ajv validator instance passed to the validateInputSchema function has to support this draft, so we need to update the Ajv import in all places that call this function from:
import Ajv from 'ajv'
to
import Ajv from 'ajv/dist/2019'
It should be these places (we should change it together with bumping the version of @apify/input_schema once this PR is merged):
- apify/apify-worker - when validating schema during build
- apify/apify-cli - validate-schema command
- apify/apify-core - just admin input schema playground
Note: draft 2019-09 is used only to validate the input schema against our meta JSON Schema; the input schema itself only uses features from draft 07, so when validating input against the input schema we don't need to change anything.
Can we maybe add createAjv or some other mechanism that would make sure that the correct schema version is used every time, without the need to check every call of the function? Or maybe there is some property on the Ajv instance that we could check?
I've added a check (ensureAjvSupportsDraft2019) that validates it or throws an exception. Do you think it's enough?
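For readers wondering what such a check could look like, here is a rough sketch. It is not the implementation from this PR: only the function name comes from the comment above. It assumes that the validator lists its registered keywords under RULES.all (as Ajv v8 instances do, to my knowledge) and uses a small stand-in type instead of importing Ajv.

```typescript
// Structural stand-in for the part of an Ajv instance this sketch touches.
// Assumption: registered keywords are listed under `RULES.all`.
interface AjvLike {
    RULES: { all: Record<string, unknown> };
}

// Throws when the validator does not know `unevaluatedProperties`, a keyword
// introduced in draft 2019-09 that the meta-schema relies on.
function ensureAjvSupportsDraft2019(validator: AjvLike): void {
    if (!validator.RULES.all['unevaluatedProperties']) {
        throw new Error(
            'The provided Ajv instance does not support JSON Schema draft 2019-09. '
            + "Create it from 'ajv/dist/2019' instead of 'ajv'.",
        );
    }
}

// Fake instances for illustration only; real code would pass an Ajv instance.
const draft07Like: AjvLike = { RULES: { all: { additionalProperties: true } } };
const draft2019Like: AjvLike = {
    RULES: { all: { additionalProperties: true, unevaluatedProperties: true } },
};

ensureAjvSupportsDraft2019(draft2019Like); // passes silently
try {
    ensureAjvSupportsDraft2019(draft07Like);
} catch (error) {
    console.log((error as Error).message);
}
```

Failing fast like this moves the problem from a confusing schema compilation error to a clear message at the call site.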
"unevaluatedProperties": false, | ||
"oneOf": [ | ||
{ | ||
"required": ["editor"], | ||
"properties": { | ||
"editor": { "enum": ["select"] }, | ||
"items": { "$ref": "#/definitions/arrayItemsSelect" } | ||
} | ||
}, |
This and the other oneOf items restrict how a sub-schema can look for some editors. Based on the previous discussion in this PR, we want to limit the sub-schema to be compatible with the editor.
@gippy @fnesveda @valekjo
The plan is to merge this, update worker and cli with new version (and use JSON Schema draft 2019 for input schema validation) and then:
Ton of work 💪 I've only checked the schema.json briefly (to understand a bit how it's constructed), but mostly relying on tests.
I remember we had some issues with fields with dots in their name, we should check if that could cause any issues here.
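To make the concern concrete, here is a small sketch. It assumes error paths are flattened with dots the way cleanPropertyName in this PR does; the field names are made up. It only shows that two different paths can render identically, not that anything breaks.

```typescript
// Same flattening as the helper in this PR.
const cleanPropertyName = (name: string): string => name.replace(/^\/|\/$/g, '').replace(/\//g, '.');

// A root property literally named "user.name" ...
const rootFieldWithDot = cleanPropertyName('/user.name');
// ... and a sub-property "name" of the root property "user".
const nestedField = cleanPropertyName('/user/name');

// Both render as the same dotted path.
console.log(rootFieldWithDot, nestedField, rootFieldWithDot === nestedField);
```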
properties: {
    myField: {
        title: 'Field title',
        type: 'object',
        description: 'Description',
        editor: 'schemaBased',
        additionalProperties: false,
        properties: {
            key: {
                type: 'object',
                title: 'Key',
                description: 'Key description',
                editor: 'json',
                properties: {
                    key1: {
                        type: 'string',
                        title: 'Key 1',
                        description: 'Key 1 description',
                    },
                    key2: {
                        type: 'string',
                        title: 'Key 2',
                        description: 'Key 2 description',
                    },
                },
            },
        },
    },
},
Question: This defines inputs like
{ myField: {
    key: {
        key1: '',
        key2: '',
    }
} }
Would this be enough for the test?
{ myField: {
    key1: '',
    key2: '',
} }
This tests that recursion (a sub-schema inside a sub-object) works too.
I've added a test for it and it seems it doesn't cause any issues here, but we would definitely need to double-check it in the UI, because it might cause issues with our current implementation of the input UI with Formik.
Uff, this is a pretty brutal PR, good job! I hope I didn't miss anything but I think it looks good.
Oh sorry, should be already fixed by apify/apify-cli#853 (backported to latest)
Input sub-schemas
The PR is ready to be merged; a few changes from the original proposal were made based on the discussion here:
- keyValue editor should define object with two string properties key and value and nothing else)
- schemaBased editor can be used only in root properties
🎯 Goal
The goal of this proposal is to enable creators to define sub-schemas within an Actor's input schema for fields of type array and object. These field types would support specifying their inner structure, which would be used both for input validation and for rendering the corresponding (sub)fields in the input UI form.
📝 Solution
The proposed solution leverages "native" features of JSON Schema, utilizing its properties such as properties, required, additionalProperties (for object) and items (for array).
As a result, creators would be able to define an input schema like this:
An Actor with a schema like this would then accept input like:
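The original examples are not present in this copy of the description. As a stand-in, here is a hedged sketch of what such a schema and a matching input could look like, loosely based on the filters use case from the discussion. The field names (filters, field, operator, value), the enum values and the chosen editors are hypothetical, and the check at the end is a tiny hand-rolled illustration rather than a JSON Schema validator.

```typescript
// Hypothetical input schema: an array of filter objects with a sub-schema.
const inputSchema = {
    title: 'Example input',
    type: 'object',
    schemaVersion: 1,
    properties: {
        filters: {
            title: 'Filters',
            type: 'array',
            description: 'List of filters to apply',
            editor: 'json',
            items: {
                type: 'object',
                properties: {
                    field: { type: 'string', title: 'Field', description: 'Field to filter on', editor: 'select', enum: ['price', 'rating'] },
                    operator: { type: 'string', title: 'Operator', description: 'Comparison operator', editor: 'select', enum: ['lt', 'gt'] },
                    value: { type: 'string', title: 'Value', description: 'Value to compare with', editor: 'textfield' },
                },
                required: ['field', 'operator', 'value'],
                additionalProperties: false,
            },
        },
    },
};

// An input that this schema is meant to accept.
const input = {
    filters: [
        { field: 'price', operator: 'lt', value: '100' },
        { field: 'rating', operator: 'gt', value: '4' },
    ],
};

// Illustration only: every item has all required keys and no keys outside
// `properties` (mirroring `required` and `additionalProperties: false`).
const { properties, required } = inputSchema.properties.filters.items;
const allItemsMatch = input.filters.every((item) => {
    const keys = Object.keys(item);
    return required.every((key) => keys.includes(key)) && keys.every((key) => key in properties);
});

console.log(allItemsMatch);
```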
Recursiveness
The schema should support recursion, allowing creators to define nested objects within other objects. This enables complex and deeply structured input definitions as needed.
The same goes with arrays:
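The nested examples are also missing from this copy, so the following is an illustrative sketch only. It shows an object nested inside an object and an array nested inside an array, plus a small helper that measures nesting depth. The FieldSchema type, the field names and the depth helper are hypothetical, and titles and descriptions are omitted for brevity.

```typescript
// Reduced shape of a field definition; real definitions carry more keys.
interface FieldSchema {
    type: string;
    properties?: Record<string, FieldSchema>;
    items?: FieldSchema;
}

// Object nested in an object nested in an object.
const nestedObject: FieldSchema = {
    type: 'object',
    properties: {
        address: {
            type: 'object',
            properties: {
                city: { type: 'string' },
                geo: {
                    type: 'object',
                    properties: { lat: { type: 'string' }, lng: { type: 'string' } },
                },
            },
        },
    },
};

// Array of arrays of strings.
const nestedArray: FieldSchema = {
    type: 'array',
    items: { type: 'array', items: { type: 'string' } },
};

// Number of sub-schema levels below a field (0 for a plain field).
const depth = (schema: FieldSchema): number => {
    const children = [
        ...Object.values(schema.properties ?? {}),
        ...(schema.items ? [schema.items] : []),
    ];
    return children.length === 0 ? 0 : 1 + Math.max(...children.map((child) => depth(child)));
};

console.log(depth(nestedObject), depth(nestedArray));
```

A helper like this could also back a hard-coded limit on nesting depth, which was suggested in the discussion as a way to keep the UI manageable.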
👨💻 Implementation
JSON Schema
At the JSON Schema level, the implementation is relatively straightforward and can basically follow the approach used in this PR. The proposed changes include:
- object and array definitions with new properties:
  - object type can include:
    - properties - defines the internal structure of the object. It supports all property types available at the root input schema level (with mentioned restrictions).
    - additionalProperties - specifies whether properties not listed in properties are allowed
    - required - lists which properties defined in properties are required
  - array type can include:
    - items - defines the type (and optionally the shape) of array items
Validation
Actor's Input Schema
Validation would almost work out of the box with the updated schema.json file, but a few additional steps are required:
- validateProperties function), we also need to validate sub-properties. The validation has to be done against a different set of definitions in case of root-level properties and sub-properties. This should be straightforward to implement.
- parseAjvError needs to be updated to correctly display the path to the relevant (sub)property in validation errors. Again, this is not expected to be complex.
Input
Because all newly added properties (properties, additionalProperties, required and items) are native features of JSON Schema, input validation against the Actor's input schema will work entirely out of the box.
Input UI
In the Input UI, we want to give creators the flexibility to render each sub-field of a complex input field independently.
A proof of concept has already been implemented (currently only for object fields), and there are no major blockers to a full implementation. You can see the draft PR here: https://github.com/apify/apify-core/pull/21564
Creators should have the option to choose how a field is rendered:
- json), in which case the sub-schema is used solely for validation.
To support the latter, we need to introduce a new editor type that signals sub-schema-based rendering. I’ve tentatively called this editor schemaBased, but the name is open for discussion.
For arrays using sub-schemas with the schemaBased editor, we’ll need a UI component that includes controls to add, remove, and optionally reorder items.
Note: Based on the discussion below, we decided to limit the schemaBased editor to root-level properties only.
Technical Implementation Notes
- schemaBased editor is used.
❓ Limitations and open questions
Root-level properties
We are effectively reusing the existing "root-level" property definitions within sub-schemas. However, not all root-level properties make sense in the context of a sub-schema. Specifically, the following properties are questionable:
- default, prefill and example - These are better suited for the root property that contains the entire object or array. Applying them to individual sub-fields could lead to unexpected or inconsistent behavior.
- sectionCaption and sectionDescription - These introduce structural elements (sections) and may not make sense inside nested sub-schemas. We should consider either removing them entirely from sub-schemas or revisiting their design from a UI/UX perspective (e.g. nested sections).
schemaBased editor
As mentioned in the Input UI section, there is a need to introduce a new editor type that signals that each sub-property of a complex object or array should be rendered as a standalone field using its sub-schema definition.
I’ve proposed the name schemaBased for this new editor, but the name is open for discussion.
Compatibility between editor and sub-schema
A key question is whether we should validate the compatibility between the editor and the defined sub-schema within the Actor's input schema, or leave this responsibility to the Actor developer.
Example scenario
A developer defines a property of type array with editor: stringList, but also provides a sub-schema specifying object-type items. The input UI would generate a list of strings, while the validation would expect objects, resulting in invalid input.
Possible Approaches:
- Allow any combination of editor and sub-schema, and assume the creator understands which combinations are valid. This offers maximum flexibility but increases the risk of misconfiguration.
- that would be schemaBased, json and hidden.
- Enforce that sub-schemas match expected structures for specific editors. For example, for the stringList editor the sub-schema can only be string-typed items; for requestListSources it's an object with strictly defined properties. But this would make the JSON Schema way more complicated, with lots of if-else branches and duplicated property definitions.
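As a point of comparison with encoding these rules in the JSON Schema, here is a rough sketch of the hard-coded validation alternative mentioned in the discussion, applied to the example scenario above. The compatibility table, the ArrayField type and the function name are hypothetical; only the editor names stringList and requestListSources come from the text.

```typescript
// Reduced shape of an array field definition.
interface ArrayField {
    type: 'array';
    editor: string;
    items?: { type: string };
}

// Hypothetical compatibility table: which item types each array editor can
// produce. Editors missing from the table are treated as unrestricted.
const allowedItemTypes: Record<string, string[]> = {
    stringList: ['string'],
    requestListSources: ['object'],
};

// Returns an error message, or null when editor and sub-schema are compatible.
const checkEditorCompatibility = (name: string, field: ArrayField): string | null => {
    const allowed = allowedItemTypes[field.editor];
    if (!allowed || !field.items) return null;
    if (allowed.includes(field.items.type)) return null;
    return `Field "${name}": editor "${field.editor}" produces ${allowed.join(' or ')} items, `
        + `but the sub-schema expects "${field.items.type}" items.`;
};

// The scenario from the text: stringList editor with object-typed items.
console.log(checkEditorCompatibility('urls', { type: 'array', editor: 'stringList', items: { type: 'object' } }));
// A compatible combination yields null.
console.log(checkEditorCompatibility('urls', { type: 'array', editor: 'stringList', items: { type: 'string' } }));
```

A table like this keeps the meta-schema small, at the cost of having the rule live in code that every consumer (CLI, platform) needs to run.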