feat(input_schema): Enable sub-schemas in input-schema #519

Merged
MFori merged 12 commits into master from feat/input-sub-schema
Jul 17, 2025

@MFori MFori commented Jun 11, 2025

Input sub-schemas

Note: This is a proposal, not a final implementation intended for immediate merging in this state. The purpose of this PR is to present a suggested solution for discussion and approval, not the completed implementation.

The PR is ready to be merged. A few changes from the original proposal were made based on the discussion here:

  • validation if the sub-schema is compatible with the editor (e.g. keyValue editor should define object with two string properties key and value and nothing else)
  • schemaBased editor can be used only in root properties

🎯 Goal

The goal of this proposal is to enable creators to define sub-schemas within an Actor's input schema for fields of type array and object. These field types would support specifying their inner structure, which would be used both for input validation and for rendering the corresponding (sub)fields in the input UI form.

📝 Solution

The proposed solution leverages "native" features of JSON Schema, utilizing its properties such as properties, required, additionalProperties (for object) and items (for array).

As a result, creators would be able to define an input schema like this:

{
  "title": "Apify Actor input schema example",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "my-object": {
      "type": "object",
      "title": "My object",
      "editor": "schemaBased",
      "properties": {
        "key1": {
          "type": "string",
          "title": "Key 1",
          "description": "Description",
          "editor": "textfield"
        },
        "key2": {
          "type": "integer",
          "title": "Key 2",
          "description": "Description",
          "editor": "integer"
        }
      },
      "required": ["key1"],
      "additionalProperties": false
    },
    "my-array": {
      "type": "array",
      "title": "My array",
      "editor": "json",
      "items": {
        "type": "object",
        "properties": {
          "key1": {
            "type": "string",
            "title": "Key 1",
            "description": "Description",
            "editor": "textfield"
          },
          "key2": {
            "type": "integer",
            "title": "Key 2",
            "description": "Description",
            "editor": "integer"
          }
        },
        "required": ["key1"],
        "additionalProperties": false
      }
    }
  },
  "required": []
}

An Actor with a schema like this would then accept input like:

{
  "my-object": {
    "key1": "test",
    "key2": 123
  },
  "my-array": [
    {
      "key1": "test", 
      "key2": 123
    },
    {
      "key1": "test" 
    }
  ]
}
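To make the accept/reject behaviour concrete, here is a minimal sketch of a validator for the limited keyword set used above (type, properties, required, additionalProperties, items). It is a hand-rolled illustration only: the platform uses Ajv, and the `SubSchema` type, `validateValue` function and error message formats are all assumptions made for this example.

```typescript
// Hypothetical, simplified sub-schema shape: only the keywords discussed in this proposal.
type SubSchema = {
    type: 'string' | 'integer' | 'object' | 'array';
    properties?: Record<string, SubSchema>;
    required?: string[];
    additionalProperties?: boolean;
    items?: SubSchema;
};

const hasOwn = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of error messages; an empty list means the value matches the schema.
function validateValue(value: unknown, schema: SubSchema, path: string): string[] {
    if (schema.type === 'string') {
        return typeof value === 'string' ? [] : [`${path} must be a string`];
    }
    if (schema.type === 'integer') {
        return Number.isInteger(value) ? [] : [`${path} must be an integer`];
    }
    const errors: string[] = [];
    if (schema.type === 'array') {
        if (!Array.isArray(value)) return [`${path} must be an array`];
        const items = schema.items;
        if (items) {
            // Every item is validated against the same "items" sub-schema.
            value.forEach((item, index) => {
                errors.push(...validateValue(item, items, `${path}.${index}`));
            });
        }
        return errors;
    }
    // Remaining case: schema.type === 'object'
    if (typeof value !== 'object' || value === null || Array.isArray(value)) {
        return [`${path} must be an object`];
    }
    const obj = value as Record<string, unknown>;
    const props = schema.properties ?? {};
    for (const key of schema.required ?? []) {
        if (!hasOwn(obj, key)) errors.push(`${path}.${key} is required`);
    }
    for (const key of Object.keys(obj)) {
        if (hasOwn(props, key)) {
            errors.push(...validateValue(obj[key], props[key], `${path}.${key}`));
        } else if (schema.additionalProperties === false) {
            errors.push(`${path}.${key} is not allowed`);
        }
    }
    return errors;
}

// The "my-array" sub-schema from the example above, reduced to the validation keywords.
const myArraySchema: SubSchema = {
    type: 'array',
    items: {
        type: 'object',
        properties: { key1: { type: 'string' }, key2: { type: 'integer' } },
        required: ['key1'],
        additionalProperties: false,
    },
};

const validErrors = validateValue([{ key1: 'test', key2: 123 }, { key1: 'test' }], myArraySchema, 'my-array');
const invalidErrors = validateValue([{ key2: '123', extra: true }], myArraySchema, 'my-array');
```

The first call returns no errors. The second reports the missing key1, the non-integer key2 and the disallowed extra property, each with a dot-separated path to the offending sub-property.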

Recursiveness

The schema should support recursion, allowing creators to define nested objects within other objects. This enables complex and deeply structured input definitions as needed.

{
  "title": "Apify Actor input schema example",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "my-object": {
      "type": "object",
      "title": "My object",
      "editor": "schemaBased",
      "properties": {
        "key1": {
          "type": "string",
          "title": "Key 1",
          "description": "Description",
          "editor": "textfield"
        },
        "key2": {
          "type": "object",
          "title": "Key 2",
          "description": "Description",
          "editor": "schemaBased",
          "properties": {
            "subKey": {
              "type": "string",
              "title": "SubKey",
              "description": "Description",
              "editor": "textfield"
            }
          }
        }
      },
      "required": ["key1"],
      "additionalProperties": false
    }
  },
  "required": []
}

The same goes with arrays:

{
  "title": "Apify Actor input schema example",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "my-array": {
      "type": "array",
      "title": "My array",
      "editor": "schemaBased",
      "items": {
        "type": "array",
        "items": {
          "type": "string"
        }
      }
    }
  },
  "required": []
}

👨‍💻 Implementation

JSON Schema

At the JSON Schema level, the implementation is relatively straightforward and can basically follow the approach used in this PR. The proposed changes include:

  • Creating new definitions for each property type - some properties used in the root schema don’t make sense within a sub-schema context. Therefore, instead of reusing the root definitions with complex conditions, it’s simpler to create tailored definitions for sub-schemas.
  • Extending the object and array definitions with new properties:
    • object type can include:

      • properties - defines the internal structure of the object. It supports all property types available at the root input schema level (with mentioned restrictions).

      • additionalProperties - specifies whether properties not listed in properties are allowed

      • required - lists which properties defined in properties are required

        {
          "type": "object",
          "properties": {
            "key": {
              ...
            }
          },
          "additionalProperties": false,
          "required": ["key"]
        }
        
    • array type can include

      • items - defines the type (and optionally the shape) of array items

        {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {...}
          }
        }
        

Validation

Actor's Input Schema

Validation would almost work out of the box with the updated schema.json file, but a few additional steps are required:

  • Since we're manually validating all properties one by one against the schema (using the validateProperties function), we also need to validate sub-properties. The validation has to be done against a different set of definitions in case of root-level properties and sub-properties. This should be straightforward to implement.
  • The logic in parseAjvError needs to be updated to correctly display the path to the relevant (sub)property in validation errors. Again, this is not expected to be complex.

Input

Because all newly added properties (properties, additionalProperties, required and items) are native features of JSON Schema, input validation against the Actor's input schema will work entirely out of the box.

Input UI

In the Input UI, we want to give creators the flexibility to render each sub-field of a complex input field independently.

A proof of concept has already been implemented (currently only for object fields), and there are no major blockers to a full implementation. You can see the draft PR here: https://github.com/apify/apify-core/pull/21564

Note: The code in the PR is intentionally minimal, not optimal and not production-ready. Its purpose is to validate the approach.

Creators should have the option to choose how a field is rendered:

  • Use an existing editor (e.g. json), in which case the sub-schema is used solely for validation.
  • Or render each sub-property as an individual input field based on the sub-schema.

To support the latter, we need to introduce a new editor type that signals sub-schema-based rendering. I’ve tentatively called this editor schemaBased, but the name is open for discussion.

For arrays using sub-schemas with the schemaBased editor, we’ll need a UI component that includes controls to add, remove, and optionally reorder items.

Note: Based on the discussion below, we decided to limit the schemaBased editor to root-level properties only.

Technical Implementation Notes

  • The main change in the Input UI will be to recursively render sub-fields when the schemaBased editor is used.
  • We’ll use dot notation (e.g. field.subField) for Formik field names to ensure proper binding. Formik handles this automatically.
  • We'll also need to support labels, descriptions, and other metadata for sub-fields, but this should be relatively straightforward.
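As a rough illustration of the dot-notation naming, the sketch below walks a schema's properties and collects the field names that a form would bind to. The `FieldSchema` type and the `collectFieldNames` helper are hypothetical names for this example, not part of the Input UI code.

```typescript
// Hypothetical, simplified view of an input schema property.
type FieldSchema = {
    type: string;
    editor?: string;
    properties?: Record<string, FieldSchema>;
};

// Collects one form field name per rendered input. Properties using the
// schemaBased editor are expanded into their sub-fields; everything else
// (including objects edited as JSON) stays a single field.
function collectFieldNames(properties: Record<string, FieldSchema>, prefix = ''): string[] {
    const names: string[] = [];
    for (const key of Object.keys(properties)) {
        const field = properties[key];
        const name = prefix ? `${prefix}.${key}` : key;
        if (field.editor === 'schemaBased' && field.properties) {
            names.push(...collectFieldNames(field.properties, name));
        } else {
            names.push(name);
        }
    }
    return names;
}

const exampleProperties: Record<string, FieldSchema> = {
    'my-object': {
        type: 'object',
        editor: 'schemaBased',
        properties: {
            key1: { type: 'string', editor: 'textfield' },
            key2: { type: 'integer', editor: 'integer' },
        },
    },
    'my-array': { type: 'array', editor: 'json' },
};

const fieldNames = collectFieldNames(exampleProperties);
// fieldNames: ['my-object.key1', 'my-object.key2', 'my-array']
```

Note that a property name that itself contains a dot would be ambiguous under this naming scheme, so that case would need separate handling.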

❓ Limitations and open questions

Root-level properties

We are effectively reusing the existing "root-level" property definitions within sub-schemas. However, not all root-level properties make sense in the context of a sub-schema. Specifically, the following properties are questionable:

  • default, prefill and example - These are better suited for the root property that contains the entire object or array. Applying them to individual sub-fields could lead to unexpected or inconsistent behavior.
  • sectionCaption and sectionDescription - These introduce structural elements (sections) and may not make sense inside nested sub-schemas. We should consider either removing them entirely from sub-schemas or revisiting their design from a UI/UX perspective (e.g. nested sections).

schemaBased editor

As mentioned in the Input UI section, there is a need to introduce a new editor type that signals that each sub-property of a complex object or array should be rendered as a standalone field using its sub-schema definition.

I’ve proposed the name schemaBased for this new editor, but the name is open for discussion.

Compatibility between editor and sub-schema

A key question is whether we should validate the compatibility between the editor and the defined sub-schema within the Actor's input schema, or leave this responsibility to the Actor developer.

Example scenario

A developer defines a property of type array with editor: stringList, but also provides a sub-schema specifying object-type items. The input UI would generate a list of strings, while the validation would expect objects, resulting in invalid input.

Possible Approaches:

  1. No restrictions (responsibility on creator)
    Allow any combination of editor and sub-schema, and assume the creator understands which combinations are valid. This offers maximum flexibility but increases the risk of misconfiguration.
  2. Restrict available editors when a sub-schema is defined
    Allow only the schemaBased, json and hidden editors when a sub-schema is defined.
  3. Strict validation based on editor type
    Enforce that sub-schemas match the structure expected by specific editors. For example, for the stringList editor the sub-schema can only define string-typed items, and for requestListSources it's an object with strictly defined properties. But this would make the JSON Schema much more complicated, with lots of if-else branches and duplicated property definitions.

Note on this: we currently validate the input structure for some editors (for example requestListSources) manually in validateInputUsingValidator. So in this case the editor is not used just for the UI but also influences validation.
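For comparison, a hard-coded check outside the meta-schema could look roughly like the sketch below. It covers only the stringList scenario described above. The `ArrayField` type, the `checkEditorCompatibility` function and the error text are assumptions made for illustration, and real rules for other editors would have to be added.

```typescript
// Hypothetical, simplified view of an array property with an optional sub-schema.
type ArrayField = {
    type: 'array';
    editor: string;
    items?: { type: string };
};

// Returns an error message when the editor cannot produce values matching
// the sub-schema, or null when the combination looks consistent.
function checkEditorCompatibility(field: ArrayField): string | null {
    // Without a sub-schema there is nothing to compare the editor against.
    if (!field.items) return null;
    // These editors do not impose a shape of their own in this sketch.
    if (field.editor === 'json' || field.editor === 'hidden' || field.editor === 'schemaBased') {
        return null;
    }
    if (field.editor === 'stringList') {
        return field.items.type === 'string'
            ? null
            : `Editor "stringList" produces strings, but items are of type "${field.items.type}"`;
    }
    // No rule for other editors in this sketch.
    return null;
}

const mismatch = checkEditorCompatibility({ type: 'array', editor: 'stringList', items: { type: 'object' } });
const consistent = checkEditorCompatibility({ type: 'array', editor: 'stringList', items: { type: 'string' } });
```

Here `mismatch` holds an error message and `consistent` is null. Keeping such rules in code avoids growing the meta-schema, at the cost of the rules living in two places.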

@MFori MFori self-assigned this Jun 11, 2025
@MFori MFori added the t-console Issues with this label are in the ownership of the console team. label Jun 11, 2025
@github-actions github-actions bot added this to the 116th sprint - Console team milestone Jun 11, 2025
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Jun 11, 2025
@MFori MFori marked this pull request as ready for review June 12, 2025 16:04
@MFori MFori changed the title feat(input_schema): Input sub-schema Proposal: Input sub-schema Jun 12, 2025
@MFori MFori requested review from mhamas, fnesveda, gippy and jancurn June 12, 2025 16:05
jancurn commented Jun 15, 2025

Hey @MFori, good job, this proposal makes sense. I like we're only expanding input schema to support more of the JSON schema features, without creating more custom extensions. This is great for calling Actors via MCP or OpenAPI.

Few notes/questions:

  • schemaBased editor - do we really need to add this now, do you guys have specific use cases in mind, or it's just "for the future" ? I have a gut feeling this might be quite hard to make work well with all the possible kinds of sub-schemas.
  • Compatibility with editor - IMO we should validate if the sub-schema is compatible with the editor, it will make the experience better for both creators and users. Think of the "Jesus Christ principle" :)
  • In recursive sub-sub-objects and sub-sub-arrays, will we still only support the limited JSON schema features, or all JSON schema features? IMO it makes sense to only support the limited set.

MFori commented Jun 16, 2025

  • schemaBased editor - do we really need to add this now, do you guys have specific use cases in mind, or it's just "for the future" ? I have a gut feeling this might be quite hard to make work well with all the possible kinds of sub-schemas.

Yes, we have a use case from the Store team: they want a list of filters in the Actor input, which would look like this. It would be a good fit for an array field with object-typed items whose sub-schema has three fields (string enum, string enum, string).

image

I think it wouldn't be that bad because under the sub-schema with schemaBased editor you would have the same schema and editors that we have on the root level.

  • Compatibility with editor - IMO we should validate if the sub-schema is compatible with the editor, it will make the experience better for both creators and users. Think of the "Jesus Christ principle" :)

Makes sense. The question is whether to solve this on the schema level, which would make the schema definition quite complex, or "manually" as an additional validation.

  • In recursive sub-sub-objects and sub-sub-arrays, will we still only support the limited JSON schema features, or all JSON schema features? IMO it makes sense to only support the limited set.

Yeah, just the limited set of properties. The enhanced one is supported only on the root-level.

@gippy gippy left a comment

Good job, I'm not 100% sure about the schemaBased editor, maybe subfields as editor name would work better?

Also I think we should lock it to only allow one level of recursion, otherwise the UI might be pretty painful if someone misuses it.

jancurn commented Jun 16, 2025

Thanks for the clarification. I agree with @gippy - let's keep it simple and limited at start, we can always extend it in the future (but vice versa not easily). We should pick an appropriate editor name for that, subfields is not too bad, but it doesn't work well for arrays. Actually, do we need an editor at all? Like for most other fields, we try to provide the best editor for the situation. So if the schema is compatible, we can do just that, and display the editor for it, so maybe we don't need editor field at all. If the dev doesn't want us to render the UI editor, they can use editor: json.

Compatibility with editor - IMO we should validate if the sub-schema is compatible with the editor, it will make the experience better for both creators and users. Think of the "Jesus Christ principle" :)

Makes sense. The question is whether to solve this on the schema level, which would make the schema definition quite complex, or "manually" as an additional validation.

I think we can have this as a hard-coded validation in Apify CLI and on platform, if there's no easy way to squeeze it to our meta-schema.

In recursive sub-sub-objects and sub-sub-arrays, will we still only support the limited JSON schema features, or all JSON schema features? IMO it makes sense to only support the limited set.

Yeah, just the limited set of properties. The enhanced one is supported only on the root-level.

Cool, please let's just mention that in the docs.

jancurn commented Jun 16, 2025

And one more note: now that we have this, I think we should deprecate patternKey and patternValue and other similar relics from the past, and get closer to JSON Schema, which is an industry standard widely understood by LLMs.

MFori commented Jun 17, 2025

Actually, do we need an editor at all? Like for most other fields, we try to provide the best editor for the situation. So if the schema is compatible, we can do just that, and display the editor for it, so maybe we don't need editor field at all. If the dev doesn't want us to render the UI editor, they can use editor: json.

I'm not sure about the implicit editor. Currently for some of the field types we always require the editor to be defined (string, array, object) and for the rest (string-enum, integer, boolean, resource) we render the default editor if it's omitted in the input schema, but still it can be defined explicitly.

So I would vote for choosing some name for this new editor, but yes, we can make it the default if a sub-schema is defined. If not, let's make the json editor the default and no longer required.

I can think of:

  • schemaBased
  • subfields
  • composite
  • compound
  • subschema

jancurn commented Jun 17, 2025

Fair enough, "subschema" sounds best to me, and if not used, we render JSON editor

mtrunkat commented Jun 17, 2025

@gippy, @MFori, please add me to these proposals next time.

I see you reached some conclusion, but I want to raise one thing:

People who are using our UI are not our core persona. People who are integrating us are, so a good question is whether this type of input will work for those people.

  • Integration via API? I believe they will make it if we invest a lot of time in documenting this well on the Actor's page - API tab/modal.
  • Integration via Zapier? Are we able to render this in Zapier? If not, we will just display a textarea there. But users will likely miss guidance there. Have we thought about this?
  • Integration via MCP? I am not sure if Claude will be able to fill in such input, and it would be worth giving it a shot.

Building a UI internally for this is easy, but that is not how people use Apify, and we need to make sure that all the features here will work in all 3 (and more) cases above.

CleanShot 2025-06-17 at 14 19 31@2x

MFori commented Jun 18, 2025

People who are using our UI are not our core persona. People who are integrating us are, so a good question is whether this type of input will work for those people.

My thought was that if an Actor currently accepts an object or array, it (usually) still requires some shape of this data, and if that's not fulfilled, the run fails during execution. Enabling input validation moves the point of failure to the start of the flow: the run is not started, and we can even provide better error messages, so debugging is easier. The sub-schema UI in the Console is just a bonus for me.

  • Integration via API? I believe they will make it if we invest a lot of time in documenting this well on the Actor's page - API tab/modal.

I think documenting the shape of input field in Actor's page should be quite straightforward.

  • Integration via Zapier? Are we able to render this in Zapier? If not, we will just display a textarea there. But users will likely miss guidance there. Have we thought about this?

Again, the integration would still fail if the Actor expects some shape, but with a sub-schema it would fail before execution, not inside the Actor. It would be nice if we could render a proper UI here too, but providing the JSON input should IMHO still be fine. But adding @drobnikj - do you think it would be possible to render sub-schema fields in integration platforms?

  • Integration via MCP? I am not sure if Claude will be able to fill in such input, and it would be worth giving it a shot.

I was thinking, in theory, this should add more context for LLMs, so they should be able to fill in even better. But adding @MQ37 - do you think my assumption is correct and can we prove it somehow?

MQ37 commented Jun 18, 2025

I was thinking, in theory, this should add more context for LLMs, so they should be able to fill in even better. But adding @MQ37 - do you think my assumption is correct and can we prove it somehow?

@MFori it would actually be great if the Actor input schema adhered to the JSON Schema spec. Currently, in the MCP server we infer the "sub-schemas" (array items and object properties) from the editor type and the default and prefill values, so it would be easier and more robust for us if Actor creators included them in the Actor schema. So to answer your question: LLMs (Claude) are already using such input schemas and it works. We implemented this in the MCP server because some MCP clients required it and refused to work without the "sub-schemas".

jancurn commented Jun 18, 2025

I agree with the above - we already accept object and array without any schema, so adding a schema will only improve the situation, not make anything worse. The Actor creators still need to understand that more complex objects will make integrations harder.

Just last small change - please let's call the editor schemaBased as originally proposed, it will be more consistent with other editor names, and also subschema looks like you're editing a schema, not based on it.

MFori added 2 commits July 7, 2025 10:29
# Conflicts:
#	packages/input_schema/src/schema.json
#	test/utilities.client.test.ts
@MFori MFori marked this pull request as draft July 8, 2025 07:29
@MFori MFori changed the title Proposal: Input sub-schema feat(input_schema): Enable sub-schemas in input-schema Jul 9, 2025
// remove leading and trailing slashes and replace remaining slashes with dots
const cleanPropertyName = (name: string) => {
    return name.replace(/^\/|\/$/g, '').replace(/\//g, '.');
};
Member Author

This is needed, because now error can be related to sub-properties and we want to show nice path.
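A short usage sketch of the helper shown above. The input strings are assumed to be slash-delimited paths in the style of a JSON Pointer (for example as reported by a validator for a nested property); the exact values passed in by parseAjvError are not shown in this excerpt.

```typescript
// Remove leading and trailing slashes and replace remaining slashes with dots.
const cleanPropertyName = (name: string): string => {
    return name.replace(/^\/|\/$/g, '').replace(/\//g, '.');
};

// Hypothetical example paths for a nested object property and an array item property.
const objectPath = cleanPropertyName('/my-object/key1');    // 'my-object.key1'
const arrayPath = cleanPropertyName('/my-array/0/key1/');   // 'my-array.0.key1'
const rootPath = cleanPropertyName('/my-object');           // 'my-object'
```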

const matchingDefinitions = Object
    .values<any>(definitions) // cast as any, as the code in first branch seems to be invalid
    .filter((definition) => {
        // Because the definitions contains not only the root properties definitions, but also sub-schema definitions
        // and utility definitions, we need to filter them out and validate only against the appropriate ones.
        // We do this by checking prefix of the definition title (Utils: or Sub-schema:)
Member Author

I haven't found a better approach than grouping the definitions by their title; it's not that robust but works well.
Another approach would be to define definitions, sub_definitions and util_definitions, but those are not valid JSON Schema keywords and wouldn't pass Ajv strict validation.
Or put them in sub-objects like definitions.root, definitions.sub, definitions.utils, but I haven't tested it.

But since this is used just here I think it's ok to go with this solution.
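The title-prefix grouping could be sketched as follows. The definition names and titles are made up for illustration; only the "Utils:" and "Sub-schema:" prefixes come from the code comment above, and the `selectDefinitions` helper is hypothetical.

```typescript
type Definition = { title: string };

// Hypothetical definitions; only the title prefixes mirror the convention described above.
const exampleDefinitions: Record<string, Definition> = {
    stringProperty: { title: 'String property' },
    subSchemaStringProperty: { title: 'Sub-schema: String property' },
    someHelper: { title: 'Utils: Some helper' },
};

// Picks the definitions to validate a property against: root-level definitions
// for root properties, "Sub-schema:" definitions for sub-properties.
// Utility definitions are never used directly.
function selectDefinitions(defs: Record<string, Definition>, forSubSchema: boolean): Definition[] {
    return Object.keys(defs)
        .map((key) => defs[key])
        .filter((definition) => {
            if (definition.title.startsWith('Utils:')) return false;
            const isSubSchema = definition.title.startsWith('Sub-schema:');
            return forSubSchema ? isSubSchema : !isSubSchema;
        });
}

const rootDefinitions = selectDefinitions(exampleDefinitions, false);
const subDefinitions = selectDefinitions(exampleDefinitions, true);
```

The trade-off is the one noted above: the grouping depends on a naming convention in titles rather than on schema structure, so a mistyped prefix would silently move a definition into the wrong group.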

@@ -168,6 +227,10 @@ export function validateExistenceOfRequiredFields(inputSchema: InputSchema) {
* then checks that all required fields are present and finally checks fully against the whole schema.
*
* This way we get the most accurate error message for user.
*
* @param validator An instance of AJV validator. Important: The JSON Schema that the passed input schema is validated against
* is using features from JSON Schema 2019 draft, so the AJV instance must support it.
Member Author

This PR has already more than doubled its length, but without utilising unevaluatedProperties (https://json-schema.org/understanding-json-schema/reference/object#unevaluatedproperties) from draft 2019-09 it would be much longer, with a lot of duplicated definitions (there is room to update existing definitions, replacing additionalProperties with unevaluatedProperties, to make the schema smaller and cleaner).

But to support this, the Ajv validator instance passed to the validateInputSchema function has to support this draft, so we need to update the Ajv import in all places that call this function, from:

import Ajv from 'ajv'

to

import Ajv from 'ajv/dist/2019'

It should be these places (we should change it together with bumping version of @apify/input_schema once this PR is merged):

  • apify/apify-worker - when validating schema during build
  • apify/apify-cli - validate-schema command
  • apify/apify-core - just admin input schema playground

Note: the draft 2019-09 is used only to validate input-schema against our meta JSON Schema, but input-schema itself has only features from draft 07 so when validating input against input-schema we don't need to change anything.

Member

Can we maybe add createAjv or some other mechanism that would make sure that the correct schema version is used every time, without the need to check every call of the function? Or maybe there is some property on the Ajv instance that we could check?

@MFori MFori Jul 15, 2025

I've added check (ensureAjvSupportsDraft2019) that would validate it or throw exception. Do you think it's enough?
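One way such a guard could work is to probe the validator with a tiny schema that uses unevaluatedProperties. The sketch below is not the PR's ensureAjvSupportsDraft2019 implementation (which is not shown in this conversation); the `SchemaCompiler` interface and `assertSupportsUnevaluatedProperties` function are hypothetical. It relies on two behaviours: a validator in strict mode throws on keywords it does not know, and a lenient validator ignores them.

```typescript
// Minimal structural view of a validator. Ajv's `compile` method is assumed to fit this shape.
interface SchemaCompiler {
    compile(schema: object): (data: unknown) => unknown;
}

// Hypothetical guard: throws unless the validator actually enforces "unevaluatedProperties".
function assertSupportsUnevaluatedProperties(validator: SchemaCompiler): void {
    const probeSchema = { type: 'object', unevaluatedProperties: false };
    let accepted: unknown;
    try {
        // With draft 2019-09 support, the extra property must be rejected (result false).
        accepted = validator.compile(probeSchema)({ unexpected: 1 });
    } catch (error) {
        // A strict validator without draft 2019-09 support rejects the unknown keyword.
        throw new Error(`Validator does not support draft 2019-09: ${String(error)}`);
    }
    // A lenient validator silently ignores the keyword and accepts the probe.
    if (accepted !== false) {
        throw new Error('Validator ignores "unevaluatedProperties"; draft 2019-09 support is required.');
    }
}
```

Calling this once at the top of the validation function would give a clear error for any caller that passes an instance created from the default import, without requiring callers to be audited one by one.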

Comment on lines +199 to +207
"unevaluatedProperties": false,
"oneOf": [
    {
        "required": ["editor"],
        "properties": {
            "editor": { "enum": ["select"] },
            "items": { "$ref": "#/definitions/arrayItemsSelect" }
        }
    },
@MFori MFori Jul 10, 2025

This and other oneOf items are to restrict how sub-schema can look for some editors. Based on the previous discussion in this PR we want to limit the sub-schema to be compatible with editor.

@MFori MFori requested review from valekjo and removed request for mhamas July 10, 2025 09:02
@MFori MFori marked this pull request as ready for review July 10, 2025 09:02
MFori commented Jul 10, 2025

@gippy @fnesveda @valekjo
The PR is ready for review. A few changes from the original proposal were made based on the discussion here:

  • validation if the sub-schema is compatible with the editor (e.g. keyValue editor should define object with two string properties key and value and nothing else)
  • schemaBased editor can be used only in root properties

The plan is to merge this, update worker and cli with new version (and use JSON Schema draft 2019 for input schema validation) and then:

  • update UI to support schemaBased editor
  • update UI to present the sub-schema in input tab on Actor detail
  • update docs

@valekjo valekjo left a comment

Ton of work 💪 I've only checked the schema.json briefly (to understand a bit how it's constructed), but mostly relying on tests.

I remember we had some issues with fields with dots in their name, we should check if that could cause any issues here.

Comment on lines +550 to +578
properties: {
    myField: {
        title: 'Field title',
        type: 'object',
        description: 'Description',
        editor: 'schemaBased',
        additionalProperties: false,
        properties: {
            key: {
                type: 'object',
                title: 'Key',
                description: 'Key description',
                editor: 'json',
                properties: {
                    key1: {
                        type: 'string',
                        title: 'Key 1',
                        description: 'Key 1 description',
                    },
                    key2: {
                        type: 'string',
                        title: 'Key 2',
                        description: 'Key 2 description',
                    },
                },
            },
        },
    },
},
Member

Question: This defines inputs like

{ myField: {
  key: {
    key1: '',
    key2: '',
  }
} }

Would this be enough for the test?

{ myField: {
  key1: '',
  key2: '',
} }

Member Author

This tests that recursion works too, i.e. a sub-schema inside a sub-object.

MFori commented Jul 15, 2025

I remember we had some issues with fields with dots in their name, we should check if that could cause any issues here.

I've added test for it and it seems it doesn't cause any issues here, but we would definitely need to double check it in UI because it might cause issues with our current implementation of input UI with Formik

@fnesveda fnesveda left a comment

Uff, this is a pretty brutal PR, good job! I hope I didn't miss anything but I think it looks good.

@MFori MFori merged commit d1a8e92 into master Jul 17, 2025
9 checks passed
@MFori MFori deleted the feat/input-sub-schema branch July 17, 2025 08:02
B4nan commented Jul 17, 2025

apparently this broke the latest version of the CLI

image

MFori commented Jul 17, 2025

apparently this broke the latest version of the CLI

Oh sorry, should be already fixed by apify/apify-cli#853 (backported to latest)
