Commit d1a8e92

feat(input_schema): Enable sub-schemas in input-schema (#519)
# Input sub-schemas

> ~~**Note**: This is a proposal, not a final implementation intended for immediate merging in this state. The purpose of this PR is to present a suggested solution for discussion and approval, not the completed implementation.~~

The PR is ready to be merged. A few changes from the original proposal were made based on the discussion here:

- validation of whether the sub-schema is compatible with the editor (e.g. the `keyValue` editor should define an object with two string properties, `key` and `value`, and nothing else)
- the `schemaBased` editor can be used only in root properties

## 🎯 Goal

The goal of this proposal is to enable creators to define sub-schemas within an Actor's input schema for fields of type `array` and `object`. These field types would support specifying their inner structure, which would be used both for input validation and for rendering the corresponding (sub)fields in the input UI form.

## 📝 Solution

The proposed solution leverages "native" features of JSON Schema, utilizing its properties such as `properties`, `required`, `additionalProperties` (for object) and `items` (for array).

As a result, creators would be able to define an input schema like this:

```
{
    "title": "Apify Actor input schema example",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "my-object": {
            "type": "object",
            "title": "My object",
            "editor": "schemaBased",
            "properties": {
                "key1": {
                    "type": "string",
                    "title": "Key 1",
                    "description": "Description",
                    "editor": "textfield"
                },
                "key2": {
                    "type": "integer",
                    "title": "Key 2",
                    "description": "Description",
                    "editor": "integer"
                }
            },
            "required": ["key1"],
            "additionalProperties": false
        },
        "my-array": {
            "type": "array",
            "title": "My array",
            "editor": "json",
            "items": {
                "type": "object",
                "properties": {
                    "key1": {
                        "type": "string",
                        "title": "Key 1",
                        "description": "Description",
                        "editor": "textfield"
                    },
                    "key2": {
                        "type": "integer",
                        "title": "Key 2",
                        "description": "Description",
                        "editor": "integer"
                    }
                },
                "required": ["key1"],
                "additionalProperties": false
            }
        }
    },
    "required": []
}
```

An Actor with a schema like this would then accept input like:

```
{
    "my-object": {
        "key1": "test",
        "key2": 123
    },
    "my-array": [
        {
            "key1": "test",
            "key2": 123
        },
        {
            "key1": "test"
        }
    ]
}
```

### Recursiveness

The schema should support recursion, allowing creators to define nested objects within other objects. This enables complex and deeply structured input definitions as needed.

```
{
    "title": "Apify Actor input schema example",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "my-object": {
            "type": "object",
            "title": "My object",
            "editor": "schemaBased",
            "properties": {
                "key1": {
                    "type": "string",
                    "title": "Key 1",
                    "description": "Description",
                    "editor": "textfield"
                },
                "key2": {
                    "type": "object",
                    "title": "Key 2",
                    "description": "Description",
                    "editor": "schemaBased",
                    "properties": {
                        "subKey": {
                            "type": "string",
                            "title": "SubKey",
                            "description": "Description",
                            "editor": "textfield"
                        }
                    }
                }
            },
            "required": ["key1"],
            "additionalProperties": false
        }
    },
    "required": []
}
```

The same goes for arrays:

```
{
    "title": "Apify Actor input schema example",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "my-array": {
            "type": "array",
            "title": "My array",
            "editor": "schemaBased",
            "items": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        }
    },
    "required": []
}
```

## 👨‍💻 Implementation

### JSON Schema

At the JSON Schema level, the implementation is relatively straightforward and can basically follow the approach used in this PR. The proposed changes include:

- **Creating new definitions for each property type** - some properties used in the root schema don’t make sense within a sub-schema context. Therefore, instead of reusing the root definitions with complex conditions, it’s simpler to create tailored definitions for sub-schemas.
- **Extending the `object` and `array` definitions with new properties**:
  - `object` type can include:
    - `properties` - defines the internal structure of the object. It supports all property types available at the root input schema level (with the mentioned restrictions).
    - `additionalProperties` - specifies whether properties not listed in `properties` are allowed
    - `required` - lists which properties defined in `properties` are required

    ```
    {
        "type": "object",
        "properties": {
            "key": { ... }
        },
        "additionalProperties": false,
        "required": ["key"]
    }
    ```
  - `array` type can include:
    - `items` - defines the type (and optionally the shape) of array items

    ```
    {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {...}
        }
    }
    ```

### Validation

#### Actor's Input Schema

Validation would almost work out of the box with the updated `schema.json` file, but a few additional steps are required:

- Since we're manually validating all properties one by one against the schema (using the `validateProperties` function), we also need to validate sub-properties. Root-level properties and sub-properties have to be validated against different sets of definitions. This should be straightforward to implement.
- The logic in `parseAjvError` needs to be updated to correctly display the path to the relevant (sub)property in validation errors. Again, this is not expected to be complex.

#### Input

Because all newly added properties (`properties`, `additionalProperties`, `required` and `items`) are native features of JSON Schema, input validation against the Actor's input schema will work entirely out of the box.

### Input UI

In the Input UI, we want to give creators the flexibility to render each sub-field of a complex input field independently. A proof of concept has already been implemented (currently only for object fields), and there are no major blockers to a full implementation. You can see the draft PR here: apify/apify-core#21564

> Note: The code in the PR is intentionally minimal, not optimal and not production-ready. Its purpose is to validate the approach.

Creators should have the option to choose how a field is rendered:

- Use an existing editor (e.g. `json`), in which case the sub-schema is used solely for validation.
- Or render each sub-property as an individual input field based on the sub-schema.

To support the latter, we need to introduce a new editor type that signals sub-schema-based rendering.
I’ve tentatively called this editor `schemaBased`, but the name is open for discussion.

For arrays using sub-schemas with the `schemaBased` editor, we’ll need a UI component that includes controls to add, remove, and optionally reorder items.

**Note**: Based on the discussion below, we decided to limit the `schemaBased` editor to root-level properties only.

**Technical Implementation Notes**

- The main change in the Input UI will be to recursively render sub-fields when the `schemaBased` editor is used.
- We’ll use dot notation (e.g. `field.subField`) for Formik field names to ensure proper binding. Formik handles this automatically.
- We'll also need to support labels, descriptions, and other attributes for sub-fields, but this should be relatively straightforward.

## ❓ Limitations and open questions

### Root-level properties

We are effectively reusing the existing "root-level" property definitions within sub-schemas. However, not all root-level properties make sense in the context of a sub-schema. Specifically, the following properties are questionable:

- `default`, `prefill` and `example` - These are better suited for the root property that contains the entire object or array. Applying them to individual sub-fields could lead to unexpected or inconsistent behavior.
- `sectionCaption` and `sectionDescription` - These introduce structural elements (sections) and may not make sense inside nested sub-schemas. We should consider either removing them entirely from sub-schemas or revisiting their design from a UI/UX perspective (e.g. nested sections).

### `schemaBased` editor

As mentioned in the Input UI section, there is a need to introduce a new editor type that signals that each sub-property of a complex object or array should be rendered as a standalone field using its sub-schema definition. I’ve proposed the name `schemaBased` for this new editor, but the name is open for discussion.

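To illustrate how a `schemaBased` root property could map to standalone form fields with dot-notation names, here is a minimal sketch. The `SketchField` interface and the `collectFieldNames` helper are hypothetical names introduced only for this example; they are not part of the PR or of the Input UI code. The sketch handles a single level only, which matches the decision to limit `schemaBased` to root-level properties.

```typescript
// Hypothetical, simplified field shape for illustration; not the real types from './types'.
interface SketchField {
    type: string;
    editor?: string;
    properties?: Record<string, SketchField>;
}

// Returns the form field names that a root-level property would produce.
// A `schemaBased` object yields one dot-notation name per sub-property;
// any other property is rendered by its own editor under its own key.
function collectFieldNames(rootKey: string, field: SketchField): string[] {
    const subFields = field.properties;
    if (field.type === 'object' && field.editor === 'schemaBased' && subFields) {
        return Object.keys(subFields).map((subKey) => `${rootKey}.${subKey}`);
    }
    return [rootKey];
}

const myObject: SketchField = {
    type: 'object',
    editor: 'schemaBased',
    properties: {
        key1: { type: 'string', editor: 'textfield' },
        key2: { type: 'integer', editor: 'integer' },
    },
};

// ['my-object.key1', 'my-object.key2']
const schemaBasedNames = collectFieldNames('my-object', myObject);

// ['my-array'] because the `json` editor renders the whole value in one field
const jsonNames = collectFieldNames('my-array', { type: 'array', editor: 'json' });
```

With an existing editor such as `json`, the sub-schema plays no role in rendering and is used only for validation, which is why the second call returns just the root key.
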
### Compatibility between editor and sub-schema

A key question is whether we should validate the compatibility between the `editor` and the defined sub-schema within the Actor's input schema, or leave this responsibility to the Actor developer.

#### Example scenario

A developer defines a property of type `array` with `editor: stringList`, but also provides a sub-schema specifying object-type items. The input UI would generate a list of strings, while the validation would expect objects, resulting in invalid input.

Possible approaches:

1. **No restrictions (responsibility on creator)** Allow any combination of `editor` and sub-schema, and assume the creator understands which combinations are valid. This offers maximum flexibility but increases the risk of misconfiguration.
2. **Restrict available editors when a sub-schema is defined** Only `schemaBased`, `json` and `hidden` would be allowed.
3. **Strict validation based on editor type** Enforce that sub-schemas match expected structures for specific editors. For example, for the `stringList` editor the sub-schema can only define string-typed items, and for `requestListSources` it is an object with strictly defined properties. But this would make the JSON Schema much more complicated, with lots of if-else branches and duplicated property definitions.

> Note to this: We are currently validating the structure of input for some editors (for example `requestListSources`) manually in `validateInputUsingValidator`. So in this case `editor` is not used just for the UI but also influences the validation.
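The strict approach (option 3) can be sketched in plain TypeScript to show what an editor/sub-schema compatibility check has to decide. This is only an illustration under assumptions: `SketchItems` and `isItemsSchemaCompatible` are hypothetical names, the rules cover only the two editors named in this description (`stringList` and `keyValue`), and the merged implementation expresses its checks in the JSON Schema definitions rather than in code like this.

```typescript
// Hypothetical, simplified shape of an array's `items` sub-schema.
interface SketchItems {
    type: string;
    properties?: Record<string, { type: string }>;
}

// Returns true when the `items` sub-schema matches what the editor produces.
function isItemsSchemaCompatible(editor: string, items: SketchItems): boolean {
    switch (editor) {
        case 'stringList':
            // The editor produces a list of strings.
            return items.type === 'string';
        case 'keyValue': {
            // The editor produces objects with exactly two string properties: `key` and `value`.
            const properties = items.properties || {};
            const keys = Object.keys(properties).sort();
            return items.type === 'object'
                && keys.length === 2
                && keys[0] === 'key'
                && keys[1] === 'value'
                && properties.key.type === 'string'
                && properties.value.type === 'string';
        }
        default:
            // Assumption for this sketch: other editors are not restricted here.
            return true;
    }
}

// The scenario from the example above: `stringList` editor with object-typed items.
const stringListWithObjects = isItemsSchemaCompatible('stringList', { type: 'object', properties: {} });

const validKeyValue = isItemsSchemaCompatible('keyValue', {
    type: 'object',
    properties: { key: { type: 'string' }, value: { type: 'string' } },
});

const keyValueWithExtraProperty = isItemsSchemaCompatible('keyValue', {
    type: 'object',
    properties: { key: { type: 'string' }, value: { type: 'string' }, note: { type: 'string' } },
});
```

Here `stringListWithObjects` and `keyValueWithExtraProperty` are `false`, while `validKeyValue` is `true`.
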
1 parent eb3b0da commit d1a8e92

6 files changed: +1426 -100 lines changed

packages/input_schema/src/input_schema.ts

Lines changed: 112 additions & 14 deletions
@@ -10,11 +10,33 @@ import type {
     InputSchemaBaseChecked,
     StringFieldDefinition,
 } from './types';
+import { ensureAjvSupportsDraft2019 } from './utilities';
 
 export { schema as inputSchema };
 
 const { definitions } = schema;
 
+// Because the definitions contain not only the root properties definitions, but also sub-schema definitions
+// and utility definitions, we need to filter them out and validate only against the appropriate ones.
+// We do this by checking the prefix of the definition title (Utils: or Sub-schema:)
+
+const [fieldDefinitions, subFieldDefinitions] = Object
+    .values<any>(definitions)
+    .reduce<[any[], any[]]>((acc, definition) => {
+        if (definition.title.startsWith('Utils:')) {
+            // skip utility definitions
+            return acc;
+        }
+
+        if (definition.title.startsWith('Sub-schema:')) {
+            acc[1].push(definition);
+        } else {
+            acc[0].push(definition);
+        }
+
+        return acc;
+    }, [[], []]);
+
 /**
  * This function parses AJV error and transforms it into a readable string.
  *
@@ -38,6 +60,11 @@ export function parseAjvError(
     let fieldKey: string;
     let message: string;
 
+    // remove leading and trailing slashes and replace remaining slashes with dots
+    const cleanPropertyName = (name: string) => {
+        return name.replace(/^\/|\/$/g, '').replace(/\//g, '.');
+    };
+
     // If error is with keyword type, it means that type of input is incorrect
     // this can mean that provided value is null
     if (error.keyword === 'type') {
@@ -48,20 +75,23 @@ export function parseAjvError(
         }
         message = m('inputSchema.validation.generic', { rootName, fieldKey, message: error.message });
     } else if (error.keyword === 'required') {
-        fieldKey = error.params.missingProperty;
+        fieldKey = cleanPropertyName(`${error.instancePath}/${error.params.missingProperty}`);
         message = m('inputSchema.validation.required', { rootName, fieldKey });
     } else if (error.keyword === 'additionalProperties') {
-        fieldKey = error.params.additionalProperty;
+        fieldKey = cleanPropertyName(`${error.instancePath}/${error.params.additionalProperty}`);
+        message = m('inputSchema.validation.additionalProperty', { rootName, fieldKey });
+    } else if (error.keyword === 'unevaluatedProperties') {
+        fieldKey = cleanPropertyName(`${error.instancePath}/${error.params.unevaluatedProperty}`);
         message = m('inputSchema.validation.additionalProperty', { rootName, fieldKey });
     } else if (error.keyword === 'enum') {
-        fieldKey = error.instancePath.split('/').pop()!;
+        fieldKey = cleanPropertyName(error.instancePath);
         const errorMessage = `${error.message}: "${error.params.allowedValues.join('", "')}"`;
         message = m('inputSchema.validation.generic', { rootName, fieldKey, message: errorMessage });
     } else if (error.keyword === 'const') {
-        fieldKey = error.instancePath.split('/').pop()!;
+        fieldKey = cleanPropertyName(error.instancePath);
         message = m('inputSchema.validation.generic', { rootName, fieldKey, message: error.message });
     } else {
-        fieldKey = error.instancePath.split('/').pop()!;
+        fieldKey = cleanPropertyName(error.instancePath);
         message = m('inputSchema.validation.generic', { rootName, fieldKey, message: error.message });
     }
 
@@ -92,10 +122,16 @@ function validateBasicStructure(validator: Ajv, obj: Record<string, unknown>): a
 
 /**
  * Validates particular field against it's schema.
+ * @param validator An instance of AJV validator (must support draft 2019-09).
+ * @param fieldSchema Schema of the field to validate.
+ * @param fieldKey Key of the field in the input schema.
+ * @param isSubField If true, the field is a sub-field of another field, so we need to skip some definitions.
  */
-function validateField(validator: Ajv, fieldSchema: Record<string, unknown>, fieldKey: string): asserts fieldSchema is FieldDefinition {
+function validateField(validator: Ajv, fieldSchema: Record<string, unknown>, fieldKey: string, isSubField = false): asserts fieldSchema is FieldDefinition {
+    const relevantDefinitions = isSubField ? subFieldDefinitions : fieldDefinitions;
+
     const matchingDefinitions = Object
-        .values<any>(definitions) // cast as any, as the code in first branch seems to be invalid
+        .values<any>(relevantDefinitions) // cast as any, as the code in first branch seems to be invalid
         .filter((definition) => {
             return definition.properties.type.enum
                 // This is a normal case where fieldSchema.type can be only one possible value matching definition.properties.type.enum.0
@@ -110,9 +146,19 @@ function validateField(validator: Ajv, fieldSchema: Record<string, unknown>, fie
         throw new Error(`Input schema is not valid (${errorMessage})`);
     }
 
+    // We are validating a field schema against one of the definitions, but one definition can reference other definitions.
+    // So this basically creates a new JSON Schema with a picked definition at root and puts all definitions from the `schema.json`
+    // into the `definitions` property of this final schema.
+    const enhanceDefinition = (definition: object) => {
+        return {
+            ...definition,
+            definitions,
+        };
+    };
+
     // If there is only one matching then we are done and simply compare it.
     if (matchingDefinitions.length === 1) {
-        validateAgainstSchemaOrThrow(validator, fieldSchema, matchingDefinitions[0], `schema.properties.${fieldKey}`);
+        validateAgainstSchemaOrThrow(validator, fieldSchema, enhanceDefinition(matchingDefinitions[0]), `schema.properties.${fieldKey}`);
         return;
     }
 
@@ -121,30 +167,76 @@ function validateField(validator: Ajv, fieldSchema: Record<string, unknown>, fie
     if ((fieldSchema as StringFieldDefinition).enum) {
         const definition = matchingDefinitions.filter((item) => !!item.properties.enum).pop();
         if (!definition) throw new Error('Input schema validation failed to find "enum property" definition');
-        validateAgainstSchemaOrThrow(validator, fieldSchema, definition, `schema.properties.${fieldKey}.enum`);
+        validateAgainstSchemaOrThrow(validator, fieldSchema, enhanceDefinition(definition), `schema.properties.${fieldKey}.enum`);
         return;
     }
     // If the definition contains "resourceType" property then it's resource type.
     if ((fieldSchema as CommonResourceFieldDefinition<unknown>).resourceType) {
         const definition = matchingDefinitions.filter((item) => !!item.properties.resourceType).pop();
         if (!definition) throw new Error('Input schema validation failed to find "resource property" definition');
-        validateAgainstSchemaOrThrow(validator, fieldSchema, definition, `schema.properties.${fieldKey}`);
+        validateAgainstSchemaOrThrow(validator, fieldSchema, enhanceDefinition(definition), `schema.properties.${fieldKey}`);
         return;
     }
     // Otherwise we use the other definition.
     const definition = matchingDefinitions.filter((item) => !item.properties.enum && !item.properties.resourceType).pop();
     if (!definition) throw new Error('Input schema validation failed to find other than "enum property" definition');
 
-    validateAgainstSchemaOrThrow(validator, fieldSchema, definition, `schema.properties.${fieldKey}`);
+    validateAgainstSchemaOrThrow(validator, fieldSchema, enhanceDefinition(definition), `schema.properties.${fieldKey}`);
+}
+
+/**
+ * Validates all subfields (and their subfields) of a given field schema.
+ */
+function validateSubFields(validator: Ajv, fieldSchema: InputSchemaBaseChecked, fieldKey: string) {
+    Object.entries(fieldSchema.properties).forEach(([subFieldKey, subFieldSchema]) => {
+        // The sub-properties has to be validated first, so we got more relevant error messages.
+        if (subFieldSchema.type === 'object' && subFieldSchema.properties) {
+            // If the field has sub-fields, we need to validate them as well.
+            validateSubFields(validator, subFieldSchema as any as InputSchemaBaseChecked, `${fieldKey}.${subFieldKey}`);
+        }
+
+        // If the field is an array and has defined schema (items property), we need to validate it differently.
+        if (subFieldSchema.type === 'array' && subFieldSchema.items) {
+            validateArrayField(validator, subFieldSchema, `${fieldKey}.${subFieldKey}`);
+        }
+
+        validateField(validator, subFieldSchema, `${fieldKey}.${subFieldKey}`, true);
+    });
+}
+
+function validateArrayField(validator: Ajv, fieldSchema: { items?: { type: 'string', properties: Record<string, any> }}, fieldKey: string) {
+    const arraySchema = (fieldSchema as any).items;
+    if (!arraySchema) return;
+
+    // If the array has object items and have sub-schema defined, we need to validate it.
+    if (arraySchema.type === 'object' && arraySchema.properties) {
+        validateSubFields(validator, arraySchema as InputSchemaBaseChecked, `${fieldKey}.items`);
+    }
+
+    // If it's an array of arrays we need, we need to validate the inner array schema.
+    if (arraySchema.type === 'array' && arraySchema.items) {
+        validateArrayField(validator, arraySchema, `${fieldKey}.items`);
+    }
 }
 
 /**
  * Validates all properties in the input schema
  */
 function validateProperties(inputSchema: InputSchemaBaseChecked, validator: Ajv): asserts inputSchema is InputSchema {
-    Object.entries(inputSchema.properties).forEach(([fieldKey, fieldSchema]) => (
-        validateField(validator, fieldSchema, fieldKey)),
-    );
+    Object.entries(inputSchema.properties).forEach(([fieldKey, fieldSchema]) => {
+        // The sub-properties has to be validated first, so we got more relevant error messages.
+        if (fieldSchema.type === 'object' && fieldSchema.properties) {
+            // If the field has sub-fields, we need to validate them as well.
+            validateSubFields(validator, fieldSchema as any as InputSchemaBaseChecked, fieldKey);
+        }
+
+        // If the field is an array and has defined schema (items property), we need to validate it differently.
+        if (fieldSchema.type === 'array' && fieldSchema.items) {
+            validateArrayField(validator, fieldSchema, fieldKey);
+        }
+
+        validateField(validator, fieldSchema, fieldKey);
+    });
 }
 
 /**
@@ -168,8 +260,14 @@ export function validateExistenceOfRequiredFields(inputSchema: InputSchema) {
  * then checks that all required fields are present and finally checks fully against the whole schema.
  *
  * This way we get the most accurate error message for user.
+ *
+ * @param validator An instance of AJV validator. Important: The JSON Schema that the passed input schema is validated against
+ *                  is using features from JSON Schema 2019 draft, so the AJV instance must support it.
+ * @param inputSchema Input schema to validate.
  */
 export function validateInputSchema(validator: Ajv, inputSchema: Record<string, unknown>): asserts inputSchema is InputSchema {
+    ensureAjvSupportsDraft2019(validator);
+
     // First validate just basic structure without fields.
     validateBasicStructure(validator, inputSchema);
 
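To make the effect of the `parseAjvError` change concrete, the sketch below reuses the `cleanPropertyName` expression from the diff and applies it to a few example paths. The `requiredFieldKey` helper and the sample paths are hypothetical and exist only for this illustration; they mirror how the `required` branch combines `error.instancePath` with `error.params.missingProperty`.

```typescript
// Same transformation as in the diff: strip a leading and a trailing slash,
// then turn the remaining slashes into dots.
const cleanPropertyName = (name: string): string => {
    return name.replace(/^\/|\/$/g, '').replace(/\//g, '.');
};

// Hypothetical helper mirroring the `required` branch of `parseAjvError`.
function requiredFieldKey(instancePath: string, missingProperty: string): string {
    return cleanPropertyName(`${instancePath}/${missingProperty}`);
}

// A missing root-level property: the instance path is empty.
const rootKey = requiredFieldKey('', 'key1'); // 'key1'

// A missing sub-property: the parent path is now part of the reported key.
const nestedKey = requiredFieldKey('/my-object', 'key1'); // 'my-object.key1'

// An error inside an array item keeps the item index in the path.
const arrayItemKey = cleanPropertyName('/my-array/0/key2'); // 'my-array.0.key2'
```

Before this change the `enum`, `const` and generic branches kept only the last path segment, so an error in `my-array.0.key2` would have been reported as just `key2`.
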