How to report additional info in bulk API? #236

@enikao

In #67, we decided not to allow additional fields in serialization.
This means that any additional data we want to transmit has to live outside the chunk, in an enclosing object, like:

{
  "chunk": {
    "serializationFormatVersion": "2023.1",
    "languages": [ ... ],
    "nodes": [ ... ]
  },
  "other": "data",
  "more": true
}
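A minimal TypeScript sketch of what this means for a client. It assumes the envelope is a plain JSON object whose `chunk` member holds an unmodified chunk. The names `SerializationChunk`, `BulkEnvelope` and `splitEnvelope` are hypothetical and not part of any LionWeb API:

```typescript
// Hypothetical type for a LionWeb serialization chunk (only the fields used here).
interface SerializationChunk {
  serializationFormatVersion: string;
  languages: unknown[];
  nodes: unknown[];
}

// Hypothetical envelope: the chunk stays free of additional fields,
// everything else sits next to it in the enclosing object.
interface BulkEnvelope {
  chunk: SerializationChunk;
  [extra: string]: unknown;
}

// Separates the chunk from the additional data surrounding it.
function splitEnvelope(json: string): { chunk: SerializationChunk; extras: Record<string, unknown> } {
  const { chunk, ...extras } = JSON.parse(json) as BulkEnvelope;
  return { chunk, extras };
}

const { chunk, extras } = splitEnvelope(`{
  "chunk": { "serializationFormatVersion": "2023.1", "languages": [], "nodes": [] },
  "other": "data",
  "more": true
}`);
console.log(chunk.serializationFormatVersion); // 2023.1
console.log(extras.other, extras.more); // data true
```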

We probably want to report additional data in the bulk API. Taking retrieve as an example, messages might include "you've asked for an unknown node id" or "invalid depthLimit".
For createPartitions, we might send messages like "Partition node id already exists" or "Partition node id not reserved for this client".

How do we want to encode such messages?

Option A: Just text messages

We just send plain text messages without any additional structure.

{
  "chunk": { ... },
  "messages": [
    "requested node id unknown: 2134",
    "node 123 mentions child abc, but node abc mentions parent 456",
    "node 23 contains property {myLang@23|visible} several times (values: 'hello', 'true')"
  ],
  "resultCode": 123
}

Pro:

  • Very simple
  • Flexible
  • Such messages usually stem from an implementation bug, so it's only useful for programmers.
    They understand text; automatic processing is not necessary.

Con:

  • Hard to process automatically
  • How do we distinguish "severity"? Asking for an unknown node id is ok, but trying to create a partition with an existing node id is a real issue.
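To make the first con concrete: a client that wants to react to unknown node ids has to pattern-match the wording. This TypeScript sketch assumes the exact phrasing of the example message (the function name `unknownNodeIds` is hypothetical); any rewording on the server side silently breaks it.

```typescript
// Option A response: the messages are free-form text.
interface TextMessagesResponse {
  chunk: unknown;
  messages: string[];
  resultCode: number;
}

// Collects the ids from messages worded like "requested node id unknown: <id>".
// This depends on the exact wording of the example (an assumption); nothing in
// the format guarantees it, and the text says nothing about severity.
function unknownNodeIds(response: TextMessagesResponse): string[] {
  const pattern = /^requested node id unknown: (\S+)$/;
  const ids: string[] = [];
  for (const message of response.messages) {
    const id = pattern.exec(message)?.[1];
    if (id !== undefined) {
      ids.push(id);
    }
  }
  return ids;
}

const textResponse: TextMessagesResponse = {
  chunk: {},
  messages: [
    "requested node id unknown: 2134",
    "node 123 mentions child abc, but node abc mentions parent 456",
  ],
  resultCode: 123,
};
console.log(unknownNodeIds(textResponse)); // [ '2134' ]
```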

Option B: Generic structured messages

The wire format has a very generic structure, and clients need to make sense of it.

{
  "chunk": { ... },
  "success": true,
  "messages": [
    {
      "kind": "unknownNodeId",
      "message": "requested node id unknown: 2134",
      "data": [
        "2134"
      ]
    },
    {
      "kind": "invalidTree",
      "message": "node 123 mentions child abc, but node abc mentions parent 456",
      "data": {
        "parentId": "123",
        "childId": "abc",
        "child-parentId": "456"
      }
    },
    {
      "kind": "duplicateProperty",
      "message": "node 23 contains property {myLang@23|visible} several times (values: 'hello', 'true')",
      "data": [
        "23",
        "myLang",
        "23",
        "visible",
        "hello",
        "true"
      ]
    }
  ]
}

Pro:

  • Wire format not too complex, can be verified generically
  • Seems to work for EMF
  • Allows automatic processing, somewhat independent from implementation

Con:

  • Need to know each kind to make sense of data
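A TypeScript sketch of both sides of this option: the envelope of every message can be checked without knowing any kind, but interpreting `data` needs a case per kind. The `data` layouts are taken from the example above and, like the names `isGenericMessage` and `describe`, are assumptions for illustration.

```typescript
// Option B: all messages share one generic shape; `data` is kind-specific.
interface GenericMessage {
  kind: string;
  message: string;
  data: unknown;
}

// Generic verification: checks the shape without knowing any kind.
function isGenericMessage(value: unknown): value is GenericMessage {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.kind === "string" &&
    typeof candidate.message === "string" &&
    "data" in candidate
  );
}

// Interpretation needs per-kind knowledge of `data`. The layouts below mirror the
// example and are assumptions; unknown kinds fall back to the human-readable text.
function describe(msg: GenericMessage): string {
  switch (msg.kind) {
    case "unknownNodeId": {
      const [nodeId] = msg.data as string[];
      return `unknown node ${nodeId}`;
    }
    case "invalidTree": {
      const data = msg.data as { parentId: string; childId: string; "child-parentId": string };
      return `child ${data.childId}: parent ${data.parentId} vs ${data["child-parentId"]}`;
    }
    default:
      return msg.message;
  }
}

const raw: unknown[] = JSON.parse(`[
  { "kind": "unknownNodeId", "message": "requested node id unknown: 2134", "data": ["2134"] },
  { "kind": "somethingNew", "message": "kind not known to this client", "data": {} },
  { "kind": "broken" }
]`);
const described = raw.map((entry) => (isGenericMessage(entry) ? describe(entry) : "malformed message"));
console.log(described);
```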

Option C: Specifically structured messages

The wire format specifies every possible message in detail, each with an appropriate structure.

{
  "chunk": { ... },
  "messages": [
    {
      "kind": "unknownNodeId",
      "message": "requested node id unknown: 2134",
      "nodeId": "2134"
    },
    {
      "kind": "invalidTree",
      "message": "node 123 mentions child abc, but node abc mentions parent 456",
      "parent-nodeId": "123",
      "parent-childId": "abc",
      "child-parentId": "456"
    },
    {
      "kind": "duplicateProperty",
      "message": "node 23 contains property {myLang@23|visible} several times (values: 'hello', 'true')",
      "nodeId": "23",
      "metaPointer": {
        "language": "myLang",
        "version": "23",
        "key": "visible"
      },
      "values": [
        "hello",
        "true"
      ]
    }
  ]
}

Pro:

  • Easy to interpret
  • Little ambiguity

Con:

  • Verbose
  • Inflates protocol (as we have to specify each message)
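In a typed client, Option C maps naturally onto a discriminated union on `kind`. The TypeScript sketch below models the three example messages (type and function names are hypothetical) and rebuilds each human-readable text from its structured fields. It also shows the cost: every message kind needs its own declaration.

```typescript
// Option C: every message kind has its own, fully specified structure.
interface MetaPointer {
  language: string;
  version: string;
  key: string;
}

interface UnknownNodeId {
  kind: "unknownNodeId";
  message: string;
  nodeId: string;
}

interface InvalidTree {
  kind: "invalidTree";
  message: string;
  "parent-nodeId": string;
  "parent-childId": string;
  "child-parentId": string;
}

interface DuplicateProperty {
  kind: "duplicateProperty";
  message: string;
  nodeId: string;
  metaPointer: MetaPointer;
  values: string[];
}

type BulkMessage = UnknownNodeId | InvalidTree | DuplicateProperty;

// Rebuilds the human-readable text from the structured fields. The `never`
// check makes the compiler flag any kind that is specified but not handled.
function render(msg: BulkMessage): string {
  switch (msg.kind) {
    case "unknownNodeId":
      return `requested node id unknown: ${msg.nodeId}`;
    case "invalidTree":
      return `node ${msg["parent-nodeId"]} mentions child ${msg["parent-childId"]}, ` +
        `but node ${msg["parent-childId"]} mentions parent ${msg["child-parentId"]}`;
    case "duplicateProperty": {
      const { language, version, key } = msg.metaPointer;
      const values = msg.values.map((v) => `'${v}'`).join(", ");
      return `node ${msg.nodeId} contains property {${language}@${version}|${key}} several times (values: ${values})`;
    }
    default: {
      const unhandled: never = msg;
      throw new Error(`unhandled message: ${JSON.stringify(unhandled)}`);
    }
  }
}

const messages: BulkMessage[] = [
  { kind: "unknownNodeId", message: "requested node id unknown: 2134", nodeId: "2134" },
  {
    kind: "invalidTree",
    message: "node 123 mentions child abc, but node abc mentions parent 456",
    "parent-nodeId": "123",
    "parent-childId": "abc",
    "child-parentId": "456",
  },
  {
    kind: "duplicateProperty",
    message: "node 23 contains property {myLang@23|visible} several times (values: 'hello', 'true')",
    nodeId: "23",
    metaPointer: { language: "myLang", version: "23", key: "visible" },
    values: ["hello", "true"],
  },
];
for (const m of messages) {
  console.log(render(m) === m.message); // true
}
```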

Option D: Use validation findings

Send a second chunk, next to the regular one, whose nodes are validation findings

Note: This chunk omits the nodes hello-node-id and true-node-id for brevity.

{
  "chunk": { ... },
  "findings": {
    "serializationFormatVersion": "2023.1",
    "languages": [ ... ],
    "nodes": [
      {
        "id": "aaa",
        "classifier": {
          "language": "bulkApiLanguage",
          "version": "2024.1",
          "key": "unknownNodeId"
        },
        "properties": [
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "message"
            },
            "value": "requested node id unknown: 2134"
          },
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "nodeId"
            },
            "value": "2134"
          }
        ],
        "containments": [],
        "references": [],
        "annotations": [],
        "parent": "someId"
      },
      {
        "id": "bbb",
        "classifier": {
          "language": "bulkApiLanguage",
          "version": "2024.1",
          "key": "invalidTree"
        },
        "properties": [
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "message"
            },
            "value": "node 123 mentions child abc, but node abc mentions parent 456"
          },
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "parent-nodeId"
            },
            "value": "123"
          },
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "parent-childId"
            },
            "value": "abc"
          },
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "child-parentId"
            },
            "value": "456"
          }
        ],
        "containments": [],
        "references": [],
        "annotations": [],
        "parent": "someId"
      },
      {
        "id": "ccc",
        "classifier": {
          "language": "bulkApiLanguage",
          "version": "2024.1",
          "key": "duplicateProperty"
        },
        "properties": [
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "message"
            },
            "value": "node 23 contains property {myLang@23|visible} several times (values: 'hello', 'true')"
          },
          {
            "property": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "nodeId"
            },
            "value": "23"
          }
        ],
        "containments": [
          {
            "containment": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "duplicateValues"
            },
            "children": [
              "hello-node-id",
              "true-node-id"
            ]
          }
        ],
        "references": [
          {
            "reference": {
              "language": "bulkApiLanguage",
              "version": "2024.1",
              "key": "feature"
            },
            "targets": [
              {
                "resolveInfo": "visible",
                "reference": "id-visible"
              }
            ]
          }
        ],
        "annotations": [],
        "parent": "someId"
      }
    ]
  }
}

Pro:

Con:

  • Very verbose
  • We probably don't need lots of node features (e.g. unique ids, tree shape, annotations)
  • Danger of XML (just because we can express everything in our format doesn't mean we should)
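To illustrate the verbosity: even recovering the simple key/value view of a finding from the chunk takes some unpacking. This TypeScript sketch models only the parts of the serialization format it needs, treats bulkApiLanguage and its keys as given by the example above, and uses hypothetical names (`flattenFinding`, `bulkApi`).

```typescript
// Subset of the serialization format needed to read the findings chunk above.
interface MetaPointer {
  language: string;
  version: string;
  key: string;
}

interface SerializedProperty {
  property: MetaPointer;
  value: string | null;
}

interface SerializedNode {
  id: string;
  classifier: MetaPointer;
  properties: SerializedProperty[];
}

// Language name taken from the example; not a defined LionWeb language.
const FINDINGS_LANGUAGE = "bulkApiLanguage";

// Flattens a finding node into { kind, ...properties }, roughly the Option C shape.
// Containments and references (duplicateValues, feature) would need extra lookups
// in the rest of the chunk and are ignored here.
function flattenFinding(node: SerializedNode): Record<string, string | null> {
  if (node.classifier.language !== FINDINGS_LANGUAGE) {
    throw new Error(`node ${node.id} is not a finding`);
  }
  const flat: Record<string, string | null> = { kind: node.classifier.key };
  for (const { property, value } of node.properties) {
    if (property.language === FINDINGS_LANGUAGE) {
      flat[property.key] = value;
    }
  }
  return flat;
}

// Hypothetical helper to build the repetitive meta-pointers of the example.
const bulkApi = (key: string): MetaPointer => ({ language: FINDINGS_LANGUAGE, version: "2024.1", key });

const finding: SerializedNode = {
  id: "aaa",
  classifier: bulkApi("unknownNodeId"),
  properties: [
    { property: bulkApi("message"), value: "requested node id unknown: 2134" },
    { property: bulkApi("nodeId"), value: "2134" },
  ],
};
console.log(flattenFinding(finding));
```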
