find_structure API unable to process nested fields from ndjson #127777

@rseldner

Elasticsearch Version

8.16.5

Installed Plugins

No response

Java Version

bundled

OS Version

Elastic Cloud

Problem Description

The find_structure API in Elasticsearch (doc) does not detect fields nested under an object. Only the parent object is mapped.

This limits the file upload functionality of the Kibana Data Visualizer, which uses this API to infer mappings.

For now, it would be ideal to document this as a known issue or limitation in both products.

{
    "overrides": {
        "explain": "true"
    },
    "results": {
        "num_lines_analyzed": 3,
        "num_messages_analyzed": 3,
        "sample_start": "{\"host\": {\"id\": \"1\", \"category\": \"NETWORKING DEVICE\"}}\n{\"host\": {\"id\": \"2\", \"category\": \"NETWORKING DEVICE\"}}\n",
        "charset": "UTF-8",
        "has_byte_order_marker": false,
        "format": "ndjson",
        "ecs_compatibility": "disabled",
        "need_client_timezone": false,
        "mappings": {
            "properties": {
                "host": {
                    "type": "object"
                }
            }
        },
        "explanation": [
            "Using character encoding [UTF-8], which matched the input with [15%] confidence - first [8kB] of input was pure ASCII",
            "Deciding sample is newline delimited NDJSON"
        ]
    }
}

Steps to Reproduce

Submit NDJSON containing nested fields:

POST _text_structure/find_structure?filter_path=mappings
{"host": {"id": "1", "category": "NETWORKING DEVICE"}}
{"host": {"id": "2", "category": "NETWORKING DEVICE"}}

Only the parent-level object is detected:

{
  "mappings": {
    "properties": {
      "host": {
        "type": "object"
      }
    }
  }
}
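To reproduce this outside Kibana Dev Tools and flag the limitation automatically, a minimal Python sketch using only the standard library could look like the following. The cluster URL, the helper names (`build_ndjson_body`, `find_structure_mappings`, `unexpanded_objects`) and the `application/x-ndjson` content type are assumptions for illustration, not part of the original report; authentication, which Elastic Cloud requires, is omitted.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust it and add authentication for your deployment.
ES_URL = "http://localhost:9200"


def build_ndjson_body(docs):
    """Serialize each document as one JSON line (NDJSON)."""
    return "".join(json.dumps(doc) + "\n" for doc in docs)


def find_structure_mappings(docs, base_url=ES_URL):
    """POST the documents to _text_structure/find_structure and return the mappings.

    Assumption: the endpoint accepts an application/x-ndjson request body.
    """
    req = urllib.request.Request(
        base_url + "/_text_structure/find_structure?filter_path=mappings",
        data=build_ndjson_body(docs).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["mappings"]


def unexpanded_objects(mappings, prefix=""):
    """Return dotted paths of fields typed "object" that have no "properties"."""
    found = []
    for name, spec in mappings.get("properties", {}).items():
        path = prefix + name
        if "properties" in spec:
            # Recurse into objects whose sub-fields were detected.
            found.extend(unexpanded_objects(spec, path + "."))
        elif spec.get("type") == "object":
            # Object reported without any sub-fields: the limitation in this issue.
            found.append(path)
    return found


nested_docs = [
    {"host": {"id": "1", "category": "NETWORKING DEVICE"}},
    {"host": {"id": "2", "category": "NETWORKING DEVICE"}},
]

# Offline check against the mapping reported in this issue.
reported = {"properties": {"host": {"type": "object"}}}
print(unexpanded_objects(reported))
# Against a live cluster: unexpanded_objects(find_structure_mappings(nested_docs))
```

`build_ndjson_body(nested_docs)` produces exactly the two lines from the request above, and `unexpanded_objects` returns `['host']` for the reported mapping, meaning an object whose sub-fields were never detected.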

To work around this, we have to flatten the structure into dotted field names:

POST _text_structure/find_structure?filter_path=mappings
{"host.id": 1, "host.category": "NETWORKING DEVICE"}
{"host.id": 2, "host.category": "NETWORKING DEVICE"}

{
  "mappings": {
    "properties": {
      "host.category": {
        "type": "keyword"
      },
      "host.id": {
        "type": "long"
      }
    }
  }
}
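The flattening workaround can be automated before uploading a file. This is a minimal Python sketch, assuming every non-empty line is a JSON object; the helper names `flatten` and `flatten_ndjson` are hypothetical.

```python
import json


def flatten(obj, prefix=""):
    """Collapse nested objects into dotted keys, e.g. {"host": {"id": 1}} -> {"host.id": 1}.

    Scalars, arrays and empty objects are kept as-is; only non-empty objects are flattened.
    """
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def flatten_ndjson(text):
    """Flatten every non-empty line of an NDJSON string."""
    lines = (line for line in text.splitlines() if line.strip())
    return "".join(json.dumps(flatten(json.loads(line))) + "\n" for line in lines)


sample = (
    '{"host": {"id": "1", "category": "NETWORKING DEVICE"}}\n'
    '{"host": {"id": "2", "category": "NETWORKING DEVICE"}}\n'
)
print(flatten_ndjson(sample), end="")
```

Values keep their original JSON types, so the string IDs in the nested sample stay strings, unlike the numeric IDs in the workaround request above. Keys that collide after flattening, such as a literal `"host.id"` alongside a nested `host.id`, are overwritten by whichever appears last.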

Logs (if relevant)

No response

Labels

:ml (Machine learning), >bug, Team:ML (Meta label for the ML team)