Is your feature request related to a problem?
Introduction
Currently, several AI-powered features in Olly, such as Query Assistant, T2Viz, and AD Suggestion, rely heavily on understanding the underlying index. However, we typically give the LLM only the index schema and, occasionally, a single sample document as context, because prompt length limits prevent us from including more. Moreover, each feature has its own isolated prompt design, and there is no centralized, purpose-built mechanism for extracting index insights. As a result, the AI features' understanding of index structure and semantics is fragmented and often insufficient.
Therefore, we propose a centralized Index Insight feature to enhance all index-related AI functionality.
Index Insight should be exposed as a centralized API that returns insights about a given index.
To store index insights, the customer must first create a container through the following API, because the insights are derived from the customer's own data:
PUT /_plugins/_ml/index_insight_container
{
"index_name": "abc"
}
The API creates an index with the given name if it does not already exist, and records the container information in a system index.
Each generated index insight is stored as a document in this container index, in the following format:
{
"last_updated_time": 1754888310420,
"task_type": "STATISTICAL_DATA",
"index_name": "mask_demo_log",
"status": "COMPLETED",
"content": "{\"mapping\":{\"field1\":{\"type\":\"long\"},\"field2\":{\"type\":\"long\"},\"field3\":{\"type\":\"text\"},\"field4\":{\"type\":\"date\"}},\"distribution\":{\"field1\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field3\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field2\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field4\":{}}}"
}
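Because `content` is itself a JSON-encoded string, a consumer has to decode twice: once for the document and once more for `content`. A minimal Python sketch, assuming the document shape shown above; `parse_insight_doc` is a hypothetical helper, not part of the plugin:

```python
import json


def parse_insight_doc(doc: dict) -> dict:
    """Decode the JSON-encoded `content` string of an index insight document.

    Assumes `content` holds a `mapping` object and, for STATISTICAL_DATA,
    a `distribution` object, as in the example document.
    """
    content = json.loads(doc["content"])  # second decode: content is a string
    return {
        "index_name": doc["index_name"],
        "task_type": doc["task_type"],
        "status": doc["status"],
        # Flatten the mapping to field name -> field type.
        "field_types": {
            name: spec.get("type") for name, spec in content.get("mapping", {}).items()
        },
        "distribution": content.get("distribution", {}),
    }


# The example document from above.
doc = {
    "last_updated_time": 1754888310420,
    "task_type": "STATISTICAL_DATA",
    "index_name": "mask_demo_log",
    "status": "COMPLETED",
    "content": (
        '{"mapping":{"field1":{"type":"long"},"field2":{"type":"long"},'
        '"field3":{"type":"text"},"field4":{"type":"date"}},'
        '"distribution":{"field1":{"unique_count":0.0,"unique_terms":[]},'
        '"field3":{"unique_count":0.0,"unique_terms":[]},'
        '"field2":{"unique_count":0.0,"unique_terms":[]},"field4":{}}}'
    ),
}

parsed = parse_insight_doc(doc)
print(parsed["field_types"])
# {'field1': 'long', 'field2': 'long', 'field3': 'text', 'field4': 'date'}
```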
We also provide an API to delete the container:
DELETE /_plugins/_ml/index_insight_container
This deletes the container index.
Index insights can be retrieved with the following API:
GET /_plugins/_ml/insights/{index_name}/{task_type}
The output is:
{
"index_insight": {
"index_name": "mask_demo_log",
"content": "{\"mapping\":{\"field1\":{\"type\":\"long\"},\"field2\":{\"type\":\"long\"},\"field3\":{\"type\":\"text\"},\"field4\":{\"type\":\"date\"}},\"distribution\":{\"field1\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field3\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field2\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field4\":{}}}",
"status": "COMPLETED",
"task_type": "STATISTICAL_DATA",
"last_updated_time": 1754888310420
}
}
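A minimal Python sketch of a client for the three endpoints above, using only the standard library. It only builds the requests; `BASE_URL` (an unauthenticated local cluster) and all helper names are assumptions, and authentication headers are omitted:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster
CONTAINER_PATH = "/_plugins/_ml/index_insight_container"


def put_container_request(index_name: str) -> urllib.request.Request:
    """PUT the container API with the name of the index that will hold insights."""
    body = json.dumps({"index_name": index_name}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CONTAINER_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delete_container_request() -> urllib.request.Request:
    """DELETE the container (which deletes the container index)."""
    return urllib.request.Request(BASE_URL + CONTAINER_PATH, method="DELETE")


def get_insight_request(index_name: str, task_type: str) -> urllib.request.Request:
    """GET /_plugins/_ml/insights/{index_name}/{task_type}, URL-encoding each segment."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (index_name, task_type))
    return urllib.request.Request(f"{BASE_URL}/_plugins/_ml/insights/{path}", method="GET")


def read_insight(raw: bytes) -> dict:
    """Extract the `index_insight` object from a GET response body."""
    return json.loads(raw)["index_insight"]
```

Sending a request is then, for example, `urllib.request.urlopen(get_insight_request("mask_demo_log", "STATISTICAL_DATA"))`, and passing the response body to `read_insight` yields the `index_insight` object shown above.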
The task type determines which insight is generated. Initially, we plan to support the following task types:
- StatisticalDataTask: Returns the mapping and data distribution of an index, computed with DSL queries.
- FieldDescriptionTask: Returns an LLM-generated description of each field (column). It depends on StatisticalDataTask. It is mainly intended for Query Assistant and may also be used by a deep research agent.
- LogRelatedIndexCheckTask: Determines whether the index contains log data, whether a field contains the whole log message, and whether a field serves as a trace ID that groups a set of logs into one flow.
Each (index name, task type) pair maps to one document in the index insight container. When the API is called, we check whether the document needs to be updated. If not, we return the existing document directly; otherwise, we generate the latest insight and update the document.
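The proposal does not specify when a document counts as stale. The sketch below assumes a hypothetical 24-hour TTL on `last_updated_time` and treats any status other than `COMPLETED` as requiring regeneration; an in-memory dict stands in for the container index, and the `generate` callback stands in for running the task. A real implementation would also need to avoid regenerating a document that another request is already producing.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumption: staleness rule and TTL are illustrative, not from the proposal.
DEFAULT_TTL_MS = 24 * 60 * 60 * 1000  # hypothetical 24-hour freshness window


def needs_update(doc: Optional[dict], now_ms: int, ttl_ms: int = DEFAULT_TTL_MS) -> bool:
    """A doc is stale if missing, not COMPLETED, or older than the TTL."""
    if doc is None:
        return True
    if doc.get("status") != "COMPLETED":
        return True
    return now_ms - doc["last_updated_time"] > ttl_ms


def get_or_refresh(
    store: Dict[Tuple[str, str], dict],
    index_name: str,
    task_type: str,
    generate: Callable[[str], str],
    now_ms: Optional[int] = None,
) -> dict:
    """Return the insight doc for (index_name, task_type), regenerating it only if stale."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    key = (index_name, task_type)  # one doc per (index name, task type) pair
    doc = store.get(key)
    if needs_update(doc, now_ms):
        doc = {
            "index_name": index_name,
            "task_type": task_type,
            "status": "COMPLETED",
            "content": generate(index_name),  # run the task synchronously
            "last_updated_time": now_ms,
        }
        store[key] = doc
    return doc
```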
Some tasks depend on prerequisite tasks; in that case, the prerequisite tasks are executed first.
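Running prerequisites first amounts to a topological sort of the task dependency graph. A minimal sketch, assuming the dependency table below: FieldDescriptionTask depending on StatisticalDataTask comes from this proposal, while the empty prerequisite list for LogRelatedIndexCheckTask is an assumption.

```python
from typing import Dict, List, Set

# Prerequisites per task type. FieldDescriptionTask -> StatisticalDataTask is
# stated in the proposal; the empty list for LogRelatedIndexCheckTask is assumed.
PREREQUISITES: Dict[str, List[str]] = {
    "StatisticalDataTask": [],
    "FieldDescriptionTask": ["StatisticalDataTask"],
    "LogRelatedIndexCheckTask": [],
}


def execution_order(task: str, prereqs: Dict[str, List[str]] = PREREQUISITES) -> List[str]:
    """Return `task` plus its transitive prerequisites, prerequisites first (DFS topological sort)."""
    order: List[str] = []
    visiting: Set[str] = set()  # tasks on the current DFS path, for cycle detection
    done: Set[str] = set()

    def visit(t: str) -> None:
        if t in done:
            return
        if t in visiting:
            raise ValueError(f"cyclic task dependency at {t}")
        visiting.add(t)
        for dep in prereqs.get(t, []):
            visit(dep)  # schedule every prerequisite before t
        visiting.remove(t)
        done.add(t)
        order.append(t)

    visit(task)
    return order
```

For example, `execution_order("FieldDescriptionTask")` returns `["StatisticalDataTask", "FieldDescriptionTask"]`, so the statistics are generated (or fetched) before the LLM is asked for field descriptions.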
Currently, FieldDescriptionTask and LogRelatedIndexCheckTask require an LLM. To enable LLM usage, register a flow agent and reference it in the ML config, for example:
POST /_plugins/_ml/agents/_register
{
"name": "Test_Agent_For_RAG_2",
"type": "flow",
"memory": {
"type": "demo"
},
"tools": [
{
"type": "MLModelTool",
"description": "A general tool to answer any question",
"parameters": {
"model_id": "Kidvh5gBYtO-w0f3jpBb"
}
}
]
}
PUT /.plugins-ml-config/_doc/os_index_insight_agent
{
"type": "os_index_insight_agent",
"configuration": {
"agent_id": "<agent_id>"
}
}
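The two calls above are linked: the agent ID returned by the registration call is what the config document must reference. A minimal Python sketch of building both bodies; the helper names are hypothetical, and reading the ID from an `agent_id` field of the registration response is an assumption about the response shape.

```python
import json


def flow_agent_body(model_id: str, name: str = "Test_Agent_For_RAG_2") -> dict:
    """Body for POST /_plugins/_ml/agents/_register: a flow agent wrapping one MLModelTool."""
    return {
        "name": name,
        "type": "flow",
        "memory": {"type": "demo"},
        "tools": [
            {
                "type": "MLModelTool",
                "description": "A general tool to answer any question",
                "parameters": {"model_id": model_id},
            }
        ],
    }


def agent_id_from_register_response(raw: bytes) -> str:
    # Assumption: the registration response carries the new ID under "agent_id".
    return json.loads(raw)["agent_id"]


def index_insight_config_body(agent_id: str) -> dict:
    """Body for PUT /.plugins-ml-config/_doc/os_index_insight_agent."""
    return {"type": "os_index_insight_agent", "configuration": {"agent_id": agent_id}}
```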
The execution logic would be
