[FEATURE] RFC: Index insight: A feature to enhance indices related AI features

**Is your feature request related to a problem?**
Introduction
Currently, we have several AI-powered features in Olly—such as Query Assistant, T2Viz, and AD Suggestion—that rely heavily on understanding the underlying index. However, at present, we typically provide only the index schema and occasionally a single document sample as context to the LLM. Due to prompt length limitations, we cannot include more comprehensive information. Moreover, each feature has its own isolated prompt design, and there is no centralized or purpose-built mechanism for extracting index insights. This results in a fragmented and often insufficient understanding of index structure and semantics across AI features.
Therefore, we want a centralized **Index Insight** feature to enhance all index-related AI functionalities.

Index Insight should be a centralized API for retrieving insights about a single index.
To store index insights, the customer must provide a container through the API, because the container will hold the customer's own data:
```
PUT /_plugins/_ml/index_insight_container
{
    "index_name": "abc"
}
```
The API creates an index with the given name if it doesn't already exist. The container information is stored in a system index.
Each generated index insight is a doc inside this container index, in the following format:
```
{
        "last_updated_time": 1754888310420,
        "task_type": "STATISTICAL_DATA",
        "index_name": "mask_demo_log",
        "status": "COMPLETED",
        "content": "{\"mapping\":{\"field1\":{\"type\":\"long\"},\"field2\":{\"type\":\"long\"},\"field3\":{\"type\":\"text\"},\"field4\":{\"type\":\"date\"}},\"distribution\":{\"field1\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field3\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field2\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field4\":{}}}"
}
```

We also provide an API to delete the container:
```
DELETE /_plugins/_ml/index_insight_container
```
This deletes the container index.

An index insight can be retrieved with the following API:
```
GET /_plugins/_ml/insights/{index_name}/{task_type}
```
The output would be:
```
{
    "index_insight": {
        "index_name": "mask_demo_log",
        "content": "{\"mapping\":{\"field1\":{\"type\":\"long\"},\"field2\":{\"type\":\"long\"},\"field3\":{\"type\":\"text\"},\"field4\":{\"type\":\"date\"}},\"distribution\":{\"field1\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field3\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field2\":{\"unique_count\":0.0,\"unique_terms\":[]},\"field4\":{}}}",
        "status": "COMPLETED",
        "task_type": "STATISTICAL_DATA",
        "last_updated_time": 1754888310420
    }
}
```
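
Note that `content` in the response above is itself a JSON-encoded string, so a client has to decode it separately after parsing the response body. A minimal Python sketch of that step follows; the helper name `parse_insight` is hypothetical, and the sample response is a shortened copy of the example above.

```python
import json


def parse_insight(response: dict) -> dict:
    """Return the insight with its `content` string decoded into a dict."""
    insight = dict(response["index_insight"])
    # `content` arrives as a JSON string, so it needs its own decode step.
    insight["content"] = json.loads(insight["content"])
    return insight


# Shortened version of the example response (one field only).
response = {
    "index_insight": {
        "index_name": "mask_demo_log",
        "content": "{\"mapping\":{\"field1\":{\"type\":\"long\"}},"
                   "\"distribution\":{\"field1\":{\"unique_count\":0.0,\"unique_terms\":[]}}}",
        "status": "COMPLETED",
        "task_type": "STATISTICAL_DATA",
        "last_updated_time": 1754888310420,
    }
}

insight = parse_insight(response)
print(insight["content"]["mapping"]["field1"]["type"])  # prints: long
```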

The task type defines which insight is generated. Currently, we want to provide the following task types:
1. StatisticalDataTask: Returns the mapping and data distribution of an index using DSL.
2. FieldDescriptionTask: Returns a description of each column using an LLM. It depends on StatisticalDataTask. It is mainly used by Query Assistant and may also be used by a deep research agent.
3. LogRelatedIndexCheckTask: Judges whether the index is log-related, whether a column contains the whole log message, and whether a column serves as a trace ID that groups a set of logs into one flow.
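
To illustrate how a dependency such as FieldDescriptionTask on StatisticalDataTask could be resolved, here is a small Python sketch that orders a task after its prerequisites. It is not the plugin's implementation. Only `STATISTICAL_DATA` appears as a task type value in this RFC; the other two identifiers, and the assumption that LogRelatedIndexCheckTask has no prerequisites, are hypothetical.

```python
# Prerequisites per task type. Only the FIELD_DESCRIPTION -> STATISTICAL_DATA
# edge is stated in the RFC; the empty entries are assumptions.
PREREQUISITES = {
    "STATISTICAL_DATA": [],
    "FIELD_DESCRIPTION": ["STATISTICAL_DATA"],
    "LOG_RELATED_INDEX_CHECK": [],
}


def execution_order(task_type, prerequisites=PREREQUISITES):
    """Return the tasks to run, prerequisites first, ending with task_type."""
    order = []
    visiting = set()

    def visit(task):
        if task in order:
            return
        if task in visiting:
            raise ValueError(f"cyclic prerequisite involving {task}")
        visiting.add(task)
        for dep in prerequisites[task]:
            visit(dep)
        visiting.discard(task)
        order.append(task)

    visit(task_type)
    return order


print(execution_order("FIELD_DESCRIPTION"))
```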

Each (index name, task type) pair maps to one doc in the index insight container. When this API is called, we check whether the doc needs to be updated. If not, we fetch the doc directly. Otherwise, we generate the latest insight and update the doc.
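
The fetch-or-regenerate flow could look roughly like the Python sketch below. The RFC does not say how "needs to update" is decided, so the age-based rule and `MAX_AGE_MS` are assumptions, as are the in-memory `store` dict (standing in for the container index) and the `generate` callback.

```python
import time

# Hypothetical freshness window; the RFC does not define the update criterion.
MAX_AGE_MS = 24 * 60 * 60 * 1000


def get_insight(store, index_name, task_type, generate, now_ms=None):
    """Return the cached doc if it is still fresh, otherwise regenerate it."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    key = (index_name, task_type)  # one doc per (index name, task type)
    doc = store.get(key)
    if (
        doc is not None
        and doc["status"] == "COMPLETED"
        and now_ms - doc["last_updated_time"] < MAX_AGE_MS
    ):
        return doc  # no update needed: fetch the doc directly
    doc = {
        "index_name": index_name,
        "task_type": task_type,
        "status": "COMPLETED",
        "content": generate(index_name),
        "last_updated_time": now_ms,
    }
    store[key] = doc  # update the doc in the container
    return doc
```

Calling `get_insight` twice within the freshness window invokes `generate` only once; the second call returns the stored doc unchanged.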

Sometimes a task depends on prerequisite tasks. In that case, the prerequisite tasks are generated and executed first.
Also, `FieldDescriptionTask` and `LogRelatedIndexCheckTask` currently require an LLM. To enable LLM usage, we need to configure a flow agent in the ML config, for example:
```
POST /_plugins/_ml/agents/_register
{
  "name": "Test_Agent_For_RAG_2",
  "type": "flow",
  "memory": {
    "type": "demo"
  },
  "tools": [
    {
      "type": "MLModelTool",
      "description": "A general tool to answer any question",
      "parameters": {
        "model_id": "Kidvh5gBYtO-w0f3jpBb"
      }
    }
  ]
}
PUT /.plugins-ml-config/_doc/os_index_insight_agent
{
    "type": "os_index_insight_agent",
    "configuration": {
        "agent_id": "<agent_id>"
    }
}
```

The execution logic is as follows:

<img width="1924" height="764" alt="Image" src="https://github.com/user-attachments/assets/d9435d37-9593-4cba-9672-345f509fe41b" />
