Updating the index-parameter page #10512

Merged
merged 14 commits into from
Aug 6, 2025

Conversation

ajleong623
Contributor

Description

Currently, the index parameter page states that when a field is not indexed, it is unsearchable. However, based on some recent changes, this is no longer completely true.

Issues Resolved

Closes opensearch-project/OpenSearch#18798

Version

List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all.

Frontend features

If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anthony Leong <[email protected]>

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@ajleong623 ajleong623 marked this pull request as draft July 27, 2025 21:05
Member

@sandeshkr419 sandeshkr419 left a comment

Thanks @ajleong623 for taking an action on this. I have some suggestions on how we can improve this more.

Tagging @kolchfa-aws for more suggestions on this as well.

@@ -10,7 +10,7 @@ has_toc: false

# Index

-The `index` mapping parameter controls whether a field is searchable by including it in the inverted index. When set to `true`, the field is indexed and available for queries. When set to `false`, the field is stored in the document but not indexed, making it non-searchable. If you do not need to search a particular field, disabling indexing for that field can reduce index size and improve indexing performance. For example, you can disable indexing on large text fields or metadata that is only used for display.
+The `index` mapping parameter controls whether a field is included in the inverted index. When set to `true`, the field is indexed and available for queries. When set to `false`, the field is stored in the document but not indexed, making it non-searchable when [doc_values]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/) are not enabled. If you do not need to search a particular field, disabling indexing and doc_values for that field can reduce index size and improve indexing performance. For example, you can disable indexing on large text fields or metadata that is only used for display.
Member

I'm still slightly confused by the wording used to explain the difference between index and doc_values.

Here is how an LLM explained it to me:


In OpenSearch, index and doc_values are two distinct mapping settings that control how data is stored and accessed. Understanding their differences is key to optimizing search performance, aggregations, and storage.

In short, the index setting makes a field searchable, while the doc_values setting makes a field available for sorting, aggregations, and scripting.

| Feature | index: true (Inverted Index) | doc_values: true (Columnar Store) |
| :--- | :--- | :--- |
| Primary Use Case | 🕵️‍♂️ Fast full-text searching. Finding documents that contain a specific term. | 📊 Fast data access for sorting, aggregations (e.g., SUM, AVG), and accessing field values in scripts. |
| How it Works | Creates an inverted index, which maps terms to the documents containing them. Think of it like the index at the back of a book. | Creates a columnar data structure that maps documents to the terms they contain. This is efficient for operations that need to scan values across many documents. |
| Searchability | Makes the field searchable. This is the primary mechanism for querying. | On its own, it does not make a field searchable in the traditional sense. Queries can run on doc_values if the inverted index is disabled, though it is typically slower. |
| Performance | Excellent for finding specific documents quickly. | Significantly improves the performance of sorting and aggregations by avoiding the need to load data into memory from the _source. |
| Storage | Increases index size due to the creation of the inverted index data structure. | Also increases index size, but is often more space-efficient than the in-memory fielddata alternative it replaced. |
| Default Setting | true for most fields. | true for most fields that support it (e.g., keyword, date, numeric types, but not text). |

Configuration Scenarios and Their Implications

Here are the common combinations for these settings and what they mean for your data:

1. index: true and doc_values: true (Default for most fields)

  • Implication: This is the most versatile option. The field is fully searchable, and it can also be efficiently used for sorting, aggregations, and scripting.

  • Use Case: Any field that you need to query directly and also use in aggregations (e.g., a status keyword field you filter on and use for a terms aggregation).

2. index: true and doc_values: false

  • Implication: The field is searchable, but you cannot efficiently sort on it or use it in aggregations. Attempting to do so will force OpenSearch to load values into an in-memory structure called fielddata, which can consume a lot of heap space and is generally discouraged.

  • Use Case: A full-text field that is only used for searching and not for sorting or aggregations. The text field type, for instance, has doc_values disabled by default.

3. index: false and doc_values: true

  • Implication: The field is not searchable via the fast inverted index but can still be used for sorting and aggregations. This is a great way to save disk space and improve indexing speed if you only need a field for analytical purposes.

  • Use Case: Fields that you don't need to filter on but need for aggregations, like a transaction_amount for calculating a total sum or an event_duration for an average.

4. index: false and doc_values: false

  • Implication: The field is neither searchable nor available for sorting and aggregations. Its value is only stored within the _source field. You can retrieve it, but you cannot query or aggregate on it.

  • Use Case: Metadata that is purely for display purposes and has no role in search or analytics, such as a descriptive text blob or an image URL that you only show to the user. This configuration offers the greatest savings in index size and indexing performance.
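
To make the four combinations above concrete, here is a minimal sketch of a mappings body that uses one field per combination. The field names (`status`, `session_token`, `transaction_amount`, `image_url`) and the choice of field types are illustrative assumptions, not taken from the PR; only the `index` and `doc_values` parameters themselves come from the discussion.

```python
import json


def build_mappings():
    """Return a mappings body with one hypothetical field per combination."""
    return {
        "mappings": {
            "properties": {
                # 1. index: true, doc_values: true (both left at their defaults)
                "status": {"type": "keyword"},
                # 2. index: true, doc_values: false
                "session_token": {"type": "keyword", "doc_values": False},
                # 3. index: false, doc_values: true (doc_values left at its default)
                "transaction_amount": {"type": "double", "index": False},
                # 4. index: false, doc_values: false
                "image_url": {
                    "type": "keyword",
                    "index": False,
                    "doc_values": False,
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating an index.
    print(json.dumps(build_mappings(), indent=2))
```

The printed JSON is only the request body; how it is sent to a cluster (client library, REST call) is left out on purpose.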


@kolchfa-aws I could use some of your suggestions here on how to present the above information concisely in one place (I'm not sure this is the right page for it). Having an example after each combination of index and doc_values (there are 4 combinations in total) might make it more explanatory.

Contributor Author

I think what I was trying to state was that if either index or doc_values is enabled for the field, we can search on it. The change to use doc_values for search was made due to @harshavamsi. I also noticed that a lot of the field mappers now only short-circuit when both index and doc_values are disabled. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java#L496.

However, would it still be helpful to show a table of all the combinations?
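
The rule described in this comment can be sketched as a small truth table. This is a simplified model of the behavior as stated in this thread, not the actual `KeywordFieldMapper` logic; the function names and the returned labels are hypothetical.

```python
def query_strategy(index: bool = True, doc_values: bool = True) -> str:
    """Return which structure a query could use under the rule described above."""
    if index:
        # The inverted index is the primary, fastest search path.
        return "inverted_index"
    if doc_values:
        # Fallback: search runs on doc values, typically slower.
        return "doc_values"
    # Both disabled: the mapper short-circuits and the field is unsearchable.
    return "not_searchable"


def is_searchable(index: bool = True, doc_values: bool = True) -> bool:
    """A field is searchable if either index or doc_values is enabled."""
    return query_strategy(index, doc_values) != "not_searchable"


if __name__ == "__main__":
    for index in (True, False):
        for doc_values in (True, False):
            print(index, doc_values, query_strategy(index, doc_values))
```

Only the last combination (both parameters disabled) yields an unsearchable field in this model.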

Collaborator

I think a table is good because it allows you to find combinations right away. I would put it in a section named "Index and doc values compared" on this page and link to this section from the doc values page (https://docs.opensearch.org/latest/field-types/mapping-parameters/doc-values/).

Member

@kolchfa-aws Thanks. +1 on suggestion on linking the table at relevant pages.
@ajleong623 Let's try to get this in as well. Please move it out of draft state when ready to review.

@harshavamsi might need your help also to review the correctness for "Index and doc values compared" which I mentioned in the previous comment.

Signed-off-by: Anthony Leong <[email protected]>
@kolchfa-aws kolchfa-aws added Tech review PR: Tech review in progress backport 3.1 labels Jul 30, 2025
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
@ajleong623
Contributor Author

The extended description and table should be ready

@ajleong623 ajleong623 marked this pull request as ready for review July 30, 2025 22:37
@ajleong623
Contributor Author

@sandeshkr419 @kolchfa-aws Last week I added the table, but I do not think you were able to see the updates because the last comment did not tag anyone. How do the updates look?


By default, all field types are indexed.

## Index and Doc Values

Enabling the index parameter creates a mapping between terms and document lists. For each subsequent document, the value of every field for which the index parameter is enabled is processed into its terms, and for each of those terms, the document ID is added to that term's document list in the mapping. When the `doc_values` parameter is enabled, the document is mapped to the list of terms it contains for that field. This helps with operations that need to quickly access a value for a document, such as sorting.
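
The two structures described above can be illustrated with a toy model. This is a sketch under stated assumptions: terms are produced by lowercasing and splitting on white space (a stand-in for real analysis), and the function name `build_structures` is hypothetical. It does not reflect how the underlying storage is actually implemented.

```python
from collections import defaultdict


def build_structures(docs):
    """Build a toy inverted index and toy doc values for one field.

    docs maps a document ID to that document's field value (a string).
    """
    inverted_index = defaultdict(list)  # term -> list of document IDs
    doc_values = {}                     # document ID -> sorted list of terms
    for doc_id, value in docs.items():
        # Assumption: naive tokenization instead of a real analyzer.
        terms = sorted(set(value.lower().split()))
        doc_values[doc_id] = terms
        for term in terms:
            inverted_index[term].append(doc_id)
    return dict(inverted_index), doc_values


if __name__ == "__main__":
    inverted, values = build_structures({1: "red apple", 2: "green apple"})
    print(inverted)  # term -> documents, used to answer "which documents contain X?"
    print(values)    # document -> terms, used to answer "what does document N contain?"
```

The inverted index answers term-to-document lookups, while the per-document structure answers document-to-value lookups, which is the access pattern sorting needs.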
Member

nit: Like `doc_values`, can you add ` before and after wherever you are referring to the `index` parameter?

Contributor Author

Just to make sure, you are referring to the asterisk-like symbol?

Collaborator

@ajleong623 Use backticks (`) when referring to the parameters, so the parameter name appears in code font. So, instead of index parameter it will be `index` parameter. My suggestions already incorporate that format.

Collaborator

@kolchfa-aws kolchfa-aws left a comment

Thank you, @ajleong623! Some rewording suggestions for you.

ajleong623 and others added 3 commits August 4, 2025 15:22
Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
@kolchfa-aws
Collaborator

@sandeshkr419 Are more people going to perform a tech review or is this ready for final editorial review/merge?

Member

@sandeshkr419 sandeshkr419 left a comment

@kolchfa-aws The changes look good on the technical end. Please move to editorial review.

@kolchfa-aws kolchfa-aws added Editorial review PR: Editorial review in progress release-notes PR: Include this PR in the automated release notes v3.2.0 and removed Tech review PR: Tech review in progress backport 3.1 labels Aug 6, 2025
Collaborator

@natebower natebower left a comment

LGTM

@natebower natebower removed the Editorial review PR: Editorial review in progress label Aug 6, 2025
@natebower natebower merged commit fa5dcbb into opensearch-project:main Aug 6, 2025
6 checks passed
lucy66hw pushed a commit to lucy66hw/documentation-website that referenced this pull request Aug 14, 2025
* Update index-parameter.md

Signed-off-by: Anthony Leong <[email protected]>

* Update index-parameter.md

Signed-off-by: Anthony Leong <[email protected]>

* Extended description of index and doc values and added table.

Signed-off-by: Anthony Leong <[email protected]>

* add link to table

Signed-off-by: Anthony Leong <[email protected]>

* fix invalid link

Signed-off-by: Anthony Leong <[email protected]>

* Update doc-values.md

Signed-off-by: Anthony Leong <[email protected]>

* Update doc-values.md

Signed-off-by: Anthony Leong <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>

* apply asterisks.

Signed-off-by: Anthony Leong <[email protected]>

* update dead link

Signed-off-by: Anthony Leong <[email protected]>

* Update _field-types/mapping-parameters/doc-values.md

Signed-off-by: kolchfa-aws <[email protected]>

* Apply suggestions from code review

Signed-off-by: Nathan Bower <[email protected]>

---------

Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: Nathan Bower <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Labels
release-notes PR: Include this PR in the automated release notes v3.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Index field is not responsive
4 participants