Updating the index-parameter page #10512
Signed-off-by: Anthony Leong <[email protected]>
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.
Thanks, @ajleong623, for taking action on this. I have some suggestions for improving this further.
Tagging @kolchfa-aws for more suggestions on this as well.
@@ -10,7 +10,7 @@ has_toc: false

# Index

-The `index` mapping parameter controls whether a field is searchable by including it in the inverted index. When set to `true`, the field is indexed and available for queries. When set to `false`, the field is stored in the document but not indexed, making it non-searchable. If you do not need to search a particular field, disabling indexing for that field can reduce index size and improve indexing performance. For example, you can disable indexing on large text fields or metadata that is only used for display.
+The `index` mapping parameter controls whether a field is included in the inverted index. When set to `true`, the field is indexed and available for queries. When set to `false`, the field is stored in the document but not indexed, making it non-searchable when [doc_values]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/) are not enabled. If you do not need to search a particular field, disabling indexing and doc_values for that field can reduce index size and improve indexing performance. For example, you can disable indexing on large text fields or metadata that is only used for display.
I'm still slightly confused by the wording that explains the difference between `index` and `doc_values`.
Here is how an LLM tried to explain it to me:
In OpenSearch, `index` and `doc_values` are two distinct mapping settings that control how data is stored and accessed. Understanding their differences is key to optimizing search performance, aggregations, and storage.
In short, the `index` setting makes a field searchable, while the `doc_values` setting makes a field available for sorting, aggregations, and scripting.
| Feature | `index: true` (Inverted Index) | `doc_values: true` (Columnar Store) |
|---|---|---|
| Primary Use Case | 🕵️‍♂️ Fast full-text searching. Finding documents that contain a specific term. | 📊 Fast data access for sorting, aggregations (e.g., `SUM`, `AVG`), and accessing field values in scripts. |
| How it Works | Creates an inverted index, which maps terms to the documents containing them. Think of it like the index at the back of a book. | Creates a columnar data structure that maps documents to the terms they contain. This is efficient for operations that need to scan values across many documents. |
| Searchability | Makes the field searchable. This is the primary mechanism for querying. | On its own, it does not make a field searchable in the traditional sense. Queries can run on `doc_values` if the inverted index is disabled, though it is typically slower. |
| Performance | Excellent for finding specific documents quickly. | Significantly improves the performance of sorting and aggregations by avoiding the need to load data into memory from the `_source`. |
| Storage | Increases index size due to the creation of the inverted index data structure. | Also increases index size, but is often more space-efficient than the in-memory `fielddata` alternative it replaced. |
| Default Setting | `true` for most fields. | `true` for most fields that support it (e.g., `keyword`, `date`, numeric types, but not `text`). |
Configuration Scenarios and Their Implications
Here are the common combinations for these settings and what they mean for your data:
1. `index: true` and `doc_values: true` (Default for most fields)
   - Implication: This is the most versatile option. The field is fully searchable, and it can also be efficiently used for sorting, aggregations, and scripting.
   - Use Case: Any field that you need to query directly and also use in aggregations (e.g., a `status` keyword field you filter on and use for a terms aggregation).
2. `index: true` and `doc_values: false`
   - Implication: The field is searchable, but you cannot efficiently sort on it or use it in aggregations. Attempting to do so on a `text` field requires enabling `fielddata`, an in-memory structure that can consume a lot of heap space and is generally discouraged.
   - Use Case: A full-text field that is only used for searching and not for sorting or aggregations. The `text` field type, for instance, has `doc_values` disabled by default.
3. `index: false` and `doc_values: true`
   - Implication: The field is not searchable via the fast inverted index but can still be used for sorting and aggregations. This is a great way to save disk space and improve indexing speed if you only need a field for analytical purposes.
   - Use Case: Fields that you don't need to filter on but need for aggregations, like a `transaction_amount` for calculating a total sum or an `event_duration` for an average.
4. `index: false` and `doc_values: false`
   - Implication: The field is neither searchable nor available for sorting and aggregations. Its value is only stored within the `_source` field. You can retrieve it, but you cannot query or aggregate on it.
   - Use Case: Metadata that is purely for display purposes and has no role in search or analytics, such as a descriptive text blob or an image URL that you only show to the user. This configuration offers the greatest savings in index size and indexing performance.
@kolchfa-aws I could use your suggestions on how to present the above information concisely in one place (I'm not sure if this is the right page for it). Perhaps adding an example for each combination of `index` and `doc_values` (there are 4 combinations in total) would make it clearer?
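As one possible illustration of the four combinations listed in this comment, here is a minimal Python sketch that builds a mapping request body with one field per combination. The field names `session_id` and `image_url`, the field types chosen, and the `field` helper are assumptions made for illustration; only the `index` and `doc_values` mapping parameters come from the discussion. The sketch only prints the JSON and does not contact a cluster.

```python
import json


def field(field_type, index, doc_values):
    """Build one field mapping with explicit `index` and `doc_values` flags."""
    return {"type": field_type, "index": index, "doc_values": doc_values}


# One hypothetical field per combination of `index` and `doc_values`.
mapping_body = {
    "mappings": {
        "properties": {
            # 1. Searchable and usable for sorting and aggregations.
            "status": field("keyword", True, True),
            # 2. Searchable through the inverted index only.
            "session_id": field("keyword", True, False),
            # 3. Kept for sorting and aggregations, no inverted index.
            "transaction_amount": field("double", False, True),
            # 4. Only retrievable from `_source`.
            "image_url": field("keyword", False, False),
        }
    }
}

# The printed JSON could serve as the body of an index creation request.
print(json.dumps(mapping_body, indent=2))
```

Spelling out both flags on every field, even where they match the defaults, keeps the four cases easy to compare side by side.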
What I was trying to state was that if either `index` or `doc_values` is enabled for the field, we can search on it. The change to use `doc_values` for search came from @harshavamsi. I also noticed that many of the field mappers now only short circuit when both `index` and `doc_values` are disabled (see https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java#L496).
However, would it still be helpful to show a table of all the combinations?
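The rule described in this comment can be sketched as a small decision function. This is a simplified model of the behavior as stated here, assuming a field type that supports searching on doc values; it is not the actual `KeywordFieldMapper` logic, and the function name and return labels are hypothetical.

```python
def search_path(index: bool, doc_values: bool) -> str:
    """Return which structure a query on the field would rely on."""
    if index:
        # The inverted index is the preferred, faster path.
        return "inverted_index"
    if doc_values:
        # Fallback: the query can still run on doc values, typically slower.
        return "doc_values"
    # Only when both are disabled is the field not searchable.
    return "not_searchable"


for index in (True, False):
    for doc_values in (True, False):
        print(f"index={index}, doc_values={doc_values}: {search_path(index, doc_values)}")
```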
I think a table is good because it allows you to find combinations right away. I would put it in a section named "Index and doc values compared" on this page and link to this section from the doc values page (https://docs.opensearch.org/latest/field-types/mapping-parameters/doc-values/).
@kolchfa-aws Thanks. +1 on the suggestion to link to the table from the relevant pages.
@ajleong623 Let's try to get this in as well. Please move it out of draft state when it is ready for review.
@harshavamsi I might also need your help reviewing the correctness of the "Index and doc values compared" content that I mentioned in the previous comment.
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
The extended description and table should be ready.
@sandeshkr419 @kolchfa-aws Last week I added the table, but I do not think you were able to see the updates because the last comment did not tag anyone. How do the updates look?

By default, all field types are indexed.

## Index and Doc Values

Enabling the index parameter will create a mapping between the terms and document lists. For any subsequent documents, the value of the fields where the index parameter is enabled will be processed into its terms, and for each of those terms, the document ID will be added to the corresponding document list of the term in the mapping. When the `doc_values` parameter is enabled, the document will be mapped to the list of terms it contains for that field. This helps with operations that need to quickly access a value for a document, like sorting.
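
The two structures described in this paragraph can be sketched with a toy Python model. This is only a conceptual illustration, assuming a whitespace tokenizer and in-memory dictionaries; the sample documents and the `analyze` helper are hypothetical, and the sketch does not reflect the on-disk format that OpenSearch actually uses.

```python
from collections import defaultdict

# Toy documents: document ID -> value of a single hypothetical field.
docs = {
    1: "red apple",
    2: "green apple",
    3: "red cherry",
}


def analyze(value):
    """Split a field value into lowercase terms (stand-in for a real analyzer)."""
    return value.lower().split()


# Conceptual inverted index: term -> list of document IDs containing it.
inverted_index = defaultdict(list)
# Conceptual doc values: document ID -> list of terms for the field.
doc_values = {}

for doc_id in sorted(docs):
    terms = analyze(docs[doc_id])
    doc_values[doc_id] = terms
    for term in terms:
        if doc_id not in inverted_index[term]:
            inverted_index[term].append(doc_id)

# Searching: a single lookup by term yields the matching documents.
print(inverted_index["red"])  # [1, 3]

# Sorting: a single lookup per document yields the value to compare.
print(sorted(docs, key=lambda doc_id: doc_values[doc_id]))  # [2, 1, 3]
```

The inverted index answers "which documents contain this term?", while the doc values answer "what does this document contain?", which is why the second structure suits sorting.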
nit: Like `doc_values`, can you add ` before and after wherever you are referring to the `index` parameter?
Just to make sure, you are referring to the asterisk-like symbol?
@ajleong623 Use backticks (`) when referring to the parameters so that the parameter name appears in code font. So, instead of index parameter, it will be `index` parameter. My suggestions already incorporate that format.
Thank you, @ajleong623! Some rewording suggestions for you.
Co-authored-by: kolchfa-aws <[email protected]> Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@sandeshkr419 Are more people going to perform a tech review or is this ready for final editorial review/merge?
@kolchfa-aws The changes look good on the technical end. Please move to editorial review.
Signed-off-by: Nathan Bower <[email protected]>
LGTM
* Update index-parameter.md
  Signed-off-by: Anthony Leong <[email protected]>
* Update index-parameter.md
  Signed-off-by: Anthony Leong <[email protected]>
* Extended description of index and doc values and added table.
  Signed-off-by: Anthony Leong <[email protected]>
* add link to table
  Signed-off-by: Anthony Leong <[email protected]>
* fix invalid link
  Signed-off-by: Anthony Leong <[email protected]>
* Update doc-values.md
  Signed-off-by: Anthony Leong <[email protected]>
* Update doc-values.md
  Signed-off-by: Anthony Leong <[email protected]>
* Apply suggestions from code review
  Co-authored-by: kolchfa-aws <[email protected]>
  Signed-off-by: Anthony Leong <[email protected]>
* apply asterisks.
  Signed-off-by: Anthony Leong <[email protected]>
* update dead link
  Signed-off-by: Anthony Leong <[email protected]>
* Update _field-types/mapping-parameters/doc-values.md
  Signed-off-by: kolchfa-aws <[email protected]>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <[email protected]>
---------
Signed-off-by: Anthony Leong <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: Nathan Bower <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Description
Currently, the `index` parameter page states that when a value is not indexed, it is unsearchable. However, based on some recent changes, this is no longer completely true.
Issues Resolved
Closes opensearch-project/OpenSearch#18798