LUCENE-9476 Add getBulkPath API for the Taxonomy index #2247
base: master
Thanks @gautamworah96
Once we iterate to a solid PR I am very curious how this helps facets performance -- we can switch luceneutil over to this bulk API to test.
I am slowly getting back on the horse here 😄, so this review focuses mainly on style.
Thanks @gautamworah96 -- looking closer!
}

private FacetLabel getPathFromCache(int ordinal) {
  // TODO: can we use an int-based hash impl, such as IntToObjectMap,
Oooh that is a great idea, and low-hanging fruit, and would greatly reduce the RAM usage for this cache.
I think DirectoryTaxonomyWriter also has such a cache that we could change to a native map.
Could you open a spinoff issue?
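To make the RAM argument concrete, here is a minimal sketch of an int-keyed hash map that stores keys in a primitive int[] instead of boxed Integer objects. The class name IntObjectMapSketch and all of its methods are hypothetical and are not the API of any Lucene or third-party class; the sketch also assumes ordinals are non-negative so that -1 can mark a free slot, and it has no eviction, unlike the LRU cache under discussion.

```java
import java.util.Arrays;

/** Hypothetical int-keyed map using open addressing; illustration only. */
final class IntObjectMapSketch<V> {
  private static final int EMPTY = -1; // assumes keys (ordinals) are >= 0
  private int[] keys;
  private Object[] values;
  private int size;

  IntObjectMapSketch(int expectedSize) {
    int capacity = 8;
    while (capacity < expectedSize * 2) capacity <<= 1; // power of two, load factor <= 0.5
    allocate(capacity);
  }

  private void allocate(int capacity) {
    keys = new int[capacity];
    Arrays.fill(keys, EMPTY);
    values = new Object[capacity];
    size = 0;
  }

  /** Returns the slot holding {@code key}, or the free slot where it would be inserted. */
  private int slot(int key) {
    int mask = keys.length - 1;
    int i = ((key * 0x9E3779B9) >>> 7) & mask; // cheap multiplicative hash
    while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask; // linear probing
    return i;
  }

  void put(int key, V value) {
    if (key < 0) throw new IllegalArgumentException("key must be >= 0");
    if ((size + 1) * 2 > keys.length) grow();
    int i = slot(key);
    if (keys[i] == EMPTY) {
      keys[i] = key;
      size++;
    }
    values[i] = value;
  }

  @SuppressWarnings("unchecked")
  V get(int key) {
    if (key < 0) return null;
    int i = slot(key);
    return keys[i] == key ? (V) values[i] : null;
  }

  int size() {
    return size;
  }

  /** Doubles the table and reinserts every live entry. */
  private void grow() {
    int[] oldKeys = keys;
    Object[] oldValues = values;
    allocate(oldKeys.length * 2);
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != EMPTY) {
        int s = slot(oldKeys[i]);
        keys[s] = oldKeys[i];
        values[s] = oldValues[i];
        size++;
      }
    }
  }
}
```

Compared with a map keyed by Integer, this layout avoids one boxed key object and one entry object per mapping, which is where the RAM saving would come from.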
…. Use parallel sort to fix duplicate ordinal bug. Add a test case for it. Minor fixes
Thanks @gautamworah96, looks close!
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
// this check is only needed once to confirm that the index uses BinaryDocValues
boolean success = values.advanceExact(ordinals[i] - leafReaderDocBase);
if (success == false) {
  return getBulkPathForOlderIndexes(ordinals);
Hmm, I'm confused -- wouldn't an older index have no BinaryDocValues field? So, values would be null, and we should fall back then?
This code should hit NullPointerException on an old index I think? How come our backwards compatibility test didn't expose this?
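The concern can be illustrated with a small sketch in which the doc values handle is checked for null before advanceExact is called. PathValues is a stand-in interface invented for this example, not Lucene's BinaryDocValues, and legacyLookup is a hypothetical substitute for the stored-fields path. Treating a failed advanceExact as a reason to fall back is also an assumption, not something the review settled.

```java
import java.util.function.IntFunction;

final class DocValuesFallbackSketch {
  /** Stand-in for the small part of a doc values iterator used here (hypothetical). */
  interface PathValues {
    boolean advanceExact(int docId);

    String value();
  }

  /**
   * Resolves a path for docId. A null handle models an older index whose
   * segment has no doc values field at all, so the null check must come
   * before any call on the handle.
   */
  static String lookup(PathValues values, int docId, IntFunction<String> legacyLookup) {
    if (values == null) {
      return legacyLookup.apply(docId); // older index: field is absent
    }
    if (values.advanceExact(docId) == false) {
      return legacyLookup.apply(docId); // assumption: no value for this doc, so fall back too
    }
    return values.value();
  }

  /** Helper for the example: a handle that has a value for every doc. */
  static PathValues always(String path) {
    return new PathValues() {
      @Override
      public boolean advanceExact(int docId) {
        return true;
      }

      @Override
      public String value() {
        return path;
      }
    };
  }

  public static void main(String[] args) {
    IntFunction<String> legacy = doc -> "stored:" + doc;
    System.out.println(lookup(null, 3, legacy));
    System.out.println(lookup(always("a/b"), 3, legacy));
  }
}
```

Without the first check, the null handle would throw NullPointerException at the advanceExact call, which is the failure the comment above anticipates.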
for (int i = 0; i < ordinalsLength; i++) {
  synchronized (categoryCache) {
    categoryCache.put(ordinals[i], bulkPath[originalPosition[i]]);
We will sometimes put ordinals back into the cache that were already there at the start of this method right? I guess that's harmless. Or, maybe we should move this up above? Then we can do it only for those ordinals that were not already cached?
I think intuitively adding the ordinals back into the cache would not be a problem. This should also (theoretically) be faster than trying to get the lock again and again in a loop?
This should also (theoretically) be faster than trying to get the lock again and again in a loop?
Hmm, I'm confused: this code is already getting the lock inside a for loop? I guess we could move the synchronized outside of the for loop? Or, maybe javac is doing this for us already? But let's make it explicit, or, let's just merge this for loop with the one before (and keep acquiring the lock inside the for loop)? One big benefit of the latter approach is that if all of the ordinals were already cached (hopefully typically a common case), we do not need any locking, but with this approach, we still do.
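One way to picture the trade-off is the sketch below. It is a variant of the options discussed, not the PR's code: cache hits are collected first, the loader runs only for the misses, and the cache is written under a single lock only when there was at least one miss. BulkCacheSketch, getBulk, and the loader callback are hypothetical names, and a plain HashMap stands in for the real cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class BulkCacheSketch {
  private final Map<Integer, String> cache = new HashMap<>();

  /** Returns one path per ordinal, in input order. */
  String[] getBulk(int[] ordinals, IntFunction<String> loader) {
    String[] result = new String[ordinals.length];
    int[] missing = new int[ordinals.length]; // input positions that were not cached
    int missingCount = 0;

    synchronized (cache) { // one lock acquisition for all reads
      for (int i = 0; i < ordinals.length; i++) {
        String hit = cache.get(ordinals[i]);
        if (hit != null) {
          result[i] = hit;
        } else {
          missing[missingCount++] = i;
        }
      }
    }
    if (missingCount == 0) {
      return result; // everything was cached: no write lock, no loading
    }

    for (int j = 0; j < missingCount; j++) { // load outside the lock
      result[missing[j]] = loader.apply(ordinals[missing[j]]);
    }
    synchronized (cache) { // one lock acquisition for all writes, misses only
      for (int j = 0; j < missingCount; j++) {
        cache.put(ordinals[missing[j]], result[missing[j]]);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    BulkCacheSketch sketch = new BulkCacheSketch();
    IntFunction<String> loader = ordinal -> "path-" + ordinal;
    System.out.println(String.join(",", sketch.getBulk(new int[] {5, 2}, loader)));
    System.out.println(String.join(",", sketch.getBulk(new int[] {2, 5}, loader)));
  }
}
```

With this shape, ordinals that were already cached at the start of the call are never put back, and a fully cached request returns before the write path is reached.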
Description
In LUCENE-9450 we switched the Taxonomy index from Stored Fields to BinaryDocValues. In the resulting implementation of the getPath code, we create a new BinaryDocValues values instance for each ordinal.

We may therefore traverse the same nodes over and over again if the getPath API is called multiple times for ordinals in the same segment (that is, with the same readerIndex).

This PR takes advantage of that fact by sorting the ordinals and then checking whether consecutive ordinals are in the same segment (have the same readerIndex) by trying to advanceExact to the correct position and not failing. When that succeeds, we can reuse the previous BinaryDocValues object.

Solution
Steps:
- Try to advanceExact to the correct position with the previously calculated readerIndex. If the operation fails, try to find the correct segment for the ordinal and then advanceExact to the desired position.

Tests
Added a new test for the API that compares the individual getPath results from ordinals with the bulk FacetLabels returned by the getBulkPath API.

Checklist
Please review the following and check all that apply:
- master branch
- ./gradlew check
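The sort-then-reuse idea from the description can be sketched in a self-contained form. This is a simplified model, not the PR's implementation: segments are represented only by their starting ordinals (docBases), a counter stands in for creating a per-segment BinaryDocValues object, and segment membership is decided by comparing against the segment's end rather than by calling advanceExact. The class BulkPathSketch and the pathFor helper are hypothetical, and ordinals are assumed to be valid and non-negative.

```java
import java.util.Arrays;

final class BulkPathSketch {
  /** How many times a per-segment values object would have been created. */
  int valuesOpened = 0;

  private final int[] docBases; // first ordinal of each segment, ascending, docBases[0] == 0
  private final int maxDoc; // total number of ordinals across all segments

  BulkPathSketch(int[] docBases, int maxDoc) {
    this.docBases = docBases;
    this.maxDoc = maxDoc;
  }

  /** Single lookup: always resolves the segment and opens its values. */
  String getPath(int ordinal) {
    int segment = findSegment(ordinal);
    valuesOpened++;
    return pathFor(segment, ordinal - docBases[segment]);
  }

  /** Bulk lookup: sorts ordinals so consecutive ones can share a segment. */
  String[] getBulkPath(int... ordinals) {
    // Pack (ordinal, original position) into one long so a single sort keeps them paired.
    long[] keyed = new long[ordinals.length];
    for (int i = 0; i < ordinals.length; i++) {
      keyed[i] = ((long) ordinals[i] << 32) | i;
    }
    Arrays.sort(keyed);

    String[] result = new String[ordinals.length];
    int segment = -1;
    int segmentEnd = -1; // exclusive upper bound of the current segment
    for (long k : keyed) {
      int ordinal = (int) (k >>> 32);
      int position = (int) k; // low 32 bits hold the original index
      if (segment == -1 || ordinal >= segmentEnd) {
        segment = findSegment(ordinal);
        segmentEnd = segment + 1 < docBases.length ? docBases[segment + 1] : maxDoc;
        valuesOpened++; // a new per-segment object is needed only here
      }
      result[position] = pathFor(segment, ordinal - docBases[segment]);
    }
    return result;
  }

  private int findSegment(int ordinal) {
    int idx = Arrays.binarySearch(docBases, ordinal);
    return idx >= 0 ? idx : -idx - 2; // last segment whose base is <= ordinal
  }

  /** Placeholder for reading the stored path of a document in a segment. */
  private static String pathFor(int segment, int docInSegment) {
    return "seg" + segment + "/doc" + docInSegment;
  }
}
```

Because the original position travels with each ordinal through the sort, results come back in input order and duplicate ordinals in the input are handled without special casing. A check in the spirit of the test described above would compare each element of getBulkPath against the corresponding getPath result.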