@kdt523 kdt523 commented Oct 21, 2025

Description
Summary
This fixes a bug where MaxScoreBulkScorer could set its inner or outer window upper bound past the current leaf's maxDoc. That allowed TermScorer to be called with docIDs >= maxDoc, causing EOF or array-index-out-of-bounds errors when accessing norms.

Motivation
Under certain disjunction + filter workloads the bulk scorer's windowing logic could loop with upper bounds computed without clamping to the leaf's maxDoc. This could result in TermScorer attempting to read norms past the leaf end (NO_MORE_DOCS), leading to intermittent EOF/ArrayIndex errors in production and tests.

What changed

lucene/core/src/java/org/apache/lucene/search/MaxScoreBulkScorer.java
Clamp outer and inner window upper bounds to the leaf maxDoc to ensure scoring loops never iterate past the leaf boundary.
lucene/core/tests/src/test/org/apache/lucene/search/TestMaxScoreBulkScorerFilterBounds.java
Add regression test exercising the disjunction + restrictive filter code path that previously triggered the failure.
lucene/CHANGES.txt
Add an entry referencing this change (GITHUB#15324) and the contributor.
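
To illustrate the kind of clamp described above, here is a minimal, self-contained sketch. It is not the MaxScoreBulkScorer code: the class `WindowClampSketch`, the helpers `clampWindowMax` and `windows`, the fixed window size, and the `norms` array are all hypothetical stand-ins chosen for illustration. The only Lucene fact assumed is that an unbounded upper bound is commonly passed as `DocIdSetIterator.NO_MORE_DOCS`, which equals `Integer.MAX_VALUE`.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowClampSketch {

  /** Hypothetical helper: clamp a window's exclusive upper bound to the leaf's maxDoc. */
  static int clampWindowMax(int windowMax, int maxDoc) {
    return Math.min(windowMax, maxDoc);
  }

  /**
   * Walks [min, max) in windows of windowSize docs (windowSize must be > 0) and
   * returns each window as {start, endExclusive}. Every bound is clamped to maxDoc.
   */
  static List<int[]> windows(int min, int max, int windowSize, int maxDoc) {
    List<int[]> result = new ArrayList<>();
    int outerMax = clampWindowMax(max, maxDoc);
    int start = min;
    while (start < outerMax) {
      // Add in long arithmetic so start + windowSize cannot overflow int.
      int end = (int) Math.min((long) start + windowSize, outerMax);
      result.add(new int[] {start, end});
      start = end;
    }
    return result;
  }

  public static void main(String[] args) {
    int maxDoc = 10;
    byte[] norms = new byte[maxDoc]; // stand-in for per-document norms of one leaf

    // Integer.MAX_VALUE stands in for an unbounded upper bound (NO_MORE_DOCS).
    for (int[] w : windows(0, Integer.MAX_VALUE, 4, maxDoc)) {
      for (int doc = w[0]; doc < w[1]; doc++) {
        byte norm = norms[doc]; // index stays below maxDoc because of the clamp
      }
      System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
  }
}
```

In this sketch the last window is cut short at the leaf boundary, so the inner loop never indexes `norms` at or beyond `maxDoc`, even when the caller passes an unbounded upper bound.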
How to test

Unit tests:
The newly-added test in lucene/core/tests reproduces the scenario; run it with:
./gradlew :lucene:core:test --tests org.apache.lucene.search.TestMaxScoreBulkScorerFilterBounds
Manual validation:
Build core and run the relevant test suite:
./gradlew :lucene:core:compileJava :lucene:core:test
Note: Some generate/format tasks may require python3 available on PATH; if you see checksum mismatches, run:
./gradlew :lucene:core:generateForUtil --no-daemon --info
after ensuring python3 is callable by Gradle.
CHANGES entry

GITHUB#15324: Fix MaxScoreBulkScorer so that it no longer calls TermScorer with docID >= maxDoc, which caused an EOFException on norms access. (contributor: kdt523)
Checklist (from CONTRIBUTING.md)

[x] My PR title is short and descriptive.
[x] I have included a summary of the change and why it was needed.
[x] I added tests that reproduce the issue and prove the fix.
[x] I updated CHANGES.txt with an entry referencing the PR/issue.
[x] I ran ./gradlew tidy and applied code formatting.
[ ] I ran ./gradlew check locally and all checks pass.
[ ] If the change affects generated sources, I ran the generator tasks and included updated files.
Completed tasks are marked with [x].
fix: #15324

@github-actions github-actions bot added this to the 11.0.0 milestone Oct 21, 2025
Comment on lines 202 to 207

* GITHUB#15343: Ensure that `AcceptDocs#cost()` only ever calls `BitSets#cardinality()`
once per instance to avoid redundant computation. (Ben Trent)

Member

woops, seems like a bad delete?

@benwtrent
Member

@kdt523 did you verify your test failed and that the code change actually addresses the issue? I ran your new test against Lucene main and it always passed.

We should have a repeatable test that reproduces the bad behavior to confirm the fix.

public class TestMaxScoreBulkScorerFilterBounds extends LuceneTestCase {

public void testFilteredDisjunctionDoesNotScorePastMaxDoc() throws Exception {
Directory dir = new RAMDirectory();
Member

Did you actually run this? RAMDirectory hasn't existed since Lucene 9....

@msokolov
Contributor

This has that AI smell to it

@benwtrent
Member

FYI, this isn't the cause of the bug, we found the real cause

Development

Successfully merging this pull request may close these issues.

Strange EOF in 10.1.0 with MaxScoreBulkScorer#scoreInnerWindowWithFilter

3 participants