
fix(pipelines): QA pipeline returns fewer than top_k results in batch mode #39193


Open

wants to merge 1 commit into main

Conversation

yushi2006
Contributor

What does this PR do?

Fixes #38984

This PR fixes a bug in the QuestionAnsweringPipeline where it could return fewer than the requested top_k answers when processing long contexts or batched inputs. The original implementation processed each context chunk independently and kept only the top_k best spans from each chunk before aggregation. If those candidates were later invalidated or turned out to duplicate an answer from another chunk, the pipeline had nothing to fall back on and returned too few final answers. The fix first over-fetches a much larger pool of candidates from every chunk, aggregates them into a single global list, and only then sorts, merges, and selects the final top_k answers, which guarantees enough valid candidates to produce a complete result.
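For illustration, here is a minimal sketch of the aggregate-then-select strategy described above. It is not the pipeline's actual implementation: the aggregate_answers helper, the candidate dictionary shape, and the pooling step are assumptions made purely for this example.

from typing import Dict, List


def aggregate_answers(chunk_candidates: List[List[Dict]], top_k: int) -> List[Dict]:
    """Merge per-chunk candidate spans into the final top_k answers.

    chunk_candidates holds, for each context chunk, a list of candidates shaped
    like {"score": float, "start": int, "end": int, "answer": str}. Each chunk
    contributes far more than top_k candidates (the over-fetch), so removing
    duplicates here cannot leave the caller short of answers.
    """
    # 1. Aggregate: pool candidates from every chunk into one global list.
    pooled = [cand for chunk in chunk_candidates for cand in chunk]

    # 2. Sort: best-scoring candidates first, across all chunks.
    pooled.sort(key=lambda cand: cand["score"], reverse=True)

    # 3. Merge and select: keep only the highest-scoring occurrence of each
    #    character span, stopping once top_k unique answers are collected.
    seen_spans = set()
    answers = []
    for cand in pooled:
        span = (cand["start"], cand["end"])
        if span in seen_spans:
            continue
        seen_spans.add(span)
        answers.append(cand)
        if len(answers) == top_k:
            break
    return answers

Because sorting happens after pooling, a duplicate discarded from one chunk is simply replaced by the next-best candidate from any chunk, which is what guarantees top_k final answers. For context, the script below reproduces the reported behaviour.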

import transformers

# Small extractive QA model fine-tuned on SQuAD v2.
architecture = "csarron/mobilebert-uncased-squad-v2"
tokenizer = transformers.AutoTokenizer.from_pretrained(architecture)
model = transformers.MobileBertForQuestionAnswering.from_pretrained(
    architecture, low_cpu_mem_usage=True
)
pipeline = transformers.pipeline(task="question-answering", model=model, tokenizer=tokenizer)


data = [
    {'question': ['What color is it?', 'How do the people go?', "What does the 'wolf' howl at?"],
     'context': [
         "Some people said it was green but I know that it's pink.",
         'The people on the bus go up and down. Up and down.',
         "The pack of 'wolves' stood on the cliff and a 'lone wolf' howled at the moon for hours."
     ]}
]

# prediction result is wrong
pipeline(data, top_k=2, max_answer_len=5)
[[{'score': 0.5683297514915466, 'start': 51, 'end': 55, 'answer': 'pink'},
  {'score': 0.028800610452890396, 'start': 51, 'end': 56, 'answer': 'pink.'}],
 [{'score': 0.3008899986743927, 'start': 25, 'end': 36, 'answer': 'up and down'},
  {'score': 0.12070021033287048, 'start': 38, 'end': 49, 'answer': 'Up and down'}],
 [{'score': 0.8356598615646362, 'start': 68, 'end': 76, 'answer': 'the moon'},
  {'score': 0.0971309095621109, 'start': 72, 'end': 76, 'answer': 'moon'}]]
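A simple way to exercise the fix, assuming the batched output shape shown above (one list of answers per question), is to check that every answer list contains exactly top_k entries. The snippet below is an illustrative check only, not part of the test suite.

results = pipeline(data, top_k=2, max_answer_len=5)
for answers in results:
    # Each question should yield exactly top_k=2 answers after the fix.
    assert len(answers) == 2, f"expected 2 answers, got {len(answers)}"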

Who can review?

@Rocketknight1

yushi2006 marked this pull request as ready for review on July 3, 2025, 09:39

Successfully merging this pull request may close these issues.

QA pipeline prediction generates wrong response when top_k param > 1