
fix(pipelines): QA pipeline returns fewer than top_k results in batch mode #39193


Open

wants to merge 1 commit into main

Conversation

yushi2006
Contributor

What does this PR do?

Fixes #38984

This PR fixes a bug in the QuestionAnsweringPipeline where it could return fewer than the requested top_k answers when processing long contexts or batched inputs. The original implementation processed each context chunk independently and kept only the top_k best spans from each chunk before aggregation. If those candidates were later invalidated or turned out to duplicate an answer from another chunk, the pipeline had nothing to fall back on and returned too few final answers. The fix first over-fetches a much larger pool of candidates from every chunk, aggregates them into a single global list, and only then sorts, merges, and selects the final top_k answers, which guarantees enough valid candidates to produce a complete result.
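For illustration, here is a minimal sketch of the aggregate-then-select strategy described above. It is not the pipeline's actual implementation: the aggregate_answers helper, the candidate dictionary shape, and the pooling step are assumptions made purely for this example.

from typing import Dict, List


def aggregate_answers(chunk_candidates: List[List[Dict]], top_k: int) -> List[Dict]:
    """Merge per-chunk candidate spans into the final top_k answers.

    chunk_candidates holds, for each context chunk, a list of candidates shaped
    like {"score": float, "start": int, "end": int, "answer": str}. Each chunk
    contributes far more than top_k candidates (the over-fetch), so removing
    duplicates here cannot leave the caller short of answers.
    """
    # 1. Aggregate: pool candidates from every chunk into one global list.
    pooled = [cand for chunk in chunk_candidates for cand in chunk]

    # 2. Sort: best-scoring candidates first, across all chunks.
    pooled.sort(key=lambda cand: cand["score"], reverse=True)

    # 3. Merge and select: keep only the highest-scoring occurrence of each
    #    character span, stopping once top_k unique answers are collected.
    seen_spans = set()
    answers = []
    for cand in pooled:
        span = (cand["start"], cand["end"])
        if span in seen_spans:
            continue
        seen_spans.add(span)
        answers.append(cand)
        if len(answers) == top_k:
            break
    return answers

Because sorting happens after pooling, a duplicate discarded from one chunk is simply replaced by the next-best candidate from any chunk, which is what guarantees top_k final answers. For context, the script below reproduces the reported behaviour.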

import transformers

# Small extractive QA model fine-tuned on SQuAD v2.
architecture = "csarron/mobilebert-uncased-squad-v2"
tokenizer = transformers.AutoTokenizer.from_pretrained(architecture)
model = transformers.MobileBertForQuestionAnswering.from_pretrained(
    architecture, low_cpu_mem_usage=True
)
pipeline = transformers.pipeline(task="question-answering", model=model, tokenizer=tokenizer)


data = [
    {'question': ['What color is it?', 'How do the people go?', "What does the 'wolf' howl at?"],
     'context': [
         "Some people said it was green but I know that it's pink.",
         'The people on the bus go up and down. Up and down.',
         "The pack of 'wolves' stood on the cliff and a 'lone wolf' howled at the moon for hours."
     ]}
]

# prediction result is wrong
pipeline(data, top_k=2, max_answer_len=5)
[[{'score': 0.5683297514915466, 'start': 51, 'end': 55, 'answer': 'pink'},
  {'score': 0.028800610452890396, 'start': 51, 'end': 56, 'answer': 'pink.'}],
 [{'score': 0.3008899986743927, 'start': 25, 'end': 36, 'answer': 'up and down'},
  {'score': 0.12070021033287048, 'start': 38, 'end': 49, 'answer': 'Up and down'}],
 [{'score': 0.8356598615646362, 'start': 68, 'end': 76, 'answer': 'the moon'},
  {'score': 0.0971309095621109, 'start': 72, 'end': 76, 'answer': 'moon'}]]
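A simple way to exercise the fix, assuming the batched output shape shown above (one list of answers per question), is to check that every answer list contains exactly top_k entries. The snippet below is an illustrative check only, not part of the test suite.

results = pipeline(data, top_k=2, max_answer_len=5)
for answers in results:
    # Each question should yield exactly top_k=2 answers after the fix.
    assert len(answers) == 2, f"expected 2 answers, got {len(answers)}"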

Who can review?

@Rocketknight1

yushi2006 marked this pull request as ready for review on July 3, 2025, 09:39

Successfully merging this pull request may close these issues.

QA pipeline prediction generates wrong response when top_k param > 1