ggerganov commented on Aug 28, 2025

Since we now support better parallelization of the attention via "streams" (see #14363), I was planning to add an alternative approach for computing multi-sequence embeddings. However, I am starting to doubt it would have any benefit compared to the existing method, so I am opening this PR/discussion for feedback.

Existing approach

Currently, we put all tokens from all sequences in a single ubatch and process it with masked cross-sequence attention in a single stream. For example, a ubatch of 4 sequences with different lengths could look like this:

```
000000000011122222222222222233333
```
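
For context, here is a rough sketch of how a caller packs all sequences into one batch today using the public llama_batch API. The helper name and batch setup are illustrative, not taken from this PR, and error handling is omitted:

```cpp
// rough sketch: pack all tokens of all sequences into a single batch
// (illustrative only; error handling and pooling details omitted)
#include "llama.h"

#include <cstddef>
#include <vector>

static void encode_packed(llama_context * ctx, const std::vector<std::vector<llama_token>> & seqs) {
    int32_t n_tokens_total = 0;
    for (const auto & s : seqs) {
        n_tokens_total += (int32_t) s.size();
    }

    llama_batch batch = llama_batch_init(n_tokens_total, 0, 1);

    for (size_t seq = 0; seq < seqs.size(); ++seq) {
        for (size_t i = 0; i < seqs[seq].size(); ++i) {
            const int32_t idx = batch.n_tokens++;

            batch.token   [idx]    = seqs[seq][i];
            batch.pos     [idx]    = (llama_pos) i;        // positions restart per sequence
            batch.n_seq_id[idx]    = 1;
            batch.seq_id  [idx][0] = (llama_seq_id) seq;
            batch.logits  [idx]    = 1;                    // request output (depends on pooling type)
        }
    }

    // the cross-sequence mask keeps the sequences from attending to each other
    llama_encode(ctx, batch);

    llama_batch_free(batch);
}
```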

New approach

The idea I had, which I believe other implementations also use, is to pad the sequences to equal length:

```
# x is a padding token - i.e. it does not attend to anything and is not attended by anything
0000000000xxxxx
111xxxxxxxxxxxx
222222222222222
33333xxxxxxxxxx
```

We can process this batch with 4 streams in the attention.
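
A minimal sketch of the proposed padding step, assuming a hypothetical TOKEN_PAD placeholder (no such padding token exists in the current API; in an actual implementation the padded positions would be masked out inside the attention):

```cpp
// hypothetical sketch of the proposed padding step (not part of the current API):
// pad every sequence to the length of the longest one so that each sequence
// becomes one equal-width row, i.e. one stream in the attention
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token = int32_t;              // matches the typedef in llama.h
static const llama_token TOKEN_PAD = -1;  // placeholder id; a real implementation would mask these out

static std::vector<std::vector<llama_token>> pad_to_equal_length(
        const std::vector<std::vector<llama_token>> & seqs) {
    size_t max_len = 0;
    for (const auto & s : seqs) {
        max_len = std::max(max_len, s.size());
    }

    std::vector<std::vector<llama_token>> padded = seqs;
    for (auto & s : padded) {
        // padding tokens neither attend to nor are attended by real tokens,
        // so they only add compute in the non-attention ops (FFN, norms, ...)
        s.resize(max_len, TOKEN_PAD);
    }
    return padded;
}
```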

Observations

  • The new approach might be a bit more efficient, though with embeddings the sequences are usually relatively short anyway, so performance would probably be fine either way
  • The computation in the non-attention operators (FFN, norms, etc.) will increase because of the extra padding tokens
  • The logic for preparing the llama_encode() input batch would become more complicated for the user, because they would have to account for the padding in order to not exceed n_ubatch when it is applied (see the sketch after this list)
  • Note that the reason we cannot use the existing split_equal() approach is that non-causal encoding requires processing all tokens of a sequence in a single ubatch
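
To illustrate the bookkeeping mentioned in the third bullet, here is a hedged sketch of the check a caller would need to perform before building the batch: with padding, the effective ubatch size becomes the number of sequences times the longest sequence, rather than the plain sum of sequence lengths that suffices today (the function name is hypothetical):

```cpp
// hypothetical check the caller would need with the padded approach:
// the effective ubatch size is n_seqs * longest_seq, not the sum of lengths
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static bool fits_in_ubatch_padded(const std::vector<std::vector<int32_t>> & seqs, uint32_t n_ubatch) {
    size_t max_len = 0;
    for (const auto & s : seqs) {
        max_len = std::max(max_len, s.size());
    }
    return seqs.size() * max_len <= (size_t) n_ubatch;
}
```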

If you have any thoughts about this, let me know. For now, I will probably postpone this until I'm more convinced it would be useful.
