Add left pad for CausalLMPreprocessor #2343
Summary of Changes

Hello @pass-lin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the capability for left padding within the `CausalLMPreprocessor` by adding a `padding_side` argument and implementing the necessary logic. This change is a foundational step toward enabling left padding in `CausalLM` models, providing greater flexibility for sequence processing, especially in generation tasks, where left padding is often preferred.
Highlights

- `CausalLMPreprocessor` now supports `padding_side`: The `CausalLMPreprocessor` class has been enhanced to accept a `padding_side` argument (either `"left"` or `"right"`, defaulting to `"right"`) in its constructor and in its `call` and `generate_preprocess` methods. This allows users to control the direction of padding for sequences.
- Left padding implementation for `CausalLMPreprocessor`: A new logic path has been introduced within `CausalLMPreprocessor`'s `call` and `generate_preprocess` methods to handle left padding. This involves a two-step padding process and leverages `tf.xla.dynamic_update_slice` for precise manipulation of padding masks. Left padding will raise an `ImportError` if a TensorFlow installation with XLA is not available.
- Padding utility layers updated: The `call` methods in `MultiSegmentPacker` and `StartEndPacker` (which are used internally by `CausalLMPreprocessor`) have been modified to accept and utilize a `padding_side` argument. This makes these lower-level padding utilities more flexible and enables the new left padding behavior.
- Comprehensive test coverage for padding: New unit tests have been added to `causal_lm_preprocessor_test.py` to thoroughly validate the behavior of `CausalLMPreprocessor` with both left and right padding for its standard preprocessing (`call`) and generation (`generate_preprocess`) functionalities.
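To make the `padding_side` distinction concrete, here is a minimal, framework-free sketch (plain Python on token lists; `pad_sequences` is an illustrative helper, not the KerasHub implementation, which operates on `tf.RaggedTensor` batches):

```python
def pad_sequences(sequences, sequence_length, pad_id=0, padding_side="right"):
    """Pad each token list to sequence_length on the chosen side.

    A plain-Python sketch of the padding_side behavior described above.
    Returns (padded token ids, boolean padding masks).
    """
    padded, masks = [], []
    for seq in sequences:
        seq = seq[:sequence_length]  # truncate overlong sequences
        pad = [pad_id] * (sequence_length - len(seq))
        if padding_side == "left":
            padded.append(pad + seq)
            masks.append([False] * len(pad) + [True] * len(seq))
        else:
            padded.append(seq + pad)
            masks.append([True] * len(seq) + [False] * len(pad))
    return padded, masks


# Left padding keeps the real tokens adjacent to the generation position,
# which is why it is often preferred for batched decoding.
tokens, mask = pad_sequences([[5, 6, 7], [8, 9]], 5, padding_side="left")
# tokens -> [[0, 0, 5, 6, 7], [0, 0, 0, 8, 9]]
```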
Code Review

This pull request introduces left-padding functionality to the `CausalLMPreprocessor`. The changes are logically sound, with the new `padding_side` option correctly propagated. The inclusion of tests is also commendable. My feedback focuses on improving the maintainability and performance of the new implementation by suggesting refactoring duplicated code into a helper method and using a more efficient API for tensor shape retrieval.
```python
if padding_side == "left":
    addition_token_num = int(self.add_start_token + self.add_end_token)
    token_ids, padding_mask = self.packer(
        x,
        sequence_length=x.to_tensor().shape[-1] + addition_token_num,
        add_start_value=self.add_start_token,
        add_end_value=self.add_end_token,
        padding_side=padding_side,
    )
    token_ids, all_padding_mask = self.packer(
        token_ids,
        sequence_length=sequence_length + 1,
        add_start_value=False,
        add_end_value=False,
        padding_side="right",
    )
    if dynamic_update_slice is None:
        raise ImportError(
            "Left padding on CausalLMPreprocessor requires a TensorFlow "
            "installation with XLA available."
        )
    padding_mask = dynamic_update_slice(
        all_padding_mask, padding_mask, [0] * len(padding_mask.shape)
    )
```
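The two-step scheme in this hunk can be traced with a small, framework-free sketch (pure Python standing in for `tf.xla.dynamic_update_slice`; the names and the 2-D helper are illustrative, not the actual KerasHub code). The first pack left-pads every sequence to the longest raw length and yields the true mask for those columns; the second pack right-pads the now-dense batch to the target length, marking every first-stage column as real; the slice update then overwrites the top-left corner with the true mask so only real tokens remain `True`:

```python
def dynamic_update_slice_2d(operand, update, start):
    """Write `update` into `operand` at offset `start` = (row, col),
    mimicking tf.xla.dynamic_update_slice for 2-D nested lists."""
    out = [row[:] for row in operand]
    r0, c0 = start
    for i, row in enumerate(update):
        for j, val in enumerate(row):
            out[r0 + i][c0 + j] = val
    return out


# Step 1 (left pack to longest raw length, here 4): the true mask.
step1_mask = [
    [False, True, True, True],   # sequence of length 3
    [False, False, True, True],  # sequence of length 2
]
# Step 2 (right pack the dense batch to sequence_length + 1, here 6):
# every step-1 column looks "real" to the packer, new columns are padding.
all_padding_mask = [
    [True, True, True, True, False, False],
    [True, True, True, True, False, False],
]
# The slice update restores the true mask in the top-left corner;
# note [0, 0] matches `[0] * len(padding_mask.shape)` in the hunk above.
final_mask = dynamic_update_slice_2d(all_padding_mask, step1_mask, [0, 0])
# final_mask -> [[False, True, True, True, False, False],
#                [False, False, True, True, False, False]]
```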
```python
    addition_token_num = int(self.add_start_token + self.add_end_token)
    token_ids, padding_mask = self.packer(
        x,
        sequence_length=x.to_tensor().shape[-1] + addition_token_num,
```
Using `x.to_tensor().shape[-1]` can be inefficient because it creates a dense tensor from a ragged one just to get its shape. A more performant way to get the length of the longest sequence in a `tf.RaggedTensor` is to use `x.bounding_shape()[-1]`, which avoids the overhead of creating the intermediate dense tensor. This optimization also applies to line 177 in `generate_preprocess`.

```diff
-            sequence_length=x.to_tensor().shape[-1] + addition_token_num,
+            sequence_length=x.bounding_shape()[-1] + addition_token_num,
```
I think we've chatted before on this, but the hard part is the modeling and generation side here, not the preprocessing. Can we start with a prototype of everything working for one model, figure out our usage, then start landing? My main worry is having a good plan on the modeling side. How are we going to land positional embedding changes to accommodate this? What does the actual high-level UX look like with generate? Not fully fleshed out, but if I was going to try to land this with a good UX...

The technically hard parts will be the positional inputs, sampling changes, and compiled generation changes, while keeping things relatively compatible. The preprocessing is fairly easy. I think a prototype with a model (fine to open a draft PR), say for Gemma 2 or Llama 3, would help us validate these harder areas first, before we start landing this. I'd avoid Gemma 3 for starters; it's extra hard with the need for bidirectional attention on image inputs.
@pass-lin fine to start with an end-to-end prototype for Llama or Qwen, and you are right that RoPE in both these models is position invariant. It's not that this PR is too simple; it's that I think we need to peer ahead at where we are going a bit. Qwen or Llama would let us figure out the user-facing experience and the changes to compiled generation that will be necessary. We do ultimately want a plan for models with absolute position embeddings (e.g. GPT-2), so it may be worth thinking about that a bit, but it's fine to start on whatever model.
We need to gradually add left-padding support to CausalLM.
First step: #2242
Now, the second step is to add left padding to the CausalLMPreprocessor.
Next, we will add left padding to CausalLM itself, and I think everyone here will be impressed by the result.