Conversation


@adityavipradas commented May 30, 2025

Resolving issue #92

  • Added an is_training argument in language_model.py and vision_language_model.py so the KV cache is populated only during inference and set to None during training.

  • The is_training argument is threaded through LanguageModel(), LanguageModelBlock(), and LanguageModelGroupedQueryAttention() so KV is cached only during inference.
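The gating described above can be sketched as follows. This is a minimal illustration, not nanoVLM's actual code: the module, its dimensions, and the cache layout are all simplified, and only the is_training plumbing mirrors the PR.

```python
import torch
import torch.nn as nn


class ToyAttention(nn.Module):
    """Toy single-head attention that fills its KV cache only at inference.

    Illustrative stand-in for LanguageModelGroupedQueryAttention; the real
    module uses grouped-query heads, rotary embeddings, etc.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)

    def forward(self, x, kv_cache=None, is_training=False):
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # At inference, prepend cached keys/values from earlier steps.
        if not is_training and kv_cache is not None:
            past_k, past_v = kv_cache
            k = torch.cat([past_k, k], dim=1)
            v = torch.cat([past_v, v], dim=1)

        attn = torch.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
        out = attn @ v

        # Cache is None during training (issue #92), populated at inference.
        new_cache = None if is_training else (k, v)
        return out, new_cache
```

With is_training=True the returned cache is always None; with is_training=False the cached key/value tensors grow along the sequence dimension on each call.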
adityavipradas changed the title to "Feat: adding is_training argument to implement KV cache only during inference (issue #92)" and then back on May 30, 2025
@kashif
Collaborator

kashif commented May 30, 2025

If it helps: nn.Module already has a training attribute, so you can probably just use self.training.

@adityavipradas
Author

Makes sense. But I observed that using self.training would populate the KV cache during validation as well. Instead, I used torch.is_grad_enabled(), which is False during validation and inference and True during training.

Created a new PR here: #94
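The distinction between the two flags can be checked directly: model.eval() flips Module.training but leaves torch.is_grad_enabled() untouched, while torch.no_grad() flips only the gradient flag. The sketch below assumes, as the comment above implies, that the validation forward runs in eval mode but without torch.no_grad():

```python
import torch
import torch.nn as nn

m = nn.Linear(2, 2)

m.train()                  # training step
print(m.training, torch.is_grad_enabled())       # True True

m.eval()                   # validation forward (no torch.no_grad() here)
print(m.training, torch.is_grad_enabled())       # False True

with torch.no_grad():      # inference / generation
    print(m.training, torch.is_grad_enabled())   # False False
```

Only torch.is_grad_enabled() separates the middle case from the last one, which is why it can serve as the training/inference signal here where self.training cannot.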

@adityavipradas
Author

Closing this PR.
