Conversation


@adityavipradas commented May 30, 2025

Resolving issue #92

  • Added an is_training argument in language_model.py and vision_language_model.py so the KV cache is populated only during inference and set to None during training.

  • The is_training argument is threaded through LanguageModel(), LanguageModelBlock(), and LanguageModelGroupedQueryAttention() so KV is cached only during inference.
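The gating described above can be sketched as follows. This is a minimal illustration, not nanoVLM's actual code: the module, its dimensions, and the cache layout are all simplified, and only the is_training plumbing mirrors the PR.

```python
import torch
import torch.nn as nn


class ToyAttention(nn.Module):
    """Toy single-head attention that fills its KV cache only at inference.

    Illustrative stand-in for LanguageModelGroupedQueryAttention; the real
    module uses grouped-query heads, rotary embeddings, etc.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)

    def forward(self, x, kv_cache=None, is_training=False):
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # At inference, prepend cached keys/values from earlier steps.
        if not is_training and kv_cache is not None:
            past_k, past_v = kv_cache
            k = torch.cat([past_k, k], dim=1)
            v = torch.cat([past_v, v], dim=1)

        attn = torch.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
        out = attn @ v

        # Cache is None during training (issue #92), populated at inference.
        new_cache = None if is_training else (k, v)
        return out, new_cache
```

With is_training=True the returned cache is always None; with is_training=False the cached key/value tensors grow along the sequence dimension on each call.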
adityavipradas changed the title to "Feat: adding is_training argument to implement KV cache only during inference (issue #92)" and then back on May 30, 2025
@kashif
Collaborator

kashif commented May 30, 2025

If it helps: nn.Module already has a training attribute, so you can probably just use self.training.

@adityavipradas
Author

Makes sense. But I observed that using self.training would populate the KV cache during validation as well. Instead, I used torch.is_grad_enabled(), which is False during validation and inference and True during training.

Created a new PR here: #94
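The distinction between the two flags can be checked directly: model.eval() flips Module.training but leaves torch.is_grad_enabled() untouched, while torch.no_grad() flips only the gradient flag. The sketch below assumes, as the comment above implies, that the validation forward runs in eval mode but without torch.no_grad():

```python
import torch
import torch.nn as nn

m = nn.Linear(2, 2)

m.train()                  # training step
print(m.training, torch.is_grad_enabled())       # True True

m.eval()                   # validation forward (no torch.no_grad() here)
print(m.training, torch.is_grad_enabled())       # False True

with torch.no_grad():      # inference / generation
    print(m.training, torch.is_grad_enabled())   # False False
```

Only torch.is_grad_enabled() separates the middle case from the last one, which is why it can serve as the training/inference signal here where self.training cannot.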

@adityavipradas
Author

Closing this PR.
