Conversation

@crcrpar crcrpar (Collaborator) commented Dec 3, 2025

What does this PR do?

KV values seem to remain intact even when tensor parallel is enabled, so this PR specifies tp_size in StaticCache.
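
For illustration, a minimal sketch of the intended usage, not the PR's actual diff. It assumes transformers >= 4.55, where StaticCache accepts a `tp_size` argument; the model name, cache sizes, and TP degree are placeholders, and the other keyword names follow StaticCache's long-documented signature (they may vary slightly across versions):

```python
# Minimal sketch, assuming transformers >= 4.55 where StaticCache accepts
# `tp_size`; model name, sizes, and tp_size below are placeholders.
import torch
from transformers import AutoConfig, StaticCache

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder model
tp_size = 2  # placeholder tensor-parallel degree

cache = StaticCache(
    config=config,
    max_batch_size=1,
    max_cache_len=4096,
    device="cuda",
    dtype=torch.bfloat16,
    tp_size=tp_size,  # size the cache for the per-rank (sharded) KV head count
)
```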

Signed-off-by: Masaki Kozuki <[email protected]>
@crcrpar crcrpar requested a review from Copilot December 3, 2025 08:50
@crcrpar crcrpar marked this pull request as ready for review December 3, 2025 08:50
Copilot finished reviewing on behalf of crcrpar December 3, 2025 08:52
Copilot AI (Contributor) left a comment

Pull request overview

This PR enhances tensor parallel support in the inference benchmark by specifying the tp_size parameter when initializing StaticCache for transformers >= 4.55. The changes also fix tensor parallel plan patterns to be more specific and add sanity checks to verify proper sharding.

Key Changes

  • Fixed tensor parallel plan patterns from *.layers.* to model.layers.* for more precise module matching
  • Added tp_size parameter to StaticCache initialization to properly handle sharded KV heads in tensor parallel configurations
  • Added DTensor verification assertions for attention projection weights to ensure proper sharding (see the sketch after this list)
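
A rough sketch of the first and third changes, as placeholder code rather than the benchmark's actual diff: the module paths assume a Llama-style architecture, and `check_attention_sharding` and its `model` argument are hypothetical names introduced here for illustration.

```python
# Hypothetical sketch; module paths assume a Llama-style model.
from torch.distributed.tensor import DTensor  # torch.distributed._tensor on older PyTorch

# Narrower TP plan patterns: anchoring on "model.layers." avoids accidentally
# matching unrelated modules whose qualified names merely contain ".layers.".
tp_plan = {
    "model.layers.*.self_attn.q_proj": "colwise",
    "model.layers.*.self_attn.k_proj": "colwise",
    "model.layers.*.self_attn.v_proj": "colwise",
    "model.layers.*.self_attn.o_proj": "rowwise",
}


def check_attention_sharding(model) -> None:
    """Assert that every layer's attention projection weights were sharded to DTensor."""
    for layer in model.model.layers:
        for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
            weight = getattr(layer.self_attn, proj).weight
            assert isinstance(weight, DTensor), f"{proj} weight is not a DTensor"
```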

@shino16 shino16 (Collaborator) left a comment

Thanks! I remember facing the same issue on some older commits, and I was wondering how it could be reproduced.
