Fix cuda memory allocation issue caused by fused_linear_act.py #1822

emailweixu · 2025-11-12T05:01:02Z

In the previous implementation, fused_linear_act.StaticState always allocated a CUDA tensor as soon as the module was imported. Allocating even a small tensor causes torch to reserve several hundred MB of CUDA memory. This becomes very costly when there are many subprocesses, since each one that imports the module pays that overhead.

The fix is simple: only create the tensor when it is actually needed.
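For illustration, the lazy-allocation pattern can be sketched roughly as below. This is not the actual diff: the `workspace` accessor, the `_allocate_workspace` helper, the `factory` parameter, and the tensor shape are hypothetical names chosen for the example, and the real `StaticState` may hold different state. The overhead is presumably the per-process CUDA context that torch initializes on the first CUDA allocation, so deferring that allocation keeps processes that never call the fused op from paying for it.

```python
def _allocate_workspace():
    # Hypothetical helper: torch is imported and the CUDA tensor is created
    # only when this function runs, not when the module is imported.
    import torch
    return torch.zeros(1, device="cuda")


class StaticState:
    """Illustrative stand-in for fused_linear_act.StaticState (assumed layout)."""

    # No tensor is created at import time; the slot starts empty.
    _workspace = None

    @classmethod
    def workspace(cls, factory=_allocate_workspace):
        # Create the tensor on first use and reuse it on every later call.
        if cls._workspace is None:
            cls._workspace = factory()
        return cls._workspace
```

Callers would replace direct attribute access with `StaticState.workspace()`. The `factory` argument exists only so the sketch can be exercised on a machine without CUDA.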

emailweixu requested a review from Haichao-Zhang November 12, 2025 05:01

Haichao-Zhang approved these changes Nov 12, 2025

emailweixu merged commit f2c844e into pytorch Nov 12, 2025
2 checks passed

emailweixu deleted the PR_fix_fused_linear_act_memory branch November 12, 2025 16:57
