
Fix cuda memory allocation issue caused by fused_linear_act.py #1822

Merged
emailweixu merged 1 commit into pytorch from PR_fix_fused_linear_act_memory on Nov 12, 2025

Conversation

@emailweixu
Contributor

In the previous implementation, fused_linear_act.StaticState would always allocate a CUDA tensor as soon as the module was imported. The simple act of allocating even a small tensor causes torch to allocate several hundred MB of CUDA memory, which can become very bad when there are many subprocesses importing the module.

The fix is simple: only create the tensor when it is needed.
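
A minimal sketch of the lazy-initialization pattern the fix describes; the field name `_buf`, the accessor `get`, and the tensor shape are assumptions, since the actual contents of `fused_linear_act.StaticState` are not shown here:

```python
import torch


class StaticState:
    """Holds a CUDA tensor that is created lazily rather than at import time."""

    # Before the fix (reconstructed): a tensor was created at class
    # definition time, e.g. ``_buf = torch.zeros(1, device="cuda")``,
    # which forces every process that merely imports the module to pay
    # the several-hundred-MB CUDA memory cost.
    _buf = None

    @classmethod
    def get(cls):
        # Allocate only on first use; subsequent calls reuse the cached tensor.
        if cls._buf is None:
            cls._buf = torch.zeros(1, device="cuda")
        return cls._buf
```

With this pattern, a subprocess that imports the module but never calls `StaticState.get()` never touches the GPU at all.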

@emailweixu merged commit f2c844e into pytorch on Nov 12, 2025
2 checks passed
@emailweixu deleted the PR_fix_fused_linear_act_memory branch on November 12, 2025 16:57
