
Conversation

contentis
Contributor

cudaMalloc can result in measurable slow-downs when running small models like SD1.5 due to allocation overhead for each operator at every inference step.
I was not able to observe any benefit in not using the native allocation backend, and therefore think it should be an opt-in option instead.
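For context, this is roughly how the two allocator backends are chosen in PyTorch: the caching allocator backend is selected via the `PYTORCH_CUDA_ALLOC_CONF` environment variable, which must be set before CUDA is initialized. The snippet below is an illustrative sketch, not ComfyUI's exact startup code.

```python
import os

# Select the CUDA allocator backend before torch initializes CUDA.
# "cudaMallocAsync" routes allocations through cudaMallocAsync/cudaFreeAsync;
# "native" uses PyTorch's own caching allocator (the default).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "backend:cudaMallocAsync")

import torch

x = torch.randn(1024, 1024, device="cuda")   # allocated via the chosen backend
print(torch.cuda.get_allocator_backend())    # reports "cudaMallocAsync" or "native"
```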

@comfyanonymous
Owner

I'm not sure if it's still needed on recent PyTorch, but I tried disabling it ~1 year ago and immediately got a lot of complaints about inference slowing down because of increased memory usage spilling into shared memory on Windows.

@contentis
Contributor Author

Let me try to do further testing on lower-end GPUs to see if I can reproduce. Is there a low-vram option? Maybe it makes sense to combine the option with that flag.

@contentis
Contributor Author

Using FLUX and Qwen Image, I was not able to measure any differences in peak and idle memory consumption outside of run-to-run variation.
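A minimal sketch of how peak and idle usage can be compared between runs; `run_and_report` is a hypothetical helper for illustration, and `memory_reserved()` is mainly meaningful for the native caching allocator.

```python
import torch

def run_and_report(fn):
    # Reset peak stats so max_memory_allocated() reflects only this run.
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated() / 2**20   # peak tensor memory (MiB)
    idle = torch.cuda.memory_allocated() / 2**20        # memory still held after the run (MiB)
    reserved = torch.cuda.memory_reserved() / 2**20     # memory cached by the native allocator (MiB)
    print(f"peak={peak:.1f} MiB  idle={idle:.1f} MiB  reserved={reserved:.1f} MiB")
```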

@comfyanonymous
Owner

That's what I thought last year too, but to confirm I need to do some tests with a big model like Flux with LoRAs, on Windows, on a card where it needs to do offloading.

@Ph0rk0z

Ph0rk0z commented Aug 17, 2025

Hmm, cudaMalloc can crash some workflows that are compiled. Please leave a flag to turn it off when necessary.

@contentis
Contributor Author

> Hmm, cudaMalloc can crash some workflows that are compiled. Please leave a flag to turn it off when necessary.

This PR intends to disable it by default, requiring the user to manually opt in if they want to use the cudaMalloc backend.
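A minimal sketch of what such an opt-in could look like; the flag name and wiring below are assumptions for illustration, not the PR's actual implementation.

```python
import argparse
import os

# Hypothetical opt-in flag: cudaMallocAsync is only enabled when the user
# explicitly asks for it; otherwise PyTorch's native caching allocator is used.
parser = argparse.ArgumentParser()
parser.add_argument("--cuda-malloc", action="store_true",
                    help="Opt in to the cudaMallocAsync allocator backend")
args, _ = parser.parse_known_args()

if args.cuda_malloc:
    # Must be set before torch initializes CUDA.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:cudaMallocAsync"

import torch  # imported after the allocator backend is decided
```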

@comfyanonymous
Owner

Did some tests on bf16 Flux using the basic 1024x1024 Flux workflow on Windows with a 4090, and disabling cuda malloc does increase memory usage by a bit.
