
Conversation

jeffbolznv
Collaborator

This lets me run gpt-oss-120b-mxfp4 locally, which is useful for testing even though it's obviously going to be slow. I initially thought about putting this behind an env var, but I'm not sure if the current behavior of failing when vidmem is full is really more useful for the average user. I could go either way.
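For illustration, here is a minimal sketch of the fallback idea (not the actual PR diff; function and parameter names are hypothetical): try a device-local allocation first, and if the driver reports device memory is exhausted, retry with a host-visible memory type instead of failing outright.

```cpp
#include <vulkan/vulkan.h>

// Sketch only: retry the allocation in host-visible (system) memory
// when device-local (video) memory is exhausted.
static VkResult alloc_with_fallback(VkDevice device,
                                    VkMemoryAllocateInfo info,
                                    uint32_t device_local_type,  // hypothetical: memory type indices chosen by the caller
                                    uint32_t host_visible_type,
                                    VkDeviceMemory * out_mem) {
    info.memoryTypeIndex = device_local_type;
    VkResult res = vkAllocateMemory(device, &info, nullptr, out_mem);
    if (res == VK_ERROR_OUT_OF_DEVICE_MEMORY) {
        // vidmem is full: fall back to sysmem (slower, but the model loads)
        info.memoryTypeIndex = host_visible_type;
        res = vkAllocateMemory(device, &info, nullptr, out_mem);
    }
    return res;
}
```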

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner on August 28, 2025 at 23:18
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Aug 28, 2025
@0cc4m
Collaborator

0cc4m commented Aug 29, 2025

For performance tuning it is very bad if the program silently falls back to host memory. We had this issue in the past with the CUDA sysmem fallback setting on Windows; turning it off was basically always the first recommended optimization step, before tuning layers and that kind of thing.

We also had this issue in RADV, where this behaviour is the default and there was no way to turn it off until someone added an environment variable in Mesa for this purpose.

@jeffbolznv
Collaborator Author

OK, I added an env var; the default behavior remains the same.
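A sketch of how such an opt-in gate might look (the real check in the Vulkan backend may be structured differently): the fallback only engages when the user sets `GGML_VK_ALLOW_SYSMEM_FALLBACK` in the environment, e.g. `GGML_VK_ALLOW_SYSMEM_FALLBACK=1`, so the default remains a hard failure when vidmem is full.

```cpp
#include <cstdlib>

// Sketch only: gate the sysmem fallback behind an env var so the
// default behavior (fail when vidmem is exhausted) is unchanged.
static bool vk_allow_sysmem_fallback() {
    return std::getenv("GGML_VK_ALLOW_SYSMEM_FALLBACK") != nullptr;
}
```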

@netrunnereve
Collaborator

Yeah, please make this hidden behind an env var; I don't want to subject anyone else to the pain I felt debugging #13765. Setting a smaller number of layers and running the rest on the CPU is always going to be faster than swapping.

@0cc4m 0cc4m merged commit b97c9ed into ggml-org:master Aug 31, 2025
48 of 49 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
…#15649)

* vulkan: Allow fallback to sysmem memory when vidmem is full

* vulkan: Add env var GGML_VK_ALLOW_SYSMEM_FALLBACK