
Conversation

jeffbolznv
Collaborator

This lets me run gpt-oss-120b-mxfp4 locally, which is useful for testing even though it's obviously going to be slow. I initially thought about putting this behind an env var, but I'm not sure if the current behavior of failing when vidmem is full is really more useful for the average user. I could go either way.
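For illustration, here is a minimal sketch of the fallback idea (not the actual PR diff; function and parameter names are hypothetical): try a device-local allocation first, and if the driver reports device memory is exhausted, retry with a host-visible memory type instead of failing outright.

```cpp
#include <vulkan/vulkan.h>

// Sketch only: retry the allocation in host-visible (system) memory
// when device-local (video) memory is exhausted.
static VkResult alloc_with_fallback(VkDevice device,
                                    VkMemoryAllocateInfo info,
                                    uint32_t device_local_type,  // hypothetical: memory type indices chosen by the caller
                                    uint32_t host_visible_type,
                                    VkDeviceMemory * out_mem) {
    info.memoryTypeIndex = device_local_type;
    VkResult res = vkAllocateMemory(device, &info, nullptr, out_mem);
    if (res == VK_ERROR_OUT_OF_DEVICE_MEMORY) {
        // vidmem is full: fall back to sysmem (slower, but the model loads)
        info.memoryTypeIndex = host_visible_type;
        res = vkAllocateMemory(device, &info, nullptr, out_mem);
    }
    return res;
}
```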

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner on August 28, 2025 at 23:18
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Aug 28, 2025
@0cc4m
Collaborator

0cc4m commented Aug 29, 2025

For performance tuning it is very bad if the program silently falls back to host memory. We had this issue in the past with the CUDA sysmem fallback setting on Windows; turning it off was basically always the first recommended optimization step, before tuning layers and that kind of thing.

We also had this issue in RADV, where this behaviour is the default and there was no way to turn it off until someone added an environment variable in Mesa for this purpose.

@jeffbolznv
Collaborator Author

OK, I added an env var; the default behavior remains the same.
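A sketch of how such an opt-in gate might look (the real check in the Vulkan backend may be structured differently): the fallback only engages when the user sets `GGML_VK_ALLOW_SYSMEM_FALLBACK` in the environment, e.g. `GGML_VK_ALLOW_SYSMEM_FALLBACK=1`, so the default remains a hard failure when vidmem is full.

```cpp
#include <cstdlib>

// Sketch only: gate the sysmem fallback behind an env var so the
// default behavior (fail when vidmem is exhausted) is unchanged.
static bool vk_allow_sysmem_fallback() {
    return std::getenv("GGML_VK_ALLOW_SYSMEM_FALLBACK") != nullptr;
}
```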

@netrunnereve
Collaborator

Yeah, please make this hidden behind an env var; I don't want to subject anyone else to the pain I felt debugging #13765. Setting a smaller number of layers and running the rest on the CPU is always going to be faster than swapping.

@0cc4m 0cc4m merged commit b97c9ed into ggml-org:master Aug 31, 2025
48 of 49 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
…#15649)

* vulkan: Allow fallback to sysmem memory when vidmem is full

* vulkan: Add env var GGML_VK_ALLOW_SYSMEM_FALLBACK