Pull requests: ggml-org/llama.cpp
vulkan: Allow fallback to sysmem memory when vidmem is full
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15649
opened Aug 28, 2025 by
jeffbolznv
gguf-py: reduce peak RAM during convert by streaming dtype casts
python
python script changes
#15648
opened Aug 28, 2025 by
igloo58
[SERVER] Added documentation for parallel_tool_calls param
documentation
Improvements or additions to documentation
examples
server
#15647
opened Aug 28, 2025 by
ExtReMLapin
tools: update llama-bench to include TTFT, E2E, ITL metrics
examples
#15643
opened Aug 28, 2025 by
taronaeo
Catch up to the upstream
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
Ascend NPU
issues specific to Ascend NPUs
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
OpenCL
Issues specific to the OpenCL backend
python
python script changes
script
Script related
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
testing
Everything test related
Vulkan
Issues specific to the Vulkan backend
#15642
opened Aug 28, 2025 by
skyne98
granite embedding small support (ModernBert arch)
python
python script changes
#15641
opened Aug 28, 2025 by
ryan-mangeno
Draft
Hermes 2 tool calling : fixed crash when <tool_call> had a newline before it
#15639
opened Aug 28, 2025 by
ExtReMLapin
CUDA: fuse adds, fuse add with rms norm
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#15631
opened Aug 28, 2025 by
am17an
server : enable /slots by default and make it secure
examples
python
python script changes
server
#15630
opened Aug 28, 2025 by
ggerganov
CANN: fix RoPE cache issue on multi-device
Ascend NPU
issues specific to Ascend NPUs
ggml
changes relating to the ggml tensor library for machine learning
#15629
opened Aug 28, 2025 by
hipudding
Possible fix: use ne0..ne3 (dst dims) instead of ne00..ne03 in ggml_compute_forward_dup_f16
ggml
changes relating to the ggml tensor library for machine learning
#15626
opened Aug 28, 2025 by
kperreau
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed
ggml
changes relating to the ggml tensor library for machine learning
#15614
opened Aug 27, 2025 by
chaxu01
musa: fix build warnings
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#15611
opened Aug 27, 2025 by
yeahdongcn
Draft
Partial code documentation
documentation
Improvements or additions to documentation
#15601
opened Aug 26, 2025 by
grig95
Model: Seed OSS thinking + tool call support
testing
Everything test related
#15552
opened Aug 24, 2025 by
pwilkin
vulkan: mul_mat_id coopmat2 optimizations
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15546
opened Aug 24, 2025 by
jeffbolznv
vulkan: use memory budget extension to read memory usage
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15545
opened Aug 24, 2025 by
giladgd
vulkan: Skip syncing for prealloc_y when it is reused
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15544
opened Aug 24, 2025 by
jeffbolznv
Deepseek V3.1 thinking mode is the default
examples
server
testing
Everything test related
#15533
opened Aug 23, 2025 by
createthis