Releases: ggml-org/llama.cpp

b7191

28 Nov 19:57
3ce7a65

server: fix: /metrics endpoint returning JSON-escaped Prometheus form…
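
In other words, the /metrics endpoint should return the Prometheus text exposition format as a raw text/plain body rather than a JSON-escaped string. A minimal sketch of the distinction, assuming a cpp-httplib-style handler; the helper and metric name below are illustrative, not taken from the server sources:

```cpp
// Illustrative only: the Prometheus exposition format is plain text with
// literal newlines. Serializing it through a JSON encoder escapes the
// newlines and quotes the whole payload, which Prometheus scrapers cannot parse.
#include <cstdint>
#include <sstream>
#include <string>

// hypothetical helper, not the actual llama.cpp server code
static std::string build_metrics_text(uint64_t tokens_predicted_total) {
    std::ostringstream ss;
    ss << "# HELP llamacpp:tokens_predicted_total Number of generated tokens.\n";
    ss << "# TYPE llamacpp:tokens_predicted_total counter\n";
    ss << "llamacpp:tokens_predicted_total " << tokens_predicted_total << "\n";
    return ss.str();
}

// correct:   res.set_content(build_metrics_text(n), "text/plain; version=0.0.4");
// incorrect: res.set_content(json(build_metrics_text(n)).dump(), "application/json");
```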

b7190

28 Nov 18:39
e072b20

ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in g…

b7189

28 Nov 17:42
c6f7a42

[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551)

* [MUSA] enable fp16/fast_fp16/bf16_mma on PH1

Signed-off-by: Xiaodong Ye <[email protected]>

* Update ggml/src/ggml-cuda/fattn-vec.cuh

Co-authored-by: Johannes Gäßler <[email protected]>

* Update ggml/src/ggml-cuda/fattn-vec.cuh

Co-authored-by: Johannes Gäßler <[email protected]>

* Update ggml/src/ggml-cuda/fattn-tile.cuh

Co-authored-by: Johannes Gäßler <[email protected]>

* Address review comments

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>

b7188

28 Nov 15:19
2e7ef98

ggml-cuda: add stricter checking for fusion (#17568)

* ggml-cuda: make conditions for fusion more explicit

* ggml-cuda: remove size check as std::equal already does it
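
The second bullet refers to the four-iterator overload std::equal(first1, last1, first2, last2), which compares the range lengths itself and returns false on a mismatch, so a separate size check before the call is redundant. A standalone illustration of that behavior (not the fusion code itself):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int main() {
    std::vector<int> a = {1, 2, 3};
    std::vector<int> b = {1, 2, 3, 4};
    std::vector<int> c = {1, 2, 3};

    // The four-iterator overload (C++14) compares lengths first, so ranges
    // of different size are simply unequal - no explicit
    // a.size() == b.size() check is needed before calling it.
    assert(!std::equal(a.begin(), a.end(), b.begin(), b.end()));
    assert( std::equal(a.begin(), a.end(), c.begin(), c.end()));
    return 0;
}
```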

b7187

28 Nov 14:09
ddf9f94

server : add Anthropic Messages API support (#17570)

* server : add Anthropic Messages API support

* remove [email protected] from tool calling/jinja tests

* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py

* server : removed redundant n field logic in anthropic_params_from_json

* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()

* server : refactor Anthropic API to use OAI conversion

* make sure the basic test always goes first

* clean up

* clean up api key check, add test

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
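
For context, the Anthropic Messages API takes a JSON body with model, max_tokens, and a messages array of role/content objects. A hedged sketch of building such a request body with nlohmann::json; the /v1/messages path and the field subset shown are assumptions based on the public Anthropic API, not details taken from this PR:

```cpp
#include <nlohmann/json.hpp>
#include <iostream>

using json = nlohmann::json;

int main() {
    // Assumed request shape for an Anthropic-compatible messages endpoint
    // (e.g. POST /v1/messages); field names follow the public Anthropic API.
    json body = {
        {"model", "some-local-model"},
        {"max_tokens", 256},
        {"messages", json::array({
            {{"role", "user"}, {"content", "Hello!"}}
        })}
    };
    std::cout << body.dump(2) << std::endl;
    return 0;
}
```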

b7186

28 Nov 12:37
ff55414

model : Qwen3 Next (#16095)

* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b7185

28 Nov 12:11
73955f7

CUDA: no FP16 arithmetic for vector FA kernel (#17558)

b7184

28 Nov 11:30
35cf888

vulkan: Implement GGML_OP_TRI (#17503)

* vulkan: Implement GGML_OP_TRI

* check types match

b7183

28 Nov 11:16
15d2b46

rpc : cache and reuse compute graphs (#15405)

Store the last computed graph and reuse it when possible.
Also, do not return a response from GRAPH_COMPUTE and assume it always
completes successfully. If that is not the case, the server closes the
connection. This saves us a network round trip to the server.
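
A minimal sketch of the client-side idea under hypothetical helper names (serialize_graph, send_graph_compute, send_graph_recompute); this is not the actual rpc backend code:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for the rpc backend's real serialization and
// transport; names and signatures are illustrative only.
static std::vector<uint8_t> serialize_graph(int graph_id) {
    return std::vector<uint8_t>(16, static_cast<uint8_t>(graph_id));
}
static void send_graph_compute(const std::vector<uint8_t> &) { std::puts("send full graph"); }
static void send_graph_recompute()                           { std::puts("reuse cached graph"); }

// Core idea: remember the last serialized graph and, when the next one is
// byte-identical, tell the server to rerun what it already has. No response
// is awaited; a failed GRAPH_COMPUTE makes the server close the connection,
// which saves a network round trip in the common (successful) case.
static void rpc_graph_compute(int graph_id) {
    static std::vector<uint8_t> last_payload;
    std::vector<uint8_t> payload = serialize_graph(graph_id);
    if (!last_payload.empty() && payload == last_payload) {
        send_graph_recompute();
    } else {
        send_graph_compute(payload);
        last_payload = std::move(payload);
    }
}

int main() {
    rpc_graph_compute(1);  // full send
    rpc_graph_compute(1);  // identical graph -> reuse cached copy
    rpc_graph_compute(2);  // changed graph   -> full send again
}
```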

b7182

28 Nov 09:20
6bca76f

HIP: enable mul_mat_f for RDNA4 (#17437)

* enable mmf for rdna4

* move some mmvf to mmf

* revert lds128 for wmma loading

* Revert "revert lds128 for wmma loading"

This reverts commit db9ae8b6b4738a5def5b393caa1611d52133e9b5.

* Revert "enable mmf for rdna4"

This reverts commit 698c9f24187b990e35c3b73a8067e5387e6ddbd4.

* Revert "move some mmvf to mmf"

This reverts commit 99b92bd6653cc8593607f641e44606391691792f.

* enable mul_mat for rdna4

---------

Co-authored-by: zhang hui <[email protected]>