Releases: ggml-org/llama.cpp
b7191
server: fix: /metrics endpoint returning JSON-escaped Prometheus form…
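For context on what the fix addresses: llama-server exposes a /metrics endpoint (enabled with --metrics) that should return plain Prometheus text exposition format rather than a JSON-escaped string. A minimal sketch of scraping it in Python, assuming the default 127.0.0.1:8080 address:

```python
# Minimal sketch: scrape llama-server's Prometheus metrics endpoint.
# Assumes the server was started with `llama-server -m model.gguf --metrics`
# and is listening on the default 127.0.0.1:8080.
from urllib.request import urlopen

with urlopen("http://127.0.0.1:8080/metrics") as resp:
    body = resp.read().decode("utf-8")

# Prometheus text exposition format: "# HELP"/"# TYPE" comment lines followed
# by `metric_name value` lines, not a JSON-escaped string.
for line in body.splitlines():
    if not line.startswith("#"):
        print(line)
```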
b7190
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in g…
b7189
[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551)
* [MUSA] enable fp16/fast_fp16/bf16_mma on PH1
* Update ggml/src/ggml-cuda/fattn-vec.cuh
* Update ggml/src/ggml-cuda/fattn-vec.cuh
* Update ggml/src/ggml-cuda/fattn-tile.cuh
* Address review comments
Signed-off-by: Xiaodong Ye
Co-authored-by: Johannes Gäßler
b7188
ggml-cuda: add stricter checking for fusion (#17568)
* ggml-cuda: make conditions for fusion more explicit
* ggml-cuda: remove size check as std::equal already does it
b7187
server : add Anthropic Messages API support (#17570)
* server : add Anthropic Messages API support
* remove [email protected] from tool calling/jinja tests
* server : remove unused code and mark test_anthropic_vision_base64_with_multimodal_model as slow/skip in test_anthropic_api.py
* server : remove redundant n field logic in anthropic_params_from_json
* server : use a single error object instead of error_array in the streaming response handler for /v1/chat/completions, and use unordered_set instead of set in to_json_anthropic_stream()
* server : refactor Anthropic API to use OAI conversion
* make sure the basic test always goes first
* clean up
* clean up api key check, add test
Co-authored-by: Xuan Son Nguyen
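A minimal sketch of calling the new Anthropic-style endpoint from Python. The /v1/messages route, the x-api-key header, and the response shape are assumptions modeled on the upstream Anthropic Messages API; check the server documentation for this release for the exact route and auth behavior:

```python
# Minimal sketch: send an Anthropic Messages-style request to llama-server.
# Endpoint path, headers, and response shape are assumptions based on the
# Anthropic Messages API, not taken from the release notes above.
import json
from urllib.request import Request, urlopen

payload = {
    "model": "any",          # llama-server serves whichever model it loaded
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = Request(
    "http://127.0.0.1:8080/v1/messages",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "x-api-key": "unused"},
)
with urlopen(req) as resp:
    reply = json.loads(resp.read())

# Anthropic-style responses carry content blocks: [{"type": "text", "text": ...}]
print(reply["content"][0]["text"])
```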
b7186
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version
* Whitespaces and stuff
* Correct minor errors
* Update src/llama-model.cpp
* Misc. fixes.
* Clean up code, add missing hybrid qualifier
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...
* Whitespace
* Proper tensors for cb calls
* Use llama-graph.h vertical alignment
* BROKEN: chunking
* Set new tensors as inputs.
* Proper chunk logic
* It's the circle of life...
* More shenanigans for n_seq > 1
* Nail in the coffin?
* Fix Windows build
* Eh, one fails on Windows, the other fails on Mac... just use general capture.
* quant : cleanup
* model : cleanup
* qwen3 : cleanup
* cont : cleanup
* cont : cleanup
* ggml : revert change
* qwen3 : cleanup
* cont : cleanup
* Readd cmath
* qwen3 : fix typo
* Update convert_hf_to_gguf.py
* Usual suspects
* fix my bad suggestion
Co-authored-by: Sigbjørn Skjæret
Co-authored-by: Georgi Gerganov
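Since the change list above touches convert_hf_to_gguf.py, a local Qwen3 Next checkpoint would typically be converted with that script before loading. A minimal sketch, with a placeholder model directory and the script's standard --outfile/--outtype options:

```python
# Minimal sketch: convert a local Qwen3 Next checkpoint to GGUF with the
# repository's convert_hf_to_gguf.py. The model directory is a placeholder;
# point it at a downloaded Hugging Face checkpoint.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "./models/qwen3-next",              # placeholder: local HF checkpoint dir
        "--outfile", "qwen3-next-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```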
b7185
CUDA: no FP16 arithmetic for vector FA kernel (#17558)
b7184
vulkan: Implement GGML_OP_TRI (#17503)
* vulkan: Implement GGML_OP_TRI
* check types match
b7183
rpc : cache and reuse compute graphs (#15405)
Store the last computed graph and reuse it when possible. Also, do not return a response from GRAPH_COMPUTE and assume it always completes successfully; if it does not, the server closes the connection. This saves us a network round trip to the server.
b7182
HIP: enable mul_mat_f for RDNA4 (#17437)
* enable mmf for rdna4
* move some mmvf to mmf
* revert lds128 for wmma loading
* Revert "revert lds128 for wmma loading" (reverts commit db9ae8b6b4738a5def5b393caa1611d52133e9b5)
* Revert "enable mmf for rdna4" (reverts commit 698c9f24187b990e35c3b73a8067e5387e6ddbd4)
* Revert "move some mmvf to mmf" (reverts commit 99b92bd6653cc8593607f641e44606391691792f)
* enable mul_mat for rdna4
Co-authored-by: zhang hui