Replies: 2 comments
-
@hipudding is there something we can help you with to make it happen?
-
Thank you for your interest in Ascend. If you want to enable quantized formats, I believe q8 (q8_0, q8_1, q8_k_m) and q4 (q4_0, q4_1, q4_k_m) are feasible. It would only require implementing the quantized versions of GGML_OP_GET_ROWS, GGML_OP_MUL_MAT, and GGML_OP_MUL_MAT_ID.
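To make the shape of that change concrete: a ggml backend advertises which (op, tensor type) pairs it can run, and the scheduler falls back to the CPU for the rest. The sketch below is a standalone mock with invented names (`mock_type`, `mock_op`, `backend_supports_op`), not the actual ggml/CANN API; it only illustrates that enabling q4/q8 means returning true for those ops on quantized types and then providing the matching kernels.

```c
#include <stdbool.h>

/* Illustrative mock only; the real enums/functions live in ggml and the
 * CANN backend sources. Names here are invented for this sketch. */
typedef enum {
    MOCK_TYPE_F32,
    MOCK_TYPE_F16,
    MOCK_TYPE_Q4_0,
    MOCK_TYPE_Q4_1,
    MOCK_TYPE_Q8_0,
    MOCK_TYPE_Q8_1,
} mock_type;

typedef enum {
    MOCK_OP_GET_ROWS,
    MOCK_OP_MUL_MAT,
    MOCK_OP_MUL_MAT_ID,
    MOCK_OP_OTHER,
} mock_op;

/* The backend reports support per (op, source type). Adding quantized
 * formats means returning true here for the three ops named above and
 * implementing the corresponding dequantize/matmul kernels. */
static bool backend_supports_op(mock_op op, mock_type src_type) {
    switch (op) {
        case MOCK_OP_GET_ROWS:
        case MOCK_OP_MUL_MAT:
        case MOCK_OP_MUL_MAT_ID:
            switch (src_type) {
                case MOCK_TYPE_F32:
                case MOCK_TYPE_F16:
                case MOCK_TYPE_Q4_0:
                case MOCK_TYPE_Q8_0:
                    return true;   /* kernels implemented */
                default:
                    return false;  /* e.g. q4_1/q8_1 kernels still missing */
            }
        default:
            /* everything else stays fp-only in this sketch */
            return src_type == MOCK_TYPE_F32 || src_type == MOCK_TYPE_F16;
    }
}
```

Ops the backend declines are scheduled elsewhere (typically the CPU backend), which is why a model "runs" even before quantized kernels exist, just slowly.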
-
Ascend NPUs seem to be a great alternative (to a Mac Studio or EPYC build) for running quantized R1.
For example, the Atlas 300I Duo offers 140 TFLOPS FP16, 408 GB/s memory bandwidth, and 96 GB of VRAM.
Two of these cards in a PC could run the quantized 671B R1 relatively well, I would say.
However, as shown in https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/CANN.md, there is no DeepSeek architecture support yet, and low-bit quantization does not seem to be validated yet.
@hipudding Do you have plans to port low-bit quantized R1 to Ascend cards via the gguf-cann backend?
That seems a pretty valid use case to me...