Conversation

tsite commented Oct 24, 2025

- commit 1: add support for using bf16 as a native type; involves a refactor to the type parsing and conversion logic
- commit 2: add support for scaled fp8 tensors

wbruna (Contributor) commented Oct 25, 2025

> remove dummy imatrix from ggml_quantize_chunk call - I don't think it needs it

Actually, it does: quality degrades much more without it. Try e.g. any SDXL model at q5_0.

tsite (Author) commented Oct 25, 2025

> > remove dummy imatrix from ggml_quantize_chunk call - I don't think it needs it
>
> Actually, it does: quality degrades much more without it. Try e.g. any SDXL model at q5_0.

Interesting, good to know - I added back the dummy imatrix.
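
For anyone following along, here is a minimal sketch of what the dummy-imatrix call can look like; the helper name and surrounding code are illustrative, not the actual code in this PR. The all-ones matrix carries no real importance information, but it keeps the quantizer on its weighted code path, which (per the comment above) gives noticeably better quality than passing a null pointer:

```cpp
#include <vector>

#include "ggml.h"

// Illustrative helper (not the PR's actual code): quantize a chunk of f32
// data with a dummy importance matrix. The all-ones weights carry no real
// importance information, but they select ggml's weighted quantization path
// instead of the plain reference rounding used when imatrix == nullptr.
static size_t quantize_with_dummy_imatrix(ggml_type dst_type,
                                          const float * src, void * dst,
                                          int64_t nrows, int64_t n_per_row) {
    std::vector<float> dummy_imatrix(n_per_row, 1.0f); // one weight per column
    return ggml_quantize_chunk(dst_type, src, dst,
                               /*start =*/ 0, nrows, n_per_row,
                               dummy_imatrix.data());
}
```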

leejet (Owner) commented Oct 25, 2025

It seems that you have removed some code related to model conversion, such as f64 → f32. This can cause issues when loading certain models. I suggest that if you don’t fully understand the reason behind some parts of the code, you shouldn’t modify them. Instead, you should only implement the parts that you do understand.

tsite (Author) commented Oct 25, 2025

> It seems that you have removed some code related to model conversion, such as f64 → f32. This can cause issues when loading certain models. I suggest that if you don’t fully understand the reason behind some parts of the code, you shouldn’t modify them. Instead, you should only implement the parts that you do understand.

I think the convert_tensor function should handle that - there's a case added that checks for a GGML_TYPE_F64 source type and converts it to GGML_TYPE_F32 when necessary. If you have a model in mind that you think this change may break, I can test it to make sure it works properly. IMO it no longer makes sense to use hacky sd types now that ggml has added support for f64, bf16, etc., but if you have other reasons for not using the native ggml types, I'm all ears.
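
For concreteness, the fallback I mean is roughly the following sketch (simplified and with illustrative names, not the exact code in this PR):

```cpp
#include <cstdint>

// Simplified sketch of an f64 -> f32 fallback: ggml has no f64 kernels,
// so double-precision source data is narrowed to float before any op sees it.
static void convert_f64_to_f32(const void * src, float * dst, int64_t n) {
    const double * src_f64 = (const double *) src;
    for (int64_t i = 0; i < n; i++) {
        dst[i] = (float) src_f64[i];
    }
}
```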

leejet (Owner) commented Oct 25, 2025

Most of ggml's ops do not support f64/i64/bf16, which will cause issues. You can use this model for testing: https://civitai.com/models/7371/rev-animated. This model contains f64, and your changes will cause problems with it.

Green-Sky (Contributor) commented Oct 25, 2025

> f8_e5m2 now autoconverts to f32 for less precision loss

Why? f16 is e5m10, so this should be lossless.

stduhpf (Contributor) commented Oct 25, 2025

I think it's a better practice to avoid including too many unrelated changes like that in one PR.

This makes it harder to review: if some of the changes are bad, the whole PR can't be merged, and it also has a higher chance of breaking many other pending PRs.

tsite (Author) commented Oct 25, 2025

> > f8_e5m2 now autoconverts to f32 for less precision loss
>
> Why? f16 is e5m10, so this should be lossless.

Scaled f8_e5m2 tensors are multiplied by a float32 scaling factor, so the dequantized values don't necessarily fit in f16.
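
Roughly, the dequantization looks like the sketch below (simplified, per-tensor scale assumed; not the exact PR code). An e5m2 byte maps bit-for-bit onto the top 8 bits of an f16, but once it is multiplied by an arbitrary f32 scale the product may no longer be representable in f16, hence converting to f32:

```cpp
#include <cstdint>

#include "ggml.h"

// Simplified sketch: dequantize a scaled fp8 (e5m2) buffer to f32.
// An e5m2 value is the upper byte of an IEEE half, so widen it to f16 bits
// first, then apply the float32 scale; the scaled result is kept in f32.
static void dequant_f8_e5m2_scaled(const uint8_t * src, float * dst,
                                   int64_t n, float scale) {
    for (int64_t i = 0; i < n; i++) {
        ggml_fp16_t h = (ggml_fp16_t) ((uint16_t) src[i] << 8); // e5m2 -> f16 bit pattern
        dst[i] = ggml_fp16_to_fp32(h) * scale;
    }
}
```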

tsite marked this pull request as draft October 25, 2025 22:13
tsite (Author) commented Oct 26, 2025

> Most of ggml's ops do not support f64/i64/bf16, which will cause issues. You can use this model for testing: https://civitai.com/models/7371/rev-animated. This model contains f64, and your changes will cause problems with it.

Good to know, thanks! I took a closer look at the ggml library and you're right that the f64/i64 types are missing kernels. I think bf16 does have full support for all the ops, though, since most GPUs have hardware support for the type. I tested the rev-animated model with these changes and it works both with quantization disabled and with quantization set to bf16.
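
(Aside: part of why bf16 is easy to support is that widening bf16 to f32 is a cheap, lossless bit operation; bf16 is just the top 16 bits of an f32. A minimal sketch, not tied to any specific code in this PR:)

```cpp
#include <cstdint>
#include <cstring>

// bf16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits
// of an IEEE-754 float, so widening to f32 is a lossless 16-bit shift.
static inline float bf16_to_f32(uint16_t b) {
    uint32_t bits = (uint32_t) b << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```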

> I think it's a better practice to avoid including too many unrelated changes like that in one PR.
>
> This makes it harder to review: if some of the changes are bad, the whole PR can't be merged, and it also has a higher chance of breaking many other pending PRs.

I moved the wtype changes to a separate PR and split this one into two commits, since the changes are stacked.

tsite marked this pull request as ready for review October 26, 2025 23:51
tsite added 2 commits October 26, 2025 17:35
ggml supports bf16 tensor operations

this involves a refactor to the type parsing and conversion logic

note that i64 now converts to f32 instead of i32