add support for scaled fp8 tensors #915
base: master
Conversation
Actually, it does: quality degrades much more without it. Try e.g. any SDXL model at q5_0.
Interesting, good to know - I added back the dummy imatrix.
It seems that you have removed some code related to model conversion, such as f64 → f32. This can cause issues when loading certain models. I suggest that if you don't fully understand the reason behind some parts of the code, you shouldn't modify them. Instead, you should only change the parts that you do understand.
I think the convert_tensor function should handle that - there's a case added that checks for the GGML_TYPE_F64 source type and converts it to GGML_TYPE_F32 when necessary. If you have a model in mind that you think this change may break, I can test it to make sure it works properly. In my opinion it no longer makes sense to use hacky sd types now that ggml has added support for f64, bf16, etc., but if you have other reasons for not using the native ggml types, I'm all ears.
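A minimal sketch of the kind of fallback being discussed (the function names here are illustrative, not the project's actual convert_tensor code):

```c
#include <stddef.h>

#include "ggml.h"

// Illustrative only: route source types that ggml ops can't compute with
// to a supported target type. The real conversion logic in the project is
// more involved; this just shows the f64 -> f32 (and i64 -> f32) idea.
static enum ggml_type fallback_type(enum ggml_type src) {
    switch (src) {
        case GGML_TYPE_F64: return GGML_TYPE_F32; // ggml ops lack f64 kernels
        case GGML_TYPE_I64: return GGML_TYPE_F32; // see the commit note below
        default:            return src;
    }
}

// The f64 -> f32 case itself is just an element-wise narrowing cast.
static void convert_f64_to_f32(const double * src, float * dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (float)src[i];
    }
}
```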
Most of ggml's ops do not support f64/i64/bf16, which will cause issues. You can use this model for testing: https://civitai.com/models/7371/rev-animated. This model contains f64, and your changes will cause problems with it.
Why? f16 is e5m10, so this should be lossless.
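For reference, a standalone sketch of why the e5m2 → f16 widening is lossless (the helper name is illustrative, not the project's code):

```c
#include <stdint.h>

// f8_e5m2 layout: S EEEEE MM          (1 sign, 5 exponent, 2 mantissa, bias 15)
// f16 layout:     S EEEEE MMMMMMMMMM  (1 sign, 5 exponent, 10 mantissa, bias 15)
// The sign and exponent fields match, so widening is a plain left shift: the
// two mantissa bits land in the top of f16's mantissa and the rest is
// zero-filled. Normals, subnormals, infinities and NaNs all map to the same
// values, so no information is lost.
static inline uint16_t f8_e5m2_to_f16_bits(uint8_t x) {
    return (uint16_t)x << 8;
}
```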
I think it's better practice to avoid including too many unrelated changes like that in one PR. It makes the PR harder to review; if some of the changes are bad, the whole PR can't be merged; and it has a higher chance of breaking other pending PRs.
scaled f8_e5m2 tensors are multiplied by a float32 scaling factor |
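A rough sketch of what that dequantization could look like, assuming a single per-tensor float32 scale and illustrative helper names (the actual code may store and apply the scale differently):

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// Decode one f8_e5m2 value to float (bias 15, handling subnormals/inf/NaN).
static float f8_e5m2_to_f32(uint8_t x) {
    const int sign = (x >> 7) & 0x1;
    const int exp  = (x >> 2) & 0x1f;
    const int man  =  x       & 0x3;
    float v;
    if (exp == 0x1f) {
        v = man ? NAN : INFINITY;               // exponent all ones: inf/NaN
    } else if (exp == 0) {
        v = ldexpf((float)man, -16);            // subnormal: (man/4) * 2^-14
    } else {
        v = ldexpf((float)(4 + man), exp - 17); // normal: (1 + man/4) * 2^(exp-15)
    }
    return sign ? -v : v;
}

// Dequantize a scaled f8_e5m2 tensor: decode each element, then apply the
// float32 scaling factor.
static void dequant_scaled_f8_e5m2(const uint8_t * src, float scale, float * dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = f8_e5m2_to_f32(src[i]) * scale;
    }
}
```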
Good to know, thanks! I took a closer look at the ggml library and you're right that the f64/i64 types are missing kernels. I think bf16 has full support for all the ops, though, since most GPUs have hardware support for this type (a quick sketch of the bf16/f32 mapping is included below). I tested the rev-animated model with these changes and it works both with quantization disabled and with quantization set to bf16.
I moved the wtype changes to a separate PR and split this one into two commits, as the changes are stacked.
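For context on the bf16 point above, a self-contained sketch of the bf16/f32 relationship (ggml ships its own bf16 type and conversion helpers; these standalone ones only illustrate why the type is cheap to support):

```c
#include <stdint.h>
#include <string.h>

// bf16 is the top 16 bits of an IEEE f32 (same sign and 8-bit exponent,
// 7 mantissa bits), so widening it back to f32 is trivial.
static float bf16_to_f32(uint16_t b) {
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

// Simple truncating f32 -> bf16 (real converters usually round to nearest even).
static uint16_t f32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return (uint16_t)(u >> 16);
}
```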
ggml supports bf16 tensor operations; this involves a refactor to the type parsing and conversion logic. Note that i64 now converts to f32 instead of i32.
commit 1: add support for using bf16 as a native type - involves a refactor to the type parsing and conversion logic
commit 2: add support for scaled fp8 tensors