0.0.6
- Add tensor-parallel mode
- Add support for Arcee architecture
- Add support for GLM4 architecture (GLM4.5, GLM4.5-Air)
- Fix CPU bottleneck in model loader
- Reduce VRAM usage during quantization
- Add fused MoE routing kernels
- Various bugfixes
- Quality-of-life improvements
Full Changelog: v0.0.5...v0.0.6