0.0.6

github-actions released this 17 Aug 14:58
· 208 commits to master since this release
  • Add tensor-parallel mode
  • Add support for Arcee architecture
  • Add support for GLM4 architecture (GLM4.5, GLM4.5-Air)
  • Fix CPU bottleneck in model loader
  • Reduce VRAM usage during quantization
  • Fused MoE routing kernels
  • Various bugfixes
  • QoL improvements

Full Changelog: v0.0.5...v0.0.6