-
Notifications
You must be signed in to change notification settings - Fork 13k
Catch up to the upstream #15642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Catch up to the upstream #15642
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit introduces a new documentation file, CLAUDE.md, which provides comprehensive guidance on building, testing, and developing within the repository. It includes instructions for standard CPU and AMD GPU builds, testing commands, code formatting guidelines, architecture overview, and development best practices.
- Add Docker development environment with ROCm 5.7.3 - Create detailed optimization and implementation guides - Add GitHub issue creation script with 15 structured tasks - Implement Docker compose configuration for GPU passthrough - Document hardware-specific optimizations for AMD MI50 - Include build system modifications for CMake/Make - Add development workflow scripts This commit establishes the foundation for optimizing llama.cpp specifically for AMD Instinct MI50 (gfx906) GPUs with expected 35-45% performance improvements. 🤖 Generated with Claude Code Co-Authored-By: Claude <[email protected]>
- Introduced `0-fix-issue.md` for a structured approach to analyze and fix GitHub issues. - Added `1-create-pr.md` to guide users on creating pull requests using the GitHub CLI. - Created `2-review-failing-pipeline.md` to assist in reviewing and fixing failing pipelines.
- Add models/ and *.gguf to .gitignore to exclude model files - Update Dockerfile.gfx906 to use ROCm 6.2 (available version) - Add Dockerfile.gfx906-test for quick testing - Add test_docker_inference.sh script for GPU verification - Docker setup verified with GPU detection and inference capability 🤖 Generated with Claude Code Co-Authored-By: Claude <[email protected]>
- Replace local ggml with submodule from https://github.com/skyne98/ggml-gfx906 - Set up for GFX906-specific optimizations - Branch: gfx906-optimizations This migration enables deep tensor library optimizations specifically for AMD Instinct MI50 (gfx906) hardware while maintaining upstream compatibility. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Add submodule initialization to all build docs - Create specific GFX906 build guide - Update Dockerfile to handle submodule - Add note in README about submodule requirement The ggml tensor library is now a required submodule that must be initialized before building. This ensures users don't encounter build failures due to missing ggml files. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
feat: Add GFX906 optimization infrastructure and tooling + remove local ggml
- Add GFX906-specific configuration header with hardware specs * 60 CUs, 64KB LDS, wave size 64 configuration * Hardware capability detection and optimization helpers * V_DOT4_I32_I8 and V_DOT2_F32_F16 instruction support - Implement device detection and initialization module * Automatic GFX906 device discovery * Stream pool management (4 default streams, up to 16) * Performance counters for profiling * Memory pool management with HBM2 optimization - Integrate with existing HIP backend * Modified CMakeLists.txt to include GFX906 sources when targeting gfx906 * Added initialization hooks in ggml-cuda.cu * Updated common.cuh to include GFX906 configuration - Add comprehensive test suite * Device detection tests * Stream management validation * Memory allocation tests * Configuration verification This implementation provides the core infrastructure needed for GFX906 (AMD Instinct MI50) support as specified in issue #1, including device detection, stream management, and proper configuration for the hardware's 60 CUs, 64KB LDS, and wave size of 64. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
chore: Update ggml submodule with GFX906 backend support
- Automates full build process with GFX906 support - Downloads Llama-2-7B Q4_0 model if not present - Runs llama-bench with specified parameters for performance testing - Includes progress indicators and error handling 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
feat: Add comprehensive benchmark script
Oops, opened by mistake - close |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
Ascend NPU
issues specific to Ascend NPUs
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
OpenCL
Issues specific to the OpenCL backend
python
python script changes
script
Script related
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
testing
Everything test related
Vulkan
Issues specific to the Vulkan backend
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Make sure to read the contributing guidelines before submitting a PR