
Conversation


@skyne98 skyne98 commented Aug 28, 2025

Make sure to read the contributing guidelines before submitting a PR

larkinwc and others added 18 commits August 14, 2025 21:52
This commit introduces a new documentation file, CLAUDE.md, which provides comprehensive guidance on building, testing, and developing within the repository. It includes instructions for standard CPU and AMD GPU builds, testing commands, code formatting guidelines, an architecture overview, and development best practices.
- Add Docker development environment with ROCm 5.7.3
- Create detailed optimization and implementation guides
- Add GitHub issue creation script with 15 structured tasks
- Implement Docker compose configuration for GPU passthrough
- Document hardware-specific optimizations for AMD MI50
- Include build system modifications for CMake/Make
- Add development workflow scripts

This commit establishes the foundation for optimizing llama.cpp
specifically for AMD Instinct MI50 (gfx906) GPUs with expected
35-45% performance improvements.

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
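For context, ROCm containers need the kernel's KFD and DRI device nodes passed through to the container; a minimal sketch of the kind of GPU passthrough the compose setup provides, expressed as a plain `docker run` (image tag and flags are illustrative, not taken from this PR):

```bash
# Illustrative ROCm GPU passthrough; not the PR's actual compose configuration.
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  rocm/dev-ubuntu-22.04:5.7 \
  rocminfo
```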
- Introduced `0-fix-issue.md` for a structured approach to analyze and fix GitHub issues.
- Added `1-create-pr.md` to guide users on creating pull requests using the GitHub CLI.
- Created `2-review-failing-pipeline.md` to assist in reviewing and fixing failing pipelines.
- Add models/ and *.gguf to .gitignore to exclude model files
- Update Dockerfile.gfx906 to use ROCm 6.2 (available version)
- Add Dockerfile.gfx906-test for quick testing
- Add test_docker_inference.sh script for GPU verification
- Docker setup verified with GPU detection and inference capability

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
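The contents of `test_docker_inference.sh` are not shown on this page; a minimal sketch of the kind of GPU verification such a script performs inside the container (commands are illustrative):

```bash
#!/usr/bin/env bash
# Illustrative GPU checks for a ROCm container; not the PR's actual script.
set -euo pipefail

# Basic device visibility and utilization report.
rocm-smi

# Confirm the MI50 (gfx906) architecture is reported by the runtime.
rocminfo | grep -i gfx906
```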
- Replace local ggml with submodule from https://github.com/skyne98/ggml-gfx906
- Set up for GFX906-specific optimizations
- Branch: gfx906-optimizations

This migration enables deep tensor library optimizations specifically
for AMD Instinct MI50 (gfx906) hardware while maintaining upstream
compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add submodule initialization to all build docs
- Create specific GFX906 build guide
- Update Dockerfile to handle submodule
- Add note in README about submodule requirement

The ggml tensor library is now a required submodule that must be
initialized before building. This ensures users don't encounter
build failures due to missing ggml files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
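For reference, the standard git commands for initializing a required submodule (not specific to this fork) look like this:

```bash
# Clone with submodules in one step (URL is a placeholder)...
git clone --recursive <fork-url>
# ...or initialize them in an already-cloned checkout.
git submodule update --init --recursive
```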
feat: Add GFX906 optimization infrastructure and tooling + remove local ggml
- Add GFX906-specific configuration header with hardware specs
  * 60 CUs, 64KB LDS, wave size 64 configuration
  * Hardware capability detection and optimization helpers
  * V_DOT4_I32_I8 and V_DOT2_F32_F16 instruction support

- Implement device detection and initialization module
  * Automatic GFX906 device discovery
  * Stream pool management (4 default streams, up to 16)
  * Performance counters for profiling
  * Memory pool management with HBM2 optimization

- Integrate with existing HIP backend
  * Modified CMakeLists.txt to include GFX906 sources when targeting gfx906
  * Added initialization hooks in ggml-cuda.cu
  * Updated common.cuh to include GFX906 configuration

- Add comprehensive test suite
  * Device detection tests
  * Stream management validation
  * Memory allocation tests
  * Configuration verification

This implementation provides the core infrastructure needed for GFX906
(AMD Instinct MI50) support as specified in issue #1, including device
detection, stream management, and proper configuration for the hardware's
60 CUs, 64KB LDS, and wave size of 64.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
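For orientation, a HIP build that compiles in the GFX906-specific sources is selected by targeting gfx906 at configure time; a sketch of such an invocation (flag names follow upstream llama.cpp conventions and may differ in this fork):

```bash
# Configure a HIP build restricted to the MI50 architecture (gfx906).
# CC/CXX may need to point at ROCm's clang, depending on the environment.
cmake -S . -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx906 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```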
chore: Update ggml submodule with GFX906 backend support
feat: Add comprehensive benchmark script
- Automates full build process with GFX906 support
- Downloads Llama-2-7B Q4_0 model if not present
- Runs llama-bench with specified parameters for performance testing
- Includes progress indicators and error handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
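The benchmark script itself is not included on this page; as a point of reference, a llama-bench run against a Q4_0 model fully offloaded to the GPU looks roughly like this (paths and parameters are illustrative):

```bash
# Illustrative llama-bench invocation; the script's actual parameters are
# not shown in this PR conversation.
./build/bin/llama-bench \
  -m models/llama-2-7b.Q4_0.gguf \
  -p 512 -n 128 \
  -ngl 99
```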
@github-actions github-actions bot added the documentation, script, testing, Nvidia GPU, Vulkan, python, ggml, SYCL, Apple Metal, and Ascend NPU labels Aug 28, 2025
@github-actions github-actions bot added the OpenCL label Aug 28, 2025
@skyne98 skyne98 (Author) commented Sep 1, 2025

Oops, opened by mistake - close

@skyne98 skyne98 closed this Sep 1, 2025