
Conversation


@skyne98 skyne98 commented Aug 28, 2025

Make sure to read the contributing guidelines before submitting a PR

larkinwc and others added 18 commits August 14, 2025 21:52
This commit introduces a new documentation file, CLAUDE.md, which provides comprehensive guidance on building, testing, and developing within the repository. It includes instructions for standard CPU and AMD GPU builds, testing commands, code formatting guidelines, an architecture overview, and development best practices.
- Add Docker development environment with ROCm 5.7.3
- Create detailed optimization and implementation guides
- Add GitHub issue creation script with 15 structured tasks
- Implement Docker compose configuration for GPU passthrough
- Document hardware-specific optimizations for AMD MI50
- Include build system modifications for CMake/Make
- Add development workflow scripts

This commit establishes the foundation for optimizing llama.cpp
specifically for AMD Instinct MI50 (gfx906) GPUs with expected
35-45% performance improvements.

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
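For context, ROCm containers need the kernel's KFD and DRI device nodes passed through to the container; a minimal sketch of the kind of GPU passthrough the compose setup provides, expressed as a plain `docker run` (image tag and flags are illustrative, not taken from this PR):

```bash
# Illustrative ROCm GPU passthrough; not the PR's actual compose configuration.
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  rocm/dev-ubuntu-22.04:5.7 \
  rocminfo
```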
- Introduced `0-fix-issue.md` for a structured approach to analyze and fix GitHub issues.
- Added `1-create-pr.md` to guide users on creating pull requests using the GitHub CLI.
- Created `2-review-failing-pipeline.md` to assist in reviewing and fixing failing pipelines.
- Add models/ and *.gguf to .gitignore to exclude model files
- Update Dockerfile.gfx906 to use ROCm 6.2 (available version)
- Add Dockerfile.gfx906-test for quick testing
- Add test_docker_inference.sh script for GPU verification
- Docker setup verified with GPU detection and inference capability

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
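The contents of `test_docker_inference.sh` are not shown on this page; a minimal sketch of the kind of GPU verification such a script performs inside the container (commands are illustrative):

```bash
#!/usr/bin/env bash
# Illustrative GPU checks for a ROCm container; not the PR's actual script.
set -euo pipefail

# Basic device visibility and utilization report.
rocm-smi

# Confirm the MI50 (gfx906) architecture is reported by the runtime.
rocminfo | grep -i gfx906
```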
- Replace local ggml with submodule from https://github.com/skyne98/ggml-gfx906
- Set up for GFX906-specific optimizations
- Branch: gfx906-optimizations

This migration enables deep tensor library optimizations specifically
for AMD Instinct MI50 (gfx906) hardware while maintaining upstream
compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add submodule initialization to all build docs
- Create specific GFX906 build guide
- Update Dockerfile to handle submodule
- Add note in README about submodule requirement

The ggml tensor library is now a required submodule that must be
initialized before building. This ensures users don't encounter
build failures due to missing ggml files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
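For reference, the standard git commands for initializing a required submodule (not specific to this fork) look like this:

```bash
# Clone with submodules in one step (URL is a placeholder)...
git clone --recursive <fork-url>
# ...or initialize them in an already-cloned checkout.
git submodule update --init --recursive
```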
feat: Add GFX906 optimization infrastructure and tooling + remove local ggml
- Add GFX906-specific configuration header with hardware specs
  * 60 CUs, 64KB LDS, wave size 64 configuration
  * Hardware capability detection and optimization helpers
  * V_DOT4_I32_I8 and V_DOT2_F32_F16 instruction support

- Implement device detection and initialization module
  * Automatic GFX906 device discovery
  * Stream pool management (4 default streams, up to 16)
  * Performance counters for profiling
  * Memory pool management with HBM2 optimization

- Integrate with existing HIP backend
  * Modified CMakeLists.txt to include GFX906 sources when targeting gfx906
  * Added initialization hooks in ggml-cuda.cu
  * Updated common.cuh to include GFX906 configuration

- Add comprehensive test suite
  * Device detection tests
  * Stream management validation
  * Memory allocation tests
  * Configuration verification

This implementation provides the core infrastructure needed for GFX906
(AMD Instinct MI50) support as specified in issue #1, including device
detection, stream management, and proper configuration for the hardware's
60 CUs, 64KB LDS, and wave size of 64.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
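For orientation, a HIP build that compiles in the GFX906-specific sources is selected by targeting gfx906 at configure time; a sketch of such an invocation (flag names follow upstream llama.cpp conventions and may differ in this fork):

```bash
# Configure a HIP build restricted to the MI50 architecture (gfx906).
# CC/CXX may need to point at ROCm's clang, depending on the environment.
cmake -S . -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx906 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
```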
chore: Update ggml submodule with GFX906 backend support
feat: Add comprehensive benchmark script
- Automates full build process with GFX906 support
- Downloads Llama-2-7B Q4_0 model if not present
- Runs llama-bench with specified parameters for performance testing
- Includes progress indicators and error handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
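The benchmark script itself is not included on this page; as a point of reference, a llama-bench run against a Q4_0 model fully offloaded to the GPU looks roughly like this (paths and parameters are illustrative):

```bash
# Illustrative llama-bench invocation; the script's actual parameters are
# not shown in this PR conversation.
./build/bin/llama-bench \
  -m models/llama-2-7b.Q4_0.gguf \
  -p 512 -n 128 \
  -ngl 99
```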
@github-actions github-actions bot added the documentation, script, testing, Nvidia GPU, Vulkan, python, ggml, SYCL, Apple Metal, and Ascend NPU labels Aug 28, 2025
@github-actions github-actions bot added the OpenCL label Aug 28, 2025
@skyne98 skyne98 (Author) commented Sep 1, 2025

Oops, opened by mistake - close

@skyne98 skyne98 closed this Sep 1, 2025