Skip to content

Conversation

vickytsang
Copy link
Contributor

Supporting new AMD Instinct products

Checks

  • [ x] I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • [ x] I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [x ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ x] Unit tests
    • Release tests
    • This PR is not tested :(

@vickytsang vickytsang requested a review from a team as a code owner August 22, 2025 21:52
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for the new AMD Instinct MI350 and MI355 series GPUs by updating the device ID mappings and adding corresponding accelerator type constants. The changes are straightforward, but I found an inconsistency in the naming of the AMD_INSTINCT_MI350x constant that would prevent it from working correctly. My review includes a suggestion to fix this and to add unit tests to prevent similar issues in the future.

@@ -23,6 +23,8 @@
AMD_INSTINCT_MI300x_HF = "AMD-Instinct-MI300X-HF"
AMD_INSTINCT_MI308x = "AMD-Instinct-MI308X"
AMD_INSTINCT_MI325x = "AMD-Instinct-MI325X-OAM"
AMD_INSTINCT_MI350x = "AMD-Instinct-MI350X"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There's an inconsistency in the accelerator type string for AMD_INSTINCT_MI350x. The value should include the -OAM suffix to match the product name defined in python/ray/_private/accelerators/amd_gpu.py ("AMD-Instinct-MI350X-OAM").

Without this change, ray.resource_spec(accelerator_type=ray.util.accelerators.AMD_INSTINCT_MI350x) would not correctly match nodes equipped with this GPU, as the node's accelerator type would be reported as "AMD-Instinct-MI350X-OAM".

To prevent such issues in the future, it would be beneficial to add a unit test for the new accelerator types, similar to test_visible_amd_gpu_type in test_amd_gpu.py, to verify that the constants defined here match the types returned by AMDGPUAcceleratorManager.

Suggested change
AMD_INSTINCT_MI350x = "AMD-Instinct-MI350X"
AMD_INSTINCT_MI350x = "AMD-Instinct-MI350X-OAM"

Signed-off-by: root <[email protected]>
@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Aug 23, 2025
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Aug 25, 2025
@jjyao jjyao changed the title add AMD Instinct MI350 and MI355 products [Core][AMD] Add AMD Instinct MI350 and MI355 products Aug 25, 2025
@jjyao jjyao enabled auto-merge (squash) August 25, 2025 23:22
@jjyao jjyao merged commit 6dae9f3 into ray-project:master Aug 26, 2025
7 checks passed
liulehui pushed a commit to liulehui/ray that referenced this pull request Aug 26, 2025
tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025
tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution Contributed by the community core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants