[Core][AMD] Add AMD Instinct MI350 and MI355 products #55853
Signed-off-by: root <[email protected]>
Code Review
This pull request adds support for the new AMD Instinct MI350 and MI355 series GPUs by updating the device ID mappings and adding corresponding accelerator type constants. The changes are straightforward, but I found an inconsistency in the naming of the AMD_INSTINCT_MI350x constant that would prevent it from working correctly. My review includes a suggestion to fix this and to add unit tests to prevent similar issues in the future.
@@ -23,6 +23,8 @@
 AMD_INSTINCT_MI300x_HF = "AMD-Instinct-MI300X-HF"
 AMD_INSTINCT_MI308x = "AMD-Instinct-MI308X"
 AMD_INSTINCT_MI325x = "AMD-Instinct-MI325X-OAM"
+AMD_INSTINCT_MI350x = "AMD-Instinct-MI350X"
There's an inconsistency in the accelerator type string for AMD_INSTINCT_MI350x. The value should include the -OAM suffix to match the product name defined in python/ray/_private/accelerators/amd_gpu.py ("AMD-Instinct-MI350X-OAM").
Without this change, ray.resource_spec(accelerator_type=ray.util.accelerators.AMD_INSTINCT_MI350x) would not correctly match nodes equipped with this GPU, as the node's accelerator type would be reported as "AMD-Instinct-MI350X-OAM".
To prevent such issues in the future, it would be beneficial to add a unit test for the new accelerator types, similar to test_visible_amd_gpu_type in test_amd_gpu.py, to verify that the constants defined here match the types returned by AMDGPUAcceleratorManager.
-AMD_INSTINCT_MI350x = "AMD-Instinct-MI350X"
+AMD_INSTINCT_MI350x = "AMD-Instinct-MI350X-OAM"
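As a rough illustration of the kind of consistency check suggested in this comment, the sketch below compares accelerator type constants against the product names a node would report. It is a self-contained stand-in, not the Ray source: the dictionaries, the placeholder device ID keys, and the helper `unmatched_constants` are all hypothetical, and only the product strings quoted in this review are taken from the PR. A real test would presumably import the constants and the manager from Ray instead.

```python
# Hypothetical stand-ins for illustration only; not the actual Ray modules.

# Product names a node would report, keyed by placeholder device IDs.
# The keys are made up; only the string values come from this review.
REPORTED_PRODUCT_NAMES = {
    "placeholder-id-mi325x": "AMD-Instinct-MI325X-OAM",
    "placeholder-id-mi350x": "AMD-Instinct-MI350X-OAM",
}

# Accelerator type constants as originally proposed in the diff.
CONSTANTS_BEFORE_FIX = {
    "AMD_INSTINCT_MI325x": "AMD-Instinct-MI325X-OAM",
    "AMD_INSTINCT_MI350x": "AMD-Instinct-MI350X",
}

# The same constants with the suggested -OAM suffix applied.
CONSTANTS_AFTER_FIX = {
    **CONSTANTS_BEFORE_FIX,
    "AMD_INSTINCT_MI350x": "AMD-Instinct-MI350X-OAM",
}


def unmatched_constants(constants, reported_names):
    """Return the names of constants whose value no node would report."""
    reported = set(reported_names.values())
    return sorted(
        name for name, value in constants.items() if value not in reported
    )


if __name__ == "__main__":
    # Exact string comparison is assumed here, which is why the missing
    # suffix matters: the unsuffixed value never equals the reported name.
    print(unmatched_constants(CONSTANTS_BEFORE_FIX, REPORTED_PRODUCT_NAMES))
    print(unmatched_constants(CONSTANTS_AFTER_FIX, REPORTED_PRODUCT_NAMES))
```

A check of this shape fails as soon as a constant is added whose string has no counterpart among the reported product names, which is the class of mistake flagged here.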
Signed-off-by: root <[email protected]>
Signed-off-by: root <[email protected]> Signed-off-by: Lehui Liu <[email protected]>
Signed-off-by: root <[email protected]> Signed-off-by: Masahiro Tanaka <[email protected]>
Supporting new AMD Instinct products
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.