-
Notifications
You must be signed in to change notification settings - Fork 6.9k
[Core] [Azure] Use subscription id from azure profile if not provided in config during AzureNodeProvider init #56640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] [Azure] Use subscription id from azure profile if not provided in config during AzureNodeProvider init #56640
Conversation
I tested this manually ray get-head-ip /home/marosset/src/github.com/marosset/ray/python/ray/autoscaler/azure/tests/azure_cluster.yaml
2025-09-17 14:06:08,164 - INFO - NumExpr defaulting to 8 threads.
2025-09-17 14:06:09,373 INFO node_provider.py:74 -- subscription_id not found in provider config, but successfully read from azure profile: 48d13bcf-f3f8-4070-8f75-cd748d4152c4
2025-09-17 14:06:09,373 - INFO - No environment configuration found.
2025-09-17 14:06:09,374 - INFO - ManagedIdentityCredential will use IMDS
2025-09-17 14:06:09,377 - INFO - No environment configuration found.
2025-09-17 14:06:09,377 - INFO - ManagedIdentityCredential will use IMDS
2025-09-17 14:06:11,485 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2025-09-17 14:06:12,331 - INFO - AzureCliCredential.get_token succeeded
2025-09-17 14:06:12,331 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2025-09-17 14:06:14,214 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2025-09-17 14:06:14,815 - INFO - AzureCliCredential.get_token succeeded
2025-09-17 14:06:14,815 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
20.191.82.34 my cluster config looks like cat /home/marosset/src/github.com/marosset/ray/python/ray/autoscaler/azure/tests/azure_cluster.yaml
# Unique identifier for the head node and workers of this cluster.
cluster_name: nightly-cpu-minimal-vm
max_workers: 2
idle_timeout_minutes: 5
# Cloud-provider specific configuration.
provider:
type: azure
# https://azure.microsoft.com/en-us/global-infrastructure/locations
location: westus2
resource_group: ray-test
cache_stopped_nodes: False
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses an issue where ray get-head-ip
would fail for Azure clusters if the subscription_id
is not explicitly provided in the configuration file. The change introduces a fallback mechanism in the AzureNodeProvider
's initialization to retrieve the subscription ID from the active Azure CLI profile, which aligns its behavior with other commands like ray up
. This is a good fix. I have one minor suggestion for improving code consistency.
cc @MKLepium |
f913625
to
d644766
Compare
|
||
def __init__(self, provider_config, cluster_name): | ||
NodeProvider.__init__(self, provider_config, cluster_name) | ||
if "subscription_id" not in provider_config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this the exact same way both here and in python/ray/autoscaler/_private/_azure/config.py
(in _configure_resource_group
).
If we do this after L80 we can avoid polluting the actual provider_config
, e.g., a new addition @ L81:
if subscription_id is None:
# Get subscription from logged in azure profile
# if it isn't provided in the provider_config
# so operations like `get-head-ip` will work
subscription_id = get_cli_profile().get_subscription_id()
logger.info(
"subscription_id not found in provider config, falling back "
f"to subscription_id from the logged in azure profile: {subscription_id}"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check to see if the value is pulled out of the provider config dictionary later and if it isn't I'll update with this suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to not pollute the actual provider_config - PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the line to re-add it to the dictionary and tested various ray command (up, down, status, get-head-ip) and all work as expected.
d644766
to
7912cb8
Compare
# if it isn't provided in the provider_config | ||
# so operations like `get-head-ip` will work | ||
subscription_id = get_cli_profile().get_subscription_id() | ||
provider_config["subscription_id"] = subscription_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any negative side-effects from updating the in-memory representation of the parsed user config in this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, i meant to drop the extra line to add it back to the dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
… in config during AzureNodeProvider init Signed-off-by: Mark Rossett <[email protected]>
7912cb8
to
3ada4a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
ping @jjyao for final review |
# Get subscription from logged in azure profile | ||
# if it isn't provided in the provider_config | ||
# so operations like `get-head-ip` will work | ||
subscription_id = get_cli_profile().get_subscription_id() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Azure Subscription ID Missing Causes Client Initialization Failure
If subscription_id
isn't provided in the config, the code falls back to get_cli_profile().get_subscription_id()
. When no default subscription is set in the Azure CLI profile, this method returns None
. This None
value is then used to initialize Azure management clients, which expect a valid subscription ID and will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marosset is this a valid comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is strictly true but it is not a regression. The current implementation relies upon the subscription ID being in the config. There are no checks to deal with the config not having a subscription id in the config (hence this PR), it just fails more or less in the way that cursor is describing.
So, this PR adds another fallback layer of accomodation, but does not address the question "what's the right way to fail when no subscription ID is available in any way?" (I think this is out of scope for this PR)
… in config during AzureNodeProvider init (ray-project#56640) Signed-off-by: Mark Rossett <[email protected]> Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: Seiji Eicher <[email protected]>
… in config during AzureNodeProvider init (ray-project#56640) Signed-off-by: Mark Rossett <[email protected]> Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: Douglas Strodtman <[email protected]>
Why are these changes needed?
When ray get-head-ip is called the
provider_config
that is passed to AzureNodeProvider's init only contains info from the cluster configuration file the user authored, which doesn't require a subscription_id to be set.This causes a key read error when the AzureNodeProvider is getting initialized (unless the users set that).
Since other operations like ray up, ray attached, etc do not require subscription_id to be set, this patch updates the init to read the subscription_id from the currently logged in azure profile if it isn't set in the cluster config.
Related issue number
Fixes: #44254
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.Note
AzureNodeProvider now falls back to the Azure CLI profile subscription ID if
provider_config.subscription_id
is missing.python/ray/autoscaler/_private/_azure/node_provider.py
AzureNodeProvider.__init__
: Makesubscription_id
optional; if absent, useget_cli_profile().get_subscription_id()
and log the chosen ID.get_cli_profile
and use the resolvedsubscription_id
to initializeComputeManagementClient
,NetworkManagementClient
, andResourceManagementClient
.Written by Cursor Bugbot for commit 13a2157. This will update automatically on new commits. Configure here.