Skip to content

Conversation

marosset
Copy link
Contributor

@marosset marosset commented Sep 17, 2025

Why are these changes needed?

When ray get-head-ip is called the provider_config that is passed to AzureNodeProvider's init only contains info from the cluster configuration file the user authored, which doesn't require a subscription_id to be set.
This causes a key read error when the AzureNodeProvider is getting initialized (unless the users set that).

Since other operations like ray up, ray attached, etc do not require subscription_id to be set, this patch updates the init to read the subscription_id from the currently logged in azure profile if it isn't set in the cluster config.

Related issue number

Fixes: #44254

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Note

AzureNodeProvider now falls back to the Azure CLI profile subscription ID if provider_config.subscription_id is missing.

  • Azure Autoscaler:
    • python/ray/autoscaler/_private/_azure/node_provider.py
      • AzureNodeProvider.__init__: Make subscription_id optional; if absent, use get_cli_profile().get_subscription_id() and log the chosen ID.
      • Add import for get_cli_profile and use the resolved subscription_id to initialize ComputeManagementClient, NetworkManagementClient, and ResourceManagementClient.

Written by Cursor Bugbot for commit 13a2157. This will update automatically on new commits. Configure here.

@marosset marosset requested a review from a team as a code owner September 17, 2025 21:20
@marosset
Copy link
Contributor Author

I tested this manually

ray get-head-ip /home/marosset/src/github.com/marosset/ray/python/ray/autoscaler/azure/tests/azure_cluster.yaml
2025-09-17 14:06:08,164 - INFO - NumExpr defaulting to 8 threads.
2025-09-17 14:06:09,373 INFO node_provider.py:74 -- subscription_id not found in provider config, but successfully read from azure profile: 48d13bcf-f3f8-4070-8f75-cd748d4152c4
2025-09-17 14:06:09,373 - INFO - No environment configuration found.
2025-09-17 14:06:09,374 - INFO - ManagedIdentityCredential will use IMDS
2025-09-17 14:06:09,377 - INFO - No environment configuration found.
2025-09-17 14:06:09,377 - INFO - ManagedIdentityCredential will use IMDS
2025-09-17 14:06:11,485 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2025-09-17 14:06:12,331 - INFO - AzureCliCredential.get_token succeeded
2025-09-17 14:06:12,331 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2025-09-17 14:06:14,214 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2025-09-17 14:06:14,815 - INFO - AzureCliCredential.get_token succeeded
2025-09-17 14:06:14,815 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
20.191.82.34

my cluster config looks like

cat /home/marosset/src/github.com/marosset/ray/python/ray/autoscaler/azure/tests/azure_cluster.yaml
# Unique identifier for the head node and workers of this cluster.
cluster_name: nightly-cpu-minimal-vm
max_workers: 2
idle_timeout_minutes: 5

# Cloud-provider specific configuration.
provider:
  type: azure
  # https://azure.microsoft.com/en-us/global-infrastructure/locations
  location: westus2
  resource_group: ray-test
  cache_stopped_nodes: False

...

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue where ray get-head-ip would fail for Azure clusters if the subscription_id is not explicitly provided in the configuration file. The change introduces a fallback mechanism in the AzureNodeProvider's initialization to retrieve the subscription ID from the active Azure CLI profile, which aligns its behavior with other commands like ray up. This is a good fix. I have one minor suggestion for improving code consistency.

@marosset
Copy link
Contributor Author

cc @MKLepium


def __init__(self, provider_config, cluster_name):
NodeProvider.__init__(self, provider_config, cluster_name)
if "subscription_id" not in provider_config:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this the exact same way both here and in python/ray/autoscaler/_private/_azure/config.py (in _configure_resource_group).

If we do this after L80 we can avoid polluting the actual provider_config, e.g., a new addition @ L81:

if subscription_id is None:
        # Get subscription from logged in azure profile
        # if it isn't provided in the provider_config
        # so operations like `get-head-ip` will work
        subscription_id = get_cli_profile().get_subscription_id()
        logger.info(
                "subscription_id not found in provider config, falling back "
                f"to subscription_id from the logged in azure profile: {subscription_id}"
            )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me check to see if the value is pulled out of the provider config dictionary later and if it isn't I'll update with this suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to not pollute the actual provider_config - PTAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the line to re-add it to the dictionary and tested various ray command (up, down, status, get-head-ip) and all work as expected.

@marosset marosset force-pushed the azure-fix-missing-subscirption-keyerror branch from d644766 to 7912cb8 Compare October 1, 2025 17:58
cursor[bot]

This comment was marked as outdated.

# if it isn't provided in the provider_config
# so operations like `get-head-ip` will work
subscription_id = get_cli_profile().get_subscription_id()
provider_config["subscription_id"] = subscription_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any negative side-effects from updating the in-memory representation of the parsed user config in this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, i meant to drop the extra line to add it back to the dictionary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

… in config during AzureNodeProvider init

Signed-off-by: Mark Rossett <[email protected]>
@marosset marosset force-pushed the azure-fix-missing-subscirption-keyerror branch from 7912cb8 to 3ada4a5 Compare October 1, 2025 18:24
Copy link
Contributor

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jackfrancis
Copy link
Contributor

ping @jjyao for final review

@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Oct 1, 2025
# Get subscription from logged in azure profile
# if it isn't provided in the provider_config
# so operations like `get-head-ip` will work
subscription_id = get_cli_profile().get_subscription_id()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Azure Subscription ID Missing Causes Client Initialization Failure

If subscription_id isn't provided in the config, the code falls back to get_cli_profile().get_subscription_id(). When no default subscription is set in the Azure CLI profile, this method returns None. This None value is then used to initialize Azure management clients, which expect a valid subscription ID and will fail.

Fix in Cursor Fix in Web

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marosset is this a valid comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is strictly true but it is not a regression. The current implementation relies upon the subscription ID being in the config. There are no checks to deal with the config not having a subscription id in the config (hence this PR), it just fails more or less in the way that cursor is describing.

So, this PR adds another fallback layer of accomodation, but does not address the question "what's the right way to fail when no subscription ID is available in any way?" (I think this is out of scope for this PR)

@jjyao jjyao enabled auto-merge (squash) October 1, 2025 23:54
@jjyao jjyao merged commit bef2a0e into ray-project:master Oct 2, 2025
7 of 8 checks passed
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
… in config during AzureNodeProvider init (ray-project#56640)

Signed-off-by: Mark Rossett <[email protected]>
Co-authored-by: Jiajun Yao <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
… in config during AzureNodeProvider init (ray-project#56640)

Signed-off-by: Mark Rossett <[email protected]>
Co-authored-by: Jiajun Yao <[email protected]>
Signed-off-by: Douglas Strodtman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
azure community-contribution Contributed by the community go add ONLY when ready to merge, run all tests
Development

Successfully merging this pull request may close these issues.

[Ray Clusters] Azure subscription id is required in yaml to get-head-ip
3 participants