[Ray Clusters] Azure subscription id is required in yaml to get-head-ip #44254

@MKLepium

What happened + What you expected to happen

I am using Ray Clusters to deploy a cluster to Microsoft Azure. The login and the subscription ID are handled through the az CLI.
I start the cluster from a generic deployment-example.yaml with ray up deployment-example.yaml. When I then run ray get-head-ip deployment-example.yaml, it fails with the following error:

Traceback (most recent call last):
  File "/home/mk/.local/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/mk/.local/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2498, in main
    return cli()
  File "/home/mk/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/mk/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/mk/.local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mk/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mk/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/mk/.local/lib/python3.10/site-packages/ray/scripts/scripts.py", line 1763, in get_head_ip
    click.echo(get_head_node_ip(cluster_config_file, cluster_name))
  File "/home/mk/.local/lib/python3.10/site-packages/ray/autoscaler/_private/commands.py", line 1359, in get_head_node_ip
    provider = _get_node_provider(config["provider"], config["cluster_name"])
  File "/home/mk/.local/lib/python3.10/site-packages/ray/autoscaler/_private/providers.py", line 269, in _get_node_provider
    new_provider = provider_cls(provider_config, cluster_name)
  File "/home/mk/.local/lib/python3.10/site-packages/ray/autoscaler/_private/_azure/node_provider.py", line 59, in __init__
    subscription_id = provider_config["subscription_id"]
KeyError: 'subscription_id'

The command works once I add subscription_id to the provider section of my YAML file. Since ray up works without it, I expected ray get-head-ip to work without it as well.
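The traceback shows the Azure node provider indexing provider_config["subscription_id"] directly, which raises KeyError when the key is absent. A minimal sketch of the fallback I would expect instead is below. The function names resolve_subscription_id and subscription_id_from_az_cli are hypothetical and are not part of Ray's API. The sketch assumes the az CLI is installed and logged in, and uses az account show --query id --output tsv to read the active subscription.

```python
import subprocess
from typing import Callable, Mapping, Optional


def subscription_id_from_az_cli() -> Optional[str]:
    """Ask the az CLI for the active subscription ID; return None if unavailable.

    Hypothetical helper for illustration, not a Ray function.
    """
    try:
        result = subprocess.run(
            ["az", "account", "show", "--query", "id", "--output", "tsv"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # az is not installed, not on PATH, or the user is not logged in.
        return None
    return result.stdout.strip() or None


def resolve_subscription_id(
    provider_config: Mapping[str, object],
    cli_lookup: Callable[[], Optional[str]] = subscription_id_from_az_cli,
) -> str:
    """Prefer the value from the YAML provider section, else fall back to the az CLI."""
    # .get() avoids the KeyError raised by provider_config["subscription_id"].
    subscription_id = provider_config.get("subscription_id")
    if subscription_id:
        return str(subscription_id)

    subscription_id = cli_lookup()
    if not subscription_id:
        raise ValueError(
            "No subscription_id in the provider section and none available "
            "from the az CLI."
        )
    return subscription_id
```

With a provider section that lacks the key, resolve_subscription_id({"type": "azure"}) would return whatever subscription the az CLI currently has selected, and a value set in the YAML still takes precedence.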

Versions / Dependencies

Ray 2.9.3

Reproduction script

I am using a slightly modified version of the provided Azure example:
https://raw.githubusercontent.com/ray-project/ray/master/python/ray/autoscaler/azure/example-full.yaml

Issue Severity

Low: It annoys or frustrates me.

Metadata

Assignees: No one assigned
Labels: P3, azure, bug, core, core-clusters
Status: In Progress
