Ray Clusters - updating cluster with new docker image should actually use new image #51448

@uazafar

Description

Hi,

I'm submitting this enhancement after discussion with Jiajun Yao on Slack.

I notice that when I update the docker image in my cluster configuration and run ray up..., the cluster continues to run the older image. I see a message in the logs that states:

"A container with name X is running image A instead of B (which was provided in the YAML)"

I see from the source code there is a function _check_if_container_restart_is_needed, which has the section:

        if running_image != image:
            cli_logger.error(
                "A container with name {} is running image {} instead "
                + "of {} (which was provided in the YAML)",
                self.container_name,
                running_image,
                image,
            )

The function only logs this error; it sets re_init_required to True only when differences in the mounts are detected. Why is it not also set to True when a new image is detected? Specifying a new image should make the cluster use that image, which requires a container restart. I'm curious why it works this way, and whether there is a workaround that does not involve running ray down... and then ray up....
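
To make the request concrete, here is a minimal standalone sketch of the behaviour I have in mind. It is not the actual Ray implementation: the function signature, the RestartDecision type, and the simplified mount comparison are all hypothetical and only illustrate treating an image mismatch the same way as a mount mismatch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestartDecision:
    """Hypothetical result type: whether to re-create the container, and why."""
    re_init_required: bool
    reasons: List[str] = field(default_factory=list)


def check_if_container_restart_is_needed(
    running_image: str,
    requested_image: str,
    running_mounts: List[str],
    requested_mounts: List[str],
) -> RestartDecision:
    """Simplified stand-in for the check described above (not Ray's code)."""
    reasons = []

    # Proposed change: an image mismatch triggers a restart
    # instead of only logging an error.
    if running_image != requested_image:
        reasons.append(
            f"image changed: {running_image} -> {requested_image}"
        )

    # Existing behaviour as described in this issue: mount differences
    # trigger a restart. The comparison here is a simplification.
    missing = sorted(set(requested_mounts) - set(running_mounts))
    if missing:
        reasons.append(f"mounts missing from container: {missing}")

    return RestartDecision(re_init_required=bool(reasons), reasons=reasons)
```

With something like this, a call with running image A and requested image B would report re_init_required as True even when the mounts are unchanged, so ray up... could re-create the container with the new image.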

Please let me know if you require more information.

Thanks
Usman

Use case

When I update my Ray cluster with a new docker image and run ray up..., I expect the updated cluster to use that new image. At the moment the new image is ignored, which I don't consider intuitive behaviour. This feature would make updating clusters simpler and possibly even allow such updates to be done without interrupting running jobs, which is very valuable.

Metadata

    Labels

    P2: Important issue, but not time-critical
    community-backlog
    core: Issues that should be addressed in Ray Core
    core-clusters: For launching and managing Ray clusters/jobs/kubernetes
    enhancement: Request for new feature and/or capability