
[Core] Prevent scheduling non-GPU tasks to GPU nodes #47866

@eduardohenriquearnold

Description

Context
I have a GPU node pool, which defaults to 0 active nodes in order to save compute resources.
When I submit tasks that require a GPU, that node pool is scaled on demand via k8s/RayCluster.
Once that task finishes, another task that doesn't require a GPU starts, but it is scheduled on the still-active GPU node. Ideally, this follow-up task would use (or spin up) a non-GPU node, allowing the GPU node to be deallocated.

Feature Request
Allow tasks that do not require a GPU (num_gpus=0) to be restricted to non-GPU nodes, i.e. nodes where num_gpus=0.

What I have tried
I have considered a workaround: create a custom resource, e.g. no_gpu, set it to 1 on every node that does not have a GPU, and request no_gpu=0.1 for the non-GPU tasks. That forces Ray to schedule those tasks onto the non-GPU nodes; a sketch is shown below.
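
A minimal sketch of this workaround, assuming the cluster is already running; the task name and body are placeholders:

```python
# On each non-GPU worker node, register the custom resource when the
# worker joins the cluster, e.g.:
#   ray start --address=<head-address> --resources='{"no_gpu": 1}'

import ray

ray.init()

# Requesting a fraction of the custom resource restricts placement to
# nodes that advertise "no_gpu", i.e. the non-GPU nodes.
@ray.remote(num_cpus=1, resources={"no_gpu": 0.1})
def cpu_only_task():
    return "ran on a non-GPU node"

print(ray.get(cpu_only_task.remote()))
```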

The problem with this approach is that the nodes carrying the custom resource must be active for the Ray head node to know the resource exists. So, for example, if I have high-memory pools with 0 default active nodes, Ray doesn't "know" that those nodes need to be spun up in order to provide no_gpu resources. AFAIK, the custom resource definition happens on the command line when starting the Ray worker, so the head has no way of knowing that it needs to scale a given pool to increase a specific custom resource.

Use case

Given a cluster with the following pools:

  1. GPU pool (defaults to 0 active nodes)
  2. High memory pool (defaults to 0 active nodes)
  3. Regular CPU pool (1 active node by default, RayHead)

I want to be able to submit two sequential tasks:

Task A (requires 1 GPU) -> Task B (requires 0 GPUs, high memory) -> End

and have the first task spin up and run on the GPU node pool, and the second task run on the high-mem pool, freeing the GPU pool either to run other tasks that do require a GPU or to be deallocated (see the sketch after the notes below).

Note that:

  • I cannot set node affinity for tasks to run on a specific node, since the nodes do not yet exist; they are spun up automatically based on demand.
  • I cannot set Task B's memory request high enough that it exceeds the GPU node's available memory and forces Ray to schedule it in the high-mem pool, because I may want to run multiple instances of Task B on a single high-mem node.
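
For reference, a minimal sketch of how the two tasks might be declared with the resource requests available today; the task names and the 32 GiB figure are illustrative, and nothing in Task B's declaration keeps it off the idle GPU node:

```python
import ray

ray.init()

@ray.remote(num_gpus=1)
def task_a():
    # Triggers the autoscaler to bring up a node from the GPU pool.
    return "gpu work done"

# memory= is a scheduling request expressed in bytes; 32 GiB is illustrative.
@ray.remote(num_cpus=4, memory=32 * 1024**3)
def task_b(upstream):
    # Intended for the high-memory pool, but since num_gpus=0 does not
    # exclude GPU nodes, it may land on the still-active GPU node.
    return f"high-mem work after {upstream}"

result = ray.get(task_b.remote(task_a.remote()))
```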

Metadata

Assignees

No one assigned

    Labels

    P1 (issue that should be fixed within a few weeks), core (issues that should be addressed in Ray Core), core-autoscaler (autoscaler related issues), core-scheduler, question (just a question :))
