
[Ray Core] Cannot use joblib parallel backend for scikit-learn with remote cluster #50306

@oelhammouchi

What happened + What you expected to happen

I'm trying to fit a scikit-learn model in a distributed manner with Ray, following this guide, but it fails with an error saying that the PoolActor class cannot be serialized. I reproduced it with the exact snippet from the guide, so the problem is not in my own code. Here's the stack trace:

  File "/home/othman/miniforge3/envs/data-science/lib/python3.8/site-packages/ray/util/client/worker.py", line 555, in call_remote
    task = instance._prepare_client_task()
  File "/home/othman/miniforge3/envs/data-science/lib/python3.8/site-packages/ray/util/client/common.py", line 595, in _prepare_client_task
    task = self._remote_stub._prepare_client_task()
  File "/home/othman/miniforge3/envs/data-science/lib/python3.8/site-packages/ray/util/client/common.py", line 409, in _prepare_client_task
    self._ensure_ref()
  File "/home/othman/miniforge3/envs/data-science/lib/python3.8/site-packages/ray/util/client/common.py", line 379, in _ensure_ref
    self._ref = ray.worker._put_pickled(
  File "/home/othman/miniforge3/envs/data-science/lib/python3.8/site-packages/ray/util/client/worker.py", line 509, in _put_pickled
    raise cloudpickle.loads(resp.error)
TypeError: Could not serialize the put value <class 'ray.util.multiprocessing.pool.PoolActor'>:
=============================================================================
Checking Serializability of <class 'ray.util.multiprocessing.pool.PoolActor'>

As you might guess, the error only occurs when connecting to a remote cluster. If no address is passed to ray.init, the same script runs without problems. Any idea how to solve this?
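
For context, every frame in the traceback is under ray/util/client, which is the Ray Client code path that ray.init selects when the address uses the ray:// scheme. The sketch below only illustrates that distinction. `uses_ray_client` is a hypothetical helper written for this example and is not part of Ray's API.

```python
from typing import Optional


def uses_ray_client(address: Optional[str]) -> bool:
    """Return True if this address would make ray.init use Ray Client.

    Hypothetical helper for illustration only. It assumes that Ray Client
    is selected purely by the "ray://" URI scheme of the address.
    """
    return address is not None and address.startswith("ray://")


# The failing case in this report: a ray:// address (placeholder IP and port).
print(uses_ray_client("ray://203.0.113.5:10001"))
# The working case in this report: no address passed to ray.init.
print(uses_ray_client(None))
```

A check like this could guard the joblib.parallel_backend("ray") block while the problem is being investigated, since the failure described here appears only in the ray:// case.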

Versions / Dependencies

Ray: 2.10.0
Python: 3.8.13
OS: Linux 5.4.0-1109~18.04.1-Ubuntu SMP x86_64

Reproduction script

import numpy as np
import ray
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

digits = load_digits()
param_space = {
    "C": np.logspace(-6, 6, 30),
    "gamma": np.logspace(-8, 8, 30),
    "tol": np.logspace(-4, -1, 30),
    "class_weight": [None, "balanced"],
}
model = SVC(kernel="rbf")
search = RandomizedSearchCV(model, param_space, cv=5, n_iter=300, verbose=10)

ray.init(address="ray://<cluster IP>")

import joblib
from ray.util.joblib import register_ray

register_ray()
with joblib.parallel_backend("ray"):
    search.fit(digits.data, digits.target)

Issue Severity

High: It blocks me from completing my task.

Metadata

Labels

P2: Important issue, but not time-critical
bug: Something that is supposed to be working; but isn't
community-backlog
core: Issues that should be addressed in Ray Core
core-client: ray client related issues
