NeilGirdhar
Contributor

@NeilGirdhar NeilGirdhar commented May 6, 2025

Collaborator

@aslonnie aslonnie left a comment

we are still testing with arrow v9 on CI?

- label: ":database: data: arrow v9 tests (data_non_parallel)"

those test jobs need to be removed, right?

@hainesmichaelc hainesmichaelc added the community-contribution label May 7, 2025
@NeilGirdhar NeilGirdhar closed this May 7, 2025
@NeilGirdhar NeilGirdhar reopened this May 7, 2025
@NeilGirdhar
Contributor Author

@aslonnie What do I need to do to complete this?

@richardliaw
Contributor

Hi Neil, we're discussing exactly which Arrow version we should jump to. 17 is quite a large jump that would cause issues with a lot of production deployments. Perhaps something like 12 or 13 is better (1.5 year lag).

@NeilGirdhar
Contributor Author

Sounds good. I just wanted the linked problem to go away, so I wasn't sure what to do 😄

@alexeykudinkin
Contributor

alexeykudinkin commented May 27, 2025

Hey, @NeilGirdhar!

Thank you for your effort and contribution! We're certainly not looking to drop support for older versions of PyArrow, as that would leave a lot of folks using Ray & Data behind.

Can you help us understand what specific issue you're trying to work around? (I checked the linked ticket.)

@NeilGirdhar
Contributor Author

NeilGirdhar commented May 29, 2025

@alexeykudinkin

Can you help us understand what specific issue you're trying to work around? (I checked the linked ticket.)

It's the issue I linked at the top of this PR:

See astral-sh/uv#13315 (comment)

Is something about that issue unclear?

Read astral-sh/uv#13315 (comment) in particular. Essentially, you're supporting so many versions of PyArrow that uv can't resolve the dependencies properly. The easiest fix would be to raise your lower bound.

Also, I don't understand your logic about why "that would leave a lot of folks using Ray & Data behind." They can still use older versions of Ray, or they can upgrade PyArrow. Does that make life harder for many users? See the release history and the news about releases.

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Jun 13, 2025
@NeilGirdhar
Contributor Author

Not stale :)

@github-actions github-actions bot removed the stale label Jun 14, 2025
@github-actions github-actions bot added the stale label Jun 28, 2025
@NeilGirdhar
Contributor Author

not stale

@github-actions github-actions bot removed the stale label Jun 28, 2025
@alexeykudinkin
Contributor

Read astral-sh/uv#13315 (comment) in particular. Essentially, you're supporting so many versions of PyArrow that uv can't resolve the dependencies properly. The easiest fix would be to raise your lower bound.

That's the problem with uv though, not Ray, right?

We occasionally raise the minimum supported version when there are strong enough reasons for us to do so to minimize complexity, but doing so just to make it easier for uv to resolve dependencies doesn't seem to be a strong enough justification IMO.

@aslonnie
Collaborator

aslonnie commented Jul 8, 2025

Essentially, you're supporting so many versions of PyArrow that uv can't resolve the dependencies properly. The easiest fix would be to raise your lower bound.

one can add additional custom constraints to help uv (or any dependency manager) narrow the search range. uv (or any popular dependency manager) can also pick a search base input as a hint.

it is not ray library's job to narrow the version constraint scope

@NeilGirdhar
Contributor Author

NeilGirdhar commented Jul 8, 2025

That's the problem with uv though, not Ray, right?

No. Other resolvers (like Poetry) also can't seem to resolve the dependencies.

Do you know of any resolver that does work?

one can add additional custom constraints to help uv (or any dependency manager) narrow the search range. uv (or any popular dependency manager) can also pick a search base input as a hint.

it is not ray library's job to narrow the version constraint scope

That may be, and I don't know how the resolvers work, but either the resolvers need to change, or this project does. Because the situation right now is basically broken.

@aslonnie
Collaborator

aslonnie commented Jul 8, 2025

That may be, and I don't know how the resolvers work, but either the resolvers need to change, or this project does. Because the situation right now is basically broken.

I understand your frustration. that said, we do want to keep supporting pyarrow 9: we are still serving users who use it with new releases.

respectfully, resolving dependencies is an NP-hard problem. this means that when a resolution problem becomes hard, the algorithm has to stop somewhere in practice, and it does not guarantee a result. it is neither the package maintainer's job nor the dependency resolver's job to guarantee a resolution. ultimately, it is the user's job.

for ray, python 3.13 support is still experimental, and we are working on providing a recommended version set in the upcoming quarter. not sure if it will help though. for other python versions, it is roughly the requirements_compiled.txt file:

https://github.com/ray-project/ray/blob/releases/2.47.1/python/requirements_compiled.txt

which is also saved and used in our released container images.

for your specific use case, you can try adding pyarrow>=17.0 (or some other version) to the dependencies, as is also mentioned in the original thread. providing a hint there for the resolver is arguably the right thing to do. (in fact, I would argue that one should also always check in the uv-generated lock file for uv sync to consume.)
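
As a rough illustration of why such a hint helps, here is a minimal Python sketch of a lower bound shrinking the set of candidate versions a resolver has to try. The version list and the helper names (parse_version, candidates_at_least) are made up for illustration; this is not how uv is implemented, and real version parsing follows PEP 440 rather than this simple split.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '17.0.0' into (17, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def candidates_at_least(candidates: list[str], lower_bound: str) -> list[str]:
    """Keep only the candidates that satisfy `>= lower_bound`."""
    floor = parse_version(lower_bound)
    return [v for v in candidates if parse_version(v) >= floor]


# Illustrative versions only, not the real PyArrow release list.
illustrative_pyarrow = ["9.0.0", "12.0.0", "13.0.0", "17.0.0", "18.0.0", "19.0.0"]

# With the extra `>=17.0` hint, the older candidates are never explored.
narrowed = candidates_at_least(illustrative_pyarrow, "17.0")
print(narrowed)
```

Fewer candidates per package means fewer combinations for the resolver to backtrack through, which is the whole point of adding the hint on the user's side.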

fwiw, the code change in this PR is now purely a python code change. dependency resolution happens only with package metadata, so the python code change in this PR does not have any effect on dependency resolution.

reading the context again, I think the main issue is that ray probably should drop the line of:

"pyarrow <18; sys_platform == 'darwin' and platform_machine == 'x86_64'",

maybe after dropping that uv can be freed to prioritize trying the higher pyarrow versions.
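
For reference, a minimal Python sketch of when that marker applies, assuming only the PEP 508 definitions (sys_platform maps to sys.platform, platform_machine maps to platform.machine()). The function name intel_mac_marker_applies is hypothetical; real tools evaluate markers with a proper parser rather than a hand-written check like this.

```python
import platform
import sys


def intel_mac_marker_applies(sys_platform: str, platform_machine: str) -> bool:
    """Mirror the marker: sys_platform == 'darwin' and platform_machine == 'x86_64'."""
    return sys_platform == "darwin" and platform_machine == "x86_64"


# Evaluate the marker against the machine running this script.
applies_here = intel_mac_marker_applies(sys.platform, platform.machine())
print(f"pyarrow<18 pin applies on this machine: {applies_here}")
```

The pin only binds on Intel Macs, but it is still part of the package metadata that a resolver sees on every platform.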

@NeilGirdhar
Contributor Author

NeilGirdhar commented Jul 8, 2025

respectfully, resolving dependencies is an NP-hard problem. this means that when a resolution problem becomes hard, the algorithm has to stop somewhere in practice, and it does not guarantee a result. it is neither the package maintainer's job nor the dependency resolver's job to guarantee a resolution. ultimately, it is the user's job.

I understand your point, but putting this on the user means a lot of unnecessary churn whenever new versions of packages come out. In an ideal world, the dependency resolution would just work without the user having to think about dependencies of dependencies.

maybe after dropping that uv can be freed to prioritize trying the higher pyarrow versions.

Yes, that would work too.

@aslonnie
Collaborator

aslonnie commented Jul 8, 2025

#54405 is merged, so is it okay to close this PR now?

@NeilGirdhar NeilGirdhar closed this Jul 8, 2025
Labels: community-contribution, data, stability

Linked issue: Uv sync with project using Ray fails installing on Python 3.13