Updating Daft links in Ray documentation #54328
lgtm from the docs side.
Looks like the microcheck failed because of the updated pip dependency? https://buildkite.com/ray-project/microcheck/builds/19843#0197f08b-3255-4fd3-b8b1-0c756dd4c467. You probably just need to recompile or update `requirements_compiled.txt`.
`requirements_compiled.txt` is not up to date. Please download it from the Artifacts tab and `git push` the changes.
"Dask on Ray" (DoR) is broken in dask==2024.11.0 or later, as reported in ray-project#48689, because Dask removed a private function in dask/dask#11378 that DoR relied on. Beyond dask/dask#11378, Dask has also migrated its task data structure to a new format (the high-level motivation is described in dask/dask#9969). Since this migration spans a series of changes between 2024.11.0 and 2025.1.0, it's not realistic to copy what was removed into Ray. This change adapts Dask on Ray to the new format so that it keeps working. The change is compatible only with `dask>=2024.11.0,<2025.1.0` because Dask made another major change in 2025.1.0 that breaks the shuffle optimization introduced in ray-project#13951. Signed-off-by: Lonnie Liu <[email protected]> Co-authored-by: Hiromu Hota <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…ct#54312) The semaphore is clever, but this uses a signal instead for consistency with other tests. --------- Signed-off-by: Edward Oakes <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
ray-project#53999 made this test flaky on macOS. This test's purpose seems to be similar to https://github.com/ray-project/ray/blob/986115ce566fda437c5e3fcca3705c225b06f3b8/python/ray/tests/test_streaming_generator_4.py#L73, and it was effectively trying to test a feature that didn't exist. Because it wasn't actually pausing the generator for backpressure, the generator would usually finish before the node removal happened. Now, when the node removal happens before the generator finishes, we lose objects and go through the new path. We can also go through the new resubmission path multiple times for one node death, because multiple objects from the same generator may be marked lost. As a result, we sometimes run out of retries before reaching the third retry in the test, and it fails with `ray.exceptions.RayTaskError(ObjectReconstructionFailedMaxAttemptsExceededError)`. The proper fix for the flakiness is the follow-up listed in the previous PR: > Currently, if multiple objects from the same generator are queued up to be recovered when the recovery periodical runner runs, we could resubmit for the first object and then once again queue up a resubmit for the second if argument resolution and sequence numbering lines up. Since this doesn't actually affect correctness and requires a bit of refactoring, it'll be in a follow-up PR. --------- Signed-off-by: dayshah <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…ay 1.4.0 (ray-project#53943) but that if autoscaling is used, the autoscaler image must have Ray 2.45.0 or later. Closes ray-project/kuberay#3580

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: David Xia <[email protected]> Co-authored-by: angelinalg <[email protected]> Co-authored-by: Dhyey Shah <[email protected]> Co-authored-by: Kai-Hsun Chen <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
## Problem

This check assumes that if the actor is not in `registered_actors_`, it must be in `actor_to_register_callbacks_`.
https://github.com/ray-project/ray/blob/c6d7d7eaa1e4dd0fd42ba45891d8501ab14ceb44/src/ray/gcs/gcs_server/gcs_actor_manager.cc#L871-L872
This isn't true: in `DestroyActor`, the actor is always removed from `actor_to_register_callbacks_`,
https://github.com/ray-project/ray/blob/c6d7d7eaa1e4dd0fd42ba45891d8501ab14ceb44/src/ray/gcs/gcs_server/gcs_actor_manager.cc#L1073
but it is only removed from `registered_actors_` if the actor is restartable.
https://github.com/ray-project/ray/blob/c6d7d7eaa1e4dd0fd42ba45891d8501ab14ceb44/src/ray/gcs/gcs_server/gcs_actor_manager.cc#L1143-L1146
It also erases from `actor_to_register_callbacks_` without calling the callbacks, so RPCs from the core worker side could be left hanging forever.

## Fix

The fix is to never erase from `actor_to_register_callbacks_` in `DestroyActor`, and to always respond to all the queued callbacks with the appropriate status in the Put callback. The logic for choosing the status is unchanged: if the actor is in `registered_actors_` after the table put is done, the RPC completes with the OK status; if it is no longer there by then, the RPC is answered with SchedulingCancelled.

Additional changes:
- Removed the need to pass the actor ptr into the RegisterActor callback because it was unused.
- Removed an accessor that was only used for tests, added it to the test fixture, and made the test fixture a friend.
- Added logic in the accessor to read both the gRPC status and the GcsStatus; see the comment here: ray-project#53634 (comment)

### Follow ups

- Fix the mess noted here: ray-project#53634 (comment)
- There are almost surely more issues lurking here due to actor management split brain and KV operation ordering assumptions. This needs to be investigated.
- The actor state transition machine here needs to be clearer, e.g. actors shouldn't be put into `registered_actors_` if they're not registered yet.

--------- Signed-off-by: dayshah <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
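The callback-handling rule described in the Fix section can be modeled in a few lines. This is a heavily simplified, hypothetical Python model, not the GCS C++ code: `ToyActorManager`, `Status`, and `on_table_put_done` are invented names, and the model assumes the destroy path removes the actor from the registered set (the real code only does so in some cases). It shows the two post-fix properties: `destroy_actor` leaves queued replies alone, and every queued reply is answered once the table Put finishes.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable


class Status(Enum):
    OK = "OK"
    SCHEDULING_CANCELLED = "SchedulingCancelled"


class ToyActorManager:
    """Hypothetical model of the register/destroy race; not Ray code."""

    def __init__(self) -> None:
        self.registered_actors: set = set()
        # actor_id -> replies waiting for the table Put to complete.
        self.register_callbacks: dict = defaultdict(list)

    def register_actor(self, actor_id: str, reply: Callable[[Status], None]) -> None:
        # The actor is tracked as registered before the Put completes,
        # and the RPC reply is queued until then.
        self.registered_actors.add(actor_id)
        self.register_callbacks[actor_id].append(reply)

    def destroy_actor(self, actor_id: str) -> None:
        # Post-fix behavior: queued replies are NOT dropped here.
        # Assumption: this destroy path removes the actor from the registered set.
        self.registered_actors.discard(actor_id)

    def on_table_put_done(self, actor_id: str) -> None:
        # Answer every queued reply; the status depends on whether the
        # actor is still registered once the Put finishes.
        if actor_id in self.registered_actors:
            status = Status.OK
        else:
            status = Status.SCHEDULING_CANCELLED
        for reply in self.register_callbacks.pop(actor_id, []):
            reply(status)


replies = []
manager = ToyActorManager()
manager.register_actor("actor-1", replies.append)
manager.destroy_actor("actor-1")  # destroyed before the Put completes
manager.on_table_put_done("actor-1")
print(replies)
```

In this model, no reply is ever left pending: the destroyed actor's caller receives `SchedulingCancelled` instead of hanging.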
Move `httpx` out of `test_utils` because, for some reason, it is not available in the image used for `test_runtime_env_container`. Signed-off-by: Cindy Zhang <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
Signed-off-by: ccmao1130 <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…t#54317) This PR is smaller than it looks. The `TaskManager` class currently exposes two interfaces: `TaskFinisher` and `TaskResubmission`. While these interfaces are well-intentioned, they are only implemented by `TaskManager` itself, and the methods they define are not fully independent. As a result, it’s unlikely that these interfaces could be meaningfully separated or implemented in isolation. This change consolidates them into a single `TaskManager` interface, which can be reused where needed. The goal is to reduce the number of concepts and components required to reason about the Ray core, and to simplify the overall design. Test: - CI Signed-off-by: Cuong Nguyen <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
Adding uv binary to be used in CI --------- Signed-off-by: elliot-barn <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
Signed-off-by: Linkun <[email protected]> Signed-off-by: Linkun Chen <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…and task detail pages (ray-project#54292) Follow-up to ray-project#53423. Missed a few places in the UI. Also updates placement group tables to use the same code preview component as the actor and task tables. (Screenshots: placement group table, actor detail, task detail.) --------- Signed-off-by: Alan Guo <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
See inline comments for each. --------- Signed-off-by: Edward Oakes <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…#54413) Found that we pass arguments by value in the cluster task manager constructor; use `std::move` to avoid unnecessary copies. Signed-off-by: You-Cheng Lin (Owen) <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
Fixes: ray-project#53478
- Migrating `check_library_usage_telemetry` from `_private` to `_common`.
- Migrating `TelemetryCallsite` from `_private` to `_common`.

Signed-off-by: ChanChan Mao <[email protected]>
…Y_enable_autoscaler_v2 for ray up (ray-project#54456) Use a different env var for ray up to enable autoscaler v2 to avoid accidentally enabling v2 due to env inheritance. Signed-off-by: Rueian <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…#54467) Signed-off-by: ChanChan Mao <[email protected]>
…ation (ray-project#53647)

## Why are these changes needed?

- Currently, Serve cannot detect multiple FastAPI deployments in a single application if the user sets the docs path to `None` in their FastAPI app.
- Checking for multiple `ASGIAppReplicaWrapper` instances in a single application avoids this issue.

## Related issue number

Closes ray-project#53024

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

--------- Signed-off-by: Ziy1-Tan <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
…t#54458) Deflake `test_autoscaling_policy_with_metr_disab.py::TestAutoscalingMetrics::test_basic`. When `RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0`, we collect ongoing request metrics at the replica and queued request metrics at the handle, but ongoing request metrics are updated very quickly while queued metrics are sent only every 10s. Because of this delay, the total number of ongoing requests climbs to almost 100: until the queued request metrics are flushed, almost every request is double counted. Example: https://buildkite.com/ray-project/postmerge/builds/11322#0197eaca-62e1-457d-947b-a981210e98b9/177-852. Note that we send exactly 50 requests and expect the number of replicas to scale to exactly 5. However, the metrics grow above 50, almost to 100, which makes the test flaky or fail. This PR sets the env var `RAY_SERVE_HANDLE_METRIC_PUSH_INTERVAL_S=0.1` and pairs it with other stabilizing changes. Signed-off-by: Cindy Zhang <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
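The double counting can be illustrated with simple arithmetic. This is a simplified, hypothetical model and not Serve's autoscaling policy: the helpers `reported_total` and `desired_replicas` are invented for illustration, and the target of 10 ongoing requests per replica is an assumption inferred from "50 requests should scale to exactly 5 replicas". Smoothing, bounds, and other policy details are ignored.

```python
import math


def reported_total(replica_ongoing: int, handle_queued_last_push: int) -> int:
    # The autoscaler sums both sources. If the handle's queued count is stale,
    # requests already running on replicas are still counted as queued.
    return replica_ongoing + handle_queued_last_push


def desired_replicas(total_requests: int, target_per_replica: int) -> int:
    # Simplified rule: enough replicas to keep each at or below the target.
    return math.ceil(total_requests / target_per_replica)


TARGET = 10  # assumption: inferred from "50 requests -> exactly 5 replicas"

# Fresh handle metrics: all 50 requests have moved to replicas, none queued.
fresh = reported_total(replica_ongoing=50, handle_queued_last_push=0)
# Stale handle metrics: the last push (up to 10s old) still shows all 50 queued.
stale = reported_total(replica_ongoing=50, handle_queued_last_push=50)

print(fresh, desired_replicas(fresh, TARGET))
print(stale, desired_replicas(stale, TARGET))
```

Shortening the handle push interval shrinks the window in which the stale queued count overlaps the replica count, which is why the test sets `RAY_SERVE_HANDLE_METRIC_PUSH_INTERVAL_S=0.1`.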
This PR replaces some of the manual string literals of urls within `test_api`, `test_deploy`, `test_deploy_2`, `test_deploy_app`, `test_failure` with `get_application_urls` and splits some of the tests into separate files. --------- Signed-off-by: doyoung <[email protected]> Signed-off-by: Alexey Kudinkin <[email protected]> Signed-off-by: Linkun <[email protected]> Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: Kevin H. Luu <[email protected]> Signed-off-by: kevin <[email protected]> Signed-off-by: elliot-barn <[email protected]> Signed-off-by: Lonnie Liu <[email protected]> Signed-off-by: dayshah <[email protected]> Signed-off-by: You-Cheng Lin (Owen) <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Signed-off-by: Srinath Krishnamachari <[email protected]> Signed-off-by: srinathk10 <[email protected]> Signed-off-by: noemotiovon <[email protected]> Signed-off-by: Ryan O'Leary <[email protected]> Signed-off-by: Cindy Zhang <[email protected]> Signed-off-by: Hao Chen <[email protected]> Signed-off-by: Timothy Seah <[email protected]> Signed-off-by: Timothy Seah <[email protected]> Signed-off-by: Vignesh Hirudayakanth <[email protected]> Signed-off-by: Balaji Veeramani <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]> Co-authored-by: Linkun <[email protected]> Co-authored-by: kourosh hakhamaneshi <[email protected]> Co-authored-by: Qiaolin Yu <[email protected]> Co-authored-by: Kevin H. 
Luu <[email protected]> Co-authored-by: Elliot Barnwell <[email protected]> Co-authored-by: Lonnie Liu <[email protected]> Co-authored-by: harshit-anyscale <[email protected]> Co-authored-by: Dhyey Shah <[email protected]> Co-authored-by: Owen Lin (You-Cheng Lin) <[email protected]> Co-authored-by: Edward Oakes <[email protected]> Co-authored-by: srinathk10 <[email protected]> Co-authored-by: Chenguang Li <[email protected]> Co-authored-by: Ryan O'Leary <[email protected]> Co-authored-by: Kai-Hsun Chen <[email protected]> Co-authored-by: Cindy Zhang <[email protected]> Co-authored-by: Hao Chen <[email protected]> Co-authored-by: Timothy Seah <[email protected]> Co-authored-by: Timothy Seah <[email protected]> Co-authored-by: Justin Yu <[email protected]> Co-authored-by: Vignesh Hirudayakanth <[email protected]> Co-authored-by: Balaji Veeramani <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>
hm, looks like the commit history got messed up; open a new PR?
omg I know... sorry, I'm not a developer, let me redo this 😭
Why are these changes needed?
Update Daft links, messaging, and the logo across the Ray documentation.
Related issue number
n/a
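Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.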