Conversation

Contributor

@kryanbeane kryanbeane commented Sep 1, 2025

Issue link

RHOAIENG-27792

What changes have been made

  • Added job.stop(), job.resubmit(), and job.delete()
  • Added logic to automatically tear down the training script config map when a Training job is deleted
  • Updated E2E test
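As a rough illustration of the lifecycle these methods imply, the sketch below models a job in memory. `FakeCluster`, `SketchJob`, and the `<name>-scripts` config map naming are hypothetical and invented here; they are not the SDK's classes or conventions. The only behaviour taken from this PR is that stopping sets spec.suspend to true and that deleting the job also removes its training script config map. That resubmitting clears the suspend flag is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API (not a real client)."""
    rayjobs: Dict[str, dict] = field(default_factory=dict)     # name -> RayJob-like dict
    config_maps: Dict[str, str] = field(default_factory=dict)  # name -> script contents


class SketchJob:
    """Illustrative model only; the SDK's real job class may differ."""

    def __init__(self, cluster: FakeCluster, name: str, script: str):
        self.cluster = cluster
        self.name = name
        self.config_map_name = f"{name}-scripts"  # assumed naming scheme
        # Submitting creates both the script config map and the job resource.
        cluster.config_maps[self.config_map_name] = script
        cluster.rayjobs[name] = {"spec": {"suspend": False}}

    def stop(self) -> bool:
        """Suspend the job by setting spec.suspend to True."""
        job = self.cluster.rayjobs.get(self.name)
        if job is None:
            return False
        job["spec"]["suspend"] = True
        return True

    def resubmit(self) -> bool:
        """Assumption: resubmitting clears the suspend flag so the job runs again."""
        job = self.cluster.rayjobs.get(self.name)
        if job is None:
            return False
        job["spec"]["suspend"] = False
        return True

    def delete(self) -> bool:
        """Remove the job resource and tear down its script config map."""
        removed = self.cluster.rayjobs.pop(self.name, None) is not None
        self.cluster.config_maps.pop(self.config_map_name, None)
        return removed
```

Returning a boolean from each method is a design choice of this sketch, so a caller can tell a no-op (job already gone) from a successful change.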

Verification steps

  • Stop the Job
job.stop()
  • Verify manually that the job has been suspended (spec.suspend = true) and that the RayCluster was torn down
  • Resubmit the Job and let it run to completion
job.resubmit()
  • Delete the Job CR
job.delete()
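The manual check after stopping could be scripted roughly as follows. This is a sketch under assumptions, not part of the PR: it assumes the RayJob and RayCluster resources live in the `ray.io/v1` API group, that a RayCluster created for a job carries an ownerReference of kind `RayJob`, and that the `kubernetes` Python client is installed for the live lookup. The helper names are made up for illustration.

```python
from typing import Any, Dict, List

RAY_GROUP, RAY_VERSION = "ray.io", "v1"  # assumed KubeRay CRD group/version


def is_suspended(rayjob: Dict[str, Any]) -> bool:
    """True when the RayJob resource has spec.suspend set to true."""
    return rayjob.get("spec", {}).get("suspend") is True


def owned_clusters(rayjob_name: str, rayclusters: List[Dict[str, Any]]) -> List[str]:
    """Names of RayClusters whose ownerReferences point at the given RayJob."""
    names = []
    for rc in rayclusters:
        meta = rc.get("metadata", {})
        for ref in meta.get("ownerReferences", []):
            if ref.get("kind") == "RayJob" and ref.get("name") == rayjob_name:
                names.append(meta.get("name", ""))
    return names


def check_stopped(name: str, namespace: str) -> bool:
    """Fetch the live objects and apply both checks (needs cluster access)."""
    from kubernetes import client, config  # imported lazily; optional dependency

    config.load_kube_config()
    api = client.CustomObjectsApi()
    job = api.get_namespaced_custom_object(
        RAY_GROUP, RAY_VERSION, namespace, "rayjobs", name
    )
    clusters = api.list_namespaced_custom_object(
        RAY_GROUP, RAY_VERSION, namespace, "rayclusters"
    )
    # Stopped means: suspend flag set and no RayCluster still owned by the job.
    return is_suspended(job) and not owned_clusters(name, clusters.get("items", []))
```

The two pure helpers take plain dictionaries, so they can be exercised without a cluster; only `check_stopped` talks to the API server.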

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@openshift-ci-robot

openshift-ci-robot commented Sep 1, 2025

@kryanbeane: This pull request references RHOAIENG-27792 which is a valid jira issue.

In response to this:

Issue link

RHOAIENG-27792

What changes have been made

  • Added job.stop(), job.resubmit(), and job.delete()
  • Added logic to automatically tear down the training script config map when a Training job is deleted

Verification steps

TODO

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


codecov bot commented Sep 1, 2025

Codecov Report

❌ Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 94.19%. Comparing base (47122ee) to head (1637cfd).
⚠️ Report is 3 commits behind head on ray-jobs-feature.

Files with missing lines                   Patch %   Lines
src/codeflare_sdk/ray/rayjobs/rayjob.py    97.29%    1 Missing ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##           ray-jobs-feature     #896      +/-   ##
====================================================
+ Coverage             93.48%   94.19%   +0.70%     
====================================================
  Files                    21       21              
  Lines                  1889     1911      +22     
====================================================
+ Hits                   1766     1800      +34     
+ Misses                  123      111      -12     

@pawelpaszki
Contributor

@kryanbeane will you modify existing e2e test(s) to include the new functionality?

@kryanbeane kryanbeane force-pushed the stop-and-delete-rayjobs branch from f5f241a to 2724f52 Compare September 4, 2025 11:09
@openshift-ci-robot

openshift-ci-robot commented Sep 4, 2025

@kryanbeane: This pull request references RHOAIENG-27792 which is a valid jira issue.

In response to this:

Issue link

RHOAIENG-27792

What changes have been made

  • Added job.stop(), job.resubmit(), and job.delete()
  • Added logic to automatically tear down the training script config map when a Training job is deleted
  • Updated E2E test

Verification steps

TODO

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@kryanbeane kryanbeane force-pushed the stop-and-delete-rayjobs branch from 2724f52 to 1f1720e Compare September 4, 2025 13:56
@kryanbeane kryanbeane force-pushed the stop-and-delete-rayjobs branch 2 times, most recently from 3d548c8 to 3ea43d9 Compare September 4, 2025 14:07
@openshift-ci-robot

openshift-ci-robot commented Sep 4, 2025

@kryanbeane: This pull request references RHOAIENG-27792 which is a valid jira issue.

In response to this:

Issue link

RHOAIENG-27792

What changes have been made

  • Added job.stop(), job.resubmit(), and job.delete()
  • Added logic to automatically tear down the training script config map when a Training job is deleted
  • Updated E2E test

Verification steps

job.stop()

Verify manually that the job has been suspended (spec.suspend = true) and that the RayCluster was torn down

  • Resubmit the Job
job.resubmit()
  • Delete the Job CR
job.delete()

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pawelpaszki
Contributor

I verified the changes by running the notebook, and ran the test locally (with an increased timeout, as per my comment). It works well. Thanks!

@kryanbeane
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 5, 2025
@kryanbeane kryanbeane force-pushed the stop-and-delete-rayjobs branch 3 times, most recently from 0576c2f to 3f4c7cc Compare September 5, 2025 10:33
@kryanbeane kryanbeane force-pushed the stop-and-delete-rayjobs branch from 3f4c7cc to 1637cfd Compare September 8, 2025 10:11
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2025
Contributor

openshift-ci bot commented Sep 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pawelpaszki

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 8, 2025
@kryanbeane
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 8, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 538d345 into project-codeflare:ray-jobs-feature Sep 8, 2025
10 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
3 participants