
Commit 5d2eb3a

feat(RHOAIENG-26488): add lifecycled RayCluster demo notebook for RayJobs
1 parent 5a77f7b commit 5d2eb3a

1 file changed: 234 additions & 0 deletions
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9259e514",
   "metadata": {},
   "source": [
    "# Submitting a RayJob which lifecycles its own RayCluster\n",
    "\n",
    "In this notebook, we will go through the basics of using the CodeFlare SDK to:\n",
    " * Define a RayCluster configuration\n",
    " * Use this configuration alongside a RayJob definition\n",
    " * Submit the RayJob and let the KubeRay Operator manage the lifecycle of the RayCluster created for it"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18136ea7",
   "metadata": {},
   "source": [
    "## Defining and Submitting the RayJob"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1c2545d",
   "metadata": {},
   "source": [
    "First, import the relevant CodeFlare SDK classes by executing the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51e18292",
   "metadata": {},
   "outputs": [],
   "source": [
    "from codeflare_sdk import RayJob, ManagedClusterConfig, TokenAuthentication"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "649c5911",
   "metadata": {},
   "source": [
    "Execute the cell below to authenticate the notebook against your OpenShift cluster, replacing the `XXXXX` placeholders with your API token and server URL.\n",
    "\n",
    "**TODO: Add guide to authenticate locally.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc364888",
   "metadata": {},
   "outputs": [],
   "source": [
    "auth = TokenAuthentication(\n",
    "    token=\"XXXXX\",  # your OpenShift API token\n",
    "    server=\"XXXXX\",  # your OpenShift API server URL\n",
    "    skip_tls=False\n",
    ")\n",
    "auth.login()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5581eca9",
   "metadata": {},
   "source": [
    "Next, we need to define the `ManagedClusterConfig`. KubeRay will use it to spin up a short-lived RayCluster that exists only for the lifetime of the job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3094c60a",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster_config = ManagedClusterConfig(\n",
    "    num_workers=2,\n",
    "    worker_cpu_requests=1,\n",
    "    worker_cpu_limits=1,\n",
    "    worker_memory_requests=4,\n",
    "    worker_memory_limits=4,\n",
    "    head_accelerators={'nvidia.com/gpu': 0},  # no GPUs on the head node\n",
    "    worker_accelerators={'nvidia.com/gpu': 0},  # no GPUs on the workers\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02a2b32b",
   "metadata": {},
   "source": [
    "Lastly, we can pass the `ManagedClusterConfig` into the RayJob and submit it. You do not need to worry about tearing down the cluster when the job has completed; that is handled for you!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e905ccea",
   "metadata": {},
   "outputs": [],
   "source": [
    "job = RayJob(\n",
    "    job_name=\"demo-rayjob\",\n",
    "    entrypoint=\"python -c 'print(\\\"Hello from RayJob!\\\")'\",\n",
    "    cluster_config=cluster_config,  # cluster is created and removed along with the job\n",
    "    namespace=\"your-namespace\"  # replace with your namespace\n",
    ")\n",
    "\n",
    "job.submit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3612de2",
   "metadata": {},
   "source": [
    "We can check the status of our RayJob by executing the cell below. If it is not running yet, re-run the cell a few more times until it reports a 'running' state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96d92f93",
   "metadata": {},
   "outputs": [],
   "source": [
    "job.status()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0e2a650",
   "metadata": {},
   "source": [
    "## Submitting a RayJob to an Existing RayCluster"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cf03419",
   "metadata": {},
   "source": [
    "Alternatively, we can create a RayJob and submit it against a RayCluster that is already running. The process is similar to the one above, except that instead of passing a `cluster_config`, we use the `cluster_name` parameter to point the job at the existing cluster. The cell below assumes a RayCluster named `rayjob-cluster` already exists in the `rhods-notebooks` namespace.\n",
    "\n",
    "For the sake of demonstration, the job we'll submit via the `entrypoint` is a single Python command. In practice, this would typically point to a Python training script.\n",
    "\n",
    "We'll then call the `submit()` method to run the job against our cluster.\n",
    "\n",
    "You can run the cell below to achieve this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94edca70",
   "metadata": {},
   "outputs": [],
   "source": [
    "rayjob = RayJob(\n",
    "    job_name=\"sdk-test-job\",\n",
    "    cluster_name=\"rayjob-cluster\",  # name of the existing RayCluster\n",
    "    namespace=\"rhods-notebooks\",\n",
    "    entrypoint=\"python -c 'import time; time.sleep(20)'\",\n",
    ")\n",
    "\n",
    "rayjob.submit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30a8899a",
   "metadata": {},
   "source": [
    "We can observe the status of this RayJob in the same way as before, by invoking the `status()` method in the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3283b09c",
   "metadata": {},
   "outputs": [],
   "source": [
    "rayjob.status()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f3c9c9f",
   "metadata": {},
   "source": [
    "This method outputs different tables depending on the RayJob's current status, so you can re-run the cell as often as you need to observe its progress. Unlike the lifecycled RayCluster in the first section, this existing RayCluster is not torn down automatically when the job finishes. Once the job has completed, you can shut the cluster down by executing the cell below."
   ]
  },
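  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b11e379",
   "metadata": {},
   "outputs": [],
   "source": [
    "from codeflare_sdk import get_cluster\n",
    "\n",
    "# Retrieve the existing RayCluster (name, namespace) the job ran on, then tear it down\n",
    "cluster = get_cluster(\"rayjob-cluster\", \"rhods-notebooks\")\n",
    "cluster.down()"
   ]
  }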
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
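
The `entrypoint` strings in this notebook pass through three layers of quoting (the JSON of the `.ipynb` file, the Python string literal, and the shell command line), which is why the first one appears as `\\\"Hello from RayJob!\\\"` in the raw JSON. As a minimal sketch, assuming the entrypoint is interpreted with POSIX shell quoting rules, the standard-library `shlex.join` can build such a string from its argument list instead of escaping it by hand. The helper name `python_c_entrypoint` is hypothetical and is not part of the CodeFlare SDK.

```python
import shlex


def python_c_entrypoint(code: str) -> str:
    """Return a shell-quoted `python -c <code>` command string.

    Hypothetical helper: shlex.join quotes each argument so that a POSIX
    shell hands `code` to the Python interpreter unchanged.
    """
    return shlex.join(["python", "-c", code])


hello = python_c_entrypoint('print("Hello from RayJob!")')
sleeper = python_c_entrypoint("import time; time.sleep(20)")

print(hello)    # python -c 'print("Hello from RayJob!")'
print(sleeper)  # python -c 'import time; time.sleep(20)'

# Splitting the string again recovers the original argument list.
assert shlex.split(hello) == ["python", "-c", 'print("Hello from RayJob!")']
```

Both results match the `entrypoint` values used in the notebook cells, so a call such as `entrypoint=python_c_entrypoint(...)` could stand in for the hand-escaped literals.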

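The notebook asks you to re-run the status cells by hand until the job reaches the state you are waiting for. A minimal sketch of automating that loop is shown below. It is deliberately SDK-agnostic: `get_status` is any zero-argument callable returning a status string, because this notebook does not show the return type of `RayJob.status()`. The function name `wait_for_status` and the state names in the example are hypothetical placeholders, not values defined by the CodeFlare SDK or KubeRay.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    terminal_states: Iterable[str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires.

    State names are compared case-insensitively. Returns the terminal status
    exactly as reported, or raises TimeoutError.
    """
    terminal = {state.lower() for state in terminal_states}
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.lower() in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Example with a fake status source standing in for a real RayJob.
fake_states = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
final = wait_for_status(
    lambda: next(fake_states),
    terminal_states=["SUCCEEDED", "FAILED"],  # hypothetical state names
    interval_s=0.01,
)
print(final)  # SUCCEEDED
```

Against the notebook's `job`, you would pass a small adapter that calls `job.status()` and extracts a status string; how to do that depends on what your SDK version returns, so it is left as an assumption here. Injecting `sleep` keeps the function easy to test without actually waiting.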