Update Safety Evals #850

mgmorgan23 · 2025-08-04T22:43:30Z

This PR updates the call to safety evaluations in submit_eval_jobs.py by:

Rerouting it through oe-eval for consistency
Adding a --run_safety_evaluations_reasoning option that runs the thinker safety eval suite through oe-eval
Adding the HF_Token to the gantry args passed in oe-eval.sh (necessary in order to use allenai/wildguard as a classifier in the safety evals)

The beaker image maliam/merge-safety-evals-0804-2 is compatible with this script, built off of this branch: https://github.com/allenai/oe-eval-internal/tree/maliam-add-safety-eval and this fork: https://github.com/mgmorgan23/safety-eval-fork

Example call:
python scripts/submit_eval_jobs.py --model_name hf-open-thoughts-open-thinker3-7B --location open-thoughts/OpenThinker3-7B --is_tuned --evaluate_on_weka --workspace "tulu-3-results" --priority low --preemptible --beaker_image maliam/merge-safety-evals-0804-2 --use_hf_tokenizer_template --oe_eval_tasks "ifeval::tulu" --use_alternate_safety_image maliam/merge-safety-evals-0804-2 --skip_oi_evals --run_safety_evaluations_reasoning

hamishivi · 2025-08-06T15:53:32Z

If we are consolidating to oe-eval, is it possible to just remove the existing safety eval code and instead rely on the pre-existing oe-eval code? I understand wanting to leave the old logic untouched, but I think it's cleaner to have a single oe-eval setup rather than doing something special for safety evals.

And maybe you can add a SAFETY_EVAL preset like this: https://github.com/allenai/open-instruct/blob/main/scripts/eval/oe-eval.sh#L142
and edit here accordingly: https://github.com/allenai/open-instruct/blob/main/scripts/submit_eval_jobs.py#L113


mgmorgan23 · 2025-08-07T18:25:24Z

I updated the logic to use the oe-eval task suite -- the call for a reasoning model is now:

python scripts/submit_eval_jobs.py --model_name hf-open-thoughts-open-thinker3-7B --location open-thoughts/OpenThinker3-7B --is_tuned --evaluate_on_weka --workspace "tulu-3-results" --priority low --preemptible --beaker_image maliam/merge-safety-evals-0806 --use_hf_tokenizer_template --run_oe_eval_experiments --oe_eval_task_suite "SAFETY_EVAL_REASONING" --skip_oi_evals

And for a non-reasoning model:

python scripts/submit_eval_jobs.py --model_name hf-allenai-llama-3-tulu-2-8b --location allenai/llama-3-tulu-2-8b --is_tuned --evaluate_on_weka --workspace "tulu-3-results" --priority low --preemptible --beaker_image maliam/merge-safety-evals-0806 --use_hf_tokenizer_template --run_oe_eval_experiments --oe_eval_task_suite "SAFETY_EVAL" --skip_oi_evals
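The two calls above differ only in the model, its location, and the task suite name. A minimal sketch of assembling them programmatically follows; it uses only the flags shown in the commands above. The helper name `build_safety_eval_command` and its signature are hypothetical and not part of open-instruct.

```python
import shlex

# Path to the submission script, as used in the example calls above.
SCRIPT = "scripts/submit_eval_jobs.py"


def build_safety_eval_command(model_name, location, beaker_image, reasoning=False):
    """Return the argv list for a safety-eval submission.

    Hypothetical helper for illustration only: it mirrors the flags from the
    example calls in this PR and selects the task suite by model type.
    """
    suite = "SAFETY_EVAL_REASONING" if reasoning else "SAFETY_EVAL"
    return [
        "python", SCRIPT,
        "--model_name", model_name,
        "--location", location,
        "--is_tuned",
        "--evaluate_on_weka",
        "--workspace", "tulu-3-results",
        "--priority", "low",
        "--preemptible",
        "--beaker_image", beaker_image,
        "--use_hf_tokenizer_template",
        "--run_oe_eval_experiments",
        "--oe_eval_task_suite", suite,
        "--skip_oi_evals",
    ]


if __name__ == "__main__":
    # Print the command for the reasoning-model example from this PR.
    argv = build_safety_eval_command(
        "hf-open-thoughts-open-thinker3-7B",
        "open-thoughts/OpenThinker3-7B",
        "maliam/merge-safety-evals-0806",
        reasoning=True,
    )
    print(shlex.join(argv))
```

The list could be passed directly to `subprocess.run` instead of being printed, which avoids shell quoting issues.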

hamishivi

One minor comment. We should wait to merge this until the oe-eval side of things is done (and then test before merging).

hamishivi · 2025-08-07T18:53:40Z

scripts/submit_eval_jobs.py

-    # tested reasonably extensively with 70B
-    if num_gpus > 1:
-        num_gpus *= 2
+    if args.oe_eval_task_suite == 'SAFETY_EVAL' or args.oe_eval_task_suite == 'SAFETY_EVAL_REASONING':


Discussed offline.
Let's remove the custom logic here and bring it in line with everything else, but for SAFETY_EVAL_REASONING specifically, just double the number of GPUs.
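A rough sketch of the rule agreed on above, for reference. The function name `adjust_num_gpus` is hypothetical, and the code that actually landed in submit_eval_jobs.py may be structured differently.

```python
def adjust_num_gpus(num_gpus: int, task_suite: str) -> int:
    """Return the GPU count to request for a given oe-eval task suite.

    Hypothetical helper: it only illustrates the rule discussed in this
    review, where SAFETY_EVAL_REASONING doubles the GPU count and every
    other suite (including SAFETY_EVAL) keeps the default.
    """
    if task_suite == "SAFETY_EVAL_REASONING":
        return num_gpus * 2
    return num_gpus
```

Keeping the special case in one small function leaves the rest of the GPU calculation identical for all suites.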


hamishivi

Happy to merge this once the oe-eval PR is merged! It would also be super useful if you could do a quick test run with submit_eval_jobs before merging and link the successfully running jobs (one regular, one reasoner).

mgmorgan23 and others added 5 commits July 25, 2025 17:03

Update script to call new oe-eval safety evals

7479bed

Add num gpu constraints

50ab0de

Add handling for alternative safety beaker image

0bc9247

typos in script, add hf key to gantry args

460a874

Merge branch 'main' into maliam-update-safety-evals

547b121

mgmorgan23 and others added 3 commits August 7, 2025 09:56

move safety eval call into a task suite

995028b

Merge branch 'maliam-update-safety-evals' of https://github.com/allen…

e0556fe

…ai/open-instruct into maliam-update-safety-evals

Merge branch 'main' into maliam-update-safety-evals

1333eb3

hamishivi reviewed Aug 7, 2025


mgmorgan23 added 2 commits August 7, 2025 13:41

update num_gpu calculation

8a1b5df

Merge branch 'maliam-update-safety-evals' of https://github.com/allen…

8d0f5d5

…ai/open-instruct into maliam-update-safety-evals

hamishivi approved these changes Aug 8, 2025


typo

15f25dc
