@finbarrtimbers finbarrtimbers commented Sep 18, 2025

Fixes #677.

Experiments:

  1. Debug run: Beaker
  2. Tool run: Beaker
  3. Multi-node run: Beaker
  4. Finetune run: Beaker

@finbarrtimbers finbarrtimbers marked this pull request as ready for review September 19, 2025 17:13

@hamishivi hamishivi left a comment

Hmm, running this I get:

Traceback (most recent call last):
  File "/weka/oe-adapt-default/hamishi/pr_review/open-instruct/mason.py", line 924, in <module>
    main()
  File "/weka/oe-adapt-default/hamishi/pr_review/open-instruct/mason.py", line 915, in main
    tasks=[make_task_spec(args, full_command, i, beaker_secrets, whoami, args.resumable) for i, full_command in enumerate(full_commands)],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/weka/oe-adapt-default/hamishi/pr_review/open-instruct/mason.py", line 915, in <listcomp>
    tasks=[make_task_spec(args, full_command, i, beaker_secrets, whoami, args.resumable) for i, full_command in enumerate(full_commands)],
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/weka/oe-adapt-default/hamishi/pr_review/open-instruct/mason.py", line 852, in make_task_spec
    spec = beaker.TaskSpec(
           ^^^^^^^^^^^^^^^
AttributeError: module 'beaker' has no attribute 'TaskSpec'
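
The traceback shows a v1 name (`beaker.TaskSpec`) being looked up against an installed `beaker` module that no longer exposes it; the commit list in this thread gives the v2 name as `beaker.BeakerTaskSpec`. The PR fixes this by renaming the call sites directly. Purely as an illustration of the failure mode, here is a minimal sketch of a lookup that tolerates both spellings. The helper `resolve_spec_class` and the stand-in module objects are hypothetical and are not part of beaker-py or of mason.py.

```python
from types import SimpleNamespace


def resolve_spec_class(module, v1_name):
    """Return the class for a v1 name, preferring the 'Beaker'-prefixed v2 name."""
    v2_name = f"Beaker{v1_name}"
    for candidate in (v2_name, v1_name):
        cls = getattr(module, candidate, None)
        if cls is not None:
            return cls
    raise AttributeError(
        f"module {module.__name__!r} has neither {v2_name!r} nor {v1_name!r}"
    )


# Stand-ins for the real `beaker` module (hypothetical, for illustration only).
fake_beaker_v1 = SimpleNamespace(__name__="beaker", TaskSpec=list)
fake_beaker_v2 = SimpleNamespace(__name__="beaker", BeakerTaskSpec=dict)

task_spec_v1 = resolve_spec_class(fake_beaker_v1, "TaskSpec")
task_spec_v2 = resolve_spec_class(fake_beaker_v2, "TaskSpec")
```

A shim like this only covers names that follow the plain `Beaker` prefix rule; the thread notes that `Priority` ended up as `BeakerJobPriority`, so it would need separate handling.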

finbarrtimbers commented Sep 20, 2025 via email

@hamishivi

@finbarrtimbers running a slightly edited version of scripts/train/tulu3/finetune_8b.sh:

python mason.py \
    --cluster ai2/jupiter-cirrascale-2 \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --image nathanl/open_instruct_auto --pure_docker_mode \
    --preemptible \
    --num_nodes 8 \
    --budget ai2/oe-adapt \
    --gpus 8 -- accelerate launch \
    --mixed_precision bf16 \
    --num_processes 8 \
    --use_deepspeed \
    --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf \
    --deepspeed_multinode_launcher standard \
    open_instruct/finetune.py \
    --exp_name tulu3_8b_sft \
    --model_name_or_path meta-llama/Llama-3.1-8B \
    --model_revision main \
    --tokenizer_name meta-llama/Llama-3.1-8B \
    --tokenizer_revision main \
    --use_slow_tokenizer \
    --dataset_mixer_list allenai/tulu-3-sft-mixture 512 \
    --max_seq_length 4096 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --learning_rate 5e-06 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --weight_decay 0.0 \
    --num_train_epochs 2 \
    --reduce_loss sum \
    --use_flash_attn \
    --gradient_checkpointing \
    --report_to wandb \
    --with_tracking \
    --logging_steps 1 \
    --seed 8

(edited just to reduce the dataset size)


@hamishivi hamishivi left a comment

LGTM apart from the local debug issue! Some other nits.


echo "Using Beaker image: $BEAKER_IMAGE"

uv run python mason.py \

Can we make using Beaker optional for this script? I think for debug scripts it's useful to be able to run both interactively and in Beaker.

@finbarrtimbers replied:

I'm down; how do I make it optional?
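
One possible shape for an optional Beaker launch, sketched in Python rather than in the shell script itself: build the training command once, and wrap it with `mason.py` only when a toggle is set. This is an assumption about how it could be done, not a record of what the PR finally merged. The `USE_BEAKER` environment variable and both helper functions are hypothetical; the `mason.py` flags and the `--` separator are the ones that appear in the command quoted earlier in this thread.

```python
import os


def use_beaker_from_env(environ=None):
    """Hypothetical toggle: USE_BEAKER=0 means run locally; anything else uses Beaker."""
    environ = os.environ if environ is None else environ
    return environ.get("USE_BEAKER", "1") != "0"


def build_launch_command(train_cmd, mason_args, use_beaker):
    """Wrap train_cmd with mason.py only when launching on Beaker."""
    if not use_beaker:
        # Interactive/local run: execute the training command as-is.
        return list(train_cmd)
    # Beaker run: everything after "--" is the command mason.py submits.
    return ["uv", "run", "python", "mason.py", *mason_args, "--", *train_cmd]


train_cmd = ["accelerate", "launch", "open_instruct/finetune.py", "--exp_name", "tulu3_8b_sft"]
mason_args = ["--cluster", "ai2/jupiter-cirrascale-2", "--gpus", "8"]

local_cmd = build_launch_command(train_cmd, mason_args, use_beaker=False)
beaker_cmd = build_launch_command(train_cmd, mason_args, use_beaker=True)
```

The same branching could live directly in the debug shell script; the point of the sketch is only that the training arguments are defined once and shared by both paths.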

cursor[bot]

This comment was marked as outdated.

@finbarrtimbers finbarrtimbers added this pull request to the merge queue Oct 10, 2025
Merged via the queue into main with commit 50cd847 Oct 10, 2025
4 checks passed
@finbarrtimbers finbarrtimbers deleted the update-beaker-py branch October 10, 2025 20:29
finbarrtimbers added a commit that referenced this pull request Oct 15, 2025
* Updates beaker-py version.

* Fixed secrets ref.

* Fixed whoami command

* Fixed experimentspec error

* Fixed constraints reference.

* Update mason.py for beaker-py v2 API changes

- Changed beaker_client.workspace.secrets() to beaker_client.secret.list()
- Changed beaker_client.account.whoami() to beaker_client.user.get()
- Changed beaker.ExperimentSpec to beaker.BeakerExperimentSpec
- Changed beaker.Constraints to beaker.BeakerConstraints
- Changed beaker.RetrySpec to beaker.BeakerRetrySpec

* Fix beaker.TaskSpec to beaker.BeakerTaskSpec for v2 API

* Fix remaining beaker v2 API changes

- Changed all beaker.EnvVar to beaker.BeakerEnvVar
- Changed beaker.DataMount to beaker.BeakerDataMount
- Changed beaker.DataSource to beaker.BeakerDataSource
- Changed beaker.TaskResources to beaker.BeakerTaskResources
- Changed beaker.ImageSource to beaker.BeakerImageSource
- Changed beaker.ResultSpec to beaker.BeakerResultSpec
- Changed beaker.TaskContext to beaker.BeakerTaskContext
- Changed beaker.Priority to beaker.BeakerPriority

* Fix BeakerPriority to BeakerJobPriority for v2 API

* Fix BeakerJobPriority enum access to use string key

* Fix experiment ID access for beaker v2 API

The experiment.create() method now returns a BeakerWorkload object,
which has an experiment field containing the ID.

* Fix beaker v2 API compatibility in utils.py and test_utils.py

- Fixed exception names: ConfigurationError → BeakerConfigurationError, ExperimentNotFound → BeakerExperimentNotFound
- Updated to use workload.get() and experiment.get_spec() instead of experiment.get()
- Changed description update to use workload.update() instead of experiment.set_description()
- Updated test mocks to match the new API structure

* Updated finetune script

* Now, finetune_8b.sh uses uv.

* Updated finetune_8b.sh to work with build_image_and_launch.sh.

* Updated code

* added chat template to script.

* changed priority

* updated priority

* Updated scripts/train/debug/finetune.sh.

* added non-resumable flag.

* Set description for debug finetune script.

* Actually set chat template name.

* Updated env var name

* updated typo

* Updated code

* Updated error handling for interactive

* Updated script

* Refactored finetune.sh
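
The class renames listed in the commit messages above are mechanical enough to write down as a lookup table. The following is a minimal sketch of a text-level rename pass over source code, assuming the v1 names are always referenced as `beaker.<Name>`. The function `rename_v1_references` is hypothetical and is not how the PR was actually produced. It does not cover the method-level changes (for example `workspace.secrets()` to `secret.list()`) or the exception renames, which need manual edits.

```python
import re

# v1 -> v2 class names, taken from the commit messages in this thread.
V1_TO_V2 = {
    "ExperimentSpec": "BeakerExperimentSpec",
    "TaskSpec": "BeakerTaskSpec",
    "Constraints": "BeakerConstraints",
    "RetrySpec": "BeakerRetrySpec",
    "EnvVar": "BeakerEnvVar",
    "DataMount": "BeakerDataMount",
    "DataSource": "BeakerDataSource",
    "TaskResources": "BeakerTaskResources",
    "ImageSource": "BeakerImageSource",
    "ResultSpec": "BeakerResultSpec",
    "TaskContext": "BeakerTaskContext",
    # A later commit corrects BeakerPriority to BeakerJobPriority.
    "Priority": "BeakerJobPriority",
}

# Match `beaker.<Name>` only as a whole identifier, so `beaker_client.…`
# and already-renamed `beaker.Beaker…` references are left untouched.
_PATTERN = re.compile(
    r"\bbeaker\.(" + "|".join(map(re.escape, V1_TO_V2)) + r")\b"
)


def rename_v1_references(source: str) -> str:
    """Rewrite `beaker.<v1 name>` references to their v2 names."""
    return _PATTERN.sub(lambda m: f"beaker.{V1_TO_V2[m.group(1)]}", source)


before = "spec = beaker.TaskSpec(priority=beaker.Priority)"
after = rename_v1_references(before)
```

Running the pass twice gives the same result as running it once, because none of the v2 names match the v1 pattern.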

Development

Successfully merging this pull request may close these issues.

Update to beaker-py 2

3 participants