
Conversation


@mydatascience mydatascience commented Sep 16, 2025

Description

Qwen3 SFT colab - an SFT colab notebook with Qwen3-0.6B that can run on a public Colab 5e-1

Tests

Run the colab and try it yourself =)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

- [ ] I have performed a self-review of my code.
- [ ] I have added necessary comments to my code, particularly in hard-to-understand areas.
- [ ] I have run end-to-end tests and provided workload links above if applicable.
- [ ] I have made or will make corresponding changes to the docs if needed.

Signed-off-by: Vladimir Suvorov <[email protected]>
@mydatascience mydatascience changed the title [DRAFT] Qwen3 sft collab Qwen3 sft collab Sep 19, 2025
"print(f\"MaxText Home directory (from Python): {MAXTEXT_REPO_ROOT}\")\n",
"\n",
"DEBUG = False # set to True to run in debug mode, for more print statements\n",
"#set this to the path of the checkpoint you want to load, gs:// supported \n",

Lines 118-121 are confusing. Let's simplify like this:

# Case 1: Set `MODEL_CHECKPOINT_PATH` to GCS path that already has `Qwen3-0.6B` model checkpoint
# Case 2: If you do not have the checkpoint, then do not update `MODEL_CHECKPOINT_PATH`
# and this colab will download the checkpoint from HF and store it at \"{MAXTEXT_REPO_ROOT}/qwen_checkpoint\"
"MODEL_CHECKPOINT_PATH = f\"{MAXTEXT_REPO_ROOT}/qwen_checkpoint\""
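The two cases above could be sketched as a single conditional. This is a minimal illustration, not the notebook's actual code; names like `DEFAULT_CHECKPOINT_PATH` and `NEED_HF_DOWNLOAD` are assumptions introduced here:

```python
import os

# Assumed to mirror the notebook's MAXTEXT_REPO_ROOT variable.
MAXTEXT_REPO_ROOT = os.environ.get("MAXTEXT_REPO_ROOT", "/content/maxtext")
DEFAULT_CHECKPOINT_PATH = f"{MAXTEXT_REPO_ROOT}/qwen_checkpoint"

# Case 1: point this at a GCS path that already holds the Qwen3-0.6B checkpoint.
# Case 2: leave the default, and the colab downloads the checkpoint from HF into it.
MODEL_CHECKPOINT_PATH = DEFAULT_CHECKPOINT_PATH

# Download from HF only when the user did not supply their own checkpoint path.
NEED_HF_DOWNLOAD = MODEL_CHECKPOINT_PATH == DEFAULT_CHECKPOINT_PATH
print(f"Checkpoint path: {MODEL_CHECKPOINT_PATH}, download from HF: {NEED_HF_DOWNLOAD}")
```

Setting `MODEL_CHECKPOINT_PATH = "gs://my-bucket/qwen_checkpoint"` instead would make `NEED_HF_DOWNLOAD` false and skip the download.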


Also, have you tried setting MODEL_CHECKPOINT_PATH to a GCS location? Do you see any permission issues connecting to GCS?

"source": [
"# This is the command to convert the HF model to the MaxText format \n",
"# You may omit it if you already have a checkpoint\n",
"!python3 -m MaxText.utils.ckpt_conversion.to_maxtext \\\n",

This command should only run when `MODEL_CHECKPOINT_PATH` equals `f"{MAXTEXT_REPO_ROOT}/qwen_checkpoint"`. Put it behind a flag.
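One way the conversion cell could be gated, as suggested. This is a hedged sketch; `should_convert` is a helper invented here, and the print stands in for the notebook's actual `!python3 -m MaxText.utils.ckpt_conversion.to_maxtext ...` invocation:

```python
# Assumed to mirror the notebook's variables.
MAXTEXT_REPO_ROOT = "/content/maxtext"
DEFAULT_CHECKPOINT_PATH = f"{MAXTEXT_REPO_ROOT}/qwen_checkpoint"
MODEL_CHECKPOINT_PATH = DEFAULT_CHECKPOINT_PATH

def should_convert(checkpoint_path: str) -> bool:
    """Convert HF -> MaxText only when using the local default checkpoint path."""
    return checkpoint_path == DEFAULT_CHECKPOINT_PATH

if should_convert(MODEL_CHECKPOINT_PATH):
    # In the notebook, this branch would run the to_maxtext conversion command.
    print("Running HF -> MaxText conversion")
else:
    print("Skipping conversion; using existing checkpoint at", MODEL_CHECKPOINT_PATH)
```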

" \"dtype=bfloat16\",\n",
" \"hf_path=HuggingFaceH4/ultrachat_200k\", # HuggingFace dataset/model if needed\n",
" f\"hf_access_token={HF_TOKEN}\",\n",
" \"base_output_directory=/tmp/maxtext_qwen06\",\n",

Can we set base_output_directory to a GCS path, so that users can access the fine-tuned checkpoint?
Also, add a print statement after train() saying that the fine-tuned checkpoint can be found at that path.
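The two asks above could look roughly like this. The bucket name is a placeholder and `train()` is a stand-in for the notebook's real training call, not MaxText's API:

```python
# Hypothetical GCS output path; replace with a bucket the user can write to.
BASE_OUTPUT_DIRECTORY = "gs://my-bucket/maxtext_qwen06"

def train():
    # Stand-in for the notebook's actual training cell.
    pass

train()
# Tell the user where to find the result, as the reviewer requests.
print(f"Fine-tuned checkpoint written under: {BASE_OUTPUT_DIRECTORY}")
```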
