Qwen3 sft collab #2355
Conversation
Signed-off-by: Vladimir Suvorov <[email protected]>
Force-pushed from 68582a2 to e1cb7e4
"print(f\"MaxText Home directory (from Python): {MAXTEXT_REPO_ROOT}\")\n",
"\n",
"DEBUG = False # set to True to run in debug mode, for more print statements\n",
"#set this to the path of the checkpoint you want to load, gs:// supported \n",
Lines 118-121 are confusing. Let's simplify like this:
# Case 1: Set `MODEL_CHECKPOINT_PATH` to GCS path that already has `Qwen3-0.6B` model checkpoint
# Case 2: If you do not have the checkpoint, then do not update `MODEL_CHECKPOINT_PATH`
# and this colab will download the checkpoint from HF and store at `"{MAXTEXT_REPO_ROOT}/qwen_checkpoint\"`
"MODEL_CHECKPOINT_PATH = f\"{MAXTEXT_REPO_ROOT}/qwen_checkpoint\""
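The two cases above could be captured in a small helper. This is only a sketch of the suggested logic; `resolve_checkpoint`, the default path, and the example paths are assumptions, not notebook code:

```python
def resolve_checkpoint(model_checkpoint_path: str, default_path: str):
    """Return (path, download_from_hf) following the two cases above.

    Case 1: MODEL_CHECKPOINT_PATH points at a GCS path that already holds
            the Qwen3-0.6B checkpoint -> use it directly, no download.
    Case 2: MODEL_CHECKPOINT_PATH was left at the default -> download the
            checkpoint from HF into that directory first.
    """
    if model_checkpoint_path != default_path:
        return model_checkpoint_path, False
    return default_path, True

# Illustrative usage (paths are hypothetical):
path, download = resolve_checkpoint(
    "gs://my-bucket/qwen3-0.6b", "/content/maxtext/qwen_checkpoint"
)
```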
Also, have you tried setting MODEL_CHECKPOINT_PATH to a GCS location? Do you see any permission issues connecting to GCS?
"source": [
"# This is the command to convert the HF model to the MaxText format \n",
"# You may omit it if you already have a checkpoint\n",
"!python3 -m MaxText.utils.ckpt_conversion.to_maxtext \\\n",
This command should only run when `MODEL_CHECKPOINT_PATH = f"{MAXTEXT_REPO_ROOT}/qwen_checkpoint"`. Put it behind a flag.
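One way to gate the conversion cell behind such a flag, as a sketch: `RUN_CONVERSION` is an assumed name, the repo root is a placeholder, and in the notebook the guarded command would be the `!python3 -m MaxText.utils.ckpt_conversion.to_maxtext ...` line itself:

```python
MAXTEXT_REPO_ROOT = "/content/maxtext"  # assumed; the notebook sets this earlier
MODEL_CHECKPOINT_PATH = f"{MAXTEXT_REPO_ROOT}/qwen_checkpoint"

# Only convert when the user kept the local default path,
# i.e. no pre-converted checkpoint was supplied.
RUN_CONVERSION = MODEL_CHECKPOINT_PATH == f"{MAXTEXT_REPO_ROOT}/qwen_checkpoint"

if RUN_CONVERSION:
    # In the notebook, the conversion command goes here, e.g.:
    # !python3 -m MaxText.utils.ckpt_conversion.to_maxtext ...
    print("Converting HF checkpoint to MaxText format...")
else:
    print(f"Skipping conversion; using checkpoint at {MODEL_CHECKPOINT_PATH}")
```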
" \"dtype=bfloat16\",\n",
" \"hf_path=HuggingFaceH4/ultrachat_200k\", # HuggingFace dataset/model if needed\n",
" f\"hf_access_token={HF_TOKEN}\",\n",
" \"base_output_directory=/tmp/maxtext_qwen06\",\n",
Can we set base_output_directory to a GCS path, so that users can access the fine-tuned checkpoint? Also, add a print statement after train() telling users that they can find the fine-tuned checkpoint at this path.
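A minimal sketch of such a message, assuming `base_output_directory` and `run_name` as in typical MaxText configs; the helper name `report_checkpoint_location` is hypothetical:

```python
def report_checkpoint_location(base_output_directory: str, run_name: str = "") -> str:
    """Build the message pointing users at their fine-tuned checkpoint."""
    if run_name:
        path = f"{base_output_directory.rstrip('/')}/{run_name}"
    else:
        path = base_output_directory
    return f"Fine-tuned checkpoint available at: {path}"

# After train() completes in the notebook (bucket name is illustrative):
print(report_checkpoint_location("gs://my-bucket/maxtext_qwen06", "sft_run"))
```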
Description
Qwen3 SFT Colab - an SFT Colab notebook with Qwen3-0.6B that can run on a public Colab 5e-1
Tests
Run the Colab. Try it yourself =)
Checklist
Before submitting this PR, please make sure (put an X in the square brackets):