
Setting MPS flag check for bf16 training issue #40216


Open · wants to merge 2 commits into main

Conversation


@debasisdwivedy commented Aug 16, 2025

What does this PR do?

Adds a check for MPS availability when training in bf16 (torch_dtype=torch.bfloat16).

Fixes #39935
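
For context, a minimal sketch of the kind of availability check involved (the helper name bf16_supported is illustrative only and is not the actual diff in this PR):

# Illustrative sketch, not the code changed by this PR: accept bf16 when either a
# capable CUDA GPU or an MPS device is present, instead of rejecting MPS outright.
import torch
from transformers.utils import is_torch_bf16_gpu_available

def bf16_supported() -> bool:  # hypothetical helper name
    if is_torch_bf16_gpu_available():  # existing CUDA/GPU path
        return True
    return torch.backends.mps.is_available()  # Apple Silicon / MPS path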

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@renet10
@ArthurZucker
@zach-huggingface
@SunMarc
@qgallouedec

I have attached a code sample that was tested on my Mac. Please feel free to do a round of testing on your side.

def train():
    import os
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, TaskType
    from trl import SFTTrainer, SFTConfig
    from dotenv import load_dotenv
    import json

    torch.mps.empty_cache()

    load_dotenv()
    from huggingface_hub import login
    login(token=os.getenv("HUGGINGFACE_API_TOKEN"))

    model_name = "google/gemma-3-270m-it"
    dataset_name = "<TAKE_ANY_SAMPLE_DATASET>"


    # 1. Load the model and tokenizer
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        attn_implementation='eager',
        token=os.getenv("HUGGINGFACE_API_TOKEN"),
        device_map='mps',
        torch_dtype=torch.bfloat16,
    )

    # Use AutoTokenizer to add special tokens
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        token=os.getenv("HUGGINGFACE_API_TOKEN"),
    )

    ##############################################################################################################

    from datasets import load_dataset

    def preprocess(sample):
        messages = sample["messages"]
        return {"text": tokenizer.apply_chat_template(messages, add_generation_prompt=False,tokenize=False)}


    dataset = load_dataset(dataset_name, split="train[:500]")
    dataset = dataset.rename_column("conversations", "messages")
    
    dataset = dataset.map(preprocess, remove_columns="messages")
    dataset = dataset.train_test_split(0.1)
    print(dataset)

    print(dataset["train"][5]["text"])


    ##############################################################################################################

    username = "JOHN_DOE"  # REPLACE with your Hugging Face username
    output_dir = "gemma-3-270m-it-fine-tuned" # The directory where the trained model checkpoints, logs, and other artifacts will be saved. It will also be the default name of the model when pushed to the hub if not redefined later.
    per_device_train_batch_size = 2
    per_device_eval_batch_size = 2
    gradient_accumulation_steps = 8
    logging_steps = 5
    learning_rate = 1e-5 # The initial learning rate for the optimizer.

    max_grad_norm = 1.0
    num_train_epochs=1
    warmup_ratio = 0.1
    lr_scheduler_type = "cosine"
    max_seq_length = 1024

    # 3. Configure PEFT with LoraConfig
    # Crucially, include the embedding layers in modules_to_save

    peft_config = LoraConfig(r=32,
                            lora_alpha=64,
                            lora_dropout=0.05,
                            target_modules=["gate_proj", "q_proj", "o_proj", "k_proj", "down_proj", "up_proj", "v_proj"],
                            modules_to_save=["embed_tokens", "lm_head"],
                            task_type=TaskType.CAUSAL_LM)

    training_arguments = SFTConfig(
        output_dir=output_dir,
        do_train=True,
        per_device_train_batch_size=per_device_train_batch_size,
        do_eval=True,
        per_device_eval_batch_size=per_device_eval_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        save_strategy="no",
        eval_strategy="epoch",
        logging_steps=logging_steps,
        learning_rate=learning_rate,
        max_grad_norm=max_grad_norm,
        weight_decay=0.1,
        warmup_ratio=warmup_ratio,
        lr_scheduler_type=lr_scheduler_type,
        report_to="tensorboard",
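        # bf16 on an MPS device: the combination that this PR's availability check is meant to allow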
        bf16=True,
        use_mps_device=True,
        seed=123,
        hub_private_repo=False,
        push_to_hub=False,
        num_train_epochs=num_train_epochs,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
        #packing=True,
        max_length=max_seq_length,
        remove_unused_columns=False,
        dataset_text_field="text",
        optim="adamw_torch",
        adam_beta1=0.9,
        adam_beta2=0.95,
        adam_epsilon=1e-8,
        label_smoothing_factor=0.1,
    )
    ##############################################################################################################

    trainer = SFTTrainer(
        model=model,
        args=training_arguments,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
        processing_class=tokenizer,
        peft_config=peft_config,
    )

    ##############################################################################################################
    torch.mps.empty_cache()

    trainer.train()
    trainer.save_model(output_dir="./trainer_output")
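
Before calling train(), a quick sanity check along these lines can confirm that the MPS backend and bfloat16 are actually usable on the machine (a rough sketch, assuming a recent PyTorch build with MPS support):

import torch

# Rough sanity check (illustrative): make sure the MPS backend is built and
# available, and that a bfloat16 tensor can be allocated on it.
assert torch.backends.mps.is_built(), "PyTorch was compiled without MPS support"
assert torch.backends.mps.is_available(), "No MPS device available (needs Apple Silicon + recent macOS)"
x = torch.ones(2, 2, dtype=torch.bfloat16, device="mps")
print(x.dtype, x.device)  # expected: torch.bfloat16 mps:0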

Regards,

@ArthurZucker (Collaborator) left a comment

Hey! Can you just summarize: this is to enable bf16 training on the MPS device, right? is_torch_bf16_gpu_available() does not cover MPS, I suppose.

@debasisdwivedy (Author) commented Aug 21, 2025

Yes, correct. To make it work I have raised two PRs; both have to be merged for it to work:

ACCELERATE_PACKAGE
TRANSFORMER_PACKAGE

I tested it with a small dataset on my Mac and it worked.

If you have a Mac with more memory available, please test it before merging.

You can use the code provided and adjust the training parameters accordingly.

Regards,

@ArthurZucker (Collaborator)

Can you fix the quality test please?!

Signed-off-by: debasisdwivedy <[email protected]>
Development

Successfully merging this pull request may close these issues.

Still getting "fp16 mixed precision requires a GPU (not 'mps')." error
2 participants