
Conversation

@matsumotosan
Contributor

@matsumotosan matsumotosan commented Aug 14, 2025

What does this PR do?

Fixes #20450 #20058 #20643

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--21072.org.readthedocs.build/en/21072/

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Aug 14, 2025
@codecov

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 84.21053% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 87%. Comparing base (74b3fd5) to head (bfd8656).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #21072   +/-   ##
=======================================
  Coverage      87%      87%           
=======================================
  Files         269      269           
  Lines       23732    23744   +12     
=======================================
+ Hits        20557    20569   +12     
  Misses       3175     3175           

@matsumotosan matsumotosan marked this pull request as draft August 15, 2025 18:21
@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Aug 15, 2025
@matsumotosan matsumotosan force-pushed the weights-only-compatibility branch from d7cb702 to 601e300 on August 15, 2025 22:20
@matsumotosan matsumotosan marked this pull request as ready for review August 16, 2025 15:37
@matsumotosan matsumotosan changed the title from "Compatibility for weights_only=True by default" to "Compatibility for weights_only=True by default for loading weights" Aug 16, 2025
@matsumotosan
Contributor Author

@Borda I wanted to get your opinion on something before moving forward.

I've added weights_only as an argument to LightningModule.load_from_checkpoint and all downstream functions to allow users to determine which option they want to use to load checkpoints.

My issue right now is with resuming training from a checkpoint with Trainer.fit. I see a few options right now:

  1. Add weights_only as an argument to Trainer.fit (would also have to modify args for validate, test, and predict). Set default value to True.
  2. Use weights_only=True everywhere, and print an error message advising the user to set the TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD environment variable if they want to force loading with weights_only=False.
  3. Add weights_only as an argument to Trainer initialization. Easy, but would not allow fine-grained control on loading models between different calls of fit, validate, etc.

I'm leaning towards option 1, but it involves changing the Trainer methods' signatures, which affects a lot of code, so I wanted to run this by you beforehand.
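The three options above can be sketched roughly as follows. This is an illustrative, simplified sketch only (the function names `fit` and `_load_checkpoint` stand in for the real Trainer internals, which this PR actually touches); it just shows the flag-threading that option 1 implies:

```python
def _load_checkpoint(path, weights_only=True):
    # In Lightning this would ultimately call
    # torch.load(path, weights_only=weights_only).
    return {"path": path, "weights_only": weights_only}


def fit(ckpt_path=None, weights_only=True):
    # Option 1: a Trainer.fit-style entry point forwards the flag
    # unchanged down to the checkpoint-loading call. The same argument
    # would also be added to validate, test, and predict.
    if ckpt_path is not None:
        return _load_checkpoint(ckpt_path, weights_only=weights_only)
    return None
```

The cost, as noted, is that every public entry point grows an extra argument.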

@Borda
Collaborator

Borda commented Aug 18, 2025

My issue right now is with resuming training from a checkpoint with Trainer.fit. I see a few options right now:

  1. Add weights_only as an argument to Trainer.fit (would also have to modify args for validate, test, and predict). Set default value to True.
  2. Use weights_only=True everywhere, and print an error message advising the user to set the TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD environment variable if they want to force loading with weights_only=False.
  3. Add weights_only as an argument to Trainer initialization. Easy, but would not allow fine-grained control on loading models between different calls of fit, validate, etc.

The cleanest way would probably be 1), but it brings so many new arguments for marginal use... so personally I would go with 2)
cc: @lantiga
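Option 2 amounts to something like the sketch below: always default to `weights_only=True` and let the environment variable be the only escape hatch. This is a hypothetical simplification (torch.load itself also honors TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD internally; the accepted truthy values are an assumption here):

```python
import os


def resolve_weights_only():
    # Option 2: weights_only=True everywhere; users opt out only by
    # explicitly setting the TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD
    # environment variable before loading.
    if os.getenv("TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD", "").lower() in {"1", "y", "yes", "true"}:
        return False
    return True
```

The upside is that no public signatures change; the downside is that opting out is global rather than per-call.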

@deependujha
Collaborator

Seems like an actual issue rather than flaky test behavior

@deependujha deependujha merged commit 29abe6e into Lightning-AI:master Nov 7, 2025
112 checks passed
@matsumotosan matsumotosan deleted the weights-only-compatibility branch November 7, 2025 13:34
nathanpainchaud added a commit to nathanpainchaud/lightning-hydra-template that referenced this pull request Nov 28, 2025
Set `weights_only=False` when loading ckpts, since Lightning now defers to torch's default (`True`)
* See PR on this change: Lightning-AI/pytorch-lightning#21072

---------

Co-authored-by: Nathan Painchaud <[email protected]>
cathalobrien added a commit to ecmwf/anemoi-core that referenced this pull request Nov 28, 2025
## Description
[this](Lightning-AI/pytorch-lightning#21072) change in PyTorch Lightning 2.6.0 means we have to explicitly specify `weights_only=False` when calling `BaseGraphModule.load_from_checkpoint` (nice spot, Ana!)

***As a contributor to the Anemoi framework, please ensure that your
changes include unit tests, updates to any affected dependencies and
documentation, and have been tested in a parallel setting (i.e., with
multiple GPUs). As a reviewer, you are also responsible for verifying
these aspects and requesting changes if they are not adequately
addressed. For guidelines about those please refer to
https://anemoi.readthedocs.io/en/latest/***

By opening this pull request, I affirm that all authors agree to the
[Contributor License
Agreement.](https://github.com/ecmwf/codex/blob/main/Legal/contributor_license_agreement.md)
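The downstream adjustment both of these referenced commits make looks roughly like this. It is a sketch under assumptions: `load_pretrained`, `model_cls`, and the checkpoint path are placeholders, not APIs from the projects above:

```python
def load_pretrained(model_cls, ckpt_path):
    # After this PR, Lightning defers to torch's default
    # (weights_only=True), so checkpoints that contain arbitrary
    # pickled objects (e.g. custom classes in hyperparameters) need
    # weights_only=False passed explicitly and deliberately.
    return model_cls.load_from_checkpoint(ckpt_path, weights_only=False)
```

Passing `weights_only=False` should only be done for checkpoints from a trusted source, since it restores full unpickling.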

Labels

ci (Continuous Integration), dependencies (Pull requests that update a dependency file), dockers, fabric (lightning.fabric.Fabric), package, pl (Generic label for PyTorch Lightning package)


Development

Successfully merging this pull request may close these issues.

Make sure the upcoming change in the default for weights_only from False to True is handled correctly
