Releases: Lightning-AI/pytorch-lightning
Lightning v2.5.5
Changes in 2.5.5
PyTorch Lightning
Changed
Fixed
- Fixed LightningCLI not using ckpt_path hyperparameters to instantiate classes (#21116)
- Fixed ModelCheckpoint callbacks by deferring step/time-triggered saves until validation metrics are available (#21106)
- Fixed a missing device id for PyTorch 2.8 (#21105)
- Fixed TQDMProgressBar not resetting correctly when using both a finite and an iterable dataloader (#21147)
- Fixed cleanup of temporary files from Tuner on crashes (#21162)
Lightning Fabric
Changed
Fixed
Full commit list: 2.5.4 -> 2.5.5
Contributors
We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make Lightning ⚡ better for everyone, nice job!
In particular, we would like to thank the authors of the pull-requests above, in no particular order:
@Borda, @KAVYANSHTYAGI, @littlebullGit, @mauvilsa, @SkafteNicki, @taozhiwei
Thank you ❤️ and we hope you'll keep them coming!
Lightning v2.5.4
Changes in 2.5.4
PyTorch Lightning
Fixed
- Fixed AsyncCheckpointIO to snapshot tensors, avoiding a race with parameter mutation (#21079)
- Fixed AsyncCheckpointIO threadpool exception when calling fit or validate more than once (#20952)
- Fixed learning rate not being correctly set after using the LearningRateFinder callback (#21068)
- Fixed column misalignment in the rich model summary when using the DeepSpeed strategy (#21100)
- Fixed RichProgressBar crashing during sanity checking with a zero-length val dataloader (#21108)
Lightning Fabric
Changed
- Added support for NVIDIA H200 GPUs in get_available_flops (#20913); see the sketch below
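For context, here is a minimal, hedged sketch of querying the theoretical peak FLOPs Fabric knows for the current accelerator. It assumes get_available_flops is importable from lightning.fabric.utilities.throughput and accepts a torch.device plus a dtype; treat the import path and signature as assumptions rather than documented API.

# Hedged sketch: ask Fabric for the theoretical peak FLOPs of the current GPU.
# Assumption: get_available_flops(device, dtype) lives in
# lightning.fabric.utilities.throughput and returns None for unknown hardware.
import torch
from lightning.fabric.utilities.throughput import get_available_flops

if torch.cuda.is_available():
    flops = get_available_flops(torch.device("cuda"), torch.bfloat16)
    print(f"Estimated peak bf16 FLOPs: {flops}")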
Full commit list: 2.5.3 -> 2.5.4
Contributors
We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make Lightning ⚡ better for everyone, nice job!
In particular, we would like to thank the authors of the pull-requests above, in no particular order:
@fnhirwa, @GdoongMathew, @jjh42, @littlebullGit, @SkafteNicki
Thank you ❤️ and we hope you'll keep them coming!
Lightning v2.5.3
Notable changes in this release
PyTorch Lightning
Changed
- Added save_on_exception option to the ModelCheckpoint callback (#20916); see the sketch after this list
- Allow dataloader_idx_ in log names when add_dataloader_idx=False (#20987)
- Allow returning ONNXProgram when calling to_onnx(dynamo=True) (#20811)
- Extended support for general mappings being returned from training_step when using manual optimization (#21011)
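As a rough illustration of the save_on_exception option above, here is a minimal sketch. It assumes the option is a boolean keyword argument on ModelCheckpoint that writes a checkpoint when an exception interrupts training; the directory name is arbitrary.

# Hedged sketch: ask ModelCheckpoint to also save when training crashes.
# Assumption: save_on_exception is a boolean keyword argument (per the changelog entry).
import lightning as L
from lightning.pytorch.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",   # arbitrary example path
    save_on_exception=True,   # persist a checkpoint if fit() is interrupted by an exception
)
trainer = L.Trainer(max_epochs=10, callbacks=[checkpoint_cb])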
Fixed
- Fixed the Trainer to accept a CUDAAccelerator instance as accelerator with the FSDP strategy (#20964)
- Fixed progress bar console clearing for Rich 14.1+ (#21016)
- Fixed AdvancedProfiler to handle nested profiling actions for Python 3.12+ (#20809)
- Fixed rich progress bar error when resuming training (#21000)
- Fixed double iteration bug when resuming from a checkpoint (#20775)
- Fixed support for more dtypes in ModelSummary (#21034)
- Fixed metrics in RichProgressBar being updated according to the user-provided refresh_rate (#21032)
- Fixed save_last behavior in the absence of validation (#20960)
- Fixed integration between LearningRateFinder and EarlyStopping (#21056)
- Fixed gradient calculation in lr_finder for mode="exponential" (#21055)
- Fixed save_hyperparameters crashing with dataclasses using init=False fields (#21051)
Lightning Fabric
Changed
Fixed
Full commit list: 2.5.2 -> 2.5.3
Contributors
We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make Lightning ⚡ better for everyone, nice job!
In particular, we would like to thank the authors of the pull-requests above, in no particular order:
@baskrahmer, @bhimrazy, @deependujha, @fnhirwa, @GdoongMathew, @jonathanking, @relativityhd, @rittik9, @SkafteNicki, @sudiptob2, @vsey, @YgLK
Thank you ❤️ and we hope you'll keep them coming!
Lightning v2.5.2
Notable changes in this release
PyTorch Lightning
Changed
- Added enable_autolog_hparams argument to the Trainer (#20593)
- Added toggled_optimizer(optimizer) method to the LightningModule, a context manager version of toggle_optimizer and untoggle_optimizer (#20771); see the sketch after this list
- For cross-device local checkpoints, instruct users to install fsspec>=2025.5.0 if unavailable (#20780)
- Check that the param is of nn.Parameter type for pruning sanitization (#20783)
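To make the toggled_optimizer addition above concrete, here is a minimal sketch of a manual-optimization module with two optimizers. The module layout and loss are illustrative only; the assumption is that toggled_optimizer accepts an optimizer returned by self.optimizers() and restores the previous requires_grad state on exit, as the changelog entry describes.

# Hedged sketch: use toggled_optimizer as a context manager during manual optimization.
import lightning as L
import torch
import torch.nn as nn


class TwoOptimizerModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.gen = nn.Linear(32, 32)   # illustrative submodules
        self.disc = nn.Linear(32, 1)

    def training_step(self, batch):
        opt_g, opt_d = self.optimizers()
        # Only the generator's parameters require grad inside this block;
        # the previous requires_grad state is restored when the block exits.
        with self.toggled_optimizer(opt_g):
            loss_g = self.gen(batch).pow(2).mean()
            self.manual_backward(loss_g)
            opt_g.step()
            opt_g.zero_grad()

    def configure_optimizers(self):
        return (
            torch.optim.SGD(self.gen.parameters(), lr=0.01),
            torch.optim.SGD(self.disc.parameters(), lr=0.01),
        )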
Fixed
- Fixed save_hyperparameters not working correctly with LightningCLI when parsing links are applied on instantiation (#20777)
- Fixed an edge case in logger_connector where the step could be a float (#20692)
- Fixed SIGTERM handling in DDP by synchronizing it to prevent deadlocks (#20825)
- Fixed case-sensitive model name (#20661)
- CLI: resolved a jsonargparse deprecation warning (#20802)
- Fixed moving check_inputs to the target device, if available, during to_torchscript (#20873)
- Fixed progress bar display to correctly handle iterable datasets and max_steps during training (#20869)
- Fixed a problem with silently supporting jsonnet (#20899)
Lightning Fabric
Changed
- Ensure correct device is used for autocast when mps is selected as Fabric accelerator (#20876)
Fixed
- Fixed TransformerEnginePrecision conversion for layers with bias=False (#20805)
Full commit list: 2.5.1 -> 2.5.2
Contributors
We thank all folks who submitted issues, features, fixes, and doc changes. It's the only way we can collectively make Lightning ⚡ better for everyone, nice job!
In particular, we would like to thank the authors of the pull-requests above, in no particular order:
@adamjstewart, @Armannas, @bandpooja, @Borda, @chanokin, @duydl, @GdoongMathew, @KAVYANSHTYAGI, @mauvilsa, @muthissar, @rustamzh, @siemdejong
Thank you ❤️ and we hope you'll keep them coming!
Lightning v2.5.1.post
Full Changelog: 2.5.1...2.5.1.post0
Lightning v2.5.1
Changes
PyTorch Lightning
Changed
- Allow LightningCLI to use a customized argument parser class (#20596)
- Changed wandb default x-axis to tensorboard's global_step when sync_tensorboard=True (#20611)
- Added a new checkpoint_path_prefix parameter to the MLflow logger, which controls the path where the MLflow artifacts for model checkpoints are stored (#20538); see the sketch after this list
- Updated the CometML logger to support the recent Comet SDK (#20275)
- Bumped testing to the latest torch 2.6 (#20509)
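As a rough illustration of the checkpoint_path_prefix parameter above, here is a minimal sketch. It assumes the parameter is a plain string prefix under which checkpoint artifacts are logged; the experiment name and prefix shown are arbitrary.

# Hedged sketch: prefix the artifact path used for model checkpoints in MLflow.
# Assumption: checkpoint_path_prefix is a plain string prefix (per the changelog entry).
import lightning as L
from lightning.pytorch.loggers import MLFlowLogger

logger = MLFlowLogger(
    experiment_name="my-experiment",            # arbitrary example name
    checkpoint_path_prefix="prod/checkpoints",  # checkpoint artifacts land under this prefix
)
trainer = L.Trainer(logger=logger, max_epochs=5)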
Fixed
- Fixed CSVLogger logging hyperparameters at every write, which increased latency (#20594)
- Fixed OverflowError when resuming from a checkpoint with an iterable dataset (#20565)
- Fixed swapped _R_co and _P to prevent a type error (#20508)
- Always call WandbLogger.experiment first in _call_setup_hook to ensure tensorboard logs can sync to wandb (#20610)
- Fixed the TBPTT example (#20528)
- Fixed test compatibility as AdamW became a subclass of Adam (#20574)
- Fixed the file extension of model checkpoints uploaded by NeptuneLogger (#20581)
- Reset the trainer variable should_stop when fit is called (#19177)
- Fixed WandbLogger to upload models from all ModelCheckpoint callbacks, not just one (#20191)
- Fixed an error when logging to a deleted MLFlow experiment (#20556)
Lightning Fabric
Changed
Removed
- Removed legacy support for lightning run model; use fabric run instead (#20588)
Full commit list: 2.5.0 -> 2.5.1
Contributors
We thank all folks who submitted issues, features, fixes and doc changes. It's the only way we can collectively make Lightning ⚡ better for everyone, nice job!
In particular, we would like to thank the authors of the pull-requests above, in no particular order:
@benglewis, @Borda, @cgebbe, @duydl, @haifeng-jin, @japdubengsub, @justusschock, @lantiga, @mauvilsa, @millskyle, @ringohoffman, @ryan597, @senarvi, @TresYap
Thank you ❤️ and we hope you'll keep them coming!
Lightning v2.5 post0
Full Changelog: 2.5.0...2.5.0.post0
Lightning v2.5
Lightning AI ⚡ is excited to announce the release of Lightning 2.5.
Lightning 2.5 comes with improvements on several fronts, with zero API changes. Our users love it stable, we keep it stable 😄.
Talking about love ❤️, the lightning, pytorch-lightning and lightning-fabric packages are collectively getting more than 10M downloads per month 😮, for a total of over 180M downloads 🤯 since the early days. It's incredible to see PyTorch Lightning getting such strong adoption across industry and the sciences.
Release 2.5 embraces PyTorch 2.5, and it marks some of its more recent directions as officially supported, namely tensor subclass-based APIs like Distributed Tensors and TorchAO, in combination with torch.compile.
Here are a couple of examples:
Distributed FP8 transformer with PyTorch Lightning
Full example here
import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
from lightning.pytorch.demos import Transformer, WikiText2
from lightning.pytorch.strategies import ModelParallelStrategy
from torch.distributed._composable.fsdp.fully_shard import fully_shard
from torch.utils.data import DataLoader
from torchao.float8 import Float8LinearConfig, convert_to_float8_training


class LanguageModel(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.vocab_size = vocab_size
        self.model = None

    def configure_model(self):
        if self.model is not None:
            return

        with torch.device("meta"):
            model = Transformer(
                vocab_size=self.vocab_size,
                nlayers=16,
                nhid=4096,
                ninp=1024,
                nhead=32,
            )

        float8_config = Float8LinearConfig(
            # pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly  # noqa
            pad_inner_dim=True,
        )

        def module_filter_fn(mod: torch.nn.Module, fqn: str):
            # skip the decoder: its vocabulary size is typically
            # not divisible by 16 as required by float8
            return fqn != "decoder"

        convert_to_float8_training(model, config=float8_config, module_filter_fn=module_filter_fn)

        for module in model.modules():
            if isinstance(module, (nn.TransformerEncoderLayer, nn.TransformerDecoderLayer)):
                fully_shard(module, mesh=self.device_mesh)

        fully_shard(model, mesh=self.device_mesh)

        self.model = torch.compile(model)

    def training_step(self, batch):
        input, target = batch
        output = self.model(input, target)
        loss = F.nll_loss(output, target.view(-1))
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)


def train():
    L.seed_everything(42)

    dataset = WikiText2()
    train_dataloader = DataLoader(dataset, num_workers=8, batch_size=1)

    model = LanguageModel(vocab_size=dataset.vocab_size)

    mp_strategy = ModelParallelStrategy(
        data_parallel_size=4,
        tensor_parallel_size=1,
    )

    trainer = L.Trainer(strategy=mp_strategy, max_steps=100, precision="bf16-true", accumulate_grad_batches=8)

    trainer.fit(model, train_dataloader)

    trainer.print(torch.cuda.memory_summary())


if __name__ == "__main__":
    torch.set_float32_matmul_precision("high")

    train()

Distributed FP8 transformer with Fabric
Full example here
import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
from lightning.fabric.strategies import ModelParallelStrategy
from lightning.pytorch.demos import Transformer, WikiText2
from torch.distributed._composable.fsdp.fully_shard import fully_shard
from torch.distributed.device_mesh import DeviceMesh
from torch.utils.data import DataLoader
from torchao.float8 import Float8LinearConfig, convert_to_float8_training
from tqdm import tqdm


def configure_model(model: nn.Module, device_mesh: DeviceMesh) -> nn.Module:
    float8_config = Float8LinearConfig(
        # pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly  # noqa
        pad_inner_dim=True,
    )

    def module_filter_fn(mod: torch.nn.Module, fqn: str):
        # skip the decoder: its vocabulary size is typically
        # not divisible by 16 as required by float8
        return fqn != "decoder"

    convert_to_float8_training(model, config=float8_config, module_filter_fn=module_filter_fn)

    for module in model.modules():
        if isinstance(module, (torch.nn.TransformerEncoderLayer, torch.nn.TransformerDecoderLayer)):
            fully_shard(module, mesh=device_mesh)

    fully_shard(model, mesh=device_mesh)

    return torch.compile(model)


def train():
    L.seed_everything(42)

    batch_size = 8
    micro_batch_size = 1
    max_steps = 100

    dataset = WikiText2()
    dataloader = DataLoader(dataset, num_workers=8, batch_size=micro_batch_size)

    with torch.device("meta"):
        model = Transformer(
            vocab_size=dataset.vocab_size,
            nlayers=16,
            nhid=4096,
            ninp=1024,
            nhead=32,
        )

    strategy = ModelParallelStrategy(data_parallel_size=4, tensor_parallel_size=1, parallelize_fn=configure_model)

    fabric = L.Fabric(precision="bf16-true", strategy=strategy)
    fabric.launch()

    model = fabric.setup(model)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    optimizer = fabric.setup_optimizers(optimizer)

    dataloader = fabric.setup_dataloaders(dataloader)

    iterable = tqdm(enumerate(dataloader), total=len(dataloader)) if fabric.is_global_zero else enumerate(dataloader)

    steps = 0

    for i, batch in iterable:
        input, target = batch

        is_accumulating = i % (batch_size // micro_batch_size) != 0

        with fabric.no_backward_sync(model, enabled=is_accumulating):
            output = model(input, target)
            loss = F.nll_loss(output, target.view(-1))
            fabric.backward(loss)

        if not is_accumulating:
            fabric.clip_gradients(model, optimizer, max_norm=1.0)
            optimizer.step()
            optimizer.zero_grad()
            steps += 1

        if fabric.is_global_zero:
            iterable.set_postfix_str(f"train_loss={loss.item():.2f}")

        if steps == max_steps:
            break

    fabric.print(torch.cuda.memory_summary())


if __name__ == "__main__":
    torch.set_float32_matmul_precision("high")

    train()

As these examples show, it's now easier than ever to take your PyTorch Lightning module and run it with FSDP2 and/or tensor parallelism in FP8 precision, using the ModelParallelStrategy we introduced in 2.4.
Also note the use of distributed tensor APIs, TorchAO APIs, and torch.compile directly in the configure_model hook (or in the parallelize function in Fabric's ModelParallelStrategy), as opposed to applying them to the LightningModule as a whole. The advantage of this approach is that you can copy-paste the parallelize functions that come with native PyTorch models directly into configure_model and get the same effect, no head-scratching involved 🤓.
Talking about head scratching, we also made a pass over the PyTorch Lightning internals and hardened the parts that keep track of progress counters during training, validation, and testing, as well as learning rate scheduling, in relation to resuming from checkpoints. We made sure there are (to the best of our knowledge) no edge cases where stopping and resuming from checkpoints can change the sequence of loops or other internal states. Fault tolerance for the win 🥳!
Alright! Feel free to take a look at the full changelog below.
And of course: the best way to use PyTorch Lightning and Fabric is through Lightning Studio ⚡. Access GPUs, train models, deploy and more with zero setup. Focus on data and models - not infrastructure.
Changes
PyTorch Lightning
Added
- Added step parameter to TensorBoardLogger.log_hyperparams to visualize changes during training (#20176); see the sketch after this list
- Added str method to datamodule (#20301)
- Added timeout to DeepSpeedStrategy (#20474)
- Added doc for Truncated Back-Propagation Through Time (#20422)
- Added FP8 + FSDP2 + torch.compile examples for PyTorch Lightning (#20440)
- Added profiling to Trainer.save_checkpoint (#20405)
- Added after_instantiate_classes hook to CLI (#20401)
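As a rough illustration of the step parameter above, here is a minimal sketch. It assumes step is an optional integer keyword argument on TensorBoardLogger.log_hyperparams, as the changelog entry suggests; the hyperparameter values are arbitrary.

# Hedged sketch: log hyperparameters at specific global steps to visualize changes over training.
# Assumption: step is an optional integer keyword argument (per the changelog entry).
from lightning.pytorch.loggers import TensorBoardLogger

logger = TensorBoardLogger("tb_logs", name="demo")
logger.log_hyperparams({"lr": 1e-3, "batch_size": 32}, step=0)     # initial values
logger.log_hyperparams({"lr": 5e-4, "batch_size": 32}, step=1000)  # after a schedule change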
Lightning 2.5 RC
2.5.0rc0 Bump to 2.5.0rc0 (#20493)
Lightning v2.4
Lightning AI ⚡ is excited to announce the release of Lightning 2.4. This is mainly a compatibility upgrade for PyTorch 2.4 and Python 3.12, with a sprinkle of a few features and bug fixes.
Did you know? The Lightning philosophy extends beyond a boilerplate-free deep learning framework: We've been hard at work bringing you Lightning Studio. Code together, prototype, train, deploy, host AI web apps. All from your browser, with zero setup.
Changes
PyTorch Lightning
Added
- Made saving non-distributed checkpoints fully atomic (#20011)
- Added dump_stats flag to AdvancedProfiler (#19703); see the sketch after this list
- Added a verbose flag to the seed_everything() function (#20108)
- Added support for PyTorch 2.4 (#20010)
- Added support for Python 3.12 (#20078)
- The TQDMProgressBar now provides an option to retain prior training epoch bars (#19578)
- Added the count of modules in train and eval mode to the printed ModelSummary table (#20159)
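To make the dump_stats and verbose additions above concrete, here is a minimal sketch. It assumes both are boolean flags, per the changelog entries; the profiler output paths are arbitrary.

# Hedged sketch: suppress the seed printout and dump AdvancedProfiler stats to disk.
# Assumption: verbose and dump_stats are boolean flags (per the changelog entries).
import lightning as L
from lightning.pytorch.profilers import AdvancedProfiler

L.seed_everything(42, workers=True, verbose=False)  # no "Seed set to 42" log line

profiler = AdvancedProfiler(dirpath="profiler_logs", filename="perf", dump_stats=True)
trainer = L.Trainer(profiler=profiler, max_epochs=1)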
Changed
- Triggering KeyboardInterrupt (Ctrl+C) during .fit(), .evaluate(), .test() or .predict() now terminates all processes launched by the Trainer and exits the program (#19976)
- Changed the implementation of how seeds are chosen for dataloader workers when using seed_everything(..., workers=True) (#20055)
- NumPy is no longer a required dependency (#20090)
Fixed
- Avoid LightningCLI saving hyperparameters with class_path and init_args since this would be a breaking change (#20068)
- Fixed an issue that would cause too many printouts of the seed info when using seed_everything() (#20108)
- Fixed _LoggerConnector's _ResultMetric to move all registered keys to the device of the logged value if needed (#19814)
- Fixed _optimizer_to_device logic for the special 'step' key in optimizer state causing a performance regression (#20019)
- Fixed parameter counts in ModelSummary when the model has distributed parameters (DTensor) (#20163)
Lightning Fabric
Added
Changed
Fixed
Full commit list: 2.3.0 -> 2.4.0
Contributors
We thank all our contributors who submitted pull requests for features, bug fixes and documentation updates.
New Contributors
- @SamuelLarkin made their first contribution in #19969
- @liambsmith made their first contribution in #19986
- @EtayLivne made their first contribution in #19915
- @elmuz made their first contribution in #19998
- @swyo made their first contribution in #19982
- @corwinjoy made their first contribution in #20011
- @omahs made their first contribution in #19979
- @linbo0518 made their first contribution in #20040
- @01AbhiSingh made their first contribution in #20055
- @K-H-Ismail made their first contribution in #20099
- @adosar made their first contribution in #20146
- @jojje made their first contribution in #19578
Did you know?
Chuck Norris can solve NP-hard problems in polynomial time. In fact, any problem is easy when Chuck Norris solves it.