@MatKbauer
Contributor

Description

This PR assembles the separate PRs of the latent diffusion forecast engine to test their compatibility and run first explorations.

Issue Number

Closes #1300

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

sophie-xhonneux and others added 30 commits October 30, 2025 17:27
Implemented Identity class

TODO: implement EMATeacher
The big question on the EMA teacher side to me is how to allow for
flexible teacher and student architectures that can differ
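
A minimal sketch of one way an EMA teacher could tolerate differing teacher and student architectures, assuming parameters are matched by name and only the shared ones are blended. The function `ema_update`, the parameter names, and the "keep teacher-only parameters unchanged" policy are hypothetical illustrations, not taken from this PR, and plain floats stand in for tensors.

```python
from typing import Dict


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    decay: float = 0.99,
) -> Dict[str, float]:
    """Return new teacher parameters after one EMA step.

    Assumption (not from the PR): parameters are matched by name.
    Names present in both models are blended; teacher-only names are
    kept unchanged and student-only names are ignored.
    """
    updated = dict(teacher)
    for name in teacher.keys() & student.keys():
        updated[name] = decay * teacher[name] + (1.0 - decay) * student[name]
    return updated


# Hypothetical parameter names; the two models share only the encoder weight.
teacher = {"encoder.w": 1.0, "teacher_head.w": 5.0}
student = {"encoder.w": 0.0, "student_head.w": 3.0}
new_teacher = ema_update(teacher, student, decay=0.9)
```

Matching by name keeps the update well defined when the heads differ, at the cost of silently skipping parameters whose names do not line up.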

We updated some APIs of the abstract base class to allow the ema_model
forward, subject to change given the loss calculator, which is imho the
second big question mark
Easier to read and as batchsize gets more complicated in SSL this will
be a useful abstraction
It runs so far. Next steps:
 - Route all the config options
 - Start writing the loss functions to understand the state requirements
…andom and healpix masking. Open issues with _coords_local, centroids and probably other things.
TODO:
- Forecast still needs to be adapted
- Some more cleanup of variable naming, return values etc
clessig and others added 30 commits December 3, 2025 00:20
* Add to device to ModelBatch, etc & adapt model

TODO adapt validate and inference
TODO test forecasting and multiple streams because predict changed
substantially

* Rename view to sample and fix validate

* Revert predict function and fix inference

* Fix invalid access with mask

* Linting

* Fixed handling of target_idxs and other minor issues

---------

Co-authored-by: sophiex <[email protected]>
Co-authored-by: Christian Lessig <[email protected]>
… up fixes required to handle partially filled source/target streams (e.g. because source has no target values).
…tion (#1397)

* initial changes

* more changes

* removed extra print parameters statement

* changed names for backward checkpoint loading

* added encoder. to module names in sharding

* adding encoder. to embed_engine

* added back the conditions for param printing

* lint

* forecast config

* switch back to MTM config

* lint
…ough this.

This commit also changes the number of forecast steps that are taken. The old loop ran at least one step too far. It is unclear why the problem occurred only now.
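
The kind of off-by-one described in this commit message can be illustrated with a small rollout loop. This is a sketch only: the old loop is not shown in this conversation, and `rollout`, `step_fn`, and `forecast_steps` are hypothetical names rather than identifiers from the repository.

```python
from typing import Callable, List


def rollout(
    step_fn: Callable[[float], float],
    state: float,
    forecast_steps: int,
) -> List[float]:
    """Apply step_fn exactly forecast_steps times and collect each forecast."""
    outputs: List[float] = []
    # Iterating over range(forecast_steps + 1) instead would take one
    # step too many, which is the class of bug the message describes.
    for _ in range(forecast_steps):
        state = step_fn(state)
        outputs.append(state)
    return outputs


# Toy dynamics: each step adds 1.0 to the state.
forecasts = rollout(lambda x: x + 1.0, state=0.0, forecast_steps=3)
```

Checking that the number of outputs equals the configured number of forecast steps is a cheap guard against this kind of regression.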

Labels

model:rollout, model (related to model training or definition, not generic infra)

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Assemble latent diffusion forecast engine

9 participants