SAFE: Stratified Assessments of Forecasts over Earth


Installation

```bash
pip install safe-earth
```

To build from source instead:

```bash
# get repo
git clone git@github.com:N-Masi/safe.git
cd safe

# create dev environment
conda create -n safe.env python
conda activate safe.env
pip install -r requirements.txt
conda install --channel conda-forge pygmt plotly typing_extensions
```

Basic Usage

There are 3 basic steps to any SAFE pipeline:

  1. Measure loss: any function that operates on each prediction $\hat{y}$ and its ground truth $y$. A loss is calculated for every prediction a given model makes at every combination of gridpoint, timestamp, lead time, variable, and vertical level.

    Example: the latitude-weighted squared difference of $\hat{y}$ and $y$.

  2. Measure stratified error: any function that reduces the losses across the gridpoints in each stratum to a single error metric for that stratum.

    Example: RMSE.

  3. Measure fairness: any function that operates on a set of stratified errors. A fairness metric is calculated for each combination of model and attribute (e.g., the fairness of GraphCast's predictions when stratified by territory).

    Example: the greatest absolute difference between the RMSEs of any two strata.
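As a rough illustration of the three steps above, the following sketch runs them on a handful of made-up gridpoints with NumPy and pandas. It is not SAFE's API: the toy data, the column name weighted_se, and the choice to normalize cos-latitude weights to mean 1 over the grid are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy inputs (hypothetical): 4 gridpoints at a single timestamp, lead time,
# variable, and vertical level, each assigned to stratum "A" or "B".
lat = np.array([0.0, 0.0, 60.0, 60.0])
y_true = np.array([280.0, 275.0, 260.0, 262.0])  # ground truth y
y_pred = np.array([281.0, 274.0, 263.0, 259.0])  # prediction y_hat
stratum = np.array(["A", "A", "B", "B"])

# Step 1 (loss): latitude-weighted squared difference at every gridpoint.
# Assumption: weights are cos(latitude), normalized to mean 1 over the grid.
weights = np.cos(np.deg2rad(lat))
weights = weights / weights.mean()
losses = pd.DataFrame(
    {"stratum": stratum, "weighted_se": weights * (y_pred - y_true) ** 2}
)

# Step 2 (stratified error): RMSE, i.e. the square root of the mean loss
# over the gridpoints that fall in each stratum.
rmse = losses.groupby("stratum")["weighted_se"].mean().pow(0.5)

# Step 3 (fairness): greatest absolute difference between any two strata's
# RMSEs, which is simply the largest RMSE minus the smallest.
fairness = float(rmse.max() - rmse.min())

print(rmse.to_dict(), fairness)
```

Because step 3 only sees the per-stratum RMSEs, a different fairness function (for example, the ratio of the largest to the smallest RMSE) could be swapped in without touching steps 1 and 2.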

The stratified errors and fairness metrics are the most informative outputs. Errors show how well a particular model performs in a specific stratum, which can help decision makers determine which model is most accurate for their country or region. Fairness metrics summarize the overall amount of bias in a model across the strata of an attribute.

For now, loss functions should create dataframes with a column holding the output of the function, and the name of that column is passed to the error function. Calls to src/safe_earth/metrics/fairness.measure_fairness, by contrast, take the fairness functions in as objects and run them all internally. The first major version of the package will extend this paradigm to errors as well, taking loss functions in as parameters.
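A minimal sketch of this division of labor, assuming pandas: the loss step writes its output into a named column, the error step is told that column's name, and a collection of fairness functions is passed in as objects and applied to every (model, attribute) group. The helpers stratified_rmse, greatest_abs_diff, max_min_ratio, and fairness_table are hypothetical and do not reproduce the actual signature of measure_fairness.

```python
from typing import Callable, Dict

import pandas as pd

# Hypothetical type for a fairness function: stratum errors in, one number out.
FairnessFn = Callable[[pd.Series], float]


def stratified_rmse(losses: pd.DataFrame, loss_col: str) -> pd.DataFrame:
    """Hypothetical error step: reduce the named loss column to one RMSE per stratum."""
    return (
        losses.groupby(["model", "attribute", "stratum"])[loss_col]
        .mean()
        .pow(0.5)
        .rename("rmse")
        .reset_index()
    )


def greatest_abs_diff(errors: pd.Series) -> float:
    """Largest absolute difference between any two stratum errors."""
    return float(errors.max() - errors.min())


def max_min_ratio(errors: pd.Series) -> float:
    """Ratio of the worst stratum error to the best one."""
    return float(errors.max() / errors.min())


def fairness_table(errors: pd.DataFrame, fns: Dict[str, FairnessFn]) -> pd.DataFrame:
    """Hypothetical fairness step: run every function on each (model, attribute) group."""
    rows = []
    for (model, attribute), group in errors.groupby(["model", "attribute"]):
        row = {"model": model, "attribute": attribute}
        for name, fn in fns.items():
            row[name] = fn(group["rmse"])
        rows.append(row)
    return pd.DataFrame(rows)


# Toy per-gridpoint losses (made-up values). "weighted_se" is the column the
# loss step produced; its name is handed to the error step.
losses = pd.DataFrame({
    "model": ["m1"] * 4 + ["m2"] * 4,
    "attribute": ["territory"] * 8,
    "stratum": ["A", "A", "B", "B"] * 2,
    "weighted_se": [1.0, 1.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0],
})
errors = stratified_rmse(losses, loss_col="weighted_se")
fair = fairness_table(
    errors, {"greatest_abs_diff": greatest_abs_diff, "max_min_ratio": max_min_ratio}
)
print(fair)
```

Passing the fairness functions as objects means a new metric only has to accept a series of stratum errors; the grouping over models and attributes stays in one place.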

Demos

An example of using SAFE to collect metrics on 6 AI weather prediction (AIWP) models across the territory, subregion, income, and landcover attributes is available in demos/iclr_workflow.py. It generates error and fairness data by evaluating the models on 2020 ERA5 data.

To see the type of analysis that can be performed with this data, you can reproduce the figures and tables from the paper by running demos/iclr_figs.py and demos/iclr_tables.py, respectively.

An interactive notebook utilizing SAFE to investigate territorial disparities is available in demos/interactive_demo.ipynb.

Data Notes

To unify the coordinate system across all integrated data sources, latitude ranges over [-90, 90] with index 0 at -90, ascending from south to north. Longitude ranges over [-180, 180), but index 0 is at 0: values increase eastward to just below 180, wrap around to -180 in the middle of the array, and then increase back toward 0. This convention is used because metadata sourced from pygeoboundaries_geolab follows it, and it is easiest to bring the tabular data into conformance with it.
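The sketch below illustrates this convention on an assumed regular 1-degree grid with NumPy. The 0-to-359 input grid and the helper to_convention_lat are hypothetical; the point is that relabeling longitudes with ((lon + 180) % 360) - 180 produces the wraparound ordering without moving any data, whereas a latitude axis stored north-to-south has to be flipped.

```python
from typing import Tuple

import numpy as np

# Latitude (assumed 1-degree grid): ascending, index 0 at -90.
lat = np.arange(-90.0, 91.0, 1.0)

# Longitude: a hypothetical source grid running 0..359 is relabeled in place,
# giving 0, 1, ..., 179, -180, -179, ..., -1 (index 0 stays at 0 degrees).
lon_0_360 = np.arange(0.0, 360.0, 1.0)
lon = ((lon_0_360 + 180.0) % 360.0) - 180.0


def to_convention_lat(
    values: np.ndarray, lat_in: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: flip a (lat, lon) field stored north-to-south."""
    if lat_in[0] > lat_in[-1]:
        return values[::-1, :], lat_in[::-1]
    return values, lat_in


# Usage: a field stored north-to-south where each row holds its own latitude.
lat_ns = lat[::-1]
field = np.repeat(lat_ns[:, None], lon.size, axis=1)
field_sn, lat_sn = to_convention_lat(field, lat_ns)
print(lon[:2], lon[179:181], lon[-1], field_sn[0, 0])
```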

Testing

Run pytest from the root of the repository in a Python environment that has pytest installed.

Citation

If you use SAFE in your work, please cite us!

```bibtex
@article{masi2025safe,
  title={SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth},
  author={Masi, Nick and Balestriero, Randall},
  journal={arXiv preprint arXiv:2510.26099},
  year={2025}
}
```
