Data repository with sample data for the Pythia Foundations book.
These files are used as sample data in Pythia Foundations and are downloaded by the `pythia_datasets` package:

- `NARR_19930313_0000.nc`
- `enso_data.csv`
- `jan-17-co-asos.txt.xz`
- `CESM2_sst_data.nc`
- `CESM2_grid_variables.nc`
- `daymet_v4_precip_sept_2013.nc`

The scope of this data collection is to serve examples for Pythia Foundations. If you are adding new content to Foundations that requires a new dataset file, please follow these steps:

- Add the dataset file to the `data/` directory
- From the command line, run the `python make_registry.py` script to update the registry file residing in `pythia_datasets/registry.txt`
- Commit and push your changes to GitHub

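The registry update step can be pictured with a small sketch. It assumes the registry is a plain-text file with one `filename checksum` line per data file, using SHA-256 checksums; that format and the helper names `sha256_of` and `build_registry_lines` are illustrative assumptions, and the actual `make_registry.py` may work differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_registry_lines(data_dir: Path) -> list[str]:
    """Build one 'filename checksum' line per regular file (assumed format)."""
    return [
        f"{p.name} {sha256_of(p)}"
        for p in sorted(data_dir.iterdir())
        if p.is_file()
    ]
```

Writing the result with `Path("pythia_datasets/registry.txt").write_text("\n".join(lines) + "\n")` would then refresh the registry under the same assumptions.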
- Ensure the `pythia_datasets` package is installed in your environment:

      python -m pip install pythia-datasets
      # or
      python -m pip install git+https://github.com/ProjectPythia/pythia-datasets

- Import `DATASETS` and inspect the registry to find out which datasets are available:

      In [1]: from pythia_datasets import DATASETS

      In [2]: DATASETS.registry_files
      Out[2]: ['jan-17-co-asos.txt.xz', 'NARR_19930313_0000.nc']

- To fetch a data file of interest, use the `.fetch` method and provide the filename of the data file. This downloads and caches the file if it doesn't exist already, then retrieves and returns the local path:

      In [4]: filepath = DATASETS.fetch('jan-17-co-asos.txt.xz')

      In [5]: filepath
      Out[5]: '/Users/abanihi/Library/Caches/pythia-datasets/jan-17-co-asos.txt.xz'

- Once you have access to the local filepath, you can then use it to load your dataset into pandas, xarray, or your package of choice:

      In [6]: import pandas as pd

      In [7]: df = pd.read_csv(filepath)
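
The download-once behavior of `.fetch` described in the steps above can be sketched as follows. This is a minimal illustration and not the package's implementation: `fetch_cached` and its `download` callable are hypothetical names, and checksum verification is left out.

```python
from pathlib import Path
from typing import Callable


def fetch_cached(
    filename: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> str:
    """Return the local path for filename, downloading only if it is not cached."""
    target = cache_dir / filename
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical downloader: expected to write the file to `target`.
        download(filename, target)
    return str(target)
```

Calling `fetch_cached` twice with the same filename invokes the downloader only on the first call; the second call just returns the cached path.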

The default cache location (where the data are saved on your local system) depends on the operating system. You can use the `locate()` function to identify it:

    from pythia_datasets import locate
    locate()

The location can be overridden by setting the `PYTHIA_DATASETS_DIR` environment variable to the desired destination.
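
The override rule can be illustrated with a short sketch: prefer the environment variable when it is set, otherwise fall back to a default directory. The function name `resolve_cache_dir` and its `default` argument are assumptions made for illustration; the package resolves its own OS-specific default internally.

```python
import os
from pathlib import Path


def resolve_cache_dir(default: Path, env_var: str = "PYTHIA_DATASETS_DIR") -> Path:
    """Return the directory named by env_var if it is set, else the default."""
    value = os.environ.get(env_var)
    # An unset or empty variable falls back to the default location.
    return Path(value).expanduser() if value else default
```

The package may read the variable at a different point than this sketch suggests (for example at import time), so setting it before importing `pythia_datasets` is the safer assumption.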