Skip to content

Add pygmt.gmtread to read a dataset/grid/image into pandas.DataFrame/xarray.DataArray #3673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 40 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 25 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
d913c86
Add pygmt.read to read a dataset/grid/image into pandas.DataFrame/xar…
seisman Dec 2, 2024
f456bf8
Set GMT accessor
seisman Dec 5, 2024
c3cbb6e
Need to set 'source' encoding to make GMT accessor work
seisman Dec 5, 2024
f2a4ce4
Merge branch 'main' into feature/read
seisman Dec 5, 2024
1dd97c6
Fix the source encoding
seisman Dec 5, 2024
7790ea3
No need to set the source encoding in load_remote_dataset.py
seisman Dec 5, 2024
e588008
Revert changes in pygmt/datasets/load_remote_dataset.py
seisman Dec 6, 2024
40d12ee
Improve docstring in pygmt/helpers/testing.py
seisman Dec 6, 2024
fa1021d
Improve docstrinbgs
seisman Dec 6, 2024
c378225
Get rid of decorators
seisman Dec 8, 2024
7b749e0
Improve comment
seisman Dec 8, 2024
8befa58
Get rid of the fmt_docstring alias
seisman Dec 8, 2024
a758752
Fix type hints issue with overload
seisman Dec 9, 2024
9d66cf4
Remove the type ignore flag
seisman Dec 9, 2024
a05383a
region defaults to None
seisman Dec 9, 2024
6ca4ef2
Merge branch 'main' into feature/read
seisman Dec 9, 2024
7851ced
Improve type hints and add tests
seisman Dec 9, 2024
084b87a
Improve the checking of return value of which
seisman Dec 9, 2024
b21997c
Use the read funciton in pygmt/tests/test_datatypes_dataset.py
seisman Dec 9, 2024
a812317
Use the read function instead of the load_dataarray method
seisman Dec 9, 2024
1f0f158
Add one test to make sure that read and load_dataarray returns the sa…
seisman Dec 9, 2024
957c7eb
Simplify pygmt/tests/test_clib_read_data.py with read
seisman Dec 9, 2024
6aef3ca
Fix a typo
seisman Dec 9, 2024
72afbfe
Replace xr.open_dataarray with read
seisman Dec 9, 2024
03de9b7
Fix a typo
seisman Dec 9, 2024
85c533d
Merge branch 'main' into feature/read
seisman Dec 19, 2024
663c76d
Merge branch 'main' into feature/read
seisman Mar 12, 2025
3ed1032
Fix styling
seisman Mar 12, 2025
7d320f4
Merge branch 'main' into feature/read
seisman Mar 12, 2025
2e72ebe
Merge branch 'main' into feature/read
seisman Apr 16, 2025
6d634cc
Minor fix
seisman Apr 16, 2025
4dc7974
Add parameter name
seisman Apr 16, 2025
69f5c45
Rename read to gmtread
seisman Apr 17, 2025
061f5f2
Restructure io.py into a directory
seisman Apr 17, 2025
4f0779e
Move gmtread from src to io
seisman Apr 17, 2025
a06ddca
Fixes, and clean up
seisman Apr 17, 2025
82b80f5
Fix a doctest
seisman Apr 17, 2025
a6c4ee7
Add tests for reading images
seisman Apr 17, 2025
b4a0b9d
Merge branch 'main' into feature/read
seisman May 2, 2025
37fc1de
Revert pygmt/tests/test_clib_put_matrix.py
seisman May 2, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/api/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -172,6 +172,7 @@ Input/output
:toctree: generated

load_dataarray
read
Comment on lines 174 to +175
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The load_dataarray function was put under the pygmt.io namespace. Should we consider putting read under pygmt.io too? (Thinking about whether we need a low-level pygmt.clib.read and high-level pygmt.io.read in my other comment).

Copy link
Member Author

@seisman seisman Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good. I have two questions:

  1. Should we place the read source code in pygmt/io.py, or restructure io.py into a directory and put it in pygmt/io/read.py instead?
  2. Should we deprecate the load_dataarray function in favor of the new read function?

I'm expecting to have a write function that writes a pandas.DataFrame/xarray.DataArray into a tabular/netCDF file

GMT.jl also wraps the read module (xref: https://www.generic-mapping-tools.org/GMTjl_doc/documentation/utilities/gmtread/). The differences are:

  1. It uses name gmtread, which I think is better since read is a little to general.
  2. It returns custom data types like GMTVector, GMTGrid. [This doesn't work in PyGMT]
  3. It guesses the data kind based on the extensions. [Perhaps we can also do a similar guess?]

Copy link
Member

@weiji14 weiji14 Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should we place the read source code in pygmt/io.py, or restructure io.py into a directory and put it in pygmt/io/read.py instead?

I think making the io directory sounds good, especially if you're planning on making a write function in the future.

Should we deprecate the load_dataarray function in favor of the new read function?

No, let's keep load_dataarray for now. Something I'm contemplating is to make an xarray BackendEntrypoint that uses GMT read, so that users can then do pygmt.io.load_dataarray(..., engine="gmtread") or something like that. The load_dataarray function would use this new gmtread backend engine by default instead of netcdf4.


GMT Defaults
------------
Expand Down
1 change: 1 addition & 0 deletions pygmt/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@
makecpt,
nearneighbor,
project,
read,
select,
sph2grd,
sphdistance,
Expand Down
6 changes: 2 additions & 4 deletions pygmt/datasets/samples.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,7 @@
import pandas as pd
import xarray as xr
from pygmt.exceptions import GMTInvalidInput
from pygmt.io import load_dataarray
from pygmt.src import which
from pygmt.src import read, which


def _load_japan_quakes() -> pd.DataFrame:
Expand Down Expand Up @@ -203,8 +202,7 @@ def _load_earth_relief_holes() -> xr.DataArray:
The Earth relief grid. Coordinates are latitude and longitude in degrees. Relief
is in meters.
"""
fname = which("@earth_relief_20m_holes.grd", download="c")
return load_dataarray(fname, engine="netcdf4")
return read("@earth_relief_20m_holes.grd", kind="grid") # type: ignore[return-value]


class GMTSampleData(NamedTuple):
Expand Down
13 changes: 6 additions & 7 deletions pygmt/helpers/testing.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@
import string
from pathlib import Path

import xarray as xr
from pygmt.exceptions import GMTImageComparisonFailure
from pygmt.io import load_dataarray
from pygmt.src import which
from pygmt.src import read


def check_figures_equal(*, extensions=("png",), tol=0.0, result_dir="result_images"):
Expand Down Expand Up @@ -144,17 +144,16 @@ def wrapper(*args, ext="png", request=None, **kwargs):
return decorator


def load_static_earth_relief():
def load_static_earth_relief() -> xr.DataArray:
"""
Load the static_earth_relief file for internal testing.
Load the static_earth_relief.nc file for internal testing.

Returns
-------
data : xarray.DataArray
data
A grid of Earth relief for internal tests.
"""
fname = which("@static_earth_relief.nc", download="c")
return load_dataarray(fname)
return read("@static_earth_relief.nc", kind="grid") # type: ignore[return-value]


def skip_if_no(package):
Expand Down
1 change: 1 addition & 0 deletions pygmt/src/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@
from pygmt.src.plot3d import plot3d
from pygmt.src.project import project
from pygmt.src.psconvert import psconvert
from pygmt.src.read import read
from pygmt.src.rose import rose
from pygmt.src.select import select
from pygmt.src.shift_origin import shift_origin
Expand Down
118 changes: 118 additions & 0 deletions pygmt/src/read.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
"""
Read a file into an appropriate object.
"""

from collections.abc import Mapping, Sequence
from pathlib import PurePath
from typing import Any, Literal

import pandas as pd
import xarray as xr
from pygmt.clib import Session
from pygmt.helpers import build_arg_list, is_nonstr_iter
from pygmt.src.which import which


def read(
file: str | PurePath,
kind: Literal["dataset", "grid", "image"],
region: Sequence[float] | str | None = None,
header: int | None = None,
column_names: pd.Index | None = None,
dtype: type | Mapping[Any, type] | None = None,
index_col: str | int | None = None,
) -> pd.DataFrame | xr.DataArray:
"""
Read a dataset, grid, or image from a file and return the appropriate object.

The returned object is a :class:`pandas.DataFrame` for datasets, and
:class:`xarray.DataArray` for grids and images.

For datasets, keyword arguments ``column_names``, ``header``, ``dtype``, and
``index_col`` are supported.

Parameters
----------
file
The file name to read.
kind
The kind of data to read. Valid values are ``"dataset"``, ``"grid"``, and
``"image"``.
region
The region of interest. Only data within this region will be read.
column_names
A list of column names.
header
Row number containing column names. ``header=None`` means not to parse the
column names from table header. Ignored if the row number is larger than the
number of headers in the table.
dtype
Data type. Can be a single type for all columns or a dictionary mapping
column names to types.
index_col
Column to set as index.

Returns
-------
data
Return type depends on the ``kind`` argument:

- ``"dataset"``: :class:`pandas.DataFrame`
- ``"grid"`` or ``"image"``: :class:`xarray.DataArray`


Examples
--------
Read a dataset into a :class:`pandas.DataFrame` object:

>>> from pygmt import read
>>> df = read("@hotspots.txt", kind="dataset")
>>> type(df)
<class 'pandas.core.frame.DataFrame'>

Read a grid into an :class:`xarray.DataArray` object:

>>> dataarray = read("@earth_relief_01d", kind="grid")
>>> type(dataarray)
<class 'xarray.core.dataarray.DataArray'>
"""
if kind not in {"dataset", "grid", "image"}:
msg = f"Invalid kind {kind}: must be one of 'dataset', 'grid', or 'image'."
raise ValueError(msg)

if kind != "dataset" and any(
v is not None for v in [column_names, header, dtype, index_col]
):
msg = (
"Only the 'dataset' kind supports the 'column_names', 'header', "
"'dtype', and 'index_col' arguments."
)
raise ValueError(msg)

kwdict = {
"R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region, # type: ignore[union-attr]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is used here to avoid using the kwargs_to_string, use_alias decorators:

"R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region

"T": {"dataset": "d", "grid": "g", "image": "i"}[kind],
}

with Session() as lib:
with lib.virtualfile_out(kind=kind) as voutfile:
lib.call_module("read", args=[file, voutfile, *build_arg_list(kwdict)])

match kind:
case "dataset":
return lib.virtualfile_to_dataset(
vfname=voutfile,
column_names=column_names,
header=header,
dtype=dtype,
index_col=index_col,
)
case "grid" | "image":
raster = lib.virtualfile_to_raster(vfname=voutfile, kind=kind)
Copy link
Member

@weiji14 weiji14 Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debating on whether we should have a low-level clib read that reads into a GMT virtualfile, and a high-level read that wraps around that to do both read + convert virtualfile to a pandas.DataFrame or xarray.DataArray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a low-level Session.read method, but as far as I can see, currently it will be used only in the high-level read function, so it seems unecessary. We can refactor and add the low-level Session.read method when we need it in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, ok with just making a high-level read method for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we already have the Session.read_data method which is almost the same as the low-level read method you proposed.

pygmt/pygmt/clib/session.py

Lines 1112 to 1123 in 99a6340

def read_data(
self,
infile: str,
kind: Literal["dataset", "grid", "image"],
family: str | None = None,
geometry: str | None = None,
mode: str = "GMT_READ_NORMAL",
region: Sequence[float] | None = None,
data=None,
):
"""
Read a data file into a GMT data container.

The fun fact is that gmtread calls the gmt_copy function, which is a wrapper of the GMT_Read_Data function (https://github.com/GenericMappingTools/gmt/blob/9a8769f905c2b55cf62ed57cd0c21e40c00b3560/src/gmt_api.c#L1294)

# Add "source" encoding
source = which(fname=file)
raster.encoding["source"] = (
source[0] if isinstance(source, list) else source
)
_ = raster.gmt # Load GMTDataArray accessor information
return raster
23 changes: 11 additions & 12 deletions pygmt/tests/test_clib_put_matrix.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,7 @@
import numpy as np
import numpy.testing as npt
import pytest
import xarray as xr
from pygmt import clib
from pygmt import clib, read
from pygmt.clib.session import DTYPES_NUMERIC
from pygmt.exceptions import GMTCLibError
from pygmt.helpers import GMTTempFile
Expand Down Expand Up @@ -101,7 +100,7 @@ def test_put_matrix_grid(dtypes):
newdata = tmp_file.loadtxt(dtype=dtype)
npt.assert_allclose(newdata, data)

# Save the data to a netCDF grid and check that xarray can load it
# Save the data to a netCDF grid and check it can be read again.
with GMTTempFile(suffix=".nc") as tmp_grid:
lib.write_data(
"GMT_IS_MATRIX",
Expand All @@ -111,12 +110,12 @@ def test_put_matrix_grid(dtypes):
tmp_grid.name,
grid,
)
with xr.open_dataarray(tmp_grid.name) as dataarray:
assert dataarray.shape == shape
npt.assert_allclose(dataarray.data, np.flipud(data))
npt.assert_allclose(
dataarray.coords["x"].actual_range, np.array(wesn[0:2])
)
npt.assert_allclose(
dataarray.coords["y"].actual_range, np.array(wesn[2:4])
)
dataarray = read(tmp_grid.name, kind="grid")
assert dataarray.shape == shape
npt.assert_allclose(dataarray.data, np.flipud(data))
npt.assert_allclose(
dataarray.coords["x"].actual_range, np.array(wesn[0:2])
)
npt.assert_allclose(
dataarray.coords["y"].actual_range, np.array(wesn[2:4])
)
4 changes: 2 additions & 2 deletions pygmt/tests/test_clib_read_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from pygmt.clib import Session
from pygmt.exceptions import GMTCLibError
from pygmt.helpers import GMTTempFile
from pygmt.io import load_dataarray
from pygmt.helpers.testing import load_static_earth_relief
from pygmt.src import which

try:
Expand All @@ -27,7 +27,7 @@ def fixture_expected_xrgrid():
"""
The expected xr.DataArray object for the static_earth_relief.nc file.
"""
return load_dataarray(which("@static_earth_relief.nc"))
return load_static_earth_relief()


@pytest.fixture(scope="module", name="expected_xrimage")
Expand Down
9 changes: 2 additions & 7 deletions pygmt/tests/test_datatypes_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,7 @@

import pandas as pd
import pytest
from pygmt import which
from pygmt.clib import Session
from pygmt import read, which
from pygmt.helpers import GMTTempFile


Expand Down Expand Up @@ -44,11 +43,7 @@ def dataframe_from_gmt(fname, **kwargs):
"""
Read tabular data as pandas.DataFrame using GMT virtual file.
"""
with Session() as lib:
with lib.virtualfile_out(kind="dataset") as vouttbl:
lib.call_module("read", [fname, vouttbl, "-Td"])
df = lib.virtualfile_to_dataset(vfname=vouttbl, **kwargs)
return df
return read(fname, kind="dataset", **kwargs)


@pytest.mark.benchmark
Expand Down
4 changes: 2 additions & 2 deletions pygmt/tests/test_dimfilter.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

import pytest
import xarray as xr
from pygmt import dimfilter, load_dataarray
from pygmt import dimfilter, read
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import GMTTempFile
from pygmt.helpers.testing import load_static_earth_relief
Expand Down Expand Up @@ -56,7 +56,7 @@ def test_dimfilter_outgrid(grid, expected_grid):
)
assert result is None # return value is None
assert Path(tmpfile.name).stat().st_size > 0 # check that outgrid exists
temp_grid = load_dataarray(tmpfile.name)
temp_grid = read(tmpfile.name, kind="grid")
xr.testing.assert_allclose(a=temp_grid, b=expected_grid)


Expand Down
4 changes: 2 additions & 2 deletions pygmt/tests/test_grdclip.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

import pytest
import xarray as xr
from pygmt import grdclip, load_dataarray
from pygmt import grdclip, read
from pygmt.helpers import GMTTempFile
from pygmt.helpers.testing import load_static_earth_relief

Expand Down Expand Up @@ -49,7 +49,7 @@ def test_grdclip_outgrid(grid, expected_grid):
)
assert result is None # return value is None
assert Path(tmpfile.name).stat().st_size > 0 # check that outgrid exists
temp_grid = load_dataarray(tmpfile.name)
temp_grid = read(tmpfile.name, kind="grid")
assert temp_grid.dims == ("lat", "lon")
assert temp_grid.gmt.gtype == 1 # Geographic grid
assert temp_grid.gmt.registration == 1 # Pixel registration
Expand Down
4 changes: 2 additions & 2 deletions pygmt/tests/test_grdcut.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
import numpy as np
import pytest
import xarray as xr
from pygmt import grdcut, load_dataarray
from pygmt import grdcut, read
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import GMTTempFile
from pygmt.helpers.testing import load_static_earth_relief
Expand Down Expand Up @@ -50,7 +50,7 @@ def test_grdcut_dataarray_in_file_out(grid, expected_grid, region):
with GMTTempFile(suffix=".nc") as tmpfile:
result = grdcut(grid, outgrid=tmpfile.name, region=region)
assert result is None # grdcut returns None if output to a file
temp_grid = load_dataarray(tmpfile.name)
temp_grid = read(tmpfile.name, kind="grid")
xr.testing.assert_allclose(a=temp_grid, b=expected_grid)


Expand Down
4 changes: 2 additions & 2 deletions pygmt/tests/test_grdfill.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
import numpy as np
import pytest
import xarray as xr
from pygmt import grdfill, load_dataarray
from pygmt import grdfill, read
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import GMTTempFile
from pygmt.helpers.testing import load_static_earth_relief
Expand Down Expand Up @@ -109,7 +109,7 @@ def test_grdfill_file_out(grid, expected_grid):
result = grdfill(grid=grid, mode="c20", outgrid=tmpfile.name)
assert result is None # return value is None
assert Path(tmpfile.name).stat().st_size > 0 # check that outfile exists
temp_grid = load_dataarray(tmpfile.name)
temp_grid = read(tmpfile.name, kind="grid")
xr.testing.assert_allclose(a=temp_grid, b=expected_grid)


Expand Down
4 changes: 2 additions & 2 deletions pygmt/tests/test_grdfilter.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
import numpy as np
import pytest
import xarray as xr
from pygmt import grdfilter, load_dataarray
from pygmt import grdfilter, read
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import GMTTempFile
from pygmt.helpers.testing import load_static_earth_relief
Expand Down Expand Up @@ -70,7 +70,7 @@ def test_grdfilter_dataarray_in_file_out(grid, expected_grid):
)
assert result is None # return value is None
assert Path(tmpfile.name).stat().st_size > 0 # check that outgrid exists
temp_grid = load_dataarray(tmpfile.name)
temp_grid = read(tmpfile.name, kind="grid")
xr.testing.assert_allclose(a=temp_grid, b=expected_grid)


Expand Down
4 changes: 2 additions & 2 deletions pygmt/tests/test_grdgradient.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

import pytest
import xarray as xr
from pygmt import grdgradient, load_dataarray
from pygmt import grdgradient, read
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import GMTTempFile
from pygmt.helpers.testing import load_static_earth_relief
Expand Down Expand Up @@ -49,7 +49,7 @@ def test_grdgradient_outgrid(grid, expected_grid):
)
assert result is None # return value is None
assert Path(tmpfile.name).stat().st_size > 0 # check that outgrid exists
temp_grid = load_dataarray(tmpfile.name)
temp_grid = read(tmpfile.name, kind="grid")
xr.testing.assert_allclose(a=temp_grid, b=expected_grid)


Expand Down
Loading
Loading