Commit 391107a

Revert "ENH: Allow third-party packages to register IO engines" (#61767)
Revert "ENH: Allow third-party packages to register IO engines (#61642)"

This reverts commit 9dcce63.
1 parent 9dcce63 commit 391107a

7 files changed (+1, -352 lines changed)

doc/source/development/extending.rst

Lines changed: 0 additions & 63 deletions
```diff
@@ -489,69 +489,6 @@ registers the default "matplotlib" backend as follows.
 More information on how to implement a third-party plotting backend can be found at
 https://github.com/pandas-dev/pandas/blob/main/pandas/plotting/__init__.py#L1.
 
-.. _extending.io-engines:
-
-IO engines
------------
-
-pandas provides several IO connectors such as :func:`read_csv` or :meth:`DataFrame.to_parquet`, and many
-of those support multiple engines. For example, :func:`read_csv` supports the ``python``, ``c``
-and ``pyarrow`` engines, each with its advantages and disadvantages, making each more appropriate
-for certain use cases.
-
-Third-party package developers can implement engines for any of the pandas readers and writers.
-When a ``pandas.read_*`` function or ``DataFrame.to_*`` method are called with an ``engine="<name>"``
-that is not known to pandas, pandas will look into the entry points registered in the group
-``pandas.io_engine`` by the packages in the environment, and will call the corresponding method.
-
-An engine is a simple Python class which implements one or more of the pandas readers and writers
-as class methods:
-
-.. code-block:: python
-
-    class EmptyDataEngine:
-        @classmethod
-        def read_json(cls, path_or_buf=None, **kwargs):
-            return pd.DataFrame()
-
-        @classmethod
-        def to_json(cls, path_or_buf=None, **kwargs):
-            with open(path_or_buf, "w") as f:
-                f.write()
-
-        @classmethod
-        def read_clipboard(cls, sep='\\s+', dtype_backend=None, **kwargs):
-            return pd.DataFrame()
-
-A single engine can support multiple readers and writers. When possible, it is a good practice for
-a reader to provide both a reader and writer for the supported formats. But it is possible to
-provide just one of them.
-
-The package implementing the engine needs to create an entry point for pandas to be able to discover
-it. This is done in ``pyproject.toml``:
-
-```toml
-[project.entry-points."pandas.io_engine"]
-empty = empty_data:EmptyDataEngine
-```
-
-The first line should always be the same, creating the entry point in the ``pandas.io_engine`` group.
-In the second line, ``empty`` is the name of the engine, and ``empty_data:EmptyDataEngine`` is where
-to find the engine class in the package (``empty_data`` is the module name in this case).
-
-If a user has the package of the example installed, them it would be possible to use:
-
-.. code-block:: python
-
-    pd.read_json("myfile.json", engine="empty")
-
-When pandas detects that no ``empty`` engine exists for the ``read_json`` reader in pandas, it will
-look at the entry points, will find the ``EmptyDataEngine`` engine, and will call the ``read_json``
-method on it with the arguments provided by the user (except the ``engine`` parameter).
-
-To avoid conflicts in the names of engines, we keep an "IO engines" section in our
-`Ecosystem page <https://pandas.pydata.org/community/ecosystem.html#io-engines>`_.
-
 .. _extending.pandas_priority:
 
 Arithmetic with 3rd party types
```
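The dispatch rule described in the removed documentation (an `engine=` name that pandas does not know is looked up in a registry, and the call is forwarded to the engine's classmethod of the same name, minus the `engine` argument) can be sketched in plain Python. This is a minimal standalone sketch, not pandas code: the module-level `ENGINES` dict stands in for the `pandas.io_engine` entry-point group, and `dispatch_to_engine`, `builtin_engines` and the list returned by the stand-in reader are hypothetical names chosen for illustration. This commit removes the corresponding hook from pandas.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory registry standing in for the "pandas.io_engine"
# entry-point group; maps engine name -> engine class.
ENGINES: dict[str, type] = {}


def dispatch_to_engine(builtin_engines: tuple[str, ...] = ()) -> Callable:
    """Forward calls whose engine= is not built in to a registered engine class."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            name = kwargs.get("engine")
            # No engine, or an engine implemented by the function itself.
            if name is None or name in builtin_engines:
                return func(*args, **kwargs)
            kwargs.pop("engine")  # the engine never sees the engine= argument
            try:
                engine = ENGINES[name]
            except KeyError as err:
                raise ValueError(f"'{name}' is not a known engine") from err
            # Look up the classmethod named after the wrapped reader/writer.
            method = getattr(engine, func.__name__, None)
            if method is None:
                raise ValueError(
                    f"The engine '{name}' does not provide a '{func.__name__}' function"
                )
            return method(*args, **kwargs)

        return wrapper

    return decorator


class EmptyDataEngine:
    # Hypothetical engine; returns an empty list instead of pd.DataFrame()
    # so the sketch runs without pandas installed.
    @classmethod
    def read_json(cls, path_or_buf=None, **kwargs):
        return []


ENGINES["empty"] = EmptyDataEngine


@dispatch_to_engine(builtin_engines=("ujson",))
def read_json(path_or_buf=None, engine=None, **kwargs):
    # Stand-in for the built-in reader: echoes what it was called with.
    return ["builtin", path_or_buf, engine]
```

With the sketch loaded, `read_json("myfile.json", engine="empty")` is forwarded to `EmptyDataEngine.read_json`, while a call without `engine`, or with a name listed in `builtin_engines`, runs the stand-in reader itself.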

doc/source/whatsnew/v3.0.0.rst

Lines changed: 0 additions & 1 deletion
```diff
@@ -94,7 +94,6 @@ Other enhancements
 - Support passing a :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)
 - Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
 - Support reading Stata 110-format (Stata 7) dta files (:issue:`47176`)
-- Third-party packages can now register engines that can be used in pandas I/O operations :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` (:issue:`61584`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.notable_bug_fixes:
```

pandas/core/frame.py

Lines changed: 1 addition & 10 deletions
```diff
@@ -188,10 +188,7 @@
     nargsort,
 )
 
-from pandas.io.common import (
-    allow_third_party_engines,
-    get_handle,
-)
+from pandas.io.common import get_handle
 from pandas.io.formats import (
     console,
     format as fmt,
@@ -3550,7 +3547,6 @@ def to_xml(
 
         return xml_formatter.write_output()
 
-    @allow_third_party_engines
     def to_iceberg(
         self,
         table_identifier: str,
@@ -3560,7 +3556,6 @@ def to_iceberg(
         location: str | None = None,
         append: bool = False,
         snapshot_properties: dict[str, str] | None = None,
-        engine: str | None = None,
     ) -> None:
         """
         Write a DataFrame to an Apache Iceberg table.
@@ -3585,10 +3580,6 @@ def to_iceberg(
             If ``True``, append data to the table, instead of replacing the content.
         snapshot_properties : dict of {str: str}, optional
             Custom properties to be added to the snapshot summary
-        engine : str, optional
-            The engine to use. Engines can be installed via third-party packages. For an
-            updated list of existing pandas I/O engines check the I/O engines section of
-            the pandas Ecosystem page.
 
         See Also
         --------
```

pandas/io/common.py

Lines changed: 0 additions & 152 deletions
```diff
@@ -9,15 +9,13 @@
 import codecs
 from collections import defaultdict
 from collections.abc import (
-    Callable,
     Hashable,
     Mapping,
     Sequence,
 )
 import dataclasses
 import functools
 import gzip
-from importlib.metadata import entry_points
 from io import (
     BufferedIOBase,
     BytesIO,
@@ -92,10 +90,6 @@
 
     from pandas import MultiIndex
 
-# registry of I/O engines. It is populated the first time a non-core
-# pandas engine is used
-_io_engines: dict[str, Any] | None = None
-
 
 @dataclasses.dataclass
 class IOArgs:
@@ -1288,149 +1282,3 @@ def dedup_names(
         counts[col] = cur_count + 1
 
     return names
-
-
-def _get_io_engine(name: str) -> Any:
-    """
-    Return an I/O engine by its name.
-
-    pandas I/O engines can be registered via entry points. The first time this
-    function is called it will register all the entry points of the "pandas.io_engine"
-    group and cache them in the global `_io_engines` variable.
-
-    Engines are implemented as classes with the `read_<format>` and `to_<format>`
-    methods (classmethods) for the formats they wish to provide. This function will
-    return the method from the engine and format being requested.
-
-    Parameters
-    ----------
-    name : str
-        The engine name provided by the user in `engine=<value>`.
-
-    Examples
-    --------
-    An engine is implemented with a class like:
-
-    >>> class DummyEngine:
-    ...     @classmethod
-    ...     def read_csv(cls, filepath_or_buffer, **kwargs):
-    ...         # the engine signature must match the pandas method signature
-    ...         return pd.DataFrame()
-
-    It must be registered as an entry point with the engine name:
-
-    ```
-    [project.entry-points."pandas.io_engine"]
-    dummy = "pandas:io.dummy.DummyEngine"
-
-    ```
-
-    Then the `read_csv` method of the engine can be used with:
-
-    >>> _get_io_engine(engine_name="dummy").read_csv("myfile.csv") # doctest: +SKIP
-
-    This is used internally to dispatch the next pandas call to the engine caller:
-
-    >>> df = read_csv("myfile.csv", engine="dummy") # doctest: +SKIP
-    """
-    global _io_engines
-
-    if _io_engines is None:
-        _io_engines = {}
-        for entry_point in entry_points().select(group="pandas.io_engine"):
-            if entry_point.dist:
-                package_name = entry_point.dist.metadata["Name"]
-            else:
-                package_name = None
-            if entry_point.name in _io_engines:
-                _io_engines[entry_point.name]._packages.append(package_name)
-            else:
-                _io_engines[entry_point.name] = entry_point.load()
-                _io_engines[entry_point.name]._packages = [package_name]
-
-    try:
-        engine = _io_engines[name]
-    except KeyError as err:
-        raise ValueError(
-            f"'{name}' is not a known engine. Some engines are only available "
-            "after installing the package that provides them."
-        ) from err
-
-    if len(engine._packages) > 1:
-        msg = (
-            f"The engine '{name}' has been registered by the package "
-            f"'{engine._packages[0]}' and will be used. "
-        )
-        if len(engine._packages) == 2:
-            msg += (
-                f"The package '{engine._packages[1]}' also tried to register "
-                "the engine, but it couldn't because it was already registered."
-            )
-        else:
-            msg += (
-                "The packages {str(engine._packages[1:]}[1:-1] also tried to register "
-                "the engine, but they couldn't because it was already registered."
-            )
-        warnings.warn(msg, RuntimeWarning, stacklevel=find_stack_level())
-
-    return engine
-
-
-def allow_third_party_engines(
-    skip_engines: list[str] | Callable | None = None,
-) -> Callable:
-    """
-    Decorator to avoid boilerplate code when allowing readers and writers to use
-    third-party engines.
-
-    The decorator will introspect the function to know which format should be obtained,
-    and to know if it's a reader or a writer. Then it will check if the engine has been
-    registered, and if it has, it will dispatch the execution to the engine with the
-    arguments provided by the user.
-
-    Parameters
-    ----------
-    skip_engines : list of str, optional
-        For engines that are implemented in pandas, we want to skip them for this engine
-        dispatching system. They should be specified in this parameter.
-
-    Examples
-    --------
-    The decorator works both with the `skip_engines` parameter, or without:
-
-    >>> class DataFrame:
-    ...     @allow_third_party_engines(["python", "c", "pyarrow"])
-    ...     def read_csv(filepath_or_buffer, **kwargs):
-    ...         pass
-    ...
-    ...     @allow_third_party_engines
-    ...     def read_sas(filepath_or_buffer, **kwargs):
-    ...         pass
-    """
-
-    def decorator(func: Callable) -> Callable:
-        @functools.wraps(func)
-        def wrapper(*args: Any, **kwargs: Any) -> Any:
-            if callable(skip_engines) or skip_engines is None:
-                skip_engine = False
-            else:
-                skip_engine = kwargs["engine"] in skip_engines
-
-            if "engine" in kwargs and not skip_engine:
-                engine_name = kwargs.pop("engine")
-                engine = _get_io_engine(engine_name)
-                try:
-                    return getattr(engine, func.__name__)(*args, **kwargs)
-                except AttributeError as err:
-                    raise ValueError(
-                        f"The engine '{engine_name}' does not provide a "
-                        f"'{func.__name__}' function"
-                    ) from err
-            else:
-                return func(*args, **kwargs)
-
-        return wrapper
-
-    if callable(skip_engines):
-        return decorator(skip_engines)
-    return decorator
```

pandas/io/iceberg.py

Lines changed: 0 additions & 8 deletions
```diff
@@ -6,10 +6,7 @@
 
 from pandas import DataFrame
 
-from pandas.io.common import allow_third_party_engines
 
-
-@allow_third_party_engines
 def read_iceberg(
     table_identifier: str,
     catalog_name: str | None = None,
@@ -21,7 +18,6 @@ def read_iceberg(
     snapshot_id: int | None = None,
     limit: int | None = None,
     scan_properties: dict[str, Any] | None = None,
-    engine: str | None = None,
 ) -> DataFrame:
     """
     Read an Apache Iceberg table into a pandas DataFrame.
@@ -56,10 +52,6 @@ def read_iceberg(
     scan_properties : dict of {str: obj}, optional
         Additional Table properties as a dictionary of string key value pairs to use
         for this scan.
-    engine : str, optional
-        The engine to use. Engines can be installed via third-party packages. For an
-        updated list of existing pandas I/O engines check the I/O engines section of
-        our Ecosystem page.
 
     Returns
     -------
```
