Skip to content

[ENH] Add polars Engine Support to pd.read_csv() #61988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

abujabarmubarak
Copy link

πŸš€ Pull Request: [ENH] Add polars Engine Support to pd.read_csv()


❓ Problem Statement

Pandas' read_csv() function supports multiple engines like "c", "python", and "pyarrow" for reading CSV files. However, there is no built-in support for the high-performance Polars engine, which is known for its speed and efficiency in parsing large datasets.

βœ… Community Request: Feature proposed in Issue #61813


πŸ› οΈ Solution & What’s Included

This PR implements optional support for engine="polars" in pandas.read_csv() by:

  1. Modifying readers.py:

    • Checks if engine is "polars".
    • Dynamically imports Polars and uses pl.read_csv(...).to_pandas() to return a pandas DataFrame.
    if kwds.get("engine") == "polars":
        try:
            import polars as pl  # type: ignore[import-untyped]
        except ImportError:
            raise ImportError("Polars is not installed. Please install it with 'pip install polars'.")
        df = pl.read_csv(filepath_or_buffer, **kwds).to_pandas()
        return df
  2. Ensuring compatibility in engine validation:

    if engine not in ("c", "python", "pyarrow", "polars"):
        raise ValueError(f"Unknown engine: {engine}")
  3. Version Updates:

    • Updated version to 2.3.3.dev0 in:
      • __init__.py
      • pyproject.toml
  4. Testing:

    • Added a dedicated test: pandas/tests/io/parser/test_read_csv_polars.py

πŸ’‘ Example Usage

import pandas as pd

df = pd.read_csv("sample.csv", engine="polars")
print(df)

Input file: sample.csv

a,b
1,2
3,4

🎯 Expected Output

   a  b
0  1  2
1  3  4
  • The file is parsed using Polars under the hood and returned as a pandas.DataFrame.
  • Performance benefits without changing the Pandas API.
  • Optional: only active if polars is installed.

πŸ“‚ Files Modified

  • pandas/io/parsers/readers.py β†’ Add polars engine logic
  • pandas/__init__.py β†’ Version bump to 2.3.3.dev0
  • pyproject.toml β†’ Version update
  • pandas/tests/io/parser/test_read_csv_polars.py β†’ New test file added

πŸ§ͺ Tests

Test name: test_read_csv_with_polars

def test_read_csv_with_polars(tmp_path):
    pl = pytest.importorskip("polars")
    pd = pytest.importorskip("pandas")

    file = tmp_path / "sample.csv"
    file.write_text("a,b\n1,2\n3,4")

    df = pd.read_csv(file, engine="polars")

    assert df.equals(pd.DataFrame({"a": [1, 3], "b": [2, 4]}))

βœ… Result: Passed with warning (unrelated deprecation from pyarrow)


🧷 Notes

  • Falls back to error if Polars is not installed.
  • This is a non-breaking enhancement and does not affect existing functionality.
  • Future expansion possible to support write or more Polars features.

πŸ” Feedback welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant