Fix warning for extra fields in read_csv with on_bad_lines callable #61885

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -814,6 +814,7 @@ I/O
 - Bug in :meth:`set_option` where setting the pandas option ``display.html.use_mathjax`` to ``False`` has no effect (:issue:`59884`)
 - Bug in :meth:`to_csv` where ``quotechar`` is not escaped when ``escapechar`` is not None (:issue:`61407`)
 - Bug in :meth:`to_excel` where :class:`MultiIndex` columns would be merged to a single row when ``merge_cells=False`` is passed (:issue:`60274`)
+- Bug in :func:`read_csv` with ``engine="python"`` and callable ``on_bad_lines`` where a ``ParserWarning`` for extra fields returned by the callable was only raised when ``index_col`` was ``None``. Now the warning is consistently raised regardless of ``index_col`` (:issue:`61837`)

 Period
 ^^^^^^
12 changes: 3 additions & 9 deletions pandas/io/parsers/base_parser.py
@@ -614,16 +614,10 @@ def _check_data_length(
         columns: list of column names
         data: list of array-likes containing the data column-wise.
         """
-        if not self.index_col and len(columns) != len(data) and columns:
-            empty_str = is_object_dtype(data[-1]) and data[-1] == ""
-            # error: No overload variant of "__ror__" of "ndarray" matches
-            # argument type "ExtensionArray"
-            empty_str_or_na = empty_str | isna(data[-1])  # type: ignore[operator]
-            if len(columns) == len(data) - 1 and np.all(empty_str_or_na):
-                return
+        if columns and len(data) != len(columns):
             warnings.warn(
-                "Length of header or names does not match length of data. This leads "
-                "to a loss of data with index_col=False.",
+                f"Length of header or names ({len(columns)}) does not match number of "
+                f"fields in line ({len(data)}). Extra field will be dropped.",
                 ParserWarning,
                 stacklevel=find_stack_level(),
             )
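
A minimal standalone sketch of the simplified check in this hunk, for illustration only. The free function `check_data_length` and the local `ParserWarning` class are hypothetical stand-ins: the real method lives on the parser class and uses `pandas.errors.ParserWarning` and `find_stack_level()`. The sketch only shows that the new condition compares the number of column names with the number of data columns and no longer consults `index_col`.

```python
import warnings


class ParserWarning(Warning):
    """Stand-in for pandas.errors.ParserWarning (assumption for this sketch)."""


def check_data_length(columns, data):
    """Warn when the number of data columns differs from the number of names.

    `columns` is a list of column names; `data` is column-wise, so
    len(data) is the number of parsed columns.
    """
    if columns and len(data) != len(columns):
        warnings.warn(
            f"Length of header or names ({len(columns)}) does not match number of "
            f"fields in line ({len(data)}). Extra field will be dropped.",
            ParserWarning,
            stacklevel=2,  # the real code uses find_stack_level()
        )


columns = ["id", "field_1", "field_2"]
# Five column-wise arrays, as if a callable had returned five fields for a row.
data = [["101", "1"], ["A", "2"], ["B", "3"], [None, "4"], [None, "5"]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_data_length(columns, data)      # 3 names vs 5 columns: warns
    check_data_length(columns, data[:3])  # lengths match: silent
    check_data_length([], data)           # no names given: silent
```

Only the first call emits a warning; the other two show the cases the condition deliberately lets through.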
18 changes: 18 additions & 0 deletions pandas/tests/io/parser/test_python_parser_only.py
@@ -322,6 +322,24 @@ def test_malformed_skipfooter(python_parser_only):
         parser.read_csv(StringIO(data), header=1, comment="#", skipfooter=1)


+def test_on_bad_lines_extra_fields_warns(python_parser_only):
+    parser = python_parser_only
+    data = """id,field_1,field_2
+101,A,B
+102,C,D, E
+103,F,G
+"""
+
+    def line_fixer(_line):
+        return ["1", "2", "3", "4", "5"]
+
+    for index_col in [None, 0]:
+        with tm.assert_produces_warning(ParserWarning):
+            parser.read_csv(
+                StringIO(data), on_bad_lines=line_fixer, index_col=index_col
+            )
+
+
 def test_python_engine_file_no_next(python_parser_only):
     parser = python_parser_only
