Sanjaykumar030

This PR adds a complementary example showing the column-overwriting pattern, which for many transformations is both more direct and more flexible than dropping the original column with remove_columns.

Proposed Change

The original remove_columns example remains untouched. Below it, this PR introduces an alternative approach that overwrites an existing column during batch mapping.

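This teaches users a core .map() capability: when the mapped function returns a key that matches an existing column, that column is replaced in the resulting dataset, so no separate column-removal step is needed.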

New Example:

>>> from datasets import Dataset
>>> dataset = Dataset.from_dict({"a": [0, 1, 2]})
# Overwrite "a" directly to duplicate each value
>>> duplicated_dataset = dataset.map(
...     lambda batch: {"a": [x for x in batch["a"] for _ in range(2)]},
...     batched=True
... )
>>> duplicated_dataset
Dataset({
    features: ['a'],
    num_rows: 6
})
>>> duplicated_dataset["a"]
[0, 0, 1, 1, 2, 2]
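
To make it clearer why overwriting works here, and when remove_columns is still needed, below is a rough pure-Python sketch of how a batched output might be combined with its input batch. The helper `merge_batch` and the function `duplicate_a` are hypothetical names invented for this illustration; this is a simplified model under the assumption that returned columns replace same-named input columns and that all columns in a batch must end up the same length. It is not the library's actual implementation.

```python
def merge_batch(batch, returned, remove_columns=()):
    """Hypothetical helper: a simplified model of combining a batched
    map output with its input batch (not the datasets implementation)."""
    # Drop any input columns the caller asked to remove.
    kept = {name: col for name, col in batch.items() if name not in remove_columns}
    # Returned columns win over same-named input columns (the overwrite).
    merged = {**kept, **returned}
    # Assumption: every column in the resulting batch must have the same length.
    lengths = {name: len(col) for name, col in merged.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return merged


def duplicate_a(batch):
    # Same transformation as the example above: repeat each value twice.
    return {"a": [x for x in batch["a"] for _ in range(2)]}


# Single column: overwriting "a" is enough, no removal needed.
single = {"a": [0, 1, 2]}
overwritten = merge_batch(single, duplicate_a(single))
print(overwritten)

# Two columns: "b" keeps its old length, so the merged batch is inconsistent.
two = {"a": [0, 1, 2], "b": ["x", "y", "z"]}
try:
    merge_batch(two, duplicate_a(two))
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("mismatch:", err)

# Removing "b" (or also returning a resized "b") resolves the mismatch.
with_removal = merge_batch(two, duplicate_a(two), remove_columns=["b"])
print(with_removal)
```

In this model, overwriting alone suffices when every remaining column is returned with the new length; when other columns keep their original length, they still have to be removed or resized.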

@Sanjaykumar030
Author

Hi @lhoestq, just a gentle follow-up on this PR.
