[Data] Add Expression Support & with_columns API #54322
Merged: alexeykudinkin merged 32 commits into ray-project:master from goutamvenkat-anyscale:goutam/expressions on Jul 12, 2025
Changes from 7 commits
Commits
50a74b3 Add Expression Support & with_column API (goutamvenkat-anyscale)
9a5086f Rename to use_columns, use list[expr] (goutamvenkat-anyscale)
9804716 Use project operator & update doc (goutamvenkat-anyscale)
16cccff Fix linting issue (goutamvenkat-anyscale)
a309598 Doc linter (goutamvenkat-anyscale)
176b444 doctest (goutamvenkat-anyscale)
c17570f Address comment (goutamvenkat-anyscale)
eea0d55 Address comments (goutamvenkat-anyscale)
ed251a1 Linter & remove dataclass for operations (goutamvenkat-anyscale)
e36a87b Address comments (goutamvenkat-anyscale)
86bc9fb revert old change (goutamvenkat-anyscale)
3053b95 Remove unnecessary arg (goutamvenkat-anyscale)
b09ae8f Merge branch 'master' into goutam/expressions (goutamvenkat-anyscale)
fb3c6a1 doctest + pytest skip if version is not met (goutamvenkat-anyscale)
bd7bc77 Remove circular dep (goutamvenkat-anyscale)
6d443f8 Address comments (goutamvenkat-anyscale)
fc034ec remove change in block builder (goutamvenkat-anyscale)
1a3941f Remove block builder change (goutamvenkat-anyscale)
3f30cbb Make pre-commit happy (goutamvenkat-anyscale)
9b8de87 Address comment on Expr AST comparison (goutamvenkat-anyscale)
c13f679 Add expressions test to bazel build (goutamvenkat-anyscale)
8d61562 Remove match expression (goutamvenkat-anyscale)
d8890fd Comments (goutamvenkat-anyscale)
49e3ccb Merge branch 'master' into goutam/expressions (goutamvenkat-anyscale)
164cbd3 Address comments (goutamvenkat-anyscale)
b64beef Add comments back (goutamvenkat-anyscale)
a3f3050 Make expression classes dev api (goutamvenkat-anyscale)
821b73e Add stability to DeveloperAPIs (goutamvenkat-anyscale)
f5b08eb Add .rst files (goutamvenkat-anyscale)
2741b12 idk rst (goutamvenkat-anyscale)
f4f620c Merge branch 'master' into goutam/expressions (goutamvenkat-anyscale)
c7f0424 Remove code snippet (goutamvenkat-anyscale)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import operator | ||
| from dataclasses import dataclass | ||
| from typing import Any, Callable, Dict | ||
|
|
||
| import numpy as np | ||
| import pandas as pd | ||
| import pyarrow as pa | ||
| import pyarrow.compute as pc | ||
|
|
||
| from ray.util.annotations import DeveloperAPI | ||
|
|
||
|
|
||
| # ────────────────────────────────────── | ||
| # Basic expression node definitions | ||
| # ────────────────────────────────────── | ||
| @DeveloperAPI | ||
| class Expr: # Base class – all expression nodes inherit from this | ||
| # Binary/boolean operator overloads | ||
| def _bin(self, other: Any, op: str) -> "Expr": | ||
| other = other if isinstance(other, Expr) else LiteralExpr(other) | ||
| return BinaryExpr(op, self, other) | ||
|
|
||
| # arithmetic | ||
| def __add__(self, other): | ||
| return self._bin(other, "add") | ||
|
|
||
| def __sub__(self, other): | ||
| return self._bin(other, "sub") | ||
|
|
||
| def __mul__(self, other): | ||
| return self._bin(other, "mul") | ||
|
|
||
| def __truediv__(self, other): | ||
| return self._bin(other, "div") | ||
|
|
||
| # comparison | ||
| def __gt__(self, other): | ||
| return self._bin(other, "gt") | ||
|
|
||
| def __lt__(self, other): | ||
| return self._bin(other, "lt") | ||
|
|
||
| def __ge__(self, other): | ||
| return self._bin(other, "ge") | ||
|
|
||
| def __le__(self, other): | ||
| return self._bin(other, "le") | ||
|
|
||
| def __eq__(self, other): | ||
| return self._bin(other, "eq") | ||
|
|
||
| # boolean | ||
| def __and__(self, other): | ||
| return self._bin(other, "and") | ||
|
|
||
| def __or__(self, other): | ||
| return self._bin(other, "or") | ||
|
|
||
| # Rename the output column | ||
| def alias(self, name: str) -> "AliasExpr": | ||
| return AliasExpr(self, name) | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class ColumnExpr(Expr): | ||
| name: str | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class LiteralExpr(Expr): | ||
| value: Any | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class BinaryExpr(Expr): | ||
| op: str | ||
| left: Expr | ||
| right: Expr | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class AliasExpr(Expr): | ||
| expr: Expr | ||
| name: str | ||
|
|
||
|
|
||
| # ────────────────────────────────────── | ||
| # User helpers | ||
| # ────────────────────────────────────── | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| def col(name: str) -> ColumnExpr: | ||
| """Reference an existing column.""" | ||
| return ColumnExpr(name) | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| def lit(value: Any) -> LiteralExpr: | ||
| """Create a scalar literal expression (e.g. lit(1)).""" | ||
| return LiteralExpr(value) | ||
|
|
||
|
|
||
| # ────────────────────────────────────── | ||
| # Local evaluator (pandas batches) | ||
| # ────────────────────────────────────── | ||
| # This is used by Dataset.with_columns – kept here so it can be re-used by | ||
| # future optimised executors. | ||
| _PANDAS_OPS: Dict[str, Callable[[Any, Any], Any]] = { | ||
| "add": operator.add, | ||
| "sub": operator.sub, | ||
| "mul": operator.mul, | ||
| "div": operator.truediv, | ||
| "gt": operator.gt, | ||
| "lt": operator.lt, | ||
| "ge": operator.ge, | ||
| "le": operator.le, | ||
| "eq": operator.eq, | ||
| "and": operator.and_, | ||
| "or": operator.or_, | ||
| } | ||
|
|
||
| _NUMPY_OPS: Dict[str, Callable[[Any, Any], Any]] = { | ||
| "add": np.add, | ||
| "sub": np.subtract, | ||
| "mul": np.multiply, | ||
| "div": np.divide, | ||
| "gt": np.greater, | ||
| "lt": np.less, | ||
| "ge": np.greater_equal, | ||
| "le": np.less_equal, | ||
| "eq": np.equal, | ||
| "and": np.logical_and, | ||
| "or": np.logical_or, | ||
| } | ||
|
|
||
| _ARROW_OPS: Dict[str, Callable[[Any, Any], Any]] = { | ||
| "add": pc.add, | ||
| "sub": pc.subtract, | ||
| "mul": pc.multiply, | ||
| "div": pc.divide, | ||
| "gt": pc.greater, | ||
| "lt": pc.less, | ||
| "ge": pc.greater_equal, | ||
| "le": pc.less_equal, | ||
| "eq": pc.equal, | ||
| "and": pc.and_, | ||
| "or": pc.or_, | ||
| } | ||
|
|
||
|
|
||
| def _eval_expr_recursive(expr: Expr, batch, ops: Dict[str, Callable]) -> Any: | ||
| """Generic recursive expression evaluator.""" | ||
| if isinstance(expr, ColumnExpr): | ||
| return batch[expr.name] | ||
| if isinstance(expr, LiteralExpr): | ||
| return expr.value | ||
| if isinstance(expr, BinaryExpr): | ||
| return ops[expr.op]( | ||
| _eval_expr_recursive(expr.left, batch, ops), | ||
| _eval_expr_recursive(expr.right, batch, ops), | ||
| ) | ||
| raise TypeError(f"Unsupported expression node: {type(expr).__name__}") | ||
|
|
||
|
|
||
| @DeveloperAPI | ||
| def eval_expr(expr: Expr, batch) -> Any: | ||
| """Recursively evaluate *expr* against a batch of the appropriate type.""" | ||
| if isinstance(batch, pd.DataFrame): | ||
| return _eval_expr_recursive(expr, batch, _PANDAS_OPS) | ||
| elif isinstance(batch, (np.ndarray, dict)): | ||
| return _eval_expr_recursive(expr, batch, _NUMPY_OPS) | ||
| elif isinstance(batch, pa.Table): | ||
| return _eval_expr_recursive(expr, batch, _ARROW_OPS) | ||
| raise TypeError(f"Unsupported batch type: {type(batch).__name__}") | ||
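
To make the flow in this diff easier to follow, here is a minimal, self-contained sketch of how operator overloading builds an expression tree and how a recursive evaluator walks it. It is a condensed stand-in, not the code under review: only three operators are included, the `_OPS` table is a cut-down analogue of `_PANDAS_OPS`, and the "batch" is assumed to be a plain dict of Python scalars so the example runs without pandas, NumPy, or PyArrow.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict


class Expr:
    """Condensed stand-in for the base class in the diff (three operators only)."""

    def _bin(self, other: Any, op: str) -> "Expr":
        # Wrap plain Python values so both operands are always Expr nodes.
        other = other if isinstance(other, Expr) else LiteralExpr(other)
        return BinaryExpr(op, self, other)

    def __add__(self, other):
        return self._bin(other, "add")

    def __mul__(self, other):
        return self._bin(other, "mul")

    def __gt__(self, other):
        return self._bin(other, "gt")


@dataclass(frozen=True, eq=False)
class ColumnExpr(Expr):
    name: str


@dataclass(frozen=True, eq=False)
class LiteralExpr(Expr):
    value: Any


@dataclass(frozen=True, eq=False)
class BinaryExpr(Expr):
    op: str
    left: Expr
    right: Expr


# Hypothetical, cut-down operator table (the diff keeps one table per batch type).
_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "add": operator.add,
    "mul": operator.mul,
    "gt": operator.gt,
}


def eval_expr(expr: Expr, batch: Dict[str, Any]) -> Any:
    """Walk the tree: columns are looked up, literals returned, binaries applied."""
    if isinstance(expr, ColumnExpr):
        return batch[expr.name]
    if isinstance(expr, LiteralExpr):
        return expr.value
    if isinstance(expr, BinaryExpr):
        left = eval_expr(expr.left, batch)
        right = eval_expr(expr.right, batch)
        return _OPS[expr.op](left, right)
    raise TypeError(f"Unsupported expression node: {type(expr).__name__}")


# Building the expression only constructs nodes; nothing is computed yet.
expr = (ColumnExpr("price") * 2 + 1) > 10

# Evaluation happens later, against whatever batch is supplied.
result = eval_expr(expr, {"price": 5})
print(result)
```

Two things this sketch shares with the snapshot above: only the forward operators (`__add__`, `__mul__`, and so on) are defined, so an expression with the literal on the left such as `2 * col("price")` would raise `TypeError`; and the evaluator has no branch for `AliasExpr`, so an aliased expression would presumably need to be unwrapped by the caller before evaluation.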