[Data] Add Expression Support & with_columns API #54322
Merged: alexeykudinkin merged 32 commits into ray-project:master from goutamvenkat-anyscale:goutam/expressions on Jul 12, 2025
Changes from 6 commits

Commits (32, all authored by goutamvenkat-anyscale):
50a74b3 Add Expression Support & with_column API
9a5086f Rename to use_columns, use list[expr]
9804716 Use project operator & update doc
16cccff Fix linting issue
a309598 Doc linter
176b444 doctest
c17570f Address comment
eea0d55 Address comments
ed251a1 Linter & remove dataclass for operations
e36a87b Address comments
86bc9fb revert old change
3053b95 Remove unnecessary arg
b09ae8f Merge branch 'master' into goutam/expressions
fb3c6a1 doctest + pytest skip if version is not met
bd7bc77 Remove circular dep
6d443f8 Address comments
fc034ec remove change in block builder
1a3941f Remove block builder change
3f30cbb Make pre-commit happy
9b8de87 Address comment on Expr AST comparison
c13f679 Add expressions test to bazel build
8d61562 Remove match expression
d8890fd Comments
49e3ccb Merge branch 'master' into goutam/expressions
164cbd3 Address comments
b64beef Add comments back
a3f3050 Make expression classes dev api
821b73e Add stability to DeveloperAPIs
f5b08eb Add .rst files
2741b12 idk rst
f4f620c Merge branch 'master' into goutam/expressions
c7f0424 Remove code snippet
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| from __future__ import annotations | ||
| 
     | 
||
| import operator | ||
| from dataclasses import dataclass | ||
| from typing import Any, Callable, Dict | ||
| 
     | 
||
| import numpy as np | ||
| import pandas as pd | ||
| import pyarrow as pa | ||
| import pyarrow.compute as pc | ||
| 
     | 
||
| from ray.util.annotations import DeveloperAPI | ||
| 
     | 
||
| 
     | 
||
| # ────────────────────────────────────── | ||
| # Basic expression node definitions | ||
| # ────────────────────────────────────── | ||
| @DeveloperAPI | ||
| class Expr: # Base class – all expression nodes inherit from this | ||
| # Binary/boolean operator overloads | ||
| def _bin(self, other: Any, op: str) -> "Expr": | ||
| other = other if isinstance(other, Expr) else LiteralExpr(other) | ||
| return BinaryExpr(op, self, other) | ||
| 
     | 
||
| # arithmetic | ||
| def __add__(self, other): | ||
| return self._bin(other, "add") | ||
| 
     | 
||
| def __sub__(self, other): | ||
| return self._bin(other, "sub") | ||
| 
     | 
||
| def __mul__(self, other): | ||
| return self._bin(other, "mul") | ||
| 
     | 
||
| def __truediv__(self, other): | ||
| return self._bin(other, "div") | ||
| 
     | 
||
| # comparison | ||
| def __gt__(self, other): | ||
| return self._bin(other, "gt") | ||
| 
     | 
||
| def __lt__(self, other): | ||
| return self._bin(other, "lt") | ||
| 
     | 
||
| def __ge__(self, other): | ||
| return self._bin(other, "ge") | ||
| 
     | 
||
| def __le__(self, other): | ||
| return self._bin(other, "le") | ||
| 
     | 
||
| def __eq__(self, other): | ||
| return self._bin(other, "eq") | ||
| 
     | 
||
| # boolean | ||
| def __and__(self, other): | ||
| return self._bin(other, "and") | ||
| 
     | 
||
| def __or__(self, other): | ||
| return self._bin(other, "or") | ||
| 
     | 
||
| # Rename the output column | ||
| def alias(self, name: str) -> "AliasExpr": | ||
| return AliasExpr(self, name) | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class ColumnExpr(Expr): | ||
| name: str | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class LiteralExpr(Expr): | ||
| value: Any | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class BinaryExpr(Expr): | ||
| op: str | ||
| left: Expr | ||
| right: Expr | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| @dataclass(frozen=True, eq=False) | ||
| class AliasExpr(Expr): | ||
| expr: Expr | ||
| name: str | ||
| 
     | 
||
| 
     | 
||
| # ────────────────────────────────────── | ||
| # User helpers | ||
| # ────────────────────────────────────── | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| def col(name: str) -> ColumnExpr: | ||
| """Reference an existing column.""" | ||
| return ColumnExpr(name) | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| def lit(value: Any) -> LiteralExpr: | ||
| """Create a scalar literal expression (e.g. lit(1)).""" | ||
| return LiteralExpr(value) | ||
| 
     | 
||
| 
     | 
||
| # ────────────────────────────────────── | ||
| # Local evaluator (pandas, NumPy, and Arrow batches) | ||
| # ────────────────────────────────────── | ||
| # This is used by Dataset.with_columns – kept here so it can be re-used by | ||
| # future optimised executors. | ||
| _PANDAS_OPS: Dict[str, Callable[[Any, Any], Any]] = { | ||
| "add": operator.add, | ||
| "sub": operator.sub, | ||
| "mul": operator.mul, | ||
| "div": operator.truediv, | ||
| "gt": operator.gt, | ||
| "lt": operator.lt, | ||
| "ge": operator.ge, | ||
| "le": operator.le, | ||
| "eq": operator.eq, | ||
| "and": operator.and_, | ||
| "or": operator.or_, | ||
| } | ||
| 
     | 
||
| _NUMPY_OPS: Dict[str, Callable[[Any, Any], Any]] = { | ||
| "add": np.add, | ||
| "sub": np.subtract, | ||
| "mul": np.multiply, | ||
| "div": np.divide, | ||
| "gt": np.greater, | ||
| "lt": np.less, | ||
| "ge": np.greater_equal, | ||
| "le": np.less_equal, | ||
| "eq": np.equal, | ||
| "and": np.logical_and, | ||
| "or": np.logical_or, | ||
| } | ||
| 
     | 
||
| _ARROW_OPS: Dict[str, Callable[[Any, Any], Any]] = { | ||
| "add": pc.add, | ||
| "sub": pc.subtract, | ||
| "mul": pc.multiply, | ||
| "div": pc.divide, | ||
| "gt": pc.greater, | ||
| "lt": pc.less, | ||
| "ge": pc.greater_equal, | ||
| "le": pc.less_equal, | ||
| "eq": pc.equal, | ||
| "and": pc.and_, | ||
| "or": pc.or_, | ||
| } | ||
| 
     | 
||
| 
     | 
||
| def _eval_expr_recursive(expr: Expr, batch, ops: Dict[str, Callable]) -> Any: | ||
| """Generic recursive expression evaluator.""" | ||
| if isinstance(expr, ColumnExpr): | ||
| return batch[expr.name] | ||
| if isinstance(expr, LiteralExpr): | ||
| return expr.value | ||
| if isinstance(expr, BinaryExpr): | ||
| return ops[expr.op]( | ||
| _eval_expr_recursive(expr.left, batch, ops), | ||
| _eval_expr_recursive(expr.right, batch, ops), | ||
| ) | ||
| raise TypeError(f"Unsupported expression node: {type(expr).__name__}") | ||
| 
     | 
||
| 
     | 
||
| @DeveloperAPI | ||
| def eval_expr(expr: Expr, batch) -> Any: | ||
| """Recursively evaluate *expr* against a batch of the appropriate type.""" | ||
| if isinstance(batch, pd.DataFrame): | ||
| return _eval_expr_recursive(expr, batch, _PANDAS_OPS) | ||
| elif isinstance(batch, (np.ndarray, dict)): | ||
| return _eval_expr_recursive(expr, batch, _NUMPY_OPS) | ||
| elif isinstance(batch, pa.Table): | ||
| return _eval_expr_recursive(expr, batch, _ARROW_OPS) | ||
| raise TypeError(f"Unsupported batch type: {type(batch).__name__}") | ||
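
As a reading aid, here is a minimal standalone sketch of how the nodes in this diff compose. It re-declares trimmed-down copies of `Expr`, `ColumnExpr`, `LiteralExpr`, and `BinaryExpr` rather than importing them from Ray, and it evaluates against a plain dict of scalars instead of a pandas, NumPy, or Arrow batch. The names `eval_row` and `_OPS` are hypothetical and used only for illustration; they stand in for `_eval_expr_recursive` and `_PANDAS_OPS` above.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict


class Expr:
    """Base node: operators build a tree instead of computing a value."""

    def _bin(self, other: Any, op: str) -> "Expr":
        # Wrap plain Python values so both operands are always Expr nodes.
        other = other if isinstance(other, Expr) else LiteralExpr(other)
        return BinaryExpr(op, self, other)

    def __add__(self, other):
        return self._bin(other, "add")

    def __mul__(self, other):
        return self._bin(other, "mul")

    def __gt__(self, other):
        return self._bin(other, "gt")

    def __and__(self, other):
        return self._bin(other, "and")


@dataclass(frozen=True, eq=False)
class ColumnExpr(Expr):
    name: str


@dataclass(frozen=True, eq=False)
class LiteralExpr(Expr):
    value: Any


@dataclass(frozen=True, eq=False)
class BinaryExpr(Expr):
    op: str
    left: Expr
    right: Expr


def col(name: str) -> ColumnExpr:
    return ColumnExpr(name)


def lit(value: Any) -> LiteralExpr:
    return LiteralExpr(value)


# Hypothetical stand-in for _PANDAS_OPS, limited to the operators used here.
_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "add": operator.add,
    "mul": operator.mul,
    "gt": operator.gt,
    "and": operator.and_,
}


def eval_row(expr: Expr, row: Dict[str, Any]) -> Any:
    """Hypothetical scalar analogue of _eval_expr_recursive."""
    if isinstance(expr, ColumnExpr):
        return row[expr.name]
    if isinstance(expr, LiteralExpr):
        return expr.value
    if isinstance(expr, BinaryExpr):
        return _OPS[expr.op](eval_row(expr.left, row), eval_row(expr.right, row))
    raise TypeError(f"Unsupported expression node: {type(expr).__name__}")


# Building the expression only constructs the tree; nothing is computed yet.
expr = (col("price") * col("qty") + 1 > 10) & (col("qty") > lit(0))

print(eval_row(expr, {"price": 3, "qty": 4}))  # True
print(eval_row(expr, {"price": 1, "qty": 2}))  # False
```

The parentheses around each comparison matter: Python's `&` binds more tightly than `>`, so without them the expression would parse as a chained comparison rather than a conjunction of two comparisons.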
      