Skip to content

Conversation

@tokoko
Copy link
Contributor

@tokoko tokoko commented Oct 23, 2025

Adds minimal setup for narwhals-complaint dataframe (lazyframe).

@tokoko
Copy link
Contributor Author

tokoko commented Oct 23, 2025

  • I have previously tried implementing subframe which was supposed to be lighter-weight ibis clone, but trying to mimic ibis can be tricky at times and I'm not sure would be all that beneficial either.
  • Then I did another draft PR here (feat: dataframe api #70) trying to build out our own flavor of dataframe api (plus Bryce also has a similar project substrait-dataframe). The problem with both is that devising your own dataframe api can turn incredibly creative at times and more creative we are, the less people will actually use anything we do.

This is yet another attempt to bring dataframe api to substrait-python, also trying to bring the best of both worlds from previous approaches. PR adds DataFrame class that is compliant to narwhals LazyFrame API. narwhals is a lightweight datarame api layer that specifies the api and handles function routing. It's basically odbc (or should i say adbc) for dataframes, just the protocol and nothing else. That makes building a narwhals-compliant dataframe a lot simpler.

Another upside of this approach is that we technically don't have to depend on narwhals at all, the api can be used either natively (import substrait.dataframe as sdf as namespace) or with narwhals (import narwhals as nw).

From implementation perspective, dataframe should also be a very lightweight glue code that delegates all the heavy lifting to plan and extended expression builders from substrait.builders.

@MarcoGorelli
Copy link

thanks for your interest!

we're revisiting the extensions/plugins mechanism, and my hope is that in a few months it will be significantly easier to build and test these - if it feels hard and confusing at the moment, there is light at the end of the tunnel!

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR adds DataFrame class that is compliant to narwhals LazyFrame API. narwhals is a lightweight datarame api layer that specifies the api and handles function routing. It's basically odbc (or should i say adbc) for dataframes, just the protocol and nothing else. That makes building a narwhals-compliant dataframe a lot simpler.

Another upside of this approach is that we technically don't have to depend on narwhals at all, the api can be used either natively (import substrait.dataframe as sdf as namespace) or with narwhals (import narwhals as nw).

From implementation perspective, dataframe should also be a very lightweight glue code that delegates all the heavy lifting to plan and extended expression builders from substrait.builders.

This all seems great, and I think starting minimal and expanding support makes a ton of sense. Generally happy to merge as is, but if you could include a bit more context on the TODOs you've left I would appreciate it. I suspect you'll be the one to handling them anyways, but it's good not to assume.

expressions = [e.expr for e in exprs] + [
expr.alias(alias).expr for alias, expr in named_exprs.items()
]
return DataFrame(project(self.plan, expressions=expressions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure to update this to use the new select builder once your work in #125 is merged in.

return self

def abs(self):
self.expr = scalar_function("functions_arithmetic.yaml", "abs", expressions=[self.expr])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this PR will get merged before your work in #107. We'll need to make sure to update this.

]
return DataFrame(project(self.plan, expressions=expressions))

# TODO handle version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's needs to exist to handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no clue at this point, I need to dig deeper to understand if we can disregard it or not. that's what I meant by "handle version".

@MarcoGorelli
Copy link

MarcoGorelli commented Dec 4, 2025

thanks for your interest!

we're revisiting the extensions/plugins mechanism, and my hope is that in a few months it will be significantly easier to build and test these - if it feels hard and confusing at the moment, there is light at the end of the tunnel!

just to follow up on this, we've done this, and https://github.com/narwhals-dev/narwhals-daft should hopefully now serve as a useful blueprint!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants