Remove SchemaAdapter #19345

adriangb · 2025-12-15T21:24:47Z

We could leave some of these methods around as deprecated and make them no-ops but I'd be afraid that would create a false sense of security (compiles but behaves wrong at runtime).

martin-g · 2025-12-16T10:11:32Z

docs/source/library-user-guide/upgrading.md


 See the [default column values example](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/custom_data_source/default_column_values.rs) for how to implement a custom `PhysicalExprAdapterFactory`.

+### `SchemaAdapter` and `SchemaAdapterFactory` completely removed


This new section overlaps with the previous one, `### SchemaAdapterFactory Fully Removed from Parquet`.
Maybe they should be merged into one?

martin-g · 2025-12-16T10:15:06Z

datafusion-examples/examples/custom_data_source/default_column_values.rs

    println!("4. Default values from metadata are cast to proper types at planning time");
    println!("5. The DefaultPhysicalExprAdapter handles other schema adaptations");
    println!("\nNote: PhysicalExprAdapter is specifically for filter predicates.");
    println!("For projection columns, different mechanisms handle missing columns.");


Should this line be removed or edited?
https://github.com/apache/datafusion/pull/19345/changes#diff-dd8ef704e14ac0794362f6dd9b356468e45d3ea33be15708ec2f339b5a0fdb72R67 says that projection expressions (note: expressions, not columns) are also covered by PhysicalExprAdapter.

alamb

Thank you @adriangb

I think this PR is good enough to go. I do think it would be worth porting some of the tests to .slt files as well, and following @martin-g's suggestion to consolidate the upgrade guide.

But I think it is important to close out the expression rewriting changes in DataFusion 52 and get this done

alamb · 2025-12-16T17:03:20Z

datafusion/core/tests/schema_adapter/schema_adapter_integration_tests.rs

-    }
-}
-
-/// Test reading and filtering a Parquet file where the table schema is flipped (c, b, a) vs. the physical file schema (a, b, c)


are these scenarios covered elsewhere? I feel like (now) we could write these all as .slt tests

Looks like some of it is covered here:

https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/schema_evolution.slt

alamb · 2025-12-16T17:09:19Z

docs/source/library-user-guide/upgrading.md

+- `SchemaMapping` struct
+- `DefaultSchemaAdapterFactory` struct
+
+These types were previously used to adapt record batch schemas during file reading.


This is likely to cause non-trivial pain during upgrade for anyone who uses the SchemaAdapter.

However, I am not sure if leaving the code in but disconnected would be any better.

Thus I think we should go with this PR and we can help with some more writeups when we start testing the upgrade with downstream crates (like delta.rs)

How about I leave them in, mark them as deprecated and have them raise a runtime error with a link to the upgrading guide? At least then it doesn't fail silently at runtime. You'd have to ignore the compile time warnings as well.

kosiew

Thanks for the amazing work!

kosiew · 2025-12-17T11:45:04Z

datafusion/datasource/src/file.rs

-    fn with_schema_adapter_factory(
-        &self,
-        _factory: Arc<dyn SchemaAdapterFactory>,
-    ) -> Result<Arc<dyn FileSource>> {


The FileSource trait no longer has with_schema_adapter_factory() and schema_adapter_factory() methods.

For users, this means there's now no way for a custom FileSource to influence schema adaptation behavior at the file-source level. The only knob is at FileScanConfig / ListingTableConfig, which is downstream.

This is a capability reduction that should be called out in migration docs.

Can you give an example of a use case for these trait methods? Who is using a custom FileSource without using FileScanConfig? Why can't they attach the adapter to the concrete type, i.e. why does it have to be part of the trait? As far as I could see these methods were only being used by our own tests.
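To make the "attach the adapter to the concrete type" suggestion concrete, here is a minimal sketch. Every name in it (`Source`, `ColumnRenamer`, `CustomSource`, `with_renamer`) is hypothetical and stands in for the real DataFusion types; it only illustrates the shape of the design, not the actual API.

```rust
use std::sync::Arc;

/// Hypothetical adaptation hook; a stand-in for whatever schema
/// adaptation logic a custom source might need.
pub trait ColumnRenamer: Send + Sync {
    fn rename(&self, column: &str) -> String;
}

/// Hypothetical source trait. Note that it has no adapter-related methods.
pub trait Source {
    fn columns(&self) -> Vec<String>;
}

/// Example hook that upper-cases every column name.
pub struct UppercaseRenamer;

impl ColumnRenamer for UppercaseRenamer {
    fn rename(&self, column: &str) -> String {
        column.to_uppercase()
    }
}

/// Hypothetical concrete source that owns its adaptation hook.
pub struct CustomSource {
    raw_columns: Vec<String>,
    renamer: Option<Arc<dyn ColumnRenamer>>,
}

impl CustomSource {
    pub fn new(raw_columns: Vec<String>) -> Self {
        Self { raw_columns, renamer: None }
    }

    /// Builder method on the concrete type only; the trait stays unchanged.
    pub fn with_renamer(mut self, renamer: Arc<dyn ColumnRenamer>) -> Self {
        self.renamer = Some(renamer);
        self
    }
}

impl Source for CustomSource {
    fn columns(&self) -> Vec<String> {
        self.raw_columns
            .iter()
            .map(|c| match &self.renamer {
                // Apply the hook when one was configured.
                Some(r) => r.rename(c),
                // Otherwise pass the column name through unchanged.
                None => c.clone(),
            })
            .collect()
    }
}
```

Callers that hold the concrete `CustomSource` can configure the hook before erasing the value to `dyn Source`, so the trait itself never needs to know the hook exists.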

kosiew · 2025-12-17T11:48:52Z

datafusion/datasource/src/file_scan_config.rs

-/// #  fn with_schema_adapter_factory(&self, factory: Arc<dyn SchemaAdapterFactory>) -> Result<Arc<dyn FileSource>> { Ok(Arc::new(Self {table_schema: self.table_schema.clone(), schema_adapter_factory: Some(factory)} )) }
-/// #  fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>> { self.schema_adapter_factory.clone() }


This is a breaking API change that is correct but needs clearer deprecation messaging. If users were relying on with_schema_adapter_factory(), they now have a compile error.

Adding a deprecated attribute that points to the upgrade guide would help users migrate.

kosiew · 2025-12-17T12:12:38Z

datafusion/core/tests/schema_adapter/schema_adapter_integration_tests.rs

-async fn test_parquet_flipped_projection() -> Result<()> {
-    // Create test data with columns (a, b, c) - the file schema


Do we have a replacement end-to-end integration test for column reordering in Parquet scan?

I believe we have several but I added an slt version in 2f5ff2e

kosiew · 2025-12-17T12:22:21Z

datafusion/core/tests/schema_adapter/schema_adapter_integration_tests.rs

-async fn test_multi_source_schema_adapter_reuse() -> Result<()> {
-    // This test verifies that the same schema adapter factory can be reused


This is important for ListingTable.
A test for ListingTable would add assurance that the functionality is retained.

Added in 2f5ff2e

adriangb · 2025-12-17T23:01:41Z

@alamb @kosiew I pushed 3c62043 which adds deprecated stubs that raise runtime errors when possible. The goal here isn't to keep the old system / code working, it's to make it more discoverable how to fix it vs. a compiler error that a method doesn't exist with no context or solution guidance.
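As a rough illustration of the pattern described above (not the code from 3c62043), a deprecated stub that fails loudly at runtime might look like the following sketch. `ScanError`, `LegacyAdapterFactory`, `MySource`, and `try_legacy_path` are all hypothetical stand-ins, and the messages are placeholders.

```rust
use std::fmt;

/// Hypothetical error type standing in for the library's error enum.
#[derive(Debug, PartialEq)]
pub enum ScanError {
    NotImplemented(String),
}

impl fmt::Display for ScanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ScanError::NotImplemented(msg) => write!(f, "not implemented: {msg}"),
        }
    }
}

impl std::error::Error for ScanError {}

/// Hypothetical stand-in for the removed adapter factory type.
#[derive(Debug, Default)]
pub struct LegacyAdapterFactory;

/// Hypothetical stand-in for a file source.
#[derive(Debug, Default)]
pub struct MySource;

impl MySource {
    /// The stub still compiles for existing callers, but it emits a
    /// deprecation warning at compile time and always errors at runtime.
    #[deprecated(note = "schema adapters were removed; see the upgrading guide")]
    pub fn with_schema_adapter_factory(
        &self,
        _factory: LegacyAdapterFactory,
    ) -> Result<MySource, ScanError> {
        Err(ScanError::NotImplemented(
            "with_schema_adapter_factory is no longer supported; see the upgrading guide"
                .to_string(),
        ))
    }
}

/// Simulates a downstream caller that silenced the deprecation warning.
#[allow(deprecated)]
pub fn try_legacy_path() -> Result<MySource, ScanError> {
    MySource.with_schema_adapter_factory(LegacyAdapterFactory)
}
```

The trade-off is that the failure moves from compile time to runtime, but the error message can carry migration guidance that a "method not found" compiler error cannot.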

adriangb · 2025-12-17T23:15:54Z

I think I've addressed all of the feedback except maybe #19345 (comment). Since I've already had to resolve conflicts a couple times and it seems this would hold up a release I'd like to merge this approved PR. @kosiew is that concern a blocker or can we discuss more and address as a followup?

kosiew

LGTM

github-actions bot added the documentation, core, catalog, and datasource labels Dec 15, 2025

adriangb force-pushed the delete-schema-adapter branch from e21ca2e to 7d8572a Compare December 15, 2025 21:36

adriangb added the api change label Dec 15, 2025

adriangb requested review from alamb and kosiew and removed request for alamb December 15, 2025 21:41

martin-g reviewed Dec 16, 2025


alamb approved these changes Dec 16, 2025


kosiew reviewed Dec 17, 2025


adriangb force-pushed the delete-schema-adapter branch from ca5b180 to a39814a Compare December 17, 2025 21:56

github-actions bot added the sqllogictest label Dec 17, 2025

adriangb force-pushed the delete-schema-adapter branch from 7f617f9 to 3c62043 Compare December 17, 2025 22:59

adriangb added 7 commits December 18, 2025 09:50

Remove SchemaAdapter

4dc79c1

remove overlap in upgrading guide

8baa453

remove outdated note

9e66e50

add more tests

cd03329

fix merge

8dc4130

add deprecated skeletons

1fee409

lint

5b667dc

adriangb force-pushed the delete-schema-adapter branch from fd0d2be to 5b667dc Compare December 18, 2025 15:50

kosiew approved these changes Dec 19, 2025


kosiew mentioned this pull request Dec 19, 2025

SchemaMapping.map_column_statistics produce column_statistics mismatch. #19096

Closed

adriangb added this pull request to the merge queue Dec 19, 2025

Merged via the queue into apache:main with commit 75d2473 Dec 19, 2025
32 checks passed

adriangb deleted the delete-schema-adapter branch December 19, 2025 14:17


