@adriangb (Contributor) commented Dec 15, 2025

Closes #16800

We could leave some of these methods around as deprecated and make them no-ops, but I'd be afraid that would create a false sense of security (the code compiles but behaves incorrectly at runtime).

@github-actions bot added the documentation, core, catalog, and datasource labels Dec 15, 2025
@adriangb force-pushed the delete-schema-adapter branch from e21ca2e to 7d8572a Dec 15, 2025 21:36
@adriangb added the api change label Dec 15, 2025
@adriangb requested review from alamb and kosiew and removed request for alamb Dec 15, 2025 21:41

See the [default column values example](https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/custom_data_source/default_column_values.rs) for how to implement a custom `PhysicalExprAdapterFactory`.

### `SchemaAdapter` and `SchemaAdapterFactory` completely removed
Member

This new paragraph overlaps with the previous one, `### SchemaAdapterFactory Fully Removed from Parquet`.
Maybe they should be merged into one?

println!("4. Default values from metadata are cast to proper types at planning time");
println!("5. The DefaultPhysicalExprAdapter handles other schema adaptations");
println!("\nNote: PhysicalExprAdapter is specifically for filter predicates.");
println!("For projection columns, different mechanisms handle missing columns.");
Member

Should this line be removed or edited?
https://github.com/apache/datafusion/pull/19345/changes#diff-dd8ef704e14ac0794362f6dd9b356468e45d3ea33be15708ec2f339b5a0fdb72R67 says that projection expressions (note: expressions vs. columns) are also covered by PhysicalExprAdapter.

@alamb (Contributor) left a comment

Thank you @adriangb

I think this PR is good enough to go. I do think it would be worth porting some of the tests to .slt files as well, and following @martin-g's suggestion to consolidate the upgrade guide.

But I think it is important to close out the expression rewriting changes in DataFusion 52 and get this done.

}
}

/// Test reading and filtering a Parquet file where the table schema is flipped (c, b, a) vs. the physical file schema (a, b, c)
Contributor

Are these scenarios covered elsewhere? I feel like we could now write all of these as .slt tests.

Looks like some of it is covered here:

https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/schema_evolution.slt

- `SchemaMapping` struct
- `DefaultSchemaAdapterFactory` struct

These types were previously used to adapt record batch schemas during file reading.
Contributor

This is likely to cause non-trivial pain during upgrade for anyone who uses the SchemaAdapter.

However, I am not sure if leaving the code in but disconnected would be any better.

Thus I think we should go with this PR, and we can help with some more writeups when we start testing the upgrade with downstream crates (like delta.rs).

Contributor Author

How about I leave them in, mark them as deprecated and have them raise a runtime error with a link to the upgrading guide? At least then it doesn't fail silently at runtime. You'd have to ignore the compile time warnings as well.

@kosiew (Contributor) left a comment

Thanks for the amazing work!

Comment on lines -170 to -190
fn with_schema_adapter_factory(
    &self,
    _factory: Arc<dyn SchemaAdapterFactory>,
) -> Result<Arc<dyn FileSource>> {
Contributor

The FileSource trait no longer has with_schema_adapter_factory() and schema_adapter_factory() methods.

For users, this means there's now no way for a custom FileSource to influence schema adaptation behavior at the file-source level. The only knob is at FileScanConfig / ListingTableConfig, which is downstream.

This is a capability reduction that should be called out in migration docs.

Contributor Author

Can you give an example of a use case for these trait methods? Who is using a custom FileSource without using FileScanConfig? Why can't they attach the adapter to the concrete type, i.e. why does it have to be part of the trait? As far as I could see these methods were only being used by our own tests.
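
To make the question above concrete, here is a rough sketch of what "attaching the adapter to the concrete type" could look like. Every name in it (`ExprAdapterFactory`, `FileSource` as a one-method trait, `MyCustomSource`, `with_adapter`, `DefaultValues`) is a simplified, hypothetical stand-in and not DataFusion's actual API; the point is only that the builder method lives on the struct rather than on the shared trait.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an adapter factory; NOT the real DataFusion trait.
pub trait ExprAdapterFactory: Send + Sync {
    fn name(&self) -> &str;
}

// Hypothetical, heavily simplified source trait with no adapter methods on it.
pub trait FileSource {
    fn file_type(&self) -> &str;
}

// A custom source that owns its adapter as an ordinary field.
pub struct MyCustomSource {
    adapter: Option<Arc<dyn ExprAdapterFactory>>,
}

impl MyCustomSource {
    pub fn new() -> Self {
        Self { adapter: None }
    }

    // Builder method defined on the concrete type, not on the trait.
    pub fn with_adapter(mut self, adapter: Arc<dyn ExprAdapterFactory>) -> Self {
        self.adapter = Some(adapter);
        self
    }

    // Returns the name of the attached adapter, if any.
    pub fn adapter_name(&self) -> Option<&str> {
        self.adapter.as_deref().map(|a| a.name())
    }
}

impl FileSource for MyCustomSource {
    fn file_type(&self) -> &str {
        "custom"
    }
}

// Hypothetical adapter used only for this illustration.
pub struct DefaultValues;

impl ExprAdapterFactory for DefaultValues {
    fn name(&self) -> &str {
        "default-values"
    }
}
```

With this shape, code that knows the concrete type can call `MyCustomSource::new().with_adapter(Arc::new(DefaultValues))`, while code that only sees `dyn FileSource` cannot reach the adapter, which is the trade-off being discussed in this thread.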

Comment on lines -100 to -102
/// # fn with_schema_adapter_factory(&self, factory: Arc<dyn SchemaAdapterFactory>) -> Result<Arc<dyn FileSource>> { Ok(Arc::new(Self {table_schema: self.table_schema.clone(), schema_adapter_factory: Some(factory)} )) }
/// # fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>> { self.schema_adapter_factory.clone() }
Contributor

This is a breaking API change that is correct but needs clearer deprecation messaging. If users were relying on with_schema_adapter_factory(), they now have a compile error.

Adding a deprecated attribute that points to the upgrade guide would help users migrate.

Comment on lines -310 to -311
async fn test_parquet_flipped_projection() -> Result<()> {
    // Create test data with columns (a, b, c) - the file schema
Contributor

Do we have a replacement end-to-end integration test for column reordering in Parquet scan?

Contributor Author

I believe we have several, but I added an slt version in 2f5ff2e.

Comment on lines -670 to -671
async fn test_multi_source_schema_adapter_reuse() -> Result<()> {
    // This test verifies that the same schema adapter factory can be reused
Contributor

This is important for ListingTable.
A test for ListingTable would add assurance that the functionality is retained.

Contributor Author

Added in 2f5ff2e

@adriangb force-pushed the delete-schema-adapter branch from ca5b180 to a39814a Dec 17, 2025 21:56
@github-actions bot added the sqllogictest label Dec 17, 2025
@adriangb force-pushed the delete-schema-adapter branch from 7f617f9 to 3c62043 Dec 17, 2025 22:59
@adriangb (Contributor Author)

@alamb @kosiew I pushed 3c62043, which adds deprecated stubs that raise runtime errors when possible. The goal here isn't to keep the old system / code working; it's to make the fix more discoverable than a compiler error saying that a method doesn't exist, with no context or guidance toward a solution.
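
As a rough illustration of the "deprecated stub that raises a runtime error" idea, a minimal Rust sketch might look like the following. It does not reproduce the code in 3c62043: `LegacySource`, `MigrationError`, `call_legacy_stub`, the `()` placeholder argument, and the message text are all hypothetical stand-ins chosen for this example.

```rust
use std::fmt;

// Hypothetical error type standing in for the library's real error type.
#[derive(Debug)]
pub struct MigrationError(pub String);

impl fmt::Display for MigrationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for MigrationError {}

// Hypothetical type that used to accept a schema adapter factory.
#[derive(Debug, Default)]
pub struct LegacySource;

impl LegacySource {
    // The method still exists, so callers get a deprecation warning at
    // compile time instead of a bare "method not found" error.
    #[deprecated(
        note = "SchemaAdapterFactory was removed; see the upgrade guide and use PhysicalExprAdapterFactory"
    )]
    pub fn with_schema_adapter_factory(&self, _factory: ()) -> Result<Self, MigrationError> {
        // The stub never silently succeeds: it fails loudly at runtime
        // with a pointer to the migration path.
        Err(MigrationError(
            "SchemaAdapterFactory was removed; see the upgrade guide and use \
             PhysicalExprAdapterFactory instead"
                .to_string(),
        ))
    }
}

// Simulates a downstream caller that ignores the compile-time warning.
#[allow(deprecated)]
pub fn call_legacy_stub() -> Result<LegacySource, MigrationError> {
    LegacySource.with_schema_adapter_factory(())
}
```

A caller who suppresses the warning still cannot end up with a silently ignored adapter, because `call_legacy_stub()` returns an `Err` whose message names the replacement.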

@adriangb (Contributor Author)

I think I've addressed all of the feedback except maybe #19345 (comment). Since I've already had to resolve conflicts a couple times and it seems this would hold up a release I'd like to merge this approved PR. @kosiew is that concern a blocker or can we discuss more and address as a followup?

@adriangb force-pushed the delete-schema-adapter branch from fd0d2be to 5b667dc Dec 18, 2025 15:50

@kosiew (Contributor) left a comment

LGTM

Merged via the queue into apache:main with commit 75d2473 Dec 19, 2025
32 checks passed
@adriangb deleted the delete-schema-adapter branch December 19, 2025 14:17
Linked issue: Plan to replace SchemaAdapter with PhysicalExprAdapter