

@emkornfield
Contributor

Based on the mailing list discussion, this tries to capture the semantics of type promotion/schema evolution and old writers.

@github-actions bot added the Specification label (Issues that may introduce spec changes.) on Aug 27, 2025
format/spec.md (outdated)

### Schema evolution and writing with old schemas

Writers should write out all fields with the types specified in the table schema. Inserts or upserts are allowed with an outdated schema (updates must use the latest schema to avoid data loss). Column projection rules are designed so that the table will remain readable even if writers use an outdated schema in these cases.
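To make the projection guarantee concrete, here is a minimal Python sketch of how a reader might resolve a row that was written with an outdated schema. It assumes a hypothetical in-memory model (a `Field` record, rows as dicts keyed by field id) rather than any Iceberg library API, and it applies only the simplified rule that a field missing from a data file resolves to its `initial-default`, or `null` if none is defined. The spec's other resolution steps (identity partition values, name mapping) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str
    # None models "no initial-default defined"; such fields read back as null.
    initial_default: Optional[Any] = None


def project_row(table_schema: List[Field], file_row: Dict[int, Any]) -> Dict[str, Any]:
    """Project a row stored in a data file onto the current table schema.

    file_row maps field id -> value for the columns the writer actually wrote.
    """
    projected: Dict[str, Any] = {}
    for field in table_schema:
        if field.field_id in file_row:
            projected[field.name] = file_row[field.field_id]
        else:
            # Column absent from the file (e.g. written with an older schema):
            # fall back to initial-default, which is null when none is defined.
            projected[field.name] = field.initial_default
    return projected


# Hypothetical table: column "region" (id 3) was added later with initial-default "unknown".
current_schema = [Field(1, "id"), Field(2, "name"), Field(3, "region", initial_default="unknown")]

# A row inserted by a writer that still used the old two-column schema.
old_writer_row = {1: 42, 2: "alice"}
print(project_row(current_schema, old_writer_row))
# {'id': 42, 'name': 'alice', 'region': 'unknown'}
```

The insert stays readable because the reader, not the writer, decides what the missing column means.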
Contributor


> Inserts or upserts are allowed with an outdated schema (updates must use the latest schema to avoid data loss).

This sentence is less clear to me. Can we just say `Writers are allowed with an outdated schema.`?

I didn't quite get this part: `updates must use the latest schema to avoid data loss`.

> Column projection rules are designed so that the table will remain readable even if writers use an outdated schema in these cases.

Also, can we switch this sentence to the one below? This way, the first paragraph is about writes and the second paragraph is about reads.

Contributor Author


Thank you for the feedback.

> This sentence is less clear to me. Can we just say `Writers are allowed with an outdated schema.`?

I don't think this is a universally true statement; I added more details. PTAL.

> Also, can we switch this sentence to the one below? This way, the first paragraph is about writes and the second paragraph is about reads.

Done

Contributor


The new details help. Thanks!

@emkornfield requested a review from stevenzwu on Aug 28, 2025 18:29
Co-authored-by: Russell Spitzer <[email protected]>
@huaxingao merged commit c64c89f into apache:main on Sep 4, 2025 (2 checks passed)
@huaxingao
Contributor

Thanks @emkornfield for the PR! Thanks everyone for the review!

@RussellSpitzer
Member

I think this counts as errata, but in the future I would recommend a quick heads-up on the dev list before doing a final merge.

@huaxingao
Contributor

Thanks, @RussellSpitzer, I appreciate the note! I merged because this was framed as a clarification (not a behavior change). I should have sent a quick dev list heads-up first. I’ll follow that practice going forward. If anyone prefers we revert and run a vote, I’m happy to do that.


* For all-null columns, not writing out the column would cause the `initial-default` value to be applied on read instead of `null`.
* If `write-default` has been changed, using an out-of-date schema would result in an incorrect value being populated.
* If a write is the result of a partial row update (e.g. `update table set col_y = 'xyz'`), an out-of-date schema would silently drop values.
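The first two hazards can be reproduced with a small Python simulation. Everything here is hypothetical: field id `3` stands for a newly added column, and `write_row`/`read_field` are illustrative helpers, not Iceberg APIs. A writer that omits the column (because its schema predates it, or because it skipped an all-null column) turns an explicit `null` into the `initial-default`, and a writer holding a stale `write-default` populates the old value.

```python
from typing import Any, Dict, Optional, Set

REGION = 3  # hypothetical field id of a column added after some writers cached the schema
INITIAL_DEFAULT = {REGION: "unknown"}  # hypothetical initial-default applied by readers


def write_row(values: Dict[int, Any], field_ids: Set[int],
              write_default: Dict[int, Any]) -> Dict[int, Any]:
    """Materialize a row using the writer's (possibly stale) view of the schema.

    Only fields the writer knows about are written; fields the user did not
    set take the write-default from that schema view, or null if there is none.
    """
    return {fid: values.get(fid, write_default.get(fid)) for fid in field_ids}


def read_field(file_row: Dict[int, Any], field_id: int) -> Optional[Any]:
    """Reader-side projection: a field absent from the file resolves to initial-default."""
    if field_id in file_row:
        return file_row[field_id]
    return INITIAL_DEFAULT.get(field_id)


# Hazard 1: the user inserts an explicit null for REGION.
current = write_row({1: 7, REGION: None}, {1, REGION}, {})
omitted = write_row({1: 7, REGION: None}, {1}, {})  # column not written out
print(read_field(current, REGION))  # None
print(read_field(omitted, REGION))  # unknown (null silently became initial-default)

# Hazard 2: write-default for REGION changed from "us" to "eu" after a writer cached the schema.
fresh = write_row({1: 8}, {1, REGION}, {REGION: "eu"})
outdated = write_row({1: 8}, {1, REGION}, {REGION: "us"})
print(read_field(fresh, REGION))     # eu
print(read_field(outdated, REGION))  # us (stale write-default populated)
```

In both cases the file is perfectly readable; the problem is that the value read back differs from what the user meant to write.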


Can you clarify this? When could this happen? Is this if the column is dropped?

Contributor Author


If an old schema is used, you implicitly end up dropping columns, because you can't read columns you don't know about. Thinking more about it, this should be unlikely to happen, because you would probably have to replay the transaction anyway. But effectively, the sequence would be:

  1. Writer A writes with a new schema that adds a column, including new data for the added column.
  2. Writer B uses an old schema (this would have to happen strictly after step 1), reads the new data, and modifies an existing column.
  3. Writer B's updates would drop the new data from the added column.
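The sequence above can be sketched as a copy-on-write update in Python. This is an illustrative model under stated assumptions: schemas are lists of hypothetical field ids, `read_with_schema` and `rewrite_row` are made-up helpers, missing columns resolve to `null` (ignoring `initial-default`), and no commit-time conflict detection is modeled, which is the replay the comment expects to make this unlikely in practice.

```python
from typing import Any, Dict, List


def read_with_schema(file_row: Dict[int, Any], field_ids: List[int]) -> Dict[int, Any]:
    """Project a stored row onto a schema: columns the schema doesn't know are not read.

    Known columns missing from the row resolve to None (initial-default ignored for brevity).
    """
    return {fid: file_row.get(fid) for fid in field_ids}


def rewrite_row(row: Dict[int, Any], updates: Dict[int, Any],
                field_ids: List[int]) -> Dict[int, Any]:
    """Copy-on-write update: the writer rewrites the whole row using its own schema."""
    new_row = {**row, **updates}
    return {fid: new_row.get(fid) for fid in field_ids}


OLD_SCHEMA = [1, 2]     # hypothetical: id, col_y
NEW_SCHEMA = [1, 2, 3]  # hypothetical: writer A added field 3

# Step 1: writer A writes with NEW_SCHEMA, including data for the added column.
stored = {1: 100, 2: "abc", 3: "gold"}

# Step 2: writer B, still on OLD_SCHEMA, reads the row and runs the equivalent of
#         `update table set col_y = 'xyz'`.
seen_by_b = read_with_schema(stored, OLD_SCHEMA)
stored = rewrite_row(seen_by_b, {2: "xyz"}, OLD_SCHEMA)

# Step 3: readers on NEW_SCHEMA no longer see writer A's value.
print(read_with_schema(stored, NEW_SCHEMA))  # {1: 100, 2: 'xyz', 3: None}

# With the latest schema, the same update preserves the added column.
stored_latest = {1: 100, 2: "abc", 3: "gold"}
stored_latest = rewrite_row(read_with_schema(stored_latest, NEW_SCHEMA), {2: "xyz"}, NEW_SCHEMA)
print(read_with_schema(stored_latest, NEW_SCHEMA))  # {1: 100, 2: 'xyz', 3: 'gold'}
```

The contrast between the two runs is the reason updates, unlike plain inserts, need the latest schema.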


7 participants