@gaoran10 gaoran10 commented Sep 29, 2025

Motivation

Currently, the schema ID has two different formats:

  • value schema ID (the external schema ID data)
  • key-value schema ID (keySchemaId length (4 bytes) + keySchemaId data + valueSchemaId length (4 bytes) + valueSchemaId data)

This causes a problem: users who read the schema ID from the message metadata cannot tell whether it is a value schema ID or a key-value schema ID.

Modification

Add a distinct magic header byte to each format so that the value schema ID and the key-value schema ID can be told apart.

value schema ID

magic_byte(-1) + valueSchemaId

key-value schema ID

magic_byte(-2) + keySchemaIdLength(4 bytes) + keySchemaId + valueSchemaIdLength(4 bytes) + valueSchemaId
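
The two layouts above can be sketched in Java as follows. This is only an illustration, not the code merged in this PR: the class and method names are hypothetical, and the big-endian byte order of the 4-byte length fields is an assumption (it is the `ByteBuffer` default; the description does not state it).

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the two layouts; not the PR's actual API.
final class SchemaIdFormatSketch {
    // Magic bytes as given in the PR description.
    static final byte MAGIC_VALUE = -1;      // 0xFF
    static final byte MAGIC_KEY_VALUE = -2;  // 0xFE

    // magic_byte(-1) + valueSchemaId
    static byte[] encodeValueSchemaId(byte[] valueSchemaId) {
        ByteBuffer buf = ByteBuffer.allocate(1 + valueSchemaId.length);
        buf.put(MAGIC_VALUE).put(valueSchemaId);
        return buf.array();
    }

    // magic_byte(-2) + keySchemaIdLength(4 bytes) + keySchemaId
    //                + valueSchemaIdLength(4 bytes) + valueSchemaId
    // Assumption: lengths are written big-endian (ByteBuffer's default order).
    static byte[] encodeKeyValueSchemaId(byte[] keySchemaId, byte[] valueSchemaId) {
        ByteBuffer buf = ByteBuffer.allocate(
                1 + 4 + keySchemaId.length + 4 + valueSchemaId.length);
        buf.put(MAGIC_KEY_VALUE)
           .putInt(keySchemaId.length).put(keySchemaId)
           .putInt(valueSchemaId.length).put(valueSchemaId);
        return buf.array();
    }
}
```

For a 1-byte key schema ID and a 2-byte value schema ID, the key-value form would occupy 1 + 4 + 1 + 4 + 2 = 12 bytes.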

Breaking change

There is a potential breaking change because of the newly added magic header, but it does not affect the Kafka schema: the new magic byte is a negative value, so the most significant bit of its binary representation is 1.

For the Kafka schema, the schema ID currently stored in Pulsar message metadata begins in one of two ways:

  • The Kafka schema ID magic header (0x0, 0x1)
  • The key-value schema ID length, which is a positive value, so the most significant bit of its binary representation is never 1.
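
The compatibility argument above comes down to inspecting the first byte. A minimal sketch of that check follows; the class, enum, and method names are hypothetical and not taken from the PR. It assumes the legacy length prefix is stored most-significant byte first, so that the first byte of a positive length has its high bit clear.

```java
// Hypothetical classifier illustrating the first-byte check; not the PR's actual API.
final class SchemaIdKindSketch {
    enum Kind { VALUE, KEY_VALUE, LEGACY }

    static Kind classify(byte[] schemaId) {
        if (schemaId == null || schemaId.length == 0) {
            return Kind.LEGACY; // nothing to inspect; treated as legacy in this sketch
        }
        switch (schemaId[0]) {
            case -1: // 0xFF: new value schema ID
                return Kind.VALUE;
            case -2: // 0xFE: new key-value schema ID
                return Kind.KEY_VALUE;
            default:
                // Legacy data: a Kafka magic header or the first byte of a
                // positive length (assumed big-endian), so the high bit is 0.
                return Kind.LEGACY;
        }
    }

    // True when the most significant bit of the first byte is set,
    // which holds for both new magic bytes (-1 and -2).
    static boolean hasHighBitSet(byte first) {
        return (first & 0x80) != 0;
    }
}
```

Because both new magic bytes have the high bit set and the legacy prefixes described above do not, the two ranges cannot collide under these assumptions.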

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions github-actions bot added the PIP and doc-not-needed labels Sep 29, 2025
@gaoran10 gaoran10 self-assigned this Sep 29, 2025
@gaoran10 gaoran10 marked this pull request as ready for review September 29, 2025 05:51
@BewareMyPower BewareMyPower changed the title from "[improve][PIP] Update the schema ID format" to "[improve][client] PIP-420: Update the schema ID format" Sep 29, 2025
codecov-commenter commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.27%. Comparing base (e44e084) to head (bf3d68f).
⚠️ Report is 14 commits behind head on master.

Files with missing lines                                   Patch %   Lines
...main/java/org/apache/pulsar/client/api/Schema.java      0.00%     2 Missing ⚠️
.../org/apache/pulsar/common/schema/SchemaIdUtil.java      93.75%    1 Missing ⚠️
...ava/org/apache/pulsar/client/impl/MessageImpl.java      75.00%    1 Missing ⚠️
Additional details and impacted files

@@              Coverage Diff              @@
##             master   #24798       +/-   ##
=============================================
+ Coverage     38.36%   74.27%   +35.91%     
- Complexity    13171    33799    +20628     
=============================================
  Files          1854     1913       +59     
  Lines        144870   149163     +4293     
  Branches      16808    17303      +495     
=============================================
+ Hits          55574   110796    +55222     
+ Misses        81752    29533    -52219     
- Partials       7544     8834     +1290     
Flag         Coverage Δ
inttests     26.42% <7.69%> (+0.20%) ⬆️
systests     22.81% <11.53%> (+0.12%) ⬆️
unittests    73.79% <84.61%> (+39.17%) ⬆️

Files with missing lines                                   Coverage Δ
...he/pulsar/client/impl/TypedMessageBuilderImpl.java      86.80% <100.00%> (+25.53%) ⬆️
.../org/apache/pulsar/common/schema/SchemaIdUtil.java      93.75% <93.75%> (ø)
...ava/org/apache/pulsar/client/impl/MessageImpl.java      75.43% <75.00%> (+22.05%) ⬆️
...main/java/org/apache/pulsar/client/api/Schema.java      80.00% <0.00%> (+15.00%) ⬆️

... and 1412 files with indirect coverage changes

@gaoran10 gaoran10 merged commit 9cc15dd into apache:master Oct 9, 2025
51 checks passed
lhotari pushed a commit that referenced this pull request Oct 10, 2025