@Jack-LuoHongyi Jack-LuoHongyi commented Oct 20, 2025

Summary

  • Type: deterministic behavior (map traversal and nested serialization)
  • Scope: minimal production change; tests unchanged
  • Module: gora-hive
  • Tests: org.apache.gora.hive.store.TestHiveStore#testPutMixedMaps, org.apache.gora.hive.store.TestHiveStore#testPutNested

Root Cause
The Java Map interface does not guarantee iteration order (HashMap in particular gives no ordering guarantee), and nested record field traversal can vary between runs. The write path in HiveStore serialized mixed maps and nested records in whatever order iteration happened to produce, which led to non-deterministic column/value ordering and occasional mismatches when validating results.

Fix
Normalize traversal during write to make output stable across runs:

  • Sort map keys before serialization to ensure a consistent order.
  • Preserve schema field order when walking nested records so nested structures are written deterministically.

These adjustments only affect serialization order; data content, schema, and public APIs remain unchanged.
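
For illustration only, the key-sorting idea from the first bullet can be sketched as below. This is not the HiveStore code: the class `StableMapSerialization`, the method `serialize`, and the `{k=v,...}` output format are hypothetical names and choices made for this example. It assumes string keys with no null key, since `TreeMap` with natural ordering rejects null keys.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of gora-hive.
final class StableMapSerialization {

    // Serializes a map in ascending key order so the output does not
    // depend on the iteration order of the map implementation passed in.
    static String serialize(Map<String, ?> map) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        // Copying into a TreeMap sorts entries by the natural order of the keys.
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(map).entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out.toString();
    }
}
```

With this approach, a HashMap and a LinkedHashMap holding the same entries serialize to the same string regardless of insertion order. The sort costs O(n log n) per map, which is likely negligible next to the write itself.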

Validation

  • Local:
    • mvn -pl gora-hive -Dtest=org.apache.gora.hive.store.TestHiveStore#testPutMixedMaps test
    • mvn -pl gora-hive -Dtest=org.apache.gora.hive.store.TestHiveStore#testPutNested test
  • Order-perturbation verification: 100 repeated runs with randomized iteration order, using edu.illinois:nondex-maven-plugin, confirm stable results.

Risk
Low. The update stabilizes write ordering for maps and nested records without changing semantics or storage layout.
