@PsiACE
Member

@PsiACE PsiACE commented Dec 22, 2025

I just found that we've implemented murmur3hash ourselves, while Java also implements xxhash.

  • Ported xxhash64 (seeded) to match DataSketches Java behavior.

Copilot AI review requested due to automatic review settings December 22, 2025 14:00

Copilot AI left a comment

Pull request overview

This PR adds an xxhash64 implementation to the hash module, matching the DataSketches Java behavior. The implementation provides a fast, non-cryptographic 64-bit hash function with seeded hashing support.

  • Implements XxHash64 struct with the standard Hasher trait
  • Provides both streaming API (write/finish64) and convenience method (hash_u64)
  • Includes comprehensive test coverage with reference test vectors for verification
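
For readers unfamiliar with the algorithm, the following is a minimal sketch of seeded XXH64 following the public xxHash specification. It is not the code from this PR: the names `xxh64`, `hash_u64`, and `BufferedXxHash64` are hypothetical, and the `Hasher` wrapper simply buffers all input instead of streaming in 32-byte stripes the way a production implementation would. Treating `hash_u64` as the hash of the value's 8 little-endian bytes is likewise an assumption.

```rust
use std::hash::Hasher;

// The five 64-bit primes from the xxHash specification.
const P1: u64 = 0x9E37_79B1_85EB_CA87;
const P2: u64 = 0xC2B2_AE3D_27D4_EB4F;
const P3: u64 = 0x1656_67B1_9E37_79F9;
const P4: u64 = 0x85EB_CA77_C2B2_AE63;
const P5: u64 = 0x27D4_EB2F_1656_67C5;

// Reads the first 8 bytes of `b` as a little-endian u64.
fn read_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

// Reads the first 4 bytes of `b` as a little-endian u32.
fn read_u32(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[..4]);
    u32::from_le_bytes(a)
}

// One accumulator round: mix a 64-bit lane into `acc`.
fn round(acc: u64, lane: u64) -> u64 {
    acc.wrapping_add(lane.wrapping_mul(P2)).rotate_left(31).wrapping_mul(P1)
}

// Folds one accumulator into the converged hash.
fn merge_round(hash: u64, acc: u64) -> u64 {
    (hash ^ round(0, acc)).wrapping_mul(P1).wrapping_add(P4)
}

/// One-shot seeded XXH64 over a byte slice (hypothetical helper name).
pub fn xxh64(data: &[u8], seed: u64) -> u64 {
    let mut rest = data;
    let mut hash = if data.len() >= 32 {
        // Four parallel accumulators consume 32-byte stripes.
        let mut v1 = seed.wrapping_add(P1).wrapping_add(P2);
        let mut v2 = seed.wrapping_add(P2);
        let mut v3 = seed;
        let mut v4 = seed.wrapping_sub(P1);
        while rest.len() >= 32 {
            v1 = round(v1, read_u64(&rest[0..]));
            v2 = round(v2, read_u64(&rest[8..]));
            v3 = round(v3, read_u64(&rest[16..]));
            v4 = round(v4, read_u64(&rest[24..]));
            rest = &rest[32..];
        }
        let mut h = v1.rotate_left(1)
            .wrapping_add(v2.rotate_left(7))
            .wrapping_add(v3.rotate_left(12))
            .wrapping_add(v4.rotate_left(18));
        h = merge_round(h, v1);
        h = merge_round(h, v2);
        h = merge_round(h, v3);
        merge_round(h, v4)
    } else {
        // Short inputs skip the stripe loop entirely.
        seed.wrapping_add(P5)
    };
    hash = hash.wrapping_add(data.len() as u64);
    // Tail: remaining 8-byte words, then one 4-byte word, then single bytes.
    while rest.len() >= 8 {
        hash ^= round(0, read_u64(rest));
        hash = hash.rotate_left(27).wrapping_mul(P1).wrapping_add(P4);
        rest = &rest[8..];
    }
    if rest.len() >= 4 {
        hash ^= (read_u32(rest) as u64).wrapping_mul(P1);
        hash = hash.rotate_left(23).wrapping_mul(P2).wrapping_add(P3);
        rest = &rest[4..];
    }
    for &byte in rest {
        hash ^= (byte as u64).wrapping_mul(P5);
        hash = hash.rotate_left(11).wrapping_mul(P1);
    }
    // Final avalanche so every input bit affects every output bit.
    hash ^= hash >> 33;
    hash = hash.wrapping_mul(P2);
    hash ^= hash >> 29;
    hash = hash.wrapping_mul(P3);
    hash ^ (hash >> 32)
}

/// Convenience: hash one u64 as its 8 little-endian bytes (an assumption).
pub fn hash_u64(value: u64, seed: u64) -> u64 {
    xxh64(&value.to_le_bytes(), seed)
}

/// Simplified `Hasher` that buffers all input and hashes it in `finish`.
pub struct BufferedXxHash64 { seed: u64, buf: Vec<u8> }

impl BufferedXxHash64 {
    pub fn with_seed(seed: u64) -> Self { BufferedXxHash64 { seed, buf: Vec::new() } }
}

impl Hasher for BufferedXxHash64 {
    fn write(&mut self, bytes: &[u8]) { self.buf.extend_from_slice(bytes); }
    fn finish(&self) -> u64 { xxh64(&self.buf, self.seed) }
}
```

With this sketch, `xxh64(b"", 0)` should yield `0xEF46DB3751D8E999`, the widely published XXH64 value for empty input with seed 0. Buffering keeps the wrapper short at the cost of memory proportional to the input, which is why a real streaming hasher keeps only the four accumulators and a partial stripe.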

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/hash/xxhash.rs | New complete implementation of xxhash64 with constants, hasher struct, trait implementations, helper functions, and test suite |
| src/hash/mod.rs | Adds module declaration and public export for XxHash64 (with unused_imports attribute as hash module is currently internal-only) |

@notfilippo
Contributor

> I just found that we've implemented murmur3hash ourselves, while Java also implements xxhash.

I think we should wait for sketches that need xxhash before shipping it ourselves. cc @tisonkun

@tisonkun
Member

> > I just found that we've implemented murmur3hash ourselves, while Java also implements xxhash.
>
> I think we should wait for sketches that need xxhash before shipping it ourselves. cc @tisonkun

Agree. We can hold this PR for now.

Whether or not to publish the hash function is still under discussion. datasketches-java and datasketches-cpp use their hash functions internally, and datasketches-go clearly defines its hash function as internal.

It seems that XxHash is mainly used by BloomFilter. @PsiACE you can take a look at #5 to see where we can have those sketches and xxhash as part of their implementation.

@notfilippo
Contributor

I am working on an implementation of bloom filters at the moment. I can rework it to use this PR.

@tisonkun
Member

> I am working on an implementation of bloom filters at the moment. I can rework it to use this PR.

Great! If desired, you can use a Co-Authored-By: ... trailer in the commit message to credit collaborators.

See https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

Signed-off-by: Chojan Shang <[email protected]>
@leerho leerho merged commit f22231e into apache:main Dec 25, 2025
9 checks passed