Description
Yeah. I think it's technically reasonable to have snapshots for tests. The key questions are how these snapshots get generated, whether we can reproduce them, and how we can modify or update them.
In other words, how were the snapshots in datasketches-go originally generated?
Originally posted by @tisonkun in #1 (comment)
@freakyzoidberg told me there are some "tests" in the Go repository that generate those files (https://github.com/apache/datasketches-go/blob/f7bc4b1db865c2dd1be9134d8a61eeb8bc24b1c6/hll/hll_sketch_serialization_test.go#L29). I assumed it's similar for the other implementations.
Originally posted by @notfilippo in #1 (comment)
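To make the "generate, reproduce, update" question above concrete, here is a minimal sketch of a generate-or-compare snapshot helper. It is not taken from any of the datasketches repositories: the `check_snapshot` function, its `produce` callback, and the `UPDATE_SNAPSHOTS` environment variable are all hypothetical names chosen for illustration.

```python
import os
from pathlib import Path


def check_snapshot(path, produce, update=None):
    """Return True if produce() matches the stored snapshot; rewrite it in update mode."""
    if update is None:
        # UPDATE_SNAPSHOTS is a hypothetical switch, not an existing project setting.
        update = os.environ.get("UPDATE_SNAPSHOTS") == "1"
    path = Path(path)
    data = produce()
    if update:
        # Regenerate the snapshot from the same code path the test exercises.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return True
    # Compare mode: a missing or different file counts as a mismatch.
    return path.exists() and path.read_bytes() == data
```

With a helper of this shape, the same test both verifies a snapshot and regenerates it on demand, so the binary files in the repository always have a documented origin.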
Great! Then at least we can reuse the Go logic to generate the Go snapshots. But I can see that it would require extra engineering effort, so I won't block this PR on such a potential improvement, whose goal is to avoid (mysterious) binaries as much as possible.
For the Java and C++ snapshots, perhaps @leerho and @AlexanderSaydakov can give some input here.
Originally posted by @tisonkun in #1 (comment)
The C++, Java, and Go repos do have some tests that generate synopses and cross-test the ones from the other repos.
It's very much by convention and quite manual, and as Lee hinted in the other thread, we didn't really think about how to scale this to more languages (very much an M*N issue).
You can find the Java HLL x-check here and the cpp ones for ser/de there.
Also worth noting that not all synopses are guaranteed to be byte-for-byte equal. They behave the same, are logically equivalent, and are fully serializable/deserializable between languages, but not all of them guarantee that regenerating a synopsis produces identical raw bytes. TL;DR: not all RNGs are seeded, though I suppose they could be.
Originally posted by @freakyzoidberg in #1 (comment)
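The caveat above about byte-for-byte equality can be illustrated with a toy example. This is a sketch under stated assumptions, not the serialization format of any real sketch: the `serialize`, `deserialize`, and `logically_equal` functions are hypothetical, and the "synopsis" is just a set of integers whose on-disk order depends on an RNG.

```python
import random
import struct


def serialize(values, rng):
    """Toy serializer: a little-endian count, then the items in RNG-dependent order."""
    items = sorted(values)
    rng.shuffle(items)  # stands in for any RNG-driven internal layout choice
    return struct.pack("<I", len(items)) + b"".join(struct.pack("<Q", v) for v in items)


def deserialize(blob):
    """Rebuild the logical content (a set) from the toy byte layout."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return set(struct.unpack_from(f"<{count}Q", blob, 4))


def logically_equal(blob_a, blob_b):
    """Compare two snapshots by what they represent, not by their raw bytes."""
    return deserialize(blob_a) == deserialize(blob_b)
```

Serializing the same values with two differently seeded `random.Random` instances generally yields different bytes that are still `logically_equal`, while reusing the same seed reproduces the bytes exactly. A cross-language test can therefore only assert on raw bytes where generation is seeded, and should otherwise assert on behavior after deserialization.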
> very much M*N issue

Not quite. Each language can produce its own serialized snapshots, and every language should leverage the existing snapshots while contributing its own.
We can have a shared snapshot library, like the shared proto definitions in other projects.
Anyway, this is another topic, so I'll open a new issue to track it.
Originally posted by @tisonkun in #1 (comment)
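One way to read the shared snapshot library idea above: each language writes its snapshots once into a common location, and each implementation runs a single check over every file, whoever produced it, so the work grows roughly as M + N rather than M*N. The sketch below assumes a hypothetical layout of `<root>/<language>/<name>.sk`. The layout and the `discover_snapshots` and `check_all` helpers are illustrative assumptions, not an existing convention of the project.

```python
from pathlib import Path


def discover_snapshots(root):
    """Group snapshot files by producing language, assuming <root>/<language>/<name>.sk."""
    found = {}
    for path in sorted(Path(root).glob("*/*.sk")):
        found.setdefault(path.parent.name, []).append(path)
    return found


def check_all(root, check):
    """Run one consumer-side check on every snapshot, whichever language wrote it."""
    root = Path(root)
    results = {}
    for paths in discover_snapshots(root).values():
        for path in paths:
            # Key by the path relative to the shared root, e.g. "go/hll_n10.sk".
            results[path.relative_to(root).as_posix()] = check(path.read_bytes())
    return results
```

Here `check` would be the consuming implementation's own deserialize-and-verify function, so adding a new language means adding one directory of snapshots and one `check`, without touching the other implementations.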