Generate snapshots programmatically #10

@tisonkun

Yeah. I think it's technically reasonable to have snapshots for tests. The key points here are how these snapshots get generated, whether we can make them reproducibly, and how we can modify or update them.

In other words, how were the snapshots in datasketches-go originally generated?

Originally posted by @tisonkun in #1 (comment)


@freakyzoidberg told me there are some "tests" that generate those files in the Go repository (https://github.com/apache/datasketches-go/blob/f7bc4b1db865c2dd1be9134d8a61eeb8bc24b1c6/hll/hll_sketch_serialization_test.go#L29). I assumed it's similar for the other implementations.

Originally posted by @notfilippo in #1 (comment)
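
For illustration, a generator "test" of the kind described above could look roughly like the sketch below. It is not taken from any of the datasketches repositories: `toy_serialize`, the `toy_n{n}_python.sk` file naming, and the list of sizes are all hypothetical stand-ins for a real sketch serializer and its conventions.

```python
import os
import struct


def toy_serialize(n):
    """Hypothetical stand-in for a real sketch serializer: a count, then n values."""
    return struct.pack(f"<I{n}I", n, *range(n))


def snapshot_path(out_dir, n):
    """Hypothetical naming scheme: one file per input size."""
    return os.path.join(out_dir, f"toy_n{n}_python.sk")


def generate_snapshots(out_dir, sizes=(0, 1, 10, 100)):
    """Write one snapshot file per size and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n in sizes:
        path = snapshot_path(out_dir, n)
        with open(path, "wb") as f:
            f.write(toy_serialize(n))
        paths.append(path)
    return paths


def check_snapshots(out_dir, sizes=(0, 1, 10, 100)):
    """Return True if every stored snapshot matches a fresh serialization."""
    for n in sizes:
        with open(snapshot_path(out_dir, n), "rb") as f:
            if f.read() != toy_serialize(n):
                return False
    return True
```

Keeping generation and checking in the same module means the committed binaries can always be regenerated from code instead of being opaque files.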


Great! Then at least we can reuse the Go logic to generate the Go snapshots. But I can see that it would require extra engineering effort, so I won't block this PR on such a potential improvement, which is meant to avoid (mysterious) binaries as much as possible.

For the Java and C++ snapshots, perhaps @leerho and @AlexanderSaydakov can give some input here.

Originally posted by @tisonkun in #1 (comment)


The C++, Java, and Go repos do have some tests that generate and cross-test the synopses from the other repos.

It's very much convention and quite manual, and as Lee hinted in the other thread, we didn't really think about how to scale this to more languages (very much an M*N issue).

You can find the Java HLL x-check here and the cpp ones for ser/de there.

Also worth noting that not all synopses are guaranteed to have byte-for-byte equality. They behave the same, are logically equivalent, and are fully serializable/deserializable between languages, but not all of them guarantee reproducible generation at the level of raw bytes. TL;DR: not all RNGs are seeded (though I suppose they could be).

Originally posted by @freakyzoidberg in #1 (comment)
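
To make the byte-equality point above concrete, here is a toy sketch (not a real datasketches format) in which the serialized byte order depends on an RNG. The `serialize` and `deserialize` functions and the wire layout are assumptions made up for this example. Two differently seeded runs give different bytes that deserialize to the same content, while reusing a seed reproduces the bytes exactly.

```python
import random
import struct


def serialize(items, rng):
    """Toy format: a count followed by the items in an RNG-dependent order."""
    order = list(items)
    rng.shuffle(order)
    return struct.pack(f"<I{len(order)}I", len(order), *order)


def deserialize(data):
    """Read the toy format back into a set, discarding the stored order."""
    (count,) = struct.unpack_from("<I", data, 0)
    return set(struct.unpack_from(f"<{count}I", data, 4))


items = set(range(100))
a = serialize(items, random.Random(1))  # seeded with 1
b = serialize(items, random.Random(2))  # different seed, different byte layout
c = serialize(items, random.Random(1))  # same seed as `a`, same bytes
```

Under these assumptions, a snapshot test would compare deserialized behavior (`deserialize(a) == deserialize(b)`) rather than raw bytes, unless the generator is seeded, in which case a byte comparison (`a == c`) also holds.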


> very much M*N issue

Not quite. Each language can implement its own serialized snapshots, and any language should leverage the existing snapshots while patching its own.

We can have a shared snapshot library like shared proto definitions in other projects.

Anyway, this is another topic, so I'll open a new issue to track it.

Originally posted by @tisonkun in #1 (comment)
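
As a rough sketch of the shared-snapshot idea above: if every language writes its snapshots into one common tree, each implementation needs only one generator and one reader that walks the whole tree, instead of one harness per language pair. The `root/<language>/<name>.sk` layout and the `discover_snapshots` helper are hypothetical and not an existing convention of the project.

```python
import os


def discover_snapshots(root):
    """Return sorted (language, filename) pairs found under root/<language>/*.sk."""
    found = []
    for language in sorted(os.listdir(root)):
        lang_dir = os.path.join(root, language)
        if not os.path.isdir(lang_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(lang_dir)):
            if name.endswith(".sk"):
                found.append((language, name))
    return found
```

A test suite in any language could then loop over every discovered snapshot, whatever its origin, and a newly added language would only contribute its own subdirectory.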
