Make results of full details tests diffable #88

@fengelniederhammer

"I completely agree that, as it is, this diff is not very usable, but I do think we can gain some benefit from regression testing. This is basically a regression test: we just confirm that SILO behaves the same on the same datasets. From what I see, we don't analyze the correctness of the queries.

We use csv-diff in the pathoplexus regression tests, as it gives a human-readable output of the differences between two TSV or CSV files (https://github.com/pathoplexus/pathoplexus/blob/main/data-integrity-tests/regression-testing/Snakefile#L173). For example:

176 rows changed

  submissionId: AB371719.1
    clade: "outgroup" => "unassigned"

  submissionId: AB371722.1
    clade: "outgroup" => "unassigned"

I think it would be awesome to do something similar here.
I would suggest we change the output files to TSV files (this is an output SILO can produce). We can then store them compressed to save space and decompress them only when comparing. That way we can produce clear, human-readable test results."

Originally posted by @anna-parker in #87 (comment)

The main issue is that the files are quite large when uncompressed (several MB or 10 MB, depending on the organism), so it's reasonable to keep them compressed.
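
To make the proposal more concrete, here is a minimal sketch of how gzip-compressed TSV files could be compared while staying compressed on disk. It is not the csv-diff tool itself and not an existing part of the test suite: it uses only the Python standard library, and the function names `load_tsv_gz` and `diff_rows` as well as the choice of gzip as the compression format are assumptions made for illustration. The report format loosely imitates the csv-diff output quoted above.

```python
import csv
import gzip


def load_tsv_gz(path, key):
    """Read a gzip-compressed TSV file into a dict keyed by the `key` column.

    Hypothetical helper: the file is decompressed on the fly while reading,
    so only the compressed file needs to be stored.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[key]: row for row in reader}


def diff_rows(expected, actual, key):
    """Return a human-readable report of the differences between two row dicts.

    Returns an empty string when both inputs contain the same rows.
    """
    lines = []

    # Rows present in both files whose column values differ.
    changed = []
    for row_id in sorted(expected.keys() & actual.keys()):
        old, new = expected[row_id], actual[row_id]
        fields = [
            (col, old.get(col), new.get(col))
            for col in sorted(old.keys() | new.keys())
            if old.get(col) != new.get(col)
        ]
        if fields:
            changed.append((row_id, fields))

    if changed:
        plural = "" if len(changed) == 1 else "s"
        lines.append(f"{len(changed)} row{plural} changed")
        lines.append("")
        for row_id, fields in changed:
            lines.append(f"  {key}: {row_id}")
            for col, before, after in fields:
                lines.append(f'    {col}: "{before}" => "{after}"')
            lines.append("")

    # Rows that exist in only one of the two files.
    for label, ids in (
        ("added", sorted(actual.keys() - expected.keys())),
        ("removed", sorted(expected.keys() - actual.keys())),
    ):
        if ids:
            plural = "" if len(ids) == 1 else "s"
            lines.append(f"{len(ids)} row{plural} {label}")
            lines.append("")
            lines.extend(f"  {key}: {row_id}" for row_id in ids)
            lines.append("")

    return "\n".join(lines).rstrip()
```

A test could load the stored expected file and the freshly produced file with `load_tsv_gz`, using a column that uniquely identifies each row as `key`, and fail with the text returned by `diff_rows` whenever that text is non-empty.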
