Investigate performance with big k-mer sizes (e.g. 5,001) #22

@fedarko

Example: when creating a dot plot of two ~5 Mbp genomes using k = 5,001, the common substrings method runs out of memory, even though it handles k = 33 fine for the same sequences. I'm not sure exactly why.

However, the "suff-only" method can actually handle this okay.

It would be nice to have some guidance on why big k-mer sizes cause problems, and how to handle them. I am not sure if anyone is out here regularly creating dot plots with k-mer sizes in the thousands and up, but apparently this tool can do that at least :)
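
As a rough illustration of how k alone can change memory requirements, here is a back-of-the-envelope sketch. It is not a diagnosis: the cause of the out-of-memory error above is unknown, and the storage scheme below (every k-mer held as its own k-byte copy, one byte per base) is purely a hypothetical assumption, not a description of how the common substrings method works. The function name `kmer_copy_bytes` is made up for this example.

```python
def kmer_copy_bytes(seq_len: int, k: int) -> int:
    """Bytes needed to hold every k-mer of one sequence as a separate copy.

    Hypothetical scheme: one byte per base, no per-object overhead,
    and no sharing of memory between overlapping k-mers.
    """
    num_kmers = max(seq_len - k + 1, 0)  # a sequence shorter than k has no k-mers
    return num_kmers * k


GENOME_LEN = 5_000_000  # ~5 Mbp, matching the example in this issue

for k in (33, 5_001):
    gb = kmer_copy_bytes(GENOME_LEN, k) / 1e9
    print(f"k = {k:>5,}: about {gb:.2f} GB per genome")
```

Under that assumption, k = 33 needs roughly 0.16 GB per genome and k = 5,001 needs roughly 25 GB, because the number of k-mers barely changes while each one gets about 150 times longer. Whether anything like this happens in the actual code would need profiling to confirm.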

    Labels

    admin: Administrative (mostly non-code-related)
    backburner: Not as brutal a designation as "wontfix", but...
    documentation: Improvements or additions to documentation
