fastANI's .visualize output for small sequences 

Dear fastANI team

I'm trying to visualize the fastANI output for a simple case where two fasta files have identical sequences (with different record IDs). Each fasta file contains two random sequences, 800 and 1100 bases long, generated using [this website](http://www.faculty.ucr.edu/~mmaduro/random.htm). The visualisation result doesn't seem to be as expected.

This is the command line I used:
```
fastANI -q 1.fa -r 2.fa --visualize -o fastani.out --minFraction 0.1 --fragLen 1000
```
which resulted in these outputs:
```
$ cat fastani.out
1.fa	2.fa	100	1	1
$ cat fastani.out.visual 
1.fa	2.fa	100	NA	NA	NA	0	999	800	1799	NA	NA
```
So the ANI value is correctly estimated as 100.
Then I used [this R code](https://github.com/ParBLiSS/FastANI/blob/master/scripts/visualize.R) to generate this figure:

![fastani out visual](https://github.com/ParBLiSS/FastANI/assets/41860044/5dfd94a4-28f6-42a9-b8a3-3f346d4a9c7d)

I would have expected to see a different visualisation where the first record (with 800 bases) is ignored, and there is a match for the second record (with 1100 bases). 

When I check the R code, 
```
comparison <- try(read_comparison_from_blast(fastANI_visual_file))
```
The output visual file is converted to this R dataframe:
<img width="1049" alt="Screenshot 2024-06-10 at 7 07 41 PM" src="https://github.com/ParBLiSS/FastANI/assets/41860044/10bcc61f-8f2a-4a74-8ae8-53bdd10bbb74">

So my guess is that `0	999` (as is evident in the image) gives the start and end on the first fasta file, and `800	1799` gives the start and end on the second. It seems that the coordinates in FastANI's `.visual` output refer to the concatenation of the records in each file.
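
To make this reading concrete, here is a minimal Python sketch that maps a position in the concatenated sequence back to a record and a position within that record. It only illustrates my interpretation and is not confirmed FastANI behaviour; `to_record_coords` is a hypothetical helper (not part of FastANI), and I am assuming 0-based, inclusive coordinates.

```python
from bisect import bisect_right
from itertools import accumulate


def to_record_coords(pos, record_lengths):
    """Map a 0-based position in the concatenated records to
    (record_index, position_within_record).

    Hypothetical helper: assumes records are concatenated in file order.
    """
    total = sum(record_lengths)
    if not 0 <= pos < total:
        raise ValueError(f"position {pos} is outside 0..{total - 1}")
    # Start offset of each record in the concatenation,
    # e.g. [0, 800] for record lengths [800, 1100].
    offsets = [0] + list(accumulate(record_lengths))[:-1]
    # Index of the last record whose start offset is <= pos.
    idx = bisect_right(offsets, pos) - 1
    return idx, pos - offsets[idx]


# Record lengths in 2.fa, in file order.
lengths = [800, 1100]
start = to_record_coords(800, lengths)   # reference start from the .visual file
end = to_record_coords(1799, lengths)    # reference end from the .visual file
print(start, end)
```

Under these assumptions, `800	1799` on the reference side would correspond to the first 1000 bases of the second record (the 1100-base one), which is the match I expected to see.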

Please find attached the dataset [fastani_case.zip](https://github.com/user-attachments/files/15780014/fastani_case.zip).

It seems that this happens when the first sequence (record) in the fasta file is shorter than fragLen. It doesn't happen for another dataset with larger sequences. I agree this is very much an edge case and might not occur in typical real datasets.

Best regards,
Sina
