Inconsistent alignments depending on whether reference file is gzipped or not #70

@arpfennig

Hi,

I am using mashmap v3.1.3 and I noticed that the same alignment was reported multiple times in the output file. For example, here is the output for reference chromosome 21:

ptg000091l	861542	0	861542	+	chr21	45090682	152523	1014065	13	861542	20	id:f:0.990785	kc:f:0.283563
ptg000217l	527266	0	527266	+	chr21	45090682	178973	706239	12	527266	20	id:f:0.990089	kc:f:0.331461
ptg000217l	527266	0	527266	+	chr21	45090682	178973	706239	12	527266	20	id:f:0.990089	kc:f:0.331461
ptg000203l	686331	0	686331	+	chr21	45090682	179063	865394	15	686331	25	id:f:0.996609	kc:f:0.235299
ptg000268l	916773	0	916773	+	chr21	45090682	582998	1499771	3	916773	12	id:f:0.93172	kc:f:0.0635271
ptg000268l	916773	0	916773	+	chr21	45090682	582998	1499771	3	916773	12	id:f:0.93172	kc:f:0.0635271
ptg000268l	916773	0	916773	+	chr21	45090682	582998	1499771	3	916773	12	id:f:0.93172	kc:f:0.0635271
ptg000145l	2623179	0	2623179	+	chr21	45090682	1133648	3756827	12	2623179	19	id:f:0.98662	kc:f:0.255438
ptg000158l	1222257	0	1222257	+	chr21	45090682	1565548	2787805	3	1222257	12	id:f:0.936192	kc:f:0.214653
ptg000158l	1222257	0	1222257	+	chr21	45090682	1565548	2787805	3	1222257	12	id:f:0.936192	kc:f:0.214653
ptg000158l	1222257	0	1222257	+	chr21	45090682	1565548	2787805	3	1222257	12	id:f:0.936192	kc:f:0.214653
ptg000158l	1222257	0	1222257	+	chr21	45090682	1565548	2787805	3	1222257	12	id:f:0.936192	kc:f:0.214653
ptg000158l	1222257	0	1222257	+	chr21	45090682	1565548	2787805	3	1222257	12	id:f:0.936192	kc:f:0.214653
ptg000158l	1222257	0	1222257	+	chr21	45090682	1565548	2787805	3	1222257	12	id:f:0.936192	kc:f:0.214653
ptg000160l	848270	0	848270	+	chr21	45090682	2308763	3157033	8	848270	16	id:f:0.972836	kc:f:0.583853
ptg000160l	848270	0	848270	+	chr21	45090682	2308763	3157033	8	848270	16	id:f:0.972836	kc:f:0.583853
ptg000151l	555938	0	555938	+	chr21	45090682	2459556	3015494	15	555938	22	id:f:0.993434	kc:f:1.23635
ptg000172l	1014256	0	1014256	+	chr21	45090682	2577878	3592134	10	1014256	17	id:f:0.980634	kc:f:0.597447
ptg000215l	318592	0	318592	+	chr21	45090682	5433633	5752225	19	318592	29	id:f:0.998634	kc:f:0.130072
ptg000025l	2865536	0	2865536	+	chr21	45090682	5830375	8695911	10	2865536	17	id:f:0.978886	kc:f:0.474346
ptg000176l	584981	0	584981	-	chr21	45090682	8297841	8882822	9	584981	16	id:f:0.977014	kc:f:0.599697
ptg000176l	584981	0	584981	+	chr21	45090682	8297841	8882822	9	584981	16	id:f:0.977014	kc:f:0.599697
ptg000176l	584981	0	584981	+	chr21	45090682	8297841	8882822	9	584981	16	id:f:0.977014	kc:f:0.599697
ptg000176l	584981	0	584981	+	chr21	45090682	8297841	8882822	9	584981	16	id:f:0.977014	kc:f:0.599697
ptg000093l	3796125	0	3796125	+	chr21	45090682	8542762	12338887	19	3796125	29	id:f:0.998634	kc:f:0.41517
ptg000125l	2794119	0	2794119	+	chr21	45090682	9318318	12112437	18	2794119	25	id:f:0.997158	kc:f:0.524907
ptg000290l	812527	0	812527	+	chr21	45090682	10614047	11426574	4	812527	13	id:f:0.945935	kc:f:0.117817
ptg000293l	206332	0	206332	+	chr21	45090682	10988834	11195166	10	206332	17	id:f:0.978886	kc:f:0.0164015
ptg000292l	229265	0	229265	+	chr21	45090682	11199120	11428385	11	229265	17	id:f:0.982112	kc:f:0.0476858
ptg000033l	27686851	0	27686851	+	chr21	45090682	24117979	45090681	20	27686851	255	id:f:1	kc:f:0.872569
ptg000115l	3037875	0	3037875	+	chr21	45090682	38473000	41510875	18	3037875	25	id:f:0.997158	kc:f:0.865375
ptg000147l	1146249	0	1146249	+	chr21	45090682	42104328	43250577	19	1146249	29	id:f:0.998634	kc:f:1.36605
ptg000047l	2315813	0	2315813	+	chr21	45090682	43904919	45090681	19	2315813	29	id:f:0.998634	kc:f:1.30252
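One way to quantify the repeats is to count byte-identical records in the output file. The following is a minimal Python sketch, not part of mashmap; the function name `count_duplicate_records` is hypothetical, and the sample rows are the first nine columns of three records copied from the output above.

```python
from collections import Counter


def count_duplicate_records(lines):
    """Return {record: count} for every record that occurs more than once."""
    # Blank lines are skipped; trailing newlines are ignored when comparing.
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return {record: n for record, n in counts.items() if n > 1}


# First nine tab-separated columns of three records from the chr21 output.
sample = [
    "ptg000217l\t527266\t0\t527266\t+\tchr21\t45090682\t178973\t706239",
    "ptg000217l\t527266\t0\t527266\t+\tchr21\t45090682\t178973\t706239",
    "ptg000203l\t686331\t0\t686331\t+\tchr21\t45090682\t179063\t865394",
]

duplicates = count_duplicate_records(sample)
for record, n in duplicates.items():
    # Print the query name (column 1) and how often the record appears.
    print(record.split("\t")[0], n)  # prints: ptg000217l 2
```

For a real run, an open file handle can be passed instead of the list, for example `count_duplicate_records(open("test1.mashmap"))`.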

I found this odd, so I re-ran it. This time I happened to use an uncompressed version of my reference sequence file. Most of the duplicated alignments were gone (ptg000176l is still reported several times), but I got some new alignments and the positions of previously found alignments changed. Here again is the output for reference chr21:

ptg000091l	861542	0	861542	+	chr21	45090682	173803	1035345	29	861542	22	id:f:0.993222	kc:f:0.33354
ptg000203l	686331	0	686331	+	chr21	45090682	179063	865394	30	686331	22	id:f:0.994209	kc:f:0.315678
ptg000145l	2623179	0	2623179	+	chr21	45090682	1170940	3794119	24	2623179	19	id:f:0.98662	kc:f:0.182862
ptg000151l	555938	0	555938	+	chr21	45090682	2499197	3055135	24	555938	19	id:f:0.98662	kc:f:0.900919
ptg000172l	1014256	0	1014256	+	chr21	45090682	2682656	3696912	27	1014256	20	id:f:0.990289	kc:f:0.642754
ptg000215l	318592	0	318592	+	chr21	45090682	5420070	5738662	38	318592	29	id:f:0.998634	kc:f:0.145981
ptg000025l	2865536	0	2865536	+	chr21	45090682	7220950	10086486	21	2865536	17	id:f:0.981403	kc:f:0.455276
ptg000176l	584981	0	584981	-	chr21	45090682	8381941	8966922	16	584981	16	id:f:0.971897	kc:f:0.557647
ptg000176l	584981	0	584981	+	chr21	45090682	8381941	8966922	16	584981	16	id:f:0.971897	kc:f:0.557647
ptg000176l	584981	0	584981	+	chr21	45090682	8381941	8966922	16	584981	16	id:f:0.971897	kc:f:0.557647
ptg000176l	584981	0	584981	+	chr21	45090682	8381941	8966922	16	584981	16	id:f:0.971897	kc:f:0.557647
ptg000125l	2794119	0	2794119	+	chr21	45090682	9374562	12168681	33	2794119	23	id:f:0.995431	kc:f:0.500231
ptg000293l	206332	0	206332	+	chr21	45090682	11091705	11298037	21	206332	17	id:f:0.980549	kc:f:0.0194184
ptg000292l	229265	0	229265	+	chr21	45090682	11199120	11428385	25	229265	19	id:f:0.986286	kc:f:0.0644872
ptg000033l	27686851	0	27686851	+	chr21	45090682	24229362	45090681	40	27686851	255	id:f:1	kc:f:1.00259
ptg000115l	3037875	0	3037875	+	chr21	45090682	39989460	43027335	37	3037875	27	id:f:0.997911	kc:f:0.951376
ptg000147l	1146249	0	1146249	+	chr21	45090682	42183625	43329874	38	1146249	29	id:f:0.998634	kc:f:1.11024
ptg000047l	2315813	0	2315813	+	chr21	45090682	43904919	45090681	38	2315813	29	id:f:0.998634	kc:f:1.22872
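The differences between the two runs can be summarised per query. The sketch below is a hedged illustration rather than anything mashmap provides: the helper names `parse_records` and `compare_runs` are hypothetical, and it assumes that columns 5, 6, 8 and 9 hold the strand, reference name, reference start and reference end, as the layout of the records above suggests. The sample rows are the first nine columns of records copied from the two outputs.

```python
def parse_records(lines):
    """Map each query name to its set of (strand, ref, ref_start, ref_end)."""
    mappings = {}
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        # Assumed layout: f[4]=strand, f[5]=reference, f[7]/f[8]=ref start/end.
        mappings.setdefault(f[0], set()).add((f[4], f[5], int(f[7]), int(f[8])))
    return mappings


def compare_runs(run_a, run_b):
    """Return (queries only in A, queries only in B, queries whose placement differs)."""
    a, b = parse_records(run_a), parse_records(run_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(q for q in a.keys() & b.keys() if a[q] != b[q])
    return only_a, only_b, changed


# Records from the run with the gzipped reference.
gz_run = [
    "ptg000091l\t861542\t0\t861542\t+\tchr21\t45090682\t152523\t1014065",
    "ptg000217l\t527266\t0\t527266\t+\tchr21\t45090682\t178973\t706239",
    "ptg000203l\t686331\t0\t686331\t+\tchr21\t45090682\t179063\t865394",
]
# Records from the run with the uncompressed reference.
plain_run = [
    "ptg000091l\t861542\t0\t861542\t+\tchr21\t45090682\t173803\t1035345",
    "ptg000203l\t686331\t0\t686331\t+\tchr21\t45090682\t179063\t865394",
]

only_gz, only_plain, changed = compare_runs(gz_run, plain_run)
print(only_gz, only_plain, changed)  # prints: ['ptg000217l'] [] ['ptg000091l']
```

Because duplicates collapse into a set, repeated records do not count as a difference here; only missing queries and changed coordinates are reported.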

I used these commands:

mashmap --perc_identity 95 --noSplit -r hs1.fa.gz -q hifiasm.bp.unified.fa --threads 32 -o test1.mashmap
gunzip hs1.fa.gz
mashmap --perc_identity 95 --noSplit -r hs1.fa -q hifiasm.bp.unified.fa --threads 32 -o test.mashmap

Any idea what could trigger this behavior?

Thanks,
Aaron
