
Commit 73eee10

Expand hts_parse_region documentation in the header file.
Mainly copied from the useful (but not so visible) comment before the function definition in hts.c.
1 parent 0eac47d

2 files changed (+53 -2 lines changed)

htslib/faidx.h

Lines changed: 12 additions & 2 deletions
@@ -164,6 +164,10 @@ faidx_t *fai_load_format(const char *fn, enum fai_format_options format);
 
 The returned sequence is allocated by `malloc()` family and should be destroyed
 by end users by calling `free()` on it.
+
+To work around ambiguous parsing issues, eg both "chr1" and "chr1:100-200"
+are reference names, quote using curly braces.
+Thus "{chr1}:100-200" and "{chr1:100-200}" disambiguate the above example.
 */
 char *fai_fetch(const faidx_t *fai, const char *reg, int *len);
 
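The brace quoting rule can be illustrated with a small standalone C sketch. `split_braced_name()` is a hypothetical helper written for this note, not an htslib function, and it does not reproduce htslib's actual parser: a leading `{` takes the reference name up to the closing `}`, while an unquoted string is naively split at the last `:`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of htslib: split a region string into a
 * reference name and the remaining coordinate text, honouring the
 * curly-brace quoting convention described for fai_fetch().
 * Copies the name into `name` (capacity `name_sz`) and returns a pointer
 * to the rest of `reg` (e.g. ":100-200" or ""). Returns NULL if a '{' is
 * never closed or the name does not fit in the buffer. */
static const char *split_braced_name(const char *reg, char *name, size_t name_sz)
{
    const char *start, *stop, *rest;

    if (reg[0] == '{') {
        /* Quoted: everything up to the closing brace is the name. */
        start = reg + 1;
        stop = strchr(start, '}');
        if (stop == NULL)
            return NULL;
        rest = stop + 1;
    } else {
        /* Unquoted: naively split at the last ':' (or take the whole string). */
        start = reg;
        stop = strrchr(reg, ':');
        if (stop == NULL)
            stop = reg + strlen(reg);
        rest = stop;
    }

    size_t len = (size_t)(stop - start);
    if (len + 1 > name_sz)
        return NULL;
    memcpy(name, start, len);
    name[len] = '\0';
    return rest;
}
```

With braces, "{chr1}:100-200" yields the name "chr1" and "{chr1:100-200}" yields the whole string as the name. Without them, the naive split turns "HLA-DRB1*12:17" into the name "HLA-DRB1*12" plus ":17", which is why a real parser also has to check candidate names against the reference list.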
@@ -173,8 +177,10 @@ char *fai_fetch(const faidx_t *fai, const char *reg, int *len);
 @param len Length of the region; -2 if seq not present, -1 general error
 @return Pointer to the quality string; null on failure
 
-The returned quality string is allocated by `malloc()` family and should be destroyed
-by end users by calling `free()` on it.
+The returned quality string is allocated by `malloc()` family and should be
+destroyed by end users by calling `free()` on it.
+
+Region names can be quoted with curly braces, as for fai_fetch().
 */
 char *fai_fetchqual(const faidx_t *fai, const char *reg, int *len);
 
@@ -234,6 +240,10 @@ int faidx_seq_len(const faidx_t *fai, const char *seq);
 @param end Returns the one past last of the region (0 based)
 @param flags Parsing method, see HTS_PARSE_* in hts.h.
 @return pointer to end of parsed s if successs, NULL if not.
+
+To work around ambiguous parsing issues, eg both "chr1" and "chr1:100-200"
+are reference names, quote using curly braces.
+Thus "{chr1}:100-200" and "{chr1:100-200}" disambiguate the above example.
 */
 const char *fai_parse_region(const faidx_t *fai, const char *s, int *tid, int64_t *beg, int64_t *end, int flags);
 
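fai_parse_region() takes the HTS_PARSE_* flags that the hts.h comment documents for hts_parse_region(). The interplay between HTS_PARSE_THOUSANDS_SEP and HTS_PARSE_LIST can be sketched as follows. `sketch_parse_decimal()` and the `SKETCH_*` constants are hypothetical stand-ins (their values are not htslib's), and the function is not hts_parse_decimal(): it only shows commas being skipped inside a number when thousands separators are allowed, and always ending the number in list mode.

```c
#include <stdint.h>

#define SKETCH_THOUSANDS_SEP 1  /* hypothetical stand-in for HTS_PARSE_THOUSANDS_SEP */
#define SKETCH_LIST          2  /* hypothetical stand-in for HTS_PARSE_LIST */

/* Parse an unsigned decimal number at *sp and advance *sp past it.
 * A ',' between digits is skipped only when SKETCH_THOUSANDS_SEP is set and
 * SKETCH_LIST is not; in list mode a ',' always ends the number because it
 * separates regions. No overflow checking is done in this sketch. */
static int64_t sketch_parse_decimal(const char **sp, int flags)
{
    const char *s = *sp;
    int64_t n = 0;
    int allow_commas = (flags & SKETCH_THOUSANDS_SEP) && !(flags & SKETCH_LIST);

    for (;;) {
        if (*s >= '0' && *s <= '9') {
            n = n * 10 + (*s - '0');
            s++;
        } else if (allow_commas && *s == ',' && s != *sp
                   && s[1] >= '0' && s[1] <= '9') {
            s++;                        /* thousands separator: skip it */
        } else {
            break;
        }
    }
    *sp = s;
    return n;
}
```

In list mode the caller is left pointing at the ',' and can step over it to parse the next region, which mirrors the documented "*ret != '\0'" test for another list entry.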
htslib/hts.h

Lines changed: 41 additions & 0 deletions
@@ -810,6 +810,47 @@ const char *hts_parse_reg(const char *str, int *beg, int *end);
 @return Pointer to the byte after the end of the entire region
 specifier (including any trailing comma) on success,
 or NULL if @a str could not be parsed.
+
+A variant of hts_parse_reg which is reference-id aware. It uses
+the iterator name2id callbacks to validate the region tokenisation works.
+
+This is necessary due to GRCh38 HLA additions which have reference names
+like "HLA-DRB1*12:17".
+
+To work around ambiguous parsing issues, eg both "chr1" and "chr1:100-200"
+are reference names, quote using curly braces.
+Thus "{chr1}:100-200" and "{chr1:100-200}" disambiguate the above example.
+
+Flags are used to control how parsing works, and can be one of the below.
+
+HTS_PARSE_THOUSANDS_SEP:
+Ignore commas in numbers. For example with this flag 1,234,567
+is interpreted as 1234567.
+
+HTS_PARSE_LIST:
+If present, the region is assumed to be a comma separated list and
+position parsing will not contain commas (this implicitly
+clears HTS_PARSE_THOUSANDS_SEP in the call to hts_parse_decimal).
+On success the return pointer will be the start of the next region, ie
+the character after the comma. (If *ret != '\0' then the caller can
+assume another region is present in the list.)
+
+If not set then positions may contain commas. In this case the return
+value should point to the end of the string, or NULL on failure.
+
+HTS_PARSE_ONE_COORD:
+If present, X:100 is treated as the single base pair region X:100-100.
+In this case X:-100 is shorthand for X:1-100 and X:100- is X:100-<end>.
+(This is the standard bcftools region convention.)
+
+When not set X:100 is considered to be X:100-<end> where <end> is
+the end of chromosome X (set to INT_MAX here). X:100- and X:-100 are
+invalid.
+(This is the standard samtools region convention.)
+
+Note the supplied string expects 1 based inclusive coordinates, but the
+returned coordinates start from 0 and are half open, so pos0 is valid
+for use in e.g. "for (pos0 = beg; pos0 < end; pos0++) {...}"
 */
 const char *hts_parse_region(const char *str, int *tid, int64_t *beg, int64_t *end,
                              hts_name2id_f getid, void *hdr, int flags);
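
Why the name2id validation matters can be illustrated with a toy resolver. `sketch_resolve_name()` and `sketch_name2id()` are hypothetical: a linear search over an array of names stands in for the hts_name2id_f callback, and the ambiguity policy (report an error so the caller can add braces) is an assumption of this sketch, not a description of hts.c. The resolver checks both the whole string and the text before the last ':' against the known reference names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a name2id callback: linear search for the
 * first `len` bytes of `name` among `n` reference names. Returns the
 * index (tid) or -1 if absent. */
static int sketch_name2id(const char *const *names, int n,
                          const char *name, size_t len)
{
    for (int i = 0; i < n; i++)
        if (strlen(names[i]) == len && strncmp(names[i], name, len) == 0)
            return i;
    return -1;
}

/* Hypothetical reference-aware split of an unquoted region string.
 * Returns the tid and sets *rest to the coordinate text (without the ':'),
 * -1 if no reference matches, or -2 if the string is ambiguous. */
static int sketch_resolve_name(const char *reg, const char *const *names,
                               int n, const char **rest)
{
    size_t len = strlen(reg);
    const char *colon = strrchr(reg, ':');
    int whole = sketch_name2id(names, n, reg, len);
    int prefix = colon ? sketch_name2id(names, n, reg, (size_t)(colon - reg)) : -1;

    if (whole >= 0 && prefix >= 0)
        return -2;                  /* e.g. "chr1:100-200": caller must use braces */
    if (whole >= 0) {
        *rest = reg + len;          /* whole string is a name, e.g. "HLA-DRB1*12:17" */
        return whole;
    }
    if (prefix >= 0) {
        *rest = colon + 1;          /* "name:coords" */
        return prefix;
    }
    return -1;
}
```

With the names "chr1", "chr1:100-200" and "HLA-DRB1*12:17" loaded, "HLA-DRB1*12:17:1-100" resolves to the HLA contig with coordinates "1-100", while "chr1:100-200" is reported as ambiguous; under this policy only the quoted forms reach either reference in that case.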

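The two coordinate conventions, together with the conversion to 0-based half-open output, can be summarised with a standalone parser for the text after the ':'. `sketch_parse_coords()` and `SKETCH_ONE_COORD` are hypothetical stand-ins for illustration, not htslib code. INT_MAX marks "to the end of the reference" as the comment above describes, and rejecting a start past the end is an assumption of this sketch.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ONE_COORD 1  /* hypothetical stand-in for HTS_PARSE_ONE_COORD */

/* Parse "100", "100-200", "-100" or "100-" (1-based, inclusive) into a
 * 0-based, half-open [*beg, *end). *end == INT_MAX means "to the end of
 * the reference". Returns 0 on success, -1 if the text is invalid under
 * the selected convention. */
static int sketch_parse_coords(const char *s, int flags,
                               int64_t *beg, int64_t *end)
{
    char *p;
    long long a = 0, b = 0;
    int has_a = 0, has_dash = 0, has_b = 0;

    if (*s != '-') {                    /* optional start coordinate */
        a = strtoll(s, &p, 10);
        if (p == s || a < 1)
            return -1;
        has_a = 1;
        s = p;
    }
    if (*s == '-') {                    /* optional "-end" part */
        has_dash = 1;
        s++;
        if (*s != '\0') {
            b = strtoll(s, &p, 10);
            if (p == s || b < 1)
                return -1;
            has_b = 1;
            s = p;
        }
    }
    if (*s != '\0')
        return -1;                      /* trailing characters */

    if (flags & SKETCH_ONE_COORD) {     /* bcftools convention */
        if (!has_a && !has_b)
            return -1;                  /* "" or "-" */
        *beg = has_a ? a - 1 : 0;                       /* X:-100 -> X:1-100 */
        *end = has_b ? b : (has_dash ? INT_MAX : a);    /* X:100 -> X:100-100 */
    } else {                            /* samtools convention */
        if (!has_a || (has_dash && !has_b))
            return -1;                  /* X:-100 and X:100- are invalid */
        *beg = a - 1;
        *end = has_b ? b : INT_MAX;     /* X:100 -> X:100-<end> */
    }
    if (*beg >= *end)
        return -1;                      /* e.g. "200-100" */
    return 0;
}
```

A successful call yields a range usable directly in `for (pos0 = beg; pos0 < end; pos0++)`; for example "100-200" gives beg = 99 and end = 200, i.e. 101 bases, under either convention.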