Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
da3eb68
improve sex chrom seg and error matrix
AprilYUZhang Apr 16, 2025
60eb324
add mutation rate
AprilYUZhang Apr 17, 2025
b645cff
update output file related to sex chrom
AprilYUZhang Apr 21, 2025
8e543d8
remove doubleIfNotMissing
AprilYUZhang Apr 21, 2025
c907c31
update the usage docs
AprilYUZhang Apr 21, 2025
bac8a42
add file for testing sex chromomsome
AprilYUZhang Apr 22, 2025
57fd648
update function test for sex chr and precommit
AprilYUZhang Apr 22, 2025
00bce13
update other functions for specific penetrance #189
AprilYUZhang Apr 22, 2025
18ef01f
debug syntex
AprilYUZhang Apr 22, 2025
efc75ab
add function test in case of no recom,with recom,no missing, with mis…
AprilYUZhang May 9, 2025
65c229e
test accuracy
AprilYUZhang May 10, 2025
2eeb37d
fix run_accu_test
AprilYUZhang May 10, 2025
395384f
fix usage x_chr #184
AprilYUZhang May 16, 2025
51d5992
change all sex chrom to x chr/XChr
AprilYUZhang May 16, 2025
151011e
change ProbMath.generateSegregation_mu to ProbMath.generateSegregatio…
AprilYUZhang May 16, 2025
532d14b
fix accu_test x_chr
AprilYUZhang May 16, 2025
666bf82
fix run_func_test x_chr
AprilYUZhang May 16, 2025
394f497
is not found
AprilYUZhang May 16, 2025
5abdcdd
Keep consistent on x_chr flag
AprilYUZhang Oct 2, 2025
09a4773
add a conditional XChr
AprilYUZhang Oct 2, 2025
45ad327
use peelingInfo.isXChr
AprilYUZhang Oct 2, 2025
d2242f4
modify annotation
AprilYUZhang Oct 2, 2025
5f1cb7b
edit phaseFounder
AprilYUZhang Oct 3, 2025
a2fec67
Add annotation
AprilYUZhang Oct 3, 2025
ffe237e
modify Dosage output
AprilYUZhang Oct 15, 2025
425e3b5
improve code efficiency
AprilYUZhang Oct 16, 2025
8c214ce
update not XChrMaleFlag
AprilYUZhang Oct 16, 2025
01f0a11
fix according to black and flack
AprilYUZhang Oct 31, 2025
0b55345
fix deprecated version of actions/download-artifact: v3
AprilYUZhang Oct 31, 2025
157b956
Update tinyhouse submodule to commit e8a7cd3
AprilYUZhang Oct 31, 2025
25d9497
fix the sex chromosome test
AprilYUZhang Nov 4, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,14 +31,14 @@ jobs:
- name: Build SDist and wheel
run: python3 -m build

- uses: actions/upload-artifact@v3
- uses: actions/upload-artifact@v4
with:
path: dist/*

- name: Check metadata
run: pipx run twine check dist/*

- uses: actions/download-artifact@v3
- uses: actions/download-artifact@v4
with:
name: artifact
path: dist
Expand Down
31 changes: 25 additions & 6 deletions docs/source/usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -122,26 +122,27 @@ Peeling arguments
for each metafounder after each peeling cycle.
-no_phase_founder A flag phase a heterozygous allele in one of the
founders (if such an allele can be found).
-sex_chrom A flag to indicate that input data is for a sex chromosome. Sex needs to
be given in the pedigree file. This is currently an
experimental option.

-x_chr A flag to indicate that input data is for the X chromosome.

Genotype probability arguments:
-geno_error_prob GENO_ERROR_PROB
Genotyping error rate. [Default 0.0001]
-seq_error_prob SEQ_ERROR_PROB
Sequencing error rate. [Default 0.001]

-mutation_rate MUTATION_RATE
mutation rate. [Default 1e-8]
``-method`` controls whether the program is run in "single-locus" or "multi-locus" model. Single locus mode does not use linkage information to perform imputation. It is fast, but not very accurate. Multi-locus mode runs multi-locus iterative peeling which uses linkage information to increase accuracy and calculate segregation values.

For hybrid peeling, where a large amount (millions of segregating sites) of sequence allele read counts needs to be imputed, first run the program in multi-locus mode to generate a segregation file, and then run the program in single-locus mode with a known segregation file.

The ``-geno_error_prob``, ``-seq_error_prob`` and ``-rec_length`` arguments control some of the model parameters used in the model. ``-seq_error_prob`` must not be zero. |Software| is robust to deviations in genotyping error rate and sequencing error rate so it is not recommended to use these options unless large deviations from the default are known. Changing the ``-length`` argument to match the genetic map length can increase accuracy in some situations.
The ``-geno_error_prob``, ``-seq_error_prob``, ``-mutation_rate`` and ``-rec_length`` arguments control some of the model parameters used in the model. ``-seq_error_prob`` must not be zero. |Software| is robust to deviations in genotyping error rate and sequencing error rate so it is not recommended to use these options unless large deviations from the default are known. Changing the ``-length`` argument to match the genetic map length can increase accuracy in some situations.

The ``-est_geno_error_prob`` and ``-est_seq_error_prob`` options estimate the genotyping error rate and the sequencing error rate based on miss-match between observed and inferred states. This option is generally not necessary and can increase runtime. ``-est_alt_allele_prob`` estimates the alternative allele frequency for the base population before peeling using all available genotypes. This option can be useful if there are a large number of non-genotyped founders. ``-update_alt_allele_prob`` re-estimates the alternative allele frequencies per metafounder after each peeling cycle using the inferred genotype probabilities of the founders.

For a pedigree with multiple metafounders, the user has three options: (1) use ``-update_alt_allele_prob`` only, (2) use ``est_alt_allele_prob`` and ``-update_alt_allele_prob``, or (3) input estimates of the alternative allele frequencies for each metafounder via the ``-alt_allele_prob_file`` with or without ``-update_alt_allele_prob``.

When ``-x_chrom`` is used, there are two requirements for input files: (1) The pedigree file must include sex information in the fourth column (0 indicates male and 1 indicates female). See the Pedigree File Format section for an example. (2) Male's genotype must be coded as {0,1}. See the Genotype file Format section for an example. Note: This option currently only considers the regions of the X chromosome excluding the pseudo-autosomal regions (PARs).

Hybrid peeling arguments
------------------------

Expand Down Expand Up @@ -185,6 +186,16 @@ or
id3 id1 id2
id4 id1 id2

When working with X chromosome use this format - add sex information in the fourth column (0 indicates male and 1 indicates female):

::

id1 0 0 0
id2 0 0 1
id3 id1 id2 0
id4 id1 id2 1


Genotype file
=============

Expand All @@ -199,6 +210,14 @@ Example:
id3 2 0 2 0
id4 0 2 1 0

When working with X chromosome, male's genotype must be coded as {0,1}:
::

id1 0 1 9 0
id2 1 1 1 1
id3 1 0 1 0
id4 0 2 1 0

Sequence allele read counts file
================================

Expand Down
14 changes: 9 additions & 5 deletions src/tinypeel/Peeling/Peeling.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
locals={"e": float32, "e4": float32, "e16": float32, "e1e": float32},
)
def peel(family, operation, peelingInfo, singleLocusMode):
isSexChrom = peelingInfo.isSexChrom
isXChr = peelingInfo.isXChr

e = 0.000001
e1e = 1 - e
Expand Down Expand Up @@ -103,10 +103,12 @@ def peel(family, operation, peelingInfo, singleLocusMode):
# else:
# currentSeg[:,:] = segregation[child,:,:]

if isSexChrom and peelingInfo.sex[child] == 0: # 0 for male, 1 for female.
if isXChr and peelingInfo.sex[child] == 0: # 0 for male, 1 for female.
segregationTensor = peelingInfo.segregationTensorXY
if isSexChrom and peelingInfo.sex[child] == 1: # 0 for male, 1 for female.
segregationTensor_norm = peelingInfo.segregationTensorXY_norm
if isXChr and peelingInfo.sex[child] == 1: # 0 for male, 1 for female.
segregationTensor = peelingInfo.segregationTensorXX
segregationTensor_norm = peelingInfo.segregationTensorXX_norm

# Einstien sum notation 2: Create the child-specific segregation tensor using the child's currrent segregation estimate.
# childSegTensor[index,:,:,:,:] = np.einsum("abcd, di -> abci", segregationTensor, currentSeg)
Expand Down Expand Up @@ -204,10 +206,12 @@ def peel(family, operation, peelingInfo, singleLocusMode):
childValues = childValues / np.sum(childValues, axis=0)
childValues = e1e * childValues + e4

if isSexChrom and peelingInfo.sex[child] == 0: # 0 for male, 1 for female.
if isXChr and peelingInfo.sex[child] == 0: # 0 for male, 1 for female.
segregationTensor = peelingInfo.segregationTensorXY
if isSexChrom and peelingInfo.sex[child] == 1: # 0 for male, 1 for female.
segregationTensor_norm = peelingInfo.segregationTensorXY_norm
if isXChr and peelingInfo.sex[child] == 1: # 0 for male, 1 for female.
segregationTensor = peelingInfo.segregationTensorXX
segregationTensor_norm = peelingInfo.segregationTensorXX_norm

# Einstien sum notation 5:
# pointSeg[child,:,:] = np.einsum("abcd, abi, ci-> di", segregationTensor, parentsMinusChild[i,:,:,:], childValues)
Expand Down
53 changes: 23 additions & 30 deletions src/tinypeel/Peeling/PeelingIO.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ def readInSeg(pedigree, fileName, start=None, stop=None):
)

if idx not in pedigree.individuals:
print(f"Individual {idx} not found in pedigree. Individual ignored.")
print(f"Individual {idx} is not found in pedigree. Individual ignored.")
else:
ind = pedigree.individuals[idx]
if e == 0:
Expand Down Expand Up @@ -101,10 +101,10 @@ def sort_key(mf_key):
)


def writeGenotypes(pedigree, genoProbFunc, isSexChrom):
def writeGenotypes(pedigree, genoProbFunc, isXChr):
args = InputOutput.args
if not args.no_dosage:
writeDosages(pedigree, genoProbFunc, isSexChrom, args.out_file + ".dosage.txt")
writeDosages(pedigree, genoProbFunc, isXChr, args.out_file + ".dosage.txt")
if args.phased_geno_prob:
writePhasedGenoProbs(
pedigree, genoProbFunc, args.out_file + ".phased_geno_prob.txt"
Expand All @@ -127,15 +127,15 @@ def writeGenotypes(pedigree, genoProbFunc, isSexChrom):
writeBinaryCalledGenotypes(
pedigree,
genoProbFunc,
isSexChrom,
isXChr,
args.out_file + ".called." + str(threshold),
threshold,
)
else:
writeCalledGenotypes(
pedigree,
genoProbFunc,
isSexChrom,
isXChr,
args.out_file + ".geno_" + str(threshold) + ".txt",
threshold,
)
Expand Down Expand Up @@ -198,31 +198,34 @@ def writeGenoProbs(pedigree, genoProbFunc, outputFile):
)


def writeDosages(pedigree, genoProbFunc, isSexChrom, outputFile):
def writeDosages(pedigree, genoProbFunc, isXChr, outputFile):
with open(outputFile, "w+") as f:
for idx, ind in pedigree.writeOrder():
matrix = np.dot(np.array([0, 1, 1, 2]), genoProbFunc(ind.idn))

if isSexChrom and ind.sex == 0:
matrix *= 2

if isXChr and ind.sex == 0:
tmp = np.array([0, 1, 0, 0])
else:
tmp = np.array([0, 1, 1, 2])
matrix = np.dot(tmp, genoProbFunc(ind.idn))
f.write(ind.idx + " " + " ".join(map("{:.4f}".format, matrix)) + "\n")


def writeCalledGenotypes(pedigree, genoProbFunc, isSexChrom, outputFile, thresh):
def writeCalledGenotypes(pedigree, genoProbFunc, isXChr, outputFile, thresh):
with open(outputFile, "w+") as f:
for idx, ind in pedigree.writeOrder():
matrix = genoProbFunc(ind.idn)
if isXChr and ind.sex == 0:
matrixCollapsedHets = np.array(
[matrix[0, :] + matrix[2, :], matrix[1, :] + matrix[3, :]],
dtype=np.float32,
)
else:
matrixCollapsedHets = np.array(
[matrix[0, :], matrix[1, :] + matrix[2, :], matrix[3, :]],
dtype=np.float32,
)

matrixCollapsedHets = np.array(
[matrix[0, :], matrix[1, :] + matrix[2, :], matrix[3, :]],
dtype=np.float32,
)
calledGenotypes = np.argmax(matrixCollapsedHets, axis=0)
setMissing(calledGenotypes, matrixCollapsedHets, thresh)
if isSexChrom and ind.sex == 0:
doubleIfNotMissing(calledGenotypes)

f.write(ind.idx + " " + " ".join(map(str, calledGenotypes)) + "\n")


Expand Down Expand Up @@ -250,29 +253,19 @@ def writeCalledPhase(pedigree, genoProbFunc, outputFile, thresh):
f.write(ind.idx + " " + " ".join(map(str, maternal_haplotype)) + "\n")


def writeBinaryCalledGenotypes(pedigree, genoProbFunc, isSexChrom, outputFile, thresh):
def writeBinaryCalledGenotypes(pedigree, genoProbFunc, isXChr, outputFile, thresh):
for idx, ind in pedigree.writeOrder():
matrix = genoProbFunc(ind.idn)
matrixCollapsedHets = np.array(
[matrix[0, :], matrix[1, :] + matrix[2, :], matrix[3, :]], dtype=np.float32
)
calledGenotypes = np.argmax(matrixCollapsedHets, axis=0)
setMissing(calledGenotypes, matrixCollapsedHets, thresh)
if isSexChrom and ind.sex == 0:
doubleIfNotMissing(calledGenotypes)
ind.genotypes = calledGenotypes.astype(np.int8)

InputOutput.writeOutGenotypesPlink(pedigree, outputFile)


@jit(nopython=True)
def doubleIfNotMissing(calledGenotypes):
nLoci = len(calledGenotypes)
for i in range(nLoci):
if calledGenotypes[i] == 1:
calledGenotypes[i] = 2


@jit(nopython=True)
def setMissing(calledGenotypes, matrix, thresh):
nLoci = len(calledGenotypes)
Expand Down
55 changes: 37 additions & 18 deletions src/tinypeel/Peeling/PeelingInfo.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,18 +21,29 @@ def createPeelingInfo(pedigree, args, createSeg=True, phaseFounder=False):
peelingInfo = jit_peelingInformation(
nInd=pedigree.maxIdn, nFam=pedigree.maxFam, nLoci=nLoci, createSeg=createSeg
)
peelingInfo.isSexChrom = args.sex_chrom
peelingInfo.isXChr = args.x_chr
# Information about the peeling positions are handled elsewhere.
peelingInfo.positions = None

mutation_rate = args.mutation_rate
# Generate the segregation tensors.
peelingInfo.segregationTensor = ProbMath.generateSegregation(e=1e-06)
peelingInfo.segregationTensor = ProbMath.generateSegregation(mu=mutation_rate)
peelingInfo.segregationTensor_norm = ProbMath.generateSegregation(
e=1e-06, partial=True
mu=mutation_rate, partial=True
) # Partial gives the normalizing constant.

peelingInfo.segregationTensorXY = ProbMath.generateSegregationXYChrom(e=1e-06)
peelingInfo.segregationTensorXX = ProbMath.generateSegregationXXChrom(e=1e-06)
if peelingInfo.isXChr:
peelingInfo.segregationTensorXY = ProbMath.generateSegregationXYChrom(
mu=mutation_rate
)
peelingInfo.segregationTensorXY_norm = ProbMath.generateSegregationXYChrom(
mu=mutation_rate, partial=True
)
peelingInfo.segregationTensorXX = ProbMath.generateSegregationXXChrom(
mu=mutation_rate
)
peelingInfo.segregationTensorXX_norm = ProbMath.generateSegregationXXChrom(
mu=mutation_rate, partial=True
)

peelingInfo.genoError[:] = args.geno_error_prob
peelingInfo.seqError[:] = args.seq_error_prob
Expand All @@ -46,17 +57,17 @@ def createPeelingInfo(pedigree, args, createSeg=True, phaseFounder=False):
if ind.genotypes is not None and ind.haplotypes is not None:
HaplotypeOperations.ind_fillInGenotypesFromPhase(ind)

sexChromFlag = (
peelingInfo.isSexChrom and ind.sex == 0
) # This is the sex chromosome and the individual is male.
XChrMaleFlag = (
peelingInfo.isXChr and ind.sex == 0
) # This is the X chromosome and the individual is male.

peelingInfo.penetrance[ind.idn, :, :] = ProbMath.getGenotypeProbabilities(
peelingInfo.nLoci,
ind.genotypes,
ind.reads,
peelingInfo.genoError,
peelingInfo.seqError,
sexChromFlag,
XChrMaleFlag,
)

# Set the genotyping/read status for each individual. This will be used for, e.g., estimating the minor allele frequency.
Expand All @@ -69,15 +80,16 @@ def createPeelingInfo(pedigree, args, createSeg=True, phaseFounder=False):
if ind.isGenotypedFounder() and phaseFounder and ind.genotypes is not None:
loci = getHetMidpoint(ind.genotypes)
if loci is not None:
e = args.geno_error_prob
peelingInfo.penetrance[ind.idn, :, loci] = np.array(
[e / 3, e / 3, 1 - e, e / 3], dtype=np.float32
)
error = args.geno_error_prob
if (not peelingInfo.isXChr) or (ind.sex != 0): # sex = 0 is male
peelingInfo.penetrance[ind.idn, :, loci] = np.array(
[error / 3, error / 3, 1 - error, error / 3], dtype=np.float32
)

if args.penetrance is not None:
if args.sex_chrom:
if peelingInfo.isXChr:
print(
"Using an external penetrance file and the sex_chrom option is highly discouraged. Please do not use."
"Using an external penetrance file and the x_chr option is highly discouraged. Please do not use."
)

if args.est_geno_error_prob:
Expand Down Expand Up @@ -180,7 +192,7 @@ def getHetMidpoint(geno):
spec["nFam"] = int64
spec["nLoci"] = int64

spec["isSexChrom"] = boolean
spec["isXChr"] = boolean
spec["sex"] = int64[:]
spec["genotyped"] = boolean[:, :] # Maybe this should be removed?

Expand All @@ -206,6 +218,8 @@ def getHetMidpoint(geno):
) # Note: This one is a bit smaller.
spec["segregationTensorXX"] = optional(float32[:, :, :, :])
spec["segregationTensorXY"] = optional(float32[:, :, :, :])
spec["segregationTensorXX_norm"] = optional(float32[:, :, :])
spec["segregationTensorXY_norm"] = optional(float32[:, :, :])

# Marker specific rates:
spec["genoError"] = optional(float32[:])
Expand All @@ -225,7 +239,7 @@ def __init__(self, nInd, nFam, nLoci, createSeg=True):
self.nFam = nFam
self.nLoci = nLoci

self.isSexChrom = False
self.isXChr = False

self.construct(createSeg)

Expand All @@ -234,6 +248,11 @@ def __init__(self, nInd, nFam, nLoci, createSeg=True):
self.segregationTensor = None
self.segregationTensor_norm = None

self.segregationTensorXY = None
self.segregationTensorXY_norm = None
self.segregationTensorXX = None
self.segregationTensorXX_norm = None

def construct(self, createSeg=True):
baseValue = 0.25
self.sex = np.full(self.nInd, 0, dtype=np.int64)
Expand Down
Loading