Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 44 additions & 27 deletions docs/ancestry.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ this.
: Coalescent with recombination ("hudson")

{class}`.SmcKApproxCoalescent`
: General Sequentially Markov Coalescent
: General Sequentially Markov Coalescent

{class}`.DiscreteTimeWrightFisher`
: Generation-by-generation Wright-Fisher
Expand Down Expand Up @@ -2230,7 +2230,7 @@ in units of 4N generations.

### SMC approximations

The **SMC** and **SMC′**
The **SMC** and **SMC′**
are approximations of the continuous time
{ref}`Hudson coalescent<sec_ancestry_models_hudson>` model. These were originally
motivated largely by the need to simulate coalescent processes more efficiently
Expand All @@ -2240,19 +2240,19 @@ mean that such approximations are now unnecessary for many simulations.

The **SMC** and **SMC'** are, however, very important for inference, as the approximations
have made many analytical advances possible. Moreover, using these approximations,
we are able to simulate regimes which we couldn't simulate otherwise: for example,
we are able to simulate regimes which we couldn't simulate otherwise: for example,
**Drosophila** and **Drosophila-like** simulations with very high scaled recombination rates.


The {class}`SMC(k) <.SmcKApproxCoalescent>` model is a general simulations model that can simulate various **SMC** approximations
(e.g., **SMC** and **SMC′**). It accepts a ```hull_offset``` parameter, which defines the extent of
**SMC** approximations in the simulation. The ```hull_offset``` represents the maximum allowed
distance between two genomic segments that can share a common ancestor. Setting the
```hull_offset``` to **0** means only overlapping genomic segments can share a common ancestor,
corresponding to the backward-in-time definition of the **SMC** model. Similarly, setting
the ```hull_offset``` to **1** allows adjacent genomic segments, as well as overlapping ones, to
share a common ancestor, which defines the **SMC′** model. Simulating under the Hudson
coalescent model is equivalent to setting the ```hull_offset``` to the sequence length. The
The {class}`SMC(k) <.SmcKApproxCoalescent>` model is a general simulations model that can simulate various **SMC** approximations
(e.g., **SMC** and **SMC′**). It accepts a ```hull_offset``` parameter, which defines the extent of
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are removing trailing white spaces from different lines. Is this intentional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I explained in another PR, this is because my editor is configured to automatically trim whitespace. The issue here is that your editor isn't set up to do this, and we're not enforcing lint checking on markdown files.

**SMC** approximations in the simulation. The ```hull_offset``` represents the maximum allowed
distance between two genomic segments that can share a common ancestor. Setting the
```hull_offset``` to **0** means only overlapping genomic segments can share a common ancestor,
corresponding to the backward-in-time definition of the **SMC** model. Similarly, setting
the ```hull_offset``` to **1** allows adjacent genomic segments, as well as overlapping ones, to
share a common ancestor, which defines the **SMC′** model. Simulating under the Hudson
coalescent model is equivalent to setting the ```hull_offset``` to the sequence length. The
hull_offset can take any value between **0** and the sequence length.

In this example, we use the {class}`SMC(k) <.SmcKApproxCoalescent>` model to run **SMC'**
Expand All @@ -2265,7 +2265,7 @@ SVG(ts.draw_svg(y_axis=True, time_scale="log_time"))
```
:::{Note}
Since the **SMC** models are approximations of the {ref}`Hudson coalescent<sec_ancestry_models_hudson>`,
and since the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model is well optimised for
and since the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model is well optimised for
regimes with moderate scaled recombination rates (including full human chromosome simulations),
we recommend using the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` whenever possible.
:::
Expand Down Expand Up @@ -2842,34 +2842,51 @@ combined with other {ref}`ancestry models<sec_ancestry_models>`.

#### Tracing ancestry through a pedigree

The previous example generated trees that are embedded within the pedigree,
but doesn't tell us directly the exact path that each sample's genetic
material took through the ancestors. To trace these paths exactly, we can
use a combination of the ``additional_nodes``,
The previous simulation generated trees that are embedded within the pedigree,
but it doesn't tell us directly the path that each sample's genetic
material took through its ancestors.

To trace these paths exactly, we can
use a combination of ``additional_nodes``,
``coalescing_segments_only``
(see {ref}`sec_ancestry_additional_nodes`),
and ``stop_at_local_mrca`` (see {ref}`sec_ancestry_stop_at_local_mrca`).
and stop_at_local_mrca (see {ref}`sec_ancestry_stop_at_local_mrca`).
For this example, we create another contrived pedigree with a single
proband, and 3 generations of diploid ancestors:

```{code-cell}
ped_txt = """\
# id parent0 parent1 time is_sample
0 1 2 0.0 1
1 3 4 1.0 0
2 3 4 1.0 0
3 5 6 2.0 0
4 5 6 2.0 0
5 . . 3.0 0
6 . . 3.0 0
"""
pedigree = msprime.parse_pedigree(io.StringIO(ped_txt), sequence_length=100)
draw_pedigree(pedigree.tree_sequence())
```
We can then run a simuation through this pedigree. For simplicity here,
we don't have any recombination:
```{code-cell}
ped_ts = msprime.sim_ancestry(
initial_state=pedigree,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you move the simulations block to the next code block

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call - done.

model="fixed_pedigree",
random_seed=41,
recombination_rate=0.001,
random_seed=998,
coalescing_segments_only=False,
additional_nodes=(
msprime.NodeType.COMMON_ANCESTOR |
msprime.NodeType.PASS_THROUGH),
stop_at_local_mrca=True
stop_at_local_mrca=False
)
node_labels = {node.id: f"{node.individual}({node.id})" for node in ped_ts.nodes()}
SVG(ped_ts.draw_svg(y_axis=True, node_labels=node_labels, size=(600,200)))
```

We can now see that the ancestor of node 15 is 20, which we would have to
infer in the previous example.


Now we can see that the two genomes of individual 0 (nodes 0 and 1)
reach their common ancestor node 9 of individual 4. Because we
have set ``stop_at_local_mrca=False``, we can now also see how
this ancestral material goes further back in the pedigree.

#### Censoring pedigree information

Expand Down