4. Phylogenetics Flashcards

1
Q

Cross-Over

A
  • the idea of localising the disease gene is based on the event of ‘cross over’
  • during the meiosis stage there is an exchange of genetic material between homologous chromosomes
  • this results in exchange of genes, genetic recombination
2
Q

Haplotypes Example

Outline

A
  • two loci: A and B
  • genotypes Af, Am, Bf, Bm where f indicates father and m indicates mother
  • formed by haplotypes of two gametes AfBf from father and AmBm from mother
3
Q

Haplotypes Example

Non-Recombinant

A

-after meiosis can have
AfBf and AmBm
-i.e. no recombination between the parental haplotypes

4
Q

Haplotypes Example

Recombinant

A
  • during meiosis if crossing over occurs, can get recombination between parental haplotypes:
    e. g. AmBf and AfBm
5
Q

Recombination Fraction

A
  • usually denoted θ

- the probability that a gamete is recombinant with respect to the two loci

6
Q

Likelihood of Crossover

A
  • for loci on different chromosomes, independent segregation ensures that R and NR gametes are equally likely to occur, θ=1/2
  • for loci on the same chromosome, separation of the two paternal / maternal alleles requires a crossover between the two loci
  • the closer the two loci, the less likely this is, θ<1/2
7
Q

Linkage Definition

A
  • two loci with a recombination fraction less than 1/2 are said to be in linkage
  • the smaller the recombination fraction, the more tightly linked the two loci are
8
Q

Morgans

A

-the genetic map distance between two loci is defined as the expected number of crossovers occurring between them on a single chromatid during meiosis, measured in Morgans (M)
-as a rough guide, 1cM ~ 1 million bases

9
Q

Map Functions

A
  • a mathematical relationship that converts map distance (m) to recombination fraction θ is called a map function
  • the function connects two key quantities; genetic distance and recombination probabilities
  • the most famous map functions are Haldane and Kosambi
10
Q

The Haldane Map Function

Description

A
  • assumes that crossovers occur at random, independently of each other
  • the occurrence between two loci is a Poisson process i.e. they are equally likely at any point between the loci and the number of crossovers between loci follows a Poisson distribution
11
Q

The Haldane Map Function

Function

A

θ = [1-exp(-2m)]/2
-with inverse
m = -1/2 log(1-2θ)
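
A minimal Python sketch of the two Haldane formulas above, with m in Morgans. The function names are illustrative only and not part of the original notes.

```python
import math

def haldane_theta(m):
    """Recombination fraction from map distance m (Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_m(theta):
    """Map distance (Morgans) from recombination fraction, for 0 <= theta < 1/2."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

# Small distances give theta close to m; large distances approach theta = 1/2.
print(haldane_theta(0.01), haldane_theta(3.0))
```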

12
Q

The Kosambi Map Function

Description

A

-a generalisation of the Haldane function that allows for interference between crossovers

13
Q

The Kosambi Map Function

Function

A

m = 1/4 log{[1+2θ]/[1-2θ]}
-with inverse
θ = 1/2 [exp(4m)-1]/[exp(4m)+1]
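
The Kosambi pair above can be sketched the same way in Python. The function names are hypothetical; m is in Morgans and θ must lie in [0, 1/2).

```python
import math

def kosambi_m(theta):
    """Map distance (Morgans) from recombination fraction, for 0 <= theta < 1/2."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

def kosambi_theta(m):
    """Recombination fraction from map distance m (Morgans)."""
    e = math.exp(4.0 * m)
    return 0.5 * (e - 1.0) / (e + 1.0)

# Round trip: converting theta to m and back should recover theta.
print(kosambi_theta(kosambi_m(0.2)))
```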

14
Q

Genetic Marker

Definition

A
  • genetic variants with known DNA sequence and known location
  • for the purpose of linkage analysis these markers need to be easily and reliably detectable
15
Q

Microsatellites

A

-repeats of simple DNA sequences, e.g.:

…CACACACACAC…

16
Q

SNPs

A

-single nucleotide polymorphisms, e.g.
…CTGGTAGCTA…
…CTGGCAGCTA…

17
Q

Linkage Analysis

Description

A

-based on the event of crossover during meiosis
-need a known genetic marker to estimate the recombination fraction θ
-once we have θ^, can estimate the distance between the marker and location of interest
-if R and NR gametes in a random sample can be counted:
θ^ = #R / [#(R+NR)]
-a test for linkage simplifies further to testing:
Ho : θ = 1/2
vs
H1 : θ < 1/2

18
Q

Identifying R and NR Gametes

A
  • to identify which gametes are R and NR, we need to know their phases
  • phase describes which allele of the disease gene is on the same strand as which allele of the marker
  • in lab conditions (with animals) this is easily achievable
  • in a human population, we need a three generation pedigree to be able to know the phase with certainty
19
Q

θ^ Estimate

A

θ^ = R/N when R<N/2
θ^ = 1/2 when R≥N/2

-since θ>1/2 is inadmissible on biological grounds

20
Q

θ^ Estimate

R < N/2

A
-then:
θ^ = R/N
-with approximate standard error:
√[θ^(1-θ^)/N]
-to test Ho:θ=1/2 vs H1:θ<1/2 can use a chi-square test, a likelihood ratio test or the LOD score
21
Q

θ^ Estimate

Chi-Square Test

A

-under Ho, expected numbers of R and NR are both N/2
-test statistic:
T = [R-N/2]²/[N/2] + [N-R-N/2]²/[N/2]
= [N-2R]² /N
-a one-tailed test with 1DoF
-if R > N/2, T reassigned to 0 and conclude test is not significant
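
A small Python sketch of this test, assuming phase-known counts R (recombinants) out of N gametes. The function names are illustrative. The one-sided p-value uses the fact that a chi-square variable with 1 DoF is a squared standard normal, which is an added assumption about how the one-tailed test is carried out.

```python
import math

def linkage_chi_square(R, N):
    """Return (theta_hat, T) for Ho: theta = 1/2 vs H1: theta < 1/2."""
    if R >= N / 2:
        # theta > 1/2 is inadmissible, so the estimate is capped and T is set to 0
        return 0.5, 0.0
    theta_hat = R / N
    T = (N - 2 * R) ** 2 / N
    return theta_hat, T

def one_sided_p(T):
    """Half of the chi-square(1) upper tail, P(Z > sqrt(T)) for standard normal Z."""
    return 0.5 * math.erfc(math.sqrt(T / 2.0))

theta_hat, T = linkage_chi_square(10, 50)
print(theta_hat, T, one_sided_p(T))
```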

22
Q

θ^ Estimate

Likelihood Ratio Test

A
L(θ) = θ^R [1-θ]^(N-R)
-can take log for log likelihood l(θ)
-likelihood ratio is:
Λ(θ) = L(θ) / L(θ=1/2)
-test statistic:
X = 2logΛ
= 2 [ l(θ) - l(θ=1/2)]
23
Q

θ^ Estimate

LOD Score

A

Z(θ) = log10(Λ(θ))

-the conventional critical value for calling a test significant is Z≥3
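
A hedged Python sketch of the LOD score for phase-known data, using L(θ) = θ^R [1-θ]^(N-R) and Λ(θ) = L(θ)/L(1/2). The function names are made up for illustration; θ is assumed to lie strictly between 0 and 1. The second function uses the identity 2 ln Λ = 2 ln(10) Z to recover the likelihood ratio statistic.

```python
import math

def lod_score(R, N, theta):
    """Z(theta) = log10 of L(theta) / L(1/2), for 0 < theta < 1."""
    log_l = R * math.log10(theta) + (N - R) * math.log10(1.0 - theta)
    log_l0 = N * math.log10(0.5)
    return log_l - log_l0

def lrt_statistic(R, N, theta):
    """X = 2 ln(Lambda), obtained from the LOD score."""
    return 2.0 * math.log(10.0) * lod_score(R, N, theta)

# Example: 2 recombinants out of 20 gametes, evaluated at theta_hat = R/N = 0.1
Z = lod_score(2, 20, 0.1)
print(Z, Z >= 3)
```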

24
Q

Unknown Parental Haplotypes

A
-two possible phases for parent:
AB / ab
Ab / aB
-a priori these are equally likely with probability 1/2
-so likelihood is:
L(θ) = 0.5θ^4[1-θ]² + 0.5θ²[1-θ]^4
-can show MLE of θ is 1/2
-once θ^ is obtained, can use the same likelihood ratio test or LOD score as phase known pedigree
-but NOT chi-square test
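
As a numerical check of the claim that the MLE is 1/2 for this likelihood, the sketch below evaluates L(θ) on a grid over [0, 1/2]. The grid search and the function name are illustrative choices, not part of the original notes.

```python
def phase_unknown_likelihood(theta):
    """L(theta) = 0.5*theta^4*(1-theta)^2 + 0.5*theta^2*(1-theta)^4."""
    return 0.5 * theta**4 * (1 - theta) ** 2 + 0.5 * theta**2 * (1 - theta) ** 4

# theta is restricted to [0, 1/2]; search a grid of 501 points
grid = [i / 1000 for i in range(0, 501)]
theta_hat = max(grid, key=phase_unknown_likelihood)
print(theta_hat, phase_unknown_likelihood(theta_hat))
```

Because the maximum sits at θ=1/2, the likelihood ratio against Ho is 1 and the LOD score at θ^ is 0 for this example.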
25
Model Free Linkage Analysis | Description
- does not depend on prior specification of a model of inheritance for the disease of interest
- genotype frequencies and penetrance need not be known in advance
- several methods:
  - affected sib pair test (ASP)
  - non-parametric linkage (NPL) score
- concepts of allele sharing are needed
26
Allele Sharing Between Individuals
- IBS and IBD are concepts of allele sharing between individuals - allele sharing is comparing the DNA sequence or allele at the same locus between two individuals
27
Identical by State (IBS)
-alleles are IBS if they have the same form (i.e. having the same DNA sequence) independent of ancestral origin
28
Identical by Descent (IBD)
- alleles are IBD if they have the same form AND have the same ancestral origin i.e. the same chromosomal region has been inherited in both individuals from a common ancestor - alleles that are IBD must be IBS
29
Kinship Coefficient | Description
- denoted, ф - defined as the probability that a randomly drawn allele at any locus of an individual is IBD with a randomly drawn allele at the same locus from another individual
30
Kinship Coefficient and IBD Sharing
- there is a simple linear relationship between the kinship coefficient and the pattern of IBD sharing
- given any allele picked at the locus from one individual, the probability of sampling an allele IBD to it from the other individual is:
1) 1/2 if the pair share two alleles IBD
2) 1/4 if the pair share one allele IBD
3) 0 if the pair share zero alleles IBD
31
Kinship Coefficient | Definition
ф = 1/2 P{IBD=2} + 1/4 P{IBD=1} + 0 P{IBD=0} = 1/2 E[IBD]
-where E[IBD] is the expected proportion of alleles shared IBD at the locus for the two individuals concerned
32
Coefficient of Relationship
E[IBD] = π
- where π is the coefficient of relationship
- and π is twice the kinship coefficient in the case of no inbreeding
33
Affected Sib Pair (ASP) Test | Description
-compares the observed numbers of independent affected sibling pairs sharing zero (n0), one (n1) or two (n2) alleles IBD at a given marker locus with the numbers expected under no linkage
-with hypotheses:
Ho : (po,p1,p2) = (1/4,1/2,1/4)
H1 : (po,p1,p2) ≠ (1/4,1/2,1/4)
-ASP tests can be broadly classified into score tests (chi-square, proportion, mean) and the likelihood ratio test
34
Affected Sib Pair (ASP) Test | Chi-Square Test
Ho : (po,p1,p2) = (1/4,1/2,1/4)
H1 : (po,p1,p2) ≠ (1/4,1/2,1/4)
-test statistic:
T = Σ [ni-ei]²/ei
-where the sum is from i=0 to i=2 and ei is the count expected under Ho (n/4, n/2, n/4)
-under Ho T follows a chi-square distribution with 2DoF
35
Affected Sib Pair (ASP) Test | Proportion Test
-tests whether the proportion of ASPs sharing two alleles IBD (p2) is 1/4
Ho : p2 = 1/4
H1 : p2 ≠ 1/4
-test statistic:
Tprop = [n2-n/4]² / [n/4]
-under Ho Tprop follows a chi-square distribution with 1DoF
36
Affected Sib Pair (ASP) Test | Mean Test
-tests whether the mean proportion of alleles shared IBD by ASPs (pairs sharing one allele count 1/2, pairs sharing two count 1) is equal to 1/2
Ho : (p1/2 + p2) = 1/2
H1 : (p1/2 + p2) ≠ 1/2
-test statistic:
z = [(n1/2 + n2)-n/2] / √[n/8]
-under Ho z follows a standard normal distribution
37
Affected Sib Pair (ASP) Test | Likelihood Ratio Test
-NOT AN ASP TEST
-more accurate
-the numbers of sib pairs who share zero, one and two alleles IBD follow a multinomial distribution with parameters po, p1, p2 respectively
-test statistic:
X = 2 Σ ni log(ni/ei)
-where the sum is from i=0 to i=2
-and X follows a chi-square distribution with 2DoF
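
The chi-square, mean and likelihood ratio statistics for affected sib pairs can be sketched together in Python. This assumes IBD-sharing counts n0, n1, n2 are already known. The function name is hypothetical, and terms with ni=0 are skipped in the likelihood ratio sum (treating 0·log 0 as 0), which is an added convention.

```python
import math

def asp_statistics(n0, n1, n2):
    """Return (chi-square T, mean-test z, likelihood ratio X) under Ho = (1/4, 1/2, 1/4)."""
    n = n0 + n1 + n2
    observed = [n0, n1, n2]
    expected = [n / 4, n / 2, n / 4]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    z_mean = ((n1 / 2 + n2) - n / 2) / math.sqrt(n / 8)
    lrt = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return chi2, z_mean, lrt

# Example: 100 sib pairs with excess sharing of two alleles IBD
print(asp_statistics(10, 50, 40))
```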
38
Nonparametric Linkage (NPL) Score | Description
- analysis of allele sharing may be extended to other types of relative pairs and to larger sets of relatives by counting all possible inheritance patterns
- Spairs counts the number of alleles shared IBD by each affected relative pair (ARP)
- this is summed over all pairs of affected relatives
39
Nonparametric Linkage (NPL) Score | Test
-let xi denote the number of alleles shared IBD by the ith sib pair
-want to create a test with standardised xi:
zi = [xi - E(xi)] / √[Var(xi)] = √2 (xi-1)
-given a collection of pedigrees, the total NPL score for n affected sib pairs is:
Z = 1/√n Σzi
-where the sum is from i=1 to i=n
-under Ho, Z approximately follows a standard normal distribution
40
Nonparametric Linkage (NPL) Score | Expectation and Variance
-denote P(xi=0)=po, P(xi=1)=p1, P(xi=2)=p2
-the expected number of alleles shared IBD:
E[xi] = p1 + 2p2
-second moment:
E[xi²] = p1 + 4p2
-variance:
Var(xi) = E(xi²) - E(xi)²
-under Ho: p1=1/2, p2=1/4 =>
E(xi) = 1
Var(xi) = 1/2
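
A short Python sketch that checks the moments under Ho and computes the total NPL score Z = 1/√n Σ √2 (xi-1) for affected sib pairs. The function names and the example IBD counts are invented for illustration.

```python
import math

def ibd_moments(p1, p2):
    """Return (E[x], Var(x)) for the number of alleles shared IBD."""
    mean = p1 + 2 * p2
    second_moment = p1 + 4 * p2
    return mean, second_moment - mean**2

def npl_score(ibd_counts):
    """Total NPL score for sib pairs, standardising each x_i under Ho."""
    n = len(ibd_counts)
    z = [math.sqrt(2) * (x - 1) for x in ibd_counts]
    return sum(z) / math.sqrt(n)

print(ibd_moments(0.5, 0.25))
print(npl_score([2, 2, 1, 1, 2, 0, 1, 2]))
```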
41
Introduction to Phylogenetics
- different organisms often contain similar DNA sequences - in the theory of evolution this may be because a common ancestor experienced evolutionary mutational processes of substitution, insertion or deletion
42
Phylogeny
- any set of species is related and this relationship is called phylogeny - this is usually described in a phylogenetic tree
43
What are the two types of tree?
- rooted trees
- unrooted trees
44
Notes on Phylogenetic Trees
- all trees are assumed to be binary - a node is an endpoint of an edge - the 'root' is the ultimate ancestor - a labelled branching pattern is referred to as a topology - the length of the ith edge is denoted ti
45
How many nodes and edges are there in a rooted tree of n leaves?
- as we move up the tree, the edges coalesce and the number of edges is reduced to one - this gives a total of 2n-1 nodes, n terminal nodes and n-1 internal nodes - and therefore 2n-2 edges (discounting the edge above the root node)
46
Pairwise Distance | Introduction
- a phylogenetic tree is constructed from a multiple alignment of DNA sequences - a non-parametric construction of phylogenetic tree depends on pairwise distance between species
47
Process of Constructing a Phylogenetic Tree
1) select species (DNA sequences)
2) multiple alignment of DNA sequences, assuming fixed length and no gaps
  - compute pairwise distances
3) infer phylogenetic tree
48
Tree Construction Methods
- parametric and non-parametric - the non-parametric methods we will be focusing on are distance matrix methods - in particular the neighbour-joining method and clustering method
49
Pairwise Distance | Definition
-the pairwise distance between sequences x^i and x^j, denoted dij, is defined as the number of positions at which the DNA bases of the two sequences differ, i.e. the Hamming distance
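
A minimal Python sketch of this pairwise distance for aligned, equal-length sequences. The function names are illustrative; the two example sequences are the SNP example from the earlier card, and the third is made up.

```python
def hamming(x, y):
    """Number of positions at which two aligned sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances d_ij."""
    return [[hamming(s, t) for t in seqs] for s in seqs]

seqs = ["CTGGTAGCTA", "CTGGCAGCTA", "CTGACAGCTT"]
print(distance_matrix(seqs))
```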
50
Distance Methods
-distance methods reconstruct trees (rooted or unrooted) from a set of pairwise distances between the sequences in alignment (assumed given)
51
Distance Function | Definition
- let M be a set and let d: MxM -> ℝ be a function
- we say that d is a distance function on M if:
1) d(u,v)>0 for all u, v ∈ M with u≠v
2) d(u,u)=0 for all u ∈ M
3) d(u,v)=d(v,u) for all u, v ∈ M
4) the triangle inequality holds: d(u,v) ≤ d(u,w) + d(w,v) for all u,v,w ∈ M
52
Tree Generated Distance Function | Definition
-if we fix an unrooted tree T relating the sequences (OTUs), we obtain a tree-generated distance function d^T on M by declaring:
d^T(x^i,x^j) = dij^T
-to be the length of the shortest path from x^i to x^j in T
53
Does there exist a tree T that generates d, which means that d^T = d (dij^T=dij) ? N=2
-the answer is obviously yes for N=2, since the tree is a single edge of length d12 joining the two nodes
54
Does there exist a tree T that generates d, which means that d^T = d (dij^T=dij) ? N=3
-looking for positive numbers x, y, z (the lengths of the edges from the single internal node to leaves 1, 2 and 3) such that:
x + y = d12
x + z = d13
y + z = d23
-there is a unique tree that generates a given distance function
-this uniqueness is a general fact for additive distance functions
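
Solving the three equations gives x = (d12+d13-d23)/2, y = (d12+d23-d13)/2 and z = (d13+d23-d12)/2. A small Python sketch, with a hypothetical function name, makes the solution concrete:

```python
def three_leaf_edges(d12, d13, d23):
    """Edge lengths (x, y, z) from the internal node to leaves 1, 2, 3."""
    x = (d12 + d13 - d23) / 2
    y = (d12 + d23 - d13) / 2
    z = (d13 + d23 - d12) / 2
    return x, y, z

x, y, z = three_leaf_edges(3, 4, 5)
print(x, y, z)
```

All three lengths are positive exactly when the triangle inequality holds strictly for every pair.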
55
Does there exist a tree T that generates d, which means that d^T = d (dij^T=dij) ? N≥4
-not every distance function on M is additive; additivity can be characterised by the following theorem:
'let d be a distance function on M and N≥4, then d is additive if and only if the following condition holds: for every set of four distinct numbers 1≤i,j,k,l≤N, two of the sums dij+dkl, dik+djl, dil+djk coincide and are greater than or equal to the third one'
-this condition is called the four point condition
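
The four point condition can be checked directly on a distance matrix. The Python sketch below is illustrative (the function name and tolerance are assumptions): for each quartet it sorts the three sums and checks that the two largest coincide.

```python
from itertools import combinations

def satisfies_four_point(d, tol=1e-9):
    """True if every quartet of indices meets the four point condition."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        # the two largest sums must be equal (they are then >= the smallest)
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Distances from a tree with leaf edges 1, 2, 3, 4 and an internal edge of length 5
tree_d = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(satisfies_four_point(tree_d))
```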
56
Neighbour Joining Algorithm | Description
- an iterative algorithm that on every step replaces a pair of OTUs with a single OTU
- iterates until only three OTUs are left, since for N=3 there is just one unrooted tree topology
57
Neighbour Joining Algorithm | ri
-for every i=1,...,N define:
ri = 1/[N-2] Σdik
-where the sum is from k=1 to N
58
Neighbour Joining Algorithm | Dij
-for all i,j=1,...,N and i<j define:
Dij = dij - (ri + rj)
59
Neighbour Joining Algorithm | Steps
-calculate the matrix D=(Dij)
-pick a pair with 1≤i, j≤N and i≠j for which Dij is minimal; such a pair may not be unique
-group x^i and x^j and replace them with x^(N+1), which represents an internal node of the future tree connected to x^i and x^j and is placed at:
d(N+1)i = 1/2 (dij + ri - rj)
d(N+1)j = 1/2 (dij + rj - ri)
-define the distances between x^(N+1) and any x^m with m≠i,j as:
d(N+1)m = 1/2 (dim + djm - dij)
-we now have a collection of N-1 OTUs:
M' = {x^m, x^(N+1), m≠i,j}
-repeat the above procedure until only three OTUs are left, in which case there is just one unrooted tree topology
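
One iteration of these steps can be sketched in Python. This assumes the commonly used definition Dij = dij - (ri + rj) with ri = 1/[N-2] Σdik; the function name, the return layout and the tie-breaking rule (first minimal pair found) are illustrative choices.

```python
def nj_step(d):
    """One neighbour-joining step on a symmetric distance matrix d (list of lists).

    Returns the joined pair (i, j), their edge lengths to the new node,
    and the reduced matrix with the new node appended as the last row/column.
    """
    n = len(d)
    r = [sum(row) / (n - 2) for row in d]
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    # assumed criterion: minimise D_ij = d_ij - (r_i + r_j)
    i, j = min(pairs, key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
    len_i = 0.5 * (d[i][j] + r[i] - r[j])
    len_j = 0.5 * (d[i][j] + r[j] - r[i])
    keep = [m for m in range(n) if m not in (i, j)]
    new_row = [0.5 * (d[i][m] + d[j][m] - d[i][j]) for m in keep]
    reduced = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
    reduced.append(new_row + [0.0])
    return (i, j), (len_i, len_j), reduced

# Additive example: leaf edges 1, 2, 3, 4 and an internal edge of length 5
d = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
pair, lengths, reduced = nj_step(d)
print(pair, lengths, reduced)
```

Calling nj_step repeatedly on the reduced matrix until three OTUs remain gives the full procedure.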
60
Clustering Method | Steps
1) assign each (initial) node x^i to its own cluster C^i, i.e. each node is assumed to be a cluster on its own
2) choose two clusters C^i and C^j for which d(C^i,C^j) is minimal (excluding i=j)
3) define a new cluster C^(N+1)=C^i ∪ C^j and set its distances to the remaining clusters using the distance-between-clusters equation
4) introduce a new internal node x^(N+1) (associated with cluster C^(N+1)), place it at height d(C^i,C^j)/2 and redefine the distance matrix
5) repeat the process until only one cluster is left; its node represents the root
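
The steps can be sketched in Python as follows. The card does not state the distance-between-clusters equation, so this sketch assumes the average pairwise distance between cluster members (the UPGMA rule); the function name and the output format are also illustrative.

```python
def cluster_tree(d):
    """Agglomerative clustering on a distance matrix d (list of lists).

    Returns a list of merges (cluster_a, cluster_b, new_cluster, node_height).
    """
    n = len(d)
    clusters = {i: [i] for i in range(n)}  # cluster id -> leaf indices

    def dist(a, b):
        # assumed rule: average distance over all pairs of members
        total = sum(d[p][q] for p in clusters[a] for q in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(x, y) for k, x in enumerate(ids) for y in ids[k + 1:]]
        a, b = min(pairs, key=lambda p: dist(*p))
        height = dist(a, b) / 2
        merges.append((a, b, next_id, height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

print(cluster_tree([[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
```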
61
Matrix of Transition Probabilities
-a 4x4 matrix with entries pij=pij(t) with i,j∈{A,C,G,T}
-assume a Markov model where, if at time t0 the site was in state i∈{A,C,G,T}, then the probability that at time t0+t the site will be in state j∈{A,C,G,T} depends only on i, j and t
62
Rate Matrix
P(t) = exp(tQ)
-where Q=P'(0) is the 'rate matrix' or matrix of instantaneous change
63
Jukes-Cantor Model | Matrices
-sets the entries of Q to -3α/4 on the diagonal and α/4 elsewhere, for some positive constant α
-then P(t) has elements rt on the diagonal and st elsewhere, where:
rt = pii(t) = 1/4 + 3/4 exp(-αt), for all i
st = pij(t) = 1/4 - 1/4 exp(-αt), for i≠j
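
A minimal Python sketch that builds P(t) from rt and st under this model. The function name and the state order A, C, G, T are illustrative assumptions.

```python
import math

def jc_transition_matrix(t, alpha):
    """4x4 Jukes-Cantor matrix P(t): r_t on the diagonal, s_t elsewhere."""
    e = math.exp(-alpha * t)
    r = 0.25 + 0.75 * e
    s = 0.25 - 0.25 * e
    return [[r if i == j else s for j in range(4)] for i in range(4)]

P = jc_transition_matrix(0.5, 1.0)
print([round(sum(row), 12) for row in P])  # each row should sum to 1
```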
64
Jukes-Cantor Model | Nucleotide Equilibrium Frequencies
-as t->∞, rt and st both tend to 1/4, which means that the nucleotide equilibrium frequencies in this model are:
qA = qC = qG = qT = 1/4
65
Jukes-Cantor Model | Probability
P{x1u, x2u | T, t1, t2} = Σ qa P{x1u | a,t1} P{x2u | a,t2}
-where the sum is over a∈{A,C,G,T}
66
Jukes-Cantor Model | Likelihood
-if there are N positions (i.e. the sequences have length N):
L(t1, t2 | T, x1, x2) = P{x1, x2 | T, t1, t2}
= ∏ P{x1u, x2u | T, t1, t2}
= 1/[16^(n1+n2)] {1+3exp[-α(t1+t2)]}^n1 {1-exp[-α(t1+t2)]}^n2
-where the product is over u=1 to u=N
-where n1 is the number of positions where the nucleotides in the two sequences are identical and n2 is the number of positions where they differ (a substitution has occurred)
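
A hedged Python sketch of this likelihood on the log scale. The likelihood depends on t1 and t2 only through α(t1+t2), so that product is treated as a single argument. The maximiser exp[-α(t1+t2)] = 1 - 4n2/(3N) is derived here by setting the derivative to zero; it is not taken from the cards and requires 0 < n2 < 3N/4. All function names are illustrative.

```python
import math

def count_sites(x1, x2):
    """Return (n1, n2): numbers of identical and differing positions."""
    n2 = sum(a != b for a, b in zip(x1, x2))
    return len(x1) - n2, n2

def jc_log_likelihood(n1, n2, alpha_t):
    """Log-likelihood, with alpha_t = alpha * (t1 + t2) > 0."""
    e = math.exp(-alpha_t)
    return -(n1 + n2) * math.log(16) + n1 * math.log(1 + 3 * e) + n2 * math.log(1 - e)

def jc_mle_alpha_t(n1, n2):
    """Value of alpha * (t1 + t2) maximising the likelihood (derived, see lead-in)."""
    p = n2 / (n1 + n2)
    if not 0 < p < 0.75:
        raise ValueError("need 0 < n2 < 3N/4 for a finite positive estimate")
    return -math.log(1 - 4 * p / 3)

n1, n2 = count_sites("CTGGTAGCTA", "CTGGCAGCTA")
best = jc_mle_alpha_t(n1, n2)
print(n1, n2, best, jc_log_likelihood(n1, n2, best))
```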