4. Phylogenetics Flashcards

Question

Model Free Linkage Analysis | Description

Answer 1

- does not depend on prior specification of a model of inheritance for the disease of interest
- genotype frequencies and penetrance need not be known in advance
- several methods:
  - affected sib pair (ASP) test
  - non-parametric linkage (NPL) score
- concepts of allele sharing are needed

Answer 2

- IBS (identical by state) and IBD (identical by descent) are concepts of allele sharing between individuals
- allele sharing means comparing the DNA sequence or allele at the same locus between two individuals

Answer 3

-alleles are IBS if they have the same form (i.e. having the same DNA sequence) independent of ancestral origin

Answer 4

- alleles are IBD if they have the same form AND have the same ancestral origin i.e. the same chromosomal region has been inherited in both individuals from a common ancestor - alleles that are IBD must be IBS

Answer 5

- denoted, ф - defined as the probability that a randomly drawn allele at any locus of an individual is IBD with a randomly drawn allele at the same locus from another individual

Answer 6

- there is a simple linear relationship between the kinship coefficient and the pattern of IBD sharing; given any allele picked at the locus from one individual, the probability of sampling an allele IBD with it from the other individual is:
1) 1/2 if two alleles are IBD
2) 1/4 if one allele is IBD
3) 0 if zero alleles are IBD

Answer 7

ф = 1/2P{IBD=2} + 1/4P{IBD=1} + 0P{IBD=0} = 1/2 E[IBD] -where E[IBD] is the expected proportion of alleles shared IBD at the locus for the two individuals concerned
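The linear relationship above can be checked numerically. The following is a minimal sketch; the function name `kinship_from_ibd` is illustrative and not part of the original cards.

```python
def kinship_from_ibd(p0, p1, p2):
    """Kinship coefficient from the probabilities of sharing 0, 1 or 2 alleles IBD."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    # E[IBD] is the expected proportion of alleles shared IBD at the locus.
    expected_proportion = p2 + 0.5 * p1
    return 0.5 * expected_proportion


# Sib pairs share 0, 1, 2 alleles IBD with probabilities 1/4, 1/2, 1/4.
print(kinship_from_ibd(0.25, 0.5, 0.25))  # 0.25
```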

Answer 8

E[IBD] = π - where π is the coefficient of relationship - and π is twice the kinship coefficient in the case of no inbreeding

Answer 9

- compares the observed numbers of independent affected sibling pairs sharing zero (n0), one (n1) or two (n2) alleles IBD at a given marker locus with the numbers expected under no linkage
- hypotheses: H0: (p0,p1,p2) = (1/4,1/2,1/4), H1: (p0,p1,p2) ≠ (1/4,1/2,1/4)
- ASP tests can be broadly classified into score tests (chi-square, proportion, mean) and the likelihood ratio test

Answer 10

- H0: (p0,p1,p2) = (1/4,1/2,1/4), H1: (p0,p1,p2) ≠ (1/4,1/2,1/4)
- test statistic: T = Σ [ni-ei]²/ei, where the sum is from i=0 to i=2
- under H0, T follows a chi-square distribution with 2 DoF
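A minimal sketch of the chi-square ASP test, using only the standard library. The function name and the example counts are hypothetical. The p-value uses the fact that the upper tail of a chi-square distribution with 2 degrees of freedom is exp(-T/2).

```python
import math


def asp_chi_square(n0, n1, n2):
    """Chi-square ASP test on counts of sib pairs sharing 0, 1, 2 alleles IBD."""
    n = n0 + n1 + n2
    expected = (n / 4, n / 2, n / 4)  # expected counts under no linkage
    observed = (n0, n1, n2)
    t = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-t / 2)  # chi-square upper tail, 2 degrees of freedom
    return t, p_value


# Hypothetical counts for 100 affected sib pairs.
t, p = asp_chi_square(10, 50, 40)
print(t, p)
```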

Answer 11

- tests whether the proportion of ASPs sharing two alleles IBD (p2) is 1/4
- H0: p2 = 1/4, H1: p2 ≠ 1/4
- test statistic: Tprop = [n2-n/4]² / [3n/16], since under H0 n2 is binomial with variance n(1/4)(3/4) = 3n/16
- under H0, Tprop follows a chi-square distribution with 1 DoF

Answer 12

- tests whether the mean proportion of alleles shared IBD by ASPs (pairs sharing one allele count 1/2, pairs sharing two count 1) is equal to 1/2
- H0: (p1/2 + p2) = 1/2, H1: (p1/2 + p2) ≠ 1/2
- test statistic: z = [(n1/2 + n2)-n/2] / √[n/8]
- under H0, z follows a standard normal distribution
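The mean test can be sketched as follows. The function name and counts are hypothetical, and the two-sided p-value is computed from the standard normal tail via `math.erfc`.

```python
import math


def asp_mean_test(n0, n1, n2):
    """Mean test: z statistic and two-sided p-value for IBD sharing in ASPs."""
    n = n0 + n1 + n2
    z = ((n1 / 2 + n2) - n / 2) / math.sqrt(n / 8)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts for 100 affected sib pairs.
z, p = asp_mean_test(10, 50, 40)
print(z, p)
```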

Answer 13

- NOT A SCORE TEST
- more accurate
- the numbers of sib pairs who share zero, one and two alleles IBD follow a multinomial distribution with parameters p0, p1, p2 respectively
- test statistic: X = 2 Σ ni log(ni/ei), where the sum is from i=0 to i=2 and ei is the expected count under H0
- under H0, X follows a chi-square distribution with 2 DoF
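A short sketch of the likelihood ratio statistic, with the usual convention that a zero count contributes nothing to the sum. The function name and counts are hypothetical.

```python
import math


def asp_likelihood_ratio(n0, n1, n2):
    """Likelihood ratio statistic X = 2 * sum(n_i * log(n_i / e_i))."""
    n = n0 + n1 + n2
    expected = (n / 4, n / 2, n / 4)  # expected counts under no linkage
    observed = (n0, n1, n2)
    # Terms with a zero observed count are taken as 0 (limit of x*log(x)).
    x = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    p_value = math.exp(-x / 2)  # chi-square upper tail, 2 degrees of freedom
    return x, p_value


x, p = asp_likelihood_ratio(10, 50, 40)
print(x, p)
```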

Answer 14

- analysis of allele-sharing may be extended to other types of relative pairs and to larger sets of relatives by counting all possible inheritance patterns - Spairs counts the number of alleles IBD shared for each affected relative pair (ARP) - this is summed over all pairs of affected relatives

Answer 15

- let xi denote the number of alleles shared IBD by the ith sib pair
- standardise xi under H0: zi = [xi - E(xi)] / √[Var(xi)] = √2 (xi-1)
- given a collection of pedigrees, the total NPL score for n affected sib pairs is Z = (1/√n) Σ zi, where the sum is from i=1 to i=n
- under H0, Z approximately follows a standard normal distribution
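The NPL score for sib pairs reduces to a few lines of arithmetic. This is a minimal sketch with made-up IBD counts; `npl_score` is an illustrative name.

```python
import math


def npl_score(ibd_counts):
    """Total NPL score Z for sib pairs, given the number of alleles shared IBD per pair."""
    n = len(ibd_counts)
    # Under the null hypothesis E(x_i) = 1 and Var(x_i) = 1/2, so z_i = sqrt(2) * (x_i - 1).
    z_scores = [math.sqrt(2) * (x - 1) for x in ibd_counts]
    return sum(z_scores) / math.sqrt(n)


# Hypothetical IBD counts for eight affected sib pairs.
print(npl_score([2, 2, 1, 1, 2, 0, 1, 2]))  # approximately 1.5
```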

Answer 16

- denote P(xi=0)=p0, P(xi=1)=p1, P(xi=2)=p2
- the expected number of alleles shared IBD: E[xi] = p1 + 2p2, and E[xi²] = p1 + 4p2
- variance: Var(xi) = E(xi²) - E(xi)²
- under H0: p1=1/2, p2=1/4 => E(xi) = 1, Var(xi) = 1/2

Answer 17

- different organisms often contain similar DNA sequences
- in the theory of evolution, this may be because the sequences descend from a common ancestor and have diverged through the mutational processes of substitution, insertion and deletion

Answer 18

- any set of species is related and this relationship is called phylogeny - this is usually described in a phylogenetic tree

Answer 19

- rooted trees | - unrooted trees

Answer 20

- all trees are assumed to be binary - a node is an endpoint of an edge - the 'root' is the ultimate ancestor - a labelled branching pattern is referred to as a topology - the length of the ith edge is denoted ti

Answer 21

- as we move up the tree, the edges coalesce and the number of edges is reduced to one - this gives a total of 2n-1 nodes, n terminal nodes and n-1 internal nodes - and therefore 2n-2 edges (discounting the edge above the root node)

Answer 22

- a phylogenetic tree is constructed from a multiple alignment of DNA sequences - a non-parametric construction of phylogenetic tree depends on pairwise distance between species

Answer 23

1) select species (DNA sequences)
2) multiple alignment of DNA sequences, assuming fixed length and no gaps
   - compute pairwise distances
3) infer phylogenetic tree

Answer 24

- parametric and non-parametric - the non-parametric methods we will be focusing on are distance matrix methods - in particular the neighbour-joining method and clustering method

Answer 25

- the pairwise distance between sequences x^i and x^j, denoted dij, is defined as the number of positions at which the DNA bases of the two sequences differ, i.e. the Hamming distance
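A minimal sketch of the Hamming distance and the resulting distance matrix, assuming aligned sequences of equal length with no gaps. The toy alignment is made up for illustration.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_distances(seqs):
    """Symmetric matrix of Hamming distances between all pairs of sequences."""
    n = len(seqs)
    return [[hamming_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


toy_alignment = ["ACGTAC", "ACGTTC", "TCGTTG"]
for row in pairwise_distances(toy_alignment):
    print(row)
# [0, 1, 3]
# [1, 0, 2]
# [3, 2, 0]
```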

Answer 26

-distance methods reconstruct trees (rooted or unrooted) from a set of pairwise distances between the sequences in alignment (assumed given)

Answer 27

- let M be a set and let d: MxM -> ℝ be a function
- we say that d is a distance function on M if:
1) d(u,v)>0 for all u, v ∈ M with u ≠ v
2) d(u,u)=0 for all u ∈ M
3) d(u,v)=d(v,u) for all u, v ∈ M
4) the triangle inequality holds: d(u,v) ≤ d(u,w) + d(w,v) for all u,v,w ∈ M

Answer 28

- if we fix an unrooted tree T relating the sequences (OTUs), we obtain a tree-generated distance function d^T on M by declaring d^T(x^i,x^j) = dij^T to be the length of the shortest path from x^i to x^j in T

Answer 29

-the answer is obviously yes for N=2, since there is only one possible path between each node anyway

Answer 30

-looking for positive numbers x, y, z such that: x + y = d12 x + z = d13 y + z = d23 -there is a unique tree that generates a given distance function -this uniqueness is a general fact for additive distance functions
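Solving the three equations by adding two and subtracting the third gives closed-form edge lengths. The sketch below assumes x, y, z are the edges leading to sequences 1, 2 and 3 respectively, as implied by the equations; the function name is illustrative.

```python
def three_leaf_edges(d12, d13, d23):
    """Edge lengths (x, y, z) of the unique three-leaf tree generating d12, d13, d23."""
    x = (d12 + d13 - d23) / 2
    y = (d12 + d23 - d13) / 2
    z = (d13 + d23 - d12) / 2
    return x, y, z


print(three_leaf_edges(3, 5, 4))  # (2.0, 1.0, 3.0)
```

All three lengths are positive exactly when each triangle inequality holds strictly.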

Answer 31

- not every distance function on M is additive; additivity can be characterised by the following theorem: 'let d be a distance function on M and N≥4; then d is additive if and only if the following condition holds: for every set of four distinct numbers 1≤i,j,k,l≤N, two of the sums dij+dkl, dik+djl, dil+djk coincide and are greater than or equal to the third one'
- this condition is called the four-point condition
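The four-point condition is easy to test by brute force over all quadruples. A minimal sketch follows, assuming the distances are given as a symmetric list-of-lists matrix; the function name, tolerance and example matrices are hypothetical.

```python
from itertools import combinations


def satisfies_four_point_condition(d, tol=1e-9):
    """Check the four-point condition on a symmetric distance matrix d."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        # The two largest sums must coincide; they then dominate the third.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances generated by a tree ((1,2),(3,4)) with leaf edges 1, 2, 3, 4 and internal edge 1.
tree_like = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
# Same matrix with one distance perturbed, which breaks the condition.
perturbed = [
    [0, 3, 5, 6],
    [3, 0, 6, 8],
    [5, 6, 0, 7],
    [6, 8, 7, 0],
]
print(satisfies_four_point_condition(tree_like))   # True
print(satisfies_four_point_condition(perturbed))   # False
```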

Answer 32

- an iterative algorithm that on every step replaces a pair of OTUs with a single OTU and iterates until there are only three OTUs left - this means that for N=3, there is just one unrooted tree topology

Answer 33

-for every i=1,...,N define: ri = 1/[N-2] Σdik -where the sum is from k=1 to N

Answer 34

-for all i,j=1,...,N with i<j define: Dij = dij - (ri + rj)

Answer 35

- calculate the matrix D=(Dij)
- pick a pair i ≠ j with 1≤i, j≤N for which Dij is minimal; such a pair may not be unique
- group x^i and x^j and replace them with x^(N+1), which represents an internal node of the future tree connected to x^i and x^j and is placed at:
  d(N+1)i = 1/2 (dij + ri - rj)
  d(N+1)j = 1/2 (dij + rj - ri)
- define the distances between x^(N+1) and any x^m with m≠i,j as:
  d(N+1)m = 1/2 (dim + djm - dij)
- we now have a collection of N-1 OTUs: M' = {x^m, x^(N+1), m≠i,j}
- repeat the procedure until only three OTUs are left, in which case there is just one unrooted tree topology
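The steps above can be sketched as a small neighbour-joining routine. This is an illustrative sketch rather than a reference implementation. It assumes the commonly used selection criterion Dij = dij - (ri + rj) and resolves the final three OTUs with the three-leaf formulas. The function name, the `node0`, `node1`, ... labels and the example distances are hypothetical.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Return the edges (node, node, length) of an unrooted tree.

    names: list of OTU labels; dist: dict frozenset({a, b}) -> distance.
    """
    d = dict(dist)
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 3:
        n = len(active)
        # r_i = 1/(N-2) * sum_k d_ik (the k = i term is zero, so it is skipped).
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) / (n - 2)
             for i in active}
        # Assumed criterion: minimise D_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(active, 2),
                   key=lambda p: d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        new = f"node{next_id}"
        next_id += 1
        edges.append((new, i, 0.5 * (dij + r[i] - r[j])))
        edges.append((new, j, 0.5 * (dij + r[j] - r[i])))
        for m in active:
            if m not in (i, j):
                d[frozenset((new, m))] = 0.5 * (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - dij)
        active = [m for m in active if m not in (i, j)] + [new]
    # Three OTUs left: a single star topology with closed-form edge lengths.
    a, b, c = active
    dab, dac, dbc = (d[frozenset((a, b))], d[frozenset((a, c))],
                     d[frozenset((b, c))])
    centre = f"node{next_id}"
    edges.append((centre, a, 0.5 * (dab + dac - dbc)))
    edges.append((centre, b, 0.5 * (dab + dbc - dac)))
    edges.append((centre, c, 0.5 * (dac + dbc - dab)))
    return edges


# Additive toy distances from the tree ((A,B),(C,D)): leaf edges 1, 2, 3, 4, internal edge 1.
toy = {frozenset(p): v for p, v in {
    ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}.items()}
for edge in neighbour_joining(["A", "B", "C", "D"], toy):
    print(edge)
```

On additive input such as this toy example, the recovered edge lengths match the generating tree.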

Answer 36

1) assign each (initial) node x^i to C^i, i.e. each node is assumed to be a cluster on its own
2) choose two clusters C^i and C^j for which d(C^i,C^j) is minimal (excluding i=j)
3) define a new cluster C^(N+1)=C^i ∪ C^j and set the distance to the remaining clusters with the distance between clusters equation
4) introduce a new internal node x^(N+1) (associated with cluster C^(N+1)), place it at the total height d(C^i,C^j)/2 and redefine the new distance matrix
5) repeat the process until we have only one cluster and the node represents the root
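A compact sketch of the clustering steps above. The cards do not reproduce the distance-between-clusters equation, so this sketch assumes the average-linkage (UPGMA) definition, in which the distance between two clusters is the mean of all pairwise distances between their members. The names and toy distances are hypothetical.

```python
from itertools import combinations


def average_linkage_clustering(names, dist):
    """Return merges as (new_node, child_a, child_b, height).

    names: list of leaf labels; dist: dict frozenset({a, b}) -> distance.
    """
    d = dict(dist)
    sizes = {name: 1 for name in names}  # number of leaves in each cluster
    merges = []
    next_id = 0
    while len(sizes) > 1:
        # Step 2: closest pair of clusters.
        i, j = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        dij = d[frozenset((i, j))]
        new = f"node{next_id}"
        next_id += 1
        # Step 3: size-weighted update, equal to the mean over all member pairs.
        for m in sizes:
            if m not in (i, j):
                d[frozenset((new, m))] = (
                    sizes[i] * d[frozenset((i, m))]
                    + sizes[j] * d[frozenset((j, m))]) / (sizes[i] + sizes[j])
        # Step 4: the new internal node sits at height d(C^i, C^j) / 2.
        merges.append((new, i, j, dij / 2))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    return merges


toy = {frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("B", "C")): 6}
for merge in average_linkage_clustering(["A", "B", "C"], toy):
    print(merge)
```

The last merge corresponds to the root of the tree.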

Answer 37

- 4x4 matrix with entries pij=pij(t) with i,j∈{A,C,G,T}
- assumes a Markov model where, if at time t0 the site was in state i∈{A,C,G,T}, then the probability that at time t0+t the site will be in state j∈{A,C,G,T} depends only on i, j and t

Answer 38

P(t) = exp(tQ) | -where Q=P'(0) is the 'rate matrix' or matrix of instantaneous change

Answer 39

-sets entries in Q to -3α/4 on diagonal and α/4 elsewhere for some positive constant α -then P(t) has elements rt on the diagonal and st elsewhere, where: rt = pii(t) = 1/4 + 3/4 exp(-αt), for all i st = pij(t) = 1/4 - 1/4 exp(-αt), for i≠j
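A small sketch that builds P(t) from the closed-form entries above and makes it easy to inspect basic properties such as rows summing to 1 and P(0) being the identity. The function name and parameter values are illustrative.

```python
import math


def jc_transition_matrix(alpha, t):
    """4x4 substitution matrix with r_t on the diagonal and s_t elsewhere."""
    decay = math.exp(-alpha * t)
    r_t = 0.25 + 0.75 * decay  # probability of observing the same base after time t
    s_t = 0.25 - 0.25 * decay  # probability of each specific different base
    return [[r_t if i == j else s_t for j in range(4)] for i in range(4)]


for row in jc_transition_matrix(alpha=1.0, t=0.5):
    print([round(p, 4) for p in row], "row sum:", round(sum(row), 10))
```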

Answer 40

- as t->∞, rt and st both tend to 1/4, which means that the nucleotide equilibrium frequencies in this model are: qA = qC = qG = qT = 1/4

Answer 41

P{x1u, x2u | T, t1, t2} = Σ qa P{x1u | a,t1} P{x2u | a,t2} | -where the sum is over a∈{A,C,G,T}

Answer 42

- if there are N positions (i.e. the length of each sequence is N):
  L(t1, t2 | T, x1, x2) = P{x1, x2 | T, t1, t2}
  = ∏ P{x1u, x2u | T, t1, t2}, where the product is over u=1 to u=N
  = 1/[16^(n1+n2)] {1+3exp[-α(t1+t2)]}^n1 {1-exp[-α(t1+t2)]}^n2
- where n1 is the number of positions where the nucleotides in the two sequences are identical and n2 is the number of positions where a substitution occurs
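The closed form can be checked against a site-by-site evaluation. The sketch below assumes the substitution probabilities rt = 1/4 + 3/4 exp(-αt) and st = 1/4 - 1/4 exp(-αt) with equilibrium frequencies of 1/4, and works on the log scale to avoid underflow. The function names, sequences and parameter values are hypothetical.

```python
import math

BASES = "ACGT"


def substitution_prob(a, b, alpha, t):
    """Probability of base a becoming base b over time t."""
    decay = math.exp(-alpha * t)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def site_likelihood(x1, x2, alpha, t1, t2):
    """Sum over the unknown ancestral base a, weighted by q_a = 1/4."""
    return sum(0.25 * substitution_prob(a, x1, alpha, t1)
               * substitution_prob(a, x2, alpha, t2) for a in BASES)


def log_likelihood(seq1, seq2, alpha, t1, t2):
    """Log-likelihood as a sum of per-site log-likelihoods."""
    return sum(math.log(site_likelihood(a, b, alpha, t1, t2))
               for a, b in zip(seq1, seq2))


def closed_form_log_likelihood(seq1, seq2, alpha, t1, t2):
    """Log of the closed-form likelihood in terms of n1 and n2."""
    n1 = sum(a == b for a, b in zip(seq1, seq2))  # identical positions
    n2 = len(seq1) - n1                           # differing positions
    decay = math.exp(-alpha * (t1 + t2))
    return (-(n1 + n2) * math.log(16)
            + n1 * math.log(1 + 3 * decay)
            + n2 * math.log(1 - decay))


args = ("ACGTAC", "ACGTTC", 1.0, 0.1, 0.2)
print(log_likelihood(*args), closed_form_log_likelihood(*args))
```

The two values agree up to floating-point error, and both depend on t1 and t2 only through the sum t1 + t2.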