3. Statistical Genetics Flashcards

Question

Autosomal Dominant Disease Tests | Pearson Chi-Square Test

Answer 1

- to test the null hypothesis Ho: p = 1/2, can make reference to the chi-square distribution
- the square of a standard normal variable is χ1² distributed: Z² = [X - np]² / [np(1-p)]
- this can be rewritten as a Pearson χ² statistic: Z² = Σ (O-E)²/E
- for r of n offspring affected, can use the statistic z² = [r - n/2]² / [n/4]
- p-value is computed as P_{χ1²}(Z² > z²)
- reject Ho if: 1) p-value < 0.05 OR 2) z² > x_{95%}, assuming α = 0.05
- where x_{95%} is the 95th percentile of the χ1² distribution
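
As a rough illustration of the Pearson test above, the sketch below computes z² and its p-value for Ho: p = 1/2. The function names and the counts (r = 12, n = 40) are made up for the example; the χ1² tail probability is obtained from the standard normal via `math.erfc`, which works only for 1 degree of freedom.

```python
import math

def chi2_1df_sf(x):
    """Upper-tail probability P(chi2_1 > x), using chi2_1 = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

def pearson_test_half(r, n):
    """Pearson chi-square statistic and p-value for Ho: p = 1/2."""
    expected = n / 2.0
    # Sum of (O - E)^2 / E over the two categories (affected, unaffected).
    z2 = (r - expected) ** 2 / expected + ((n - r) - expected) ** 2 / expected
    return z2, chi2_1df_sf(z2)

# Hypothetical data: 12 affected out of 40 offspring.
z2, p_value = pearson_test_half(r=12, n=40)
print(z2, p_value)
```

The two-category sum equals the shortcut form [r - n/2]² / [n/4] from the card.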

Answer 2

- likelihood is the probability density / distribution function, but viewed as a function of the parameter rather than the data
- for a binomial, pdf: P(r;p) = nCr p^r [1-p]^(n-r)
- the likelihood function, given we observe r out of n offspring affected, is: L(p) = nCr p^r [1-p]^(n-r)

Answer 3

- the likelihood ratio statistic is defined as: Δ = 2 log(L1/Lo) = 2log(L1) - 2log(Lo)
- where log(L1) is the log likelihood given the observed data (evaluated at p^) and log(Lo) is that at the hypothetical value p = 1/2
- under Ho, Δ is approximately χ1² distributed
- p-value is computed as P_{χ1²}(Δ > 𝛿)
- reject Ho if: 1) p-value < 0.05 OR 2) 𝛿 > x_{95%}, assuming α = 0.05
- where x_{95%} is the 95th percentile of the χ1² distribution
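
A minimal sketch of the likelihood ratio test above, under the same binomial model. The names `binom_loglik` and `lrt_half` and the counts are illustrative assumptions; the constant log(nCr) is dropped because it cancels in the ratio.

```python
import math

def binom_loglik(p, r, n):
    """Binomial log-likelihood without the constant log(nCr)."""
    ll = 0.0
    if r > 0:
        ll += r * math.log(p)
    if n - r > 0:
        ll += (n - r) * math.log(1.0 - p)
    return ll

def lrt_half(r, n):
    """Likelihood ratio statistic and p-value for Ho: p = 1/2."""
    p_hat = r / n  # MLE of p
    delta = 2.0 * (binom_loglik(p_hat, r, n) - binom_loglik(0.5, r, n))
    delta = max(delta, 0.0)  # guard against tiny negative rounding error
    # chi-square (1 df) upper tail via the standard normal
    p_value = math.erfc(math.sqrt(delta / 2.0))
    return p_hat, delta, p_value

# Hypothetical data: 12 affected out of 40 offspring.
p_hat, delta, p_value = lrt_half(r=12, n=40)
print(p_hat, delta, p_value)
```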

Answer 4

- in observing r out of n offspring affected, the log-likelihood function (up to a constant) is: l(p) = r log(p) + (n-r) log(1-p)
- take the first derivative of l(p) wrt p and set to zero for the mle: p^ = r/n

Answer 5

- for estimation of SE(p^), the observed value of r can be regarded as a realisation of a random variable X: X~Bin(n,p)
- mean and variance of X are np and np(1-p)
- so the sampling mean and sampling variance of p^ = X/n are np/n = p and np(1-p)/n² = p(1-p)/n
- so SE(p^) = √[p(1-p)/n]
- since the true value of p is unknown, plug in the mle, p^ = r/n
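
The plug-in standard error can be sketched in a few lines; `mle_and_se` and the counts are hypothetical names and values used only for illustration.

```python
import math

def mle_and_se(r, n):
    """MLE p^ = r/n and its plug-in standard error sqrt(p^(1-p^)/n)."""
    p_hat = r / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, se

# Hypothetical data: 12 affected out of 40 offspring.
p_hat, se = mle_and_se(12, 40)
print(p_hat, se)
```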

Answer 6

- binomial and standard normal tests are not directly applicable as we have three categories
- Pearson and likelihood-ratio chi-square tests can be extended and applied

Answer 7

- a generalisation of the binomial distribution
- in a binomial experiment we have two outcomes, success and failure, with probabilities p and 1-p
- in a multinomial experiment we have k outcomes with probabilities pi such that Σpi = 1
- let random variables Xi indicate the number of times outcome i was observed over n trials
- pdf: P(X;p) = n!/[x1!...xk!] * p1^(x1) p2^(x2) ... pk^(xk)
- likelihood: L(p;X) ∝ p1^(x1) p2^(x2) ... pk^(xk)
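
To make the multinomial formula concrete, here is a small sketch that evaluates it directly. The function name and the example counts and probabilities are assumptions for illustration.

```python
import math

def multinomial_pmf(counts, probs):
    """P(X1=x1,...,Xk=xk) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)  # stays an integer at every step
    prob = float(coef)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob

# Hypothetical example: 4 offspring, three categories with probabilities 1/4, 1/2, 1/4.
value = multinomial_pmf([1, 2, 1], [0.25, 0.5, 0.25])
print(value)
```

With k = 2 this reduces to the binomial probability nCr p^r (1-p)^(n-r).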

Answer 8

- only individuals with DD have the disease
- not possible to select DdxDd families on the basis of disease status of parents
- problem of ascertainment

Answer 9

- usual procedure is to initially select a random sample of affected individuals in the population, the probands
- subsequently study their families for additional affected members, the secondary cases
- so DdxDd parents with no affected offspring will be missed

Answer 10

- define π as the probability that an affected individual in the population is identified as a proband, the ascertainment probability
- assume π to be constant for all affected individuals
- probability a family with r affected offspring is not ascertained is (1-π)^r
- probability the family is ascertained is 1 - (1-π)^r

Answer 11

- when π = 1, 1-(1-π)^r is 1 regardless of the number of affected offspring, so all families with affected offspring are ascertained -> complete ascertainment
- when π -> 0, the probability of ascertaining a family with r affected offspring becomes approximately 1-(1-rπ) = rπ
- so the probability of ascertainment is approximately proportional to the number of affected offspring
- since π is very small, almost all ascertained families will have only one proband -> single ascertainment
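
The two limiting cases can be checked numerically. This is only a sketch; `ascertainment_prob` and the chosen values of r and π are illustrative.

```python
def ascertainment_prob(r, pi):
    """Probability that a family with r affected offspring is ascertained."""
    return 1.0 - (1.0 - pi) ** r

# Complete ascertainment: pi = 1 gives probability 1 for any r >= 1.
complete = ascertainment_prob(3, 1.0)

# Single ascertainment: for small pi the probability is close to r * pi.
small_pi = 0.001
exact = ascertainment_prob(3, small_pi)
approx = 3 * small_pi
print(complete, exact, approx)
```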

Answer 12

- there are statistical procedures designed to deal with ascertainment in two conditions:
  1) complete ascertainment
  2) incomplete ascertainment
  - proband method
  - singles method

Answer 13

- all families with affected offspring are assumed to be identified
- consider families of mating type DdxDd with s offspring
- let X be a RV for the number of affected offspring in such a family, 0≤X≤s
- then X would follow a binomial distribution with parameters s and p
- for a rare recessive disease, null hypothesis: p = 1/4
- all families ascertained have at least one affected offspring, X>0
- the probability of observing r affected offspring in the family is conditional on X>0, given by a truncated binomial distribution

Answer 14

- the probability of observing r affected offspring in the family is conditional on X>0: P(X=r | X>0) = P(X=r, X>0) / P(X>0)
- for 0≤r≤s: P(X=r, X>0) = 0 for r=0 and P(X=r) for 1≤r≤s
- and P(X>0) = 1 - (1-p)^s
- hence, for 1≤r≤s the probability function of X in ascertained families is: P(X=r | X>0) = [sCr p^r (1-p)^(s-r)] / [1 - (1-p)^s]
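
A short sketch of the truncated binomial probability function, assuming Python 3.8+ for `math.comb`. The function name and the family size are illustrative.

```python
import math

def truncated_binom_pmf(r, s, p):
    """P(X = r | X > 0) for X ~ Bin(s, p); zero outside 1 <= r <= s."""
    if r < 1 or r > s:
        return 0.0
    binom = math.comb(s, r) * p ** r * (1.0 - p) ** (s - r)
    return binom / (1.0 - (1.0 - p) ** s)

# Hypothetical sibship of size s = 2 under Ho: p = 1/4.
probs = [truncated_binom_pmf(r, 2, 0.25) for r in range(0, 3)]
print(probs)
```

The probabilities for r = 1..s sum to 1, since the mass at r = 0 has been removed and the rest rescaled.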

Answer 15

- two methods that arrive at the same estimate: 1) log likelihood 2) estimating equation

Answer 16

- denoting ar as the number of families with r affected offspring, we have:
  L(p) = ∏ P(X=r)^ar
  l(p) = Σ ar log[P(X=r)]
- p^ is the value that maximises l(p), i.e. solves dl(p)/dp = 0
- solved by numerical methods

Answer 17

- note that for the truncated binomial E(X) = sp / [1-(1-p)^s]
- the only unknown is the segregation ratio p
- equate E(X) with the mean number of affected offspring per family, r_
- solved by numerical methods
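
One possible numerical method for the estimating equation is bisection, sketched below. It assumes all families have the same size s and uses the truncated-binomial mean sp / [1-(1-p)^s], which increases from 1 to s as p goes from 0 to 1. The function names are hypothetical.

```python
def truncated_mean(p, s):
    """Mean of X ~ Bin(s, p) conditional on X > 0."""
    return s * p / (1.0 - (1.0 - p) ** s)

def solve_segregation_ratio(r_bar, s, tol=1e-10):
    """Find p with truncated_mean(p, s) = r_bar by bisection (needs 1 < r_bar < s)."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if truncated_mean(mid, s) < r_bar:
            lo = mid  # mean too small, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0

# Round-trip check: a mean generated with p = 1/4 and s = 4 should give back 1/4.
r_bar = truncated_mean(0.25, 4)
p_hat = solve_segregation_ratio(r_bar, 4)
print(r_bar, p_hat)
```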

Answer 18

- not all families with affected offspring will be ascertained, π->0
- want to estimate the segregation ratio p and π
- devise a likelihood method because it does not make any assumption of complete ascertainment
- in complete ascertainment we basically use: p^ = # affected offspring / # total offspring
- for incomplete ascertainment we have to adjust these numbers: p^ = R/S = adjusted # affected offspring / adjusted # total offspring
- probability of ascertainment: π^ = B/R
  B = adjusted number of probands
  R = adjusted number of affected offspring
- proband method and singles method are two methods of adjustment

Answer 19

- treat the siblings of probands as effective observations: p^ = R/S
  R = total number of affected siblings
  S = total number of siblings
- so probands are excluded unless they themselves are siblings of probands
- for a larger data set:
  - in a family with r out of s affected offspring we take r-1 and s-1
  - if the family has b probands then each proband contributes r-1 and s-1
  - suppose we have data from n families; the segregation ratio is estimated by: p^ = [Σbi(ri-1)] / [Σbi(si-1)]
- probability of ascertainment is estimated by: π^ = [Σbi(bi-1)] / [Σbi(ri-1)]
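
The proband-method sums can be sketched as follows. The `(s, r, b)` tuple layout, the function name and the four families are invented for illustration, and the code assumes at least one family has r > 1 so the denominators are non-zero.

```python
def proband_method(families):
    """Estimate (p^, pi^) from families given as (s, r, b) tuples:
    s = sibship size, r = affected offspring, b = probands."""
    affected_sibs = sum(b * (r - 1) for s, r, b in families)  # sum bi(ri-1)
    total_sibs = sum(b * (s - 1) for s, r, b in families)     # sum bi(si-1)
    proband_sibs = sum(b * (b - 1) for s, r, b in families)   # sum bi(bi-1)
    p_hat = affected_sibs / total_sibs
    pi_hat = proband_sibs / affected_sibs
    return p_hat, pi_hat

# Hypothetical data set of four families.
families = [(4, 2, 1), (3, 1, 1), (5, 2, 2), (4, 1, 1)]
p_hat, pi_hat = proband_method(families)
print(p_hat, pi_hat)
```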

Answer 20

- takes as effective observations ALL offspring except those who are the only proband in the family (singles)
- let the number of singles in a sample of n families be d, and let ri be the number of affected out of si total offspring in family i
- the segregation ratio is estimated as: p^ = [Σri - d] / [Σsi - d]
- the ascertainment probability is estimated as: π^ = [Σbi - d] / [Σri - d]
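
A comparable sketch for the singles method, again with an invented `(s, r, b)` tuple layout and made-up families. A single is counted as a family with exactly one proband.

```python
def singles_method(families):
    """Estimate (p^, pi^) from families given as (s, r, b) tuples:
    s = sibship size, r = affected offspring, b = probands."""
    d = sum(1 for s, r, b in families if b == 1)  # number of singles
    total_r = sum(r for s, r, b in families)
    total_s = sum(s for s, r, b in families)
    total_b = sum(b for s, r, b in families)
    p_hat = (total_r - d) / (total_s - d)
    pi_hat = (total_b - d) / (total_r - d)
    return p_hat, pi_hat

# Hypothetical data set of four families.
families = [(4, 2, 1), (3, 1, 1), (5, 2, 2), (4, 1, 1)]
p_hat, pi_hat = singles_method(families)
print(p_hat, pi_hat)
```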

Answer 21

- although the estimates of the segregation ratio are simple in the proband and singles methods, their standard errors are complicated
- but Var(p^) can be calculated for both cases

Answer 22

- test statistic: Z = [p^ - po] / SE(p^)
- Ho: p = 1/4 (disease is recessive)
- standard normal testing

Answer 23

- for a locus with 3 genotypes AA, Aa and aa, the frequencies are: f(AA) = #AA/N, f(Aa) = #Aa/N, f(aa) = #aa/N
- the sum of all genotype frequencies always = 1

Answer 24

- the number of copies of a particular allele divided by the total number of alleles in the sample
- e.g. for a locus with three genotypes AA, Aa, aa:
  p = f(A) = [2nAA + nAa]/2N
  q = f(a) = [2naa + nAa]/2N
- and f(A) + f(a) = 1 always
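
The allele-counting formulas translate directly into code. The function name and genotype counts below are illustrative only.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) = (f(A), f(a)) from genotype counts by allele counting."""
    n = n_AA + n_Aa + n_aa            # individuals, each carrying 2 alleles
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = (2 * n_aa + n_Aa) / (2 * n)
    return p, q

# Hypothetical sample of N = 100 individuals.
p, q = allele_frequencies(30, 50, 20)
print(p, q)
```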

Answer 25

- if a population is large, randomly mating and not affected by mutation, migration or natural selection then:
  i) the allelic frequencies of the population do not change AND
  ii) the genotype frequencies stabilise (will not change) after one generation in the proportions: f(AA) = p², f(Aa) = 2pq, f(aa) = q²
- where p = f(A) and q = f(a)

Answer 26

- two approaches 1) how the genotype frequencies stabilise 2) how the allele frequencies stabilise

Answer 27

- consider a biallelic locus with alleles A and a
- three genotypes AA, Aa and aa with frequencies P, 2Q and R such that P+2Q+R = 1
- Hardy's result: if individuals in the population mate at random, these frequencies satisfy Q² = PR
- can show that this is true for the first generation of offspring: Q1² = P1R1
- the relative frequencies of the genotypes will remain unchanged after a second generation of random mating
- it is in this sense that this ratio of frequencies represents an equilibrium

Answer 28

- denote f(A) = p and f(a) = q = 1-p
- under random mating, the frequencies of genotypes of the offspring are: f(AA) = p², f(Aa) = 2pq, f(aa) = q²
- regardless of the genotype frequencies in the parental generation
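
The equilibrium claim can be illustrated numerically: starting from arbitrary parental genotype frequencies, one round of random mating gives p², 2pq, q², and a second round leaves them unchanged. The starting frequencies and the function name are made up for the example.

```python
def random_mating(f_AA, f_Aa, f_aa):
    """Offspring genotype frequencies after one generation of random mating."""
    p = f_AA + 0.5 * f_Aa   # frequency of allele A among parents
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

# Hypothetical parental generation that is NOT in Hardy-Weinberg proportions.
gen0 = (0.5, 0.2, 0.3)
gen1 = random_mating(*gen0)   # first generation of offspring
gen2 = random_mating(*gen1)   # second generation
print(gen1, gen2)
```

In the P, 2Q, R notation, the first generation here has Q = 0.24, and Q² equals PR.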

Answer 29

1) a population cannot evolve under HWE assumptions, since evolution requires change in the allele frequencies of the population
  - reproduction alone will not bring about evolution; other processes such as random mutation are required as well
2) when a population is in HWE, the genotype frequencies are determined by the allele frequencies
3) a single generation of random mating produces the equilibrium frequencies p², 2pq, q²
  - the fact that we observe the genotypes in HW proportions does not prove that the population is free from natural selection, mutation and migration
  - it means only that these forces have not acted since the last random mating took place

Answer 30

- will consider two tests: chi-square test and likelihood ratio test
- hypotheses:
  Ho : locus is in HWE
  H1 : locus is NOT in HWE
- under Ho, both test statistics follow χ²df, where df is the number of degrees of freedom: number of genotypes - number of alleles (e.g. 3 - 2 = 1 for a biallelic locus)

Answer 31

- test statistic: Z² = Σ (O-E)²/E
- where E is the expected count under Ho (HWE)
- p-value: p = P_{χ²df}(Z² > z²)
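
A minimal sketch of the chi-square test for HWE at a biallelic locus (df = 1), assuming both alleles appear in the sample so that no expected count is zero. The function name and genotype counts are hypothetical, and the χ1² tail is computed from the standard normal via `math.erfc`.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic and p-value (1 df) for Ho: locus is in HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # estimated frequency of allele A
    q = 1.0 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [n * p * p, 2 * n * p * q, n * q * q]  # counts under HWE
    z2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(z2 / 2.0))
    return z2, p_value

# Hypothetical genotype counts.
z2, p_value = hwe_chi_square(30, 50, 20)
print(z2, p_value)
```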

Answer 32

- test statistic: Δ = 2log(L1/Lo) = 2[logL1 - logLo]
- where logL1 is the log likelihood given the observed data and logLo is that at the hypothetical value (expected under Ho)
- p-value: p = P_{χ²df}(Δ > 𝛿)
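
For genotype counts, the likelihood ratio statistic can be written as 2 Σ O log(O/E), because logL1 uses the observed genotype proportions and logLo the HWE proportions. The sketch below assumes a biallelic locus (df = 1) with both alleles present; the names and counts are illustrative.

```python
import math

def hwe_lrt(n_AA, n_Aa, n_aa):
    """Likelihood ratio statistic and p-value (1 df) for Ho: locus is in HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    # Genotypes with zero observed count contribute nothing to the sum.
    delta = 2.0 * sum(o * math.log(o / e)
                      for o, e in zip(observed, expected) if o > 0)
    delta = max(delta, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(delta / 2.0))
    return delta, p_value

# Hypothetical genotype counts.
delta, p_value = hwe_lrt(30, 50, 20)
print(delta, p_value)
```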

Answer 33

- phenotypes of individuals who are heterozygous or homozygous for the normal allele cannot be distinguished
- assuming HWE, use the MLE of the disease allele frequency: p^ = (n1/[n1+n2])^(1/2)
- where n1 and n2 are the numbers of affected and unaffected individuals respectively
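
As a quick worked example of this estimator, assuming a recessive disease and a population in HWE; the function name and counts are invented.

```python
import math

def recessive_allele_freq(n_affected, n_unaffected):
    """Square root of the affected proportion, i.e. the estimator (n1/[n1+n2])^(1/2)."""
    return math.sqrt(n_affected / (n_affected + n_unaffected))

# Hypothetical sample: 4 affected among 10,000 individuals.
p_hat = recessive_allele_freq(4, 9996)
print(p_hat)
```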

Answer 34

- one or more assumptions are violated
- genotyping errors
- mixture of subpopulations

(58 cards)