3. Statistical Genetics Flashcards

1
Q

Locus

Definition

A
  • a location in the genome, usually one carrying genetic variation of interest
  • this can be a single base or a whole gene
2
Q

What can the risk of developing a disease be the result of?

A
  • a major gene
  • polygenes

3
Q

Major Gene

Definition

A
  • a single locus that increases the risk of developing a disease
  • such diseases are called Mendelian diseases
4
Q

Polygenes

Definition

A
  • the cumulative effect of a large number of genetic loci each having a small effect that, when taken together, increase the risk of a disease
  • such diseases are called complex diseases
  • the majority of diseases are complex
5
Q

Penetrance Function

Definition

A
  • the set of probability distribution functions for the phenotype given the genotype
  • denoted Pr(Y|G) or P(Y|G) where Y is the phenotype and G is the genotype
  • assuming a binary trait, Y=1 indicates having the disease and Y=0 indicates unaffected
6
Q

Phenocopy

Definition

A
  • an individual whose disease is due to environmental rather than genetic factors
  • usually we assume no phenocopies, i.e. all risk of disease comes only from genetic factors
7
Q

Fully Penetrant

Definition

A
  • if Pr(Y|DD)=1 or Pr(Y|dD)=1 in the case of a dominant allele, we say that the genotype is fully penetrant
  • i.e. the genotype is sufficient to cause the disease
8
Q

Dominant

Definition

A

Pr(Y|dD) = Pr(Y|DD)

  • a single copy of the mutant allele is sufficient to produce an increase in risk
  • allele D is dominant over allele d
9
Q

Recessive

Definition

A

Pr(Y|dD) = Pr(Y|dd)

  • two copies of the mutant allele are necessary to produce an increase in risk
  • equivalently, one copy of the normal allele is sufficient to provide protection
  • allele D is recessive to d
10
Q

Codominant

Definition

A
  • Pr(Y|dd) ≠ Pr(Y|dD) ≠ Pr(Y|DD)
  • all three genotypes have different effects on disease risk
  • in most cases, the heterozygotes have an effect that is intermediate between that of the two homozygotes
11
Q

Additive / Dose-Dependent

Definition

A
  • a special case of codominance where Pr(Y|dD) is midway between Pr(Y|dd) and Pr(Y|DD)
  • i.e. the increase in disease risk from DD (relative to dd) is twice as great as the increase from dD
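The genetic models above can be expressed as penetrance vectors over the genotypes dd, dD and DD. The sketch below assumes hypothetical parameter names f0 = Pr(Y|dd) and f2 = Pr(Y|DD). The heterozygote penetrance then follows from the chosen model. A general codominant model would instead treat Pr(Y|dD) as a third free parameter.

```python
def penetrance(model: str, f0: float, f2: float) -> dict[str, float]:
    """Return Pr(Y=1 | genotype) for dd, dD and DD under a simple genetic model.

    f0 = Pr(Y|dd) and f2 = Pr(Y|DD) are hypothetical inputs; the heterozygote
    penetrance Pr(Y|dD) is determined by the model.
    """
    if model == "dominant":
        f1 = f2                # one copy of D gives the same risk as two
    elif model == "recessive":
        f1 = f0                # one copy of d is enough for protection
    elif model == "additive":
        f1 = (f0 + f2) / 2     # heterozygote midway between the homozygotes
    else:
        raise ValueError(f"unknown model: {model}")
    return {"dd": f0, "dD": f1, "DD": f2}


# fully penetrant dominant allele with no phenocopies
print(penetrance("dominant", 0.0, 1.0))  # {'dd': 0.0, 'dD': 1.0, 'DD': 1.0}
```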
12
Q

Mendelian Inheritance

Mendel's Laws

A
  • the inheritance of genes can be summarised in two principles:
    1) segregation of alleles
    2) independent assortment
  • together with two concepts: independent expression and random mating
13
Q

Mendelian Inheritance

First Principle

A
  • each individual carries two copies of each gene, one inherited from each parent
  • each parent transmits one of its two alleles at a given gene at random, each with probability 1/2
14
Q

Mendelian Inheritance

Second Principle

A
  • alleles of different genes are transmitted independently
  • we now know that this does not apply when loci are located near each other on the same chromosome (linkage)

15
Q

Mendelian Inheritance

Third Concept

A
  • the expression of genes is independent of which parent they came from
  • heterozygotes have the same penetrance whether the D allele came from the father and the d allele from the mother or vice versa
  • this is generally accepted, although there are some exceptions (e.g. genomic imprinting)
16
Q

Mendelian Inheritance

Fourth Concept

A
  • random mating
  • the probability of two individuals mating is independent of their genotypes at the locus of interest

17
Q

Transmission Probabilities

A

-the probability distribution for gametes transmitted from a single parent to their offspring
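Transmission probabilities for a single autosomal locus can be enumerated directly. Each parent passes on either of its two alleles with probability 1/2, independently of the other parent. The function name and the two-letter genotype strings used below are illustrative choices for this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(mother: str, father: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities at one autosomal locus.

    Each of the 2 x 2 allele combinations is equally likely (probability 1/4).
    Genotypes are two-letter strings such as "Dd"; letter order is normalised
    so that "dD" and "Dd" count as the same genotype.
    """
    counts = Counter()
    for a, b in product(mother, father):
        counts["".join(sorted(a + b))] += 1
    return {g: Fraction(k, 4) for g, k in counts.items()}


# affected (Dd) x unaffected (dd): half the offspring are expected to be Dd
print(offspring_distribution("Dd", "dd"))
```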

18
Q

Autosomal Dominant Inheritance

A

Pr(Y|dd) = 0
Pr(Y|dD) = Pr(Y|DD) = 1
-most dominant disease alleles are rare
-homozygous carriers, DD, are extremely rare and can be ignored
-this means that a case must be dD
-since the disease is rare, it is unlikely that two carriers mate, so the only matings of interest are dD x dd

19
Q

Autosomal Recessive Inheritance

A
  • assume that only the DD genotype is affected, full penetrance and no phenocopies
  • an affected individual must have inherited one defective copy from each parent
  • most recessive traits are rare, so the parents are unlikely to be homozygotes; they are most likely both carriers, Dd x Dd
20
Q

X-Linked Inheritance

A
  • males are XY, females are XX
  • in females only one of the two X chromosomes is expressed, the other is inactivated early in development
  • the mother always transmits an X chromosome, the father can transmit either an X or a Y thus determining the sex of the offspring
21
Q

Segregation Analysis

Definition

A
  • the process of fitting genetic models (dominant, recessive, codominant) to data on phenotypes
  • aims to test hypotheses about whether one or more major genes and/or polygenes can account for the observed pattern of familial aggregation
22
Q

Segregation Analysis for Autosomal Dominant Diseases

A
  • since the disease is rare, most matings between an affected and an unaffected person will be Dd x dd
  • each offspring of this mating has a 1/2 chance of being affected
  • we can test whether the segregation ratio in the observed data is 1/2
  • several tests: binomial test, standard normal test, Pearson chi-square test and likelihood-ratio chi-square test
23
Q

Autosomal Dominant Disease Tests

Binomial Test

A

-suppose a random sample of matings between affected and unaffected individuals is obtained and that out of the total n offspring, r are affected
-null hypothesis - the segregation ratio is 1/2
-regard each offspring as a trial and each affected as a success, then Ho:p=1/2 and
P(X=x) = nCx p^x [1-p]^(n-x)
-for a two-tailed test, the p-value associated with r affected individuals out of n offspring is given by a sum of binomial probabilities
p = (1/2)^(n-1) Σ nCx
-where the sum is from x=0 to x=c
-where c=r if r≤n/2 and c=n-r if r>n/2
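The two-tailed binomial p-value above can be computed exactly with integer binomial coefficients. This sketch adds one step that the card leaves implicit. When r is at or near n/2, the doubled tail sum can exceed 1, so it is capped at 1.

```python
from math import comb


def binomial_segregation_test(r: int, n: int) -> float:
    """Two-tailed exact binomial p-value for H0: segregation ratio = 1/2."""
    c = r if r <= n / 2 else n - r
    p_value = sum(comb(n, x) for x in range(c + 1)) / 2 ** (n - 1)
    # near r = n/2 the doubled tail exceeds 1, so cap it
    return min(p_value, 1.0)


# hypothetical data: 3 of 10 offspring affected
print(binomial_segregation_test(3, 10))  # 0.34375
```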

24
Q

Autosomal Dominant Disease Tests

Standard Normal Test

A

-a good approximation to the binomial test (for np>5 and n large enough)
-if X~Bi(n,p), the statistic
Z = [X - np] / √[np(1-p)]
-approximately follows a standard normal distribution
-null hypothesis: p=1/2
-in testing Ho, we observe r of n offspring affected, the test statistic becomes:
z = [r* - n/2] / √[n/4]
-since for the binomial, μ=np=n/2 and σ²=np(1-p)=n(1/2)(1/2)=n/4
-and r* is r corrected for continuity: r* = r - 1/2 if r > n/2 and r* = r + 1/2 if r < n/2
-for a two-sided test, the p-value is 2P(Z>|z|)
-reject Ho if:
1) p-value < 0.05
OR
2) |z| > c_{97.5%}, assuming α=0.05
-where c_{97.5%} is the 97.5 percentile of the standard normal distribution
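Here is a sketch of the continuity-corrected normal test. It uses only the standard library. It relies on the identity 2P(Z>|z|) = erfc(|z|/√2) rather than a statistics package.

```python
from math import erfc, sqrt


def normal_segregation_test(r: int, n: int) -> tuple[float, float]:
    """Continuity-corrected z statistic and two-sided p-value for H0: p = 1/2."""
    half = n / 2
    # continuity correction: move r half a unit towards n/2
    if r > half:
        r_star = r - 0.5
    elif r < half:
        r_star = r + 0.5
    else:
        r_star = r
    z = (r_star - half) / sqrt(n / 4)
    # two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


z, p = normal_segregation_test(30, 40)  # hypothetical: 30 of 40 affected
```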

25
Autosomal Dominant Disease Tests | Pearson Chi-Square Test
-to test the null hypothesis Ho: p=1/2, we can refer to the chi-square distribution
-the square of a standard normal variable follows a χ1² distribution: Z² = [X-np]² / np(1-p)
-this can be rewritten as a Pearson χ² statistic summed over the two categories (affected, unaffected): Z² = Σ (O-E)²/E
-for r of n offspring affected, we can use the statistic z² = [r* - n/2]² / [n/4]
-the p-value is computed as P_{χ1²}(Z²>z²)
-reject Ho if:
1) p-value < 0.05
OR
2) z² > x_{95%}, assuming α=0.05
-where x_{95%} is the 95th percentile of the χ1² distribution
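The sketch below computes the Pearson form Σ(O-E)²/E over the two categories. It applies the continuity correction as |O-E| - 1/2 in each cell, floored at 0. That choice is an assumption made so the statistic equals z² with r*. The χ1² upper tail is computed as erfc(√(x/2)).

```python
from math import erfc, sqrt


def pearson_segregation_test(r: int, n: int) -> tuple[float, float]:
    """Continuity-corrected Pearson chi-square for H0: p = 1/2 (1 df)."""
    observed = [r, n - r]          # affected, unaffected
    expected = [n / 2, n / 2]
    # |O - E| - 1/2 per cell (floored at 0) reproduces z^2 computed with r*
    stat = sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e for o, e in zip(observed, expected))
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

For a 1-df test this gives the same p-value as the two-sided normal test with continuity correction.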
26
Likelihood | Definition
-the likelihood is the probability density / distribution function viewed as a function of the parameter rather than of the data
-for a binomial, the pdf is: P(r;p) = nCr p^r [1-p]^(n-r)
-the likelihood function given that we observe r out of n offspring affected is: L(p) = nCr p^r [1-p]^(n-r)
27
Autosomal Dominant Disease Tests | Likelihood-Ratio Test
-the likelihood ratio statistic is defined as: Δ = 2 log(L1/Lo) = 2log(L1) - 2log(Lo)
-where log(L1) is the log likelihood given the observed data (evaluated at the mle p^) and log(Lo) is the log likelihood at the hypothesised value p=1/2
-under Ho, Δ is approximately χ1² distributed
-the p-value is computed as P_{χ1²}(Δ>𝛿), where 𝛿 is the observed value of Δ
-reject Ho if:
1) p-value < 0.05
OR
2) 𝛿 > x_{95%}, assuming α=0.05
-where x_{95%} is the 95th percentile of the χ1² distribution
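Here is a sketch of the likelihood-ratio test for H0: p = 1/2. The constant log(nCr) cancels in the ratio, so it is dropped. The convention 0·log(0) = 0 handles r = 0 or r = n.

```python
from math import erfc, log, sqrt


def binomial_loglik(p: float, r: int, n: int) -> float:
    """r log p + (n - r) log(1 - p); the constant log(nCr) cancels in the ratio."""
    ll = 0.0
    if r > 0:
        ll += r * log(p)
    if n - r > 0:
        ll += (n - r) * log(1 - p)
    return ll


def lrt_segregation_test(r: int, n: int, p0: float = 0.5) -> tuple[float, float]:
    """Statistic 2[l(p^) - l(p0)] and its chi-square (1 df) p-value."""
    p_hat = r / n
    delta = 2 * (binomial_loglik(p_hat, r, n) - binomial_loglik(p0, r, n))
    delta = max(delta, 0.0)  # guard against tiny negative rounding error
    return delta, erfc(sqrt(delta / 2))
```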
28
Likelihood | Maximum Likelihood Estimator
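-in observing r out of n offspring affected, the log-likelihood function (dropping the constant log nCr) is: l(p) = r log(p) + (n-r) log(1-p)
-take the first derivative of l(p) with respect to p and set it to zero: dl(p)/dp = r/p - (n-r)/(1-p) = 0
-solving gives the mle: p^ = r/n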
29
Maximum Likelihood Estimator | Standard Error
-to estimate SE(p^), the observed value of r can be regarded as a realisation of a random variable X: X~Bin(n,p)
-the mean and variance of X are np and np(1-p)
-so p^ = X/n has sampling mean np/n = p and sampling variance np(1-p)/n² = p(1-p)/n
-so SE(p^) = √[p(1-p)/n]
-since the true value of p is unknown, plug in the mle, p^ = r/n
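The MLE and its plug-in standard error can be checked numerically. The sketch also includes the score dl/dp, which should be zero at p^ = r/n. The example counts are hypothetical.

```python
from math import sqrt


def score(p: float, r: int, n: int) -> float:
    """dl/dp for l(p) = r log p + (n - r) log(1 - p); requires 0 < p < 1."""
    return r / p - (n - r) / (1 - p)


def segregation_mle(r: int, n: int) -> tuple[float, float]:
    """MLE p^ = r/n and its plug-in standard error sqrt(p^(1 - p^)/n)."""
    p_hat = r / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


p_hat, se = segregation_mle(30, 40)  # hypothetical: 30 of 40 affected
```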
30
Segregation Analysis for Co-dominant Loci
- the binomial and standard normal tests are not directly applicable because there are three categories (genotypes)
- the Pearson and likelihood-ratio chi-square tests can be extended and applied
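As an illustration of the extended Pearson test, the sketch assumes offspring of a hypothetical Aa x Aa codominant mating, where the null hypothesis is 1:2:1 segregation. There are three categories and no estimated parameters, so the test has 2 df. For 2 df the χ² upper tail has the closed form exp(-x/2).

```python
from math import exp


def codominant_pearson_test(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Pearson chi-square for 1:2:1 segregation in offspring of Aa x Aa matings."""
    n = n_AA + n_Aa + n_aa
    observed = [n_AA, n_Aa, n_aa]
    expected = [n / 4, n / 2, n / 4]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 categories, no estimated parameters -> 2 df; P(chi2_2 > x) = exp(-x / 2)
    return stat, exp(-stat / 2)
```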
31
Multinomial Distribution
-a generalisation of the binomial distribution
-in a binomial experiment there are two outcomes, success and failure, with probabilities p and 1-p
-in a multinomial experiment there are k outcomes with probabilities pi such that Σpi=1
-let the random variable Xi be the number of times outcome i is observed over n trials
-pdf: P(X;p) = n!/[x1!...xk!] * p1^(x1) p2^(x2)...pk^(xk)
-likelihood: L(p;X) ∝ p1^(x1) p2^(x2)...pk^(xk)
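Here is a direct transcription of the multinomial pdf. It uses exact integer arithmetic for the coefficient n!/(x1!...xk!). With k = 2 it reduces to the binomial.

```python
from math import factorial, prod


def multinomial_pmf(x: list[int], p: list[float]) -> float:
    """n!/(x1!...xk!) * p1^x1 * ... * pk^xk, with n = sum(x)."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)  # each partial quotient is an integer
    return coef * prod(pi ** xi for pi, xi in zip(p, x))


# one AA, two Aa, one aa among 4 offspring under 1:2:1 segregation
print(multinomial_pmf([1, 2, 1], [0.25, 0.5, 0.25]))  # 0.1875
```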
32
Segregation Analysis for Autosomal Recessive Disorders
- only individuals with DD have the disease
- carrier parents (Dd) are unaffected, so it is not possible to select Dd x Dd families on the basis of the disease status of the parents
- this leads to the problem of ascertainment
33
Problem of Ascertainment
- the usual procedure is to first select a random sample of affected individuals in the population (probands)
- their families are then studied for additional affected members (secondary cases)
- so Dd x Dd parents with no affected offspring will be missed
34
Ascertainment Probability
- define π as the probability that an affected individual in the population is identified as a proband (the ascertainment probability)
- assume π is constant for all affected individuals
- the probability that a family with r affected offspring is not ascertained is (1-π)^r
- the probability that the family is ascertained is 1 - (1-π)^r
35
Complete Ascertainment and Single Ascertainment
- when π=1, 1-(1-π)^r = 1 regardless of the number of affected offspring, so all families with affected offspring are ascertained -> complete ascertainment
- when π->0, the probability of ascertaining a family with r affected offspring becomes approximately 1-(1-rπ) = rπ
- so the probability of ascertainment is approximately proportional to the number of affected offspring
- since π is very small, almost all ascertained families will have only one proband -> single ascertainment
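Both limiting cases can be seen numerically. This sketch assumes, as the (1-π)^r formula does, that each affected offspring is ascertained independently with probability π.

```python
def prob_family_ascertained(r: int, pi: float) -> float:
    """1 - (1 - pi)^r: at least one of r affected offspring becomes a proband."""
    return 1 - (1 - pi) ** r


for r in (1, 2, 3):
    # complete ascertainment (pi = 1) vs. near-single ascertainment (pi small)
    print(r, prob_family_ascertained(r, 1.0), prob_family_ascertained(r, 0.001))
```

With π = 0.001 the second column is close to rπ, i.e. roughly proportional to the number of affected offspring.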
36
What are the statistical procedures designed to account for ascertainment?
- there are statistical procedures designed to deal with ascertainment under two conditions:
1) complete ascertainment
2) incomplete ascertainment
  - proband method
  - singles method
37
Dealing with Ascertainment | Complete Ascertainment Condition
- all families with affected offspring are assumed to be identified
- consider families of mating type Dd x Dd with s offspring
- let X be a RV for the number of affected offspring in such a family, 0≤X≤s
- then X follows a binomial distribution with parameters s and p
- for a rare recessive disease, the null hypothesis is p=1/4
- all ascertained families have at least one affected offspring, X>0
- so the probability of observing r affected offspring in a family is conditional on X>0, and is given by a truncated binomial distribution
38
Truncated Binomial Distribution
-the probability of observing r affected offspring in the family is conditional on X>0: P(X=r|X>0) = P(X=r, X>0) / P(X>0)
-for 0≤r≤s: P(X=r, X>0) = 0 for r=0 and P(X=r) for 1≤r≤s
-and P(X>0) = 1 - (1-p)^s
-hence, for 1≤r≤s the probability function is: P(X=r|X>0) = [sCr p^r (1-p)^(s-r)] / [1 - (1-p)^s]
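Here is a minimal implementation of the truncated binomial probability function. It returns zero outside 1 ≤ r ≤ s, so it can be summed over all r as a sanity check.

```python
from math import comb


def truncated_binomial_pmf(r: int, s: int, p: float) -> float:
    """P(X = r | X > 0) for X ~ Bin(s, p); zero outside 1 <= r <= s."""
    if not 1 <= r <= s:
        return 0.0
    return comb(s, r) * p**r * (1 - p) ** (s - r) / (1 - (1 - p) ** s)


# recessive null p = 1/4, sibships of size 2: P(1 affected | at least 1) = 6/7
print(truncated_binomial_pmf(1, 2, 0.25))
```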
39
Complete Ascertainment | Estimating Segregation Ratio
- two methods that arrive at the same estimate:
1) log likelihood
2) estimating equation
40
Complete Ascertainment | Estimating Segregation Ratio with Log Likelihood
-denoting ar as the number of families with r affected offspring, we have:
L(p) = ∏ P(X=r|X>0)^ar
l(p) = Σ ar log[P(X=r|X>0)]
-p^ is the value that maximises l(p), i.e. solves dl(p)/dp=0
-solved by numerical methods
41
Complete Ascertainment | Estimating Segregation Ratio with Estimating Equation
-under the truncated binomial, E(X|X>0) = sp / [1-(1-p)^s]
-the only unknown is the segregation ratio p
-equate E(X|X>0) with the observed mean number of affected offspring per family, r̄
-solved by numerical methods
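The sketch below solves the estimating equation sp/[1-(1-p)^s] = r̄ by bisection. This works because the truncated mean increases in p. It also includes the truncated log-likelihood, so you can check that both methods agree. The assumptions are:

- all families have the same size s
- the data are hypothetical counts a_r
- r̄ lies strictly between 1 and s

```python
from math import comb, expm1, log, log1p


def prob_at_least_one(s: int, p: float) -> float:
    """P(X > 0) = 1 - (1 - p)^s, computed stably for small p."""
    return -expm1(s * log1p(-p))


def truncated_mean(s: int, p: float) -> float:
    """E(X | X > 0) = s p / [1 - (1 - p)^s] for X ~ Bin(s, p)."""
    return s * p / prob_at_least_one(s, p)


def truncated_loglik(a: dict[int, int], s: int, p: float) -> float:
    """l(p) = sum over r of a_r log P(X = r | X > 0), families of equal size s."""
    return sum(
        a_r * (log(comb(s, r)) + r * log(p) + (s - r) * log1p(-p)
               - log(prob_at_least_one(s, p)))
        for r, a_r in a.items()
    )


def estimate_segregation_ratio(a: dict[int, int], s: int, tol: float = 1e-12) -> float:
    """Solve E(X | X > 0) = r_bar for p by bisection."""
    r_bar = sum(r * a_r for r, a_r in a.items()) / sum(a.values())
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if truncated_mean(s, mid) < r_bar:
            lo = mid  # mean increases with p, so the root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


# hypothetical: sibships of size 2; 6 families with 1 affected, 1 with 2
a = {1: 6, 2: 1}
p_hat = estimate_segregation_ratio(a, 2)  # close to 0.25
```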
42
Incomplete Ascertainment
-not all families with affected offspring will be ascertained (π<1)
-we want to estimate both the segregation ratio p and π
-a likelihood method can be devised that does not assume complete ascertainment
-the naive estimate is: p^ = # affected offspring / # total offspring
-for incomplete ascertainment these counts have to be adjusted: p^ = R/S = adjusted # affected offspring / adjusted # total offspring
-probability of ascertainment: π^ = B/R, where B = adjusted number of probands and R = adjusted number of affected offspring
-the proband method and the singles method are two ways of making this adjustment
43
Proband Method
-treat the siblings of probands as the effective observations: p^ = R/S, where R = total number of affected siblings and S = total number of siblings
-so probands are excluded unless they are themselves siblings of other probands
-more generally:
--in a family with r out of s offspring affected, each proband has r-1 affected siblings out of s-1 siblings
--if the family has b probands, it is counted once per proband, contributing b(r-1) affected siblings and b(s-1) siblings
--with data from n families, the segregation ratio is estimated by: p^ = [Σbi(ri-1)] / [Σbi(si-1)]
-the probability of ascertainment is estimated by: π^ = [Σbi(bi-1)] / [Σbi(ri-1)]
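The proband-method formulas translate directly into sums over families. Each family is given as a hypothetical (r, s, b) tuple of affected offspring, total offspring and probands. The example data are invented for illustration.

```python
def proband_method(families: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Proband-method estimates (p^, pi^) from (r, s, b) per family.

    Each proband contributes its r - 1 affected sibs out of s - 1 sibs,
    so a family with b probands is counted b times.
    """
    affected_sibs = sum(b * (r - 1) for r, s, b in families)
    all_sibs = sum(b * (s - 1) for r, s, b in families)
    proband_sibs = sum(b * (b - 1) for r, s, b in families)
    return affected_sibs / all_sibs, proband_sibs / affected_sibs


# hypothetical ascertained families: (affected, offspring, probands)
data = [(1, 3, 1), (2, 4, 1), (2, 2, 2), (3, 5, 1)]
p_hat, pi_hat = proband_method(data)  # 5/11 and 2/5
```

The π^ ratio is undefined if no proband has an affected sibling.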
44
Singles Method
-takes as effective observations ALL offspring except those who are the only proband in the family (singles)
-let d be the number of singles in a sample of n families, and let family i have ri affected out of si offspring and bi probands
-the segregation ratio is estimated as: p^ = [Σri - d] / [Σsi - d]
-the ascertainment probability is estimated as: π^ = [Σbi - d] / [Σri - d]
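Here is a sketch of the singles-method estimates using the same (r, s, b) family tuples. It assumes that a "single" is a family whose only affected child is the proband, i.e. r = 1. That reading of d is an assumption of this sketch.

```python
def singles_method(families: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Singles-method estimates (p^, pi^) from (r, s, b) per family."""
    # assumption: a "single" is a family with exactly one affected child
    d = sum(1 for r, s, b in families if r == 1)
    total_r = sum(r for r, s, b in families)
    total_s = sum(s for r, s, b in families)
    total_b = sum(b for r, s, b in families)
    p_hat = (total_r - d) / (total_s - d)
    pi_hat = (total_b - d) / (total_r - d)
    return p_hat, pi_hat


data = [(1, 3, 1), (2, 4, 1), (2, 2, 2), (3, 5, 1)]  # hypothetical
p_hat, pi_hat = singles_method(data)  # 7/13 and 4/7
```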
45
Standard Errors for Proband and Singles Method
- although the estimates of the segregation ratio are simple in the proband and singles methods, their standard errors are complicated
- however, Var(p^) can be calculated in both cases
46
Hypothesis Testing for Proband and Singles Method
-test statistic: Z = [p^ - p0] / SE(p^)
-Ho: p=1/4 (disease is recessive)
-under Ho, Z approximately follows a standard normal distribution
47
Genotype Frequencies
-for a locus with 3 genotypes AA, Aa and aa, the frequencies are: f(AA) = #AA/N, f(Aa) = #Aa/N, f(aa) = #aa/N, where N is the number of individuals
-the genotype frequencies always sum to 1
48
Allele Frequencies
-the number of copies of a particular allele divided by the total number of alleles in the sample
-e.g. for a locus with three genotypes AA, Aa, aa:
p = f(A) = [2nAA + nAa]/2N
q = f(a) = [2naa + nAa]/2N
-and f(A)+f(a)=1 always
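Genotype and allele frequencies follow directly from genotype counts. Each individual carries two alleles, so the allele denominator is 2N. The counts in the example are hypothetical.

```python
def genotype_and_allele_freqs(n_AA: int, n_Aa: int, n_aa: int):
    """Genotype frequencies and allele frequencies p = f(A), q = f(a)."""
    N = n_AA + n_Aa + n_aa
    genotype = {"AA": n_AA / N, "Aa": n_Aa / N, "aa": n_aa / N}
    p = (2 * n_AA + n_Aa) / (2 * N)  # AA carries two A alleles, Aa carries one
    q = (2 * n_aa + n_Aa) / (2 * N)
    return genotype, p, q


genotype, p, q = genotype_and_allele_freqs(36, 48, 16)  # p = 0.6, q = 0.4
```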
49
Hardy-Weinberg Law
-if a population is large, randomly mating and not affected by mutation, migration or natural selection, then:
i) the allele frequencies of the population do not change
AND
ii) the genotype frequencies stabilise (will not change) after one generation in the proportions:
f(AA) = p², f(Aa) = 2pq, f(aa) = q²
-where p=f(A) and q=f(a)
50
How can you show that equilibrium has been reached?
- two approaches:
1) show how the genotype frequencies stabilise
2) show how the allele frequencies stabilise
51
How Genotype Frequencies Stabilise
-consider a biallelic locus with alleles A and a
-three genotypes AA, Aa and aa with frequencies P, 2Q and R such that P+2Q+R=1
-Hardy's result: under random mating, the equilibrium frequencies satisfy Q² = PR
-we can show that this holds for the first generation of offspring: Q1² = P1R1
-the relative frequencies of the genotypes then remain unchanged after a second generation of random mating
-it is in this sense that this ratio of frequencies represents an equilibrium
52
How the Allele Frequencies Stabilise
-denote f(A) = p and f(a) = q = 1-p
-under random mating, the genotype frequencies of the offspring are: f(AA) = p², f(Aa) = 2pq, f(aa) = q²
-this holds regardless of the genotype frequencies in the parental generation
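Both arguments can be checked with a short simulation of random mating. It starts from arbitrary parental frequencies (P, 2Q, R). One generation yields p², 2pq, q², which satisfy Q1² = P1R1. A second generation leaves them unchanged. The function name is illustrative.

```python
def next_generation(P: float, two_Q: float, R: float) -> tuple[float, float, float]:
    """Offspring genotype frequencies (AA, Aa, aa) after one round of random mating."""
    p = P + two_Q / 2  # f(A) among the parents
    q = R + two_Q / 2  # f(a) among the parents
    return p * p, 2 * p * q, q * q


g0 = (0.5, 0.0, 0.5)       # not in HW proportions: Q^2 = 0 but PR = 0.25
g1 = next_generation(*g0)  # (0.25, 0.5, 0.25)
g2 = next_generation(*g1)  # unchanged
```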
53
Implications of the Hardy-Weinberg Law
1) a population cannot evolve under the HWE assumptions, since evolution requires a change in the allele frequencies of the population
- reproduction alone will not bring about evolution; other processes such as random mutation are required as well
2) when a population is in HWE, the genotype frequencies are determined by the allele frequencies
3) a single generation of random mating produces the equilibrium frequencies p², 2pq, q²
- the fact that we observe the genotypes in HW proportions does not prove that the population is free from natural selection, mutation and migration
- it means only that these forces have not acted since the last random mating took place
54
Testing for HW Proportion
-we consider two tests: the chi-square test and the likelihood ratio test
-hypotheses:
Ho: locus is in HWE
H1: locus is NOT in HWE
-under Ho, both test statistics approximately follow χ²df, where df (the number of degrees of freedom) is the number of possible genotypes minus the number of alleles, e.g. 3 - 2 = 1 for a biallelic locus
55
Testing for HW Proportion | Chi-square
-test statistic: Z² = Σ (O-E)²/E
-where E is the expected count under Ho (HWE)
-p-value: p = P_{χ²df}(Z²>z²)
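Here is a sketch of the HWE chi-square test for a biallelic locus. The allele frequency is estimated from the sample. The expected counts are N·p², 2N·pq and N·q². The statistic is referred to χ² with 3 - 2 = 1 df via erfc(√(x/2)). It assumes the locus is polymorphic (0 < p < 1); otherwise an expected count would be zero.

```python
from math import erfc, sqrt


def hwe_chisq_test(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Pearson chi-square test of HWE at a biallelic locus (1 df)."""
    N = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * N)  # estimated f(A)
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [N * p * p, 2 * N * p * q, N * q * q]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, erfc(sqrt(stat / 2))  # upper tail of chi-square, 1 df


stat, p_value = hwe_chisq_test(50, 0, 50)  # hypothetical: no heterozygotes
```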
56
Testing for HW Proportion | Likelihood Ratio
-test statistic: Δ = 2log(L1/Lo) = 2[logL1 - logLo]
-where logL1 is the log likelihood given the observed data and logLo is that at the hypothesised values (expected under Ho)
-p-value: p = P_{χ²df}(Δ>𝛿)
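For multinomial genotype counts, Δ = 2[logL1 - logLo] simplifies to 2 Σ O log(O/E). Here L1 uses the observed genotype proportions and Lo uses HWE proportions with the estimated allele frequency. The sketch treats 0·log 0 as 0 and, like the chi-square sketch, assumes 0 < p < 1.

```python
from math import erfc, log, sqrt


def hwe_lrt(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Likelihood-ratio test of HWE at a biallelic locus (1 df)."""
    N = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * N)
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [N * p * p, 2 * N * p * q, N * q * q]
    # 2 * sum O log(O / E); empty genotype classes contribute 0
    delta = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    delta = max(delta, 0.0)  # guard against tiny negative rounding error
    return delta, erfc(sqrt(delta / 2))
```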
57
Estimation of Allele Frequencies for Non-Codominant Locus
-the phenotypes of individuals who are heterozygous and homozygous for the normal allele cannot be distinguished
-assuming HWE, use the MLE p^ = (n1/[n1+n2])^(1/2), the estimated frequency of the recessive allele
-where n1 and n2 are the numbers of affected and unaffected individuals respectively
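Under HWE, affected individuals of a fully penetrant recessive trait are the homozygotes. Their frequency is the square of the recessive allele frequency, so the estimate is a square root. The counts below are hypothetical.

```python
from math import sqrt


def recessive_allele_frequency(n_affected: int, n_unaffected: int) -> float:
    """MLE of the recessive allele frequency, assuming HWE: f(affected) = p^2."""
    return sqrt(n_affected / (n_affected + n_unaffected))


p_hat = recessive_allele_frequency(4, 9996)  # about 0.02
```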
58
What does it mean if we find a locus not in HWE?
- one or more of the HWE assumptions are violated
- genotyping errors
- a mixture of subpopulations