3. Statistical Genetics Flashcards

1
Q

Locus

Definition

A
  • a location in the genome, usually of a genetic variant of interest
  • this can be a single base or a whole gene
2
Q

What can the risk of developing a disease result from?

A
  • a major gene
  • polygenes

3
Q

Major Gene

Definition

A
  • a single locus that increases the risk of developing a disease
  • such diseases are called Mendelian diseases
4
Q

Polygenes

Definition

A
  • the cumulative effect of a large number of genetic loci each having a small effect that, when taken together, increase the risk of a disease
  • such diseases are called complex diseases
  • the majority of diseases are complex
5
Q

Penetrance Function

Definition

A
  • the set of probability distribution functions for the phenotype given the genotype
  • denoted Pr(Y|G) or P(Y|G) where Y is the phenotype and G is the genotype
  • assuming a binary trait, Y=1 indicates having the disease and Y=0 indicates unaffected
6
Q

Phenocopy

Definition

A
  • an individual whose disease is due to environmental rather than genetic factors
  • usually we assume no phenocopies, i.e. all risk of disease comes only from genetic factors
7
Q

Fully Penetrant

Definition

A
  • if Pr(Y|DD)=1 (or Pr(Y|dD)=1 in the case of a dominant allele), we say that the genotype is fully penetrant
  • i.e. the genotype is sufficient to cause the disease
8
Q

Dominant

Definition

A

Pr(Y|dD) = Pr(Y|DD)

  • a single copy of the mutant allele is sufficient to produce an increase in risk
  • allele D is dominant over allele d
9
Q

Recessive

Definition

A

Pr(Y|dD) = Pr(Y|dd)

  • two copies of the mutant allele are necessary to produce an increase in risk
  • equivalently, one copy of the normal allele is sufficient to provide protection
  • allele D is recessive to d
10
Q

Codominant

Definition

A
  • Pr(Y|dd) ≠ Pr(Y|dD) ≠ Pr(Y|DD)
  • all three genotypes have different effects on disease risk
  • in most cases, the heterozygotes have an effect that is intermediate between that of the two homozygotes
11
Q

Additive / Dose-Dependent

Definition

A
  • a special case of codominance where Pr(Y|dD) is midway between Pr(Y|dd) and Pr(Y|DD)
  • i.e. the increase in disease risk due to DD (relative to dd) is twice as great as that due to dD
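The dominant, recessive, codominant and additive definitions above can be checked mechanically. A minimal Python sketch, assuming penetrances Pr(Y=1|G) are supplied as exact fractions for dd, dD and DD; the function name `classify_penetrance` is hypothetical:

```python
from fractions import Fraction

def classify_penetrance(pen_dd, pen_dD, pen_DD):
    """Classify a penetrance function Pr(Y=1|G) by mode of inheritance.

    Dominant if Pr(Y|dD) = Pr(Y|DD), recessive if Pr(Y|dD) = Pr(Y|dd),
    otherwise codominant; additive is the codominant special case where
    Pr(Y|dD) lies midway between the two homozygotes.
    """
    if pen_dd == pen_dD == pen_DD:
        return "no effect"
    if pen_dD == pen_DD:
        return "dominant"
    if pen_dD == pen_dd:
        return "recessive"
    if pen_dD == (pen_dd + pen_DD) / 2:
        return "additive"
    return "codominant"

# Fractions keep the equality checks exact (no floating-point rounding)
print(classify_penetrance(Fraction(0), Fraction(1), Fraction(1)))             # dominant
print(classify_penetrance(Fraction(0), Fraction(0), Fraction(1)))             # recessive
print(classify_penetrance(Fraction(1, 10), Fraction(3, 10), Fraction(1, 2)))  # additive
```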
12
Q

Mendelian Inheritance

Mendel's Laws

A
  • the description of the inheritance of genes can be summarised in two principles:
    1) segregation of alleles
    2) independent assortment
  • as well as two concepts: independent expression and random mating
13
Q

Mendelian Inheritance

First Principle

A
  • each individual carries two copies of each gene, one inherited from each parent
  • each of a parent's two alleles at a given gene is transmitted randomly, with equal probability (1/2)
14
Q

Mendelian Inheritance

Second Principle

A
  • alleles of different genes are transmitted independently
  • we now know that this does not apply when loci are located near each other on the same chromosome (linkage)

15
Q

Mendelian Inheritance

Third Concept

A
  • the expression of genes is independent of which parent they came from
  • heterozygotes have the same penetrance whether the D allele came from the father and the d allele from the mother or vice versa
  • this is generally accepted, although there are some exceptions (e.g. genomic imprinting)
16
Q

Mendelian Inheritance

Fourth Concept

A
  • random mating
  • the probability of two individuals mating is independent of their genotypes at the locus of interest

17
Q

Transmission Probabilities

A

-the probability distribution of the alleles (gametes) transmitted from a single parent to their offspring
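A minimal sketch of transmission probabilities and the offspring genotype distribution they imply for a mating, assuming genotypes are written as two-letter strings such as "Dd" (a convention introduced here, not from the source) and that the two parents transmit independently:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def transmission_probs(genotype):
    """Probability of each allele a parent transmits: each of its two alleles with prob 1/2."""
    probs = Counter()
    for allele in genotype:
        probs[allele] += Fraction(1, 2)
    return dict(probs)

def offspring_genotypes(mother, father):
    """Offspring genotype distribution, assuming independent transmission from each parent."""
    dist = Counter()
    for (a, pa), (b, pb) in product(transmission_probs(mother).items(),
                                    transmission_probs(father).items()):
        # Sort alleles so "dD" and "Dd" are counted as the same genotype
        genotype = "".join(sorted(a + b))
        dist[genotype] += pa * pb
    return dict(dist)

print(offspring_genotypes("Dd", "dd"))  # Dd and dd, each with probability 1/2
print(offspring_genotypes("Dd", "Dd"))  # DD 1/4, Dd 1/2, dd 1/4
```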

18
Q

Autosomal Dominant Inheritance

A

Pr(Y|dd) = 0
Pr(Y|dD) = Pr(Y|DD) = 1
-i.e. full penetrance and no phenocopies
-most dominant disease alleles are rare
-so the homozygous genotype DD is extremely rare and can be ignored
-this means that a case must be Dd
-since the disease is rare, it is unlikely for two affected individuals to mate, so the only mating of interest is Dd x dd

19
Q

Autosomal Recessive Inheritance

A
  • assume that only the DD genotype is affected, full penetrance and no phenocopies
  • an affected individual must have inherited one defective copy (D) from each parent
  • most recessive disease alleles are rare, so the parents of an affected individual are unlikely to be homozygotes and are most likely to be Dd x Dd (unaffected carriers)
20
Q

X-Linked Inheritance

A
  • males are XY, females are XX
  • in females, only one of the two X chromosomes is expressed in each cell; the other is inactivated early in development
  • the mother always transmits an X chromosome, the father can transmit either an X or a Y thus determining the sex of the offspring
21
Q

Segregation Analysis

Definition

A
  • the process of fitting genetic models (dominant, recessive, codominant) to data on phenotypes
  • aims to test hypotheses about whether one or more major genes and/or polygenes can account for the observed pattern of familial aggregation
22
Q

Segregation Analysis for Autosomal Dominant Diseases

A
  • since the disease is rare, most matings between an affected and an unaffected person will be Dd x dd
  • each offspring of this mating has a 1/2 chance of being affected
  • we can test whether the segregation ratio (the proportion of affected offspring) in the observed data is 1/2
  • several tests: binomial test, standard normal test, Pearson chi-square test and likelihood-ratio chi-square test
23
Q

Autosomal Dominant Disease Tests

Binomial Test

A

-suppose a random sample of matings between affected and unaffected individuals is obtained and that out of the total n offspring, r are affected
-null hypothesis: the segregation ratio is 1/2
-regard each offspring as a trial and each affected offspring as a success; then Ho: p=1/2 and
P(X=x) = nCx p^x [1-p]^(n-x)
-for a two-tailed test, the p-value associated with r affected individuals out of n offspring is given by a sum of binomial probabilities:
p-value = (1/2)^(n-1) Σ nCx
-where the sum is from x=0 to x=c
-where c=r if r≤n/2 and c=n-r if r>n/2 (when r=n/2 the sum exceeds 1, so the p-value is taken as 1)
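A minimal Python sketch of this exact two-sided p-value, using `math.comb` from the standard library; the function name `binomial_test_half` is hypothetical:

```python
from math import comb

def binomial_test_half(r, n):
    """Two-sided exact binomial test of Ho: p = 1/2 for r affected out of n offspring.

    p-value = (1/2)^(n-1) * sum_{x=0}^{c} nCx with c = r if r <= n/2, else n - r,
    capped at 1 (the sum exceeds 1 when r = n/2).
    """
    c = r if r <= n / 2 else n - r
    tail = sum(comb(n, x) for x in range(c + 1))
    return min(1.0, tail / 2 ** (n - 1))

print(binomial_test_half(2, 10))  # 56 / 512 = 0.109375
```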

24
Q

Autosomal Dominant Disease Tests

Standard Normal Test

A

-a good approximation to the binomial test (for np>5 and n large enough)
-if X~Bi(n,p), the statistic
Z = [X - np] / √[np(1-p)]
-approximately follows a standard normal distribution
-null hypothesis: p=1/2
-if r of n offspring are affected, the test statistic under Ho becomes:
z = [r* - n/2] / √[n/4]
-since under Ho, μ=np=n/2 and σ²=np(1-p)=n(1/2)(1/2)=n/4
-and r* is r corrected for continuity (r*=r+1/2 if r<n/2, r*=r-1/2 if r>n/2)
-for a two-sided test, the p-value is twice P(Z>|z|)
-reject Ho if:
1) p-value < 0.05
OR
2) |z| > c_{97.5%}, assuming α=0.05
-where c_{97.5%} is the 97.5 percentile of the standard normal distribution
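A minimal Python sketch of the normal test, assuming the continuity correction moves r half a unit towards n/2 and using `math.erfc` for the normal tail, since 2·P(Z>|z|) = erfc(|z|/√2):

```python
from math import erfc, sqrt

def normal_test_half(r, n):
    """Normal approximation to the test of Ho: p = 1/2, with continuity correction.

    Returns (z, two-sided p-value).
    """
    if r < n / 2:
        r_star = r + 0.5
    elif r > n / 2:
        r_star = r - 0.5
    else:
        r_star = r
    z = (r_star - n / 2) / sqrt(n / 4)
    # Two-sided p-value 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

z, p = normal_test_half(40, 100)
print(round(z, 3), round(p, 4))  # -1.9 and about 0.0574
```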

25
Q

Autosomal Dominant Disease Tests

Pearson Chi-Square Test

A

-to test null hypothesis Ho: p=1/2, can make reference to the chi-square distribution
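-the square of a standard normal variable follows a chi-square distribution with 1 degree of freedom, χ1²:
Z² = [X-np]² / np(1-p)
-summing over the two categories (affected and unaffected), this can be rewritten as the Pearson χ² statistic: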
Z² = Σ (O-E)²/E
-for r of n offspring affected, can use statistic
z² = [r* - n/2]² / [n/4]
-p-value is computed as P_{χ1²} (Z²>z²)
-reject Ho if:
1) p-value < 0.05
OR
2) z² > x_{95%}, assuming α=0.05
-where x_{95%} is 95 percentile of χ1² distribution
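A minimal Python sketch showing that the two-category Pearson sum equals the squared z statistic. The continuity correction is deliberately omitted here so that the identity is exact; the 1-df tail uses P(χ1² > x) = P(|Z| > √x) = erfc(√(x/2)):

```python
from math import erfc, sqrt

def pearson_chi2_half(r, n):
    """Pearson chi-square statistic and p-value for Ho: p = 1/2 (no continuity correction).

    Sums (O - E)^2 / E over the two categories, affected and unaffected.
    """
    observed = (r, n - r)
    expected = (n / 2, n / 2)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

r, n = 40, 100
stat, p = pearson_chi2_half(r, n)
# The two-category sum equals the squared z statistic (r - n/2)^2 / (n/4)
print(stat, (r - n / 2) ** 2 / (n / 4))  # 4.0 4.0
print(round(p, 4))                       # 0.0455
```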

26
Q

Likelihood

Definition

A

-likelihood is the probability density / distribution function viewed as a function of the parameter rather than of the data
-for a binomial, the probability function is:
P(r;p) = nCr p^r [1-p]^(n-r)
-the likelihood function given we observe r out of n offspring affected is given by:
L(p) = nCr p^r [1-p]^(n-r)

27
Q

Autosomal Dominant Disease Tests

Likelihood-Ratio Test

A

-the likelihood ratio statistic is defined as:
Δ = 2 log(L1/Lo) = 2log(L1) - 2log(Lo)
-where log(L1) is the log likelihood evaluated at the MLE p^=r/n and log(Lo) is that at the hypothesised value p=1/2
-under Ho, Δ is approximately χ1² distributed
-p-value is computed as P_{χ1²}(Δ>𝛿), where 𝛿 is the observed value of Δ
-reject Ho if:
1) p-value < 0.05
OR
2) 𝛿 > x_{95%}, assuming α=0.05
-where x_{95%} is 95 percentile of χ1² distribution
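A minimal Python sketch of the likelihood-ratio test for Ho: p=1/2, assuming the binomial log likelihood without its constant term (which cancels in Δ) and the convention 0·log(0)=0:

```python
from math import erfc, log, sqrt

def binomial_loglik(p, r, n):
    """Binomial log-likelihood r*log(p) + (n-r)*log(1-p), dropping the constant log nCr.

    Skips zero counts (the convention 0*log(0) = 0), so r = 0 or r = n is handled.
    """
    total = 0.0
    if r > 0:
        total += r * log(p)
    if n - r > 0:
        total += (n - r) * log(1 - p)
    return total

def lrt_half(r, n):
    """Likelihood-ratio test of Ho: p = 1/2 for r affected out of n; returns (delta, p-value)."""
    p_hat = r / n  # MLE of the segregation ratio
    delta = 2 * binomial_loglik(p_hat, r, n) - 2 * binomial_loglik(0.5, r, n)
    delta = max(delta, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of chi-square with 1 df: P(chi2_1 > d) = erfc(sqrt(d / 2))
    return delta, erfc(sqrt(delta / 2))

delta, p = lrt_half(40, 100)
print(round(delta, 3), round(p, 4))  # about 4.027 and 0.0448
```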

28
Q

Likelihood

Maximum Likelihood Estimator

A

-in observing r out of n offspring affected, the log-likelihood function (dropping the constant log nCr) is:
l(p) = rlog(p) + (n-r)log(1-p)
-take the first derivative of l(p) with respect to p and set it to zero to obtain the MLE:
dl/dp = r/p - (n-r)/(1-p) = 0
p^ = r/n

29
Q

Maximum Likelihood Estimator

Standard Error

A

-for estimation of SE(p^), the observed value of r can be regarded as a realisation of random variable X:
X~Bin(n,p)
-mean and variance of X are np and np(1-p)
-so the sampling mean and sampling variance of p^ = X/n are np/n=p and np(1-p)/n² = p(1-p)/n
-so
SE(p^) = √[p(1-p)/n]
-since the true value of p is unknown, plug in the MLE p^=r/n
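A minimal Python sketch of the MLE and its plug-in standard error; the function name `mle_and_se` is hypothetical:

```python
from math import sqrt

def mle_and_se(r, n):
    """MLE p^ = r/n and its estimated standard error sqrt(p^(1-p^)/n)."""
    p_hat = r / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # plug-in: the unknown true p is replaced by p^
    return p_hat, se

p_hat, se = mle_and_se(40, 100)
print(p_hat, round(se, 4))  # 0.4 0.049
```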

30
Q

Segregation Analysis for Co-dominant Loci

A
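  • the binomial and standard normal tests are not directly applicable, as there are three categories (one per genotype) rather than two
  • the Pearson and likelihood-ratio chi-square tests can be extended and applied, using the multinomial distribution in place of the binomial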
31
Q

Multinomial Distribution

A

-a generalisation of the binomial distribution
-in a binomial experiment we have two outcomes, success and failure, with probabilities p and 1-p
-in a multinomial experiment, we have k outcomes with probabilities pi such that Σpi=1
-let random variables Xi indicate the number of times outcome i was observed over n trials
-probability function:
P(X;p) = n!/[x1!…xk!] * p1^(x1)p2^(x2)…pk^(xk)
-likelihood:
L(p;X) ∝ p1^(x1)p2^(x2)…pk^(xk)
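The multinomial likelihood is what lets the Pearson and likelihood-ratio tests extend to a codominant locus with three offspring genotype classes. A minimal sketch, assuming a Dd x Dd mating with null segregation ratios 1/4, 1/2, 1/4 and hypothetical counts. With no parameters estimated from the data, df = 3 - 1 = 2, for which the chi-square upper tail is exactly exp(-x/2):

```python
from math import exp, log

def codominant_segregation_tests(counts, probs):
    """Pearson and likelihood-ratio chi-square tests for three offspring categories.

    counts: observed numbers of offspring in each genotype class (e.g. DD, Dd, dd).
    probs: segregation ratios under Ho (e.g. 1/4, 1/2, 1/4 for a Dd x Dd mating).
    """
    n = sum(counts)
    expected = [n * p for p in probs]
    pearson = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Delta = 2[log L1 - log L0] = 2 * sum O*log(O/E) for the multinomial (0*log 0 = 0)
    delta = 2 * sum(o * log(o / e) for o, e in zip(counts, expected) if o > 0)
    return {
        "pearson": pearson,
        "pearson_p": exp(-pearson / 2),  # chi-square(2) upper tail
        "lrt": delta,
        "lrt_p": exp(-delta / 2),
    }

result = codominant_segregation_tests([20, 60, 20], [0.25, 0.5, 0.25])
print(result["pearson"], round(result["pearson_p"], 4))  # 4.0 0.1353
```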

32
Q

Segregation Analysis for Autosomal Recessive Disorders

A
  • only individuals with DD have the disease
  • it is not possible to select Dd x Dd families on the basis of the parents' disease status, since carrier (Dd) parents are unaffected
  • this leads to the problem of ascertainment
33
Q

Problem of Ascertainment

A
  • the usual procedure is first to select a random sample of affected individuals in the population (probands)
  • their families are subsequently studied for additional affected members (secondary cases)
  • so Dd x Dd families with no affected offspring will be missed, and the proportion of affected offspring among ascertained families is biased upwards
34
Q

Ascertainment Probability

A
  • define π as the probability that an affected individual in the population is identified as a proband, the ascertainment probability
  • assume π to be constant for all affected individuals, with each affected individual identified independently
  • probability a family with r affected offspring is not ascertained is (1-π)^r
  • probability the family is ascertained is 1 - (1-π)^r
35
Q

Complete Ascertainment and Single Ascertainment

A
  • when π=1, 1-(1-π)^r is 1 regardless of number of affected offspring so all families with affected offspring are ascertained -> complete ascertainment
  • when π->0, the probability of ascertaining a family with r affected offspring becomes approximately 1-(1-rπ) = rπ, since (1-π)^r ≈ 1-rπ for small π
  • the probability of ascertainment is approximately proportional to the number of affected offspring
  • since π is very small, almost all ascertained families will have only one proband -> single ascertainment
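A minimal Python sketch of the ascertainment probability 1-(1-π)^r, illustrating both limits above; the function name `prob_ascertained` is hypothetical:

```python
def prob_ascertained(pi, r):
    """Probability that a family with r affected offspring is ascertained.

    Assumes each affected individual is independently identified as a proband
    with the same probability pi.
    """
    return 1 - (1 - pi) ** r

# Complete ascertainment: pi = 1 gives probability 1 for every r >= 1
print([prob_ascertained(1.0, r) for r in (1, 2, 3)])  # [1.0, 1.0, 1.0]

# Single ascertainment: for small pi the probability is close to r * pi
for r in (1, 2, 3):
    print(r, prob_ascertained(0.001, r), r * 0.001)
```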
36
Q

What are the statistical procedures designed to account for ascertainment?

A
  • there are statistical procedures designed to deal with ascertainment in two conditions:
    1) complete ascertainment
    2) incomplete ascertainment, using either
      - the proband method
      - the singles method
37
Q

Dealing with Ascertainment

Complete Ascertainment Condition

A
  • all families with affected offspring are assumed to be identified
  • consider families of mating type Dd x Dd with s offspring
  • let X be a RV for the number of affected offspring in such a family, 0≤X≤s
  • then X would follow a binomial distribution with parameters s and p
  • for rare recessive disease, null hypothesis: p=1/4
  • all families ascertained have at least one affected offspring, X>0
  • the probability of observing r affected offspring in a family must therefore be conditioned on X>0, which gives a truncated binomial distribution
38
Q

Truncated Binomial Distribution

A

-the probability of observing r affected offspring in the family is conditional on the probability of X>0:
P(X=r | X>0) = P(X=r, X>0) / P(X>0)
-for 0≤r≤s:
P(X=r, X>0) = 0 for r=0, and P(X=r) for 1≤r≤s
-and P(X>0) = 1 - (1-p)^s
-hence, for 1≤r≤s the conditional probability function of X is:
P(X=r | X>0) = [sCr p^r (1-p)^(s-r)] / [1 - (1-p)^s]
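A minimal Python sketch of the truncated binomial probability function; the function name `truncated_binomial_pmf` and the example values are illustrative:

```python
from math import comb

def truncated_binomial_pmf(r, s, p):
    """P(X = r | X > 0) for X ~ Bin(s, p); zero outside 1 <= r <= s."""
    if not 1 <= r <= s:
        return 0.0
    return comb(s, r) * p**r * (1 - p) ** (s - r) / (1 - (1 - p) ** s)

# Dd x Dd families with s = 2 offspring and p = 1/4
probs = [truncated_binomial_pmf(r, 2, 0.25) for r in (1, 2)]
print(probs)       # 6/7 and 1/7, as floats
print(sum(probs))  # 1.0, up to rounding
```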

39
Q

Complete Ascertainment

Estimating Segregation Ratio

A
  • two methods that arrive at the same estimate:
    1) log likelihood
    2) estimating equation
40
Q

Complete Ascertainment

Estimating Segregation Ratio with Log Likelihood

A

-for families with s offspring, denoting ar as the number of families with r affected offspring (1≤r≤s), we have:
L(p) = ∏ P(X=r | X>0)^ar
l(p) = Σ ar log[P(X=r | X>0)]
-p^ is the value that maximises l(p), i.e. the solution of dl(p)/dp=0
-solved by numerical methods
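A minimal sketch of one numerical method: golden-section search on the truncated binomial log likelihood. It assumes the log likelihood is unimodal on (0, 1), and the counts are hypothetical; the function names are illustrative:

```python
import math

def truncated_binomial_loglik(p, counts, s):
    """Log-likelihood sum_r a_r * log P(X=r | X>0) for families of size s.

    counts: dict mapping r (1..s) -> a_r, the number of families with r affected.
    """
    denom = 1.0 - (1.0 - p) ** s
    total = 0.0
    for r, a_r in counts.items():
        prob = math.comb(s, r) * p**r * (1.0 - p) ** (s - r) / denom
        total += a_r * math.log(prob)
    return total

def mle_segregation_ratio(counts, s, lo=1e-9, hi=1 - 1e-9, tol=1e-10):
    """Maximise the log-likelihood by golden-section search (assumes unimodality)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if truncated_binomial_loglik(c, counts, s) > truncated_binomial_loglik(d, counts, s):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c, d = b - g * (b - a), a + g * (b - a)
    return (a + b) / 2

# Hypothetical data: s = 2, six families with 1 affected and one with 2 affected
print(round(mle_segregation_ratio({1: 6, 2: 1}, 2), 6))  # 0.25
```

For s = 2 the maximiser has the closed form p^ = 2a2/(a1 + 2a2), which is 0.25 for these counts, so the search can be checked against it.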

41
Q

Complete Ascertainment

Estimating Segregation Ratio with Estimating Equation

A

-note that
E(X | X>0) = sp / [1-(1-p)^s]
-the only unknown is the segregation ratio p
-equate E(X | X>0) with the mean number of affected offspring per family, r̄
-solved by numerical methods
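A minimal sketch of solving the estimating equation by bisection. It uses the truncated binomial mean E(X | X>0) = E(X)/P(X>0) = sp/[1-(1-p)^s], which increases from 1 (as p->0) to s (at p=1), so a unique root exists when 1 < r̄ < s; the data are hypothetical:

```python
def truncated_mean(p, s):
    """Mean of the truncated binomial: E(X | X > 0) = s*p / (1 - (1 - p)^s)."""
    return s * p / (1 - (1 - p) ** s)

def solve_segregation_ratio(mean_affected, s, tol=1e-12):
    """Solve truncated_mean(p, s) = mean_affected for p by bisection."""
    if not 1 < mean_affected < s:
        raise ValueError("mean number affected must lie strictly between 1 and s")
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if truncated_mean(mid, s) < mean_affected:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical data: s = 2, six families with 1 affected and one with 2 affected
r_bar = (6 * 1 + 1 * 2) / 7
print(round(solve_segregation_ratio(r_bar, 2), 6))  # 0.25
```

For these data the estimate agrees with the log-likelihood maximiser, as expected when the two methods arrive at the same estimate.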

42
Q

Incomplete Ascertainment

A

-not all families with affected offspring will be ascertained (π<1; in the limit π->0 this is single ascertainment)
-want to estimate both the segregation ratio p and π
-need a method (e.g. likelihood-based) that does not assume complete ascertainment
-the basic (unadjusted) estimate is:
p^ = # affected offspring / # total offspring
-for incomplete ascertainment we have to adjust these numbers
p^ = R/S = adjusted #affected offspring / adjusted #total offspring
-probability of ascertainment:
π^ = B/R
B = adjusted number of probands
R = adjusted number of affected offspring
-proband method and singles method are two methods of adjustment

43
Q

Proband Method

A

-treat the siblings of probands as effective observations
p^ = R/S
R = total number of affected siblings
S = total number of siblings
-so probands are excluded unless they are themselves siblings of other probands
-for a larger data set:
–in a family with r out of s affected offspring, each proband contributes its r-1 affected siblings out of s-1 siblings
–if the family has b probands, each proband contributes r-1 and s-1, so the family is counted b times
–suppose we have data from n families, segregation ratio is estimated by:
p^ = [Σbi(ri-1)] / [Σbi(si-1)]
-probability of ascertainment is estimated by:
π^ = [Σbi(bi-1)] / [Σbi(ri-1)]
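A minimal Python sketch of the proband-method estimates, assuming each family is given as a hypothetical (s, r, b) tuple of offspring, affected offspring and probands:

```python
def proband_method(families):
    """Proband-method estimates of the segregation ratio p and ascertainment probability pi.

    families: list of (s, r, b) tuples giving, for each family, the number of
    offspring s, the number affected r, and the number of probands b.
    """
    num_p = sum(b * (r - 1) for s, r, b in families)
    den_p = sum(b * (s - 1) for s, r, b in families)
    num_pi = sum(b * (b - 1) for s, r, b in families)
    p_hat = num_p / den_p
    pi_hat = num_pi / num_p  # the denominator sum b(r-1) is shared with p_hat
    return p_hat, pi_hat

# Hypothetical sample of three ascertained families
families = [(4, 2, 1), (3, 1, 1), (5, 2, 2)]
print(proband_method(families))  # (3/13, 2/3) as floats
```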

44
Q

Singles Method

A

-takes as effective observations, ALL offspring except those who are the only proband in the family (singles)
-let d be the number of singles in a sample of n families, and let ri be the number of affected offspring out of si total offspring in family i
-the segregation ratio is estimated as:
p^ = [Σri - d] / [Σsi - d]
-the ascertainment probability is estimated as:
π^ = [Σbi - d] / [Σri - d]
-where bi is the number of probands in family i
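A minimal Python sketch of the singles-method formulas, assuming hypothetical (s, r, b) family tuples and taking d as an input rather than deriving it. In the example only the first family has a lone proband who is also its only affected member, so d = 1:

```python
def singles_method(families, d):
    """Singles-method estimates of p and pi from the formulas above.

    families: list of (s, r, b) tuples (offspring, affected, probands) per family.
    d: number of singles in the sample, supplied by the caller.
    """
    total_r = sum(r for s, r, b in families)
    total_s = sum(s for s, r, b in families)
    total_b = sum(b for s, r, b in families)
    p_hat = (total_r - d) / (total_s - d)
    pi_hat = (total_b - d) / (total_r - d)
    return p_hat, pi_hat

families = [(3, 1, 1), (4, 2, 2), (5, 3, 2)]
print(singles_method(families, d=1))  # (5/11, 4/5) as floats
```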

45
Q

Standard Errors for Proband and Singles Method

A
  • although the estimates of segregation ratio are simple in the proband and singles methods, their standard errors are complicated
  • but Var(p^) can be calculated for both cases
46
Q

Hypothesis Testing for Proband and Singles Method

A

Z = [p^ - po] / SE(p^)

  • Ho: p=1/4 (disease is recessive)
  • compare Z with the standard normal distribution
47
Q

Genotype Frequencies

A

-for a locus with three genotypes AA, Aa and aa in a sample of N individuals, the genotype frequencies are:
f(AA) = #AA/N, f(Aa) = #Aa/N, f(aa) = #aa/N
-the genotype frequencies always sum to 1

48
Q

Allele Frequencies

A

-the number of copies of a particular allele divided by the total number of alleles in the sample (2N for N diploid individuals)
e.g. for a locus with three genotypes AA, Aa, aa with counts nAA, nAa, naa:
p = f(A) = [2nAA+nAa]/2N
q = f(a) = [2naa+nAa]/2N
-and f(A)+f(a)=1 always
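A minimal Python sketch computing genotype and allele frequencies from hypothetical genotype counts:

```python
def genotype_and_allele_freqs(n_AA, n_Aa, n_aa):
    """Genotype frequencies and allele frequencies from genotype counts."""
    N = n_AA + n_Aa + n_aa
    genotype_freqs = {"AA": n_AA / N, "Aa": n_Aa / N, "aa": n_aa / N}
    p = (2 * n_AA + n_Aa) / (2 * N)  # f(A): each AA carries two A alleles, each Aa one
    q = (2 * n_aa + n_Aa) / (2 * N)  # f(a)
    return genotype_freqs, p, q

# Hypothetical sample of N = 100 individuals
g, p, q = genotype_and_allele_freqs(30, 50, 20)
print(g, p, q)  # {'AA': 0.3, 'Aa': 0.5, 'aa': 0.2} 0.55 0.45
```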

49
Q

Hardy-Weinberg Law

A

-if a population is large, randomly mating and not affected by mutation, migration or natural selection then:
i) the allelic frequencies of a population do not change
AND
ii) the genotype frequencies stabilise (will not change) after one generation in the proportions:
f(AA) = p²
f(Aa) = 2pq
f(aa) = q²
-where p=f(A) and q=f(a)

50
Q

How can you show that equilibrium has been reached?

A
  • two approaches
    1) how the genotype frequencies stabilise
    2) how the allele frequencies stabilise
51
Q

How Genotype Frequencies Stabilise

A

-consider a biallelic locus with alleles A and a
-three genotypes AA, Aa and aa with frequencies P, 2Q and R such that P+2Q+R=1
-Hardy’s result: if individuals in the population mated at random, these frequencies would satisfy
Q² = PR
-can show that this is true for the first generation of offspring: Q1²=P1R1, because with allele frequency p=P+Q the offspring frequencies are P1=p², 2Q1=2pq and R1=q²
-the relative frequencies of the genotypes will remain unchanged after a second generation of random mating
-it is in this sense that this ratio of frequencies represents an equilibrium
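A minimal Python sketch of Hardy's result, using the card's notation P, 2Q, R with exact fractions and a hypothetical starting population that is not in equilibrium (no heterozygotes):

```python
from fractions import Fraction

def next_generation(P, twoQ, R):
    """Genotype frequencies (AA, Aa, aa) after one generation of random mating.

    The parental allele frequency is p = P + Q; offspring frequencies are p^2, 2pq, q^2.
    """
    p = P + twoQ / 2
    q = 1 - p
    return p * p, 2 * p * q, q * q

# Arbitrary starting frequencies that are NOT in equilibrium (Q^2 != PR)
gen0 = (Fraction(1, 2), Fraction(0), Fraction(1, 2))
gen1 = next_generation(*gen0)
gen2 = next_generation(*gen1)

for label, (P, twoQ, R) in [("gen0", gen0), ("gen1", gen1), ("gen2", gen2)]:
    Q = twoQ / 2
    # gen0 prints False; gen1 and gen2 print True and are identical
    print(label, P, twoQ, R, "Q^2 == PR:", Q * Q == P * R)
```

The allele frequency p = 1/2 is the same in every generation, while the genotype frequencies reach p², 2pq, q² after one round of random mating and then stay there.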

52
Q

How the Allele Frequencies Stabilise

A

-denote f(A) = p and f(a) = q = 1-p
-under random mating, the frequencies of genotypes of the offspring are:
f(AA) = p², f(Aa) = 2pq, f(aa) = q²
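-regardless of the genotype frequencies in the parental generation; the allele frequency of A in the offspring is p² + pq = p(p+q) = p, so allele frequencies do not change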

53
Q

Implications of the Hardy-Weinberg Law

A

1) a population cannot evolve under HWE assumptions, since evolution requires a change in the allele frequencies of the population; reproduction alone will not bring about evolution, and other processes such as mutation are required as well
2) when a population is in HWE, the genotype frequencies are determined by allele frequencies
3) a single generation of random mating produces the equilibrium frequencies p², 2pq, q²
-the fact that we observe the genotypes in HW proportions does not prove that the population is free from natural selection, mutation and migration
-it means only that these forces have not acted since the last random mating took place

54
Q

Testing for HW Proportion

A

-we consider two tests: the chi-square test and the likelihood ratio test
-hypotheses:
Ho : locus is in HWE
H1 : locus is NOT in HWE
-under Ho, both test statistics approximately follow χ²df, where df (the number of degrees of freedom) = number of genotypes - number of alleles; for a biallelic locus, df = 3 - 2 = 1

55
Q

Testing for HW Proportion

Chi-square

A
-test statistic:
Z² = Σ (O-E)²/E
-where E is the expected under Ho (HWE)
-p-value:
p = P_{χ²df}(Z²>z²)
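A minimal Python sketch of the HWE chi-square test at a biallelic locus, assuming both alleles are observed (so no expected count is zero) and using the 1-df tail erfc(√(x/2)); the counts are hypothetical:

```python
from math import erfc, sqrt

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test for Hardy-Weinberg proportions at a biallelic locus (df = 1)."""
    N = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * N)  # allele frequency estimated from the sample
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (N * p * p, N * 2 * p * q, N * q * q)  # HW expectations under Ho
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, erfc(sqrt(stat / 2))

# Hypothetical sample with no heterozygotes: a strong departure from HWE
stat, p_value = hwe_chi_square(50, 0, 50)
print(stat, p_value < 1e-10)  # 100.0 True
```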
56
Q

Testing for HW Proportion

Likelihood Ratio

A
-test statistic:
Δ = 2log(L1/Lo) = 2[logL1 - logLo]
-where logL1 is the log likelihood given the observed genotype frequencies and logLo is that at the genotype frequencies expected under Ho (HW proportions)
-p-value:
p = P_{χ²df}(Δ>𝛿)
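A minimal Python sketch of the HWE likelihood-ratio test. It uses the multinomial identity logL1 - logLo = Σ O·log(O/E), with the convention 0·log(0)=0, and hypothetical counts:

```python
from math import erfc, log, sqrt

def hwe_lrt(n_AA, n_Aa, n_aa):
    """Likelihood-ratio test for HWE at a biallelic locus (df = 1).

    L1 uses the observed genotype proportions; Lo uses HW proportions p^2, 2pq, q^2.
    """
    N = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * N)
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (N * p * p, N * 2 * p * q, N * q * q)
    delta = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    return delta, erfc(sqrt(max(delta, 0.0) / 2))  # chi-square(1) upper tail

delta, p_value = hwe_lrt(50, 0, 50)
print(round(delta, 2))  # 200*log(2), about 138.63
```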
57
Q

Estimation of Allele Frequencies for Non-Codominant Locus

A

-the phenotypes of heterozygotes and of homozygotes for the normal allele cannot be distinguished (e.g. a recessive disease)
-assuming HWE, the proportion affected is p², so the MLE of the disease allele frequency is:
p^ = (n1/[n1+n2])^(1/2)
-where n1 and n2 are the numbers of affected and unaffected individuals respectively
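A minimal Python sketch of this estimator, assuming HWE; the function name `recessive_allele_freq` and the counts are hypothetical:

```python
from math import sqrt

def recessive_allele_freq(n_affected, n_unaffected):
    """MLE of the recessive allele frequency when only affected/unaffected is observed.

    Assumes HWE, so Pr(affected) = p^2; the MLE of p^2 is the observed
    proportion affected, and by invariance p^ is its square root.
    """
    return sqrt(n_affected / (n_affected + n_unaffected))

print(recessive_allele_freq(4, 96))  # 0.2
```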

58
Q

What does it mean if we find a locus not in HWE?

A
  • one or more of the HW assumptions are violated (e.g. non-random mating, selection, mutation or migration)
  • genotyping errors
  • a mixture of subpopulations (population stratification)