Lecture 3 Flashcards

1
Q

What is a single nucleotide polymorphism (SNP)?

A

A position in the genome where part of the population carries a different nucleotide than the rest. SNPs therefore aid in identifying genetic risk factors for common diseases.

2
Q

In genome-wide association studies (GWAS), what is the most commonly used study design? Explain.

A

Case-control: two large groups of individuals are studied, one healthy and one with a certain disease. All individuals are genotyped for the majority of known SNP locations using platforms such as Illumina and Affymetrix. For each SNP we count four groups: diseased with the mutation (DS), diseased without it (DN), healthy with it (HS), and healthy without it (HN). We then calculate the odds ratio (OR) of having the disease with the minor versus the major variant: OR = (DS/HS) / (DN/HN).
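The odds ratio calculation above can be sketched in a few lines of Python. This is only an illustration: the function name, the argument names, and the counts are made up for the example and do not come from the lecture.

```python
def odds_ratio(ds, dn, hs, hn):
    """Odds ratio for a 2x2 case-control table.

    ds: diseased with the minor variant, dn: diseased without it,
    hs: healthy with the minor variant,  hn: healthy without it.
    """
    if 0 in (dn, hs, hn):
        raise ValueError("dn, hs and hn must be non-zero to form the ratio")
    # Odds of disease with the variant divided by odds of disease without it.
    return (ds / hs) / (dn / hn)


# Hypothetical counts, chosen only to illustrate the formula.
example_or = odds_ratio(ds=30, dn=70, hs=10, hn=90)
print(f"OR = {example_or:.3f}")
```

An OR above 1 suggests the minor variant is more common among diseased individuals, an OR below 1 the opposite.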

3
Q

Illumina, Affymetrix?

A
4
Q

If the numbers of major and minor variants in a sample are close, does this represent the population?

A

No, the dataset is imbalanced.

5
Q

What does the odds ratio tell us?

A

It only informs us about a potential association between the SNP and the disease. It doesn't tell us whether this association is due to chance or is statistically significant.

6
Q

How can we test the association between a SNP and the disease for statistical significance?

A

Contingency table tests such as Fisher's exact test or Pearson's chi-squared test, which give a p-value.

7
Q

What is the null hypothesis when performing Fisher's exact test?

A

That the number of individuals falling into both classes is due to chance alone.

8
Q

What is the more correct formulation of the null hypothesis?

A

The random variable counting the individuals that show both A1 and B1 is distributed according to a hypergeometric distribution.
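A minimal sketch of how this hypergeometric null hypothesis turns into a p-value, using only the Python standard library. The function names and the two-sided rule (sum all tables that are at most as probable as the observed one) are assumptions for illustration; the lecture may define the tail differently.

```python
from math import comb


def hypergeom_pmf(k, row1, col1, total):
    """P(X = k): k individuals are in both A1 and B1, with all margins fixed.

    row1: number of individuals in A1, col1: number in B1, total: sample size.
    """
    return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    k_min = max(0, row1 + col1 - total)
    k_max = min(row1, col1)
    p_observed = hypergeom_pmf(a, row1, col1, total)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = hypergeom_pmf(k, row1, col1, total)
        # Small tolerance so that ties with the observed table are included.
        if p_k <= p_observed * (1 + 1e-9):
            p_value += p_k
    return p_value


# Hypothetical 2x2 table, not data from the lecture.
print(fisher_exact_two_sided(3, 1, 1, 3))
```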

9
Q

what is the test statistic for Pearson’s chi squared test?

A

$\chi^2 = \sum_{i,j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
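As a rough sketch, the statistic can be computed from a table of observed counts, with each expected count taken as (row total × column total) / grand total. The function names and the counts are hypothetical. The p-value helper assumes a 2x2 table (one degree of freedom) and applies no continuity correction.

```python
import math


def chi_squared_statistic(table):
    """Pearson's chi-squared statistic for a contingency table (nested lists)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_one_df(stat):
    """Upper tail of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: rows = diseased / healthy, columns = with / without the variant.
stat = chi_squared_statistic([[30, 70], [10, 90]])
print(stat, p_value_one_df(stat))
```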

10
Q

What is typically the result of a genome-wide association study? How is it interpreted?

A

A plot (Manhattan plot) with the SNPs ordered by chromosomal position on the x-axis and the -log p-value on the y-axis. SNPs with extremely low p-values could be associated with the disease.

11
Q

Why did GWAS receive criticism?

A

1- missing quality control steps (e.g. checks for possible biases) 2- multiple testing 3- only single SNPs are tested for association, not combinations of genes

12
Q

What do phylogenies show us?

A

They depict the genetic relatedness of sequences and how individuals are related on an evolutionary scale.

13
Q

How does the Hamming distance measure the difference between sequences?

A

distance = count of sites that vary

14
Q

How does the p-distance measure the difference between sequences?

A

number of differing sites / sequence length
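Both distances can be sketched for two aligned sequences of equal length. The function names and example sequences are made up for illustration.

```python
def hamming_distance(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def p_distance(seq_a, seq_b):
    """Proportion of differing sites: Hamming distance / sequence length."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return hamming_distance(seq_a, seq_b) / len(seq_a)


# Hypothetical sequences differing at two of eight sites.
print(hamming_distance("ACGTACGT", "ACGTTCGA"))
print(p_distance("ACGTACGT", "ACGTTCGA"))
```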

15
Q

What is the fundamental problem with phylogenetics?

A

We observe the sequences but not their evolutionary history, so we need to take all possible trajectories into account.

16
Q

What is the definition of sequence distance?

A

The distance between two sequences is the expected number of nucleotide substitutions per site (this also includes all non-visible evolutionary steps between the two sequences).

17
Q

What are two requirements of a model for estimating the distance between sequences?

A

1- stochastic process modelling the substitution through time
2- substitution rates

18
Q

Does a Markov chain have memory or not?

A

No, it's memoryless: the probability of jumping to another state depends only on the current state.

19
Q

If tau is discrete, the Markov chain is –, otherwise –.

A

discrete-time, continuous-time

20
Q

When is a Markov chain time-homogeneous?

A

If the transition probabilities on the state space don't change over time, the chain is called time-homogeneous.

21
Q

Write down the substitution rate matrix

22
Q

What do rate and probability each measure?

A

rate measures events per time unit and probability measures the chance that a random event occurs.

23
Q

rate describes –, and probability describes –.

A

average, an exact event

24
Q

What does it mean for an event to occur with rate alpha? write down the CDF and PDF.

A

Slide 71. It means that the event occurs after an exponentially distributed waiting time with parameter alpha.
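For reference, these are the standard CDF and PDF of an exponential waiting time T with rate alpha. They are written from the general definition, not copied from slide 71, so the notation there may differ.

```latex
F(t) = P(T \le t) = 1 - e^{-\alpha t}, \qquad
f(t) = \frac{dF(t)}{dt} = \alpha e^{-\alpha t}, \qquad t \ge 0
```

The mean waiting time is 1/alpha, so a higher rate means shorter waits on average.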

25
Q

P(t) = ?

A

P(t) = e^{Qt}. The substitution rate matrix Q defines the transition probabilities, which account for every possible series of states visited within time t.
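To make the matrix exponential concrete, here is a small sketch that approximates e^{Qt} with a truncated power series, sum over n of (Qt)^n / n!, in plain Python. The rate matrix (all off-diagonal rates equal, as in JC69), the rate value, and the helper names are assumptions for illustration. For real work a library routine such as `scipy.linalg.expm` would be the more robust choice.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30):
    """Approximate e^{Qt} with the first `terms` terms of sum_n (Qt)^n / n!."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # n = 0 term
    term = [row[:] for row in result]
    for k in range(1, terms):
        # After this step, term holds (Qt)^k / k!.
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


lam = 0.1  # hypothetical substitution rate, not a value from the lecture
q = [[-3 * lam if i == j else lam for j in range(4)] for i in range(4)]
p = mat_exp(q, t=1.0)
print([round(x, 6) for x in p[0]])
```

Each row of the result sums to 1, as expected for transition probabilities. The truncated series is only a reasonable approximation when the entries of Qt are small.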

26
Q

Why are Markov chains a great model for nucleotide substitutions?

A

1- memorylessness: a nucleotide substitution happens independently of the substitution history at that site. 2- the substitution rate matrix defines the transition probabilities. 3- the transition probabilities take every possible substitution path into account.

27
Q

In JC69, how are the substitution rates distributed?

A

they’re all the same

28
Q

In K80, how are the substitution rates distributed?

A

Transitions (A<->G, T<->C) happen at rate alpha; transversions (A<->T, A<->C, G<->T, G<->C) happen at rate beta.
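A possible way to write the K80 rate matrix in code, as a sketch: a substitution is a transition if both nucleotides are purines (A, G) or both are pyrimidines (T, C), and a transversion otherwise. The function names and the parameter values are illustrative assumptions.

```python
NUCLEOTIDES = "TCAG"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True if x -> y stays within the purines or within the pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))


def k80_rate_matrix(alpha, beta):
    """Rate matrix as nested dicts: q[x][y] is the rate of x -> y."""
    q = {}
    for x in NUCLEOTIDES:
        row = {y: (alpha if is_transition(x, y) else beta)
               for y in NUCLEOTIDES if y != x}
        # Diagonal entry makes each row sum to zero.
        row[x] = -sum(row.values())
        q[x] = row
    return q


# Hypothetical rates, chosen only for the example.
q_k80 = k80_rate_matrix(alpha=2.0, beta=0.5)
print(q_k80["A"])
```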

29
Q

In TN93, how are the substitution rates distributed?

A

Transitions between T and C happen at rate alpha_1 * equilibrium frequency of the target nucleotide.
Transitions between A and G happen at rate alpha_2 * equilibrium frequency of the target nucleotide.
Transversions happen at rate beta * equilibrium frequency of the target nucleotide.
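The TN93 rates can be sketched the same way. The example also checks the detailed-balance condition pi_x * q_xy = pi_y * q_yx, which is one way to see that a model is time-reversible. The function name, parameter values, and equilibrium frequencies are hypothetical.

```python
def tn93_rate_matrix(alpha1, alpha2, beta, pi):
    """TN93 rates as nested dicts; pi maps T, C, A, G to equilibrium frequencies."""
    q = {}
    for x in "TCAG":
        row = {}
        for y in "TCAG":
            if y == x:
                continue
            if {x, y} == {"T", "C"}:
                rate = alpha1          # pyrimidine transition
            elif {x, y} == {"A", "G"}:
                rate = alpha2          # purine transition
            else:
                rate = beta            # transversion
            # Each rate is scaled by the frequency of the target nucleotide.
            row[y] = rate * pi[y]
        row[x] = -sum(row.values())    # rows sum to zero
        q[x] = row
    return q


# Hypothetical frequencies and rates, for illustration only.
pi = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
q_tn93 = tn93_rate_matrix(alpha1=2.0, alpha2=1.5, beta=0.5, pi=pi)

reversible = all(
    abs(pi[x] * q_tn93[x][y] - pi[y] * q_tn93[y][x]) < 1e-12
    for x in "TCAG" for y in "TCAG" if x != y
)
print(reversible)
```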

30
Q

What is the model in which alpha_1 = alpha_2?

31
Q

What is the GTR model? Is it time-reversible? Show how. Name some pros and cons of it.

A

GTR is short for generalised time-reversible model. Slide 87

It is quite flexible and time-reversible; however, it isn't completely general.

32
Q

What is the most general substitution model? Name some pros and cons.

A

UNREST: each substitution has a different rate. Pros: it's the most general case, and all other models are special cases of it. Cons: mathematically very complicated and unwieldy to use, and not time-reversible.

33
Q

How many parameters does each model have? JC69, K80, HKY, TN93,GTR,UNREST

A

JC69: 1, K80: 2, HKY: 2+3*, TN93: 3+3*, GTR: 6+3*, UNREST: 12 (* = equilibrium frequencies)

34
Q

A region of the human genome evolves according to JC69 at the rate 2.2/3 * 10^-9 substitutions per site per year. Starting with a T, what is the probability that after t = 10^6 years we observe a C? And that it is still a T?

A

See the last slide of lecture 3.
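A hedged sketch of this calculation, using the standard JC69 closed form: P(same) = 1/4 + 3/4 * e^{-4*rate*t} and P(specific other nucleotide) = 1/4 - 1/4 * e^{-4*rate*t}. It assumes that the given rate is the rate of change to each specific other nucleotide. If the slide defines the rate differently, the numbers change. The function name is made up for the example.

```python
import math


def jc69_probabilities(rate, t):
    """Return (P_same, P_diff) under JC69 after time t.

    rate: rate of change to each specific other nucleotide (assumption).
    P_diff is the probability of ending in one particular other nucleotide.
    """
    decay = math.exp(-4.0 * rate * t)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


rate = 2.2 / 3 * 1e-9   # substitutions per site per year, from the question
t = 1e6                 # years
p_tt, p_tc = jc69_probabilities(rate, t)
print(f"P(T -> C) = {p_tc:.6f}")
print(f"P(T -> T) = {p_tt:.6f}")
```

Under this assumption, P(T -> C) comes out at roughly 0.0007 and P(T -> T) at roughly 0.998.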

35
Q

In a GWAS, why can you not reject your null hypothesis if the p-value is less than alpha?

A

Due to multiple testing. We can reject it only if the p-value is less than alpha/n, where n is the number of tests (Bonferroni correction).
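The alpha/n rule can be illustrated with a short sketch. The SNP identifiers and p-values below are invented for the example, and the function names are not from the lecture.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return the SNPs whose p-value is below alpha / number of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < threshold]


# Hypothetical p-values for four tested SNPs.
p_values = {"snp_a": 1e-9, "snp_b": 0.02, "snp_c": 0.04, "snp_d": 0.2}
print(bonferroni_threshold(0.05, len(p_values)))
print(significant_snps(p_values))
```

Here snp_b and snp_c fall below alpha = 0.05 but not below alpha/n, so only snp_a remains significant after correction.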

36
Q

Why is the Markov chain model a good model for sequence evolution?

A

Because it accounts for all possible evolutionary paths, with any number of intermediate steps, from one nucleotide to any of the other 3 (the n-step terms of e^{Qt} are weighted by 1/n!).

37
Q

Why isn't it advisable to reconstruct a phylogeny based on the Hamming distance?

A

Because it doesn't account for the sequence length or for all the possible evolutionary paths in between the two sequences.
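One common correction, sketched here under the JC69 model, converts the observed proportion of differing sites p into an estimated number of substitutions per site: d = -3/4 * ln(1 - 4p/3). The formula is the standard JC69 distance. The function name and example value are assumptions for illustration.

```python
import math


def jc69_distance(p):
    """JC69-corrected distance from the proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical observed proportion of differing sites.
p_observed = 0.3
print(jc69_distance(p_observed))
```

The corrected distance is larger than p because it also counts substitutions that are no longer visible, such as repeated changes at the same site.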