Genome and SNP analysis Flashcards
What is a locus?
A specific position or region of DNA on a chromosome
What is a gene?
A section of DNA that codes for a functional product, usually a protein (some genes code only for RNA molecules)
What is a polymorphism/genetic variant?
A position in the genome that varies in the population (single nucleotide polymorphism => SNP)
What is an allele?
One of the alternative versions of a polymorphism (e.g. A or G at a SNP)
What is a genotype?
The combination of alleles a person carries at a particular polymorphism, one from each copy of the chromosome (e.g. AG)
What is a haplotype?
The combination of alleles at several polymorphisms along the same copy of a chromosome, which tend to be inherited together
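A toy illustration of how alleles, genotypes and haplotypes relate, assuming made-up alleles at three hypothetical SNPs. Coding a genotype as the number of copies (0, 1 or 2) of a chosen allele, called the dosage, is a common convention in association analyses; which allele counts as "alternative" here is an arbitrary assumption.

```python
# One person's two copies of a chromosome (phased haplotypes) at three
# hypothetical SNPs. Alleles are plain letters for illustration.
maternal = ["A", "C", "T"]      # haplotype 1 (hypothetical)
paternal = ["G", "C", "T"]      # haplotype 2 (hypothetical)
alt_alleles = ["G", "T", "T"]   # hypothetical "alternative" allele at each SNP

def genotypes(h1, h2):
    """Unordered allele pair at each SNP: the genotype loses the phase."""
    return ["".join(sorted(pair)) for pair in zip(h1, h2)]

def dosages(h1, h2, alt):
    """Number of copies (0, 1 or 2) of the alternative allele at each SNP."""
    return [(a == x) + (b == x) for a, b, x in zip(h1, h2, alt)]

print(genotypes(maternal, paternal))              # ['AG', 'CC', 'TT']
print(dosages(maternal, paternal, alt_alleles))   # [1, 0, 2]
```

The genotypes alone do not say which alleles sit together on the same chromosome; that phase information is exactly what the haplotypes add.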
What is a trait (disease or quantitative)?
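A phenotype: either a disease status (disease trait) or a measurable quantity such as height (quantitative trait)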
How genetically similar are humans in %?
Human beings are 99.9 percent identical in their genetic makeup. Differences in the remaining 0.1 percent underlie our individual differences, including our susceptibility to certain diseases
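A quick back-of-the-envelope check of what 0.1 percent means in absolute terms, assuming a haploid human genome of roughly 3.1 billion base pairs (an approximate figure, not taken from these notes):

```python
GENOME_SIZE_BP = 3.1e9      # assumption: approximate length of the human genome
DIFFERENT_FRACTION = 0.001  # the 0.1 percent stated above

differing_bp = GENOME_SIZE_BP * DIFFERENT_FRACTION
print(f"~{differing_bp / 1e6:.1f} million differing positions")
```

So 0.1 percent still corresponds to differences at millions of positions between two people.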
What do we look for in genomic studies?
We look for changes in the genome that are associated with changes in the phenotype
What are candidate gene studies good and bad for?
They are more accurate but give less information overall, since only the pre-selected candidate genes are tested
They are hypothesis-driven
What are genome-wide studies good and bad for?
They are less accurate but give a broader picture, since the whole genome is tested
They are hypothesis-free and hypothesis generating
What are DNA probes?
A short DNA sequence (around 40 bp) that binds to our DNA if it is complementary to it
What are Chips?
A set of millions of DNA probes used to test which of them are complementary to our DNA
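A simplified sketch of how a chip picks out matching sequences, assuming perfect base pairing only (real hybridization also depends on mismatches, temperature and probe chemistry) and hypothetical 8 bp probes, much shorter than real ones. A probe binds a target region when the probe's reverse complement appears in the target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the strand that pairs with `seq`, read in the same 5'->3' sense."""
    return seq.translate(COMPLEMENT)[::-1]

def binding_probes(chip: dict[str, str], target: str) -> list[str]:
    """IDs of probes that are exactly complementary to some part of the target."""
    return [pid for pid, probe in chip.items()
            if reverse_complement(probe) in target]

# Toy chip with three hypothetical probes
chip = {
    "probe_1": "TTGCAGGA",
    "probe_2": "AAAACCCC",
    "probe_3": "CATGCATG",
}
target_dna = "GGATCCTGCAATTC"  # hypothetical fragment of "our DNA"

print(binding_probes(chip, target_dna))  # ['probe_1']
```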
What is genotyping?
Genotyping means looking only at specific, pre-selected SNPs and determining which alleles are present (this requires probes/chips designed for those precise sequences)
What is sequencing?
Sequencing means reading every single base pair of a genome or of part of a genome (much more expensive than genotyping)
What is linkage disequilibrium and why is it important to take it into consideration?
In population genetics, linkage disequilibrium is the non-random association of alleles at different loci in a given population
This happens because a new, random mutation arises on one chromosome in a given population and is passed down together with the neighbouring alleles on that chromosome, so each population ends up with its own haplotypes
This is very important to take into consideration when working with different populations, because it can be misleading: we may think that a specific SNP causes a disease when it is actually just part of a haplotype common in the population tested
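Two standard ways to quantify linkage disequilibrium between two biallelic loci (alleles A/a and B/b) are D and r². The sketch below computes them from hypothetical counts of the four two-locus haplotypes; with real genotype data the haplotypes are usually not observed directly, so their frequencies have to be estimated first.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) from counts of the four two-locus haplotypes.

    Assumes both loci are polymorphic (otherwise r2 divides by zero).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    D = p_AB - p_A * p_B          # 0 when alleles combine at random
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2

# Hypothetical counts: A tends to travel with B, a with b
D, r2 = linkage_disequilibrium(40, 10, 10, 40)
print(round(D, 3), round(r2, 3))                 # 0.15 0.36

# Hypothetical counts with random association: no LD
print(linkage_disequilibrium(25, 25, 25, 25))    # (0.0, 0.0)
```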
What is imputation?
It is a statistical inference (deduction) of unobserved genotypes
Aim: to test genetic variants that were not genotyped directly for association with a trait of interest
Reference panels:
- the choice of reference panel is population specific (it should match the population being studied)
- the bigger the better, although bigger panels need more computing power
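A deliberately simplified sketch of the idea behind imputation: fill the untyped positions of a study haplotype by copying them from the reference haplotype that best matches it at the typed positions. The panel and alleles are hypothetical, and real imputation software uses probabilistic (hidden Markov model) approaches that combine many reference haplotypes rather than picking a single best match.

```python
def impute(study_hap, reference_panel):
    """Fill None positions in study_hap using the best-matching reference haplotype."""
    typed = [i for i, allele in enumerate(study_hap) if allele is not None]

    def matches(ref):
        # Number of typed positions where this reference agrees with the study
        return sum(ref[i] == study_hap[i] for i in typed)

    best = max(reference_panel, key=matches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(study_hap)]

# Hypothetical reference panel: each string is one haplotype over five SNPs
reference_panel = ["ACGTA", "ACGCG", "TTGCG"]
study = ["A", None, "G", "C", None]  # None = untyped on our chip

print(impute(study, reference_panel))  # ['A', 'C', 'G', 'C', 'G']
```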
What do we need when seeking to identify a genetic variant?
- at least a thousand people, ideally around 20’000
- enough chips to genotype everyone, plus a database to store the data
- a specific phenotype to look for (obviously)
What is OR (odds ratio)?
How much higher the odds of disease are for people who carry a risk allele compared with people who do not
OR=2 means that having a particular risk allele DOUBLES my odds of having the disease (roughly doubling the risk when the disease is rare)
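A sketch of how an allelic odds ratio can be computed from a 2x2 table of allele counts in cases and controls (each person contributes two alleles). The counts are hypothetical, chosen so that OR = 2:

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Allelic odds ratio from allele counts in cases and controls."""
    odds_cases = case_risk / case_other           # odds of the risk allele in cases
    odds_controls = control_risk / control_other  # odds of the risk allele in controls
    return odds_cases / odds_controls

# Hypothetical: 500 cases (1000 alleles) and 500 controls (1000 alleles)
print(odds_ratio(400, 600, 250, 750))  # 2.0
```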
What is P-value?
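The probability of obtaining a result at least this extreme by chance alone, assuming there is no real association (the smaller it is, the less plausible it is that the result is a false positive)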
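One common way to get a p-value for a single SNP in a case-control study is a chi-square test on a 2x2 table of allele counts. This sketch assumes a 1-degree-of-freedom test without continuity correction and uses the identity that the upper tail of a chi-square distribution with 1 df equals erfc(sqrt(χ²/2)); the counts are hypothetical.

```python
import math

def allelic_chi2_pvalue(a, b, c, d):
    """Chi-square test (1 df, no continuity correction) on a 2x2 allele table.

    a = risk alleles in cases,    b = other alleles in cases
    c = risk alleles in controls, d = other alleles in controls
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

chi2, p = allelic_chi2_pvalue(400, 600, 250, 750)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```

Identical allele counts in cases and controls give χ² = 0 and p = 1, i.e. no evidence of association.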
What does Beta show?
The size and direction of an association (effect size), e.g. the average change in a quantitative trait per copy of the allele
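For a quantitative trait, beta is typically the slope of a regression of the trait on allele dosage. A minimal least-squares sketch with hypothetical heights (for a binary trait analysed with logistic regression, beta is instead the natural log of the odds ratio, so OR = 2 corresponds to beta ≈ 0.69):

```python
def beta_estimate(dosages, trait):
    """Ordinary least-squares slope of the trait on allele dosage (0, 1 or 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    var_x = sum((x - mean_x) ** 2 for x in dosages)
    return cov_xy / var_x

# Hypothetical heights (cm) for six people carrying 0, 1 or 2 copies of an allele
dosages = [0, 0, 1, 1, 2, 2]
heights = [170.0, 172.0, 171.0, 173.0, 172.0, 174.0]

print(beta_estimate(dosages, heights))  # 1.0 -> about +1 cm per extra copy
```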
What is Bonferroni correction?
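It is a correction for multiple testing: when many tests are run (e.g. hundreds of thousands of SNPs), the significance threshold is divided by the number of tests to limit false positives
Genome wide significant = 5*10^-8 (0.05 divided by about one million independent tests)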
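A sketch of the Bonferroni correction, assuming the usual rationale for the 5*10^-8 threshold: an overall significance level of 0.05 shared across roughly one million independent tests. The SNP names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 1_000_000)  # about 5e-8

# Hypothetical p-values from an association scan
p_values = {"snp_1": 3e-9, "snp_2": 1e-5, "snp_3": 4.9e-8}
significant = [snp for snp, p in p_values.items() if p < threshold]
print(significant)  # ['snp_1', 'snp_3']
```

Note that snp_2 (p = 10^-5) would look highly significant at the usual 0.05 level but does not pass the genome-wide threshold.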
Does having found an associated variant mean that you found the causal variant?
No! Because the associated variant may simply be in linkage disequilibrium with the true causal variant
Is the closest gene necessarily the causal gene?
No! But usually yes…
What does genome wide significance mean?
It means that a p-value lower than 5*10^-8 is required to call an association significant. Such a strict threshold is needed because a very large number of SNPs is tested at once; otherwise there would be a lot of false positives