Genomic Analysis/GWAS Flashcards
___% of the genome is genic (exons and introns)
40%
___% of the genome encodes proteins
1%
___% of an individual's genome accounts for genetic diversity between people
1%
99% shared
______ is one of two or more alternative forms of a DNA sequence
allele
_______ is a rare change to the DNA sequence that severely disrupts function and typically leads to disease
Occurs in ≪1% of the population
mutation
_______ – a common change to the DNA sequence in which there are two or more different alleles. The minor allele occurs in >1% of the population
polymorphism
Polymorphisms occur roughly every ____ to _________ bp (~10-15 million polymorphisms in the genome)
300 to 1000
________ is a single nucleotide variant (SNV) that occurs in >1% of the population
SNP
_____ and ________ are the two major types of common genetic polymorphisms
SNPs
indels (< 50 bp)
___________ are short DNA sequences repeated back to back and are a type of indel
Short tandem repeats (STR)
________ _____________ includes deletions, duplications, insertions, inversions, and translocations
structural variation
Copy number variations (CNVs) are a subset of structural variations that change the copy number (loss or gain) of a DNA fragment >______ bp (typically >___ kb)
> 50 bp
typically > 1 kb (1,000 bp)
_______________ affect entire chromosomes (e.g., Trisomy 21)
aneuploidies
SNPs occur every _____ to _________ bp
100 to 300
There are ______ SNPs in the human genome
10 million
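As a rough consistency check on the SNP density cards, dividing genome length by the number of SNPs gives the average spacing between them. This is a minimal sketch that assumes a haploid genome length of ~3.1 billion bp (not stated on these cards) and evenly spread SNPs; real SNP density varies along the genome.

```python
# Approximate haploid human genome length in bp (assumption, ~3.1 Gb)
GENOME_BP = 3.1e9


def mean_spacing_bp(n_variants: float, genome_bp: float = GENOME_BP) -> float:
    """Average distance in bp between variants if they were spread evenly."""
    return genome_bp / n_variants


# ~10 million SNPs in the human genome
print(round(mean_spacing_bp(10e6)))
```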
____________ = a unique and stable identifier for every SNP. It links the alleles/genotypes to a chromosomal location (which is updated with each genome build)
rsID
_____% of the genome is noncoding
80%
___________ disease = Mendelian disease
Monogenic
___________ - the range of disease presentation among individuals with the same genotype (e.g., mild vs. severe)
Expressivity
Heritability varies by ________ and _____________
population
environment
________: a physical location in the genome. Typically used to refer to a region of the genome associated with your trait of interest.
Locus/loci
___________ uses family pedigrees with multiple affected individuals, because genetic variants that lie close together are linked (inherited together)
linkage analysis
________ studies test for the statistical relationship between genotype and phenotype
association
______________ analysis is best for variants with modest to high effect size
linkage
Two genetic loci are ______ if they are transmitted together from parent to offspring more often than expected under independent assortment (50%)
linked
Two genetic loci are in linkage _________ if this holds true at the population level (alleles at the two loci occur together more often than expected by chance)
disequilibrium
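Linkage disequilibrium between two biallelic loci is commonly summarized by D (observed haplotype frequency minus the frequency expected if the alleles were independent) and r². The sketch below computes both from hypothetical allele and haplotype frequencies; the function name `ld_stats` and the input values are illustrative only.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: population frequencies of alleles A and B
    """
    # D = 0 when alleles combine at random (linkage equilibrium)
    d = p_ab - p_a * p_b
    # r^2 rescales D by the allele frequencies so it lies between 0 and 1
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical: A and B each at 50%, so chance alone predicts AB at 0.25,
# but AB is observed on 40% of haplotypes -> the loci are in LD
d, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)
print(d, r2)
```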
______________ analysis – identifies the chromosomal location of a disease gene by looking for genetic markers that co-segregate with the disease phenotype
Simply put: Do affected individuals have a common haplotype seen more often than unaffected individuals?
We use genome-wide genetic markers (e.g., SNPs, microsatellites [STRs]) to tag loci
linkage
Linkage studies report likelihood of linkage in terms of ______ _______ __________ scores
logarithm of the odds (LOD)
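For the simplest case (phase-known meioses), the LOD score compares the likelihood of the data under linkage at recombination fraction θ with the likelihood under no linkage (θ = 0.5): LOD(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N], where R is the number of recombinants among N informative meioses. This sketch assumes that simple setting and uses a hypothetical pedigree; real linkage software handles unknown phase and missing genotypes.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta (0 < theta < 0.5)."""
    non_recombinants = meioses - recombinants
    # log10 likelihood if the marker and disease locus are linked at theta
    log_l_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    # log10 likelihood under independent assortment (theta = 0.5)
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical pedigree: 1 recombinant among 10 informative meioses.
# Scan theta from 0.01 to 0.49 and keep the maximum LOD.
best_lod, best_theta = max((lod_score(1, 10, t / 100), t / 100) for t in range(1, 50))
print(best_theta, round(best_lod, 2))
```

A LOD of 3 (1000:1 odds in favor of linkage) is the conventional threshold for significant linkage, so a maximum of ~1.6 from this small pedigree alone would not be conclusive.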
Linkage analysis does not work well for variants with ______ effect sizes, which are typical of polygenic diseases
small
Complex traits are known as ____________ traits
polygenic
With _____________ traits, the environment also plays a role in etiology
polygenic
______________ studies test for the statistical association between genotype and phenotype, typically under an additive genetic model, in which each additional copy of the minor allele changes the phenotype by the same amount
association
___________ gene studies test a limited number of variants/genes based on a priori information (e.g., biological hypotheses or focus on a linkage peak)
Very high false positive rate
candidate
_____________ test millions of common genetic variants across the genome for a statistical relationship with your phenotype of interest
GWAS
___________ regression is used to analyze GWAS
linear (or logistic)
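Under the additive model, each genotype is coded as an allele count (0, 1, or 2) and the phenotype is regressed on that count one SNP at a time; the slope (beta) is the change in phenotype per extra allele. This is a minimal single-SNP sketch with made-up BMI values and a hypothetical helper `snp_linear_regression`; real GWAS software also adjusts for covariates (e.g., age, sex, ancestry principal components) and reports p-values.

```python
import math


def snp_linear_regression(dosages: list[int], phenotypes: list[float]) -> tuple[float, float]:
    """Regress phenotype on allele count (0, 1, 2); return (beta, standard error of beta)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx                      # change in phenotype per additional allele
    intercept = mean_y - beta * mean_x
    # Residual sum of squares -> standard error of the slope
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


# Hypothetical data: allele counts and BMI for six individuals
dosages = [0, 0, 1, 1, 2, 2]
bmi = [24.0, 25.0, 25.5, 26.5, 27.0, 28.0]
beta, se = snp_linear_regression(dosages, bmi)
print(f"beta = {beta:.2f} per allele, SE = {se:.3f}")
```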
GWAS ___________ plot visualizes the loci significantly associated with a trait (e.g., BMI)
Manhattan
X-axis = chromosomal position of each tested SNP
Y-axis = statistical strength of association (-log10 p-value)
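The y-axis of a Manhattan plot is -log10(p), so stronger associations (smaller p-values) appear as taller points. This sketch only shows that transformation for a few hypothetical SNPs; plotting, and converting positions to a continuous x-axis across chromosomes, are left out.

```python
import math

# Hypothetical GWAS results: (chromosome, position in bp, p-value)
results = [(1, 1_200_000, 0.21), (2, 45_000_000, 3e-5), (16, 10_000_000, 1e-12)]

# Y-axis value for each SNP: smaller p-values become taller points
ys = [-math.log10(p) for _, _, p in results]

for (chrom, pos, _), y in zip(results, ys):
    print(f"chr{chrom}:{pos}  -log10(p) = {y:.2f}")
```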
There are only ~_____ million independent common genetic variants in the genome (in individuals of European ancestry)
~1 million
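The ~1 million independent variants figure is the basis of the conventional genome-wide significance threshold: a Bonferroni correction of α = 0.05 over 1 million tests gives p < 5×10⁻⁸, which sits at about 7.3 on the Manhattan plot y-axis. A minimal sketch of that arithmetic:

```python
import math

alpha = 0.05                     # desired family-wise error rate
n_independent_tests = 1_000_000  # ~1 million independent common variants

# Bonferroni correction: divide alpha by the number of independent tests
threshold = alpha / n_independent_tests
print(threshold, round(-math.log10(threshold), 2))
```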
GWAS identify __________ of correlated SNPs in loci associated with your trait of interest. These loci may overlap genes, suggesting their importance. However, GWAS do not directly identify the causal genes in the locus.
clusters
GWAS tend to name loci by their nearest ________, but this does not mean we are confident it is the causal gene
gene
Binary traits/case-control outcomes (e.g., obese vs. non-obese) are fit using ________ regression, and the beta (log-odds) is transformed into an ______ _______
logistic
odds ratio (OR = e^beta)
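A logistic-regression beta is on the log-odds scale, so exponentiating it gives the odds ratio per allele, and exponentiating the ends of beta ± 1.96 × SE gives an approximate 95% confidence interval. This sketch uses a hypothetical beta and SE and a hypothetical helper name; it assumes the SE is the usual Wald standard error.

```python
import math


def beta_to_odds_ratio(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a log-odds beta to (OR, lower CI, upper CI) using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical estimate for one SNP: beta = 0.18, SE = 0.02
or_, lower, upper = beta_to_odds_ratio(0.18, 0.02)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An OR above 1 means each copy of the effect allele raises the odds of being a case; a beta of 0 corresponds to an OR of exactly 1 (no association).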
Do GWAS findings translate well to other, more diverse populations?
No, not necessarily
_________ ___________ is a measure of an individual's overall genetic risk or propensity for a given disease or trait
Polygenic scores (PGS), also called polygenic risk scores (PRS)
How do you calculate a simple PRS?
The simplest model:
For each individual, and for each SNP, multiply the number of effect alleles (0, 1, or 2) by the effect (beta) on the trait of interest
PRS = Sum of these values (the aggregate risk for each individual)
The higher your PRS, the higher your ____________ genetic risk for the disease of interest
aggregated
PRS = A1×E1 + A2×E2 + A3×E3, where A = number of copies of the effect __________ and E = the effect (or _______) of that SNP
effect allele (0, 1, or 2)
effect (beta)
Individual 1 PRS = 0×2.2 + 2×1.4 + 0×1.6 = 2.8
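The simple PRS model is a weighted sum, which takes only a few lines of code. This sketch reproduces the Individual 1 example using its effect sizes (2.2, 1.4, 1.6) and effect-allele counts (0, 2, 0); the function name `polygenic_score` is illustrative.

```python
def polygenic_score(effect_allele_counts: list[int], betas: list[float]) -> float:
    """Simple PRS: sum over SNPs of (effect-allele count x effect size)."""
    if len(effect_allele_counts) != len(betas):
        raise ValueError("need exactly one beta per SNP")
    return sum(count * beta for count, beta in zip(effect_allele_counts, betas))


betas = [2.2, 1.4, 1.6]   # per-SNP effects (betas)
individual_1 = [0, 2, 0]  # effect-allele counts (0, 1, or 2)
print(polygenic_score(individual_1, betas))  # prints 2.8
```

This simple sum treats every SNP as independent; it does not account for correlation between SNPs, which the more complex models on the next card handle.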
More complex PRS models take into account _________-______ genetic effects and correlation between SNPs
genome-wide