Week 1 - Association Analysis Flashcards
What are the features of complex diseases?
- High incidence, so they are very common
- Non-Mendelian transmission
- Clustering within families
- More than one gene can be, and often is, involved
- Variable Expressivity
- Polygenic – caused by many genes, with environment and lifestyle as additional factors
Name the types of Polygenic Inheritance.
- Additive – sum of the effects of two or more gene loci
- Multiplicative – the effects of two or more gene loci multiply together
- Epistasis – gene-gene interactions, suppressive or stimulatory
- Variable penetrance depending on location on the genome
Give examples of diseases caused by polygenic inheritance
Asthma, Diabetes, Hypertension, Obesity and Cancer
Why do we study complex diseases?
To identify genetic risk factors, determine susceptibility, identify environmental factors, and develop diagnostic tests and gene medicine.
What are the effects of genetic drift on allele frequencies?
- Population size determines how strongly genetic drift acts.
- In a large population, allele frequencies usually stay stable from generation to generation. In a small population, random sampling of gametes can shift allele frequencies substantially, so genetic drift has a much greater effect.
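The small vs large population contrast can be illustrated with a minimal simulation sketch. It assumes a simple Wright-Fisher model (constant diploid population size N, random mating, no selection, mutation or migration), which is a modelling choice not stated on the card; the function names are hypothetical.

```python
import random


def wright_fisher(p0: float, n_individuals: int, generations: int,
                  rng: random.Random) -> list[float]:
    """Track one allele's frequency under pure genetic drift.

    Assumes a simple Wright-Fisher model: diploid population of constant
    size N, random mating, no selection, mutation or migration.
    """
    copies = 2 * n_individuals  # 2N gene copies in a diploid population
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each of the 2N copies in the next generation is drawn at random
        # from the current gene pool (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


def mean_abs_change(n_individuals: int, replicates: int = 100,
                    generations: int = 20, seed: int = 1) -> float:
    """Average |final p - starting p| over many simulated populations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        traj = wright_fisher(0.5, n_individuals, generations, rng)
        total += abs(traj[-1] - traj[0])
    return total / replicates


small = mean_abs_change(n_individuals=10)
large = mean_abs_change(n_individuals=500)
print(f"N = 10:  mean change in allele frequency = {small:.3f}")
print(f"N = 500: mean change in allele frequency = {large:.3f}")
```

Starting every population at p = 0.5, the allele frequency wanders much further in the N = 10 populations, and many of them lose or fix the allele within 20 generations.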
What is association analysis?
- Selection can be detected by analysing sequence variation
- Synonymous substitutions (no change in amino acid) accumulate faster than non-synonymous substitutions (change in amino acid), by approximately 10-fold
- The ratio of non-synonymous to synonymous substitution rates (Ka/Ks) is a measure of selection
- Comparisons of rates of substitution can indicate whether selection on a gene has occurred
- Ka/Ks < 1 indicates purifying (negative) selection, Ka/Ks ≈ 1 indicates neutral evolution, and Ka/Ks > 1 indicates positive selection.
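A minimal sketch of computing and reading Ka/Ks. It assumes the numbers of non-synonymous and synonymous sites have already been counted (for example by a Nei-Gojobori style method, not shown) and applies the Jukes-Cantor correction for multiple hits. The function names, the counts and the 0.1 tolerance band are hypothetical; real analyses also test whether the ratio differs significantly from 1.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits.

    Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), valid for p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> float:
    """Ka/Ks from counts of differences and sites (site counts assumed known)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # non-synonymous rate
    ks = jukes_cantor(syn_diffs / syn_sites)        # synonymous rate (must be > 0)
    return ka / ks


def interpret(ratio: float, tolerance: float = 0.1) -> str:
    """Rough reading of Ka/Ks; the tolerance band is an arbitrary choice."""
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical gene comparison: 750 non-synonymous and 250 synonymous sites
ratio = ka_ks(nonsyn_diffs=15, nonsyn_sites=750, syn_diffs=25, syn_sites=250)
print(f"Ka/Ks = {ratio:.2f} -> {interpret(ratio)}")
```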
Describe the study that confirmed the presence of genetic factors in diseases at population level
•Prof. Aird and colleagues (1953) observed that blood group O and stomach cancer were both common in the North of England, but the true association was with blood group A
•Methods: case-control studies, Odds Ratios and the chi-square (χ2) test
Describe the Candidate Gene Approach
a) Identify specific genes whose known function may influence phenotype
b) Screen such genes for mutations which may affect function
Describe the advantages and disadvantages of the Candidate Gene Approach
Advantage: time saver; simple statistical analyses.
Disadvantage: it is more likely than not that undiscovered genes play an important role in the disease, and this approach cannot detect them
Describe the Genome Wide Scan
a. Scan the entire genome using highly polymorphic DNA markers (e.g. RFLPs, VNTRs, STRs, SNPs)
b. Identify regions of genome which co-segregate with disease phenotype
c. Directly screen known genes in linked regions for mutations
d. Linkage Disequilibrium studies.
What is Linkage Disequilibrium?
The non-random association of alleles at different loci
How is Linkage Disequilibrium used in association studies?
- Across the human genome, look for an association between an allele's frequency and its LD with the other genetic markers surrounding it
- LD makes tightly linked variants strongly correlated, which produces cost savings for association studies: an association with the disease can be identified either through the susceptibility marker itself (direct association) or through a marker that is in high LD with the susceptibility marker (indirect association)
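LD is usually quantified by comparing the observed frequency of a two-locus haplotype with the frequency expected if the alleles associated at random. The sketch below computes the standard measures D, D' and r² for two biallelic loci; the function name and the haplotype frequencies are hypothetical. A high r² is what lets a nearby marker stand in for the susceptibility variant in an indirect association study.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return D, D' and r^2 for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: allele frequencies of A and B (assumed strictly between 0 and 1)
    """
    # D: observed haplotype frequency minus the frequency expected if the
    # alleles were associated at random (linkage equilibrium).
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2: squared correlation between the alleles at the two loci.
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical SNP pair: the marker allele almost always travels with the risk allele
d, d_prime, r2 = ld_measures(p_ab=0.28, p_a=0.30, p_b=0.30)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```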
What factors affect Linkage Disequilibrium?
Natural selection, genetic drift, recombination and mutation
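Of these factors, recombination has the simplest quantitative effect: with recombination fraction θ between two loci, D shrinks by a factor (1 - θ) each generation, so D_t = D_0 (1 - θ)^t. The sketch below assumes a large randomly mating population with no selection, drift or mutation; the function name and starting value D_0 = 0.2 are hypothetical.

```python
def ld_after(d0: float, recombination_fraction: float, generations: int) -> float:
    """D after t generations of random mating: D_t = D_0 * (1 - theta)^t.

    Assumes a large randomly mating population with no selection, drift or
    mutation, so recombination is the only force acting on D.
    """
    return d0 * (1 - recombination_fraction) ** generations


# Hypothetical starting disequilibrium D_0 = 0.2
for theta in (0.5, 0.05, 0.001):
    remaining = [ld_after(0.2, theta, t) for t in (1, 10, 100)]
    print(f"theta = {theta}: D after 1, 10, 100 generations =",
          [round(x, 4) for x in remaining])
```

Unlinked loci (θ = 0.5) lose half their LD every generation, while tightly linked loci keep most of it for many generations, which is why LD persists around disease variants.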
What are the features of DNA when it is supercoiled into Chromatin?
- In nucleus DNA is associated with proteins (histones)
- DNA helix is tightly coiled around histones to form chromatin
- A nucleosome consists of a histone octamer (eight histone proteins) with about 2 turns of DNA (~147 bp) wrapped around it
- Euchromatin is a lightly packed region of chromatin, often under active transcription
- Chromatin further coils to form solenoids which form loops and minibands (heterochromatin)
- Chromosomes are only visible under the light microscope when they condense during cell division
What is the structure of a chromosome?
- Human somatic cells contain 46 chromosomes: 44 autosomes and 2 sex chromosomes
- Gametes contain 23 chromosomes
- Telomeres and centromeres are essential components
- Centromere divides chromosome into 2 arms (p and q) and is where spindle fibres attach during cell division
- Centromere location gives chromosome characteristic shape, useful for identification
What is allelic disequilibrium?
Particular alleles at two or more neighboring loci show allelic association if they occur together with frequencies significantly different from those predicted from the individual allele frequencies.
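Case-control studies…
give an odds ratio, not a proportion (risk), because the numbers of cases and controls are fixed by the study design. The mode of inheritance affects the next stage of analysis: it depends on whether the disease is dominant, recessive or multifactorial.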
Woolf's method – the original basic method for association analysis. What are its features?
- Most common and simple
- Can be used with any marker! (genotype, allele, others)
- Robust; allows data from several studies to be combined
- Requires Patients and controls (case control)
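The "allows combining of data" feature can be sketched as follows: each study's ln(OR) is weighted by the inverse of its variance (1/a + 1/b + 1/c + 1/d), and the weighted average gives a pooled OR. The function name, the table layout and the two studies are hypothetical; zero cells would need a correction that this sketch does not handle.

```python
import math


def woolf_pooled_or(tables: list[tuple[int, int, int, int]]) -> tuple[float, float, float]:
    """Combine odds ratios from several 2x2 tables with Woolf's method.

    Each table is (a, b, c, d): a = cases with marker, b = controls with marker,
    c = cases without marker, d = controls without marker. All counts must be
    non-zero. Returns the pooled OR and its 95% confidence limits.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b, c, d in tables:
        ln_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d  # variance of ln(OR)
        weight = 1 / variance                      # more precise studies count more
        weighted_sum += weight * ln_or
        total_weight += weight
    pooled_ln_or = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return (math.exp(pooled_ln_or),
            math.exp(pooled_ln_or - 1.96 * se),
            math.exp(pooled_ln_or + 1.96 * se))


# Two hypothetical case-control studies of the same marker
pooled, low, high = woolf_pooled_or([(30, 20, 70, 80), (45, 30, 55, 70)])
print(f"Pooled OR = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```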
Steps in association Analysis - Significance. STEP 1
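Work out the odds ratio = (a × d) / (b × c), where a = cases with the marker, b = controls with the marker, c = cases without the marker and d = controls without the marker. OR = 1 means no association, OR > 1 means the marker is associated with increased risk, and OR < 1 suggests a protective effect.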
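Step 1 as a minimal sketch, assuming the a/b/c/d table layout given on the card; the function name and the counts (100 cases, 100 controls) are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 case-control table.

    a = cases with marker, b = controls with marker,
    c = cases without marker, d = controls without marker.
    """
    # Odds of carrying the marker in cases (a/c) divided by odds in controls (b/d)
    return (a * d) / (b * c)


# Hypothetical study: 100 cases and 100 controls
or_value = odds_ratio(a=60, b=40, c=40, d=60)
print(f"OR = {or_value:.2f}")  # prints "OR = 2.25"
```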
Steps in association Analysis - Significance. STEP 2
Work out the significance of the association, starting with the expected number for each cell:
expected number = (row total × column total) / grand total
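Step 2 as a minimal sketch: the expected count for every cell if there were no association. The function name and the observed table are hypothetical.

```python
def expected_counts(table: list[list[int]]) -> list[list[float]]:
    """Expected count for every cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical table: rows = marker present / absent, columns = cases / controls
observed = [[60, 40],
            [40, 60]]
print(expected_counts(observed))  # prints [[50.0, 50.0], [50.0, 50.0]]
```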
Steps in association Analysis - Significance. STEP 3
Now that you have the observed and expected numbers, calculate (observed - expected)^2 / expected for each cell.
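Step 3 as a minimal sketch: one (observed - expected)^2 / expected value per cell. The function name and the example counts are hypothetical.

```python
def chi_square_contributions(observed: list[list[int]],
                             expected: list[list[float]]) -> list[list[float]]:
    """(observed - expected)^2 / expected for every cell of the table."""
    return [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(observed, expected)]


# Hypothetical observed counts and their expected counts under no association
observed = [[60, 40], [40, 60]]
expected = [[50.0, 50.0], [50.0, 50.0]]
print(chi_square_contributions(observed, expected))  # prints [[2.0, 2.0], [2.0, 2.0]]
```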
Steps in association Analysis - Significance. STEP 4
o Remember, you have to do this for each cell (genotype/allele) and then add these values together to get the chi-square value
o Calculate the degrees of freedom: df = (number of rows - 1) × (number of columns - 1)
o Compare the chi-square value against the χ2 critical value in the table for that df
o For 1 degree of freedom at a p value of 0.05, the critical value is 3.84
o If the chi-square value is higher than the critical value, the association is significant
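Steps 2 to 4 can be combined into one minimal test sketch. The function name and the example table are hypothetical; the default critical value of 3.84 applies only to df = 1 at p = 0.05. For df = 1 the sketch also computes the exact p value using the fact that the statistic is a squared standard normal, so p = erfc(√(χ²/2)).

```python
import math


def chi_square_test(table: list[list[int]], critical_value: float = 3.84):
    """Chi-square test of association for a contingency table.

    The default critical value 3.84 only applies to df = 1 at p = 0.05;
    pass a different value from the chi-square table for other df.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected  # add each cell's contribution

    df = (len(table) - 1) * (len(table[0]) - 1)
    # For df = 1 the statistic is a squared standard normal,
    # so the exact p value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
    return chi2, df, chi2 > critical_value, p_value


chi2, df, significant, p = chi_square_test([[60, 40], [40, 60]])
print(f"chi-square = {chi2:.2f}, df = {df}, significant = {significant}, p = {p:.4f}")
```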
Steps in Association Analysis - Confidence Interval
The confidence interval for the Odds Ratio is calculated on the natural log (Ln) scale and then converted back to the original scale.
The sampling distribution of the Odds Ratio is positively skewed.
However, it is approximately normally distributed on the natural log scale.
On the natural log scale, the 95% confidence interval is Ln(OR) ± 1.96 × SE, where SE = √(1/a + 1/b + 1/c + 1/d).
After finding the limits on the natural log scale, use the inverse of the Ln function (the exponential, e^x) to find the limits on the original scale.
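A minimal sketch of the confidence interval calculation, assuming the a/b/c/d layout from Step 1 and the usual standard error of ln(OR), √(1/a + 1/b + 1/c + 1/d). The function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """OR and its 95% confidence interval, computed on the ln scale.

    a = cases with marker, b = controls with marker,
    c = cases without marker, d = controls without marker (no zero cells).
    """
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower_ln, upper_ln = ln_or - z * se, ln_or + z * se  # symmetric on the ln scale
    # Convert back with exp(), the inverse of ln
    return math.exp(ln_or), math.exp(lower_ln), math.exp(upper_ln)


or_value, lower, upper = odds_ratio_ci(60, 40, 40, 60)
print(f"OR = {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the interval is symmetric only on the ln scale, the limits on the original scale sit unevenly around the OR (the OR is their geometric mean). An interval that does not include 1 agrees with a significant chi-square result.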