HGA Studies Flashcards

1
Q

What are 4 categories of genetic variation?

A
  1. SNP - Single nucleotide polymorphism
  2. InDel - Insertion/Deletion
  3. SSR - Simple sequence repeat
  4. CNV/CNP - Copy number variation/polymorphisms
2
Q

What are SSRs?

A

Simple sequence repeats

  • Also called microsatellites (micro = small; satellite = repeated sequence); changes in repeat number behave like small indels
  • Mutate faster than SNPs; generally selectively neutral (don’t change phenotype)
  • Highly polymorphic because the repeats are prone to faulty replication. Used in forensics, because an individual’s combination of repeat lengths is effectively unique.
  • Short repeat units: usually dinucleotides, sometimes trinucleotides.
  • SSR mutations can cause disease: Huntington’s disease (HD) is caused by a triplet expansion SSR in a coding region.
    o Normal allele: <34 CAG repeats
    o Disease allele: >42 CAG repeats
    o Protein folding is related to function, so a much longer repeat tract affects the protein’s function
    o Autosomal dominant disease.
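
A minimal sketch of how the repeat thresholds on this card could be applied to a sequence. The function names and the example sequence are made up for illustration, and the range between the two thresholds is simply reported as unclassified because the card does not define it.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_hd_allele(repeats: int) -> str:
    """Classify an allele using the thresholds quoted on this card."""
    if repeats < 34:
        return "normal"
    if repeats > 42:
        return "disease"
    return "between thresholds (not classified on this card)"


# Hypothetical allele: 45 CAG repeats with short flanking sequence
allele = "ATG" + "CAG" * 45 + "CCGCCA"
n = longest_cag_run(allele)
print(n, classify_hd_allele(n))
```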
3
Q

What is an easy assay to find SSRs?

A

o Isolate buccal cells
o Amplify the repeat region by PCR using 2 primers, one on either side of it
o Electrophoresis to measure the length of the PCR product, which shows how many repeats there are
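
The last step can be sketched as simple arithmetic: subtract the flanking sequence (primers plus any non-repeat DNA between them) from the product length and divide by the repeat unit length. The flank length, unit length and band sizes below are hypothetical values chosen for illustration.

```python
def repeats_from_amplicon(amplicon_bp: int, flank_bp: int, unit_bp: int) -> int:
    """Estimate the repeat count from the PCR product length.

    amplicon_bp: product length read off the gel
    flank_bp:    total non-repeat sequence in the product (assumed known)
    unit_bp:     length of one repeat unit (2 for a dinucleotide repeat)
    """
    repeat_bp = amplicon_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError("product length is inconsistent with flank/unit length")
    return repeat_bp // unit_bp


# A heterozygous person shows two bands, one per allele
for band in (150, 162):
    print(band, "bp ->", repeats_from_amplicon(band, flank_bp=120, unit_bp=2), "repeats")
```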

4
Q

What is a SNP?

A

Single Nucleotide Polymorphism

  • Locations within the genome where nucleotide substitutions are present
  • At least 1% of a population must have the alternative nucleotide variant to be considered a SNP
  • Percentage of SNPs in a population depends on the population (how old it is, how much migration has happened etc.)
  • More SNPs are present in non-coding DNA & introns; in coding DNA they often hit the 3rd codon position and don’t change function
  • Give an idea of population structure and migration patterns because of the accumulation of mutations etc.
  • An individual can be homozygous or heterozygous at a SNP.
    o e.g. homozygous T has T on the + strand of both chromosomes (we are diploid)
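
A minimal sketch of the 1% rule, assuming genotypes are given as two-letter strings (one letter per chromosome). The function names and the genotype counts are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Count alleles across diploid genotypes such as 'CC', 'CT', 'TT'."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def is_snp(genotypes, threshold=0.01):
    """True if the second most common allele reaches the frequency threshold."""
    freqs = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return len(freqs) >= 2 and freqs[1] >= threshold


# Hypothetical population of 100 people: T is carried on 11 of 200 chromosomes
common = ["CC"] * 90 + ["CT"] * 9 + ["TT"] * 1
# Hypothetical population of 200 people: T is carried on 1 of 400 chromosomes
rare = ["CC"] * 199 + ["CT"] * 1

print(allele_frequencies(common)["T"], is_snp(common))
print(allele_frequencies(rare)["T"], is_snp(rare))
```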
5
Q

What are 2 molecular methods for SNP genotyping?

A
  1. RFLP where a SNP changes restriction enzyme site
    a. Only looks at 1 SNP at a time, so it is impractical at scale
  2. DNA microarrays where you can detect SNP alleles at >1 000 000 loci
    a. Make a tagged probe that is complementary to the sequence of interest.
    b. One strand for each allelic version with different tags
    c. Hybridise the probes to the person’s genomic DNA; the probe(s) matching the person’s allele(s) will light up, telling you which allelic version of the SNP the person has
    d. Microarray is a microscope slide that is printed with specific DNA.
6
Q

What is a CNV?

A

Copy Number Variant

  • Variation in the number of copies of a large chunk of sequence (duplicated or deleted).
  • Dosage effect is very important (that’s why females turn off an X chromosome). Like a microsatellite, but the piece of DNA repeated is MUCH bigger (up to megabases)
  • Very long deletions or duplications in the genome are associated with a 30% increased risk of psychiatric disorder
  • Deletions in some specific genomic regions are directly associated with autism, schizophrenia or intellectual disability
  • > 99% of CNVs are inherited & not derived from new mutation
  • Chromosomal microarrays use probes to find CNVs
    o Amount of probe bound (and therefore intensity of color) is linked to how many copies there are
  • Is associated with phenotype, but not always bad:
7
Q

What is Complex inheritance?

A
  • Most human traits do NOT have simple single gene inheritance patterns
8
Q

What is Incomplete Penetrance?

A

The disease genotype can be present in people who do not express the disease phenotype

9
Q

What is Genetic Heterogeneity?

A

Different disease genotypes are responsible for the same disease in different families
o i.e. 2 families can have the exact same disease & same disease phenotype, but can have different disease genotypes. i.e. different reasons for the disease
o Different mutations in the same (large) gene
o Or mutations in different genes (if multiple genes are responsible for the disease)

10
Q

What is polygenic determination?

A

Mutant alleles at more than 1 locus influence expression of the disease in a single person.

11
Q

What is complete penetrance?

A

The genotype always ‘penetrates’: everyone who carries it expresses it in the phenotype.

12
Q

Complex Inheritance in Breast Cancer

A
  • Autosomal dominant inheritance of 2 unlinked disease loci predisposes women to breast cancer
  • Runs in families but mutations can also arise de novo.
  • Polygenic determination is at play (at least 2 genes associated)
  • Mutated BRCA1 & BRCA2 genes cause cancers of the breast and female reproductive organs (e.g. ovary)
    o However, those genes aren’t mutated in all women with breast cancer:
    • Aren’t the only genes responsible
    • BRCA1&2 are genes with incomplete penetrance
    o Only 66% of women with a mutant BRCA1 allele will develop cancer
  • 1/10 chance of any woman developing breast cancer (Normal BRCA)
  • 4/10 chance with mutated BRCA2
  • 6/10 chance with mutated BRCA1
  • Child bearing and breastfeeding reduce the risk of breast cancer because of the break they provide from hormonal cycling
13
Q

What are the functions of the BRCA genes?

A
  • BRCA1 & BRCA2 are in same DNA repair pathway
    o Repair double strand DNA breaks
    o Promote homologous recombination
  • They do not share any protein structure
  • Both are tumor suppressors
    o Involved in DNA repair and checkpoint regulation
    o Loss of function promotes cancer
  • The repair pathway contains other genes as well
14
Q

What are common symptoms of Schizophrenia?

A
  • May have difficulty distinguishing between reality & imaginary world
  • May be unresponsive or withdrawn; have difficulty expressing normal emotions in social situations
  • Majority of people with schizophrenia are not violent, nor do they pose a danger to others
  • Symptoms are NOT identical for each person
15
Q

What are some causes of SZ?

A
  • Schizophrenia is NOT caused by: childhood experiences, poor parenting or lack of willpower
  • Has a large heritable component where genetic makeup plays an important role
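    o Studied by looking at identical twins because they have identical DNA
    o Heritability is ~80%, i.e. most of the variation in risk between people in a population is explained by genetic differences (this is a population measure, not the chance that an affected parent’s child will have SZ)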
  • Neurotransmitters may play a role in phenotype
    o Worked this out by giving patients medication/neuroblockers and observed improvement
  • There is strong evidence for a genetic basis of schizophrenia (although we are not sure exactly what it is)
    o Known CNV’s are implicated as risk factors in 2-3% of cases
  • There is evidence for association between schizophrenia and CNV’s
    o Schizophrenia-associated CNVs are also associated with autism, intellectual disability, ADHD & epilepsy
    o Schizophrenia-associated CNVs span multiple genes
    o Knowledge of the exact mechanisms of pathogenesis is limited but some hints are emerging:
    • E.g. deletions of NRXN1, which codes for the presynaptic neuronal cell adhesion molecule neurexin, imply abnormal synaptic function
16
Q

What were the aims of the study by Kirov, et al?

A
  • Compare case (SZ) and controls in order to:
  1. Identify novel CNVs that increase risk of SZ
  2. Compare novel (de novo) CNVs vs inherited CNVs
  3. Compare SZ-linked CNV’s with CNV’s in other disorders
  4. Understand pathophysiology of SZ (mechanisms)
17
Q

Why are multiple cohorts important in a GWAS?

A
  • It is important to compare data in the context of other data (populations etc.)
  • Using many data sets strengthens the reliability of results. P values become smaller & more reliable with more data sets
18
Q

What cohorts were used in the Kirov, et al paper?

A
  • Bulgaria: Bulgarian parent-proband trios
    o Used to determine if the CNV’s identified were de novo or inherited.
    o Proband = person or animal being studied
    o All cases had been hospitalized & met DSM-IV guidelines
  • Icelandic control: 2623 complete parent-offspring trios without the disease were the control
    o To ensure their findings weren’t isolated to their population in Bulgaria
    o Government sequenced most of their population, so they are the gold standard control
  • Autism case and control:
    o Used data from a recent large study based on a high-density array
    o To see if there were CNV’s that were shared between SZ population & those with autism – as well as with Icelandic control & non-autistic siblings that were used in both cases
  • Case-control data SZ sets:
    o Used 4 large publicly available data sets
    o Cases had SZ; controls did not
    o Important to compare results in context of their data (unique to Bulgaria?) REPLICATION
19
Q

Describe a workflow from obtaining CNV’s associated with SZ to biological pathways associated with CNV’s

A
  1. Microarray analysis of DNA extracted from blood samples of cases vs controls
  2. Quality Control: Statistical analysis to check there were no statistical differences in the running of each experiment
  3. Did stats on >600 CNV’s to identify CNV’s associated with SZ
    o Also recorded the level of education each case had, which confirmed that all cases were SZ, since SZ isn’t associated with decreased mental ability
  4. Mapped variants to genes – then performed a Gene Ontology analysis which defines gene functions and how these functions are related
  5. Ran enriched pathway analysis:
    a. Looked at which genes were enriched in cases relative to the control (Ones that weren’t enriched in controls)
  6. Find pathways associated with SZ
    a. Finds the pathways of the genes from 5. which are enriched.
20
Q

What is a systems biology approach?

A
  • Clinical work
  • Genomic microarrays
  • Stats
  • Bioinformatic work in gene enrichment sets
21
Q

What were the findings of the Kirov et al paper?

A
  1. Rate of de novo CNVs in cases (5.1%) > controls (2.2%), which suggests at least ½ of all de novo CNVs discovered in cases are associated with SZ
  2. Found 34 de novo CNV’s
    o Didn’t show that older fathers correlated to SZ in offspring
  3. 8 (of 34) CNVs were at 4 known SZ loci and some were known to be pathogenic in other disorders
  4. Some CNVs were in genes which are components of the postsynaptic density
  5. 2 CNVs affected genes known to directly regulate DLG (family of membrane-associated guanylate kinases) family members.
    o Implies some epigenetic implications
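
One way to read the "at least ½" statement in finding 1 is as an excess-rate calculation: if controls show the background rate of de novo CNVs, the share of case CNVs above that background is (cases - controls) / cases. This reading is an assumption about the reasoning, not something stated on the card; the function name is made up.

```python
def excess_fraction(rate_cases: float, rate_controls: float) -> float:
    """Fraction of the case rate that exceeds the background (control) rate."""
    return (rate_cases - rate_controls) / rate_cases


# Rates quoted on the card: 5.1% in cases, 2.2% in controls
print(round(excess_fraction(0.051, 0.022), 2))  # 0.57, i.e. a little over half
```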
22
Q

What is the difference between a GWAS and a linkage study?

A
o	Linkage:
•	Pedigree based
•	Everyone’s relatedness is known
•	Small number of individuals
•	Less complex diseases

o GWAS:
• NOT pedigree based.
• Array-based study
• Very large numbers of unrelated people are examined (case vs controls groups).

23
Q

What is an example of a simple GWAS pathway?

A
  1. Collect samples – cases & controls
  2. Microchip array
  3. Produces raw data
  4. Genotype calling to produce genotype data
  5. Identify chromosomal markers – map geographical loci etc.
  6. Interpretation
24
Q

What is a haplotype?

A

A haplotype is a set of genetic markers (in this case SNPs) which lie close together on a single chromosome and are inherited together, so they show a high statistical association (linkage disequilibrium).

25
Q

What is a tag SNP?

A
  • A single representative SNP in a genomic region (haplotype) with high linkage disequilibrium
    o i.e. only 1 SNP is needed for each haplotype (because they are linked, they will all show the same result)
  • Means you don’t need an array for every SNP, just 1 tag SNP per haplotype
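
Linkage disequilibrium between two SNPs is commonly summarised as r², where 1 means one SNP perfectly predicts the other (so one can tag the other) and 0 means they are independent. A minimal sketch, assuming phased haplotypes with alleles coded 0/1; the data below are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


linked = [(1, 1)] * 4 + [(0, 0)] * 6                 # alleles always travel together
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]          # every combination equally common
print(ld_r_squared(linked), ld_r_squared(unlinked))
```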
26
Q

How do you assess a GWAS publication?

A
  1. Sample size: Typically, 1000s of cases & controls to detect small differences with statistical confidence
    a. Must be large and in conserved populations
  2. Quality control: Biggest challenge of successful GWAS is clean genotype data.
    a. Need to report QC metrics (genotype call rate, Hardy-Weinberg equilibrium)
  3. Confounders: Are there other variables in the study which may be different between cases & controls other than the disease itself? (e.g. paternal age in SZ)
    a. E.g. in a type 2 diabetes study, controls must have high BMI to match the high BMI of cases
  4. Replication: Can data be replicated independently – both using independent samples & independent technology?
    a. Replication in different populations and with different methods is important
  5. Biology: Does the data support a functional hypothesis?
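
The Hardy-Weinberg QC metric in point 2 can be sketched as a chi-squared comparison of observed genotype counts with the counts expected from the allele frequencies (p², 2pq, q²). This is a simplified illustration with hypothetical counts; real pipelines often use an exact test instead.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-squared statistic (1 degree of freedom) for deviation from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# 3.84 is the chi-squared critical value for 1 degree of freedom at p = 0.05
for counts in [(360, 480, 160), (500, 200, 300)]:
    stat = hwe_chi_square(*counts)
    print(counts, round(stat, 2), "deviates" if stat > 3.84 else "consistent with HWE")
```

A SNP whose controls deviate strongly from HWE is usually treated as a likely genotyping error and dropped.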
27
Q

How can you deal with multiple testing in a GWAS?

A
  • Use a statistical correction: For testing and retesting (i.e. each tag SNP will be tested against a trait, but there may be multiple SNPs in a gene, so that gene is being retested, which biases the data)
  • False discovery rate: proportion of significant associations that are actually false positives
    o What is the chance that you got an association purely because of the way your experiment was set up and your stats are set up
  • False positive report probability: probability that the null hypothesis is true, given a statistically significant finding
-	Replication, replication, replication
	o	Of the study…
	o	Using different stats…
	o	Of the study in different populations…..
	o	Using a different method
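
Two widely used corrections can be sketched as follows: Bonferroni (divide the significance level by the number of tests) and Benjamini-Hochberg (control the false discovery rate). The card does not say which correction is meant, so both are shown as examples; the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Significant if p <= alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Find the largest rank k with p_(k) <= k/m * fdr; call ranks 1..k significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


p_values = [0.001, 0.012, 0.025, 0.039, 0.20]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
# With 1 000 000 SNPs, Bonferroni at 0.05 gives the familiar 5e-8 threshold
print(0.05 / 1_000_000)
```

Bonferroni is the stricter of the two here: it keeps only the smallest p-value, while Benjamini-Hochberg keeps the four smallest.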
28
Q

What are 8 good questions to ask when reading a GWAS?

A
  1. Which samples were used?
  2. Why did they combine their genotyping with data from different databases or other studies?
  3. What was the main genotyping method used in the article?
  4. What kind of genetic variant was assayed?
  5. What were the confounding factors that the authors had to consider?
  6. Why does a GWAS have so many QC steps?
  7. Once a GWAS identifies loci associated with a disease, what is the next step?
  8. What were the main findings of this study?
29
Q

7 point summary of GWAS of 14 000 cases of 7 common diseases & 3000 shared controls

A
  1. 1st large GWAS in British population: 2 000 cases for each of 7 major diseases & 3 000 shared controls
  2. Microarray tech to examine SNPs
  3. Identified 24 independent association signals
    a. 1 in bipolar disorder
    b. 1 in coronary artery disease
    c. 9 in Crohn’s disease
    d. 3 in Rheumatoid arthritis
    e. 7 in Type 1 diabetes
    f. 3 in type 2 diabetes
  4. Data signals reflect genuine susceptibility effects (correlate with prior findings from 7 replication studies)
  5. Some SNP associations were at previously identified loci
  6. Some SNP loci confer risk for more than 1 disease
  7. This study identified a large number of NEW loci that are likely to yield additional susceptibility loci
30
Q

What controls were used in the ‘GWAS of 14 000 cases of 7 common diseases & 3000 shared controls’ Paper?

A

o 1 500 from the 1958 British Birth Cohort

o 1 500 from blood donors recruited for this project

31
Q

How did authors justify combining data from 2 control cohorts in the ‘GWAS of 14 000 cases of 7 common diseases & 3000 shared controls’ Paper?

A
  1. Had GWAS data for all the samples, so compared control group 1’s probes against control group 2’s probes and made a Manhattan (‘skyline’) plot, which had NO skyline (shows there is no difference between control groups)
    o Means that they were justified in combining the 2 control groups to form a single group
  • Can do a Q-Q (quantile-quantile) plot:
    o Based on chi-squared statistics
    o Look at expected association and real association for every SNP: if the points lie on a straight line, then no signal; if they deviate, then signal
  2. Looked for geographical variation & population structure:
    o Hidden population structure can result in false positive results
    o Different ancestries can carry higher disease risk; thus can be over-represented in cases
    o Did pairwise GWAS experiments, e.g. sample from north of Scotland vs sample from SE of England, and made a Manhattan plot
    a. Finds population genetic SNPs
    o Results: British population was heterogeneous, i.e. there are population-specific SNPs
    o Removed those SNPs that carry a population signal from subsequent analysis because they could confound the results
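
A minimal sketch of the points behind a quantile-quantile plot: observed -log10(p) values are sorted and paired with the values expected if no SNP were associated (p-values spread uniformly between 0 and 1). The (rank - 0.5)/n plotting positions are one common convention, assumed here; the p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest to largest."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((rank - 0.5) / n) for rank in range(1, n + 1))
    return list(zip(expected, observed))


# No signal: p-values are exactly uniform, so every point lies on the diagonal
null_p = [(rank - 0.5) / 10 for rank in range(1, 11)]
# Signal: one SNP has a far smaller p-value than expected by chance
signal_p = [1e-8] + null_p[1:]

print(qq_points(null_p)[-1])    # top point on the diagonal
print(qq_points(signal_p)[-1])  # top point far above the diagonal
```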
32
Q

What were the results of the ‘GWAS of 14 000 cases of 7 common diseases & 3000 shared controls’ Paper?

A
  • Found strong associations with geography:
    o 13 genomic regions showed strong geographical variation
    o Predominant pattern is variation along a NW/SE axis, but overall effect of pop structure on association results was small
    a. Most likely cause for these marked geographical differences is natural selection
    b. Variation due to selection has previously been implicated at lactase and major histocompatibility complex
  • Disease association:
    o Assessed evidence for association in several ways and found some diseases had many chromosomal regions (and genes) associated with them
33
Q

What is DNA Methylation?

A
  • In mammals, almost always occurs on a C that is followed by a G, i.e. at a CpG site
    o 60-90% of CpG sites are methylated in mammals
    o Clusters of CpG sites are called CpG islands and occur at the 5’ regulatory regions of many genes
  • Methylation presence or absence can change gene expression without a mutation being present, because it’s in the regulatory region
  • Not ‘all or nothing’: a site can have degrees of methylation
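
"Degrees of methylation" can be illustrated as a fraction between 0 and 1: across the many cells in a sample, the share of copies of a CpG site that are methylated. The sketch below assumes hypothetical counts of methylated and unmethylated signal; array platforms compute a similar ratio from probe intensities.

```python
def methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of copies of a CpG site that are methylated (0 = none, 1 = all)."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no signal at this site")
    return methylated / total


# Hypothetical site: 30 methylated and 10 unmethylated copies
print(methylation_level(30, 10))
```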
34
Q

What is an EWAS?

A

EWAS (Epigenome-Wide Association Study)

  • Equivalent to GWAS but loci on array are epigenetic loci
    o i.e. CpG sites
  • Arrays generally examine DNA methylation
  • Methylation sites are generally in promoter regions & known to modify gene expression
  • Smaller sample sizes can be used than in GWAS
35
Q

What could account for the increase in ASD prevalence?

A
  • Increased ASD prevalence is related to increased ASD diagnoses because of:
    o Broader diagnostic criteria
    o Adoption of standardised assessment strategies
    o Increased awareness & decreased stigma
    o Linking of diagnosis to services
    o Inclusion of milder forms of neurodevelopmental disorders e.g. Asperger’s
36
Q

What is common in ASD phenotypes?

A
-	ASD phenotype is highly heterogenous & complex & overlaps with other behaviours & psychiatric phenotypes (pleiotropic)
	o	Social deficits [Core]
	o	Language impairment [Core]
	o	Repetitive behaviours [Core]
	o	PLUS many associated issues
  • DSM V:
    o Persistent deficits in social communication and social interaction across multiple contexts
    o Restricted, repetitive patterns of behaviour, interests or activities
  • Individuals with ASD are considered to comprise different ‘endophenotypes’ due to inter-individual differences at a clinical level
37
Q

What are some implicated causes of ASD?

A
  • Many genetic studies have been conducted but aetiology (causes) of ASD is poorly understood
    o Makes it difficult to design therapeutic approaches
  • EXACT causes are unknown for most ASD cases but some implicated causes are:
    o Exome sequencing confirmed that 5% of SNPs & de novo indels contribute to ASD
    o Candidate gene studies (through a family) have identified rare variants which has led to numerous hypotheses
    o GWAS have found associations with ASD for numerous genes
  • Therefore, single mutations cannot explain the molecular mechanisms in ASD
    o ASD is a global condition (affects all tissues not only brain) of interacting networks of genes
    o Is a result of dysregulation of gene networks and biological pathways
38
Q

EWAS by Wong et al, 2013

A
  • Examined 50 pairs of identical twins who were discordant for ASD traits
  • EWAS array examined DNA methylation at ±28,000 loci of CpG islands in genome
  • Study used a specific ASD assessment to characterise the differences in their autistic traits in twins:
    o A 31-item questionnaire that measured autism traits such as speech delay & difficulty making conversation [discordant or concordant]
39
Q

What were the results of the Wong et al EWAS?

A
  1. Some twin pairs were discordant & had different DNA methylation patterns
  2. Of the discordant twin pairs:
    a. Each pair had an average of 37 genomic regions with significant methylation differences
    b. Most genomic regions were unique to that twin pair
  3. Methylation differences between discordant twins correlated with differences in twins discordant for different autism traits
    a. For social traits, a brain receptor was implicated
  4. Differentially methylated genes included both novel genes and some previously linked to ASD
40
Q

What are challenges faced by ASD research in Sub-Saharan Africa (SSA)?

A
  • There are no ASD databases in SA
    o There are numerous ASD databases in the USA
  • First step of the SA-ASD project was to build an ASD cohort
    o Recruited participants:
    + Case [boys - identified by ADOS] (6-12yr) & control
    o Assess the phenotype of ASD and control
    + 9.5% case excluded; 13% control excluded
    o Collect biological samples for study
41
Q

What is ADOS?

A
  • ADOS = Autism Diagnostic Observation Schedule
  • Gold-standard phenotyping tool that characterises individual ASD phenotypes
  • Semi-structured assessment of communication, social interaction & play for individuals suspected of having Autism
  • Examiner gathers standardised information:
    o Social behaviour
    o Vocalizations/speech, gesture, non-verbal language
    o Play/interests/creativity
  • ADOS assessment yields a wealth of phenotypic data