L4, Genomics and Health I Flashcards
1
Q
Human genome project: Overview
A
- 1990 to 2003
- Generated a representative human genome sequence of around 3 billion bases
- Performed using Sanger sequencing
- 92% of genome covered
- Produced 1 reference genome (built from 20 volunteers)
2
Q
NGS: Features
A
- Long-read sequencing: gap-free (telomere-to-telomere) human genome released 2022
- aka: Massively parallel sequencing
- Many fragments sequenced simultaneously -> reduces cost
- Helps resolve duplicated and repetitive regions that the Human Genome Project left unresolved
3
Q
Why are reference genomes useful?
A
- Provide a common reference point for genomic loci -> gives gene ‘addresses’, reported variants are relative to the reference genome
- Provides a template (Guiding assembly of new genomes, enables assay design and data analysis)
4
Q
SNPs: What are they? Prevalence? Why are they so well investigated?
A
- Single nucleotide polymorphism -> a single-base variation found in more than 1% of the population
- ~4-5 million SNPs per individual, over 600 million reported
- Ease of analysis
5
Q
Common structural variants in DNA (x6):
A
- Deletion
- Insertion
- Duplication
- Inversion
- Translocation
- Copy number variation (e.g. microsatellites)
6
Q
How are SNVs different to SNPs?
A
- Don’t have the 1% of population requirement
- V = variant (single nucleotide variant)
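The SNP/SNV distinction above can be sketched as a simple frequency check. This is a minimal illustration only: the function name `classify_variant` and its parameters are assumptions made for this example, and the 1% cutoff is the one stated on the cards.

```python
def classify_variant(allele_frequency, threshold=0.01):
    """Label a single-nucleotide change as 'SNP' or 'SNV'.

    Hypothetical helper: the 1% threshold comes from the cards,
    everything else is an illustrative assumption.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    # Every SNP is also an SNV; only variants seen in more than
    # 1% of the population earn the SNP label.
    return "SNP" if allele_frequency > threshold else "SNV"


print(classify_variant(0.05))   # common variant
print(classify_variant(0.001))  # rare variant
```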
7
Q
Haplotypes: What are they? (Stats included)
A
- Haplotype = a set of closely linked genetic markers or DNA variations on a chromosome that tend to be inherited together
- SNPs within a block (~5 kb) can stay associated for many generations, e.g. a disease susceptibility allele and a marker SNP in the same block
- 4-6 alternative haplotypes for each block, with around 20 SNPs per block
- Consider humans as haplotype mosaic
8
Q
Factors to consider when choosing a sequencing technology:
A
- Cost (experimental, analysis, logistics)
- Time (sample preparation, run time, analysis time, sample transport)
- Information capture (accuracy, feature length, complex variant detection)
9
Q
Key methods for high-throughput exploration of genetic information:
A
- Platform: Sanger sequencing, short-read NGS or long-read NGS
Inputs:
- Whole inputs (either genome or transcriptome)
- Amplicon (PCR used to amplify particular gene -> cheaper, more specific)
- Enrichment/depletion (slightly wider net than amplicon e.g. sequencing exons only, but still narrowing things down)
- Other: Microarray (high-throughput)
10
Q
Define read, assemble and map:
A
- Read: The sequence corresponding to a DNA fragment
- Assemble: Aligning and merging reads to reconstruct the original DNA sequence
- Map: Determining where the reads originated from in a genome
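As a toy illustration of mapping, the sketch below finds where each read occurs in a reference by exact string matching. Real aligners use indexed, mismatch-tolerant algorithms; the function name `map_reads` and the example sequences are assumptions made up for this card.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} using exact matching only."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Keep searching one base further along for repeat hits.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


reference = "ACGTTGCAACGT"          # made-up reference sequence
reads = ["ACGT", "TGCA", "GGGG"]    # made-up reads
print(map_reads(reference, reads))
# {'ACGT': [0, 8], 'TGCA': [4], 'GGGG': []}
```

A read with two hits (here `ACGT`) shows why repetitive regions are hard to map with short reads, and a read with no hits is left unmapped.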
11
Q
Define depth and coverage:
A
- Depth: Number of times sequencing reads cover a specific region of the genome
- Coverage: Context dependent; similar to depth when discussing how much sequencing is done (e.g. 4 fold, 4X, shallow or deep), whereas when discussing alignment it usually means % covered by reads
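The two senses of coverage can be made concrete with a small calculation. This sketch assumes reads are already mapped and described as hypothetical `(start, length)` pairs; it reports per-base depth, mean depth (the "4X" sense) and the fraction of bases covered by at least one read (the "% covered" sense).

```python
def depth_and_coverage(region_length, alignments):
    """alignments: list of (0-based start, read length) pairs (illustrative format)."""
    depth = [0] * region_length
    for start, length in alignments:
        # Clip each read to the region before counting.
        for pos in range(max(start, 0), min(start + length, region_length)):
            depth[pos] += 1
    mean_depth = sum(depth) / region_length
    breadth = sum(1 for d in depth if d > 0) / region_length
    return depth, mean_depth, breadth


depth, mean_depth, breadth = depth_and_coverage(10, [(0, 4), (2, 4), (3, 4)])
print(depth)       # [1, 1, 2, 3, 2, 2, 1, 0, 0, 0]
print(mean_depth)  # 1.2  -> "1.2X"
print(breadth)     # 0.7  -> 70% of the region covered
```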
12
Q
How are samples typically prepared for sequencing:
A
- Long strands of DNA are fragmented (physical, chemical or enzymatic methods) -> small pieces
- Sequence the individual reads
- Either assemble the reads together through overlaps or map the reads to reference genome
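The "assemble through overlaps" step can be sketched with a greedy merge: repeatedly join the two reads with the longest suffix-prefix overlap. This is a simplified teaching example, not how production assemblers work; `overlap`, `greedy_assemble`, the `min_len` parameter and the reads are all assumptions for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads do not overlap; leave them as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Made-up fragments of the sequence ATGCGTACGTTAGC
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
# ['ATGCGTACGTTAGC']
```

Greedy merging can misassemble repetitive sequence, which is one reason mapping to a reference genome is often preferred when a reference exists.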
13
Q
Sanger sequencing: Read length and benefits
A
- 800-1000 bp
- Fast and cost effective
14
Q
Sanger sequencing: Challenges and use in medical genetics
A
- Low sensitivity, requires higher sample input
- Often used to diagnose disease by sequencing known genomic regions (e.g. BRCA)
15
Q
Short read NGS: Read length and benefits
A
- 20-500 bp
- High sensitivity, ability to discover new variants, low sample input