Molecular Genetics (Gary Barker 16-18) Flashcards
How long did it take to first sequence the human genome?
13 years (1990-2003), using capillary Sanger sequencing.
How many human genomes can an Illumina sequencer read in one day?
20!
- 1,000 billion bases (1 terabase) per run
Traditionally, how does shotgun sequencing work?
- used to decode a genome by breaking it into smaller fragments, which can be ligated into BACs and sequenced individually.
- A random BAC is selected, and its sequences are then ordered based on the overlaps between them.
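The overlap-ordering step can be sketched in Python. This is a toy greedy assembler under simplifying assumptions (error-free reads, one strand, no repeats), not how production assemblers work; the function names, the `min_len` threshold and the example reads are illustrative only.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return seqs  # nothing left to merge: these are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


# Made-up reads that tile a short sequence with 3-base overlaps.
contigs = greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"])
print(contigs)
```

Every pair of reads is compared on every pass, which is why this approach stops being practical as the number of reads grows.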
Explain how next generation (Illumina) sequencing works.
- the genome is fragmented and tagged with multicoloured fluorescent probes.
- no prior knowledge of the sequence is required.
Why is it better to have a higher coverage?
Coverage is the number of reads that include a given nucleotide position.
- Higher coverage leaves fewer gaps in the genome (10-fold coverage greatly reduces the chance that a base has 0 coverage).
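The effect of coverage on gaps can be estimated with a small calculation. The sketch below assumes reads land uniformly at random, so the number of reads over a base is roughly Poisson distributed and the chance of 0 coverage at mean coverage c is about e^(-c). The 3.2-billion-base genome size is an assumed round figure for human, not a value from these notes.

```python
import math


def prob_uncovered(coverage: float) -> float:
    """Poisson estimate of the chance that a given base is hit by no read."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size: int, coverage: float) -> float:
    """Expected number of bases with 0 coverage under the same model."""
    return genome_size * prob_uncovered(coverage)


GENOME_SIZE = 3_200_000_000  # assumed approximate human genome size

for c in (1, 5, 10, 30):
    print(c, prob_uncovered(c), expected_uncovered_bases(GENOME_SIZE, c))
```

At 10-fold coverage the per-base chance of being missed is about 4.5e-5 under this model; real libraries are less uniform, so actual gaps tend to be more common than the estimate.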
Overlap finding for large genomes (e.g. human) is not practical. What is an alternative?
K-mer based assembly:
- sequences are fragmented further into K-mers (subsequences of length K)
- every K-mer overlaps with the one next to it, shifted by a single base.
- a computer can handle this better as it needs to find fewer overlaps
- Use the smallest K-mer size that produces the best N50 value.
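The two ideas in this answer, splitting a read into K-mers and scoring an assembly by N50, can be sketched as follows. The function names and the contig lengths are illustrative assumptions; N50 is computed with the usual definition (the length of the contig at which the running total, longest first, reaches half of the assembly).

```python
def kmers(seq: str, k: int) -> list[str]:
    """All K-mers of seq, each shifted one base from the previous one."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def n50(contig_lengths: list[int]) -> int:
    """Contig length at which the longest-first running sum reaches 50%."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


read = "ATGCGT"
for a, b in zip(kmers(read, 4), kmers(read, 4)[1:]):
    # Neighbouring K-mers share K-1 bases.
    print(a, b, a[1:] == b[:-1])

# Made-up contig lengths for illustration.
print(n50([80, 70, 50, 40, 30, 20]))
```

In practice an assembler would be run at several values of K and the N50 of each result compared.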
Repeats in eukaryotic genomes can cause problems in shotgun sequencing as they can get ‘lost’. What are two solutions to this problem?
1) use Illumina mate-pair libraries, where only the ends of the repeat need to be sequenced.
2) use an Oxford Nanopore sequencer, which can cover gaps as it can read long sequences.
- holds the record for sequencing 1 megabase in a single read
Why are prokaryotic genomes easier to sequence than eukaryotic ones?
- Prokaryotes have few repeats; eukaryotes have lots of repeats.
- They can be assembled directly from Illumina paired-end reads; eukaryotes need mate-pair reads or nanopore reads longer than the repeats.
- cheaper, as eukaryotes require many Illumina reads.
What are the advantages and disadvantages of the Illumina HiSeq and the Oxford Nanopore?
- Illumina is highly accurate (>99.9%) but nanopores have a low accuracy (90%).
- Nanopores can sequence long reads and are quicker, whereas Illumina can only sequence short reads.
- Nanopores can span gaps/repeats and are portable.
- Illumina is better for whole genome sequencing.
Give 4 reasons as to why we bother sequencing genomes.
1) to characterise all genes and regulatory elements.
2) to identify pathways and co-located genes.
3) to compare genomes (e.g. mutant vs. wild type).
4) to identify candidate markers (SNPs associated with phenotypes)
What are the advantages of an Exome Capture Array?
- time-saving and cost-effective compared to PCR based methods
- can concentrate on just the exons in genomes that have not been sequenced yet or are too expensive to sequence in full (e.g. just look at grain size in wheat for bread).
How does exome-capture work?
- probes called capture baits are designed based on a cDNA copy of mRNA.
- Oligonucleotide/ bait sequences are tiled out on an array.
- target genomic DNA is extracted and sonicated, producing fragments with coding regions, non-coding regions and regions with both.
- much of the non-coding DNA is not captured by hybridisation to the baits, so we are left with coding regions to study.
Locating ORFs can be difficult, especially where introns break up the coding region. How can we locate genuine ORFs?
- usually the longest one
- look at codon usage; there are many ways to code for an amino acid but some codons are preferred to others. Real ORFs will show a codon bias (e.g. AGC for serine) but non-coding ORFs will use codons equally.
- Introns usually start with AGGGTAAGT and end with 6 pyrimidines followed by any base and then CAG (YYYYYYNCAG)
- observe expression levels via RNAseq or a microarray
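The first two checks (longest ORF, codon usage) can be sketched in Python. This is a deliberately simplified illustration: it scans only the three forward reading frames, assumes an intron-free sequence, and uses a made-up example sequence; the function names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Longest ATG-to-stop stretch in the three forward reading frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def codon_usage(orf: str) -> Counter:
    """Count how often each codon appears in an in-frame ORF."""
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


example = "CCATGAGCAGCTCCTAAGG"  # made-up sequence for illustration
orf = longest_orf(example)
print(orf)
print(codon_usage(orf))
```

In the toy ORF, serine is encoded twice by AGC and once by TCC; comparing such counts across many candidate ORFs is one way to look for the codon bias described above.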
Compare RNAseq and microarrays for analysing gene expression.
- RNAseq covers all expressed genes, but microarrays involve making fluorescent probes for already known genes.
- RNAseq works for non-model species but microarrays are only useful for previously characterised species.
- RNAseq has no setup cost but costs £1000 per sample; microarrays are expensive to make but can then run multiple samples at £250 each.
- RNAseq may be dominated by a few highly expressed genes; in microarrays, any single gene can fluoresce without affecting others.
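The cost trade-off can be made concrete with a break-even calculation. The per-sample prices come from the comparison above; the microarray setup cost is not given in these notes, so the £15,000 used here is a purely hypothetical figure for illustration.

```python
import math

RNASEQ_PER_SAMPLE = 1000      # pounds, from the notes
MICROARRAY_PER_SAMPLE = 250   # pounds, from the notes


def rnaseq_cost(n_samples: int) -> int:
    """Total RNAseq cost: no setup cost, fixed price per sample."""
    return RNASEQ_PER_SAMPLE * n_samples


def microarray_cost(n_samples: int, setup_cost: int) -> int:
    """Total microarray cost: one-off setup plus price per sample."""
    return setup_cost + MICROARRAY_PER_SAMPLE * n_samples


def break_even_samples(setup_cost: int) -> int:
    """Smallest sample count at which microarrays are strictly cheaper."""
    saving_per_sample = RNASEQ_PER_SAMPLE - MICROARRAY_PER_SAMPLE
    return math.floor(setup_cost / saving_per_sample) + 1


HYPOTHETICAL_SETUP = 15_000  # assumed value, not from the notes
n = break_even_samples(HYPOTHETICAL_SETUP)
print(n, rnaseq_cost(n), microarray_cost(n, HYPOTHETICAL_SETUP))
```

Below the break-even sample count RNAseq is the cheaper option; above it the array's setup cost has been recovered.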
If a BLAST search comes back as inconclusive, what four questions can you investigate to determine a gene’s function?
- What happens to expression under various stresses?
- What happens when you knock the gene out?
- What happens when you overexpress the gene?
- What genes have similar responses to stresses?