Genome Analysis Flashcards
What are the two basic approaches of next generation sequencing?
Random (shotgun) sequencing- sequence random bits and stitch them together with a computer- better for small genomes such as bacterial ones.
Targeted sequencing of an overlapping set of large fragments- obligatory for large plant genomes because of their size.
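A minimal sketch of the "stitch them together with a computer" step, assuming error-free reads and a simple greedy merge of the largest suffix/prefix overlap. The reads and the function names `overlap` and `greedy_assemble` are invented for illustration; real assemblers are far more sophisticated.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing overlaps any more; leftovers stay as separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


# Hypothetical toy reads sampled from one short sequence
toy_reads = ["CGTACCT", "ATGCGTA", "ACCTGA"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

With unique sequence like this the reads collapse into a single contig; the approach breaks down once repeats make the overlaps ambiguous.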
Why does random sequencing not work for large genomes?
Many sequence reads are entirely internal to a sequence that is multi-copy and dispersed across the genome – you cannot use them to join two sequence contigs.
You can't line the repeats up, as the copies are all identical.
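A toy illustration of the repeat problem, assuming exact string matching stands in for read alignment. The genome, the repeat, and the helper `find_all` are all made up for illustration.

```python
def find_all(genome, read):
    """Return every start position where read matches genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical genome: the same repeat dispersed at three places
REPEAT = "ACGTACGTAC"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG" + REPEAT + "AATCG"

internal_read = "GTACGTA"   # lies entirely inside the repeat
anchored_read = "TCACGTA"   # starts in unique sequence and runs into the repeat

internal_hits = find_all(genome, internal_read)
anchored_hits = find_all(genome, anchored_read)
print("internal read matches at:", internal_hits)
print("anchored read matches at:", anchored_hits)
```

The read internal to the repeat fits every copy equally well, so it cannot say which two contigs belong together; the read anchored in unique sequence has only one possible position.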
How do you analyse larger plant genomes?
Sequence only the expressed genes. Two basic ways:
- NGS sequencing of cDNA = “RNA-seq”. No need to clone the cDNA, just do NGS.
- Exome capture NGS.
How does exome capture NGS work?
Can harvest the bits of interest by synthesising oligonucleotide probes that are complementary to part of them and fixing the probes to a solid support.
The probes are attached to beads- these hybridise with the matching DNA- and only the fragments stuck to probes are sequenced.
Need some sequence info in the first place, to design the probes.
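A rough sketch of the capture idea, assuming hybridisation can be modelled as an exact match between a fragment and the reverse complement of a probe (real hybridisation tolerates some mismatch). The probe, the fragments, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(fragments, probes):
    """Keep only fragments containing a site complementary to some probe."""
    targets = [reverse_complement(p) for p in probes]
    return [f for f in fragments if any(t in f for t in targets)]


# Hypothetical probe designed from known exon sequence
probes = ["TTGACC"]  # binds the site GGTCAA
fragments = ["AAGGTCAATT", "CCCCCCCCCC", "TGGTCAAG"]

captured = capture(fragments, probes)
print(captured)  # only these fragments would go on to be sequenced
```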
What are genetic maps?
What are physical maps?
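Genetic maps- distances are measured in genetic map units (based on recombination frequency)- positions are mapped by genes or genetic markers (including ones that give visible characteristics).
Physical maps- an ordered set of clones or inserts with markers on them (can be the same as the genetic markers)- distances are in base pairs.
The two maps can be linked together through shared markers.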
How can you sequence large plant genomes?
Targeted sequencing of an overlapping set of large fragments.
What is required for sequencing large plant genomes?
A high density genetic map with loads of mapped markers.
BAC library(s) – i.e. the entire genome in cloned form.
A “minimum tiling path” set of BAC clones (can be stored in vectors).
Next generation sequencing – LOTS of it!
Bioinformatics- powerful computation resources.
Need backup.
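A minimal sketch of picking a minimum tiling path, assuming every BAC clone already has a start and end position on the physical map and using a standard greedy interval-cover approach. The clone names, coordinates (in kb), and the function `minimum_tiling_path` are hypothetical.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones that cover region_start..region_end.

    clones maps a clone name to its (start, end) position.
    Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Among clones that start inside the covered part, take the one reaching furthest
        while i < len(ordered) and ordered[i][1][0] <= covered:
            if best is None or ordered[i][1][1] > best[1][1]:
                best = ordered[i]
            i += 1
        if best is None or best[1][1] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[1][1]
    return path


# Hypothetical BAC clones with positions in kb
clones = {
    "BAC_A": (0, 120),
    "BAC_B": (40, 150),
    "BAC_C": (100, 230),
    "BAC_D": (140, 260),
    "BAC_E": (250, 380),
    "BAC_F": (200, 300),
}

path = minimum_tiling_path(clones, 0, 380)
print(path)
```

Only the clones on the path need to be sequenced; the others are redundant for coverage.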
What are BAC libraries?
Bacterial Artificial Chromosome.
Effectively a plasmid, with the hardware to survive as a pseudochromosome in bacteria. Big and stable. The yeast analogues (YACs) that were used before kept recombining. Don’t get many copies per cell. These are our bite-sized pieces.
Used to clone large inserts in E. coli.
Insert size typically 100-150kb.
Can be grown like a bacterial plasmid and screened for gene content by hybridization or PCR.
What are the applications of BAC libraries?
Whole genome sequencing.
Fluorescent in situ hybridisation (can label the BAC with fluorescent labels and hybridise it to chromosomes to see where the BAC comes from).
Construction of integrated genetic and physical maps.
Positional cloning.
Fine mapping of interesting genes.
Chromosome walking and contig assembly.
What is fluorescent in situ hybridisation (FISH)?
Visual mapping of DNA clones on the chromosome.
Chromosome Assignment.
Make a DNA probe fluorescent, hybridise it to a chromosome spread and see where the DNA sticks.
Can be used to work out relationship between genetic and physical distance of genes.
How can genetic and cytogenetic maps be compared?
The scale of the genetic map relative to the physical map can change a lot along a chromosome.
The middle bit of the physical map disappears from the genetic map- there is no recombination in this region.
Genetic maps only work with recombination- you won’t get a distance where there is no recombination.
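A small worked example of comparing the two maps, assuming each marker has both a physical position (Mb) and a genetic position (cM). The marker names and values are invented to show a region with no recombination; they are not real data.

```python
# Hypothetical markers: (name, physical position in Mb, genetic position in cM)
markers = [
    ("m1", 0.0, 0.0),
    ("m2", 10.0, 20.0),
    ("m3", 50.0, 20.0),  # 40 Mb further along, but no extra genetic distance
    ("m4", 60.0, 40.0),
]


def recombination_rates(markers):
    """Return (interval, cM per Mb) for each pair of neighbouring markers."""
    rates = []
    for (a, mb_a, cm_a), (b, mb_b, cm_b) in zip(markers, markers[1:]):
        rates.append((f"{a}-{b}", (cm_b - cm_a) / (mb_b - mb_a)))
    return rates


rates = recombination_rates(markers)
for interval, rate in rates:
    print(f"{interval}: {rate:.2f} cM/Mb")
```

The m2-m3 interval is large on the physical map but has zero length on the genetic map, which is the "middle bit disappears" effect.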
How does BAC fingerprinting work?
If the ends of two clones overlap, the clones will have restriction fragments in common.
Could instead sequence the ends of all the clones.
But when dealing with a big genome with lots of repeats, a sequenced end could match many places, not just one.
So need to get a fingerprint of the bit that overlaps and see which other clones it matches.
Minimum tiling path- choose clones so that the overlaps are as small as possible.
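A minimal sketch of fingerprint comparison, assuming each clone's fingerprint is a list of restriction fragment sizes in bp and that two bands match when their sizes agree within a tolerance. The sizes, the 10 bp tolerance, and the function `shared_bands` are made up for illustration; real fingerprinting software scores overlaps statistically rather than with a raw count.

```python
def shared_bands(fp_a, fp_b, tolerance=10):
    """Count bands in fp_a that pair with a not-yet-used band in fp_b."""
    unused = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for j, other in enumerate(unused):
            if abs(size - other) <= tolerance:
                shared += 1
                del unused[j]  # each band can only be matched once
                break
    return shared


# Hypothetical restriction fragment sizes (bp) for three BAC clones
clone_1 = [1200, 2500, 3100, 4800, 7600]
clone_2 = [3105, 4795, 7600, 9000, 11000]
clone_3 = [800, 1500, 5200, 6400]

overlap_1_2 = shared_bands(clone_1, clone_2)
overlap_1_3 = shared_bands(clone_1, clone_3)
print("clone 1 vs clone 2:", overlap_1_2, "shared bands")
print("clone 1 vs clone 3:", overlap_1_3, "shared bands")
```

Clones 1 and 2 share several bands, so they probably overlap; clones 1 and 3 share none, so there is no evidence they are neighbours.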
What is Illumina?
Illumina is the dominant NGS method.
Short reads (76-150bp).
300M reads per lane.
200Gb+ per run.
Each lane costs about £1500.
Run would be a weekend.
Gb- equivalent to about 7 human genomes.
Start with DNA, chop it up, and fill in the ragged ends with an enzyme. Add adaptors- these are the recognition point for sequencing and the hook to stick the fragment to the machine. Has the same adaptor at both ends. Amplify each fragment up to a few thousand copies, then use fluorescence-based sequencing to sequence the clones.
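A quick arithmetic check on the figures above, assuming yield is simply reads per lane times read length and taking the human genome as roughly 3.1 Gb (that genome size is an assumption, not a figure from these cards).

```python
HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate human genome size


def lane_yield_bp(reads_per_lane, read_length_bp):
    """Total bases from one lane = number of reads x read length."""
    return reads_per_lane * read_length_bp


reads_per_lane = 300_000_000  # 300M reads per lane, as quoted above
for read_length in (76, 150):
    bases = lane_yield_bp(reads_per_lane, read_length)
    genomes = bases / HUMAN_GENOME_BP
    print(f"{read_length} bp reads: {bases / 1e9:.1f} Gb per lane, "
          f"about {genomes:.1f} human genome equivalents")
```

Under these assumptions a lane of 76 bp reads gives roughly 7 human genome equivalents of raw sequence.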
How does Illumina work on a molecular level?
Polymerase copies the first bit. Melt away the blue molecule.
The copy is bolted to the machine.
The purple end is sticky to the blue end- capable of looping round and joining with a little blue.
Can copy round from the little blue.
Unstick the strands- this produces clusters of identical sequences all stuck to the machine. Amplified up so they can produce a detectable signal.
Anneal a primer to the purple end of the clones, pointing down, and then add bases one at a time. Light flashes in different colours, and the colour indicates which base has been added.
The signal is the same for all copies in a cluster since they are clones.
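A toy sketch of turning the flashes into a read, assuming one colour is recorded per cluster per cycle. The colour-to-base table is hypothetical (real instruments use their own dye chemistry), as are the cluster names and the function `call_read`.

```python
# Hypothetical colour-to-base assignment, for illustration only
COLOUR_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_read(colours_per_cycle):
    """Translate the colour seen at each cycle into a base sequence."""
    return "".join(COLOUR_TO_BASE[colour] for colour in colours_per_cycle)


# Each cluster is a patch of identical clones, so it flashes one colour per cycle
flashes = {
    "cluster_1": ["red", "green", "yellow", "yellow", "blue"],
    "cluster_2": ["blue", "blue", "green", "red", "yellow"],
}

reads = {name: call_read(colours) for name, colours in flashes.items()}
print(reads)
```

One base is read per cycle, so the number of cycles sets the read length.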
How can small gaps in the genome be bridged?
Small gaps can be bridged by paired-end NGS (like Illumina, but each molecule is sequenced from both ends).
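A minimal sketch of how read pairs can size a gap, assuming the fragment (insert) length is known, one mate maps near the end of contig A, and the other maps near the start of contig B. The coordinates, the 500 bp insert size, and the function `estimate_gap` are hypothetical.

```python
from statistics import mean


def estimate_gap(pairs, contig_a_len, insert_size):
    """Estimate the gap between contig A and contig B from bridging pairs.

    Each pair is (start_on_a, end_on_b): where the fragment starts on
    contig A, and where it ends on contig B (counted from B's first base).
    """
    estimates = []
    for start_on_a, end_on_b in pairs:
        on_a = contig_a_len - start_on_a  # part of the fragment lying on A
        on_b = end_on_b                   # part of the fragment lying on B
        estimates.append(insert_size - on_a - on_b)
    return mean(estimates)


# Hypothetical pairs bridging two contigs
bridging_pairs = [(9800, 150), (9700, 60), (9900, 240)]
gap = estimate_gap(bridging_pairs, contig_a_len=10_000, insert_size=500)
print("estimated gap:", gap, "bp")
```

Whatever part of the fragment is not accounted for on either contig must lie in the gap, so averaging over several pairs gives an estimate of the gap size and confirms the contigs are neighbours.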