Genome Sequencing Methods Flashcards
Physical mapping
BACs = bacterial artificial chromosomes
cloning vectors for larger pieces of DNA
physical mapping: provides a path of minimally overlapping clones along a chromosome
BAC
physical mapping
bacterial artificial chromosomes
BAC ends are sequenced to provide landmarks in the genome
BAC clones can be sequenced individually (clone by clone); this approach is now outdated and has been replaced by the whole genome shotgun method
Shotgun sequencing
whole genome sequencing
- fragment whole genome into 1-3kb pieces
- clone fragments into vectors (make a library)
- sequence clones from both ends (paired-end sequencing)
- align using computational methods
- individual reads are aligned into contigs; contigs are joined into scaffolds
- many gaps often remain
- can be problematic for large genomes with many TEs that are similar to each other and with repetitive DNA
- followed by genome annotation
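The read-to-contig step above can be sketched with a toy greedy overlap merge. This is only an illustration, not how production assemblers work: the function names `overlap` and `greedy_assemble` and the `min_len` cutoff are hypothetical, reads are assumed to be error-free and on the same strand, and repeats will mislead it exactly as the card describes.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair repeatedly; return the contigs left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            # No remaining overlaps: what is left are separate contigs (gaps).
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGT", "GCGTAC", "TACGGA"]
print(greedy_assemble(reads))
```

Reads that share no overlap stay as separate contigs, which is the toy equivalent of the gaps that remain after a real assembly.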
Contigs
contiguous sequences assembled from individual reads that align together
- contigs are linked together to form scaffolds
scaffolds
sets of contigs connected together in order
Issues with shotgun sequencing
many gaps remain
large genomes can have many TEs that are similar to each other, plus repetitive DNA = misalignment etc.
Genome annotation
After shotgun sequencing
gene-finding programs (FGENESH and GENSCAN)
- repeat masking done to hide repeated regions
- exon/intron structure predicted by programs
- transcriptome comparisons to find expressed genes and intron/exon junctions
- putative functions found by comparisons to related species
- detection and annotation of non-coding RNAs and TEs
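The repeat-masking step can be pictured as replacing known repeat coordinates with a placeholder base before gene finding. The sketch below assumes the repeat intervals are already known (finding them is the hard part and is not shown); `mask_repeats` and its parameters are hypothetical names for illustration.

```python
def mask_repeats(sequence, repeat_intervals, mask_char="N"):
    """Hard-mask repeats: replace bases inside each interval with mask_char.

    Intervals are assumed to be 0-based and end-exclusive (start, end) pairs.
    """
    masked = list(sequence)
    for start, end in repeat_intervals:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


print(mask_repeats("ACGTACGTAC", [(2, 5)]))
```

The masked sequence keeps its original length, so gene coordinates predicted on it still map back to the unmasked genome.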
Next generation sequencing
“2nd generation”
Ultra high throughput
-genome sequencing much faster
- 454, Illumina, Ion Torrent
- millions of sequence reads obtained per run, instead of the 96 or 384 per run with conventional Sanger sequencing
- 454: sequencing by synthesis; no longer used
- Illumina sequencing: higher throughput than 454
– short reads of 100-150 bp (often paired-end)
– used for re-sequencing to look for polymorphisms
– now used (though not often) for de novo sequencing; assembling short reads is challenging, so it needs to be paired with other technologies
- Ion Torrent: intermediate between 454 and Illumina; reads comparable to 454
Single molecule sequencing
3rd generation
- using DNA polymerase (SMRT)
- does not require amplification of template
- SMRT sequencing and Oxford Nanopore’s MinION
– long reads (2400-4000 bp)
- moderate throughput, similar to 454
- good for de novo sequencing: long reads = easier assembly
- can sometimes be paired with other technologies, such as Illumina, to correct errors
Resequencing of genomes
multiple lines, cultivars, accessions, or ecotypes of a species with a sequenced reference genome can be sequenced
- find variants etc
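Finding variants against a reference can be sketched as a position-by-position comparison. This minimal example assumes the sample sequence has already been aligned to the reference and has the same length (real pipelines work from read alignments and quality scores); `find_snps` is a hypothetical helper name.

```python
def find_snps(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions with a gap ('-') or unknown base ('N') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1):
        if ref_base != alt_base and not {ref_base, alt_base} & {"-", "N"}:
            snps.append((pos, ref_base, alt_base))
    return snps


print(find_snps("ACGTACGT", "ACGAACGT"))
```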
Genotyping
use of DNA data to analyze relationships in or among populations
- GBS (genotyping by sequencing) and RAD sequencing
- uses Illumina sequencing of restriction-digested DNA, with barcoding to sequence many samples in one lane
- used in population and evolutionary genetic studies
- previously AFLPs and microsatellites were used, but they are not as efficient
- data allow for analysis of SNPs among individuals
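The barcoding idea can be sketched as sorting pooled reads back to their samples by the barcode at the start of each read. This is a simplified illustration: it assumes exact barcode matches, that no barcode is a prefix of another, and the barcode sequences and sample names are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using the barcode at the start of each read.

    The barcode is trimmed from assigned reads; reads with no matching
    barcode are collected under the key 'unassigned'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample1", "TTAG": "sample2"}
pooled = ["ACGTTTGCA", "TTAGGGCCA", "GGGGGGGGG"]
print(demultiplex(pooled, barcodes))
```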
Main purposes for RNA-seq
transcriptome sequencing
- reference transcriptome: RNA-seq used to obtain a set of reference transcripts
Expression profiling
- compare gene expression levels in 2 or more samples from RNA-seq data
Illumina
millions of 100-150bp reads
sometimes paired end
useful for expression profiling: high depth sequencing
can be used for de novo reference transcriptome sequencing, but assembly is challenging with short reads
- normalize read counts to the length of the gene; more normalized reads = higher mRNA expression
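One common way to normalize reads to gene length is RPKM (reads per kilobase of transcript per million mapped reads); it is used here as an example of the idea and is not necessarily the measure the card has in mind. The gene names, counts, and lengths below are invented for illustration.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Reads per kilobase per million mapped reads for each gene.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    total_reads = sum(read_counts.values())
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in read_counts.items()
    }


# Hypothetical counts: both genes have 500 reads, but geneB is twice as long.
counts = {"geneA": 500, "geneB": 500}
lengths = {"geneA": 1000, "geneB": 2000}
print(rpkm(counts, lengths))
```

With equal raw counts, the shorter gene ends up with the higher normalized value, which is why raw read counts alone cannot be compared between genes of different lengths.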
454 and Ion Torrent
Longer reads than Illumina, easier to align together or to a reference genome
very useful for reference transcriptome sequencing
not as widely used for expression profiling studies as Illumina (read depth lower)
PacBio SMRT and Oxford Nanopore’s MinION
long reads sequence entire transcripts
very useful for reference transcriptome sequencing
allows sequencing of isoforms