sequencing & assembly Flashcards
1
Q
stages of genome sequencing
A
- fragmentation/cloning
- fragment library, amplification
- sequencing
- from both ends to get multiple reads
- processing
- base calling, quality assessment, repeat masking
- trim read ends (quality falls as polymerase affinity decreases)
- assembly
- overlapping reads → contigs
- contigs → scaffolds
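The "overlapping reads → contigs" step can be illustrated with a toy greedy merge. This is a minimal sketch, not how production assemblers work: the function names, the `min_overlap` parameter and the example reads are hypothetical, it assumes error-free reads on one strand, and it will mis-join repeats.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                olen = overlap_length(a, b, min_overlap)
                if olen > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No remaining overlaps: what is left are the contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical example reads (assumed error-free, same strand).
reads = ["ACGTAC", "TACGGA", "GGATTC"]
print(greedy_assemble(reads))  # ['ACGTACGGATTC']
```

Reads that share no overlap of at least `min_overlap` bases stay as separate contigs, which is where scaffolding takes over.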
2
Q
PacBio
A
- 3rd gen
- longest reads ~20,000 bp
- high error rate but errors are random
- multiple reads → consensus
- up to 99.999% consensus accuracy
3
Q
phred scores
A
- quality score
- estimated confidence in each base call
- use to:
- filter and trim reads
- create consensus
- distinguish between variants and errors
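One way the "filter and trim reads" use might look in practice is trimming low-quality bases from the 3' end of a read. This is a rough sketch only: the function name and the cut-off of 20 are assumptions chosen for illustration, and real trimmers typically use sliding windows rather than a single-base rule.

```python
def trim_3prime(sequence, qualities, min_quality=20):
    """Drop trailing bases whose Phred score is below `min_quality`.

    `min_quality=20` is an illustrative assumption, not a fixed standard.
    """
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


# Hypothetical read whose quality decays towards the 3' end.
seq, quals = trim_3prime("ACGTAC", [38, 37, 35, 30, 12, 8])
print(seq, quals)  # ACGT [38, 37, 35, 30]
```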
4
Q
Q value
A
- given by phred
- QV = -10log10(Pe)
- Pe = probability that base call is an error
- ignore call if QV is lower than 30
- QV < 30 means < 99.9% accuracy (Pe > 0.001)
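The QV formula can be checked numerically. A minimal sketch follows; the function names are hypothetical, and `decode_phred33` assumes the common Phred+33 ASCII encoding of FASTQ quality strings, which the card itself does not mention.

```python
import math


def error_prob_to_phred(pe):
    """QV = -10 * log10(Pe)."""
    return -10 * math.log10(pe)


def phred_to_error_prob(qv):
    """Inverse of the formula above: Pe = 10^(-QV / 10)."""
    return 10 ** (-qv / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string to scores (assumes Phred+33 encoding)."""
    return [ord(char) - 33 for char in quality_string]


for qv in (10, 20, 30, 40):
    pe = phred_to_error_prob(qv)
    print(f"Q{qv}: Pe = {pe:.4f}, accuracy = {(1 - pe) * 100:.2f}%")

print(decode_phred33("I?5!"))  # [40, 30, 20, 0]
```

Each step of 10 in QV is a tenfold drop in error probability, so Q30 corresponds to 1 error in 1,000 calls.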
5
Q
chastity filter
A
- Illumina base call algorithm
- score the signal intensity of each nucleotide at each position, then filter
- chastity = highest intensity ÷ (highest + 2nd highest intensity) for that position
- below threshold (0.6): base marked N
- otherwise: assign base call
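The chastity calculation as described on this card can be sketched as follows. The function names and intensity values are made up for illustration, and this follows the card's per-base description rather than any vendor implementation.

```python
def chastity(intensities):
    """Highest intensity divided by the sum of the two highest intensities."""
    top, second = sorted(intensities.values(), reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def call_base(intensities, threshold=0.6):
    """Return the brightest base, or 'N' if chastity is below the threshold."""
    if chastity(intensities) < threshold:
        return "N"
    return max(intensities, key=intensities.get)


# Hypothetical intensity readings for two positions.
clean = {"A": 900, "C": 100, "G": 50, "T": 20}   # chastity 0.9
mixed = {"A": 500, "C": 450, "G": 30, "T": 20}   # chastity about 0.53
print(call_base(clean))  # A
print(call_base(mixed))  # N
```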
6
Q
factors affecting quality
A
- end of read deterioration (decreasing polymerase affinity)
- adaptor sequence still attached to reads
- high AT or GC content
- reduced complexity
- homopolymeric tracts
- exact length of the tract is uncertain
- SNPs may be ignored (assumed to be errors)
7
Q
depth of coverage
A
- eliminate errors
- depends on genome complexity, read length, sequencer error rate
- Human Genome Project (HGP): 12x or greater
- each base present in 12 reads on average
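Average depth is commonly estimated as (number of reads × read length) ÷ genome size. That formula is not stated on the card, so treat it as an added assumption; the function names and numbers below are purely illustrative.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Minimum number of reads to reach a target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical run: 1,000,000 reads of 150 bp over a 12.5 Mb genome.
print(mean_coverage(1_000_000, 150, 12_500_000))  # 12.0
print(reads_needed(12, 150, 12_500_000))          # 1000000
```

This is only an average: individual regions can end up with far more or far fewer reads.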
8
Q
paired end sequencing
A
- sequence both fragment ends
- distance between the two reads is known (fragments are size-selected)
- knowing one position anchors the other
- better read alignment
- important for repeats
- improved prediction of structural variations
9
Q
repeats
A
- fragments with identical repeat regions can be wrongly assembled together
- sequence in between is lost
- sequencing may be impossible
10
Q
repeats and paired end reads
A
- read pair: 1 read in unique sequence, 1 in the repeat
- map unique read
- position the second read, as the distance between them is known
- enough paired reads allows sequencing across whole repeat region
- small repeats only
11
Q
mate pairs
A
- span longer distances than paired ends
- several kb vs ~500 bp
- bridge across repeats or structural rearrangements
- don’t sequence repeat but don’t lose information
- fill gaps with paired ends
- helps resolve correct order of repeat fragments
12
Q
scaffolding
A
- resolution of conflicting areas
- order non-overlapping contigs into scaffolds
- gaps with known or predicted size
- spanned by N (unknown sequence)
- bridge contigs with mate pairs
- de novo assembly - gaps remain
- need wet-lab work and paired end reads
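Joining ordered contigs with runs of N can be sketched like this. It assumes the contig order and gap sizes have already been estimated, for example from mate pairs; the function name and inputs are hypothetical.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, filling each gap with N (unknown sequence)."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)
        parts.append(contig)
    return "".join(parts)


# Hypothetical contigs with estimated gaps of 3 bp and 5 bp.
print(build_scaffold(["ACGT", "GGCC", "TTAA"], [3, 5]))
# ACGTNNNGGCCNNNNNTTAA
```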
13
Q
rearrangements and paired end reads
A
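- map read pairs to a reference genome; compare spacing and orientation to expected
- pairs map further apart than expected → deletion (sample is shorter than reference)
- pairs map closer together than expected → insertion (sample is longer than reference)
- wrong way round → inversion
- maps to different region → translocation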
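These rules can be written as a simple classifier. In this sketch, distance is measured as mapped on the reference: pairs mapping further apart than the library insert size imply sequence missing from the sample (deletion), and pairs mapping closer together imply extra sequence in the sample (insertion). The function name, parameters and tolerance are hypothetical; real variant callers look for many supporting pairs, not one.

```python
def classify_pair(mapped_distance, expected_insert, tolerance,
                  same_region=True, expected_orientation=True):
    """Classify one read pair mapped to a reference (illustrative rules only)."""
    if not same_region:
        return "translocation"
    if not expected_orientation:
        return "inversion"
    if mapped_distance > expected_insert + tolerance:
        return "deletion"    # sample lacks sequence the reference has
    if mapped_distance < expected_insert - tolerance:
        return "insertion"   # sample has extra sequence
    return "concordant"


# Hypothetical library: 500 bp inserts, 50 bp tolerance.
print(classify_pair(900, 500, 50))                              # deletion
print(classify_pair(300, 500, 50))                              # insertion
print(classify_pair(510, 500, 50, expected_orientation=False))  # inversion
print(classify_pair(510, 500, 50, same_region=False))           # translocation
```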
14
Q
finishing
A
- fill in gaps by resequencing with a different technology or longer reads
- design PCR primers/probes at contig ends to extend into the gap
- improve consensus
- expensive
15
Q
limitations of assembly
A
- next-gen short read lengths
- AT rich genomes
- repetitive genomes
- de novo sequencing
- no reference
- use multiple technologies