4.1 Sequence Alignment Flashcards
What is the objective of a global alignment?
The optimal alignment that includes all characters from each sequence (ex: Clustal generates global alignments)
What is the objective of a local alignment?
The optimal alignment that includes only the most similar local region(s)
Ex: BLAST generates local alignments
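A minimal sketch of the global vs. local distinction, assuming simple illustrative scoring (match +1, mismatch -1, gap -2; these values are assumptions, not the defaults of Clustal or BLAST). The global score is computed Needleman-Wunsch style and the local score Smith-Waterman style; `align_score` is a hypothetical helper name.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Return the optimal alignment score of sequences a and b.

    local=False -> global alignment (every character of both sequences is used).
    local=True  -> local alignment (only the best-matching region is scored).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            score[i][j] = cell
    return best if local else score[-1][-1]


global_score = align_score("ACGT", "TTACGTTT")
local_score = align_score("ACGT", "TTACGTTT", local=True)
```

Here the local score rewards only the shared ACGT region, while the global score is pulled down by the gaps needed to span the longer sequence.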
What are the two main groups aligners can be classified into?
- Sequence similarity searching with ranked solutions (one sequence is compared against many, and ranked solutions are returned)
- Sequence similarity searching returning only the optimal solution (many sequences are compared to one)
Why are statistics critical for sequence similarity searching?
Statistics are used to discriminate between real and artifactual matches, using an estimate of the probability that the match occurred by chance
Statistics allow us to rank-order solutions and find the optimal solution
Why are references important in bioinformatic analysis?
- A change in reference/database changes your search space and expect score; it also changes the coordinate structure that sequences are aligned to, so it changes the results
- Can’t move b/w references
- references impact alignment statistics
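A rough sketch of why the reference changes the statistics, assuming the standard Karlin-Altschul relation between a bit score S' and the expect value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function name and the example numbers are made up for illustration.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with m = query_len and n = db_len.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Same hit (same bit score, same query), two hypothetical database sizes.
e_small_db = expect_value(bit_score=40, query_len=100, db_len=10**6)
e_large_db = expect_value(bit_score=40, query_len=100, db_len=10**9)
```

The same match is 1000 times more likely to be a chance hit in the database that is 1000 times larger, so expect values from different references are not directly comparable.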
Why doesn’t BLAST scale to NGS requirements?
- BLAST uses a 3-character word that is extended in both directions and generates a score + expect value
- Alignment of reads from a single human genome re-sequencing experiment would take years
What are some examples where sequence alignment strategies are reference-specific?
Ex 1: Use BWA for genomic alignments
Ex 2: Use the STAR aligner for RNA alignments
Strategy used depends on the research question being asked
What is the objective of short read alignment?
To align 100s of millions of short reads against a known reference
Note: repeats longer than read length are problematic
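A toy illustration of the repeat problem, assuming exact matching only and a made-up reference containing a short tandem repeat (`exact_hits` is a hypothetical helper). A read that falls entirely inside the repeat has more than one equally good position, while a read that spans the repeat boundary can be placed uniquely.

```python
def exact_hits(reference, read):
    """Return every 0-based position where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return hits


# Made-up reference: unique flank + (ACGT x 3) + unique flank.
reference = "TTG" + "ACGT" * 3 + "CCA"

inside_repeat = exact_hits(reference, "ACGTACGT")   # read lies within the repeat
spans_boundary = exact_hits(reference, "TGACGTAC")  # read includes unique flank
```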
What are heuristics in computational bio?
There is a trade-off b/w computational efficiency/resources and accuracy/precision
The goal of heuristics is to produce a good-enough solution in a reasonable time
Make a choice between completeness & speed
What are some (4) examples of heuristics?
- Optimality
- Completeness
- Accuracy and Precision
- Compute time and resources
How is indexing used?
to significantly increase alignment speed by converting the genome &/or reads into an index table of short “words”
T/F: Indexing of the reference genome is 0-based
T
What does position refer to in an index lookup table?
The location in the genome where the sequence occurs
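A minimal sketch of an index lookup table, assuming a plain Python dictionary of fixed-length words (k = 3 here is an arbitrary choice, and `build_index` is a hypothetical helper). Each word maps to the list of 0-based genome positions where it starts.

```python
from collections import defaultdict


def build_index(genome, k=3):
    """Map each k-length word to the 0-based positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


index = build_index("ACGTACGA", k=3)
acg_positions = index["ACG"]  # the word ACG starts at positions 0 and 4
```

Looking up a word is then a single dictionary access instead of a scan of the whole genome, which is where the speed-up comes from.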
How are BLAST and NGS aligners fundamentally different?
BWA extracts a seed from the 5’ end of the read (which has higher quality). The 5’ end serves as the search space anchor
BLAST takes a 3-char word and searches for it all across the read
What are the 3 steps in indexing?
- Deciding on a seed
- Aligning the seed
- Extending the seed
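The three steps above can be sketched as follows, assuming a dictionary-based word index, a seed taken from the first k bases (the 5’ end) of the read, and a simple ungapped extension with a mismatch cap. All names and parameters are hypothetical; this is not how BWA or BLAST is actually implemented.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map each k-length word to its 0-based positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for each acceptable placement of read."""
    seed = read[:k]                                # step 1: decide on a seed
    hits = []
    for pos in index.get(seed, []):                # step 2: align the seed exactly
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue                               # read would run off the end
        # Step 3: extend the seed across the full read, counting mismatches.
        mismatches = sum(r != w for r, w in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "TTGACCGTAGGCTAAC"
index = build_index(reference, k=4)
exact = seed_and_extend("CCGTAGGC", reference, index, k=4)
one_off = seed_and_extend("CCGTAGGA", reference, index, k=4)
too_many = seed_and_extend("CCGTATGA", reference, index, k=4)
```

Only positions where the seed matches exactly are ever extended, which is the heuristic shortcut: placements whose seed region contains an error are missed in exchange for speed.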