Final Flashcards
What is the phylogenetic inference problem?
Inferring a tree that represents the evolutionary relationships among a set of sequences, based on their DNA sequences.
What is CTMC?
Continuous-time Markov chain
It is a stochastic process that runs along the branches of the tree.
Compare first and second order CTMC.
First: the probability of the next state depends only on the current state
Second: the probability of the next state depends on the current and the previous state
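The difference in order can be seen by counting which symbol follows which context. This is only a sketch on a discrete sequence of positions (not continuous time), and the function name transition_counts is made up for illustration. With order=1 the context is the current symbol; with order=2 it is the previous and current symbols together.

```python
from collections import Counter, defaultdict

def transition_counts(seq, order=1):
    """Count how often each symbol follows each context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        context = seq[i - order:i]  # the `order` symbols immediately before position i
        counts[context][seq[i]] += 1
    return counts

# First order: context is one symbol. Second order: context is two symbols.
first = transition_counts("ACGACGT", order=1)
second = transition_counts("ACGACGT", order=2)
```

Dividing each count by the total for its context would give estimated transition probabilities.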
How can nucleotide substitution be applied to CTMC?
Using a substitution rate matrix that describes the history of substitutions that became fixed in a population. SEE EQUATION IN NOTES.
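The equation from the notes is not reproduced here. As a stand-in, the sketch below assumes the Jukes-Cantor model, the simplest substitution rate matrix, in which every nucleotide changes to every other at the same rate alpha. The function names and the choice of model are assumptions for illustration only.

```python
import math

NUCLEOTIDES = "ACGT"

def jc69_rate_matrix(alpha):
    """Rate matrix Q: every off-diagonal rate is alpha, and each row sums to zero."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def jc69_transition_prob(i, j, alpha, t):
    """Probability that nucleotide i has become nucleotide j after time t.

    This is the closed form of exp(Q * t) for the Jukes-Cantor rate matrix.
    """
    decay = math.exp(-4 * alpha * t)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

# Probabilities of ending in each nucleotide after starting from A on a branch of length t = 2.0
row = [jc69_transition_prob("A", j, alpha=0.1, t=2.0) for j in NUCLEOTIDES]
```

At t = 0 nothing has changed, and as t grows every entry approaches 0.25.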
What are the rules for comparing nucleotide and amino acid sequences?
If two protein sequences are longer than 100 aa, they are considered homologous if at least 25% of positions are identical
If two nucleotide sequences are longer than 100 nt, they are considered homologous if at least 70% of positions are identical
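A minimal sketch of applying these rules of thumb, assuming the two sequences are already aligned to the same length and contain no gaps. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two aligned, equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def likely_homologous(a, b, kind):
    """Apply the rule of thumb: length > 100 and identity >= 25% (protein) or 70% (nucleotide)."""
    threshold = {"protein": 25.0, "nucleotide": 70.0}[kind]
    return len(a) > 100 and percent_identity(a, b) >= threshold
```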
What is BLAST?
Basic local alignment search tool. It is the most widely used heuristic algorithm in bioinformatics.
What are the three goals of BLAST?
Speed: the sizes of databases keep growing.
Sensitivity: must get all (or most) matches
Specificity: all (or most) reported matches must be correct
What is the word assumption for BLAST?
BLAST operates under the assumption that if two sequences are similar, they will have at least one word (a short subsequence) in common.
What are the three steps to the BLAST algorithm?
1) Find all possible words in the query sequence (removing those that score under the threshold)
2) Scan the database for occurrences of these words
3) Score the matches using a substitution matrix
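The three steps can be sketched in a very reduced form. This is not the real BLAST implementation: it assumes exact word matches only, skips the threshold filtering, and scores with a simple match/mismatch scheme instead of a full substitution matrix. All function names and the scoring values are hypothetical.

```python
def query_words(query, w=3):
    """Step 1: every overlapping word of length w in the query, with its start positions."""
    words = {}
    for i in range(len(query) - w + 1):
        words.setdefault(query[i:i + w], []).append(i)
    return words

def find_seeds(query, subject, w=3):
    """Step 2: scan a database sequence for exact occurrences of the query words.

    Returns (query_position, subject_position) pairs.
    """
    words = query_words(query, w)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in words.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds

def score_ungapped(query, subject, i, j, length, match=1, mismatch=-1):
    """Step 3: score `length` aligned positions starting at query[i] and subject[j]."""
    return sum(
        match if query[i + k] == subject[j + k] else mismatch
        for k in range(length)
    )

seeds = find_seeds("ACGTA", "TTACGT", w=3)
```

Building a lookup of query words first is what makes the database scan fast: each database position needs only one dictionary lookup.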
Why does BLAST remove words scored under the threshold?
Low-scoring words occur commonly and are more likely to match by chance than because of true homology. Excluding them therefore saves time.
What is the E value?
The number of alignments with a score of at least s that would be expected by chance.
E = K m n e^(-v s)
E = expected value
m = length of query sequence
n = length of database
s = raw score
v and K = scaling factors
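As a sketch, the E value (K times m times n times e raised to the power of minus v times s) can be computed directly. The function name and the numbers in the example call are made up for illustration; real values of K and v depend on the scoring system.

```python
import math

def expected_hits(m, n, s, K, v):
    """E value: K * m * n * exp(-v * s).

    m = query length, n = database length, s = raw score,
    K and v = scaling factors.
    """
    return K * m * n * math.exp(-v * s)

# Hypothetical values: a higher raw score gives a smaller E value.
e_low_score = expected_hits(m=250, n=1_000_000, s=40, K=0.1, v=0.3)
e_high_score = expected_hits(m=250, n=1_000_000, s=100, K=0.1, v=0.3)
```

Because E scales with m and n, the same raw score is less significant in a larger database.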
What do the different E scores mean?
< 10^-100: identical
10^-50 to 1: maybe random
What are the different types of BLAST?
BLASTN: query and database are nt seq.
BLASTP: query and database are aa seq.
TBLASTN: query is aa and database is nt (the database is translated for the search)
BLASTX: query is nt (translated for the search) and database is aa
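The four combinations above amount to a lookup on the query type and the database type. A minimal sketch, with a hypothetical function name and the labels "nt" and "aa" used as in the list:

```python
BLAST_PROGRAMS = {
    ("nt", "nt"): "BLASTN",
    ("aa", "aa"): "BLASTP",
    ("aa", "nt"): "TBLASTN",
    ("nt", "aa"): "BLASTX",
}

def choose_blast(query_type, database_type):
    """Return the BLAST program for a (query, database) pair of sequence types."""
    return BLAST_PROGRAMS[(query_type, database_type)]
```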
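Describe the polymerase chain reaction.
Denaturation: DNA strands are separated at 94 °C
Annealing: primers are annealed at a lower temperature (up to about 68 °C)
Elongation: primers are extended with dNTPs at 72 °C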
What are the objectives of sequencing?
Quick, accurate, easy, and cheap
What is Sanger sequencing?
Developed by Frederick Sanger, it was the first sequencing method to be automated.
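Describe Sanger sequencing.
DNA is amplified using PCR and is cut up. A radioactively labelled primer is added along with DNA polymerase and dNTPs. The solution is divided between 4 tubes, with a different ddNTP in each. A ddNTP terminates the growing strand wherever it is incorporated, so each tube holds fragments ending at one particular base. Each solution is then run on a gel. Shorter fragments move further, and each successive fragment is one nucleotide longer, so reading the bands from shortest to longest gives the sequence.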
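Reading the gel can be sketched as a toy model. It assumes the input is the newly synthesized strand itself and ignores the primer, the template complement, and incomplete termination. The function names are hypothetical.

```python
def chain_terminated_fragments(strand):
    """For each ddNTP tube, the lengths of the fragments that terminate in that tube."""
    tubes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        # A fragment of this length ends with the ddNTP for `base`.
        tubes[base].append(length)
    return tubes

def read_gel(tubes):
    """Read the four lanes together from the shortest fragment to the longest."""
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

tubes = chain_terminated_fragments("GATTACA")
sequence = read_gel(tubes)
```

Each fragment length appears in exactly one lane, which is why sorting by length recovers the order of the bases.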
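How can you automate Sanger sequencing?
Attach a different fluorescent label to each of the four terminal ddNTPs instead of using a radioactive label. As the fragments pass a detector in order of size, the flashes of colour specify the order of nucleotides.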
What characterizes next generation sequencing?
High degree of parallelization
High throughput
Low cost
Short read length
How do you prepare a library?
Target sequences are fragmented to the desired length either enzymatically or physically.
Oligonucleotide adapters are added to the ends of the target fragments and the final library is quantified for sequencing.
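A toy sketch of the two preparation steps, assuming fragmentation into fixed-length pieces (real fragmentation is random) and using placeholder adapter sequences that are not real adapters. The function name is hypothetical.

```python
def prepare_library(target, fragment_length, left_adapter, right_adapter):
    """Cut the target into pieces of `fragment_length` and add an adapter to each end."""
    fragments = [
        target[i:i + fragment_length]
        for i in range(0, len(target), fragment_length)
    ]
    return [left_adapter + fragment + right_adapter for fragment in fragments]

# Placeholder adapters "GGG" and "CCC" are for illustration only.
library = prepare_library("ACGTACGTAC", 4, "GGG", "CCC")
```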