Struggle Flashcards
What is bioinformatics?
It is the analysis and conceptualisation of complex biological information.
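What is BLOSUM62?
BLOSUM62 is a substitution matrix used to score protein sequence alignments. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences. They are derived from local, ungapped alignments (blocks) of conserved regions. The 62 means that sequences more than 62% identical were clustered together before substitution frequencies were counted.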
Explain Affine Gap Penalties
They penalise insertions and deletions using two separate costs: a penalty for opening a gap and a penalty for each position the gap is extended, so the total depends on the gap length. Opening a gap costs more than extending one.
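A minimal sketch of the affine cost of a single gap. The parameter values (10 to open, 0.5 to extend) are illustrative assumptions, and the convention used here charges the opening penalty for the first gap position and the extension penalty for each further position; some tools instead charge the extension penalty for every position.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Cost of one gap spanning `length` positions under an affine model.

    gap_open and gap_extend are illustrative values, not taken from any tool.
    """
    if length <= 0:
        return 0.0
    # One opening charge, plus an extension charge for each extra position.
    return gap_open + gap_extend * (length - 1)


print(affine_gap_penalty(1))  # 10.0
print(affine_gap_penalty(5))  # 12.0
```

One gap of length 5 costs 12.0, whereas five separate gaps of length 1 would cost 50.0, so the model favours a few long gaps over many short ones.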
What is In Silico?
Analysis performed on a computer or by computer simulation, for example ligand analysis.
Explain BLAST
BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA and/or RNA. It uses a heuristic to speed up computation.
What is Dynamic Programming?
Dynamic programming solves a complex problem by breaking it into smaller subproblems (states) and reusing their solutions. For alignment, it scores every cell of a matrix to find the optimal alignment. This process is slow for long sequences. The steps are: 1. initialisation 2. scoring the matrix 3. traceback
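The three steps can be sketched with a small global aligner in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration; a real aligner would use a substitution matrix and usually affine gaps.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores)."""
    rows, cols = len(a) + 1, len(b) + 1

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # 1. Initialisation: first row and column hold cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # 2. Scoring: each cell takes the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # 3. Traceback: walk from the bottom-right cell back to the origin.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The matrix has (len(a) + 1) x (len(b) + 1) cells, which is why the method becomes slow for long sequences.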
Protein vs DNA
Protein has 20 characters rather than 4. The genetic code is degenerate, so different codons can encode the same amino acid. Protein comparisons therefore offer a longer look back in time.
Paralogs
Homologous genes that arose from a duplication event.
Why is DNA used?
To identify cDNA and non-coding regions of DNA, and to identify DNA polymorphisms.
Types of Algorithms?
- Uninformative
- Ungapped
- Gapped
Describe a hierarchical approach?
- Different groups are given a chromosome to sequence
- The groups generate bacterial artificial chromosomes (BACs)
- Each BAC is divided into fragments and shotgun sequenced
- High-fidelity maps identify motifs and allow detection of overlapping sequences.
How many Genes were found
51k
How many genes are protein-coding
20k
How many genes are non-coding
20k
What are pseudogenes
Genes that appear to be protein-coding but that mutations have rendered non-coding. 18k found.
How many mRNAs were found, and what does this mean?
98k. On average, about 5 mRNAs are made per protein-coding gene.
Why are MSAs done?
To elucidate functional information about proteins and to perform evolutionary analysis.
How are alignments scored?
- Maximum number of sequences are matched.
- Scoring is done with Sum of Pairs
- Each column is scored by summing all possible matches, gaps and mismatches.
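As a rough sketch of Sum of Pairs scoring, the function below scores every pair of characters in each column and adds the results up. The scores (match +1, mismatch -1, gap -2, gap against gap 0) are assumed values for illustration only.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters from the same column (illustrative values)."""
    if x == "-" and y == "-":
        return 0  # a gap aligned with a gap is ignored in this sketch
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum the pairwise scores of every column of an alignment."""
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(x, y) for x, y in combinations(column, 2))
    return total


# Columns score 3, -5 and 3 respectively.
print(sum_of_pairs(["ACG", "A-G", "ATG"]))  # 1
```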
What is the E-value?
The Expect value (E) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases.
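The exponential decrease follows from the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below uses placeholder values for K and lambda; in practice they depend on the scoring system and are not taken from any particular BLAST configuration.

```python
import math


def expect_value(score, query_len, db_len, k=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S); k and lam are placeholder values."""
    return k * query_len * db_len * math.exp(-lam * score)


# A higher score gives a smaller E; a larger database gives a larger E.
for s in (30, 40, 50):
    print(s, expect_value(s, query_len=100, db_len=1_000_000))
```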
What is the Affine Gap Penalty?
A large penalty for opening a gap and a smaller penalty for extending it.
Neighbour Joining
Similar to UPGMA, but corrects for unequal evolutionary rates among lineages. Creates an unrooted tree.
What are some ways to create a tree?
- Distance Matrix Method
- Maximum parsimony method
- Maximum likelihood method
What is bootstrapping?
A way of statistically validating a tree
The data (the alignment columns) are resampled with replacement and the tree is rebuilt for each replicate.
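A minimal sketch of the resampling step, assuming the data are the columns of a multiple sequence alignment drawn with replacement. The function name is hypothetical, and tree building for each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one replicate built by sampling alignment columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column picks to every sequence so columns stay intact.
    return ["".join(seq[c] for c in picks) for seq in alignment]


alignment = ["ACGT", "ACGA", "TCGA"]
rng = random.Random(0)  # seeded only to make the sketch repeatable
for _ in range(3):
    print(bootstrap_alignment(alignment, rng))
```

A tree would be built from each replicate, and the fraction of replicates that contain a given branch is reported as its bootstrap support.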
How is MSA measured
ClustalW builds a phylogenetic (guide) tree and the alignment is scored with the Sum of Pairs method (a heuristic).
How does CLUSTAL work?
- Begin with a pairwise alignment
- Build a phylogenetic tree
- Take the most closely related sequences and align them
- Repeat with the next most closely related sequence
What is a characteristic of a fully resolved tree?
Only two branches on each node
Describe the distance matrix method?
Distance-based methods must first transform the sequence data into a pairwise distance matrix, which is then used during tree inference. One example is UPGMA, which stands for Unweighted Pair Group Method with Arithmetic Mean.
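A compact sketch of the distance approach, assuming gap-free aligned sequences of equal length and using the p-distance (fraction of differing positions) as the distance measure. The function returns only the tree topology as nested tuples; branch lengths are omitted.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """seqs maps name -> aligned sequence; returns the topology as nested tuples."""
    size = {name: 1 for name in seqs}  # cluster -> number of leaves
    dist = {
        frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
        for p in combinations(seqs, 2)
    }
    while len(size) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Arithmetic mean of distances, weighted by cluster size.
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


seqs = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAATTTT", "D": "AACCTTTT"}
print(upgma(seqs))  # (('A', 'B'), ('C', 'D'))
```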
Character based method
Uses the aligned sequences directly during tree inference.
Describe the shotgun approach
- DNA is isolated and chopped into fragments
- Fragments are cloned into vectors and sequenced
- Overlapping fragments are used to assemble the genome into contigs
- Contigs are then ordered and joined into scaffolds
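The overlap step can be sketched with a toy greedy assembler that repeatedly merges the two reads sharing the longest suffix-prefix overlap. This is a simplification under assumed conditions (error-free reads, one strand, a minimum overlap of 3 bases); real assemblers handle sequencing errors, repeats and reverse complements.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; returns contigs."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))  # ['ATGCGTACCTGA']
```

Reads that share no overlap stay as separate contigs, which is the point where scaffolding information is needed to order them.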
Output of Sanger Sequencing
500-1k bases per read
Output of NGS
Billions
Sanger sequencing?
The target DNA is copied many times, making fragments of different lengths. Fluorescent “chain terminator” nucleotides mark the ends of the fragments and allow the sequence to be determined.
What determines the conformations of a protein active site
The conformations of the side chains in the protein's non-active-site regions
Describe the difference between local and global alignment.
The global approach compares one whole sequence with other entire sequences. The local method uses a subset of a sequence and attempts to align it to a subset of other sequences. Local alignments reveal regions that are highly similar, but do not necessarily provide a comparison across the entire two sequences.
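To illustrate the local approach, here is a score-only sketch in the style of Smith-Waterman. Unlike a global aligner, every cell is floored at zero and the best cell anywhere in the matrix is reported, so only the most similar region contributes. The scores (match +2, mismatch -1, gap -2) are assumed values.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (illustrative scores)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared GATTACA region (7 matches x 2) contributes to the score.
print(local_alignment_score("TTTTGATTACATTTT", "CCCGATTACACCC"))  # 14
```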
Explain the steps of BLAST
- Break the query into short words of a specific length.
- These words are then compared against the sequences in a database.
- Words whose score exceeds the threshold T (here 18) are used as seeds to extend the alignment.
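The seed-and-extend idea can be sketched as follows. This is a deliberate simplification and not BLAST itself: seeds here are exact word matches rather than words scoring above T, the extension is ungapped and stops at the first mismatch, and the function names and word length of 3 are assumptions.

```python
def words(seq, w=3):
    """All (position, word) pairs of length w in seq."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def find_seeds(query, subject, w=3):
    """Positions (i, j) where a query word exactly matches a subject word."""
    index = {}
    for j, word in words(subject, w):
        index.setdefault(word, []).append(j)
    return [(i, j) for i, word in words(query, w) for j in index.get(word, [])]


def extend(query, subject, i, j, w=3):
    """Extend a seed left and right while the characters keep matching."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + w, j + w
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]


query, subject = "GATTACA", "CCGATTACATT"
seeds = find_seeds(query, subject)
print(seeds)
print(extend(query, subject, *seeds[0]))  # GATTACA
```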
What does an E value of 1 represent
An E value of 1 means that, by chance alone, one alignment with a score of at least S is expected when searching a database of this size with a query of this size.