Exam Questions Flashcards
A suffix array of a string T is the array of starting positions of all suffixes of T, sorted in lexicographic order.
Write down a suffix array of T=ACCTTGA.
Suffixes of T: ACCTTGA (1), CCTTGA (2), CTTGA (3), TTGA (4), TGA (5), GA (6), A (7).
Sorted lexicographically: A (7), ACCTTGA (1), CCTTGA (2), CTTGA (3), GA (6), TGA (5), TTGA (4).
Suffix Array: [7, 1, 2, 3, 6, 5, 4]
Show how to search for the signal string P1=ATG with the help of the suffix array. Is P1 in T?
Binary search on the suffix array for ATG:
Start with the whole array [7, 1, 2, 3, 6, 5, 4].
Compare ATG with the middle suffix (CTTGA at position 3). Since ATG < CTTGA, discard the right half, leaving [7, 1, 2].
Compare ATG with the middle suffix (ACCTTGA at position 1). Since ATG > ACCTTGA, discard the left half, leaving [2].
Compare ATG with CCTTGA at position 2. Since ATG < CCTTGA and CCTTGA does not start with ATG, the search interval becomes empty.
No suffix of T starts with ATG, so P1 is not in T.
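The construction and search can be sketched in Python (a minimal illustration; positions are 1-based to match the flashcard convention):

```python
# Sketch: build the suffix array of T = "ACCTTGA" and binary-search it for P1 = "ATG".

def suffix_array(t):
    # Sort the 1-based start positions by the suffix each one begins.
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])

def contains(t, p, sa):
    # Binary search: does any suffix of t start with pattern p?
    lo, hi = 0, len(sa) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        suf = t[sa[mid] - 1:]
        if suf.startswith(p):
            return True
        if p < suf:
            hi = mid - 1   # pattern sorts before this suffix: go left
        else:
            lo = mid + 1   # pattern sorts after this suffix: go right
    return False

T = "ACCTTGA"
sa = suffix_array(T)
print(sa)                      # [7, 1, 2, 3, 6, 5, 4]
print(contains(T, "ATG", sa))  # False: ATG does not occur in T
```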
Describe all steps needed for aligning two sequences with the Needleman-Wunsch algorithm
Initialization:
Create a matrix with dimensions (m+1) x (n+1).
Initialize the first row and column with gap penalties.
Scoring Scheme:
Define a scoring scheme for matches, mismatches, and gap penalties.
Filling the Matrix:
Fill the matrix cell by cell, taking the maximum over the diagonal neighbor plus the match/mismatch score and the upper and left neighbors plus the gap penalty.
Traceback:
Trace back through the matrix to determine the optimal alignment.
Start from the bottom-right corner (position (m,n)).
Move diagonally (aligned pair), upwards, or left (gap), following the neighbor that produced the current cell's score.
Alignment Output:
Construct the aligned sequences based on the traceback path, adding gaps as needed.
Score Calculation:
Calculate the alignment score based on the chosen scoring scheme and traceback path.
Output:
Output the aligned sequences and alignment score.
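The steps above can be sketched as a minimal Python implementation (the scoring values match/mismatch/gap = +1/−1/−2 are illustrative assumptions, not part of the question):

```python
# Minimal Needleman-Wunsch sketch: initialization, fill, traceback, output.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    m, n = len(a), len(b)
    # Initialization: (m+1) x (n+1) matrix, first row/column filled with gap penalties.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    # Fill: each cell from its diagonal, upper, and left neighbors.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i] with b[j]
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    # Traceback from (m, n) back to (0, 0), emitting the alignment.
    ai, bi, i, j = [], [], m, n
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            ai.append(a[i - 1]); bi.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ai.append(a[i - 1]); bi.append('-'); i -= 1
        else:
            ai.append('-'); bi.append(b[j - 1]); j -= 1
    return ''.join(reversed(ai)), ''.join(reversed(bi)), F[m][n]

print(needleman_wunsch("GAT", "GCAT"))  # ('G-AT', 'GCAT', 1)
```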
What is the key property of a (simple) Markov process?
The probability of transitioning to a future state depends only on the current state, not on the sequence of events that preceded it. In other words, the process is memoryless.
In an HMM, what is meant by “hidden” layer?
In an HMM (Hidden Markov Model), the “hidden” layer refers to the sequence of hidden states that are not directly observable. These states represent underlying or latent variables that influence the observed data.
In an HMM, what is meant by “observation”?
In an HMM, “observation” refers to the sequence of observable symbols or data points generated by the model. These observations are influenced by the underlying hidden states but are directly accessible or measurable.
In an HMM, which of the following are generated by a Markov process:
the sequence of states
the sequence of observations
both
none
In an HMM, only the sequence of states is generated by a Markov process: transitions between hidden states depend solely on the current state. The observations are not generated by a Markov process; each observation is emitted based on the current hidden state, so the observation sequence is, in general, not itself Markovian (the next observation cannot be predicted from the previous observation alone).
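This generative picture can be sketched with a minimal two-state HMM (the GC-content-style states and all probabilities below are made-up illustration values): the state sequence is produced by a Markov chain, and each observation is emitted from the current hidden state alone.

```python
import random

# Minimal HMM sketch. States "H"/"L" could be high-GC / low-GC regions.
states = ["H", "L"]
trans = {"H": {"H": 0.9, "L": 0.1},          # P(next state | current state)
         "L": {"H": 0.2, "L": 0.8}}
emit = {"H": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},   # P(symbol | state)
        "L": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4}}

def sample(n, start="H", seed=0):
    rng = random.Random(seed)
    s, path, obs = start, [], []
    for _ in range(n):
        path.append(s)
        # Emission depends only on the current hidden state.
        symbols, probs = zip(*emit[s].items())
        obs.append(rng.choices(symbols, weights=probs)[0])
        # Transition depends only on the current hidden state (Markov property).
        nxt, nprobs = zip(*trans[s].items())
        s = rng.choices(nxt, weights=nprobs)[0]
    return path, obs
```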
Please explain why sequence similarity searches (BLAST etc.) are more sensitive for protein sequences than for DNA/RNA
- Degeneracy of the genetic code: several codons encode the same amino acid, so synonymous mutations change the DNA without changing the protein; comparing at the protein level removes this noise.
- Conservation of function across distantly related sequences: protein sequences diverge more slowly than the underlying DNA, so homology remains detectable at far greater evolutionary distances.
- Structural constraints and evolutionary pressure: amino acid substitution matrices (e.g. BLOSUM, PAM) give partial credit to biochemically similar residues, whereas nucleotide comparisons can only score match or mismatch.
The underlying inference chain: sequence similarity → structural similarity → functional similarity.
Why can modern “neural network”-based protein secondary structure prediction still be considered a “sliding window” method?
It operates by moving a fixed-size window along the protein sequence: the neural network takes the amino acid sequence within the window as input and predicts the secondary structure of the central residue. The window then slides one residue further and the prediction is repeated.
What other protein properties (besides 2ndary structure) can be addressed easily by sliding window methods? Name at least two more and estimate
appropriate window size ranges.
Solvent Accessibility: the degree to which amino acids are exposed to the solvent. Window size range: 15-25 residues.
Transmembrane Helices: window size range 19-25 amino acids, enough to span the typical length (~20 residues) of a membrane-crossing helix.
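A sliding-window sketch for the transmembrane case, assuming the standard Kyte-Doolittle hydropathy scale and a typical 19-residue window (the example sequence is invented; real predictors add a threshold on the profile):

```python
# Mean Kyte-Doolittle hydropathy over a sliding window: sustained high values
# flag candidate transmembrane helices.

KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}

def hydropathy_profile(seq, window=19):
    half = window // 2
    scores = []
    # One score per residue that can sit at the center of a full window.
    for i in range(half, len(seq) - half):
        win = seq[i - half:i + half + 1]
        scores.append(sum(KD[aa] for aa in win) / window)
    return scores

# Invented example: a hydrophobic Leu/Ile/Val stretch flanked by lysines
# produces a clear peak at the center of the stretch.
profile = hydropathy_profile("K" * 10 + "LIVLIVLIVLIVLIVLIVL" + "K" * 10)
```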
Are sliding window methods useful for the prediction of tertiary structure? If yes, what window size range would be appropriate? If no, why not?
Sliding window methods are not particularly useful for the prediction of tertiary structure. Tertiary structure prediction involves predicting the spatial arrangement of amino acids in three dimensions, which requires considering long-range interactions between distant residues.
Instead, methods such as homology modeling, ab initio modeling, or molecular dynamics simulations are commonly used for tertiary structure prediction.
Homology Modeling vs Threading (Fold recognition) What are the fundamental differences between these two methods?
Homology modeling (sequence similarity)
Fold recognition (structural similarity) can identify distant structural similarities compared to homology modeling
Homology Modeling vs Threading (Fold recognition) Specify a situation, where one method cannot be applied while the other one can
One situation where threading can be applied while homology modeling cannot is when the query protein does not have significant sequence similarity to any protein with a solved structure in the PDB database.
Homology Modeling vs Threading (Fold recognition) What is the connection between the applicability of these methods and evolution?
Homology modeling relies on the assumption that proteins with similar sequences (i.e., close homologs) share similar structures due to their common evolutionary ancestry. Fold recognition, by contrast, exploits the fact that structure is conserved longer than sequence during evolution: it can detect relationships even after sequence similarity has eroded beyond detection.
By what method(s) have the structures stored in PDB been determined?
X-ray Crystallography
Nuclear Magnetic Resonance (NMR) Spectroscopy
Cryo-Electron Microscopy (cryo-EM)
(AlphaFold models are computational predictions, not experimentally determined structures; they are distributed separately from the PDB's experimental entries.)