Exam Questions 2 Flashcards
“Sieve of Eratosthenes”:
What is the goal of this algorithm?
To find all prime numbers within a given range, typically up to a specified maximum limit.
“Sieve of Eratosthenes”:
Pseudo-code: how does it work?
- Start with a list of numbers from 2 to the maximum limit.
- Begin with the first number in the list, which is 2, and mark it as prime.
- Cross out all multiples of this prime number in the list as non-prime.
- Go to the next number that is not crossed out; it is a prime number.
- Repeat the last two steps until the current number exceeds the square root of the limit. All numbers that are still not crossed out are prime.
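The steps above can be sketched in Python roughly as follows. The function name `sieve_of_eratosthenes` and the boolean list `is_prime` are illustrative choices, not part of the flashcard. Crossing out starts at p*p instead of 2*p, a common shortcut that works because smaller multiples were already crossed out by smaller primes.

```python
def sieve_of_eratosthenes(limit: int) -> list[int]:
    """Return all primes p with 2 <= p <= limit."""
    if limit < 2:
        return []
    # is_prime[k] stays True until k is crossed out as a multiple of a prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:  # stop once p exceeds sqrt(limit)
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [k for k, prime in enumerate(is_prime) if prime]


print(sieve_of_eratosthenes(30))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```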
“Sieve of Eratosthenes”:
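What is its time complexity? Briefly explain.
O(n log log n), where n is the maximum limit up to which you want to find prime numbers.
Each prime p crosses out about n/p multiples, and the sum of 1/p over all primes up to n grows like log log n.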
Given a eukaryotic genome with a GC-content of 40%, how long are open reading frames (ORFs) on average?
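Three stop codons: UAA, UAG, and UGA (TAA, TAG, and TGA in DNA)
Assuming independent bases with P(A) = P(T) and P(G) = P(C):
P(A) = P(T) = 0.6 / 2 = 0.3
P(G) = P(C) = 0.4 / 2 = 0.2
P(stop) = P(TAA) + P(TAG) + P(TGA) = 0.3^3 + 0.3^2 * 0.2 + 0.3^2 * 0.2 = 0.027 + 0.018 + 0.018 = 0.063
This means that in a random sequence, you would expect a stop codon approximately every 1 / 0.063 ≈ 16 codons (about 48 nucleotides).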
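As a quick check of this kind of estimate, here is a minimal Python sketch. It assumes independent bases with P(A) = P(T) and P(G) = P(C), and it writes the stop codons in the DNA alphabet (TAA, TAG, TGA). The function name `stop_codon_probability` is made up for illustration.

```python
def stop_codon_probability(gc_content: float) -> float:
    """Probability that a random codon is a stop codon (independent bases)."""
    # Assumption: A and T share the AT fraction equally, G and C the GC fraction.
    p = {
        "A": (1 - gc_content) / 2,
        "T": (1 - gc_content) / 2,
        "G": gc_content / 2,
        "C": gc_content / 2,
    }
    total = 0.0
    for codon in ("TAA", "TAG", "TGA"):
        prob = 1.0
        for base in codon:
            prob *= p[base]
        total += prob
    return total


p_stop = stop_codon_probability(0.40)
print(round(p_stop, 3))      # 0.063
print(round(1 / p_stop, 1))  # 15.9 codons between stop codons on average
```

With a GC-content of 50% the same function gives 3/64, the value for a uniform base composition.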
Do you expect the same average ORF lengths on the forward and the backward strands? Briefly explain.
Averaged over the genome, yes: genes lie on both strands, and with P(A) = P(T) and P(G) = P(C) the reverse complements of the stop codons (TTA, CTA, TCA) are as likely as the stop codons themselves. At an individual gene, no: the coding strand carries a longer ORF than the non-coding strand.
Low gene expression can be detected through pairwise genome alignments
False. Pairwise genome alignments compare genomic sequences to identify similarities, differences, and structural variations; they carry no information about expression levels.
Gene expression is typically measured with RNA-seq.
What does UPGMA stand for?
UPGMA stands for “Unweighted Pair Group Method with Arithmetic Mean.”
A distance matrix for n sequences contains n(n-1)/2 entries, when entries on the diagonal are not counted
True
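The count can be checked by enumerating the unordered pairs directly. This is a small sketch that assumes a symmetric distance matrix, so each pair (i, j) with i < j is stored once; `count_pairwise_entries` is a hypothetical helper name.

```python
from itertools import combinations


def count_pairwise_entries(n: int) -> int:
    """Number of unordered sequence pairs (i, j) with i < j."""
    return sum(1 for _ in combinations(range(n), 2))


for n in (2, 5, 10):
    print(n, count_pairwise_entries(n), n * (n - 1) // 2)
# 2 1 1
# 5 10 10
# 10 45 45
```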
Any distance matrix uniquely determines exactly one phylogenetic tree.
False
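Difference between TPM and FPKM
Both normalize for the library size (the total number of reads or fragments in the library) and for the length of the gene; they differ in the order of the two steps.
FPKM (fragments per kilobase of transcript per million mapped fragments) divides by the library size first and then by the gene length, so the values of a sample do not add up to a fixed total.
TPM (transcripts per million) divides by the gene length first and then rescales so that the values of every sample sum to one million, which makes proportions comparable across samples.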
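Under the usual definitions, both measures correct for gene length and library size and differ only in the order of the two steps. The sketch below illustrates this with made-up counts and lengths; the function names `fpkm` and `tpm` are illustrative and do not come from any particular library.

```python
def fpkm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)  # library size
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Toy data (invented for illustration): three genes in one sample.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]

print(round(sum(tpm(counts, lengths_bp))))   # 1000000, by construction
print(round(sum(fpkm(counts, lengths_bp))))  # 500000, no fixed total
```

In this toy sample all three genes have the same reads-per-base rate, so both measures assign them equal values; only the TPM values sum to one million.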
To specify a Hidden Markov Model with n<∞ states, one needs
- initial probabilities
- substitution probabilities
- transition probabilities
- emission probabilities
- exit probabilities
- likelihood ratios
Initial probabilities, transition probabilities, and emission probabilities
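To make the parameter sets concrete, here is a minimal sketch of a two-state HMM over DNA, written as plain dictionaries, with the forward algorithm computing the probability of an observed sequence. The state names "H" and "L" and all numbers are invented for illustration; the sketch assumes a non-empty sequence and a model without an explicit end state.

```python
# Hypothetical two-state model: "H" (GC-rich) and "L" (AT-rich).
initial = {"H": 0.5, "L": 0.5}
transition = {
    "H": {"H": 0.5, "L": 0.5},
    "L": {"H": 0.4, "L": 0.6},
}
emission = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward(seq: str) -> float:
    """Probability of observing seq, summed over all hidden state paths."""
    # Initialization: start in state s and emit the first symbol.
    alpha = {s: initial[s] * emission[s][seq[0]] for s in initial}
    # Recursion: move to state t and emit the next symbol.
    for symbol in seq[1:]:
        alpha = {
            t: emission[t][symbol] * sum(alpha[s] * transition[s][t] for s in alpha)
            for t in initial
        }
    return sum(alpha.values())


print(round(forward("A"), 4))  # 0.25
```

The three dictionaries are the whole model: every quantity used in `forward` is an initial, transition, or emission probability.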
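What do you need to consider when choosing a window width for a sliding window method?
Feature characteristics: a well-defined, short feature calls for a narrow window; a broad feature calls for a wide window.
Noise ratio: a large window smooths out noise, at the cost of resolution.
Computational resources: wider windows mean more work per window position.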
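The effect of the window width can be seen in a small sliding-window example. The sketch below computes the GC fraction per window; the function name `sliding_gc` and the toy sequence are assumptions made for illustration.

```python
def sliding_gc(seq: str, width: int, step: int = 1) -> list[float]:
    """GC fraction of every window of the given width along seq."""
    values = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        gc = sum(1 for base in window if base in "GC")
        values.append(gc / width)
    return values


seq = "ATGCGCAT"  # toy sequence
print(sliding_gc(seq, width=4))  # [0.5, 0.75, 1.0, 0.75, 0.5]
print(sliding_gc(seq, width=8))  # [0.5]
```

The narrow window resolves the GC-rich core of the toy sequence, while the wide window averages it away.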
Which of these motif descriptors (regular expression, weight matrix, Sequence Logo) is/are suitable for describing a splice site consensus?
Weight matrix
Briefly summarize the problem of the non-suitable descriptor(s) (regular expression, Sequence Logo)
Regular expression: matches a sequence or does not, so it cannot capture the graded, position-specific variability of a splice site consensus.
Sequence Logo: a visualization of a sequence motif; it is meant for inspection by eye and does not provide probabilities for scoring candidate splice sites.
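To illustrate why a weight matrix suits a variable consensus, here is a minimal sketch that builds a log-odds matrix from a few aligned sites and scores candidate sequences. The four six-base sites are invented toy data, not real splice sites; the names `build_pwm` and `score`, the pseudocount of 1, and the uniform background of 0.25 are likewise assumptions.

```python
import math


def build_pwm(sites: list[str], pseudocount: float = 1.0,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log-odds weight matrix (one dict per position) from aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            # Pseudocounts avoid log(0) for bases never seen at this position.
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def score(pwm: list[dict[str, float]], seq: str) -> float:
    """Sum of the position-specific log-odds scores of seq."""
    return sum(column[base] for column, base in zip(pwm, seq))


toy_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]  # made-up examples
pwm = build_pwm(toy_sites)
for candidate in ("GTAAGT", "GTGAGA", "CCCCCC"):
    print(candidate, round(score(pwm, candidate), 2))
```

Unlike a regular expression, the matrix gives a near-consensus sequence such as GTGAGA a positive but lower score than the consensus, instead of a plain match or mismatch.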
Difference between Smith-Waterman alignment and structural alignment?
Smith-Waterman: sequence alignment that finds the best-scoring local regions of sequence similarity.
Structural alignment: considers the 3D structure, allowing for the identification of conserved structural motifs and their functional implications.