Exam Questions 2 Flashcards

Question 1

Q

“Sieve of Eratosthenes”:
what is the goal of this algorithm?

Answer

A

To find all prime numbers within a given range, typically up to a specified maximum limit.

Question 2

Q

“Sieve of Eratosthenes”:
How does it work? Give pseudo-code.

Answer

A

1. Start with a list of all numbers from 2 to the maximum limit.
2. Begin with the first number in the list, which is 2, and mark it as prime.
3. Cross out all multiples of this prime number in the list as non-prime.
4. Go to the next number; if it has not been crossed out, it is a prime number.
5. Repeat steps 3 and 4 until the current number exceeds the square root of the limit. Every number that is still not crossed out is prime.
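The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `sieve_of_eratosthenes` is chosen here, and starting the crossing-out at `p * p` is a common optimization that the card itself does not mention.

```python
import math


def sieve_of_eratosthenes(limit):
    """Return a list of all primes <= limit."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is crossed out as a multiple of a prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Multiples below p * p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, prime in enumerate(is_prime) if prime]


print(sieve_of_eratosthenes(30))
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```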

Question 3

Q

“Sieve of Eratosthenes”:
what is its (time-)complexity? Briefly explain.

Answer

A

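O(n log log n), where “n” is the maximum limit up to which you want to find prime numbers.
Each prime p crosses out about n/p multiples, and the sum of n/p over all primes p ≤ n grows like n log log n.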

Question 4

Q

Given a eukaryotic genome with a GC-content of 40%, how long are open reading frames (ORFs) on average?

Answer

A

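There are three stop codons: UAA, UAG, and UGA (TAA, TAG, and TGA in DNA).
With a GC-content of 40%, assuming independent bases: P(G) = P(C) = 0.2 and P(A) = P(T) = 0.3
P(stop) = P(TAA) + P(TAG) + P(TGA) = 0.3^3 + 0.3^2 * 0.2 + 0.3^2 * 0.2 = 0.027 + 0.018 + 0.018 = 0.063
This means that in a random sequence, you would expect a stop codon approximately every 1 / 0.063 ≈ 16 codons (about 48 nucleotides).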
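The arithmetic can be checked with a few lines of Python. This sketch assumes a random-sequence model with independent bases, where a GC-content of 40% means P(G) = P(C) = 0.2 and P(A) = P(T) = 0.3, and it writes the stop codons in the DNA alphabet (T in place of U). The function name is hypothetical.

```python
def expected_codons_until_stop(gc_content):
    """Return (P(stop codon), expected codons between stops) for random DNA."""
    # Split GC and AT content evenly between the two bases of each pair.
    p_g = p_c = gc_content / 2
    p_a = p_t = (1 - gc_content) / 2
    freq = {"A": p_a, "C": p_c, "G": p_g, "T": p_t}
    stop_codons = ["TAA", "TAG", "TGA"]
    # Probability that a random codon is any one of the three stop codons.
    p_stop = sum(freq[c[0]] * freq[c[1]] * freq[c[2]] for c in stop_codons)
    # Mean of a geometric distribution: one stop every 1 / p_stop codons.
    return p_stop, 1 / p_stop


p_stop, mean_codons = expected_codons_until_stop(0.40)
print(round(p_stop, 3), round(mean_codons, 1))
# → 0.063 15.9
```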

Question 5

Q

Do you expect the same average ORF lengths on the forward and the backward strands? Briefly explain.

Answer

A

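Yes, under a random-sequence model. Base pairing gives both strands the same GC-content, and the reverse complements of the stop codons (TTA, CTA, TCA) have the same total probability as the stop codons themselves, so the expected ORF length is the same on both strands. Genes are also found on both strands of a genome, so neither strand is the coding strand genome-wide; only within a single gene does the coding strand carry the longer ORF.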

Question 6

Q

Low gene expression can be detected through pairwise genome alignments

Answer

A

False. Pairwise genome alignments compare genomic sequences to identify similarities, differences, and structural variations; they carry no information about expression levels.
Gene expression is typically measured with RNA-seq.

Question 7

Q

What does UPGMA stand for?

Answer

A

UPGMA stands for “Unweighted Pair Group Method with Arithmetic Mean.”

Question 8

Q

A distance matrix for n sequences contains n(n-1)/2 entries, when entries on the diagonal are not counted

Question 9

Q

Any distance matrix uniquely determines exactly one phylogenetic tree.

Question 10

Q

Difference between TPM and FPKM

Answer

A

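Both FPKM and TPM normalize for library size (sequencing depth) and for gene length; they differ in the order of the two steps.
FPKM divides by the library size first and then by gene length, so the sum of all FPKM values differs from sample to sample.
TPM divides by gene length first and then rescales so that the values of each sample sum to one million, which makes TPM values directly comparable between samples.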
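As a small worked example, the two measures can be computed side by side. This is a sketch using the standard definitions, in which both FPKM and TPM account for gene length and differ in the order of normalization; the function names and the toy counts and lengths are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    # count / (length in kb * library size in millions)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Hypothetical toy data: three genes with counts proportional to their length.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]

print(fpkm(counts, lengths_bp))
print(tpm(counts, lengths_bp))
print(sum(tpm(counts, lengths_bp)))  # sums to one million by construction (up to rounding)
```

Because the three toy genes have the same count per base, both measures assign them equal values; only the TPM values are guaranteed to sum to the same total in every sample.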

Question 11

Q

To specify a Hidden Markov Model with n<∞ states, one needs
initial probabilities
substitution probabilities
transition probabilities
emission probabilities
exit probabilities
likelihood ratios

Answer

A

Initial probabilities, transition probabilities, and emission probabilities
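To make the parameter sets concrete: a fully specified HMM is usually written with initial, transition, and emission probabilities. The sketch below defines a hypothetical two-state model (states "H" and "L" and all numbers are made up for illustration) and uses the forward algorithm to compute the probability of an observed sequence.

```python
def forward_probability(observations, states, initial, transition, emission):
    """Return P(observations) under the HMM, via the forward algorithm."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emission[s][symbol] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: a GC-rich state "H" and an AT-rich state "L".
states = ["H", "L"]
initial = {"H": 0.5, "L": 0.5}
transition = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.2, "L": 0.8}}
emission = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

print(forward_probability("GGCA", states, initial, transition, emission))
```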

Question 12

Q

What do you need to consider when choosing a window width for a sliding window method?

Answer

A

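Feature characteristics: a sharp, well-defined feature calls for a narrow window; a broad feature calls for a wide window
Noise ratio: a large window smooths out noise, but also blurs small features
Computational resources: a wider window means more work per position if each window is recomputed from scratch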
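The trade-off can be explored with a small sliding-window GC-content function. This is a minimal sketch assuming an uppercase DNA string; the function name `sliding_gc` and the example sequence are hypothetical.

```python
def sliding_gc(sequence, width, step=1):
    """Return the GC fraction of every window of the given width."""
    values = []
    for start in range(0, len(sequence) - width + 1, step):
        window = sequence[start:start + width]
        # Count G and C in this window (recomputed from scratch each time).
        gc = sum(base in "GC" for base in window)
        values.append(gc / width)
    return values


# Hypothetical sequence with a short GC-rich stretch in the middle.
seq = "ATATGCGCGCATAT"
print(sliding_gc(seq, 4))   # narrow window: sharp peak, more values
print(sliding_gc(seq, 10))  # wide window: smoother, the peak is flattened
```

Running both calls on the same sequence shows how the narrow window resolves the GC-rich stretch while the wide window averages it with its surroundings.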

Question 13

Q

Which of these motif descriptors (regular expression, weight matrix, Sequence Logo) is/are suitable for describing a splice site consensus?

Answer

A

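Weight matrix: it stores a probability (or score) for every base at every position and therefore captures the position-specific variability of a splice site.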
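A weight matrix can be built from aligned example sites and then used to score candidates. The sketch below is illustrative only: the site list is a made-up toy set, the function names are hypothetical, and a uniform background of 0.25 per base and a pseudocount of 1 are assumptions.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # Log2 ratio of the smoothed column frequency to the background.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, candidate):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(pwm, candidate))


# Hypothetical toy alignment of donor-like sites (not real data).
sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA", "TGGTGAGT"]
pwm = build_pwm(sites)
print(score(pwm, "AGGTAAGT"))  # close to the consensus: high score
print(score(pwm, "CCCCCCCC"))  # unlike the sites: low score
```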

Question 14

Q

Briefly summarize the problem of the non-suitable descriptor(s) (regular expression, Sequence Logo)

Answer

A

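Regular expression: Matches or does not match; it cannot express how frequent each base is at a position and so does not capture the variability of the splice site consensus
Sequence Logo: Is a visualization of a sequence motif and does not itself provide probabilities for scoring a candidate splice site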

Question 15

Q

Difference between Smith Waterman Alignment and structural alignment?

Answer

A

Smith-Waterman: Local sequence alignment; finds the most similar local regions of two sequences
Structural Alignment: Considers the 3D structure, allowing for the identification of conserved structural motifs and functional implications
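For the sequence side of this comparison, the Smith-Waterman recurrence can be sketched in a few lines. This version returns only the best local alignment score, uses a linear gap penalty, and its scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("TTTACGTTTT", "ACGT"))
# → 8
```

A structural alignment has no equivalent this simple, because it compares 3D coordinates rather than characters in a string.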

Question 16

Q

You want to find the statistical significance of a possible motif enrichment found in a gene set. You know the size of the set (number of genes) and also how many of them have this motif.
a) What else do you need?

Answer

A

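The background distribution against which the enrichment is measured, i.e. the total number of genes in the background set (for example the whole genome) and how many of those genes contain the motif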

Question 17

Q

You want to find the statistical significance of a possible motif enrichment found in a gene set. You know the size of the set (number of genes) and also how many of them have this motif.
What statistical test would you perform?
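One common choice for this kind of question is a hypergeometric test (equivalent to a one-sided Fisher's exact test): given the background size and the number of background genes with the motif, it asks how likely it is to see at least the observed number of motif genes in a set of this size. The sketch below uses only the standard library; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, population_hits, sample, sample_hits):
    """P(X >= sample_hits) when drawing `sample` genes without replacement."""
    total = comb(population, sample)
    upper = min(population_hits, sample)
    # Sum the hypergeometric probabilities for sample_hits, ..., upper.
    favourable = sum(
        comb(population_hits, k) * comb(population - population_hits, sample - k)
        for k in range(sample_hits, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 20000 background genes, 1000 of them with the motif,
# and a gene set of 100 genes in which 15 carry the motif.
print(hypergeom_enrichment_pvalue(20000, 1000, 100, 15))
```

A small p-value indicates that the motif occurs in the gene set more often than expected from the background.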