Algos Problem 2: Finding Regulatory Motifs in DNA Flashcards

Question 1

Q

How many k-mers exist for DNA?

Answer

A

A, T, C, G ….. 4 Nukleotiden

–> 4^k k-mers

Question 2

Q

What is the consensus of the following motifs?

T C G G G G g T T T t t
c C G G t G A c T T a C
a C G G G G A T T T t C
T t G G G G A c T T t t
a a G G G G A c T T C C
T t G G G G A c T T C C
T C G G G G A T T c a t
T C G G G G A T T c C t
T a G G G G A a c T a C
T C G G G t A T a a C C

Answer

A

Consensus(Motifs) T C G G G G A T T T C C

Question 3

Q

What is the score of the following motifs?

T C G G G G g T T T t t
c C G G t G A c T T a C
a C G G G G A T T T t C
T t G G G G A c T T t t
a a G G G G A c T T C C
T t G G G G A c T T C C
T C G G G G A T T c a t
T C G G G G A T T c C t
T a G G G G A a c T a C
T C G G G t A T a a C C

Answer

A

Score(Motifs) 3 + 4 + 0 + 0 + 1 + 1 + 1+ 5 + 2 + 3 + 6 + 4 = 30

Question 4

Q

Define the score of a set of motifs.

Answer

A

Score(Motifs): # of lowercase letters in motif matrix

Question 5

Q

Brute Force Motif Search: Compute score of every possible choice of k-mers in DNA.

STOP and Think: What is this approach’s runtime?

Answer

A

Say that DNA consists of t strings of length n:

There are (n – k + 1)^t different choices of Motifs
Scoring the matrix requires k*t steps.

Overall runtime ~ O(nt * k * t) – too slow!

Question 6

Q

Hamming distance between a sequence and Consensus(Motifs) ?

Answer

A

Number of lowercase symbols in a row

Question 7

Q

Motif Finding Problem Runtime: n^t ∙k∙t

versus

Median String Problem Runtime: 4^k∙n∙t∙k

which is faster?

Answer

A

4^k∙n∙t∙k is faster

Although MedianString is much faster than BruteForceMotifSearch, it is
still slow for large k.

Question 8

Q

From Motifs to Profile?

eg

T C G G G G g T T T t t
c C G G t G A c T T a C
a C G G G G A T T T t C
T t G G G G A c T T t t
a a G G G G A c T T C C
T t G G G G A c T T C C
T C G G G G A T T c a t
T C G G G G A T T c C t
T a G G G G A a c T a C
T C G G G T A T a a C C

Answer

A

frequency of nucleotidei in column j

eg

A: .2 .2 0 0 0 0 .9 .1 .1 .1 .3 0
C: .1 .6 0 0 0 0 0 .4 .1 .2 .4 .6
G: 0 0 1 1 .9 .9 .1 0 0 0 0 0
T: .7 .2 0 0 .1 .1 0 .5 .8 .7 .3 .4

Each column of a profile represents a biased 4-sided die with A, C,
G, and T on its sides. Thus, a profile corresponds to k dice.

Question 9

Q

Given the following Profile:

A 1/2 7/8 3/8 0 1/8 0
C 1/8 0 1/2 5/8 3/8 0
T 1/8 1/8 0 0 1/4 7/8
G 1/4 0 1/8 3/8 1/4 1/8

The probability of the consensus string:
Pr(AAACCT|Profile) = ???

Answer

A

The probability of the consensus string:

Pr(AAACCT|Profile) = 1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8 = 0.033646

Question 10

Q

The closer k-mer is to the consensus string, the _____ the Pr(k-mer| Profile) is.

Question 11

Q

Profile-most probable k-mer in a
sequence

Answer

A

the k-mer with the highest Pr(k-mer | Profile) among all k-mers in this sequence.

Question 12

Q

shorthand for probability of a sequence according to a matrix

Answer

A

P(sequence|matrix) =

Question 13

Q

bei Motifsuche:
Welche Position hat die höhere (Shannon) Entropie? (Positionen fangen mit der Zahl 1 an.) Wann ist die Entropie am höchsten?

Answer

A

Position ____ hat die hoehere Entropie, da die Eintraege naeher an einer Gleichverteilung sind (alternativ: mehrere Eintraege fast gleich wahrscheinlich sind).

Die Entropie ist für eine Gleichverteilung am hoechsten.

Question 14

Q

bei Motifsuche:
von ein Profilmatrix bekommt man fuer ein Sequenz die Wahrscheinlichkeit 0, weil an einer Stelle 0 ist, obwohl die andere Stellen hoeher Wahrscheinlichkeiten haben. Wie laesst sich diese Problem umgehen?

Answer

A

Obwohl die Wahrscheinlichkeit an den mesiten Stellen hoch ist, wird die gesamt wahrscheinlichkeit 0, weil an eine Stelle Null ist.
Pseudocounts werden hinzugefuegt nach Laplace, um eine Wahrscheinlichkeit von Null zu vermeiden

Algos Problem 2: Finding Regulatory Motifs in DNA Flashcards

(14 cards)