w3 part 2: information extraction, Viterbi algorithm Flashcards
what is the output independence assumption
the output independence assumption states that the probability of an output observation depends only on the state that produced it, not on any other states or any past or future observations
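In symbols (using o_i for the observation at step i and q_i for the hidden state that emitted it; this notation is an assumption, the card does not fix one):

P(o_i | q_1 … q_T, o_1 … o_T) = P(o_i | q_i)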
the decoding problem stated in the paper: determine the sequence of hidden variables corresponding to the sequence of observations
i.e. find the most probable sequence of states
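Stated as a formula (with w_1 … w_n for the observed words and t_1 … t_n for the hidden tags; notation assumed here):

best tag sequence = argmax_{t_1 … t_n} P(t_1 … t_n | w_1 … w_n)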
derive this
prove that argmax_{t_1…t_n} P(t_1 … t_n | w_1 … w_n) = argmax_{t_1…t_n} ∏_i P(w_i | t_i) P(t_i | t_{i-1})
practice deriving this from the paper
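A sketch of the standard derivation, for checking your own attempt against the paper (T = t_1 … t_n is the tag sequence, W = w_1 … w_n the word sequence):

argmax_T P(T | W)
  = argmax_T P(W | T) P(T) / P(W)                          (Bayes' theorem)
  = argmax_T P(W | T) P(T)                                 (P(W) does not depend on T, so it can be dropped from the argmax)
  ≈ argmax_T [∏_i P(w_i | t_i)] [∏_i P(t_i | t_{i-1})]     (output independence for P(W | T); bigram/Markov assumption for P(T))
  = argmax_T ∏_i P(w_i | t_i) P(t_i | t_{i-1})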
what is bayes theorem
P(A|B) = P(B|A)*P(A)/ P(B)
suppose the tag DET occurs 1000 times and in 850 of those cases it is followed by NOUN; what is P(NOUN | DET)
P(NOUN | DET) = count(DET, NOUN) / count(DET) = 850/1000 = 0.85
suppose the tag NOUN occurs 1000 times and in 50 of those cases it corresponds to the word ‘bill’; what is P(‘bill’ | NOUN)
P(‘bill’ | NOUN) = count(NOUN, ‘bill’)/ count(NOUN) = 50/1000 = 0.05
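A minimal sketch of how these maximum-likelihood estimates could be computed from a tagged corpus (the toy corpus and variable names here are assumptions, not from the cards):

```python
from collections import Counter

# toy tagged corpus: list of sentences of (word, tag) pairs (assumed example data)
corpus = [
    [("the", "DET"), ("bill", "NOUN"), ("passed", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("barked", "VERB")],
]

tag_counts = Counter()         # count(tag), for emission estimates
prev_tag_counts = Counter()    # count(prev_tag), for transition estimates
transition_counts = Counter()  # count(prev_tag, tag)
emission_counts = Counter()    # count(tag, word)

for sentence in corpus:
    prev_tag = "<s>"           # sentence-start pseudo-tag
    for word, tag in sentence:
        tag_counts[tag] += 1
        prev_tag_counts[prev_tag] += 1
        transition_counts[(prev_tag, tag)] += 1
        emission_counts[(tag, word)] += 1
        prev_tag = tag

def p_transition(tag, prev_tag):
    # P(tag | prev_tag) = count(prev_tag, tag) / count(prev_tag)
    return transition_counts[(prev_tag, tag)] / prev_tag_counts[prev_tag]

def p_emission(word, tag):
    # P(word | tag) = count(tag, word) / count(tag)
    return emission_counts[(tag, word)] / tag_counts[tag]

print(p_transition("NOUN", "DET"))  # the 850/1000-style estimate; 1.0 on this toy corpus
print(p_emission("bill", "NOUN"))   # 0.5 on this toy corpus
```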
can you brute force pos tagging?
yes, but it quickly gets expensive
e.g. with 3 observations and 6 states you already have 6^3 = 216 possible tag sequences
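A minimal brute-force sketch (the tag set and the scoring function are assumed placeholders for illustration); it enumerates all 6^3 = 216 tag sequences, which is exactly what becomes intractable for longer sentences:

```python
from itertools import product

TAGS = ["DET", "NOUN", "VERB", "ADJ", "ADP", "PRON"]  # 6 states (assumed tag set)
words = ["the", "old", "bill"]                        # 3 observations

def score(tags, words):
    # placeholder scoring function; a real one would multiply the
    # transition and emission probabilities of the whole sequence
    return sum(1 for w, t in zip(words, tags) if (w == "the") == (t == "DET"))

candidates = list(product(TAGS, repeat=len(words)))   # 6**3 = 216 sequences
best = max(candidates, key=lambda tags: score(tags, words))
print(len(candidates), best)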
what is the time complexity of the viterbi algorithm vs brute force
brute force = O(P^L)
viterbi = O(L x P^2)
where P is the number of possible POS tags (states)
and L is the length of the observation sequence
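A minimal Viterbi sketch over log probabilities (the toy HMM numbers are assumptions for illustration, not from the cards); the two nested loops over tags inside the loop over positions are where the O(L x P^2) cost comes from:

```python
import math

# toy HMM parameters (assumed numbers, for illustration only)
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.85, "VERB": 0.10},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.40, "NOUN": 0.40, "VERB": 0.20},
}
emit_p = {
    "DET":  {"the": 0.7, "bill": 0.0, "passed": 0.0},
    "NOUN": {"the": 0.0, "bill": 0.05, "passed": 0.01},
    "VERB": {"the": 0.0, "bill": 0.01, "passed": 0.05},
}

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words):
    # V[i][t] = best log-probability of any tag sequence for words[:i+1] ending in tag t
    V = [{t: log(start_p[t]) + log(emit_p[t][words[0]]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):          # L positions
        V.append({})
        back.append({})
        for t in tags:                      # P current tags
            # inner max over P previous tags -> O(P^2) work per position
            best_prev = max(tags, key=lambda s: V[i - 1][s] + log(trans_p[s][t]))
            V[i][t] = V[i - 1][best_prev] + log(trans_p[best_prev][t]) + log(emit_p[t][words[i]])
            back[i][t] = best_prev
    # follow backpointers from the best final tag
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(viterbi(["the", "bill", "passed"]))   # ['DET', 'NOUN', 'VERB'] with these toy numbers
```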