W3 L1 Information extraction Flashcards
why information extraction
we need it to get structured information (the facts we want) from unstructured, potentially messy input
ie where was einstein born
-> we need to extract this information from the many relevant documents
are terms like albert einstein a challenge for information extraction
yes, words that should be treated as a single unit are a challenge: some words are parts of a larger group and should be grouped together
ie albert einstein is one unit, not two separate words (see the sketch below)
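a minimal sketch in python of grouping multi-word names into single units; the phrase list and function name are made up for illustration, real systems learn these groupings from data

```python
# a toy phrase list; real extraction systems learn multi-word units from data
known_phrases = {("albert", "einstein"), ("new", "york")}

def group_tokens(tokens):
    # merge adjacent token pairs that form a known multi-word unit
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in known_phrases:
            out.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(group_tokens("where was albert einstein born".split()))
# ['where', 'was', 'albert einstein', 'born']
```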
does the order of words matter in sentences and information extraction
yes, words occupy specific positions in a sentence for a reason and relate to each other in particular ways
the order of words matters, it's not random
ie "was einstein born albert where" is garbage
what are parts of speech and pos tagging
parts of speech are word categories like nouns, verbs, adjectives, and adverbs
part of speech tagging is labelling each word in a sentence with its part of speech
why is pos tagging hard
there are many different interpretations of the same words
ie
we can fish
-> we go fishing
-> we put fish in cans
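a minimal sketch of this ambiguity as data; the tag labels (PRON, MODAL, VERB, NOUN) are illustrative, not from the lecture

```python
# the same three words admit two different part-of-speech analyses,
# so a tagger cannot decide from the words alone

# reading 1: "we go fishing" -- "can" is a modal, "fish" a verb
reading_1 = [("we", "PRON"), ("can", "MODAL"), ("fish", "VERB")]

# reading 2: "we put fish in cans" -- "can" is the verb, "fish" a noun
reading_2 = [("we", "PRON"), ("can", "VERB"), ("fish", "NOUN")]

for reading in (reading_1, reading_2):
    print(reading)
```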
what is a markov chain
a model that defines the probability of a sequence of random variables/states
what is in a markov chain
states and transitions
together, these give us markov chains
each transition between states has a transition probability
the probabilities over which state the chain starts in form the start distribution
what is the assumption we are making in a markov chain
they are memoryless
in order to predict the weather for tomorrow you only need to consider the weather today
what is a first order markov chain
only the current state matters in determining the future
recall what these notations are
S, A, pi
S = s1…sN set of N states
A = a11…aNN transition probability matrix, where each aij is the probability of moving from state i to state j
each row of A must sum to 1
pi = pi1…piN initial probability distribution
the sum of pi = 1
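a minimal sketch of this notation in python, using a toy weather chain with made-up probabilities

```python
import numpy as np

S = ["sunny", "rainy"]                 # S = s1...sN: set of N states

A = np.array([[0.8, 0.2],              # A: transition probability matrix,
              [0.4, 0.6]])             # A[i, j] = P(next = s_j | current = s_i)

pi = np.array([0.7, 0.3])              # pi: initial probability distribution

assert np.allclose(A.sum(axis=1), 1)   # each row of A sums to 1
assert np.isclose(pi.sum(), 1)         # pi sums to 1

# memoryless: the probability of sunny -> sunny -> rainy needs only
# pi and one transition probability per step
p = pi[0] * A[0, 0] * A[0, 1]
print(p)  # 0.7 * 0.8 * 0.2 = 0.112
```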
how can we apply markov models to language
we use markov chains to calculate the probability of a word appearing given the previous word
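a minimal sketch of a word-level markov chain (a bigram model) estimated from a tiny made-up corpus; the counting is simplified and real models need far more data and smoothing

```python
from collections import Counter

corpus = "we can fish . we can swim . we fish .".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
unigrams = Counter(corpus)                   # counts of single words

def p_next(word, prev):
    # P(word | prev) ~ count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("can", "we"))   # 2/3: "we" is followed by "can" in 2 of 3 cases
print(p_next("fish", "we"))  # 1/3
```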
how can we use hidden markov models for pos tagging
if we record observations that depend on hidden events, we can try to identify what the hidden events underneath were that caused our surface observations
in pos tagging what would be the observations and the hidden states
observations = words in sentences
hidden states = part of speech
what elements does a hidden markov model have
states, transition matrix, initial probability distribution
AND
O = o1…oT sequence of T observations
B = bi(ot) observation likelihoods (emission probabilities): the probability of observation ot being generated from state i
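a minimal sketch of these hmm elements in python, with made-up numbers; it scores one tag sequence for "we can fish" by multiplying the initial, transition, and emission probabilities

```python
import numpy as np

states = ["PRON", "VERB", "NOUN"]   # hidden states: the tags
vocab = ["we", "can", "fish"]       # observations: the words

pi = np.array([0.6, 0.2, 0.2])      # initial tag distribution

A = np.array([[0.1, 0.7, 0.2],      # tag-to-tag transition matrix
              [0.2, 0.3, 0.5],
              [0.3, 0.4, 0.3]])

B = np.array([[0.9, 0.05, 0.05],    # B[i, k] = P(word k | tag i),
              [0.1, 0.5, 0.4],      # the emission probabilities
              [0.05, 0.25, 0.7]])

# joint probability of "we can fish" with hidden tags PRON -> VERB -> NOUN
tags, obs = [0, 1, 2], [0, 1, 2]
p = pi[tags[0]] * B[tags[0], obs[0]]
for t in range(1, len(obs)):
    p *= A[tags[t - 1], tags[t]] * B[tags[t], obs[t]]
print(p)  # 0.6 * 0.9 * 0.7 * 0.5 * 0.5 * 0.7 = 0.06615
```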