w3 POS tagging Flashcards
what is information extraction IE
extracting information from documents given unstructured/messy input
e.g. searching on google for "where was Einstein born"
what are the challenges in information extraction
words occupy specific positions relative to each other; word order matters, so changing one term in the query can change the search results
also, words are sometimes meant to be in groups
e.g. Albert Einstein
but the computer will see that as two separate words
what is pos tagging
breaking down sentences into parts of speech, tagging words as nouns, verbs, adjectives, etc.
why is pos tagging hard
sentences can have many interpretations
a word can be a noun or a verb depending on intention
we can fish = we know how to fish
we can fish = we put fish in cans
what is a markov chain?
a model that defines the probability of sequences of random variables (states); these states are predefined
what is a transition probability in a markov chain
how likely a state is to transition to another state
what is a start distribution
the chance of starting at a certain state
what are the components of the markov chain
states (S) and transitions (A), which together give us markov chains
these chains have transition probabilities
and the starting probability spread is the start distribution (symbol is pi)
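a minimal sketch of these components in Python (the three states and all the numbers here are made up for illustration; numpy assumed):

```python
import numpy as np

# S: the predefined set of states
S = ["s1", "s2", "s3"]

# A: transition probabilities, A[i][j] = P(state i -> state j)
A = np.array([
    [0.6, 0.1, 0.3],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
])

# pi: start distribution, pi[i] = P(chain starts in state i)
pi = np.array([0.5, 0.3, 0.2])

# each row of A sums to 1 (see the ∑j Aij = 1 card below), and so does pi
assert np.allclose(A.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```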
what is a powerful point about markov chains
they are memoryless:
the probability of the next state depends only on the current state, not on anything that happened further in the past
is pos real
i don't think so; scientists arbitrarily assigned and categorized these words, which means the concept of pos tags is made up
it's a theoretical construct
in a markov chain
when it says
a1 = [0.6 0.1 0.3], what does that mean
this means a1 is the row of transition probabilities out of state 1, i.e.
P(s1 -> s1) = 0.6
P(s1 -> s2) = 0.1
P(s1 -> s3) = 0.3
so the chance of state 1 changing to state 2 is 10%
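a quick sketch of how you could use that row to sample the next state (the state names and the use of numpy are my own illustration):

```python
import numpy as np

a1 = [0.6, 0.1, 0.3]  # P(s1 -> s1), P(s1 -> s2), P(s1 -> s3)

# draw the next state from s1 according to its row of transition probabilities
next_state = np.random.choice(["s1", "s2", "s3"], p=a1)
print(next_state)  # "s1" ~60% of the time, "s2" ~10%, "s3" ~30%
```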
what are different order markov chains
the order of a markov chain tells you how many past states it uses to predict the future
e.g. a first order markov chain only uses the current state to make predictions:
P(si = a | s1 … s(i-1)) = P(si = a | s(i-1))
what does ∑j Aij = 1 ∀i mean
for every state i, all of the transition probabilities out of that state sum up to one
what does it mean when the summation over all the initial probabilities doesn't equal one
while it usually equals one, if it doesn't, that means there's some state you haven't captured
if there is a zero transition probability between two words what does that mean
the model says there is no chance the second word will ever appear right after the first
how would you calculate the probability of
s1 -> s2 -> s3 -> s2
given a markov chain / transition probabilities
If the transition matrix is:
P = [0.1 0.4 0.5]
    [0.2 0.3 0.5]
    [0.3 0.7 0.0]
with s1 = row/column 1, s2 = 2, s3 = 3
then multiply the transition probability of each step:
P(s1 -> s2) * P(s2 -> s3) * P(s3 -> s2) = 0.4 * 0.5 * 0.7 = 0.14
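a minimal sketch of that calculation in Python, using the matrix above (starting in s1 is taken as given here, i.e. no start distribution is applied):

```python
import numpy as np

# P[i][j] = P(state i -> state j); 0-indexed, so s1 = 0, s2 = 1, s3 = 2
P = np.array([
    [0.1, 0.4, 0.5],
    [0.2, 0.3, 0.5],
    [0.3, 0.7, 0.0],
])

path = [0, 1, 2, 1]  # s1 -> s2 -> s3 -> s2

# multiply the transition probability of each consecutive pair of states
prob = 1.0
for i, j in zip(path, path[1:]):
    prob *= P[i, j]

print(prob)  # ≈ 0.14, i.e. 0.4 * 0.5 * 0.7
```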
how do hidden markov models relate to pos tagging
you are going to have some inner/hidden states and transitions associated with the pos tags
you are interested in detecting the underlying sequence of hidden states, which is what you will assign as your pos tags
what is the goal of an hmm for pos tagging
Given some observable input sequence of words x1 … xn and some tag set (which is basically what the parts of speech are),
you want to come up with an output sequence y1 … yn where yi is the part of speech associated with xi.
the words are the observable observations and
the parts of speech tags are hidden
what are observations and emission probabilities
O = o1, o2 … oN, a sequence of N observations
B = bi(ot), the observation likelihoods / emission probabilities
bi(ot) is the chance of observation ot popping up because of state i
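a minimal sketch of how this plays out end to end, using Viterbi decoding on a toy two-tag model for "we can fish" (the tags, vocabulary, and every probability here are made up for illustration):

```python
import numpy as np

states = ["NOUN", "VERB"]           # hidden states (pos tags)
words = ["we", "can", "fish"]       # observation vocabulary
obs = [0, 1, 2]                     # observed sequence: we can fish

pi = np.array([0.7, 0.3])           # start distribution over tags
A = np.array([                      # A[i][j] = P(tag i -> tag j)
    [0.3, 0.7],
    [0.6, 0.4],
])
B = np.array([                      # B[i][o] = bi(o): emission probabilities
    [0.5, 0.2, 0.3],                # P(word | NOUN)
    [0.1, 0.5, 0.4],                # P(word | VERB)
])

# Viterbi: find the most likely hidden tag sequence for the observed words
n, T = len(states), len(obs)
V = np.zeros((T, n))                # V[t][i] = best prob of a path ending in state i at time t
back = np.zeros((T, n), dtype=int)  # backpointers to recover that path

V[0] = pi * B[:, obs[0]]
for t in range(1, T):
    for j in range(n):
        scores = V[t - 1] * A[:, j] * B[j, obs[t]]
        back[t, j] = np.argmax(scores)
        V[t, j] = scores[back[t, j]]

# walk the backpointers from the best final state
path = [int(np.argmax(V[-1]))]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
path.reverse()

print([states[i] for i in path])    # ['NOUN', 'VERB', 'NOUN'] with these made-up numbers
```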
what is the conditional probability formula
If I have
3 green shirts, of which I like 2 and
8 blue shirts, of which I like 6
What is the probability I like the shirt I am wearing given that I
am wearing a blue shirt?
P(A|B) = P(A and B) / P(B)
= (# of blue shirts I like) / (total # of blue shirts)
= 6/8
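a quick sanity check of that arithmetic with the full conditional probability formula (counts from the card; variable names are mine):

```python
# shirt counts from the card
blue_total, blue_liked = 8, 6
green_total, green_liked = 3, 2
total = blue_total + green_total      # 11 shirts overall

# P(A|B) = P(A and B) / P(B), with A = "I like it", B = "it is blue"
p_like_and_blue = blue_liked / total  # 6/11
p_blue = blue_total / total           # 8/11
print(p_like_and_blue / p_blue)       # ≈ 0.75, i.e. 6/8
```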