POS Flashcards
What does POS stand for?
Parts of Speech
How did POS originate?
It originated around 100 BC, when Dionysius Thrax of Alexandria attempted to summarise Greek linguistic knowledge
What can POS also be called?
Word classes, morphological classes, lexical tags
What are POS generally assigned to?
Individual words or morphemes
What is labelling POS known as?
POS Tagging
What are proper names?
A proper name is called a Named Entity, and can be a multi-word phrase. Labelling named entities is called Named Entity Recognition (NER)
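A minimal NER sketch, assuming spaCy and its small English model are installed (the library and the en_core_web_sm model are assumptions here, not part of the cards):

```python
# Minimal NER sketch (assumes: pip install spacy, then
# python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Dionysius Thrax taught in Alexandria.")

# Each named entity can span multiple words.
for ent in doc.ents:
    print(ent.text, ent.label_)
```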
Why are POS and NEs useful?
POS gives us clues to neighbouring words and syntactic structure. POS tagging is a key aspect of parsing natural language. NER is important to many natural language tasks such as question answering, stance detection and information extraction
What type of task is POS?
It is a sequence labelling task
What is the input, output and their lengths for POS labelling?
The input X is a sequence of words, the output Y is a sequence of POS tags and the length of X and Y are equal
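A tiny sketch of what this means in practice (the sentence and its Penn Treebank-style tags are illustrative):

```python
# Sequence labelling: exactly one tag per word, so the lengths match.
X = ["The", "cat", "sat"]   # input: sequence of words
Y = ["DT", "NN", "VBD"]     # output: sequence of POS tags
assert len(X) == len(Y)
```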
What is a closed class?
These are classes with fixed membership, typically function words used for structuring grammar (of, it, and, you)
What is an open class?
These have open membership; typical open classes are nouns, verbs, adjectives, adverbs and interjections. Language changes over time: new vocabulary emerges and existing words may take on new meanings
What is a list of POS labels called?
It is called a tagset
What are some popular tagsets?
Penn Treebank
Brown Corpus
C7 Tagset
What is POS tagging?
It is the process of assigning a POS tag to each word in a text
Why is POS Tagging a disambiguation task?
Words are ambiguous and can have more than one possible POS, so we need to find the correct tag for the given context
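A sketch of tagging and disambiguation using NLTK's built-in statistical tagger (assumes NLTK is installed and its tokenizer and tagger data have been downloaded; exact data package names vary by NLTK version):

```python
# POS tagging with NLTK (assumes: pip install nltk, plus
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')).
import nltk

# "book" is ambiguous: it is typically tagged as a verb (VB) in the
# first sentence and as a noun (NN) in the second.
print(nltk.pos_tag(nltk.word_tokenize("I will book a flight")))
print(nltk.pos_tag(nltk.word_tokenize("I read a good book")))
```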
What are some ways you can build a POS tagger?
Rule-based taggers (hand crafted disambiguation rules)
Transformation-based taggers (supervised learning of tagging rules + some hand crafted templates)
Hidden Markov Models (HMM)
Conditional Random Fields (CRF)
What is a Markov chain?
It models the probability of the next state given the current state
What assumption is held with the Markov chain?
That the future depends only on the current state (not past)
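Written as a formula, the Markov assumption for a state sequence q_1, ..., q_i is:

```latex
% Markov assumption: the next state depends only on the current state.
P(q_i \mid q_1, q_2, \ldots, q_{i-1}) = P(q_i \mid q_{i-1})
```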
What does the image show?
It shows the Markov assumption, where the future depends only on the current state
What is a state in a Markov Model?
In a basic Markov chain over text, each state is a word, so a sequence of state variables is a sequence of words
Explain what is shown in the image
It shows a Markov chain. In Figure b, the probability of the word are, given the word uniformly, is 0.4:
P(are | uniformly) = 0.4
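A minimal sketch of such a chain in code (the states uniformly, are and charming, and every probability except the 0.4 from the card, are illustrative assumptions):

```python
import random

# Word-level Markov chain as a nested dict of transition probabilities.
transitions = {
    "uniformly": {"are": 0.4, "charming": 0.3, "uniformly": 0.3},
    "are":       {"charming": 0.6, "uniformly": 0.4},
    "charming":  {"are": 0.5, "uniformly": 0.5},
}

print(transitions["uniformly"]["are"])  # 0.4, i.e. P(are | uniformly)

def next_word(current):
    """Sample the next state given only the current one (Markov assumption)."""
    words = list(transitions[current])
    weights = [transitions[current][w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

print(next_word("uniformly"))
```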
Why is a basic Markov Model different to a Hidden Markov Model?
A basic Markov model requires all events (words and tags) to be directly observable, whereas a Hidden Markov Model allows hidden states that must be inferred
Explain what the image shows.
It shows that in a Markov model we have a set Q of N states
We have a transition probability matrix, A, where entry (i, j) is the probability of moving from state i to state j
We have an initial probability distribution, π, which gives the probability of each state being the one the sequence starts in
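These components can be written down directly; a sketch with two states (all names and numbers are illustrative assumptions):

```python
import numpy as np

Q = ["hot", "cold"]            # set of N states (illustrative)

# A[i, j] = probability of moving from state i to state j; rows sum to 1.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# pi[i] = probability that the sequence starts in state i.
pi = np.array([0.6, 0.4])

assert np.allclose(A.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)
```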
What is the idea of the Hidden Markov Model?
POS tags are hidden states, which we must infer from the observed words.
States are now POS tags, and words represent observations, which we can see
What is the Markov Assumption applied to POS?
That the next tag (state) depends only on the current tag (state)
What is the output independence assumption?
It is the assumption that the probability of an observation (word) depends only on the state (tag) that produced it, not on other states or observations
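The two HMM assumptions as formulas, for tags t_1, ..., t_n and words w_1, ..., w_n:

```latex
% Markov assumption applied to tags:
P(t_i \mid t_1, \ldots, t_{i-1}) = P(t_i \mid t_{i-1})

% Output independence: a word depends only on the tag that produced it:
P(w_i \mid t_1, \ldots, t_n, w_1, \ldots, w_{i-1}) = P(w_i \mid t_i)
```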
What does the image show?
It shows in a HMM, we have a set of states, Q
We have a transition probability matrix, A, which is the probability of the next tag, given the current tag
Matrix B is a matrix of observation likelihoods, which give the probability of a word (observation) given a state (tag)
We still have the initial probability distribution and we also have a sequence of observations, which are words
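A toy version of these components in code (two tags, three words; rows of A and B are tags, columns of B are words; all values are illustrative assumptions):

```python
import numpy as np

tags  = ["DT", "NN"]             # hidden states Q
vocab = ["the", "dog", "a"]      # observable words

# A[i, j] = P(next tag j | current tag i)      -- transitions
A = np.array([[0.1, 0.9],
              [0.6, 0.4]])

# B[i, k] = P(word k | tag i)                  -- observation likelihoods
B = np.array([[0.6, 0.0, 0.4],
              [0.0, 1.0, 0.0]])

# pi[i] = P(sequence starts with tag i)        -- initial distribution
pi = np.array([0.8, 0.2])
```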
What does decoding do in POS Tagging?
Given a model and a sequence of observations, decoding aims to find the most probable sequence of states
When decoding, what are some assumptions we make?
The probability of a word depends only on its tag (it is independent of neighbouring words and tags)
Probability of tag depends only on previous tag (bigram)
What is the Viterbi Algorithm?
It is an efficient dynamic-programming algorithm for HMM decoding
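A compact sketch of Viterbi decoding, reusing the toy tags/A/B/pi from the sketch above (all values illustrative):

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most probable state sequence for an observation sequence (HMM decoding).

    obs: list of observation indices into B's columns
    A:   (N, N) transitions, A[i, j] = P(state j | state i)
    B:   (N, V) emissions,   B[i, k] = P(obs k | state i)
    pi:  (N,)   initial state distribution
    """
    N, T = A.shape[0], len(obs)
    v = np.zeros((N, T))                 # v[s, t] = best path prob ending in s at t
    back = np.zeros((N, T), dtype=int)   # backpointers to the best previous state

    v[:, 0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for s in range(N):
            scores = v[:, t - 1] * A[:, s] * B[s, obs[t]]
            back[s, t] = np.argmax(scores)
            v[s, t] = scores[back[s, t]]

    # Follow backpointers from the best final state.
    path = [int(np.argmax(v[:, T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[path[-1], t]))
    return path[::-1]

tags = ["DT", "NN"]
A = np.array([[0.1, 0.9], [0.6, 0.4]])
B = np.array([[0.6, 0.0, 0.4], [0.0, 1.0, 0.0]])
pi = np.array([0.8, 0.2])
print([tags[s] for s in viterbi([0, 1], A, B, pi)])  # "the dog" -> ['DT', 'NN']
```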
What is the full equation for a HMM?
The image shows the equation, reconstructed below. The emission term is the probability of a word given its state (POS tag), and the transition term is the probability of a POS tag given the previous POS tag (state)
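One standard way to write the full equation, assuming the bigram HMM formulation described in the cards:

```latex
% Most probable tag sequence t_1..t_n for words w_1..w_n:
\hat{t}_{1:n} = \operatorname*{argmax}_{t_1 \ldots t_n}
    \prod_{i=1}^{n}
    \underbrace{P(w_i \mid t_i)}_{\text{emission}}
    \underbrace{P(t_i \mid t_{i-1})}_{\text{transition}}
```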