#5 Flashcards
Part of Speech (PoS)
clusters of words which behave in a similar way
example of PoS
happy (adjective) + -ness = happiness (noun)
open class words
new words are added all the time, so the set changes quickly
examples of open class words
nouns, adjectives, verbs, adverbs
closed class words
the set is pretty much fixed
examples of closed class words
conjunctions, determiners, pronouns, prepositions
open class words are productive/not productive
productive
cross-linguistic validity
differences across languages are more pronounced for closed class words
the Penn Treebank tagset
does not include syntactic information
PoS tagging
the process of automatically assigning PoS tags to words in a corpus
What does tagging do
observe a sequence of words and find the best sequence of tags
what’s the input of tagging
tokenized corpus
what’s the output of tagging
a sequence of tags, one for each input token
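To make the input and output of tagging concrete, here is a minimal sketch in Python. The `LEXICON` dictionary and the `tag` function are hypothetical names made up for illustration, and a plain dictionary lookup is only a stand-in for a real tagger. The tags themselves (DT, NN, VBZ) are Penn Treebank tags.

```python
# Hypothetical toy lexicon mapping lowercased tokens to Penn Treebank tags.
LEXICON = {"the": "DT", "dog": "NN", "barks": "VBZ"}


def tag(tokens, default="NN"):
    """Return one (token, tag) pair per input token.

    Unknown tokens fall back to `default`; this fallback is an
    assumption of the sketch, not part of any real tagger.
    """
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


# Input: a tokenized sentence. Output: one tag per token.
tagged = tag(["The", "dog", "barks"])
print(tagged)
```

The output has the same length as the input, which is the point of the card: tagging is a sequence labelling task, not a classification of the whole sentence.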
What is the aim of a hidden Markov model (HMM)
to model and analyze sequential data such as text, speech, and handwriting
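Main difference between what is observed in PoS tagging and in language modeling (LM)
in PoS tagging the states we want to predict (the tags) are hidden and only the words are observed; in LM the states (the words) are observed directly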
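The idea of finding the best sequence of hidden tags for an observed sequence of words can be sketched with Viterbi decoding over a toy HMM. Everything below is an assumption made for illustration: the two-tag set, the hand-picked probabilities in `START`, `TRANS`, and `EMIT`, and the `UNSEEN` fallback for words missing from the emission table. A real tagger would estimate these values from a tagged corpus.

```python
import math

# Hypothetical toy HMM: tags are the hidden states, words are the observations.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}                  # P(first tag)
TRANS = {                                           # P(next tag | previous tag)
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT = {                                            # P(word | tag)
    "NOUN": {"dogs": 0.5, "bark": 0.1, "cats": 0.4},
    "VERB": {"dogs": 0.05, "bark": 0.6, "chase": 0.35},
}
UNSEEN = 1e-6  # assumed tiny probability for words absent from EMIT


def emit_logp(tag, word):
    return math.log(EMIT[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best tag path ending in tag t at position i
    best = [{t: math.log(START[t]) + emit_logp(t, words[0]) for t in TAGS}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] + math.log(TRANS[p][t]))
            best[i][t] = (best[i - 1][prev] + math.log(TRANS[prev][t])
                          + emit_logp(t, words[i]))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    tags = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return tags[::-1]


print(viterbi(["dogs", "bark"]))  # ['NOUN', 'VERB']
```

Log-probabilities are used instead of raw products so that long sentences do not underflow to zero.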