# 3 Flashcards
token
every word occurrence in a corpus (punctuation can be counted as well)
ex. Are you on campus today? # 5 tokens (6 with ?)
type
every distinct word in a corpus
ex. I will eat what you eat, even if I am not hungry.
# 10 types (the two occurrences of eat count as one type, and so do the two occurrences of I)
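The token and type counts above can be checked with a short sketch. The `tokenize` helper and its `keep_punct` flag are made up for illustration; the regex is a simplification, not a full tokenizer.

```python
import re

def tokenize(text, keep_punct=False):
    # Words are runs of letters, digits, or apostrophes.
    # With keep_punct=True, every other non-space character is its own token.
    word = r"[A-Za-z0-9']+"
    pattern = word + r"|[^\sA-Za-z0-9']" if keep_punct else word
    return re.findall(pattern, text)

tokens = tokenize("Are you on campus today?")
tokens_with_punct = tokenize("Are you on campus today?", keep_punct=True)

# Types are the distinct tokens, so a set is enough to count them.
types = set(tokenize("I will eat what you eat, even if I am not hungry."))

print(len(tokens), len(tokens_with_punct), len(types))  # 5 6 10
```

This sketch treats Eat and eat as different types; lowercase the tokens first if case should not matter.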
lemma
a set of word forms that share the same root (dictionary form)
ex. I have never eaten that before, but I will eat it.
# 9 lemmas (eat and eaten have the same lemma)
lemmatization
the process of reducing each word form in a corpus to its lemma
is -> be
went -> go
best -> good
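A minimal sketch of lemmatization as a dictionary lookup. The `LEMMAS` table and the `lemmatize` helper are hypothetical and cover only the examples on these cards; a real lemmatizer relies on a full dictionary plus morphological rules.

```python
# Hypothetical lookup table covering only the flashcard examples.
LEMMAS = {"is": "be", "went": "go", "best": "good", "eaten": "eat"}

def lemmatize(word):
    # Fall back to the lowercased word when it is not in the table.
    w = word.lower()
    return LEMMAS.get(w, w)

sentence = "I have never eaten that before but I will eat it"
lemmas = {lemmatize(w) for w in sentence.split()}

print(lemmatize("went"))  # go
print(len(lemmas))        # 9
```

The count of 9 matches the lemma card: eaten and eat collapse into one entry, as do the two occurrences of I.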
stem
the base form of a word that carries its core meaning
ex. talked -> talk
what is the stem of reading?
read
affix
a morpheme added to a stem to modify its meaning
work + ed = worked
what are the affixes of unhappiness?
stem: happy
affixes: “un”, “ness”
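A naive sketch of splitting a word into prefix, stem, and suffix. The `PREFIXES` and `SUFFIXES` inventories and the `split_affixes` helper are assumptions made for illustration; the sketch strips any matching string, so it will mis-split words such as under.

```python
# Hypothetical, deliberately tiny affix inventories.
PREFIXES = ("un",)
SUFFIXES = ("ness", "ed", "ing")

def split_affixes(word):
    # Take the first prefix and the first suffix that match, if any.
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    # Undo the y -> i spelling change (happy + ness = happiness).
    if suffix == "ness" and stem.endswith("i"):
        stem = stem[:-1] + "y"
    return prefix, stem, suffix

print(split_affixes("unhappiness"))  # ('un', 'happy', 'ness')
print(split_affixes("talked"))       # ('', 'talk', 'ed')
```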
derivational affixes
create new words by changing grammatical category or meaning
un [not] + happy = unhappy
inflectional affixes
encode grammatical properties of a word, such as number, tense, and aspect
talk + ed [past] = talked
what is formed by combining stems to produce a new meaning?
compounds
example of a compound?
fire + fighter = firefighter
what does normalization entail?
discarding variability
Disjunction of good and Good
/[gG]ood/
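The disjunction can be tried with Python's `re` module, alongside case folding as one simple form of normalization. The sample text is made up for illustration.

```python
import re

# [gG] matches either g or G, so the pattern covers both spellings.
pattern = re.compile(r"[gG]ood")

text = "Good food is good for you."
matches = pattern.findall(text)
print(matches)  # ['Good', 'good']

# Normalization by case folding discards the g/G variability up front,
# so a plain lowercase search finds both occurrences.
normalized = text.lower()
print(normalized.count("good"))  # 2
```

The pattern also matches inside longer words such as goodness; wrap it in `\b` word boundaries if only the whole word should match.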