Lecture 3 Flashcards

N-gram Models, Morphology, Part-of-Speech, Word Senses

1
Q

Applications of Language Models

A

The goal of a Language Model is to assign a probability that a sentence (or phrase) will occur in natural uses of the language

2
Q

Applications:

A

* Machine Translation: P(high winds tonight) > P(large winds tonight)
* Spell Correction
  * "The office is about fifteen minuets from my house": P(about fifteen minutes from) > P(about fifteen minuets from)
* Speech Recognition
  * P(I saw a van) >> P(eyes awe of an)
* Summarization, question-answering, and many other NLP applications

3
Q

Chain Rule:

A

Computes the joint probability of a sequence of words by multiplying together the probability of each word conditioned on all the preceding words.
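Written out for an n-word sequence, the chain rule is:

```latex
P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})
```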

4
Q

Markov Assumption

A

We can predict the next word based only on the previous word (or the previous few words), rather than on the entire preceding history.
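For the bigram (one-previous-word) case, the approximation is:

```latex
P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-1})
```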

5
Q

N-gram Probabilities

A

The observed frequency (count) of the whole sequence divided by the observed frequency of the preceding (initial) sequence; this ratio is sometimes called the maximum likelihood estimate (MLE).
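For bigrams, this count-ratio estimate is:

```latex
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
```

where C(·) is the count of that sequence in the training corpus.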

6
Q

Unigram Model: (word frequencies)

A

The simplest case: we estimate a sentence's probability using only the probabilities of the individual words, with no preceding context taken into account.
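Under this independence assumption, the sentence probability is just the product of the word probabilities:

```latex
P(w_1 w_2 \ldots w_n) \approx \prod_{i=1}^{n} P(w_i)
```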

7
Q

Bigram Model:

A

(Two-word frequencies): the prediction of each word is based on the one previous word.
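As a minimal sketch of how such bigram probabilities can be estimated by counting (the toy corpus, the `<s>`/`</s>` boundary markers, and the `bigram_prob` helper are illustrative assumptions, not part of the lecture):

```python
# Minimal sketch: MLE bigram probabilities from raw counts.
# The toy corpus and <s>/</s> boundary markers are illustrative assumptions.
from collections import Counter

corpus = [
    "<s> I saw a van </s>",
    "<s> I saw a cat </s>",
    "<s> a cat saw me </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)                  # count single words
    bigram_counts.update(zip(tokens, tokens[1:]))  # count adjacent word pairs

def bigram_prob(prev, word):
    """MLE estimate: count(prev word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("I", "saw"))   # 1.0  - "saw" always follows "I" in this corpus
print(bigram_prob("a", "van"))   # 0.33 - "van" follows "a" in 1 of 3 cases
```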

8
Q

N-gram Models

A

We can extend to trigrams, 4-grams, 5-grams
* Each higher order gives a more accurate model, but it becomes harder to find examples of the longer word sequences in the corpus

9
Q

long-distance dependencies:

A

N-gram models are an insufficient model of language because language has long-distance dependencies, e.g., "The computer which I had just put into the machine room on the fifth floor crashed," where the subject "computer" and the verb "crashed" are far apart.

10
Q

Bigrams:

A

any two adjacent words that occur together in a text

11
Q

Bigram language models:

A

use the bigram probability, meaning a
conditional probability, to predict how likely it is that the second word follows the first

12
Q

Smoothing

A

Every N-gram training matrix is sparse, even for very large corpora (remember Zipf's law)
* There are words that don't occur in the training corpus but may occur in future text - known as unseen words
* Whenever a probability is 0, it will multiply the entire sequence probability to 0
* Solution: estimate the likelihood of unseen N-grams and include a small probability for unseen words (one common scheme, add-one smoothing, is shown below)
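As one common illustration (add-one/Laplace smoothing is only one of several smoothing schemes), the smoothed bigram estimate adds 1 to every count and adds the vocabulary size V to the denominator:

```latex
P_{\text{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V}
```

where V is the vocabulary size, so every bigram, seen or unseen, receives a nonzero probability.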

13
Q

Levels of Language Analysis

A

1 Phonetic
2 Morphological
3 Lexical
4 Syntactic
5 Semantic
6 Discourse
7 Pragmatic

14
Q

Speech Processing

A
  • Interpretation of speech sounds within & across words
  • sound waves are analyzed and encoded into a digitized signal
15
Q

Rules used in Phonological Analysis

A
  1. Phonetic rules – sounds within words
    e.g. When a vowel stands alone, the vowel is usually long
  2. Phonemic rules – variations of pronunciation when words are spoken
    together e.g. “r” in “part” vs. in “rose”
  3. Prosodic rules – fluctuation in stress and intonation across a sentence:
    rhythm, volume, pitch, tempo, and stress
    * e.g. High pitch vs. low pitch
16
Q

Morphology: The Structure of Words

A

Morphology is the level of language that deals with the internal structure of
words
* General morphological theory applies to all languages as all natural human
languages have systematic ways of structuring words (even sign language)
* Must be distinguished from morphology of a specific language
* English words are structured differently from German words, although
both languages are historically related
* Both are vastly different from Arabic

17
Q

Morpheme

A

A morpheme is a minimal subunit of meaning in a word. We can usefully divide morphemes into two classes:
Stems
Affixes

18
Q

Stems:

A

The core meaning-bearing units (e.g., happy)

19
Q

Affixes:

A

Bits and pieces that adhere to stems to change their meanings and grammatical functions: prefixes, infixes, suffixes, circumfixes (e.g., unhappy)

20
Q

Inflection:

A

the combination of stems and affixes where the resulting word has the same
word type (e.g., noun, verb, etc.) as the original. Serves a grammatical purpose that is different from the original but is nevertheless transparently related to the original.
Examples: apple – noun; apples – still a noun

21
Q

Derivation

A

Creates a new word by changing the category and/or meaning of the base to which it applies.
* Can change the grammatical category (part of speech): sing (verb) > singer (noun)
* Can change the meaning: act of singing > one who sings
* Is often limited to a certain group of words: you can Clintonize the government, but you can't Bushize the government (a phonological restriction)

22
Q

Use of Morphology in NLP Tasks -1
Stemming

A

Strip prefixes and / or suffixes to find the base root, which may or may not be an actual word
* Misspellings are inconsequential

23
Q

Use of Morphology in NLP Tasks -2
Lemmatization

A

Strip prefixes and / or suffixes to find the base root, which will always be an actual word
* Often based on a word list, such as that available in WordNet
* Correct spelling is crucial
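As a hedged sketch of the difference between the two operations (assuming the NLTK toolkit, which provides a Porter stemmer and a WordNet-based lemmatizer; downloaded resource names can vary by NLTK version):

```python
# Sketch: stemming vs. lemmatization using NLTK (an assumed toolkit choice).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # the lemmatizer relies on the WordNet word list

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                   # 'studi' - root need not be a real word
print(lemmatizer.lemmatize("studies"))           # 'study' - always an actual word
print(stemmer.stem("running"))                   # 'run'
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'   - a POS hint helps the lemmatizer
```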

24
Q

Use of Morphology in NLP Tasks -3
Part of speech prediction

A

Knowledge of morphemes for a particular
language can be a powerful aid in guessing
the part of speech for an unknown term

25
To Stem (Lemma) or Not to Stem (Lemma)
The decisions to stem, lemmatize, remove stop words, and normalize (lowercase) depend on the number of input documents and the analytic approach. More documents = less need for data reduction.
26
Part-of-speech Tagging:
Assigning Correct Word Types to Words in the Text
The general purpose of a part-of-speech tagger is to associate each word in a text with its correct lexical-syntactic category (represented by a tag).
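As a hedged sketch of tagging in practice (assuming NLTK, whose default tagger produces Penn Treebank tags; the downloaded resource names may differ slightly across NLTK versions):

```python
# Sketch: part-of-speech tagging with NLTK's default Penn Treebank tagger.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

tokens = nltk.word_tokenize("The yellow hat is on that flight")
print(nltk.pos_tag(tokens))
# Expected output along the lines of:
# [('The', 'DT'), ('yellow', 'JJ'), ('hat', 'NN'), ('is', 'VBZ'),
#  ('on', 'IN'), ('that', 'DT'), ('flight', 'NN')]
```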
27
Varying terminology:
Parts-of-speech (POS), lexical categories, word classes, morphological classes, lexical tags... Lots of debate within linguistics about the number, nature, and universality of these – AND we’ll completely ignore this debate
28
Penn Treebank Tag Set - NNS
noun, plural
29
Penn Treebank Tag Set - NNP
proper noun, singular
30
Penn Treebank Tag Set - NNPS
proper noun, plural
31
Penn Treebank Tag Set - PDT
predeterminer
32
Penn Treebank Tag Set - POS
possessive ending
33
Penn Treebank Tag Set - PRP
personal pronoun
34
Penn Treebank Tag Set - PRP$
possessive pronoun
35
Penn Treebank Tag Set - RB
adverb
36
Penn Treebank Tag Set - RBR
adverb, comparative
37
Penn Treebank Tag Set - RBS
adverb, superlative
38
Penn Treebank Tag Set - RP
particle
39
Penn Treebank Tag Set - WRB
wh- adverb
40
POS Tagging Approaches: Rule-based Approach
Simple and doesn’t require a tagged corpus, but not as accurate as other approaches.
41
POS Tagging Approaches: Stochastic Approaches
* Refers to any approach that incorporates frequencies or probabilities
* Requires a tagged corpus to learn frequencies of words with POS tags
* N-gram taggers: use the context of (a few) previous tags
* Hidden Markov Model (HMM) taggers: use the context of the entire sequence of words and previous tags
* This technique has been the most widely used of modern taggers, but has the problem of unknown words
42
POS Tagging Approaches: Classification Taggers
* Uses morphology of word and (a few) surrounding words
* Helps solve the problem of unknown words
43
Computing the Two Probabilities: Word likelihood probabilities
* VBZ (third-person singular present verb): likely to be "is"
* Compute P(is|VBZ) by counting in a labeled corpus
44
Computing the Two Probabilities: Tag transition (prior) probabilities
* Determiners likely to precede adjectives and nouns
  * That/DT flight/NN
  * The/DT yellow/JJ hat/NN
* So we expect P(NN|DT) and P(JJ|DT) to be high
* Compute P(NN|DT) by counting in a labeled corpus
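Putting the two probabilities together, an HMM bigram tagger chooses the tag sequence that maximizes their product (a standard formulation; the notation here is added for illustration):

```latex
\hat{t}_{1 \ldots n} = \operatorname*{argmax}_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

where P(w_i | t_i) is the word likelihood and P(t_i | t_{i-1}) is the tag transition (prior) probability.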
45
Word Sense:
We say that a word has more than one word sense (meaning) if there is more than one definition.
46
Word senses may be:
* Coarse-grained, if not many distinctions are made
* Fine-grained, if there are many distinctions of meanings
47
Lexical Semantics Lexicons –
list of words (or lexemes or stems) with basic info
48
Lexical Semantics Dictionaries –
a lexicon with definitions for each word sense
* Most are now available online
49
Lexical Semantics Thesauruses –
add synonyms/antonyms for each word sense
* WordNet
50
Lexical Semantics Semantic networks –
add more semantic relations, including semantic categories
* WordNet, EuroWordNet
51
Lexical Semantics Ontologies –
add rules about entities, concepts and relations, semantic categories
* UMLS
52
Lexical Semantics Semantic Lexicon –
Lexicon where each word is assigned to a semantic class
* LIWC, ANEW, Subjectivity Lexicon
53
WordNet – A Hand-Curated Word Database
WordNet is a database of facts about words
* Meanings and the relations among them
* Words are organized into clusters of synonyms: synsets
* Synsets are organized into nouns, verbs, adjectives, and adverbs
* Currently 170,000 synsets
* Available for download, arranged in separate files (DBs)
54
Hierarchical Semantic Representations
A semantic network provides relations for each word sense:
* hypernymy/hyponymy (IS-A)
  * hypernyms are more general, hyponyms are more specific
* meronymy/holonymy (PART-OF)
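As a hedged sketch of querying these relations (assuming NLTK's WordNet interface; the particular words and sense identifiers are illustrative):

```python
# Sketch: word senses and semantic relations via NLTK's WordNet interface.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# Each synset is one word sense (a cluster of synonyms).
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

dog = wn.synset("dog.n.01")
print(dog.hypernyms())       # more general concepts (IS-A parents)
print(dog.hyponyms()[:3])    # more specific concepts
print(wn.synset("tree.n.01").part_meronyms())  # PART-OF relations (parts of a tree)
```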
55