Lecture 3 Flashcards

N-gram Models, Morphology, Part-of-Speech, Word Senses

1
Q

Goal of a Language Model

A

The goal of a Language Model is to assign a probability that a sentence (or phrase) will occur in natural uses of the language

2
Q

Applications:

A

* Machine Translation: P(high winds tonight) > P(large winds tonight)
* Spell Correction
  * "The office is about fifteen minuets from my house": P(about fifteen minutes from) > P(about fifteen minuets from)
* Speech Recognition
  * P(I saw a van) >> P(eyes awe of an)
* Summarization, question-answering, and many other NLP applications

3
Q

Chain Rule:

A

Compute the joint probability of a sequence of words by multiplying, for each word, its conditional probability given all the previous words
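
In the P(·) notation used on these cards, the chain rule expands a sentence probability as:

P(w1 w2 … wn) = P(w1) · P(w2|w1) · P(w3|w1 w2) · … · P(wn|w1 … wn-1)

For example, P(its water is) = P(its) · P(water|its) · P(is|its water).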

4
Q

Markov Assumption

A

We can predict the next word based on only the previous word (or, more generally, only the last few words), rather than the entire history
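
For a bigram model, each term in the chain rule is approximated using only the previous word:

P(wn|w1 … wn-1) ≈ P(wn|wn-1)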

5
Q

N-gram Probabilities

A

the observed frequency (count) of the whole sequence divided by the observed frequency of the preceding, or initial, sequence; this is sometimes called the maximum likelihood estimate (MLE):
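
For a bigram, in the same notation:

P(wn|wn-1) = C(wn-1 wn) / C(wn-1)

where C(·) is the number of times a sequence is observed in the training corpus.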

6
Q

Unigram Model: (word frequencies)

A

The simplest case is that we predict a sentence probability just based on the probabilities of the words with no preceding words
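
Under this model the sentence probability is simply the product of the individual word probabilities:

P(w1 w2 … wn) ≈ P(w1) · P(w2) · … · P(wn)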

7
Q

Bigram Model:

A

(two-word frequencies): prediction of each word based on the one previous word:
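
A minimal Python sketch of this bigram MLE computation (the toy corpus, the <s>/</s> sentence markers, and the function name are illustrative assumptions, not part of the lecture):

  from collections import Counter

  # Toy corpus with sentence-boundary markers (illustrative).
  corpus = [
      "<s> I am Sam </s>",
      "<s> Sam I am </s>",
      "<s> I do not like green eggs and ham </s>",
  ]

  unigram_counts = Counter()
  bigram_counts = Counter()
  for sentence in corpus:
      words = sentence.split()
      unigram_counts.update(words)
      bigram_counts.update(zip(words, words[1:]))

  def bigram_prob(prev, word):
      # MLE: count of the bigram divided by the count of the preceding word.
      return bigram_counts[(prev, word)] / unigram_counts[prev]

  print(bigram_prob("I", "am"))   # 2/3: "I" occurs 3 times, "I am" twice
  print(bigram_prob("<s>", "I"))  # 2/3: 2 of the 3 sentences start with "I"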

8
Q

N-gram Models

A

We can extend to trigrams, 4-grams, 5-grams
* Each higher order gives a more accurate model, but it becomes harder to find examples of the longer word sequences in the corpus

9
Q

long-distance dependencies:

A

N-gram models are insufficient models of language because language has long-distance dependencies, e.g., "The computer which I had just put into the machine room on the fifth floor crashed" (the verb "crashed" depends on the distant subject "computer").

10
Q

Bigrams:

A

any two words that occur together (i.e., any pair of adjacent words in a text)

11
Q

Bigram language models:

A

use the bigram probability, meaning a conditional probability, to predict how likely it is that the second word follows the first

12
Q

Smoothing

A

Every N-gram training matrix is sparse, even for very large corpora (remember Zipf's law)
* Words that don't occur in the training corpus may occur in future text; these are known as unseen words
* Whenever a probability is 0, it multiplies the probability of the entire sequence to 0
* Solution: estimate the likelihood of unseen N-grams and include a small probability for unseen words
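
One standard way to do this (a specific technique, not named on the card) is Laplace, or add-one, smoothing, which adds one to every count:

P(wn|wn-1) = (C(wn-1 wn) + 1) / (C(wn-1) + V)

where V is the vocabulary size; every bigram, seen or unseen, then receives a small nonzero probability.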

13
Q

Levels of Language Analysis

A

1 Phonetic
2 Morphological
3 Lexical
4 Syntactic
5 Semantic
6 Discourse
7 Pragmatic

14
Q

Speech Processing

A
  • Interpretation of speech sounds within & across words
  • Sound waves are analyzed and encoded into a digitized signal
15
Q

Rules used in Phonological Analysis

A
  1. Phonetic rules – sounds within words
    e.g. When a vowel stands alone, the vowel is usually long
  2. Phonemic rules – variations of pronunciation when words are spoken together
    e.g. “r” in “part” vs. in “rose”
  3. Prosodic rules – fluctuation in stress and intonation across a sentence: rhythm, volume, pitch, tempo, and stress
    * e.g. High pitch vs. low pitch
16
Q

Morphology: The Structure of Words

A

Morphology is the level of language that deals with the internal structure of words
* General morphological theory applies to all languages, as all natural human languages have systematic ways of structuring words (even sign language)
* Must be distinguished from the morphology of a specific language
* English words are structured differently from German words, although both languages are historically related
* Both are vastly different from Arabic

17
Q

Morpheme

A

A morpheme is a minimal subunit of meaning in a word. We can usefully divide morphemes into two classes:
Stems
Affixes

18
Q

Stems:

A

The core meaning-bearing units (e.g., happy)

19
Q

Affixes:

A

Bits and pieces that adhere to stems to change their meanings and grammatical functions: prefixes, infixes, suffixes, circumfixes (e.g., unhappy)

20
Q

Inflection:

A

the combination of stems and affixes where the resulting word has the same
word type (e.g., noun, verb, etc.) as the original. Serves a grammatical purpose that is different from the original but is nevertheless transparently related to the original.
Examples: apple – noun; apples – still a noun

21
Q

Derivation

A

Creates a new word by changing the category and/or meaning of the base to which it applies.
* Can change the grammatical category (part of speech): sing (verb) > singer (noun)
* Can change the meaning: act of singing > one who sings
* Is often limited to a certain group of words: you can Clintonize the government, but you can't Bushize the government (a phonological restriction)

22
Q

Use of Morphology in NLP Tasks -1
Stemming

A

Strip prefixes and/or suffixes to find the base root, which may or may not be an actual word
* Misspellings are inconsequential
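
A minimal sketch using NLTK's PorterStemmer (assumes the nltk package is installed; the word list is illustrative):

  from nltk.stem import PorterStemmer

  stemmer = PorterStemmer()
  # Stems need not be actual words: "studies" stems to "studi".
  for word in ["running", "studies", "happiness"]:
      print(word, "->", stemmer.stem(word))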

23
Q

Use of Morphology in NLP Tasks -2
Lemmatization

A

Strip prefixes and/or suffixes to find the base root, which will always be an actual word
* Often based on a word list, such as the one available in WordNet
* Correct spelling is crucial
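
A minimal sketch using NLTK's WordNetLemmatizer (assumes nltk is installed and its WordNet data has been downloaded):

  from nltk.stem import WordNetLemmatizer  # requires nltk.download('wordnet')

  lemmatizer = WordNetLemmatizer()
  print(lemmatizer.lemmatize("mice"))          # mouse: always an actual word
  print(lemmatizer.lemmatize("running", "v"))  # run: a POS hint improves results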

24
Q

Use of Morphology in NLP Tasks -3
Part of speech prediction

A

Knowledge of the morphemes of a particular language can be a powerful aid in guessing the part of speech of an unknown term; in English, for example, a word ending in -ly is likely an adverb, -ness a noun, and -able an adjective.

25
Q

To Stem (Lemma) or Not to Stem (Lemma)

A

The decisions to stem, lemmatize, remove stop words, and normalize (lowercase) depend on the number of input documents and the analytic approach. More documents = less need for data reduction.

26
Q

Part-of-speech Tagging:

A

Assigning Correct Word Types to Words in the Text
The general purpose of a part-of-speech tagger is to associate each word in a text with its correct lexical-syntactic category (represented by a tag)

27
Q

Varying terminology:

A

Parts-of-speech (POS), lexical categories, word classes, morphological classes, lexical tags… Lots of debate within linguistics about the number, nature, and universality of these – AND we’ll completely
ignore this debate

28
Q

Penn Treebank Tag Set -
NNS

A

noun, plural

29
Q

Penn Treebank Tag Set -
NNP

A

proper noun, singular

30
Q

Penn Treebank Tag Set -
NNPS

A

proper noun, plural

31
Q

Penn Treebank Tag Set -
PDT

A

predeterminer

32
Q

Penn Treebank Tag Set -
POS

A

possessive ending

33
Q

Penn Treebank Tag Set -
PRP

A

personal pronoun

34
Q

Penn Treebank Tag Set -
PRP$

A

possessive pronoun

35
Q

Penn Treebank Tag Set -
RB

A

adverb

36
Q

Penn Treebank Tag Set -
RBR

A

adverb, comparative

37
Q

Penn Treebank Tag Set -
RBS

A

adverb, superlative

38
Q

Penn Treebank Tag Set -
RP

A

particle

39
Q

Penn Treebank Tag Set -
WRB

A

wh- adverb

40
Q

POS Tagging Approaches:
Rule-based Approach

A

Simple and doesn't require a tagged corpus, but not as accurate as other approaches.

41
Q

POS Tagging Approaches:
Stochastic Approaches

A
  • Refers to any approach that incorporates frequencies or probabilities
  • Requires a tagged corpus to learn frequencies of words with POS tags
  • N-gram taggers: use the context of (a few) previous tags
  • Hidden Markov Model (HMM) taggers: use the context of the entire sequence of words and previous tags
  • This technique has been the most widely used in modern taggers, but has the problem of unknown words
42
Q

POS Tagging Approaches:
Classification Taggers

A
  • Uses morphology of word and (a few) surrounding words
  • Helps solve the problem of unknown words
43
Q

Computing the Two Probabilities:
Word likelihood probabilities

A
  • VBZ (third-person singular present
    verb): likely to be “is”
  • Compute P(is|VBZ) by counting in a
    labeled corpus
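
In the same notation as the N-gram cards, the MLE estimate is P(is|VBZ) = C(VBZ, is) / C(VBZ): the number of times "is" appears tagged as VBZ, divided by the total count of VBZ tags.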
44
Q

Computing the Two Probabilities:
Tag transition (prior) probabilities

A
  • Determiners likely to precede adjectives and nouns
  • That/DT flight/NN
  • The/DT yellow/JJ hat/NN
  • So we expect P(NN|DT) and P(JJ|DT) to
    be high
  • Compute P(NN|DT) by counting in a labeled corpus
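
Likewise, P(NN|DT) = C(DT, NN) / C(DT): the number of times an NN immediately follows a DT, divided by the total count of DT. An HMM tagger combines the two kinds of probabilities, choosing the tag sequence t1 … tn that maximizes the product of P(wordi|ti) · P(ti|ti-1) over the sentence.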
45
Q

Word Sense:

A

We say that a word has more than one word sense (meaning) if there is more than one definition.

46
Q

Word senses may be:

A
  • Coarse-grained, if not many distinctions are made
  • Fine-grained, if there are many distinctions of meanings
47
Q

Lexical Semantics
Lexicons –

A

list of words (or lexemes or stems) with basic info

48
Q

Lexical Semantics
Dictionaries –

A

a lexicon with definitions for each word sense
* Most are now available online

49
Q

Lexical Semantics
Thesauruses –

A

add synonyms/antonyms for each word sense
* WordNet

50
Q

Lexical Semantics
Semantic networks –

A

add more semantic relations, including semantic categories
* WordNet, EuroWordNet

51
Q

Lexical Semantics
Ontologies –

A

add rules about entities, concepts, relations, and semantic categories
* UMLS

52
Q

Lexical Semantics
Semantic Lexicon –

A

Lexicon where each word is assigned to a semantic class
* LIWC, ANEW, Subjectivity Lexicon

53
Q

WordNet – A Hand-Curated Word Database

A

WordNet is a database of facts about words
* Meanings and the relations among them
* Words are organized into clusters of synonyms (synsets)
* Organized into nouns, verbs, adjectives, and adverbs
* Currently 170,000 synsets
* Available for download, arranged in separate files (DBs)
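
A small sketch of querying WordNet through NLTK's interface (assumes nltk and its WordNet data are installed):

  from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

  # Each synset is one cluster of synonyms sharing a single word sense.
  for synset in wn.synsets("bank"):
      print(synset.name(), "-", synset.definition())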

54
Q

Hierarchical Semantic Representations

A

A semantic network provides relations for each word sense:
* hypernymy/hyponymy (IS-A): hypernyms are more general, hyponyms are more specific
* meronymy/holonymy (PART-OF)
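
Continuing the NLTK sketch above, these relations can be read directly off a synset (the method names are NLTK's; the outputs are lists of related synsets):

  from nltk.corpus import wordnet as wn

  dog = wn.synset("dog.n.01")
  print(dog.hypernyms())       # IS-A parents: more general synsets
  print(dog.hyponyms())        # IS-A children: more specific synsets
  print(dog.part_meronyms())   # PART-OF: synsets naming parts of a dog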
