Lecture 3 Flashcards

N-gram Models, Morphology, Part-of-Speech, Word Senses

1
Q

Applications of Language Models

A

The goal of a Language Model is to assign a probability that a sentence (or phrase) will occur in natural uses of the language

2
Q

Applications:

A

* Machine Translation: P(high winds tonight) > P(large winds tonight)
* Spell Correction
  * "The office is about fifteen minuets from my house": P(about fifteen minutes from) > P(about fifteen minuets from)
* Speech Recognition
  * P(I saw a van) >> P(eyes awe of an)
* Summarization, question-answering, and many other NLP applications

3
Q

Chain Rule:

A

Computes the joint probability of a sequence of words by multiplying together the probability of each word conditioned on all the preceding words.
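Written out for an n-word sequence, the chain rule is:

```latex
P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})
```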

4
Q

Markov Assumption

A

We can predict the next word based only on the previous word (or the previous few words), rather than on the entire preceding history.
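For the bigram (one-previous-word) case, the approximation is:

```latex
P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-1})
```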

5
Q

N-gram Probabilities

A

The observed frequency (count) of the whole sequence divided by the observed frequency of the preceding (initial) sequence; this ratio is sometimes called the maximum likelihood estimate (MLE).
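For bigrams, this count-ratio estimate is:

```latex
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
```

where C(·) is the count of that sequence in the training corpus.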

6
Q

Unigram Model: (word frequencies)

A

The simplest case: we estimate a sentence's probability using only the probabilities of the individual words, with no preceding context taken into account.
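Under this independence assumption, the sentence probability is just the product of the word probabilities:

```latex
P(w_1 w_2 \ldots w_n) \approx \prod_{i=1}^{n} P(w_i)
```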

7
Q

Bigram Model:

A

(Two-word frequencies): the prediction of each word is based on the one previous word.
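As a minimal sketch of how such bigram probabilities can be estimated by counting (the toy corpus, the `<s>`/`</s>` boundary markers, and the `bigram_prob` helper are illustrative assumptions, not part of the lecture):

```python
# Minimal sketch: MLE bigram probabilities from raw counts.
# The toy corpus and <s>/</s> boundary markers are illustrative assumptions.
from collections import Counter

corpus = [
    "<s> I saw a van </s>",
    "<s> I saw a cat </s>",
    "<s> a cat saw me </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)                  # count single words
    bigram_counts.update(zip(tokens, tokens[1:]))  # count adjacent word pairs

def bigram_prob(prev, word):
    """MLE estimate: count(prev word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("I", "saw"))   # 1.0  - "saw" always follows "I" in this corpus
print(bigram_prob("a", "van"))   # 0.33 - "van" follows "a" in 1 of 3 cases
```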

8
Q

N-gram Models

A

We can extend to trigrams, 4-grams, 5-grams
* Each higher order gives a more accurate model, but it becomes harder to find examples of the longer word sequences in the corpus

9
Q

long-distance dependencies:

A

N-gram models are an insufficient model of language because language has long-distance dependencies, e.g., "The computer which I had just put into the machine room on the fifth floor crashed," where the subject "computer" and the verb "crashed" are far apart.

10
Q

Bigrams:

A

any two adjacent words that occur together in a text

11
Q

Bigram language models:

A

use the bigram probability, meaning a
conditional probability, to predict how likely it is that the second word follows the first

12
Q

Smoothing

A

Every N-gram training matrix is sparse, even for very large corpora (remember Zipf's law)
* There are words that don't occur in the training corpus but may occur in future text - known as unseen words
* Whenever a probability is 0, it will multiply the entire sequence probability to 0
* Solution: estimate the likelihood of unseen N-grams and include a small probability for unseen words (one common scheme, add-one smoothing, is shown below)
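As one common illustration (add-one/Laplace smoothing is only one of several smoothing schemes), the smoothed bigram estimate adds 1 to every count and adds the vocabulary size V to the denominator:

```latex
P_{\text{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V}
```

where V is the vocabulary size, so every bigram, seen or unseen, receives a nonzero probability.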

13
Q

Levels of Language Analysis

A

1 Phonetic
2 Morphological
3 Lexical
4 Syntactic
5 Semantic
6 Discourse
7 Pragmatic

14
Q

Speech Processing

A
  • Interpretation of speech sounds within & across words
  • sound waves are analyzed and encoded into a digitized signal
15
Q

Rules used in Phonological Analysis

A
  1. Phonetic rules – sounds within words
    e.g. When a vowel stands alone, the vowel is usually long
  2. Phonemic rules – variations of pronunciation when words are spoken
    together e.g. “r” in “part” vs. in “rose”
  3. Prosodic rules – fluctuation in stress and intonation across a sentence:
    rhythm, volume, pitch, tempo, and stress
    * e.g. High pitch vs. low pitch
16
Q

Morphology: The Structure of Words

A

Morphology is the level of language that deals with the internal structure of
words
* General morphological theory applies to all languages as all natural human
languages have systematic ways of structuring words (even sign language)
* Must be distinguished from morphology of a specific language
* English words are structured differently from German words, although
both languages are historically related
* Both are vastly different from Arabic

17
Q

Morpheme

A

A morpheme is a minimal subunit of meaning in a word. We can usefully divide morphemes into two classes:
Stems
Affixes

18
Q

Stems:

A

The core meaning-bearing units (e.g., happy)

19
Q

Affixes:

A

Bits and pieces that adhere to stems to change their meanings and grammatical functions: prefixes, infixes, suffixes, circumfixes (e.g., unhappy)

20
Q

Inflection:

A

the combination of stems and affixes where the resulting word has the same
word type (e.g., noun, verb, etc.) as the original. Serves a grammatical purpose that is different from the original but is nevertheless transparently related to the original.
Examples: apple – noun; apples – still a noun

21
Q

Derivation

A

Creates a new word by changing the category and/or meaning of the base to which it applies.
* Can change the grammatical category (part of speech): sing (verb) > singer (noun)
* Can change the meaning: act of singing > one who sings
* Is often limited to a certain group of words: you can Clintonize the government, but you can't Bushize the government (a phonological restriction)

22
Q

Use of Morphology in NLP Tasks -1
Stemming

A

Strip prefixes and / or suffixes to find the base root, which may or may not be an actual word
* Misspellings are inconsequential

23
Q

Use of Morphology in NLP Tasks -2
Lemmatization

A

Strip prefixes and / or suffixes to find the base root, which will always be an actual word
* Often based on a word list, such as that available in WordNet
* Correct spelling is crucial
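As a hedged sketch of the difference between the two operations (assuming the NLTK toolkit, which provides a Porter stemmer and a WordNet-based lemmatizer; downloaded resource names can vary by NLTK version):

```python
# Sketch: stemming vs. lemmatization using NLTK (an assumed toolkit choice).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # the lemmatizer relies on the WordNet word list

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                   # 'studi' - root need not be a real word
print(lemmatizer.lemmatize("studies"))           # 'study' - always an actual word
print(stemmer.stem("running"))                   # 'run'
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'   - a POS hint helps the lemmatizer
```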

24
Q

Use of Morphology in NLP Tasks -3
Part of speech prediction

A

Knowledge of morphemes for a particular
language can be a powerful aid in guessing
the part of speech for an unknown term

25
To Stem (Lemma) or Not to Stem (Lemma)
The decisions to stem, lemmatize, remove stop words, and normalize (lowercase) depend on the number of input documents and the analytic approach. More documents = less need for data reduction.
26
Part-of-speech Tagging:
Assigning Correct Word Types to Words in the Text
The general purpose of a part-of-speech tagger is to associate each word in a text with its correct lexical-syntactic category (represented by a tag).
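As a hedged sketch of tagging in practice (assuming NLTK, whose default tagger produces Penn Treebank tags; the downloaded resource names may differ slightly across NLTK versions):

```python
# Sketch: part-of-speech tagging with NLTK's default Penn Treebank tagger.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

tokens = nltk.word_tokenize("The yellow hat is on that flight")
print(nltk.pos_tag(tokens))
# Expected output along the lines of:
# [('The', 'DT'), ('yellow', 'JJ'), ('hat', 'NN'), ('is', 'VBZ'),
#  ('on', 'IN'), ('that', 'DT'), ('flight', 'NN')]
```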
27
Varying terminology:
Parts-of-speech (POS), lexical categories, word classes, morphological classes, lexical tags... Lots of debate within linguistics about the number, nature, and universality of these – AND we’ll completely ignore this debate
28
Penn Treebank Tag Set - NNS
noun, plural
29
Penn Treebank Tag Set - NNP
proper noun, singular
30
Penn Treebank Tag Set - NNPS
proper noun, plural
31
Penn Treebank Tag Set - PDT
predeterminer
32
Penn Treebank Tag Set - POS
possessive ending
33
Penn Treebank Tag Set - PRP
personal pronoun
34
Penn Treebank Tag Set - PRP$
possessive pronoun
35
Penn Treebank Tag Set - RB
adverb
36
Penn Treebank Tag Set - RBR
adverb, comparative
37
Penn Treebank Tag Set - RBS
adverb, superlative
38
Penn Treebank Tag Set - RP
particle
39
Penn Treebank Tag Set - WRB
wh- adverb
40
POS Tagging Approaches: Rule-based Approach
Simple and doesn’t require a tagged corpus, but not as accurate as other approaches.
41
POS Tagging Approaches: Stochastic Approaches
* Refers to any approach that incorporates frequencies or probabilities
* Requires a tagged corpus to learn frequencies of words with POS tags
* N-gram taggers: use the context of (a few) previous tags
* Hidden Markov Model (HMM) taggers: use the context of the entire sequence of words and previous tags
* This technique has been the most widely used of modern taggers, but has the problem of unknown words
42
POS Tagging Approaches: Classification Taggers
* Uses morphology of word and (a few) surrounding words
* Helps solve the problem of unknown words
43
Computing the Two Probabilities: Word likelihood probabilities
* VBZ (third-person singular present verb): likely to be "is"
* Compute P(is|VBZ) by counting in a labeled corpus
44
Computing the Two Probabilities: Tag transition (prior) probabilities
* Determiners likely to precede adjectives and nouns
  * That/DT flight/NN
  * The/DT yellow/JJ hat/NN
* So we expect P(NN|DT) and P(JJ|DT) to be high
* Compute P(NN|DT) by counting in a labeled corpus
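Putting the two probabilities together, an HMM bigram tagger chooses the tag sequence that maximizes their product (a standard formulation; the notation here is added for illustration):

```latex
\hat{t}_{1 \ldots n} = \operatorname*{argmax}_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

where P(w_i | t_i) is the word likelihood and P(t_i | t_{i-1}) is the tag transition (prior) probability.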
45
Word Sense:
We say that a word has more than one word sense (meaning) if there is more than one definition.
46
Word senses may be:
* Coarse-grained, if not many distinctions are made
* Fine-grained, if there are many distinctions of meanings
47
Lexical Semantics Lexicons –
list of words (or lexemes or stems) with basic info
48
Lexical Semantics Dictionaries –
a lexicon with definitions for each word sense
* Most are now available online
49
Lexical Semantics Thesauruses –
add synonyms/antonyms for each word sense
* WordNet
50
Lexical Semantics Semantic networks –
add more semantic relations, including semantic categories
* WordNet, EuroWordNet
51
Lexical Semantics Ontologies –
add rules about entities, concepts and relations, semantic categories
* UMLS
52
Lexical Semantics Semantic Lexicon –
Lexicon where each word is assigned to a semantic class
* LIWC, ANEW, Subjectivity Lexicon
53
WordNet – A Hand-Curated Word Database
WordNet is a database of facts about words
* Meanings and the relations among them
* Words are organized into clusters of synonyms: synsets
* Synsets are organized into nouns, verbs, adjectives, and adverbs
* Currently 170,000 synsets
* Available for download, arranged in separate files (DBs)
54
Hierarchical Semantic Representations
A semantic network provides relations for each word sense:
* hypernymy/hyponymy (IS-A)
  * hypernyms are more general, hyponyms are more specific
* meronymy/holonymy (PART-OF)
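As a hedged sketch of querying these relations (assuming NLTK's WordNet interface; the particular words and sense identifiers are illustrative):

```python
# Sketch: word senses and semantic relations via NLTK's WordNet interface.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# Each synset is one word sense (a cluster of synonyms).
for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())

dog = wn.synset("dog.n.01")
print(dog.hypernyms())       # more general concepts (IS-A parents)
print(dog.hyponyms()[:3])    # more specific concepts
print(wn.synset("tree.n.01").part_meronyms())  # PART-OF relations (parts of a tree)
```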
55