Lecture 3 Flashcards
N-gram Models, Morphology, Part-of-Speech, Word Senses
Applications of Language Models
The goal of a Language Model is to assign a probability that a sentence (or phrase) will occur in natural uses of the language
Applications:
*Machine Translation: P(high winds tonight) > P(large winds tonight)
*Spell Correction
*The office is about fifteen minuets from my house: P(about fifteen minutes from) > P(about fifteen minuets from)
*Speech Recognition
*P(I saw a van) >> P(eyes awe of an)
*Summarization, question-answering, and many other NLP applications
Chain Rule:
compute the joint probability of a sentence by multiplying together the conditional probability of each word given the previous words
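The chain rule decomposition can be written as:

```latex
P(w_1 w_2 \ldots w_n) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1 w_2) \cdots
                      = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})
```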
Markov Assumption
we can approximate the probability of the next word using only a limited number of previous words (just one, in the bigram case)
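For the bigram case, the Markov assumption replaces the full history with only the previous word:

```latex
P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-1})
```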
N-gram Probabilities
the observed frequency (count) of the whole sequence divided by the observed frequency of the preceding, or initial, sequence (sometimes called the maximum likelihood estimate, MLE):
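A minimal sketch of this MLE computation for bigrams, using a tiny made-up corpus (the corpus and function names are illustrative, not from the lecture):

```python
from collections import Counter

# Toy corpus; in practice counts come from a large corpus.
corpus = "i saw a van i saw a cat".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_mle(w1, w2):
    """MLE estimate: count(w1 w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_mle("saw", "a"))   # count("saw a") = 2, count("saw") = 2 -> 1.0
print(bigram_mle("a", "van"))   # count("a van") = 1, count("a") = 2 -> 0.5
```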
Unigram Model: (word frequencies)
The simplest case is that we predict a sentence probability just based on the probabilities of the words with no preceding words
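A sketch of a unigram model on a toy corpus (corpus and names are illustrative): the sentence probability is just the product of each word's relative frequency, ignoring all context.

```python
from collections import Counter

corpus = "the cat sat on the mat".split()
counts = Counter(corpus)
total = sum(counts.values())

def unigram_sentence_prob(sentence):
    """Product of individual word frequencies, with no preceding words."""
    p = 1.0
    for w in sentence.split():
        p *= counts[w] / total
    return p

print(unigram_sentence_prob("the cat"))   # (2/6) * (1/6)
```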
Bigram Model:
(two word frequencies): prediction based on one previous word:
N-gram Models
We can extend to trigrams, 4-grams, 5-grams
* Each higher order gives a more accurate model, but it becomes harder to find examples of the longer word sequences in the corpus
long-distance dependencies:
N-gram models are an insufficient model of language because language has long-distance dependencies
Bigrams:
any two words that occur together
Bigram language models:
use the bigram probability (a conditional probability) to predict how likely it is that the second word follows the first
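The idea above can be sketched as a full sentence probability: multiply the bigram conditional probabilities along the sentence. The corpus, the `<s>` start marker, and all names are illustrative assumptions.

```python
from collections import Counter

# Tiny corpus with sentence-start markers (<s>) so the first real word
# can also be conditioned on something.
sentences = [["<s>", "i", "saw", "a", "van"],
             ["<s>", "i", "saw", "a", "cat"]]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter(b for s in sentences for b in zip(s, s[1:]))

def bigram_sentence_prob(words):
    """Multiply P(w2 | w1) over each adjacent pair in the sentence."""
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(bigram_sentence_prob(["<s>", "i", "saw", "a", "van"]))  # 1 * 1 * 1 * 0.5 = 0.5
```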
Smoothing
Every N-gram training matrix is sparse, even for very large corpora (remember Zipf's law)
* There are words that don't occur in the training corpus but may occur in future text, known as unseen words
* Whenever a probability is 0, it makes the probability of the entire sequence 0
* Solution: estimate the likelihood of unseen N-grams and include a small probability for unseen words
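One simple way to do this is add-one (Laplace) smoothing, sketched here on a toy corpus (the corpus and names are illustrative; the lecture does not specify which smoothing method is used):

```python
from collections import Counter

corpus = "i saw a van i saw a cat".split()
vocab = set(corpus)
V = len(vocab)   # vocabulary size: 5

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def laplace_bigram(w1, w2):
    """Add-one smoothing: (count(w1 w2) + 1) / (count(w1) + V)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

# An unseen bigram now gets a small nonzero probability instead of 0:
print(laplace_bigram("van", "cat"))   # (0 + 1) / (1 + 5)
```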
Levels of Language Analysis
1 Phonetic
2 Morphological
3 Lexical
4 Syntactic
5 Semantic
6 Discourse
7 Pragmatic
Speech Processing
- Interpretation of speech sounds within & across words
- sound waves are analyzed and encoded into a digitized signal
Rules used in Phonological Analysis
- Phonetic rules – sounds within words
e.g. When a vowel stands alone, the vowel is usually long
- Phonemic rules – variations of pronunciation when words are spoken together
e.g. "r" in "part" vs. in "rose"
- Prosodic rules – fluctuation in stress and intonation across a sentence: rhythm, volume, pitch, tempo, and stress
* e.g. High pitch vs. low pitch
Morphology: The Structure of Words
Morphology is the level of language that deals with the internal structure of
words
* General morphological theory applies to all languages as all natural human
languages have systematic ways of structuring words (even sign language)
* Must be distinguished from morphology of a specific language
* English words are structured differently from German words, although
both languages are historically related
* Both are vastly different from Arabic
Morpheme
A morpheme is a minimal subunit of meaning in a word. We can usefully divide morphemes into two classes:
Stems
Affixes
Stems:
The core meaning-bearing units (e.g., happy)
Affixes:
Bits and pieces that adhere to stems to change their meanings and grammatical functions: prefixes, infixes, suffixes, circumfixes (e.g., unhappy)
Inflection:
the combination of stems and affixes where the resulting word has the same word type (e.g., noun, verb, etc.) as the original. It serves a grammatical purpose that is different from the original but is nevertheless transparently related to it.
Examples: apple – noun; apples – still a noun
Derivation
creates a new word by changing the category and/or meaning of the base to
which it applies. Can change the grammatical category (part of speech)
sing (verb) > singer (noun)
Derivation can change the meaning
act of singing > one who sings
Derivation is often limited to a certain group of words
You can Clintonize the government, but you canʼt Bushize the government (a phonological restriction)
Use of Morphology in NLP Tasks -1
Stemming
Strip prefixes and / or suffixes to find the base root, which may or may not be an actual word
* Misspellings are inconsequential
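A naive suffix-stripping stemmer can illustrate the idea; the rule list here is a hypothetical sketch (not the Porter algorithm or any specific stemmer from the lecture), and the output need not be a real word.

```python
# Hypothetical suffix rules, checked longest-first.
SUFFIXES = ["ational", "ization", "ingly", "ing", "edly", "ed", "ly", "es", "s"]

def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 base characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

print(naive_stem("running"))   # "runn" -- not a real word, which is fine for stemming
print(naive_stem("cats"))      # "cat"
```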