Lecture 9: NLP Flashcards
NLP vs. Speech Processing
Natural Language Processing
= automatic processing of written texts
1. Natural Language Understanding
   Input = text
2. Natural Language Generation
   Output = text
Speech Processing
= automatic processing of speech
1. Speech Recognition
   Input = acoustic signal
2. Speech Synthesis
   Output = acoustic signal
What is the bag-of-words (BOW) model?
A simple model where word order is ignored
Used in many applications:
Naive Bayes (NB) spam filter seen in class a few weeks ago
Information retrieval (e.g., Google search)
…
But it is severely limited when it comes to understanding the meaning of a text…
Maybe we should take word order into account…
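To make the "word order is ignored" point concrete, here is a minimal sketch in Python. The function name bag_of_words and the two example sentences are made up for illustration, and the whitespace tokenizer is a simplifying assumption (a real system would also handle punctuation).

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Return word counts for a text, discarding word order."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    return Counter(text.lower().split())

# Two sentences with opposite meanings (illustrative examples, not from the lecture).
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Both sentences map to exactly the same bag, so a BOW model cannot tell them apart.
same_bag = (a == b)
```

Here same_bag is True even though the sentences mean different things, which is the limitation the card describes.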
N-gram model
An n-gram model is a probability distribution over sequences of events (grams/units/items)
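As a sketch of what this distribution looks like for word sequences, the standard chain rule and the n-gram (Markov) approximation can be written as follows. The symbols w_1, …, w_m (the words of a sequence) and n (the model order) are notation introduced here, not taken from the lecture.

```latex
P(w_1, \dots, w_m)
  = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
  \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
```

With n = 2 (bigram) each word is conditioned on the previous word only; with n = 3 (trigram), on the previous two words.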
Why only bigrams or trigrams?
The Markov approximation is still costly
With a 20,000-word vocabulary:
a bigram model needs to store 400 million parameters (20,000^2)
a trigram model needs to store 8 trillion parameters (20,000^3)
Using a language model of higher order than a trigram is impractical
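The counts above follow from one parameter per possible n-word sequence, i.e. vocabulary size to the power n. A quick check in Python (the helper name ngram_table_size is hypothetical, and it assumes a full table with no smoothing or pruning):

```python
def ngram_table_size(vocab_size: int, n: int) -> int:
    """Number of entries in a full n-gram table: one per possible n-word sequence."""
    return vocab_size ** n

V = 20_000  # vocabulary size used on the card

bigram_params = ngram_table_size(V, 2)    # 400 million
trigram_params = ngram_table_size(V, 3)   # 8 trillion
fourgram_params = ngram_table_size(V, 4)  # 20,000 times more again
```

Each extra word of context multiplies the table by the vocabulary size, which is why going beyond trigrams quickly becomes impractical.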
How would one recognize the language of a text?
- Train a character bigram model for each language: which characters are followed by which
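One possible way to turn this idea into code is sketched below: train one character bigram model per language, then pick the language whose model gives the text the highest probability. Everything here is an assumption for illustration: the function names, the tiny made-up training texts, the add-one smoothing, and the constant alphabet_size=64. It is not the method shown in class.

```python
import math
from collections import Counter

def train_char_bigrams(text: str):
    """Count character bigrams and the characters that start them."""
    text = text.lower()
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])
    return pair_counts, first_counts

def log_prob(text: str, model, alphabet_size: int = 64) -> float:
    """Add-one smoothed log-probability of text under a character bigram model.

    alphabet_size is an assumed constant used only for smoothing.
    """
    pair_counts, first_counts = model
    text = text.lower()
    total = 0.0
    for a, b in zip(text, text[1:]):
        total += math.log((pair_counts[(a, b)] + 1) / (first_counts[a] + alphabet_size))
    return total

def identify(text: str, models: dict) -> str:
    """Return the language whose model assigns the text the highest probability."""
    return max(models, key=lambda lang: log_prob(text, models[lang]))

# Toy training texts, made up for illustration; real models need far more data.
models = {
    "english": train_char_bigrams(
        "the quick brown fox jumps over the lazy dog and then the cat sat on the mat"),
    "french": train_char_bigrams(
        "le renard brun rapide saute par dessus le chien paresseux et puis le chat est sur le tapis"),
}

guess_en = identify("the cat and the dog", models)
guess_fr = identify("le chat et le chien", models)
```

With these toy texts, frequent pairs such as "th" and "he" pull the first phrase toward English, while "le" and "ch" pull the second toward French.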
Problem with n-grams
Natural language is not linear…
There may be long-distance dependencies.
Syntactic dependencies
The man next to the large oak tree near … is tall.
The men next to the large oak tree near … are tall.
Semantic dependencies
The bird next to the large oak tree near … flies rapidly.
The man next to the large oak tree near … talks rapidly.
World knowledge
Michael Jackson, who was featured in …, is buried in California.
Michael Bublé, who was featured in …, is living in California.
…
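The syntactic example can be checked mechanically. The sketch below (the helper context_before is hypothetical) treats the "…" in the two sentences as a single token, so the real distance between subject and verb would be even larger than what is computed here.

```python
def context_before(tokens, index, n):
    """Return the n-1 tokens an n-gram model conditions on at position index."""
    return tuple(tokens[max(0, index - (n - 1)):index])

# The "…" is kept as one token; it stands for an unknown number of elided words.
singular = "The man next to the large oak tree near … is tall".split()
plural = "The men next to the large oak tree near … are tall".split()
verb = singular.index("is")  # "are" sits at the same position in the plural sentence

# A trigram model sees the same two preceding tokens in both sentences,
# so it has no basis for choosing between "is" and "are".
trigram_same = context_before(singular, verb, 3) == context_before(plural, verb, 3)

# Smallest n for which the context finally includes "man" / "men".
smallest_n = next(
    n for n in range(2, len(singular) + 2)
    if context_before(singular, verb, n) != context_before(plural, verb, n)
)
```

Even with the elided part counted as one token, the context only starts to differ at n = 10, far beyond the bigram or trigram models that are practical to store.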