Statistical Language Modelling Flashcards

1
Q

What is NLP?

A

Natural Language Processing (NLP) builds systems that use computational techniques to model and process natural language in an automated way.

2
Q

What is word-level processing?

A

Before doing any text processing, we need to split our input data into sentences, then words, then tokens.
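
As a rough illustration of that sentence, word, and token pipeline, here is a minimal sketch using only regular expressions. The helper names `split_sentences` and `tokenize` and the splitting rules are assumptions made for this example; real tokenizers handle abbreviations, numbers, and many other cases that this one ignores.

```python
import re

def split_sentences(text):
    # Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # A token is either a word (optionally with internal apostrophes)
    # or a single punctuation character.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

text = "NLP is fun. Isn't it?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

Punctuation is kept as separate tokens here so that a later language model could learn where sentences tend to end.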

3
Q

What are three applications of predicting the probability of a sequence?

A
  1. Spellchecking
  2. Grammatical error correction
  3. Autocomplete/suggestions
4
Q

What are the four most common n-grams (n = 1 to 4)?

A

Unigrams
Bigrams
Trigrams
Quadrigrams
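
To make the four sizes concrete, the sketch below extracts n-grams from a token list with a sliding window. The function name `ngrams` and the example sentence are illustrative choices, not part of the original card.

```python
def ngrams(tokens, n):
    # Slide a window of width n over the tokens; each window is one n-gram.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["the", "cat", "sat", "down"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
quadrigrams = ngrams(tokens, 4)

for name, grams in [("unigrams", unigrams), ("bigrams", bigrams),
                    ("trigrams", trigrams), ("quadrigrams", quadrigrams)]:
    print(name, grams)
```

A sequence of length L yields L - n + 1 n-grams, so a four-token sentence contains exactly one quadrigram.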

5
Q

What is smoothing?

A

Techniques that assign a low but non-zero probability to unseen combinations without compromising the overall statistics of the training set.

6
Q

What are three types of smoothing?

A
  1. Laplace smoothing
  2. Add-k smoothing
  3. Kneser-Ney smoothing
7
Q

What is Laplace smoothing?

A

Adds one to all counts before normalizing, so that unseen combinations get a small non-zero probability.
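
For a bigram model, adding one to every count is usually written as P(w | prev) = (count(prev, w) + 1) / (count(prev) + V), where V is the vocabulary size. The sketch below is one way to compute it; the function name `laplace_bigram_prob` and the toy corpus are assumptions, and the unigram count of `prev` is used as the context count for simplicity.

```python
from collections import Counter

def laplace_bigram_prob(prev, word, bigram_counts, context_counts, vocab_size):
    # Add 1 to the bigram count and V to the context count so the
    # probabilities over the whole vocabulary still sum to 1.
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

tokens = ["the", "cat", "sat", "the", "cat", "ran"]
context_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
vocab_size = len(context_counts)

seen = laplace_bigram_prob("the", "cat", bigram_counts, context_counts, vocab_size)
unseen = laplace_bigram_prob("the", "ran", bigram_counts, context_counts, vocab_size)
print(seen, unseen)
```

In this toy corpus "the cat" occurs twice and "the ran" never occurs, yet the unseen bigram still receives a probability above zero.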

8
Q

What is add-k smoothing?

A

Rather than adding 1 to all counts, we generalize to adding an arbitrary k (typically between 0 and 1).
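
A minimal sketch of the generalization for bigrams, assuming the usual form P(w | prev) = (count(prev, w) + k) / (count(prev) + k · V). The function name `add_k_bigram_prob`, the default k = 0.5, and the toy corpus are illustrative assumptions.

```python
from collections import Counter

def add_k_bigram_prob(prev, word, bigram_counts, context_counts, vocab_size, k=0.5):
    # k = 1 reproduces Laplace smoothing; smaller k moves less
    # probability mass from seen to unseen bigrams.
    return (bigram_counts[(prev, word)] + k) / (context_counts[prev] + k * vocab_size)

tokens = ["the", "cat", "sat", "the", "cat", "ran"]
context_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
vocab_size = len(context_counts)

for k in (1.0, 0.5, 0.1):
    seen = add_k_bigram_prob("the", "cat", bigram_counts, context_counts, vocab_size, k)
    unseen = add_k_bigram_prob("the", "ran", bigram_counts, context_counts, vocab_size, k)
    print(k, seen, unseen)
```

In practice k would be tuned on held-out data rather than fixed in advance.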

9
Q

Which n-gram size is better?

A
  • For higher n we capture more context, so we can make better predictions.
  • But for higher n we also need more data, and the counts inevitably become sparse.
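
The sparsity side of this trade-off can be seen on a toy example: as n grows, a larger share of the n-grams in new text was never observed in training. The helper names `ngrams` and `unseen_rate` and both sentences are made up for illustration.

```python
def ngrams(tokens, n):
    # Slide a window of width n over the tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unseen_rate(train_tokens, test_tokens, n):
    # Fraction of test n-grams that never occur in the training tokens.
    seen = set(ngrams(train_tokens, n))
    test = ngrams(test_tokens, n)
    return sum(g not in seen for g in test) / len(test)

train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the cat and the dog sat".split()

rates = {n: unseen_rate(train, test, n) for n in range(1, 5)}
for n, rate in rates.items():
    print(n, round(rate, 3))
```

Every test word appears in the training text, but the share of unseen n-grams climbs steadily from unigrams to 4-grams.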
10
Q

How can we maximize probabilities?

A

Given a training set and a test set, we want the model to maximize the probability of the test set. For bigrams this means we want to maximize:
$P(w_1 w_2 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$
where $w_0$ is a start-of-sequence marker.
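
The bigram product can be evaluated as a sum of log probabilities, which avoids numerical underflow on long sequences. The sketch below uses unsmoothed maximum-likelihood estimates; the function names, the `<s>` start marker, and the toy training sentences are assumptions made for this example.

```python
import math
from collections import Counter

def train_bigram_mle(sentences):
    # Count each bigram and its left context, padding with a start marker.
    bigram_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence
        for prev, word in zip(padded, padded[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1
    return bigram_counts, context_counts

def sequence_log_prob(sentence, bigram_counts, context_counts):
    # Sum of log P(w_i | w_{i-1}); -inf if any bigram was never seen.
    padded = ["<s>"] + sentence
    log_p = 0.0
    for prev, word in zip(padded, padded[1:]):
        count = bigram_counts[(prev, word)]
        if count == 0:
            return float("-inf")
        log_p += math.log(count / context_counts[prev])
    return log_p

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
bigram_counts, context_counts = train_bigram_mle(train)

prob = math.exp(sequence_log_prob(["the", "cat", "sat"], bigram_counts, context_counts))
unseen_log_p = sequence_log_prob(["the", "sat"], bigram_counts, context_counts)
print(prob, unseen_log_p)
```

The second sentence contains a bigram that was never observed, so without smoothing its probability collapses to zero.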
