Natural Language Processing Flashcards

1
Q

What is term frequency?

A

How often a term appears in a document (a measure of its prominence in that document).

2
Q

What is inverse document frequency?

A

How infrequently a term appears across documents: the fewer documents contain it, the higher its weight.
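
The cards give no formula for either quantity, so the following is only a minimal sketch under one common convention: term frequency as count divided by document length, and inverse document frequency as log(N / df). The function names, the toy corpus, and the choice to give unseen terms a weight of 0 are assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequency(term, document_tokens):
    """Relative frequency of `term` within one tokenized document."""
    if not document_tokens:
        return 0.0
    return Counter(document_tokens)[term] / len(document_tokens)


def inverse_document_frequency(term, corpus):
    """log(N / df): larger when fewer documents contain `term`."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: a term seen in no document gets weight 0
    return math.log(len(corpus) / containing)


# Hypothetical toy corpus of three tokenized documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]

tf_cat = term_frequency("cat", corpus[0])
idf_the = inverse_document_frequency("the", corpus)  # appears in every document
idf_cat = inverse_document_frequency("cat", corpus)  # appears in two documents
idf_dog = inverse_document_frequency("dog", corpus)  # appears in one document
```

Under this convention a word that occurs in every document ("the") gets an IDF of zero, while rarer words get progressively larger weights.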

3
Q

What is Bayes' theorem?

A

P(A | B) = P(B | A) P(A) / P(B)
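
As a worked instance of the formula, here is a small sketch with made-up numbers. The spam/word scenario and all probabilities are hypothetical; P(B) is obtained from the law of total probability.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "message is spam", B = "message contains a given word".
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# P(B) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = bayes_posterior(p_word_given_spam, p_spam, p_word)
```

With these numbers the prior of 0.2 rises to a posterior of 5/7 once the word is observed.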

4
Q

What is Laplace smoothing?

A

Add-one (Laplace) smoothing adds a constant (one) to each count, so that events never seen in training do not get zero probability. This reduces overfitting to the training counts.
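
A minimal sketch of the idea for unigram counts. The function name, the toy vocabulary, and the parameter `k` (the added constant, 1 for add-one smoothing) are assumptions made for illustration; the sketch also assumes every token is in the vocabulary.

```python
from collections import Counter


def laplace_probabilities(tokens, vocabulary, k=1.0):
    """Smoothed unigram estimate: (count + k) / (total + k * |V|).

    Assumes every token in `tokens` belongs to `vocabulary`.
    """
    counts = Counter(tokens)
    total = len(tokens) + k * len(vocabulary)
    return {word: (counts[word] + k) / total for word in vocabulary}


# Hypothetical toy data: "fish" never occurs in the training tokens.
vocabulary = ["cat", "dog", "fish"]
tokens = ["cat", "cat", "dog"]

probs = laplace_probabilities(tokens, vocabulary)
```

Without smoothing "fish" would have probability 0; with add-one smoothing it gets 1/6, and the probabilities still sum to 1.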

5
Q

What is information theory?

A

Study of transmission, storage and retrieval of digital information.

6
Q

What is entropy?

A

The average uncertainty of a random variable:
H(p) = H(X) = -Σ_x p(x) log2 p(x)
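
The formula can be sketched directly in code. The function name and the example distributions are illustrative; outcomes with probability 0 are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: -sum of p(x) * log2 p(x) over outcomes with p(x) > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair_coin = entropy([0.5, 0.5])    # maximal uncertainty for two outcomes
biased_coin = entropy([0.9, 0.1])  # more predictable, so lower entropy
certain = entropy([1.0])           # no uncertainty at all
```

A fair coin carries exactly one bit of uncertainty, a biased coin less, and a certain outcome none.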

7
Q

What is joint entropy?

A

The amount of information required to specify the values of two random variables together.
Check slides for formula.
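
For reference, the standard textbook definition is sketched here. It is not copied from the slides, which may use different notation; the symbols follow the entropy card, with p(x, y) the joint probability of X = x and Y = y.

```latex
H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(x, y)
```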

8
Q

What is conditional entropy?

A

Given one variable, how much additional information is needed to specify the other.
Check slides for formula.
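
For reference, the standard textbook definition is sketched here. It is not copied from the slides, which may use different notation; p(y | x) is the conditional probability of Y = y given X = x.

```latex
H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x)
            = -\sum_{x} \sum_{y} p(x, y) \log_2 p(y \mid x)
```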

9
Q

What is the chain rule?

A

H(X, Y) = H(X) + H(Y | X)
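
The chain rule can be checked numerically. This sketch uses a hypothetical joint distribution over weather and transport, and computes H(Y | X) directly from its standard definition rather than by rearranging the chain rule; all names and numbers are illustrative.

```python
import math

# Hypothetical joint distribution p(x, y) over (weather, transport).
joint = {
    ("sunny", "walk"): 0.4,
    ("sunny", "bus"): 0.1,
    ("rainy", "walk"): 0.1,
    ("rainy", "bus"): 0.4,
}


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def marginal_x(joint):
    """p(x), obtained by summing the joint distribution over y."""
    result = {}
    for (x, _y), p in joint.items():
        result[x] = result.get(x, 0.0) + p
    return result


def conditional_entropy_y_given_x(joint):
    """H(Y | X) = -sum over (x, y) of p(x, y) * log2 p(y | x)."""
    p_x = marginal_x(joint)
    return -sum(
        p * math.log2(p / p_x[x]) for (x, _y), p in joint.items() if p > 0
    )


h_xy = entropy(joint.values())
h_x = entropy(marginal_x(joint).values())
h_y_given_x = conditional_entropy_y_given_x(joint)
```

Up to floating-point error, `h_xy` equals `h_x + h_y_given_x`.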

10
Q

What is mutual information?

A

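By the chain rule: H(X) + H(Y | X) = H(Y) + H(X | Y)
therefore: H(X) - H(X | Y) = H(Y) - H(Y | X)
This shared quantity is the mutual information I(X; Y): the reduction in uncertainty about one variable from knowing the other.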
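
A small numeric check of the symmetry, using a hypothetical joint distribution. The conditional entropies are obtained from the chain rule H(X, Y) = H(X) + H(Y | X); the names and numbers are illustrative.

```python
import math

# Hypothetical joint distribution p(x, y).
joint = {
    ("sunny", "walk"): 0.4,
    ("sunny", "bus"): 0.1,
    ("rainy", "walk"): 0.1,
    ("rainy", "bus"): 0.4,
}


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def marginal(joint, axis):
    """Marginal distribution over the variable at position `axis` (0 for X, 1 for Y)."""
    result = {}
    for outcome, p in joint.items():
        result[outcome[axis]] = result.get(outcome[axis], 0.0) + p
    return result


h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())

# Conditional entropies via the chain rule.
h_y_given_x = h_xy - h_x
h_x_given_y = h_xy - h_y

mi_from_x = h_x - h_x_given_y  # H(X) - H(X | Y)
mi_from_y = h_y - h_y_given_x  # H(Y) - H(Y | X)
```

Both expressions give the same value, and it is positive here because the two variables are dependent.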

11
Q

What is the noisy channel model?

A

First used in speech recognition, it is used to reconstruct the original message from the noisy output of a channel.

12
Q

What is a statistical language model?

A
13
Q

What are the advantages and disadvantages of (recurrent) neural networks?

A

Advantages:
- Unlimited input length
- Model size is independent of input size
- History dependent
- Model parameters shared across time steps

Disadvantages:
- Long delays (dependencies between distant time steps) are a problem
- Can’t see the future (only earlier inputs are available at each step)

14
Q

What is the statistical language model?

A

mi* = argmax over mi of p(zh | mi) p(mi)
Translation models are estimated from aligned corpora, which makes p(zh | mi) hard to estimate.
p(zh | mi) is the translation model
p(mi) is the language model
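
The argmax can be sketched as a search over a fixed candidate list. The candidate strings and both probability tables are hypothetical placeholders, not output of any real model, and log probabilities are summed only to avoid numerical underflow.

```python
import math

# Hypothetical scores for three candidate outputs mi, given one observed input zh.
translation_model = {  # p(zh | mi)
    "candidate a": 0.3,
    "candidate b": 0.6,
    "candidate c": 0.1,
}
language_model = {  # p(mi)
    "candidate a": 0.5,
    "candidate b": 0.1,
    "candidate c": 0.4,
}


def decode(candidates, translation_model, language_model):
    """Return the candidate mi that maximizes p(zh | mi) * p(mi)."""
    def score(mi):
        return math.log(translation_model[mi]) + math.log(language_model[mi])

    return max(candidates, key=score)


best = decode(list(translation_model), translation_model, language_model)
```

Here "candidate b" has the highest translation probability, but the language model prefers "candidate a" strongly enough that it wins on the product (0.15 versus 0.06).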

15
Q

What are some facts about ChatGPT?

A

Trained on data that is 93% English, 7% other languages.
By June 2020: 175 x 10^9 parameters defined f

16
Q

What is the attention model?

A

This model allows the RNN to pay attention to the specific parts of the input that are considered important, which improves the performance of the resulting model in practice.
Check slide for formula

17
Q

What is attention weight?

A

The amount of attention that the output should pay to the activation.
Check slide for formula
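
The slide formula is not reproduced in these cards. As a rough sketch, one common formulation normalizes alignment scores with a softmax to get the attention weights, then uses them to form a weighted sum of the input activations. The scores, activations, and function names below are hypothetical.

```python
import math


def attention_weights(scores):
    """Softmax over alignment scores: non-negative weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, activations):
    """Weighted sum of the activation vectors, one weight per input position."""
    dim = len(activations[0])
    return [
        sum(w * a[i] for w, a in zip(weights, activations)) for i in range(dim)
    ]


# Hypothetical alignment scores for one output step over three input positions.
scores = [2.0, 0.5, -1.0]
activations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = attention_weights(scores)
context = context_vector(weights, activations)
```

The input position with the highest score receives the largest weight, so its activation contributes most to the context vector.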