NLP Tips and Tricks Flashcards
How does the softmax classifier work?
It gives a probability distribution over classes from a set of scores. For each input x, predict the probability of class k by taking the dot product of the weight vector for that class and input x, taking the exponential of it, and dividing by the sum of the exponentials of the scores for all the classes: P(k | x) = exp(w_k · x) / Σ_c exp(w_c · x)
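A minimal sketch of this in NumPy (the function name softmax_probs and the variables W and x are illustrative, not from the cards):

```python
import numpy as np

def softmax_probs(W, x):
    # one score per class: dot product of each class's weight vector with x
    scores = W @ x
    # exponentiate and normalise; subtracting the max is a standard numerical
    # stability trick, an addition not mentioned on the card
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()
```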
What is the input and output for softmax?
The input is a feature vector of length n representing the word(s)
The output is a probability distribution over labels (POS tags, NE BIO tags)
What is the size of the softmax weight matrix?
It has dimensions of (C x n), where C is the number of classes and n is the number of entries in the input vector
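A quick shape check, assuming illustrative values C = 5 and n = 10:

```python
import numpy as np

C, n = 5, 10               # 5 classes, 10-dimensional input
W = np.random.randn(C, n)  # weight matrix: one row of n weights per class
x = np.random.randn(n)     # input vector
scores = W @ x             # (C x n) @ (n,) -> (C,): one score per class
```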
What is cross-entropy loss?
It is a loss function used to train the weights: it measures how far the predicted probability distribution is from the true class distribution. We want to minimise the cross-entropy loss
How do we compute cross entropy?
H(p, q) = -Σ_c p(c) log q(c): for each class c, the true probability p(c), multiplied by the log of the predicted probability q(c), summed across all possible classes, multiplied by minus 1. Because the true distribution p is one-hot, this simplifies to -log q(k), where k is the true class and q is the predicted probability distribution
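A short worked sketch checking the simplification (the example numbers are made up):

```python
import numpy as np

def cross_entropy(p, q):
    # H(p, q) = -sum over classes of p(c) * log q(c)
    return -np.sum(p * np.log(q))

q = np.array([0.7, 0.2, 0.1])  # predicted distribution
p = np.array([1.0, 0.0, 0.0])  # one-hot true distribution, true class k = 0
full = cross_entropy(p, q)     # -log 0.7, about 0.357
shortcut = -np.log(q[0])       # simplified form -log q(k), same value
```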
What do activation functions do?
Given the weighted input to a hidden layer, an activation function computes that layer's output values, introducing non-linearity into the network
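A sketch of two common activation functions applied to a hidden layer's weighted input (the cards don't name specific activations; ReLU and sigmoid here are assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-1.0, 0.5, 2.0])  # weighted input to a hidden layer
h = relu(z)                     # hidden layer values: [0.0, 0.5, 2.0]
```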
How are model parameters modelled?
They form a C x n matrix of weights, where C is the number of classes and n is the length of the input vector
What is shown in the image?
It shows the gradient of the loss, computed with respect to the parameters: a matrix with the same shape as the weight matrix
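For the softmax classifier with cross-entropy loss, this gradient matrix has a well-known closed form; a sketch with made-up values:

```python
import numpy as np

# gradient of the loss w.r.t. the weight matrix W: (q - p) outer x,
# a C x n matrix with the same shape as W itself
q = np.array([0.7, 0.2, 0.1])        # predicted probabilities (C = 3)
p = np.array([1.0, 0.0, 0.0])        # one-hot true distribution
x = np.array([0.5, -1.0, 2.0, 0.0])  # input vector (n = 4)
grad_W = np.outer(q - p, x)          # shape (3, 4), matching W's (C x n)
```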
What is backpropagation?
It is a way to compute the gradients of the loss with respect to all the parameters efficiently, by applying the chain rule backwards through the network using matrix computation.
Why is matrix computation good?
It is highly parallelizable
What does forward propagation do?
It computes the hidden layer values by applying the weights and activation functions, moving from the input towards the output
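A minimal forward pass through one hidden layer (all shapes and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])    # input vector
W1 = np.random.randn(3, 2)  # input -> hidden weights
W2 = np.random.randn(2, 3)  # hidden -> output weights
h = sigmoid(W1 @ x)         # hidden layer values via the activation function
scores = W2 @ h             # output scores (e.g. fed into softmax)
```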
What does back propagation do?
It calculates partial derivatives at each step and passes gradients back through the graph
Compute local gradients + apply chain rule
The downstream gradient = upstream gradient x local gradient
Having multiple inputs leads to multiple local gradients, one per input
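A minimal numeric sketch of these rules on a two-node graph (the graph q = x * y followed by L = q + z is an assumption chosen for illustration):

```python
# forward pass through the tiny graph
x, y, z = 2.0, -3.0, 5.0
q = x * y  # -6.0
L = q + z  # -1.0

# backward pass, starting from dL/dL = 1 at the output
dL = 1.0
dq = dL * 1.0  # add node: local gradient of L w.r.t. q is 1
dz = dL * 1.0  # ... and w.r.t. z is also 1
# the multiply node has two inputs, hence two local gradients:
# dq/dx = y and dq/dy = x
dx = dq * y    # dL/dx = -3.0  (downstream = upstream x local)
dy = dq * x    # dL/dy =  2.0
```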