Lecture 4 - Logistic Regression Flashcards

1
Q

What is logistic regression?

A

Logistic regression is an algorithm for discovering the link between features and some particular outcome.

It is the baseline supervised machine learning algorithm for classification in NLP.

2
Q

Logistic regression can be used to classify an observation into one of two classes or one of many classes. What are these two methods of logistic regression called?

A

Binary logistic regression

Multinomial logistic regression

3
Q

Naive Bayes is a … model and Logistic Regression is a … classifier.

Choose between “discriminative” and “generative”

A

Naive Bayes is a generative classifier

Logistic Regression is a discriminative classifier

4
Q

What is the difference between generative and discriminative classifiers?

A

In general, a discriminative model models the decision boundary between the classes, while a generative model explicitly models the actual distribution of each class. In the end, both predict the conditional probability P(class | features).

A generative model learns the joint probability distribution p(x,y) and derives the conditional probability with the help of Bayes' theorem. A discriminative model learns the conditional probability distribution p(y|x) directly. Both models are generally used in supervised learning problems.

In terms of how they find what class to be assigned to an observation:

  • Naive Bayes: makes use of the likelihood, which expresses how to generate the features of a document if we knew it was of class c
  • Logistic Regression: tries to compute the posterior directly
5
Q

Logistic regression happens in two phases: train and test. Explain these.

A

Training: given a training set of M observations (x,y)

  • train the system using stochastic gradient descent and the cross-entropy loss
  • learn the parameters w and b of the model
Test: given a test example x and a class y ∈ {0, 1}

  • compute p(y|x) and return the higher-probability class
6
Q

What are the components of logistic regression?

A

Logistic regression solves tasks by learning, from a training set, a vector of weights and a bias term:

  • for each feature xi we have a weight wi that represents how important feature xi is to the classification decision. e.g., xi = “awesome” with a very positive weight wi = 5, or xi = “abysmal” with a very negative weight wi = -5
  • the bias term, or intercept, is added to the weighted inputs
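These components can be combined in a minimal sketch; the feature values, weights, and bias below are made-up numbers for illustration, and the function names are hypothetical:

```python
import math

def sigmoid(z):
    # squash the weighted sum into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # z = w . x + b : weighted sum of the features plus the bias term
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# illustrative example: x1 = count of "awesome" (w1 = 5),
# x2 = count of "abysmal" (w2 = -5), bias b = 0.1
x = [1, 0]          # the document contains "awesome" once
w = [5.0, -5.0]
b = 0.1
p = predict_proba(x, w, b)   # p(y=1|x), close to 1 here
```
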
7
Q

The sigmoid function can be used in logistic regression. What is the function called that generalizes the sigmoid function and can be used in multinomial logistic regression?

A

Softmax

Softmax regression
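A minimal sketch of the softmax function, which turns a vector of class scores into a probability distribution (the scores below are made-up numbers):

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability, then normalize
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# three class scores; the outputs sum to 1, unlike raw scores
probs = softmax([2.0, 1.0, 0.1])
```
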

8
Q

True or False?
Assume we have a document, and we have three classes it can be classified into: positive, negative, and neutral.

Then, P(positive|doc) + P(negative|doc) + P(neutral|doc) = 3,
because we have three classes.

A

False.
P(positive|doc) + P(negative|doc) + P(neutral|doc) = 1

All the probabilities should sum up to 1.

9
Q

True or False?

In a softmax regression, there are distinct weights assigned to every variable

A

True

10
Q

What does overfitting mean? And how can we avoid it in logistic regression?

A

Overfitting means fitting the details of the training data so exactly that the model doesn’t generalize well to the test set

Regularization is a solution for overfitting in logistic regression
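One common form is L2 regularization, which adds a penalty on large weights to the loss; a minimal sketch, with an illustrative lambda value and hypothetical function names:

```python
def l2_penalty(w, lam):
    # lam * sum of squared weights: large weights are penalized,
    # discouraging the model from fitting training-set quirks exactly
    return lam * sum(wi * wi for wi in w)

def regularized_loss(base_loss, w, lam=0.01):
    # total objective = data loss + regularization term
    return base_loss + l2_penalty(w, lam)
```
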

11
Q

What are the two families of theories of emotion?

A
  1. Atomic basic emotions
    * a finite list of 6 or 8, from which others are generated
    (surprise, happiness, anger, fear, disgust, sadness)
  2. Dimensions of emotions
    * valence (positive, negative)
    * arousal (strong, weak)
    * control
12
Q

What is the difference between atomic and dimensional emotions?

A

Atomic:

  • emotions are units
  • limited number of basic emotions
  • basic emotions are innate and universal

Dimensional:

  • emotions are dimensions
  • limited number of labels but unlimited number of emotions
  • emotions are culturally learned
13
Q

If you are interested, and you think it is important for the exam, there is a thing called Plutchik’s wheel of emotions. Swipe to see what it is

A
8 basic emotions in 4 opposing pairs:
joy - sadness
anger - fear
trust - disgust
anticipation - surprise
14
Q

What is the basic algorithm for detecting document sentiment using a lexicon?

A
  • sum the weights of each positive word in the document
  • sum the weights of each negative word in the document
  • choose whichever value (positive or negative) has the higher sum
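The steps above can be sketched as follows; the lexicon contents and function name are made-up for illustration:

```python
# hypothetical sentiment lexicon: word -> weight (positive or negative)
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}

def lexicon_sentiment(tokens, lexicon):
    # sum the weights of the positive words in the document
    pos = sum(lexicon[t] for t in tokens if lexicon.get(t, 0) > 0)
    # sum the (absolute) weights of the negative words
    neg = sum(-lexicon[t] for t in tokens if lexicon.get(t, 0) < 0)
    # choose whichever sum is higher
    return "positive" if pos >= neg else "negative"
```
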
15
Q

How does logistic regression learn the values for w (weight) and b (bias)? Explain the process.

A

We have a training set that has the correct y for each x
But our classifier gives the estimate y’, not the true y
=> so we want to find w and b that make the y’ as close as possible to y

We need a loss function: a metric that tells us how close y’ is to the true gold y.
And we need an algorithm that minimizes the loss.

For logistic regression, the loss function is cross-entropy loss, and the algorithm for minimizing the loss is stochastic gradient descent

16
Q

What is the cross-entropy loss function?

A

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
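A minimal sketch of binary cross-entropy for a single example (the clipping constant `eps` is an illustrative detail to avoid log(0)):

```python
import math

def cross_entropy_loss(y_true, p_hat, eps=1e-12):
    # y_true is 0 or 1; p_hat is the predicted probability of class 1
    p_hat = min(max(p_hat, eps), 1 - eps)  # keep log() well-defined
    # loss grows as the predicted probability diverges from the label
    return -(y_true * math.log(p_hat) + (1 - y_true) * math.log(1 - p_hat))
```

For example, a confident correct prediction (p_hat = 0.9 for y = 1) costs less than an uncertain one (p_hat = 0.5).
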

17
Q

What is the difference between Gradient Descent and Stochastic Gradient Descent?

A

Compared to regular gradient descent, which uses all training samples to calculate the derivatives at each step, stochastic gradient descent randomly picks one sample per step, and uses just that one sample to calculate the derivatives.
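The contrast can be sketched for a simple one-parameter squared-error model (the data, model, and function names are illustrative assumptions, not from the lecture):

```python
import random

def grad_single(w, x, y):
    # derivative of the squared error (w*x - y)^2 with respect to w
    return 2 * (w * x - y) * x

def full_gradient(w, data):
    # regular gradient descent: average the gradient over ALL samples
    return sum(grad_single(w, x, y) for x, y in data) / len(data)

def sgd_gradient(w, data, rng=random):
    # stochastic gradient descent: one randomly picked sample per step
    x, y = rng.choice(data)
    return grad_single(w, x, y)
```
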

18
Q

How does Gradient Descent Calculate step size? So, how does the algorithm know how much to move at every step?

A

Step Size = Slope x Learning Rate

19
Q

How does Gradient Descent know where to stop descending on the curve to find the optimal value?

A

When the step size is very close to 0

20
Q

Explain the process of Gradient Descent step-by-step.

A
  1. Take the derivative of the Loss Function for each parameter in it
  2. Pick random values for the parameters
  3. Plug the parameter values into the derivatives
  4. Calculate the Step Size = Slope x Learning Rate
  5. Calculate New Parameters = Old Parameters - Step Size
  6. Then, go back to step 3 and repeat until step size is very small, or you reach the maximum number of steps
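The six steps above can be sketched as a generic loop; step 1 (taking the derivative) is supplied by the caller as `grad_fn`, and the names and default values are illustrative:

```python
import random

def gradient_descent(grad_fn, n_params, lr=0.1, tol=1e-6,
                     max_steps=10_000, seed=0):
    rng = random.Random(seed)
    # step 2: pick random starting values for the parameters
    params = [rng.uniform(-1, 1) for _ in range(n_params)]
    for _ in range(max_steps):                     # step 6: repeat
        slopes = grad_fn(params)                   # step 3: plug params into derivatives
        steps = [lr * s for s in slopes]           # step 4: step size = slope x learning rate
        params = [p - s for p, s in zip(params, steps)]  # step 5: new = old - step size
        if all(abs(s) < tol for s in steps):       # stop when step size is very small
            break
    return params

# example: minimize f(w) = (w - 3)^2, whose derivative is 2 * (w - 3);
# the loop should converge to w = 3
w_opt = gradient_descent(lambda p: [2 * (p[0] - 3)], n_params=1)
```
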