Lecture 4 - Logistic Regression Flashcards

1
Q

What is logistic regression?

A

Logistic regression is an algorithm for discovering the link between features and some particular outcome.

It is the baseline supervised machine learning algorithm for classification in NLP.

2
Q

Logistic regression can be used to classify an observation into one of two classes or one of many classes. What are these two methods of logistic regression called?

A

Binary logistic regression

Multinomial logistic regression

3
Q

Naive Bayes is a … model and Logistic Regression is a … classifier.

Choose between “discriminative” and “generative”

A

Naive Bayes is a generative classifier

Logistic Regression is a discriminative classifier

4
Q

What is the difference between generative and discriminative classifiers?

A

In general, a discriminative model models the decision boundary between the classes, while a generative model explicitly models the actual distribution of each class. In the end, both predict the conditional probability P(class | features).

A generative model learns the joint probability distribution p(x,y) and derives the conditional probability with the help of Bayes' theorem. A discriminative model learns the conditional probability distribution p(y|x) directly. Both models are generally used in supervised learning problems.

In terms of how they find what class to be assigned to an observation:

  • Naive Bayes: makes use of the likelihood, which expresses how to generate the features of a document if we knew it was of class c
  • Logistic Regression: tries to compute the posterior directly
5
Q

Logistic regression happens in two phases: train and test. Explain these.

A

Training: given a training set of M observations (x,y)

  • train the system using stochastic gradient descent and the cross-entropy loss
  • learn the parameters w and b of the model
Test: given a test example x and a class y ∈ {0, 1}

  • compute p(y|x) and return the higher-probability class
6
Q

What are the components of logistic regression?

A

Logistic regression solves tasks by learning, from a training set, a vector of weights and a bias term:

  • for each feature xi we have a weight wi that represents how important feature xi is to the classification decision. e.g., xi = “awesome” with a very positive weight wi = 5, or xi = “abysmal” with a very negative weight wi = -5
  • the bias term, or intercept, is added to the weighted inputs
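These components can be combined in a minimal sketch; the feature values, weights, and bias below are made-up numbers for illustration, and the function names are hypothetical:

```python
import math

def sigmoid(z):
    # squash the weighted sum into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # z = w . x + b : weighted sum of the features plus the bias term
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# illustrative example: x1 = count of "awesome" (w1 = 5),
# x2 = count of "abysmal" (w2 = -5), bias b = 0.1
x = [1, 0]          # the document contains "awesome" once
w = [5.0, -5.0]
b = 0.1
p = predict_proba(x, w, b)   # p(y=1|x), close to 1 here
```
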
7
Q

The sigmoid function can be used in logistic regression. What is the function called that generalizes the sigmoid function and can be used in multinomial logistic regression?

A

Softmax

Softmax regression
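A minimal sketch of the softmax function, which turns a vector of class scores into a probability distribution (the scores below are made-up numbers):

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability, then normalize
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# three class scores; the outputs sum to 1, unlike raw scores
probs = softmax([2.0, 1.0, 0.1])
```
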

8
Q

True or False?
Assume we have a document, and we have three classes it can be classified into: positive, negative, and neutral.

Then, P(positive|doc) + P(negative|doc) + P(neutral|doc) = 3,
because we have three classes.

A

False.
P(positive|doc) + P(negative|doc) + P(neutral|doc) = 1

All the probabilities should sum up to 1.

9
Q

True or False?

In a softmax regression, there are distinct weights assigned to every variable

A

True

10
Q

What does overfitting mean? And how can we avoid it in logistic regression?

A

Overfitting means fitting the details of the training data so exactly that the model doesn’t generalize well to the test set

Regularization is a solution for overfitting in logistic regression
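One common form is L2 regularization, which adds a penalty on large weights to the loss; a minimal sketch, with an illustrative lambda value and hypothetical function names:

```python
def l2_penalty(w, lam):
    # lam * sum of squared weights: large weights are penalized,
    # discouraging the model from fitting training-set quirks exactly
    return lam * sum(wi * wi for wi in w)

def regularized_loss(base_loss, w, lam=0.01):
    # total objective = data loss + regularization term
    return base_loss + l2_penalty(w, lam)
```
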

11
Q

What are the two families of theories of emotion?

A
  1. Atomic basic emotions
    * a finite list of 6 or 8, from which others are generated
    (surprise, happiness, anger, fear, disgust, sadness)
  2. Dimensions of emotions
    * valence (positive, negative)
    * arousal (strong, weak)
    * control
12
Q

What is the difference between atomic and dimensional emotions?

A

Atomic:

  • emotions are units
  • limited number of basic emotions
  • basic emotions are innate and universal

Dimensional:

  • emotions are dimensions
  • limited number of labels but unlimited number of emotions
  • emotions are culturally learned
13
Q

If you are interested, and you think it is important for the exam, there is a thing called Plutchik’s wheel of emotions. Swipe to see what it is

A
8 basic emotions in 4 opposing pairs:
joy - sadness
anger - fear
trust - disgust
anticipation - surprise
14
Q

What is the basic algorithm for detecting document sentiment using a lexicon?

A
  • sum the weights of each positive word in the document
  • sum the weights of each negative word in the document
  • choose whichever value (positive or negative) has the higher sum
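The steps above can be sketched as follows; the lexicon contents and function name are made-up for illustration:

```python
# hypothetical sentiment lexicon: word -> weight (positive or negative)
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}

def lexicon_sentiment(tokens, lexicon):
    # sum the weights of the positive words in the document
    pos = sum(lexicon[t] for t in tokens if lexicon.get(t, 0) > 0)
    # sum the (absolute) weights of the negative words
    neg = sum(-lexicon[t] for t in tokens if lexicon.get(t, 0) < 0)
    # choose whichever sum is higher
    return "positive" if pos >= neg else "negative"
```
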
15
Q

How does logistic regression learn the values for w (weight) and b (bias)? Explain the process.

A

We have a training set that has the correct y for each x
But our classifier gives the estimate y’, not the true y
=> so we want to find w and b that make the y’ as close as possible to y

We need a loss function: a metric that tells us how close y’ is to the true gold y.
And we need an algorithm that minimizes the loss.

For logistic regression, the loss function is cross-entropy loss, and the algorithm for minimizing the loss is stochastic gradient descent

16
Q

What is the cross-entropy loss function?

A

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
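A minimal sketch of binary cross-entropy for a single example (the clipping constant `eps` is an illustrative detail to avoid log(0)):

```python
import math

def cross_entropy_loss(y_true, p_hat, eps=1e-12):
    # y_true is 0 or 1; p_hat is the predicted probability of class 1
    p_hat = min(max(p_hat, eps), 1 - eps)  # keep log() well-defined
    # loss grows as the predicted probability diverges from the label
    return -(y_true * math.log(p_hat) + (1 - y_true) * math.log(1 - p_hat))
```

For example, a confident correct prediction (p_hat = 0.9 for y = 1) costs less than an uncertain one (p_hat = 0.5).
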

17
Q

What is the difference between Gradient Descent and Stochastic Gradient Descent?

A

Compared to regular gradient descent, which uses all training samples to calculate the derivatives at each step, stochastic gradient descent randomly picks one sample per step, and uses just that one sample to calculate the derivatives.
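The contrast can be sketched for a simple one-parameter squared-error model (the data, model, and function names are illustrative assumptions, not from the lecture):

```python
import random

def grad_single(w, x, y):
    # derivative of the squared error (w*x - y)^2 with respect to w
    return 2 * (w * x - y) * x

def full_gradient(w, data):
    # regular gradient descent: average the gradient over ALL samples
    return sum(grad_single(w, x, y) for x, y in data) / len(data)

def sgd_gradient(w, data, rng=random):
    # stochastic gradient descent: one randomly picked sample per step
    x, y = rng.choice(data)
    return grad_single(w, x, y)
```
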

18
Q

How does Gradient Descent Calculate step size? So, how does the algorithm know how much to move at every step?

A

Step Size = Slope x Learning Rate

19
Q

How does Gradient Descent know where to stop descending on the curve to find the optimal value?

A

When the step size is very close to 0

20
Q

Explain the process of Gradient Descent step-by-step.

A
  1. Take the derivative of the Loss Function for each parameter in it
  2. Pick random values for the parameters
  3. Plug the parameter values into the derivatives
  4. Calculate the Step Size = Slope x Learning Rate
  5. Calculate New Parameters = Old Parameters - Step Size
  6. Then, go back to step 3 and repeat until step size is very small, or you reach the maximum number of steps
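The six steps above can be sketched as a generic loop; step 1 (taking the derivative) is supplied by the caller as `grad_fn`, and the names and default values are illustrative:

```python
import random

def gradient_descent(grad_fn, n_params, lr=0.1, tol=1e-6,
                     max_steps=10_000, seed=0):
    rng = random.Random(seed)
    # step 2: pick random starting values for the parameters
    params = [rng.uniform(-1, 1) for _ in range(n_params)]
    for _ in range(max_steps):                     # step 6: repeat
        slopes = grad_fn(params)                   # step 3: plug params into derivatives
        steps = [lr * s for s in slopes]           # step 4: step size = slope x learning rate
        params = [p - s for p, s in zip(params, steps)]  # step 5: new = old - step size
        if all(abs(s) < tol for s in steps):       # stop when step size is very small
            break
    return params

# example: minimize f(w) = (w - 3)^2, whose derivative is 2 * (w - 3);
# the loop should converge to w = 3
w_opt = gradient_descent(lambda p: [2 * (p[0] - 3)], n_params=1)
```
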