Lecture 4 - Logistic Regression Flashcards
What is logistic regression?
Logistic regression is an algorithm for discovering the link between features and some particular outcome.
It is the baseline supervised machine learning algorithm for classification in NLP.
Logistic regression can be used to classify an observation into one of two classes or one of many classes. What are these two methods of logistic regression called?
Binary logistic regression
Multinomial logistic regression
Naive Bayes is a … model and Logistic Regression is a … classifier.
Choose between “discriminative” and “generative”
Naive Bayes is a generative classifier
Logistic Regression is a discriminative classifier
What is the difference between generative and discriminative classifiers?
In general, a discriminative model models the decision boundary between the classes, while a generative model explicitly models the actual distribution of each class. In the end, both of them predict the conditional probability P(class | features).
A generative model learns the joint probability distribution p(x,y) and predicts the conditional probability with the help of Bayes' theorem. A discriminative model learns the conditional probability distribution p(y|x) directly. Both kinds of models are generally used in supervised learning problems.
In terms of how they find what class to be assigned to an observation:
- Naive Bayes: makes use of the likelihood, which expresses how to generate the features of a document if we knew it was of class c
- Logistic Regression: tries to compute the posterior P(c|d) directly
Logistic regression happens in two phases: train and test. Explain these.
Training: given a training set of M observations (x,y)
- train the system using stochastic gradient descent and the cross-entropy loss
- learn the parameters w and b of the model
Test: given a test example x and the classes y ∈ {0, 1}
- compute p(y|x) and return the class with the higher probability
What are the components of logistic regression?
Logistic regression solves classification tasks by learning, from a training set, a vector of weights and a bias term:
- for each feature xi we have a weight wi that represents how important the feature xi is to the classification decision. E.g., xi = “awesome” gets a very positive weight wi = 5, while xi = “abysmal” gets a very negative weight wi = -5
- the bias term or the intercept is added to the weighted inputs
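The components above can be sketched in plain Python. The weights, bias, and feature values here are made-up toy numbers, not anything from the lecture:

```python
import math

def sigmoid(z):
    # squash the weighted sum z into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    # z = w . x + b, then P(y=1|x) = sigmoid(z)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# toy example: two features, e.g. counts of "awesome" and "abysmal"
p = predict(weights=[5.0, -5.0], bias=0.1, features=[1, 0])
print(p)  # close to 1, so the model predicts class 1 (e.g. positive)
```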
The sigmoid function can be used in logistic regression. What is the function called that generalizes the sigmoid and can be used in multinomial logistic regression?
Softmax
Softmax regression
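A minimal sketch of the softmax function, which turns a vector of arbitrary scores into a probability distribution over classes (the input scores here are arbitrary toy values):

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability,
    # then exponentiate and normalize so the outputs sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the highest score gets the highest probability
print(sum(probs))  # the probabilities sum to 1
```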
True or False?
Assume we have a document, and we have three classes that it can be classified into positive, negative and neutral.
then, P(positive|doc) + P(negative|doc) + P(neutral|doc) = 3
because we have three classes.
False.
P(positive|doc) + P(negative|doc) + P(neutral|doc) = 1
All the probabilities should sum up to 1.
True or False?
In a softmax regression, there are distinct weights assigned to every variable
True
What does overfitting mean? And how can we avoid it in logistic regression?
Overfitting means fitting the details of the training data so exactly that the model doesn’t generalize well to the test set
Regularization is a solution for overfitting in logistic regression
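A sketch of one common form of regularization, L2 (ridge), which adds a penalty on large weights to the loss; the `alpha` strength and the weight values are illustrative assumptions, not from the lecture:

```python
def regularized_loss(ce_loss, weights, alpha=0.1):
    # L2 regularization: add a penalty proportional to the squared weights,
    # discouraging large weights that fit noise in the training data
    penalty = alpha * sum(w * w for w in weights)
    return ce_loss + penalty

# cross-entropy loss of 0.5 plus 0.1 * (5^2 + 5^2) = 0.5 + 5.0
print(regularized_loss(0.5, [5.0, -5.0]))
```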
What are the two families of theories of emotion?
- Atomic basic emotions
  * a finite list of 6 or 8 emotions, from which others are generated
    (surprise, happiness, anger, fear, disgust, sadness)
- Dimensions of emotions
  * valence (positive, negative)
  * arousal (strong, weak)
  * control
What is the difference between atomic and dimensional emotions?
Atomic:
- emotions are units
- limited number of basic emotions
- basic emotions are innate and universal
Dimensional:
- emotions are dimensions
- limited number of labels but unlimited number of emotions
- emotions are culturally learned
If you are interested, and you think it is important for the exam, there is a thing called Plutchik’s wheel of emotions. Swipe to see what it is
8 basic emotions in 4 opposing pairs:
- joy - sadness
- anger - fear
- trust - disgust
- anticipation - surprise
What is the basic algorithm for detecting document sentiment using a lexicon?
- sum the weights of each positive word in the document
- sum the weights of each negative word in the document
- choose whichever value (positive or negative) has the higher sum
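The three steps above can be sketched directly; the lexicons and their word weights here are hypothetical toy data:

```python
def lexicon_sentiment(tokens, pos_lexicon, neg_lexicon):
    # sum the weights of the positive words and of the negative words
    # separately, then pick whichever side has the higher total
    pos_score = sum(pos_lexicon.get(t, 0) for t in tokens)
    neg_score = sum(neg_lexicon.get(t, 0) for t in tokens)
    return "positive" if pos_score > neg_score else "negative"

# hypothetical weighted lexicons
pos = {"great": 2, "good": 1}
neg = {"awful": 2, "bad": 1}
print(lexicon_sentiment("the movie was great not bad".split(), pos, neg))
# -> "positive", since 2 > 1
```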
How does logistic regression learn the values for w (weight) and b (bias)? Explain the process.
We have a training set that has the correct y for each x
But our classifier gives the estimate y’, not the true y
=> so we want to find w and b that make y’ as close as possible to y
We need a loss function = a metric that tells us how close y’ is to the true gold y
And we need an algorithm that minimizes the loss
For logistic regression, the loss function is cross-entropy loss, and the algorithm for minimizing the loss is stochastic gradient descent
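The training loop above can be sketched for binary logistic regression. The learning rate, epoch count, and toy dataset are assumptions for illustration; the update uses the fact that the gradient of the cross-entropy loss with respect to z = w·x + b is (y’ − y):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_train(data, n_features, lr=0.5, epochs=100):
    # data: list of (feature_vector, label) pairs with label in {0, 1}
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in data:  # SGD: update after every single example
            y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # gradient of the cross-entropy loss w.r.t. z is (y_hat - y);
            # step each parameter in the direction that lowers the loss
            err = y_hat - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# toy data: one feature; positive examples have x > 0
data = [([2.0], 1), ([1.5], 1), ([-1.0], 0), ([-2.0], 0)]
w, b = sgd_train(data, n_features=1)
print(sigmoid(w[0] * 2.0 + b))  # high probability for a positive example
```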