CHAP 7 : Logistic Regression Flashcards

1
Q

What is logistic regression?

A

It is a supervised learning algorithm: a classification algorithm that assigns data to a discrete set of classes.

2
Q

Give example(s) of classification problems.

A
  1. Email classification : spam or not spam
  2. Financial data analysis : fraud / not fraud
  3. Credit analysis : approve or deny credit
  4. Marketing : will buy or won't buy
  • basically binary classification (only 2 classes)
3
Q

What is the logistic function for logistic regression (analogous to the best-fit line equation of linear regression)?

A

y hat = g(W.X^T),
g(z) = 1/(1 + e^-z),

thus y hat = 1/(1 + e^-(W.X^T))
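
A minimal NumPy sketch of this hypothesis function (the names sigmoid, predict_proba, W and X are illustrative, not from the notes):

    import numpy as np

    def sigmoid(z):
        # g(z) = 1 / (1 + e^-z): squashes any real value into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def predict_proba(W, X):
        # y hat = g(W . X^T): linear combination of the features passed through the sigmoid
        return sigmoid(X @ W)

    # toy example: 3 samples, each with a bias term and 2 features
    X = np.array([[1.0, 0.5, 2.0],
                  [1.0, -1.0, 0.3],
                  [1.0, 2.0, -0.5]])
    W = np.array([0.1, 0.8, -0.4])
    print(predict_proba(W, X))   # three probabilities in (0, 1)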

4
Q

What is the name for the logistic function?

A

Sigmoid function

5
Q

From the values generated by the sigmoid function, how do the values get classified into class 0 or 1 by the classifier?

A

If the value is < 0.5, the predicted class is 0; if the value is >= 0.5, the predicted class is 1.
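
A short continuation of the earlier sketch showing this thresholding step (the probability values are made up):

    import numpy as np

    probs = np.array([0.12, 0.50, 0.93])   # sigmoid outputs
    labels = (probs >= 0.5).astype(int)    # < 0.5 -> class 0, >= 0.5 -> class 1
    print(labels)                          # [0 1 1]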

6
Q

What is the error function in logistic regression?

A

E(W) = 1/(2N) * summation_i (y(i) - yhat(i))^2 – refer to notes

7
Q

Why can't we use the same error function (average MSE) as linear regression for logistic regression?

A

With the sigmoid plugged in, the MSE error surface is non-convex: there will be many local minima and the algorithm may get stuck in a local minimum.

8
Q

What is the cost function for logistic regression?

A

cost (yhat(x), y) =
-log(yhat(x)) if y = 1;
-log(1-yhat(x)) if y = 0.

See notes [we can rewrite the error function using the cost function]
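
A small sketch of that rewrite, using the standard combined form -[y*log(yhat) + (1-y)*log(1-yhat)], which reduces to the two cases above (function names are illustrative):

    import numpy as np

    def cost(y_hat, y):
        # -log(y_hat) when y = 1, -log(1 - y_hat) when y = 0, written as one expression
        eps = 1e-12                          # avoid log(0)
        y_hat = np.clip(y_hat, eps, 1 - eps)
        return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    def error(y_hat, y):
        # E(W): average cost over the N training examples
        return np.mean(cost(y_hat, y))

    y     = np.array([1, 0, 1])
    y_hat = np.array([0.9, 0.2, 0.6])
    print(error(y_hat, y))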

9
Q

How does the gradient descent algorithm work for logistic regression?

A
  1. Initialise W with random values or zeros
  2. Loop till convergence
    for each w(j) in W do :
      w(j) = w(j) + L * (1/N) * summation_i (y(i) - yhat(x(i))) * x(j)(i), where j indexes the jth feature (column) and i indexes the ith training example

see notes for equation.
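
A compact sketch of that loop (the learning rate L, the fixed iteration count standing in for "loop till convergence", and the toy data are all assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_descent(X, y, L=0.1, n_iters=1000):
        N, d = X.shape
        W = np.zeros(d)                      # step 1: initialise W with zeros
        for _ in range(n_iters):             # step 2: loop (here a fixed number of iterations)
            y_hat = sigmoid(X @ W)
            # vectorised form of: w(j) = w(j) + L * (1/N) * sum_i (y(i) - yhat(x(i))) * x(j)(i)
            W += L * (X.T @ (y - y_hat)) / N
        return W

    # toy data: a bias column plus one feature
    X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    y = np.array([0, 0, 1, 1])
    print(gradient_descent(X, y))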

10
Q

What is a confusion matrix?

A

A confusion matrix is a performance measurement for machine learning classification.

It presents a table layout of the different outcomes of the prediction and results of a classification problem and helps visualize its outcomes.

Values : True positive, true negative, false positive, false negative.
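
A hedged scikit-learn example (the label arrays are made up; for binary 0/1 labels the matrix is laid out as [[TN, FP], [FN, TP]]):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # rows are actual classes, columns are predicted classes
    print(confusion_matrix(y_true, y_pred))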

11
Q

What is the difference between the training dataset and the validation dataset?

A

The training dataset is a set of examples used for learning, that is, to fit the parameters of the classifier. The validation dataset contains different samples used to evaluate the trained model.

[The validation dataset is useful when it comes to hyper-parameter tuning and model selection. The validation examples included in this set will be used to find the optimal values for the hyper-parameters of the model under consideration.]
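
One common way to carve out such a validation set, sketched with scikit-learn's train_test_split (the dataset and the 80/20 split are arbitrary choices):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)

    # hold out 20% of the examples as a validation set; the rest is used to fit the parameters
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    print(X_train.shape, X_val.shape)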

12
Q

From the confusion matrix, there are 4 other metrics to evaluate classification output. What are they?

A
  1. Precision
  2. Recall (sensitivity)
  3. F1 score
  4. Support
13
Q

What is precision, how is it calculated?

A

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. (High precision relates to a low false positive rate.)

Precision = TP/(TP+FP)

TP: true positive ; FP : False positive

14
Q

What is recall and how is it calculated?

A

It is the ratio of correctly predicted positive observations to all observations in the actual class.

Recall = TP / (TP + FN)

FN: False negatives

15
Q

What is F1 score and how is it calculated?

A

F1 Score is the harmonic mean of Precision and Recall (often described as a weighted average of the two)

  • F1 Score = 2*(Recall * Precision) / (Recall + Precision)
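
Putting the formulas from cards 13-15 together in a small sketch (the TP/FP/FN counts are made up):

    TP, FP, FN = 40, 10, 20

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f1        = 2 * (recall * precision) / (recall + precision)   # harmonic mean of the two
    print(precision, recall, f1)
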
16
Q

What is support?

A

Support is the number of actual occurrences of the class in the specified dataset, i.e., the number of occurrences of each class in the original y values of the dataset.
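
All four metrics (precision, recall, F1 score and support) are reported per class by scikit-learn's classification_report; a hedged example with toy labels:

    from sklearn.metrics import classification_report

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # support is the count of each class in y_true
    print(classification_report(y_true, y_pred))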

17
Q

You are training a logistic regression model and you find that your training error is close to 0, but the testing error is very high. What can be done to improve this situation? Note: This situation is applicable to all the machine learning problems and not specific to logistic regression.

A
  1. Increase the training data size
  2. Train on a combination of your training data and your test data but test only on your test data.

(Since we are facing overfitting, we can increase the training data size to combat this. Also, if we train on our test data our test loss will definitely improve dramatically, but you should never do this in practice because it defeats the purpose of testing and will make performance worse when the model is deployed and used on new data.)

18
Q

What are 2 kinds of validation methods used?

A
  1. K-fold cross validation
  2. Leave-one-out cross validation
19
Q

How does k fold cross validation work?

A

It splits the data into k folds, trains on k-1 folds, and tests on the one fold that was left out. It repeats this for all k folds and averages the results across them.
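
A hedged scikit-learn sketch of k-fold cross-validation (k = 5, the model settings and the dataset are arbitrary choices):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=5000)

    # 5 folds: train on 4 folds, validate on the held-out fold, repeat for each fold
    scores = cross_val_score(model, X, y, cv=5)
    print(scores, scores.mean())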

20
Q

What is the advantage of k fold cross validation?

A

The advantage is that all observations are used for both training and validation, and each observation is used once for validation.

21
Q

What is leave-one-out cross validation?

A

A variant of k-Fold CV is Leave-one-out Cross-Validation (LOOCV). LOOCV uses each sample in the data as a separate test set while all remaining samples form the training set. This variant is identical to k-fold CV when k = n (number of observations).
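
The same idea with scikit-learn's LeaveOneOut splitter, a sketch under the same assumptions as the k-fold example above (with n observations the model is fitted n times, so this can be slow):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # each of the n samples is used once as a single-example test set
    scores = cross_val_score(model, X, y, cv=LeaveOneOut())
    print(scores.mean())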