Logistic Regression Flashcards

1
Q

What is the definition of Logistic Regression?

A

Logistic Regression is a supervised learning algorithm used for classification problems. Unlike Linear Regression, which predicts continuous values, Logistic Regression predicts probabilities for categorical outcomes (e.g., spam or not spam, disease or no disease).

🔹 It uses the Sigmoid (Logistic) Function to squash outputs between 0 and 1, making it suitable for classification.

🔹 If the probability is > 0.5, it’s classified as 1 (positive class), else 0 (negative class).

🔹 Despite its name, it’s a classification algorithm, not a regression algorithm.
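A tiny NumPy sketch of the idea (the sigmoid helper and the toy scores are ours, purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into the interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

scores = np.array([-3.0, 0.0, 3.0])   # example linear scores wX + b
probs = sigmoid(scores)               # ≈ [0.047, 0.5, 0.953]
labels = (probs > 0.5).astype(int)    # threshold at 0.5 → [0, 0, 1]
```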

2
Q

What is the mathematical formula of Logistic Regression?

A

Logistic Regression predicts the probability P(Y=1|X) using the Sigmoid (Logistic) Function:

P(Y=1|X) = 1 / (1 + e^(-(wX + b)))

Where:
wX + b is the linear combination of the input features (weights w, bias b).

e is Euler’s number (~2.718).

The Sigmoid function converts any real number into a probability between 0 and 1.

Decision Rule:
If P(Y=1) > 0.5 → Predict Class 1
If P(Y=1) ≤ 0.5 → Predict Class 0
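A minimal sketch of this formula in NumPy (predict_proba and the toy weights here are hypothetical, not a fitted model):

```python
import numpy as np

def predict_proba(X, w, b):
    """P(Y=1|X) = sigmoid of the linear score wX + b."""
    z = X @ w + b
    return 1 / (1 + np.exp(-z))

X = np.array([[1.0, 2.0], [3.0, 0.5]])  # two samples, two features
w = np.array([0.4, -0.2])               # toy weights
b = 0.1                                 # toy bias
p = predict_proba(X, w, b)              # probabilities in (0, 1)
y_pred = (p > 0.5).astype(int)          # decision rule from above
```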

3
Q

What is the decision boundary in Logistic Regression?

A

The decision boundary is the line (or curve) that separates different classes in logistic regression.
In logistic regression, wX + b = 0 represents the decision boundary.

If a data point falls on one side, it’s classified as Class 1 (e.g., “Yes”).

If it falls on the other side, it’s classified as Class 0 (e.g., “No”).

For simple logistic regression with two features, this boundary is a straight line in the feature plane.

For complex data, it can be curved if we use advanced techniques like polynomial features.
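A small sketch of how the boundary can be computed for two features, assuming some already-fitted weights (the values here are made up):

```python
import numpy as np

w = np.array([1.5, -2.0])   # hypothetical fitted weights
b = 0.5                     # hypothetical fitted bias

# The boundary is w1*x1 + w2*x2 + b = 0; solving for x2 gives a straight line:
x1 = np.linspace(-3, 3, 50)
x2_boundary = -(w[0] * x1 + b) / w[1]

# Which side a point falls on decides its class:
point = np.array([2.0, 1.0])
side = point @ w + b        # > 0 → Class 1, ≤ 0 → Class 0
```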

4
Q

What is the loss function in Logistic Regression?

A

In Logistic Regression, we use Log Loss (Binary Cross-Entropy) instead of Mean Squared Error (MSE), because pairing MSE with the sigmoid produces a non-convex loss surface that is hard to optimize.

Why Log Loss?
It penalizes confident wrong predictions very heavily.
It keeps the loss convex with smooth gradients, which makes optimization reliable.

Formula:
Loss = -(1/N) ∑ [ y·log(ŷ) + (1 − y)·log(1 − ŷ) ]

Where:
y = actual label (0 or 1)
ŷ = predicted probability
N = total samples

How It Works:
If actual = 1 and prediction is close to 1, loss is small.
If actual = 1 but prediction is close to 0, loss is high.
If actual = 0 but prediction is close to 1, loss is high.
The goal is to minimize Log Loss so the model makes better predictions.
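The formula above translates directly into NumPy; this is a minimal sketch (the eps clipping is our addition, to avoid log(0)):

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy: -(1/N) ∑ [y·log(ŷ) + (1-y)·log(1-ŷ)]."""
    p = np.clip(y_prob, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 1, 0])
log_loss(y, np.array([0.9, 0.8, 0.1]))  # confident and correct → small loss
log_loss(y, np.array([0.1, 0.2, 0.9]))  # confident but wrong   → large loss
```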

5
Q

How does Gradient Descent work in Logistic Regression?

A

Compute Predictions → apply the sigmoid to the linear score: ŷ = 1 / (1 + e^(-(wX + b)))

Calculate Loss → use Log Loss (Cross-Entropy Loss) from the previous card.

Compute Gradients → take partial derivatives of the loss w.r.t. w and b.

Update Parameters → w ← w − α·∂Loss/∂w and b ← b − α·∂Loss/∂b (where α is the learning rate).

Repeat Until Convergence → stop when the loss no longer decreases significantly.
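A compact NumPy sketch of this loop (batch gradient descent; fit_logistic is our own illustrative function, not a library API):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        y_hat = 1 / (1 + np.exp(-(X @ w + b)))  # 1. sigmoid predictions
        # 2. (the log loss would be computed here to monitor convergence)
        grad_w = X.T @ (y_hat - y) / n          # 3. ∂Loss/∂w
        grad_b = np.mean(y_hat - y)             # 3. ∂Loss/∂b
        w -= lr * grad_w                        # 4. update with learning rate α
        b -= lr * grad_b
    return w, b                                 # 5. or stop early on small loss change
```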

6
Q

Explain Regularization in Logistic Regression

A

L2 Regularization (Ridge Regression)
This method adds a penalty on the square of the weights.
The model reduces large weights but does not make them zero.
It works best when all features contribute to the prediction because it shrinks them proportionally without removing any.

L1 Regularization (Lasso Regression)
This method adds a penalty on the absolute value of weights.
Unlike L2, it forces some weights to become exactly zero, effectively removing less important features from the model.
It works best when only a few features are actually useful because it selects only the important ones and ignores the rest.

Elastic Net (Combination of L1 & L2)
It combines the strengths of both L1 and L2 regularization.
Useful when we need both feature selection (L1) and weight reduction (L2).
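In scikit-learn, these three options map onto the penalty parameter of LogisticRegression; a quick sketch (the C values are arbitrary, and L1 and Elastic Net require a compatible solver):

```python
from sklearn.linear_model import LogisticRegression

# C is the INVERSE regularization strength: smaller C = stronger penalty
l2_model = LogisticRegression(penalty="l2", C=1.0)                       # default
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
enet     = LogisticRegression(penalty="elasticnet", l1_ratio=0.5,
                              C=1.0, solver="saga")
```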

7
Q

What evaluation metrics are available for classification problems?

A

1️⃣ Accuracy – Percentage of correctly classified samples. Best for balanced datasets.

2️⃣ Confusion Matrix – Shows actual vs. predicted values (TP, TN, FP, FN).

3️⃣ Precision – Out of predicted positives, how many are actually positive? (Useful when FP is costly).

4️⃣ Recall (Sensitivity) – Out of actual positives, how many did we correctly predict? (Useful when FN is costly).

5️⃣ F1-Score – Harmonic mean of Precision & Recall (Best when balance is needed).

6️⃣ ROC Curve & AUC – Measures model’s ability to separate classes. Higher AUC is better.

7️⃣ Log Loss (Cross-Entropy Loss) – Measures uncertainty of predictions (Lower is better).

✅ Balanced Dataset → Accuracy
✅ Imbalanced Dataset → Precision, Recall, F1-Score
✅ Probability-Based Models → Log Loss
✅ Ranking Models → AUC-ROC
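All of these are available in scikit-learn; a quick sketch with made-up labels and probabilities:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]            # hard class labels
y_prob = [0.9, 0.2, 0.4, 0.8, 0.3]  # predicted P(Y=1)

accuracy_score(y_true, y_pred)
precision_score(y_true, y_pred)
recall_score(y_true, y_pred)
f1_score(y_true, y_pred)
roc_auc_score(y_true, y_prob)       # needs probabilities/scores, not labels
log_loss(y_true, y_prob)            # likewise probability-based
```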

8
Q

Explain the confusion matrix

A

Structure of the Confusion Matrix:

Actual \ Predicted    Positive (1)            Negative (0)
Positive (1)          TP (True Positive)      FN (False Negative)
Negative (0)          FP (False Positive)     TN (True Negative)

True Positive (TP) → Model correctly predicts Positive (1)

True Negative (TN) → Model correctly predicts Negative (0)

False Positive (FP) → Model incorrectly predicts Positive (1) when it is actually Negative (0) (Type I Error)

False Negative (FN) → Model incorrectly predicts Negative (0) when it is actually Positive (1) (Type II Error)
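scikit-learn computes this directly; note its layout for labels [0, 1] is [[TN, FP], [FN, TP]], which orders the rows differently from the table above. A quick sketch with toy labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# rows = actual, columns = predicted; for labels [0, 1]: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```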

9
Q

Why are polynomial features used?

A

Polynomial Features
Polynomial features transform the original features into higher-degree terms to capture non-linearity in the data.

Why use Polynomial Features?
Logistic regression by default learns only a linear decision boundary, but some datasets require curved ones.
By adding polynomial terms, we let the model learn more complex patterns.

Example:
Suppose we have one feature X. A linear model would be:
y = w0 + w1·X

If this doesn't fit the data well, we add polynomial features:
y = w0 + w1·X + w2·X² + w3·X³

Now, the model can learn curves instead of just straight lines!

How does it help in Logistic Regression?
Instead of using only X, we use X², X³, … to create curved decision boundaries.
Useful when classes aren’t linearly separable.
Downsides:
Adds more features, making the model complex.
Overfitting risk (use regularization to control it).
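In scikit-learn this is typically done with PolynomialFeatures in a pipeline; a minimal sketch (the degree and C here are arbitrary choices):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Degree-3 terms (X², X³, and cross terms) let the boundary curve;
# the L2 penalty (via C) controls the overfitting risk mentioned above.
model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LogisticRegression(C=1.0),
)
# model.fit(X, y); model.predict(X_new)
```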

10
Q

What is Logistic Regression, in short?

A

Logistic Regression = Linear Regression + Sigmoid Activation
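In code, that one-liner is literally two lines (toy values, just to show the composition):

```python
import numpy as np

X, w, b = np.array([[1.0, 2.0]]), np.array([0.3, -0.1]), 0.05  # toy values
z = X @ w + b              # the "Linear Regression" part
p = 1 / (1 + np.exp(-z))   # the "Sigmoid Activation" part
```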
