Jupyter Notebook 1.3-Binary_Classification Flashcards

1
Q

What is classification?

A
  • Definition: Classification is a type of supervised learning where the goal is to predict a categorical label for an input.
  • Examples: Spam detection (Spam or Not Spam), Tumor diagnosis (Malignant or Benign).
  • Key Algorithms: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN).
  • Output: Discrete values (e.g., classes like 0/1, Yes/No).
  • Performance Metrics: Accuracy, Precision, Recall, F1-Score, AUC-ROC.
2
Q

What is regression?

A
  • Definition: Regression is a type of supervised learning where the goal is to predict a continuous value based on input features.
  • Examples: House price prediction, Stock market forecasting, Temperature prediction.
  • Key Algorithms: Linear Regression, Polynomial Regression, Decision Trees, Random Forest, Support Vector Regression (SVR).
  • Output: Continuous values (e.g., numerical quantities).
  • Performance Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), R-squared.
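The three regression metrics listed above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names and the toy values in y_true and y_pred are made up for this example.

```python
def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared residuals.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    # Mean Absolute Error: average of the absolute residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    # R-squared: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))
```

Lower MSE and MAE are better, while an R-squared closer to 1 means the predictions explain more of the variance in the targets.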
3
Q

What do the notations below stand for?

P
N
TP
FP
TN
FN

A

P = All actual positive data points
N = All actual negative data points
TP = True positives (correctly identified positives)
FP = False positives (negatives wrongly identified as positives)
TN = True negatives (correctly identified negatives)
FN = False negatives (positives wrongly identified as negatives)
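As a rough sketch of how these counts are obtained, the snippet below tallies TP, FP, TN and FN from two label lists and derives a few common metrics from them. The function name confusion_counts and the toy labels are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Tally each (actual, predicted) pair into one of the four cells.
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical labels, chosen only for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
p = tp + fn  # P: all actual positives
n = tn + fp  # N: all actual negatives

accuracy = (tp + tn) / (p + n)
precision = tp / (tp + fp)
recall = tp / p
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, tn, fn)
print(accuracy, precision, recall, f1)
```

Note that P = TP + FN and N = TN + FP, since every actual positive is either found (TP) or missed (FN), and likewise for negatives.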

4
Q

What are the two main forms of supervised learning?

A

Classification and Regression

5
Q

What is the special case of cross-validation called where K is set to the number of data points in the training set?

Maybe a random exam question? Might be useful, might be rubbish.

A

It's called leave-one-out cross-validation (LOOCV).
Each fold is then a single sample, so the model is trained once per data point.
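A minimal sketch of leave-one-out, assuming a toy 1-nearest-neighbour classifier on one-dimensional data. The helper names and the data points are hypothetical and exist only to show the loop structure.

```python
def predict_1nn(train, x):
    # train is a list of (feature, label) pairs; return the label of the
    # training point whose feature is closest to x.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def leave_one_out_accuracy(data):
    # Hold out each sample once, fit on all the others, and check
    # whether the held-out sample is classified correctly.
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict_1nn(train, x) == label:
            correct += 1
    return correct / len(data)


# Hypothetical (feature, label) pairs, chosen only for illustration.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1), (3.9, 1)]

loo_accuracy = leave_one_out_accuracy(data)
print(loo_accuracy)
```

With n samples this trains n models, which is why leave-one-out is usually reserved for small datasets.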

6
Q

How does cross-validation work?

A

We randomly split the training set into K parts, called folds. We then train a model K times, each time holding out a different fold for evaluation and training on the remaining K-1 folds. The average score over the K runs is used to estimate the model's performance.
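The procedure above can be sketched in plain Python. This is an illustration under stated assumptions: the helpers k_fold_indices, cross_validate, fit_majority and score_accuracy are hypothetical names, and the "model" is a trivial majority-class predictor used only to keep the example short.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    # Shuffle the sample indices, then deal them into k folds whose
    # sizes differ by at most one.
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, k, fit, score, seed=0):
    # Train k times, each time evaluating on one held-out fold and
    # training on the remaining k-1 folds.
    scores = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        scores.append(score(fit(train), test))
    return sum(scores) / k, scores


def fit_majority(train):
    # Toy "model": remember the most common training label.
    labels = [label for _, label in train]
    return max(sorted(set(labels)), key=labels.count)


def score_accuracy(model, test):
    # Fraction of held-out samples whose label matches the prediction.
    return sum(label == model for _, label in test) / len(test)


# Hypothetical dataset: 7 samples of class 0 and 3 of class 1.
data = [(x, 0) for x in range(7)] + [(x, 1) for x in range(3)]

mean_score, fold_scores = cross_validate(data, 5, fit_majority, score_accuracy)
print(fold_scores, mean_score)
```

Setting k equal to len(data) in this sketch gives the leave-one-out special case.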

7
Q

What is the Trap of Unbalanced Datasets?

A

A situation where one class significantly outnumbers the other, leading to misleading model performance, such as high accuracy despite poor detection of the minority class (as in the diabetes competition).

Key problems:
* Accuracy Paradox: High overall accuracy but poor minority class detection
* Biased Models: The model may focus on the majority class, ignoring minority cases

Solutions:
* Resampling: Oversample the minority class or undersample the majority class
* Adjust Metrics: Use precision, recall, F1-score or balanced accuracy
* Class Weights: Penalize wrong predictions on the minority class more heavily.

Far more cases of healthy individuals than people diagnosed with diabetes! Unbalanced as fudge yo
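A small sketch of the accuracy paradox, assuming a made-up dataset of 95 healthy (0) and 5 diabetic (1) cases and a "model" that always predicts healthy. The class-weight formula used here, n_samples / (n_classes * class_count), is the common "balanced" heuristic; other weighting schemes are possible.

```python
from collections import Counter

# Hypothetical labels: 95 healthy (0), 5 diabetic (1).
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
y_pred = [0] * 100


def recall_for(cls):
    # Fraction of the samples that truly belong to cls that were predicted as cls.
    total = sum(t == cls for t in y_true)
    hits = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    return hits / total


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = recall_for(1)
# Balanced accuracy: the mean of the per-class recalls.
balanced_accuracy = (recall_for(0) + recall_for(1)) / 2

# "Balanced" class weights: rarer classes get proportionally larger weights.
counts = Counter(y_true)
class_weights = {cls: len(y_true) / (len(counts) * n) for cls, n in counts.items()}

print(accuracy, minority_recall, balanced_accuracy)
print(class_weights)
```

Accuracy looks excellent here even though no diabetic case is detected, while recall on the minority class and balanced accuracy expose the problem.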

8
Q

What does the StandardScaler do in machine learning?

A

Standardizes features by scaling them to have a mean of 0 and standard deviation of 1.

z = (x-μ)/σ

x: Original feature value
μ: Mean of the feature
σ: Standard deviation of the feature

  • Puts features on a comparable scale, so no feature dominates just because of its units.
  • Improves performance of algorithms sensitive to data scale (e.g., SGD, SVM, KNN, Neural Networks).
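The formula z = (x-μ)/σ can be sketched for a single feature in plain Python. This is an illustration, not the library's code: fit_scaler and transform are hypothetical helper names, the heights are made-up values, and the population standard deviation is used for σ.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    # Learn the mean and (population) standard deviation of one feature.
    # Assumes the feature is not constant, otherwise sigma would be 0.
    return fmean(values), pstdev(values)


def transform(values, mu, sigma):
    # Apply z = (x - mu) / sigma to every value.
    return [(x - mu) / sigma for x in values]


# Hypothetical training feature (heights in cm), chosen for illustration.
train_heights = [150.0, 160.0, 170.0, 180.0, 190.0]
mu, sigma = fit_scaler(train_heights)
z_train = transform(train_heights, mu, sigma)

# New data is scaled with the statistics learned from the training data.
z_new = transform([175.0], mu, sigma)

print(mu, sigma)
print(z_train)
print(z_new)
```

Learning μ and σ on the training data only, and reusing them for validation or test data, avoids leaking information from the evaluation data into the model.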