Jupyter Notebook 1.3-Binary_Classification Flashcards

Question 1

Q

What is classification?

Answer

A

Definition: Classification is a type of supervised learning where the goal is to predict a categorical label for an input.
Examples: Spam detection (Spam or Not Spam), Tumor diagnosis (Malignant or Benign).
Key Algorithms: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN).
Output: Discrete values (e.g., classes like 0/1, Yes/No).
Performance Metrics: Accuracy, Precision, Recall, F1-Score, AUC-ROC.
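
A minimal sketch of how the metrics above follow from the four outcome counts of a binary classifier. The counts passed in at the bottom are made-up illustration values, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from outcome counts.

    tp/fp = true/false positives, tn/fn = true/false negatives.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that are correct
    precision = tp / (tp + fp)                   # share of predicted positives that are real
    recall = tp / (tp + fn)                      # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

AUC-ROC is left out because it needs predicted scores rather than hard labels.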

Question 2

Q

What is regression?

Answer

A

Definition: Regression is a type of supervised learning where the goal is to predict a continuous value based on input features.
Examples: House price prediction, Stock market forecasting, Temperature prediction.
Key Algorithms: Linear Regression, Polynomial Regression, Decision Trees, Random Forest, Support Vector Regression (SVR).
Output: Continuous values (e.g., numerical quantities).
Performance Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), R-squared.
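
A small sketch of the three regression metrics above, written out from their standard definitions. The target and prediction values are made up, and `regression_metrics` is a hypothetical helper name.

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R-squared) for two equally long lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                      # R-squared
    return mse, mae, r2


# Made-up values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]
mse, mae, r2 = regression_metrics(y_true, y_pred)
print(mse, mae, r2)
```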

Question 3

Q

What do the notations below stand for?

P
N
TP
FP
TN
FN

Answer

A

P = All actual positive data points
N = All actual negative data points
TP = True positives (correctly identified positives)
FP = False positives (negatives wrongly identified as positives)
TN = True negatives (correctly identified negatives)
FN = False negatives (positives wrongly identified as negatives)
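
To make the notation concrete, here is a sketch that counts the four outcomes from true and predicted labels. The labels are made up, the positive class is assumed to be encoded as 1, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)

p = counts["TP"] + counts["FN"]   # P: all actual positives
n = counts["TN"] + counts["FP"]   # N: all actual negatives
print(counts, p, n)
```

Note that P = TP + FN and N = TN + FP, since every actual positive is either found or missed.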

Question 4

Q

What are the two main forms of supervised learning?

Answer

A

Classification and Regression

Question 5

Q

What is the special case of cross-validation called where K is set to the number of data points in the training set?

Maybe a random exam question? Might be useful, might not.

Answer

A

It’s called leave-one-out cross-validation.
Each fold is then a single sample.
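
A sketch of what the leave-one-out splits look like for a tiny training set of 4 samples. `leave_one_out_splits` is a hypothetical helper written for this card. scikit-learn offers a `LeaveOneOut` splitter in `sklearn.model_selection` for the same purpose, which is not used here.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with exactly one test sample per split."""
    for i in range(n_samples):
        train_idx = [j for j in range(n_samples) if j != i]
        yield train_idx, [i]


splits = list(leave_one_out_splits(4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

With N samples this means N training runs, so it can get expensive on large datasets.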

Question 6

Q

How does cross-validation work?

Answer

A

We randomly split the training set into K parts, called folds. We then train a model K times, each time holding out a different fold for evaluation and training on the remaining K-1 folds. The average score over the K runs is used to estimate the model’s performance.
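
The procedure can be sketched in plain Python as follows. Everything here is an illustration under assumptions: the data is made up, the "model" is a toy 1-nearest-neighbour rule on a single feature, and the helper names (`k_fold_splits`, `predict_1nn`, `cross_val_accuracy`) are hypothetical.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Shuffle the indices and yield (train_idx, test_idx) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def predict_1nn(train_x, train_y, x):
    """Toy model: return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def cross_val_accuracy(xs, ys, k, seed=0):
    """Evaluate on each fold in turn and return (mean accuracy, per-fold accuracies)."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k, seed):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(predict_1nn(train_x, train_y, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores), scores


# Made-up, well-separated data: one feature, two classes.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 2.8, 3.1, 3.3, 3.9, 4.2]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_score, fold_scores = cross_val_accuracy(xs, ys, k=5)
print(fold_scores, mean_score)
```

Every sample ends up in the evaluation fold exactly once, so all of the data is used for both training and evaluation across the K runs.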

Question 7

Q

What is the Trap of Unbalanced Datasets?

Answer

A

A situation where one class significantly outnumbers the other, which makes performance numbers misleading: a model can reach high accuracy while detecting the minority class poorly (as in the diabetes competition).

Key problems:
* Accuracy Paradox: High overall accuracy but poor minority class detection
* Biased Models: The model may focus on the majority class, ignoring minority cases

Solutions:
* Resampling: Oversample the minority class or undersample the majority class
* Adjust Metrics: Use precision, recall, F1-score or balanced accuracy
* Class Weights: Penalize wrong predictions on the minority class more heavily

In the diabetes competition data there are far more healthy individuals than people diagnosed with diabetes, so the dataset is heavily unbalanced.
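
A small illustration of the accuracy paradox with made-up numbers: 95 healthy cases (label 0), 5 positive cases (label 1), and a "model" that always predicts healthy. The class-weight formula at the end, n_samples / (n_classes * class_count), is one common "balanced" heuristic and is shown here as an assumption, not as the method used in the notebook.

```python
# Made-up unbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall per class: share of each actual class that was predicted correctly.
recall_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / y_true.count(1)
recall_neg = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred)) / y_true.count(0)
balanced_accuracy = (recall_pos + recall_neg) / 2

# "Balanced" class weights: rarer classes get proportionally larger weights.
n_classes = 2
class_weights = {
    label: len(y_true) / (n_classes * y_true.count(label)) for label in (0, 1)
}

print(accuracy, recall_pos, balanced_accuracy, class_weights)
```

Accuracy looks good at 0.95, while recall on the positive class is 0 and balanced accuracy is only 0.5, the same as guessing.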

Question 8

Q

What does the StandardScaler do in machine learning?

Answer

A

Standardizes features by scaling them to have a mean of 0 and standard deviation of 1.

z = (x-μ)/σ

x: Original feature value
μ: Mean of the feature
σ: Standard deviation of the feature

Puts features on a comparable scale, so no feature dominates the model just because of its units.
Improves performance of algorithms sensitive to data scale (e.g., SGD, SVM, KNN, Neural Networks).
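
The formula above can be checked by hand with a short sketch. This is not scikit-learn's implementation, only the same z-score computation in plain Python. The feature values are made up, `standardize` is a hypothetical helper, and the population standard deviation (dividing by n) is used, which is also what `StandardScaler` uses.

```python
import statistics


def standardize(values):
    """Apply z = (x - mu) / sigma to every value of one feature."""
    mu = statistics.fmean(values)       # mean of the feature
    sigma = statistics.pstdev(values)   # population standard deviation (divide by n)
    return [(x - mu) / sigma for x in values]


# Made-up feature values, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
print(z)
print(statistics.fmean(z), statistics.pstdev(z))
```

The sketch assumes the feature is not constant. A feature with zero standard deviation would cause a division by zero here.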

(8 cards)