ML Module 1: Introduction to Machine Learning Flashcards

Question 1

Q

Data preparation: What are some typical problems and their solutions?

Answer

A

* Missing data
  * Fill with some value (zero, mean, …), also known as imputation
  * Skip data points containing missing data
  * Skip the entire feature containing missing data
* Text attributes
  * Convert to categorical values

Example:

Ocean_proximity categorical
“NEAR BAY” -> 0
“INLAND” -> 1
“NEAR OCEAN” -> 2

* Feature values have different scales
  * Normalise values (rescale to the range [0, 1])
  * Standardize values (shift and scale to have mean equal to 0 and variance equal to 1)
* Feature values are not normally distributed
  * Transform values (e.g. compute the logarithm)
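
The encoding, scaling and transformation steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the `income` column and the helper names `normalise`, `standardise` and `log_transform` are hypothetical, and only the category mapping comes from the card.

```python
import math
import statistics

# Hypothetical toy data, not taken from any real dataset.
ocean_proximity = ["NEAR BAY", "INLAND", "NEAR OCEAN", "INLAND"]
income = [10.0, 20.0, 30.0, 40.0]

# Text attribute -> integer category, using the mapping from the card.
category_codes = {"NEAR BAY": 0, "INLAND": 1, "NEAR OCEAN": 2}
encoded = [category_codes[value] for value in ocean_proximity]


def normalise(values):
    """Rescale values linearly to the range [0, 1] (min-max scaling).

    Assumes the values are not all identical (otherwise high == low).
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardise(values):
    """Shift and scale values so they have mean 0 and variance 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def log_transform(values):
    """Compress a skewed, strictly positive feature with the natural log."""
    return [math.log(v) for v in values]


normalised = normalise(income)
standardised = standardise(income)
logged = log_transform(income)
```

Note that integer codes imply an ordering between categories that may not exist in the data, so this encoding is only one of several possible choices.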

Question 2

Q

What is imputation?

Answer

A

Imputation is the process of replacing missing data with substituted values
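
As a small illustration, mean imputation could look like the sketch below. The `ages` column and the use of `None` to mark missing entries are assumptions made for this example.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


ages = [20.0, None, 40.0, None, 30.0]  # hypothetical feature column
filled = impute_mean(ages)
```

Here the two missing entries are replaced by 30.0, the mean of the three observed values.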

Question 3

Q

How do we represent a generic ML model as a function f?

Answer

A

ŷ = f(x,θ)

where
* ŷ is the prediction
* x is a data point
* θ are the parameters of the model
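
One concrete instance of this notation is a linear model, sketched below. The choice of a linear form and the parameter values are assumptions for illustration only; the card itself leaves f generic.

```python
def f(x, theta):
    """A linear model: prediction = w * x + b, with theta = (w, b)."""
    w, b = theta
    return w * x + b


theta = (2.0, 1.0)       # hypothetical parameter values
y_hat = f(3.0, theta)    # prediction for the data point x = 3.0
```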

Question 4

Q

What is Training in ML?

Answer

A

Training is the process of finding the best parameters θ so that the prediction ŷ is as close as possible to the known target value y.
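
A minimal sketch of training, assuming a linear model, mean squared error as the measure of closeness, and plain gradient descent as the search method. None of these choices are prescribed by the card; the data, learning rate and step count are hypothetical.

```python
def f(x, theta):
    """Linear model with theta = (w, b)."""
    w, b = theta
    return w * x + b


def mse(theta, xs, ys):
    """Mean squared difference between predictions and targets."""
    return sum((f(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Gradient descent on the MSE, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [f(x, (w, b)) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # targets generated from y = 2x + 1
theta = train(xs, ys)       # should end up close to (2.0, 1.0)
```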

Question 5

Q

What are some useful metrics for Classification?

Answer

A

Classification: How many did we label correctly?
* Accuracy
* Precision
* Recall
* ROC curve

Question 6

Q

What are some useful metrics for Regression?

Answer

A

Regression: How close did we get?
* Mean squared error (MSE)
* Root mean squared error (RMSE)
* Mean absolute error (MAE)
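
The three metrics can be computed directly from their definitions, as in this sketch. The target and prediction values are made up for the example.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical targets
y_pred = [2.0, 5.0, 9.0, 9.0]   # hypothetical predictions

mse_value = mean_squared_error(y_true, y_pred)        # 1.25
rmse_value = root_mean_squared_error(y_true, y_pred)
mae_value = mean_absolute_error(y_true, y_pred)       # 0.75
```

Because the errors are squared, MSE and RMSE penalise the single error of size 2 more heavily than MAE does.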

Question 7

Q

What is accuracy and how do we calculate it?

Answer

A

Accuracy measures how many samples are classified correctly, relative to the total number of samples:

accuracy = correct classifications / all classifications
accuracy = (TP + TN) / (TP + TN + FP + FN)
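
A short sketch of how the four counts and the accuracy could be computed for binary labels. The label lists are invented, and the convention 1 = positive, 0 = negative is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Correct classifications divided by all classifications."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical true labels
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical predicted labels

counts = confusion_counts(y_true, y_pred)   # (3, 3, 1, 1)
acc = accuracy(y_true, y_pred)              # 6 correct out of 8
```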

Question 8

Q

What is precision and how do we calculate it?

Answer

A

Precision measures how many of the positive classifications are actually positive:

precision = correct positive classifications / all positive classifications
precision = TP / (TP + FP)

Question 9

Q

What is Recall and how do we calculate it?

Answer

A

Recall measures how many of the actual positives were classified as positive:

recall = correct positive classifications / all actual positives
recall = TP / (TP + FN)
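
Recall is usually read side by side with precision, since both share the same numerator. The sketch below computes the two from confusion-matrix counts; the counts are hypothetical.

```python
def precision(tp, fp):
    """Share of positive classifications that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were classified as positive."""
    return tp / (tp + fn)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4

precision_value = precision(tp, fp)   # 8 / 10
recall_value = recall(tp, fn)         # 8 / 12
```

In this example the classifier is right about most of what it flags as positive, yet it misses a third of the actual positives.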

Question 10

Q

What is the Receiver Operating Characteristic (ROC) and how do we calculate it?

Answer

A

The ROC curve is a common way to plot the true positive rate (TPR) as a function of the false positive rate (FPR) as the decision threshold varies.

TPR:
How many positives did I get right?
TPR = TP / (TP + FN)

FPR:
How many negatives did I get wrong?
FPR = FP / (FP + TN)
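
A sketch of how the points of a ROC curve could be produced: sweep a decision threshold over the model's scores and record one (FPR, TPR) pair per threshold. The scores, labels and thresholds are invented, and the rule "score >= threshold means positive" is an assumption.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    points = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


y_true = [0, 0, 1, 1]              # hypothetical true labels
scores = [0.1, 0.4, 0.35, 0.8]     # hypothetical model scores
points = roc_points(y_true, scores, thresholds=[0.9, 0.5, 0.3, 0.0])
```

Lowering the threshold moves along the curve from (0, 0), where nothing is classified as positive, to (1, 1), where everything is.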

Question 11

Q

What is the Train-validation-test split used for?

Answer

A

If we want to compare different models, we need a third set:
the validation set.
The test set is still used only for the final evaluation.

<———————|———–|———>
Train Val Test
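
A minimal sketch of such a split, assuming a shuffled 60/20/20 division. The fractions, the seed and the helper name `train_val_test_split` are choices made for this example, not something the card specifies.

```python
import random


def train_val_test_split(data, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of data and cut it into train, validation and test parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    n_train = len(shuffled) - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


samples = list(range(10))   # stand-in for ten data points
train, val, test = train_val_test_split(samples)
```

Each data point lands in exactly one of the three parts, so the test set stays untouched while models are compared on the validation set.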

Question 12

Q

ML models are prone to overfitting, i.e. memorising the training data.
How do we know if (or when) this happens?

Answer

A

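We can compare performance on the training set with performance on the validation set from the train-validation-test split. A model that scores much better on the training set than on the validation set is overfitting.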
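
The comparison can be illustrated with a deliberately extreme model that only memorises its training data. Everything in this sketch is hypothetical: the even/odd task, the `fit_memoriser` helper and the fallback label.

```python
def accuracy(model, data):
    """Fraction of (input, label) pairs the model gets right."""
    return sum(1 for x, y in data if model(x) == y) / len(data)


def fit_memoriser(train_data, default_label=0):
    """'Train' by storing every training pair; unseen inputs get default_label."""
    table = dict(train_data)
    return lambda x: table.get(x, default_label)


# Hypothetical task: the label is 1 when the number is even, else 0.
train_data = [(x, int(x % 2 == 0)) for x in range(0, 10)]
val_data = [(x, int(x % 2 == 0)) for x in range(10, 20)]

model = fit_memoriser(train_data)
train_acc = accuracy(model, train_data)   # every training point was memorised
val_acc = accuracy(model, val_data)       # unseen inputs fall back to the default
gap = train_acc - val_acc                 # a large gap signals overfitting
```

The model is perfect on the training set but no better than guessing on the validation set, which is exactly the pattern to look for.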

Question 13

Q

How does Cross-validation work?

Question 14

Q

What is Reinforcement learning?

Answer

A

Reinforcement learning is a branch of machine learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time within a given environment.

Key Concepts:
* Agent: The learner or decision maker.
* Environment: Everything the agent interacts with.
* State: A specific situation in which the agent finds itself.
* Actions: All possible moves the agent can make.
* Reward: Feedback from the environment in response to an action.
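
The sketch below shows how these concepts fit together in one interaction loop. It is an illustration only: the `Corridor` environment, its reward values and the fixed `always_right` policy are all hypothetical, and no learning takes place.

```python
class Corridor:
    """Environment: positions 0..goal on a line; reaching goal ends the episode."""

    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0   # the agent starts at the left end

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (state, reward, done)."""
        self.state = max(0, min(self.goal, self.state + action))
        done = self.state == self.goal
        reward = 1.0 if done else -0.1   # small cost per step, bonus at the goal
        return self.state, reward, done


def always_right(state):
    """A fixed policy; a learning agent would instead improve this over time."""
    return +1


env = Corridor()
state, total_reward, done = env.state, 0.0, False
steps = 0
while not done:
    action = always_right(state)             # agent picks an action
    state, reward, done = env.step(action)   # environment responds
    total_reward += reward
    steps += 1
```

A reinforcement learning algorithm would replace `always_right` with a policy that is adjusted from the observed rewards.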

Question 15

Q

What is Supervised learning?

Answer

A

Supervised learning is a type of machine learning algorithm that learns from labeled data. Labeled data is data that has been tagged with a correct answer or classification.

Key Points:
* Supervised learning involves training a machine from labeled data.
* Labeled data consists of examples with the correct answer or classification.
* The machine learns the relationship between inputs (e.g. fruit images) and outputs (e.g. fruit labels).
* The trained machine can then make predictions on new, unlabeled data.
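
As a small sketch of learning from labeled data, the example below uses a one-nearest-neighbour rule. This is one possible supervised method chosen for brevity; the fruit weights and labels are made up.

```python
def fit_nearest_neighbour(examples):
    """Return a predictor that copies the label of the closest training example."""
    def predict(x):
        _, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return predict


# Hypothetical labeled data: fruit weight in grams -> fruit label.
labeled = [
    (110, "apple"), (120, "apple"), (130, "apple"),
    (900, "melon"), (1000, "melon"), (1100, "melon"),
]

predict = fit_nearest_neighbour(labeled)
prediction_small = predict(125)   # new, unlabeled input
prediction_large = predict(950)   # new, unlabeled input
```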

Question 16

Q

What is Unsupervised learning?

Answer

A

Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the data does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns and relationships in the data without any explicit guidance.

Key Points:
* Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
* Clustering algorithms group similar data points together based on their inherent characteristics.
* Feature extraction captures essential information from the data, enabling the model to make meaningful distinctions.
* Label association assigns categories to the clusters based on the extracted patterns and characteristics.
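
Clustering can be sketched with a plain k-means loop on one-dimensional data. The points, the starting centroids and the fixed iteration count are assumptions made for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids (plain k-means)."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, assignment


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # hypothetical unlabeled data
centroids, assignment = k_means_1d(points, centroids=[0.0, 5.0])
```

No labels are given to the algorithm; the two groups emerge only from the distances between the points.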

Question 17

Q

What are some applications of Unsupervised Learning?

Answer

A

Market segmentation, anomaly detection, and recommendation systems.

Question 18

Q

What are some common algorithms used in Supervised Learning?

Answer

A

Decision Trees, Support Vector Machines (SVM), Linear Regression, and Neural Networks.

Question 19

Q

What are some applications of Supervised Learning?

Answer

A

Image classification, spam detection, and predicting stock prices.

Question 20

Q

What are some applications of Reinforcement Learning?

Answer

A

Robotics, game AI (e.g., AlphaGo), and autonomous vehicles.