ML Module 1: Introduction to Machine Learning Flashcards

1
Q

Data preparation: What are some typical problems and their solutions?

A
  • Missing data
    • Fill with some value (zero, mean, …), also known as imputation
    • Skip data points containing missing data
    • Skip the entire feature containing missing data
  • Text attributes
    • Convert to categorical values

    Example: Ocean_proximity (categorical)
    “NEAR BAY” -> 0
    “INLAND” -> 1
    “NEAR OCEAN” -> 2

  • Feature values have different scales
    • Normalise values (rescale to the range [0, 1])
    • Standardise values (shift and scale to mean 0 and variance 1)
  • Feature values are not normally distributed
    • Transform values (e.g. take the logarithm)
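
The fixes above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the helper names (`impute_mean`, `encode_categories`, `min_max_normalise`, `standardise`) and the sample values are assumptions made for this example.

```python
import math

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def encode_categories(labels):
    """Map each distinct text label to an integer, in order of first appearance."""
    mapping = {}
    for label in labels:
        mapping.setdefault(label, len(mapping))
    return [mapping[label] for label in labels], mapping

def min_max_normalise(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Shift and scale values to mean 0 and (population) variance 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Hypothetical sample data, chosen only to exercise each helper.
filled = impute_mean([10.0, None, 30.0])
codes, mapping = encode_categories(["NEAR BAY", "INLAND", "NEAR OCEAN", "INLAND"])
scaled = min_max_normalise(filled)
z_scores = standardise(filled)
log_values = [math.log(v) for v in [1.0, 10.0, 100.0]]  # compresses a long right tail
```

Note that integer codes imply an ordering the categories may not have; one-hot encoding is a common alternative when that matters.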
2
Q

What is imputation?

A

Imputation is the process of replacing missing data with substituted values

3
Q

How do we represent a generic ML model as a function f?

A

ŷ = f(x,θ)

where
* ŷ is the prediction
* x is a data point
* θ are the parameters of the model
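
As one concrete instance of ŷ = f(x,θ), a linear model can be written as below. The choice of a linear model and the numbers used are assumptions for illustration only; the card itself does not fix the form of f.

```python
def f(x, theta):
    """Linear model: theta[0] is the intercept, theta[1:] are per-feature weights."""
    return theta[0] + sum(w * xi for w, xi in zip(theta[1:], x))

theta = [1.0, 2.0, -0.5]   # hypothetical parameters
x = [3.0, 4.0]             # one data point with two features
y_hat = f(x, theta)        # 1.0 + 2.0*3.0 - 0.5*4.0
```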

4
Q

What is Training in ML?

A

Training is the process of finding the best parameters θ so that the prediction ŷ is as close as possible to the known target value y.
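
A minimal sketch of training, assuming a one-feature linear model, mean squared error as the measure of closeness, and gradient descent as the search method (none of which the card prescribes). The function name `train` and the learning rate are illustrative choices.

```python
def train(xs, ys, lr=0.05, steps=2000):
    """Fit y = b + w*x (approximately) by gradient descent on the mean squared error."""
    b, w = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training point under the current parameters.
        errors = [(b + w * x) - y for x, y in zip(xs, ys)]
        grad_b = 2 * sum(errors) / n
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Move the parameters a small step against the gradient.
        b -= lr * grad_b
        w -= lr * grad_w
    return b, w

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 1 + 2x
b, w = train(xs, ys)        # should end up close to b = 1, w = 2
```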

5
Q

What are some useful metrics for Classification?

A

Classification: How many did we label correctly?
* Accuracy
* Precision
* Recall
* ROC curve

6
Q

What are some useful metrics for Regression?

A

Regression: How close did we get?
* Mean squared error (MSE)
* Root mean squared error (RMSE)
* Mean absolute error (MAE)
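
The three metrics can be computed directly from their definitions. The sketch below uses hypothetical target and prediction values.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the MSE; same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of the absolute differences; less sensitive to large errors than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]   # errors: 1, 0, -2, 0

mse_value = mse(y_true, y_pred)    # (1 + 0 + 4 + 0) / 4
rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)    # (1 + 0 + 2 + 0) / 4
```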

7
Q

What is accuracy and how do we calculate it?

A

Accuracy measures how many samples are classified correctly, relative to the total number of samples:

accuracy = correct classifications / all classifications
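accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.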
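
A small worked example, assuming binary labels where 1 marks the positive class. The label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all classifications that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)   # 6 correct out of 8
```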

8
Q

What is precision and how do we calculate it?

A

Precision measures how many of the positive classifications are actually positive

precision = correct positive classifications / all positive classifications
precision = TP / (TP + FP)

9
Q

What is Recall and how do we calculate it?

A

Recall measures how many of the actual positives were classified as positive

recall = correct positive classifications / all actual positives
recall = TP / (TP + FN)
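
Recall is easiest to read next to precision, since both share the numerator TP and differ only in the denominator. The sketch below uses hypothetical counts; returning 0.0 when the denominator is zero is a convention assumed here, not something the formulas define.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts: the classifier is cautious, so it misses many positives.
tp, fp, fn = 3, 1, 3

p = precision(tp, fp)   # 3 / 4
r = recall(tp, fn)      # 3 / 6
```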

10
Q

What is the Receiver Operating Characteristic (ROC) and how do we calculate it?

A

The ROC curve plots the true positive rate (TPR) as a function of the false positive rate (FPR) as the classification threshold is varied.

TPR:
How many positives did I get right
TPR = TP / (TP + FN)

FPR:
How many negatives did I get wrong
FPR = FP / (FP + TN)
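
The points of a ROC curve can be computed by sweeping a threshold over the classifier's scores and evaluating both rates at each step. This is a minimal sketch; `roc_points`, the labels and the scores are hypothetical, and it assumes at least one positive and one negative sample.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, sweeping from high to low."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
```

Plotting the second coordinate against the first gives the curve; it always starts at (0, 0) and ends at (1, 1).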

11
Q

What is the Train-validation-test split used for?

A

To compare different models, we need a third set: the validation set.
Models are fitted on the training set and compared on the validation set.
The test set is still used only for the final evaluation.

<----------------|--------|-------->
      Train         Val      Test
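
A minimal sketch of such a split, assuming a 60/20/20 division and a shuffle with a fixed seed. The function name and fractions are illustrative, not prescribed by the card.

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the data and cut it into three disjoint parts."""
    items = list(data)
    random.Random(seed).shuffle(items)   # fixed seed makes the split reproducible
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    n_train = len(items) - n_val - n_test
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train_set, val_set, test_set = train_val_test_split(range(10))
```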

12
Q

ML models are prone to overfitting, i.e. memorising the training data.
How do we know if (when) this happens?

A

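We can compare performance on the training set with performance on the validation set in the Train-Validation-Test split. A model that does much better on the training set than on the validation set is overfitting.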
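
The comparison can be illustrated with two toy models: one that memorises the training data (it returns the stored target of the nearest training point) and one simple fixed line. All data and both models are hypothetical and chosen only to make the gap visible.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.3, 1.8, 4.4, 5.7]   # roughly y = 2x, with noise
val_x = [0.5, 1.5, 2.5]
val_y = [1.0, 3.0, 5.0]          # y = 2x without noise

def memoriser(x):
    """Memorises the training set: returns the target of the nearest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def linear(x):
    """A simple model with one fixed parameter."""
    return 2.0 * x

memoriser_train = mse(memoriser, train_x, train_y)   # perfect on data it has seen
memoriser_val = mse(memoriser, val_x, val_y)         # clearly worse on unseen data
linear_train = mse(linear, train_x, train_y)
linear_val = mse(linear, val_x, val_y)
```

The memoriser has zero training error but a much larger validation error, which is the signature of overfitting; the simple line has a small error on both sets.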

13
Q

How does Cross-validation work?

A
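
A minimal sketch of k-fold cross-validation, one common scheme (assumed here, since the card does not say which variant is meant): the data is divided into k folds, each fold serves once as the validation set while the remaining folds form the training set, and the k validation scores are averaged. The helper `k_fold_indices` is hypothetical and only produces the index splits.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size

folds = list(k_fold_indices(6, 3))
```

Each sample lands in a validation fold exactly once, so every data point is used for both training and validation across the k rounds.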
14
Q

What is Reinforcement learning?

A

Reinforcement learning is a branch of machine learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time within a given environment.

Key Concepts:
* Agent: The learner or decision maker.
* Environment: Everything the agent interacts with.
* State: A specific situation in which the agent finds itself.
* Actions: All possible moves the agent can make.
* Reward: Feedback from the environment.
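
The concepts above can be made concrete with a toy example. The sketch below uses tabular Q-learning, which is one specific reinforcement learning algorithm chosen here for illustration; the corridor environment, the constants and the function names are all assumptions.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = [-1, +1]    # move left or move right

def step(state, action):
    """Environment: apply the action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values from experience gathered with random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))      # explore by acting randomly
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate towards reward plus discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
# Greedy policy for the non-terminal states: pick the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
```

After training, the greedy policy moves right in every non-terminal state, which is the shortest route to the reward.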

15
Q

What is Supervised learning?

A

Supervised learning is a type of machine learning that learns from labeled data. Labeled data is data that has been tagged with a correct answer or classification.

Key Points:
* Supervised learning involves training a machine from labeled data.
* Labeled data consists of examples with the correct answer or classification.
* The machine learns the relationship between inputs (e.g. fruit images) and outputs (e.g. fruit labels).
* The trained machine can then make predictions on new, unlabeled data.
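
A minimal sketch of the idea, using a 1-nearest-neighbour classifier as a stand-in for "the machine". The fruit weights replace the fruit images to keep the example small; the data and the function name are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the label of x as the label of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Labeled data: fruit weight in grams (input) and fruit name (label).
weights = [120.0, 130.0, 150.0, 5.0, 6.0, 7.0]
labels = ["apple", "apple", "apple", "grape", "grape", "grape"]

# Predictions on new, unlabeled inputs.
heavy = nearest_neighbour_predict(weights, labels, 140.0)
light = nearest_neighbour_predict(weights, labels, 8.0)
```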

16
Q

What is Unsupervised learning?

A

Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the data does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns and relationships in the data without any explicit guidance.

Key Points:
* Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
* Clustering algorithms group similar data points together based on their inherent characteristics.
* Feature extraction captures essential information from the data, enabling the model to make meaningful distinctions.
* Label association assigns categories to the clusters based on the extracted patterns and characteristics.
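
Clustering can be sketched with k-means, one common clustering algorithm (chosen here for illustration). The version below is restricted to one-dimensional data and two clusters, with a simple deterministic initialisation; the function name and values are hypothetical.

```python
def k_means_1d(values, iterations=10):
    """Cluster 1-D values into two groups with k-means (k = 2)."""
    centroids = [min(values), max(values)]   # simple deterministic initialisation
    clusters = [[], []]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

values = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]   # no labels are given
centroids, clusters = k_means_1d(values)
```

The algorithm finds the two groups from the values alone; naming them afterwards (for example "small" and "large") corresponds to the label association step.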

17
Q

What are some applications of Unsupervised Learning?

A

Market segmentation, anomaly detection, and recommendation systems.

18
Q

What are some common algorithms used in Supervised Learning?

A

Decision Trees, Support Vector Machines (SVM), Linear Regression, and Neural Networks.

19
Q

What are some applications of Supervised Learning?

A

Image classification, spam detection, and predicting stock prices.

20
Q

What are some applications of Reinforcement Learning?

A

Robotics, game AI (e.g., AlphaGo), and autonomous vehicles.