Lecture 5 - Introduction to Supervised ML Flashcards

1
Q

What is machine learning?

A

A field of artificial intelligence concerned with algorithms that can learn from data

2
Q

What can ML provide?

A

Data mining
Self-customizing programs (like recommended pages)
Applications one can’t program by hand, like speech recognition

3
Q

What is the essence of ML?

A
  1. A pattern exists
  2. We cannot pin it down mathematically
  3. We have data on it
4
Q

What is supervised learning?

A

Labels are available for the training data but unknown for future data. The goal is to model the dependency between the features and the label.

5
Q

Classification vs Regression for supervised ML?

A

Classification: outputs values from a finite, unordered set C = {c1, c2, …, ck}
Regression: outputs a real number

6
Q

What is a hypothesis set?

A

The hypothesis set is the space of functions in which we look for our solution. Supervised learning uses data to learn a function g: X -> Y from the hypothesis set H.

7
Q

What are the modelling steps?

A
  1. Model class: general structure of the model (linear, quadratic, decision tree, clustering)
  2. Error measure (score function): Evaluates the quality of different models (squared error, variance, complexity)
  3. Algorithm: Find a good model, as defined by the score function
  4. Validation: the best fit to the training data does not guarantee accurate predictions on new data (overfitting and underfitting)
8
Q

What is the Approximation-Generalization trade-off?

A

The aim of learning is to approximate the target function f as closely as possible -> a more complex hypothesis set approximates better,

but a less complex hypothesis set generalizes better

9
Q

Can we have the ideal hypothesis set?

A

Not really: having the ideal H = {f} would mean that we already know the answer, so there is no need for ML

10
Q

What is the Grue Emerald Paradox?

A

Suppose that we have seen a large set of emeralds and they are all green. Hypotheses:
1. All emeralds are green
2. All emeralds are blue except the ones we have seen so far.

Based on the training set alone, there is no way to choose which one is better, but on the test data they give completely opposite results

11
Q

What is Occam’s Razor?

A

Occam’s razor is a principle that favors the simplest hypothesis that explains a given set of observations well. It is better to have a simple hypothesis A than an almost-as-good but very complex hypothesis B

12
Q

What does M mean in polynomial curve fitting?

A

M is the order of the polynomial, so for example M = 3 means f(x) = w0 + w1x + w2x^2 + w3x^3. The higher M is, the more curves the function’s plot can have

13
Q

What is mean squared error?

A

It is the average of (predicted value - true value)^2

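The definition above is a one-liner in code (a minimal sketch using plain Python lists):

```python
# Mean squared error: average of (predicted value - true value)^2
def mean_squared_error(y_true, y_pred):
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # -> 1.25 / 3 ≈ 0.4167
```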
14
Q

What do linear models describe?

A

The relationship between a target attribute y and input attributes x

15
Q

How to find the best model?

A
  1. Understand the project -> restricts the model class
  2. Fitting criterion: how well does the model fit the data?
  3. Model complexity: simpler preferred
  4. Interpretability is desired
  5. Computational aspects
16
Q

What are nearest neighbour predictors?

A

Similar instances should have similar class labels or real-valued labels; similarity is defined in terms of a distance

17
Q

How is the k-nearest-neighbour class/regression output decided?

A

Classification: Choose the majority class among the k nearest
Regression: Choose the mean value of the k nearest

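Assuming the k nearest neighbours have already been found, the two decision rules above can be sketched as:

```python
from collections import Counter

def knn_classify(neighbour_labels):
    # Classification: majority class among the k nearest neighbours
    return Counter(neighbour_labels).most_common(1)[0][0]

def knn_regress(neighbour_values):
    # Regression: mean value of the k nearest neighbours
    return sum(neighbour_values) / len(neighbour_values)

print(knn_classify(["cat", "dog", "cat"]))  # -> cat
print(knn_regress([1.0, 2.0, 3.0]))         # -> 2.0
```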
18
Q

What is the weakness of kNN?

A
  • Large computational and memory complexity
  • Sensitivity to feature scaling
  • Irrelevant features contribute equally
  • Black box: does not produce model that can be analyzed
19
Q

What are the ingredients for kNN predictor?

A
  1. Distance metric
  2. Number of neighbours (often chosen via cross-validation; we can’t simply pick the k with the fewest training errors, since k = 1 would overfit)
  3. Weighting function for the neighbours (the closer, the better)
  4. Prediction function (majority vote, weighted majority, mean, weighted average)
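A sketch combining the four ingredients: Euclidean distance, k neighbours, inverse-distance weighting, and a weighted-average prediction (function names and data are illustrative):

```python
import math

def euclidean(a, b):
    # 1. Distance metric: Euclidean
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, value) pairs."""
    # 2. Number of neighbours: keep the k closest training points
    nearest = sorted(train, key=lambda xy: euclidean(xy[0], query))[:k]
    # 3. Weighting function: the closer, the larger the weight
    weights = [1.0 / (euclidean(x, query) + 1e-9) for x, _ in nearest]
    # 4. Prediction function: weighted average of neighbour values
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

print(knn_predict([([0.0], 1.0), ([1.0], 2.0), ([5.0], 10.0)], [0.5], k=2))
```

Dropping the weights and taking a plain mean (or majority vote) recovers the simpler rule from the previous card.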
20
Q

What are the advantages of kNN?

A
  • Extremely simple, non-linear model
  • Simple tuning mechanism for complexity (parameter k)
  • Can be customized with distances, features, weighting etc.