Lecture 5 - Introduction to Supervised ML Flashcards

Question 1

Q

What is machine learning?

Answer

A

A field of artificial intelligence concerned with algorithms that can learn from data

Question 2

Q

What can ML provide?

Answer

A

Data mining
Self-customizing programs (like recommended pages)
Applications that cannot be programmed by hand, such as speech recognition

Question 3

Q

What is the essence of ML?

Answer

A

A pattern exists
We cannot pin it down mathematically
We have data on it

Question 4

Q

What is supervised learning?

Answer

A

Labels are available for the training data but unknown for future data. The goal is to model the dependency between the features and the label.

Question 5

Q

How do classification and regression differ in supervised ML?

Answer

A

Classification: outputs a label from a finite, unordered set C = {c1, c2, …, ck}
Regression: outputs a real number

Question 6

Q

What is a hypothesis set?

Answer

A

The hypothesis set is the space of functions in which we look for our solution. Supervised learning uses data to learn a function g from the hypothesis set H, where g: X -> Y

Question 7

Q

What are the modelling steps?

Answer

A

Model class: general structure of the model (linear, quadratic, decision tree, clustering)
Error measure (score function): Evaluates the quality of different models (squared error, variance, complexity)
Algorithm: Finds a good model, as defined by the score function
Validation: The best fit to the training data does not guarantee accurate predictions on new data, so the model must be checked for overfitting and underfitting

Question 8

Q

What is the approximation-generalization trade-off?

Answer

A

The aim of learning is to approximate the target function f as closely as possible. A more complex hypothesis set approximates f better,

but a less complex hypothesis set generalizes better to new data

Question 9

Q

Can we have the ideal hypothesis set?

Answer

A

Not really. Having the ideal H = {f} would mean that we already know the answer, so there would be no need for ML

Question 10

Q

What is the grue emerald paradox?

Answer

A

Suppose that we have seen a large set of emeralds and they are all green. Hypotheses:
1. All emeralds are green
2. All emeralds are blue except the ones we have seen so far.

Based on the training set alone, there is no way to choose between them, yet on the test data they give completely opposite results
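The two hypotheses can be written out as a small sketch to see why the training data cannot separate them. The emerald identifiers and the helper `error_rate` are hypothetical names made up for illustration.

```python
# Hypothetical ids for the emeralds observed so far (all were green).
seen = {"e1", "e2", "e3"}

def h_green(emerald_id):
    """Hypothesis 1: all emeralds are green."""
    return "green"

def h_grue(emerald_id):
    """Hypothesis 2: all emeralds are blue except the ones seen so far."""
    return "green" if emerald_id in seen else "blue"

def error_rate(hypothesis, data):
    """Fraction of (id, colour) pairs that the hypothesis gets wrong."""
    return sum(hypothesis(x) != y for x, y in data) / len(data)

# Training set: every seen emerald is labelled green.
train = [(e, "green") for e in sorted(seen)]
train_err_green = error_rate(h_green, train)
train_err_grue = error_rate(h_grue, train)

# On emeralds that were never observed the two hypotheses disagree everywhere.
unseen = ["e4", "e5"]
disagreements = [e for e in unseen if h_green(e) != h_grue(e)]
```

Both hypotheses have zero training error, so any preference between them has to come from something other than the data, for example a preference for simpler hypotheses.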

Question 11

Q

What is Occam’s Razor?

Answer

A

Occam’s razor is a principle that favors the simplest hypothesis that explains a given set of observations well. It is better to have a simple hypothesis A than a very complex hypothesis B that fits almost equally well

Question 12

Q

What does M mean in polynomial curve fitting?

Answer

A

M is the order of the polynomial, so for example M = 3 means f(x) = w0 + w1x + w2x^2 + w3x^3. The higher M is, the more bends the fitted curve can have
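As a minimal sketch, a polynomial of order M can be evaluated from its weight list, where the list has M + 1 entries. The function name `poly_predict` and the example weights are assumptions chosen for illustration; fitting the weights to data is not shown here.

```python
def poly_predict(w, x):
    """Evaluate f(x) = w[0] + w[1]*x + ... + w[M]*x**M using Horner's method."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result

# Example weights for M = 3: f(x) = 1 + 2x + 3x^2 + 4x^3
w = [1.0, 2.0, 3.0, 4.0]
order = len(w) - 1             # M is one less than the number of weights
value = poly_predict(w, 2.0)   # 1 + 4 + 12 + 32 = 49.0
```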

Question 13

Q

What is mean squared error?

Answer

A

It is the average of (predicted value - true value)^2
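A minimal sketch of this definition in plain Python; the function name `mean_squared_error` and the example values are chosen here for illustration.

```python
def mean_squared_error(y_pred, y_true):
    """Average of (predicted value - true value)^2 over all examples."""
    if len(y_pred) != len(y_true) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

# Two exact predictions and one that is off by 2: (0 + 0 + 4) / 3
mse = mean_squared_error([1.0, 2.0, 5.0], [1.0, 2.0, 3.0])
```

Because the differences are squared, one large error affects the score more than several small ones.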

Question 14

Q

What do linear models describe?

Answer

A

The relationship between the attributes y and x, modelled as a linear function

Question 15

Q

How to find the best model?

Answer

A

Understand the project -> restricts the model class
Fitting criterion: how well does the model fit the data?
Model complexity: simpler preferred
Interpretability is desired
Computational aspects

Question 16

Q

What are nearest neighbour predictors?

Answer

A

Predictors based on the idea that similar instances should have similar class labels or real-valued labels, where similarity is defined in terms of distance

Question 17

Q

How is the k-nearest neighbour prediction decided for classification and regression?

Answer

A

Classification: Choose the majority class among the k nearest
Regression: Choose the mean value of the k nearest
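Both rules can be sketched in a few lines of plain Python. This assumes Euclidean distance and unweighted neighbours; the function names and the toy data are made up for illustration.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k):
    """Majority class among the k nearest neighbours.
    Ties are resolved in favour of the class encountered first."""
    labels = [train_y[i] for i in k_nearest(train_X, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k):
    """Mean label value of the k nearest neighbours."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Toy data: three points near (1, 1) and two points near (5, 5).
X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (1.2, 0.8)]
classes = ["a", "a", "b", "b", "a"]
values = [1.0, 2.0, 10.0, 12.0, 3.0]

query = (1.0, 1.5)
predicted_class = knn_classify(X, classes, query, k=3)   # "a"
predicted_value = knn_regress(X, values, query, k=3)     # (1 + 2 + 3) / 3 = 2.0
```

Note that there is no training step: every prediction scans the whole training set, which is where the computational and memory cost of kNN comes from.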

Question 18

Q

What are the weaknesses of kNN?

Answer

A

Large computational and memory complexity
Sensitivity to feature scaling
Irrelevant features contribute to the distance as much as relevant ones
Black box: does not produce a model that can be analyzed
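The sensitivity to feature scaling can be illustrated with a small sketch. The two features (age in years, income in euros), the data points, and the choice of min-max scaling are assumptions made for this example; other scaling schemes would work as well.

```python
import math

def nearest_index(points, query):
    """Index of the point closest to query (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

def min_max_scaler(points):
    """Build a function that rescales each feature to [0, 1] using the training range."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    def scale(p):
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(p, lows, highs))
    return scale

# Hypothetical training points: (age in years, income in euros)
train = [(25.0, 50000.0), (60.0, 50500.0)]
query = (27.0, 50300.0)

# Raw features: the income difference dominates, so point 1 is nearest
# even though its age differs by 33 years.
raw_nearest = nearest_index(train, query)

# After scaling both features to [0, 1], age counts as much as income
# and point 0 becomes the nearest neighbour.
scale = min_max_scaler(train)
scaled_nearest = nearest_index([scale(p) for p in train], scale(query))
```

The nearest neighbour changes from point 1 to point 0 purely because of the units in which the features are expressed, not because of any change in the data.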

Question 19

Q

What are the ingredients for kNN predictor?

Answer

A

Distance metric
Number of neighbours k (often chosen by cross-validation; it cannot be chosen by the smallest number of training errors, because k = 1 would always win and overfit)
Weighting function for the neighbours (the closer the neighbour, the larger its weight)
Prediction function (majority vote, weighted majority, mean, weighted average)
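These ingredients can be combined in a short sketch: Euclidean distance, inverse-distance weighting, a weighted-average prediction, and leave-one-out cross-validation to compare values of k. The function names, the `eps` constant that avoids division by zero, and the toy data are all assumptions made for illustration.

```python
import math

def weighted_knn_regress(train_X, train_y, query, k, eps=1e-9):
    """Inverse-distance weighted average of the labels of the k nearest neighbours."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    weights = [1.0 / (math.dist(x, query) + eps) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def training_error(train_X, train_y, k):
    """Mean squared error when predicting the training points themselves."""
    return sum((weighted_knn_regress(train_X, train_y, x, k) - y) ** 2
               for x, y in zip(train_X, train_y)) / len(train_X)

def loo_cv_error(train_X, train_y, k):
    """Leave-one-out CV: predict each point from all the other points."""
    total = 0.0
    for i in range(len(train_X)):
        rest_X = train_X[:i] + train_X[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        pred = weighted_knn_regress(rest_X, rest_y, train_X[i], k)
        total += (pred - train_y[i]) ** 2
    return total / len(train_X)

# Toy one-dimensional data, roughly y = x with some noise.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

# With k = 1 every training point is its own nearest neighbour, so the
# training error is (numerically) zero and says nothing about new data.
train_err_k1 = training_error(X, y, k=1)

# Cross-validation gives a usable comparison between candidate values of k.
cv_errors = {k: loo_cv_error(X, y, k) for k in (1, 2, 3)}
best_k = min(cv_errors, key=cv_errors.get)
```

Leave-one-out is used here only because the toy data set is tiny; on larger data sets a k-fold split is a common, cheaper alternative.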

Question 20

Q

What are the advantages of kNN?

Answer

A

Extremely simple, non-linear model
Simple tuning mechanism for complexity (parameter k)
Can be customized with distances, features, weighting etc.

(20 cards)