Lecture 5 - Introduction to Supervised ML Flashcards
What is machine learning?
A field of artificial intelligence concerned with algorithms that can learn from data
What can ML provide?
Data mining
Self-customizing programs (like recommended pages)
Applications one can’t program by hand, like speech recognition
What is the essence of ML?
- A pattern exists
- We cannot pin it down mathematically
- We have data on it
What is supervised learning?
Labels are available for training data, but unknown for future data. The goal is to model dependency between the features and the label.
Classification vs Regression for supervised ML?
Classification: Outputs a label from a finite, unordered set C = {c1, c2, … , ck}
Regression: Outputs a real number
What is a hypothesis set?
The hypothesis set H is the space of functions in which we look for our solution. Supervised learning uses data to learn a function g from H, where g: X -> Y
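A minimal sketch of this idea, assuming a tiny finite hypothesis set and made-up data (the three candidate functions and the values in X and Y are hypothetical, chosen only for illustration). Learning is reduced here to picking the g in H with the lowest training error:

```python
# Hypothetical toy data: inputs X with labels Y (here Y happens to equal 2 * X).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]

# A tiny, finite hypothesis set H: each entry is a candidate function g: X -> Y.
H = {
    "g(x) = x": lambda x: x,
    "g(x) = 2x": lambda x: 2 * x,
    "g(x) = x + 1": lambda x: x + 1,
}

def training_error(g, xs, ys):
    """Mean squared error of hypothesis g on the given data."""
    return sum((g(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Pick the hypothesis in H with the lowest training error.
best_name = min(H, key=lambda name: training_error(H[name], X, Y))
print(best_name)
```

Real hypothesis sets are usually infinite (for example, all lines or all polynomials of a given order), so the search is done by an optimization algorithm rather than by enumeration.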
What are the modelling steps?
- Model class: general structure of the model (linear, quadratic, decision tree, clustering)
- Error measure (score function): Evaluates the quality of different models (squared error, variance, complexity)
- Algorithm: Finds a good model, as defined by the score function
- Validation: Finding the best fit to the training data does not guarantee accurate predictions on new data, so the model is checked for overfitting and underfitting
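The four steps above can be sketched on a toy problem. This is an illustration under stated assumptions, not a prescribed method: the data values are made up, the model class is a straight line, the error measure is mean squared error, the algorithm is closed-form least squares, and validation is a simple holdout split.

```python
# Hypothetical data, roughly following y = 3x + 1 (values made up for illustration).
data = [(0.0, 1.1), (1.0, 3.9), (2.0, 7.2), (3.0, 9.8), (4.0, 13.1), (5.0, 16.0)]
train, valid = data[:4], data[4:]  # simple holdout split

# 1. Model class: straight lines y = a * x + b.
def predict(params, x):
    a, b = params
    return a * x + b

# 2. Error measure: mean squared error.
def mse(params, points):
    return sum((predict(params, x) - y) ** 2 for x, y in points) / len(points)

# 3. Algorithm: closed-form least squares for one input variable.
def fit(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

params = fit(train)

# 4. Validation: compare the error on training data with the error on held-out data.
train_error = mse(params, train)
valid_error = mse(params, valid)
print(params, train_error, valid_error)
```

A validation error much larger than the training error would hint at overfitting; both errors being large would hint at underfitting.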
What is the approximation-generalization trade-off?
The aim of learning is to approximate the target function f as closely as possible. A more complex hypothesis set is more likely to contain a close approximation of f,
but a less complex hypothesis set generalizes better from the training data to new data
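One way to see the trade-off is to fit hypothesis sets of increasing complexity to the same data. The sketch below is an illustration only: the data values are made up, and the three model families (a constant, a line, and a "memorizer" that returns the label of the nearest training input) are assumptions chosen to span simple to very complex.

```python
# Hypothetical noisy samples of a roughly linear target (values made up for illustration).
train = [(0.0, 0.2), (1.0, 0.9), (2.0, 2.3), (3.0, 2.8), (4.0, 4.1)]
test = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5), (3.5, 3.5)]

def mse(g, points):
    return sum((g(x) - y) ** 2 for x, y in points) / len(points)

def fit_constant(points):
    # Simplest model: always predict the mean training label.
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y

def fit_line(points):
    # Least-squares straight line y = a * x + b.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(points):
    # Very complex model: return the label of the nearest training input.
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

fits = {
    "constant": fit_constant(train),
    "line": fit_line(train),
    "memorizer": fit_memorizer(train),
}
train_errors = {name: mse(g, train) for name, g in fits.items()}
test_errors = {name: mse(g, test) for name, g in fits.items()}
print(train_errors)
print(test_errors)
```

On this toy data the memorizer reaches zero training error, yet the line has the lower test error: the most complex model approximates the training data best but does not generalize best.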
Can we have the ideal hypothesis set?
Not really. Having the ideal H = {f} would mean that we already know the answer, so there would be no need for ML
What is the Grue Emerald Paradox?
Suppose that we have seen a large set of emeralds and they are all green. Hypotheses:
1. All emeralds are green
2. All emeralds are blue except the ones we have seen so far.
Based on the training set alone, there is no means of choosing which one is better, but on the test data they give completely opposite results
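The two hypotheses can be written down directly. In this sketch the emerald indices and the count of 100 seen emeralds are hypothetical; the point is only that both hypotheses fit the training set perfectly and disagree on everything else.

```python
# Emeralds are identified by an index; the first 100 are the ones seen so far.
seen = set(range(100))
training_set = [(i, "green") for i in seen]

def h1(emerald_id):
    # Hypothesis 1: all emeralds are green.
    return "green"

def h2(emerald_id):
    # Hypothesis 2: all emeralds are blue except the ones seen so far.
    return "green" if emerald_id in seen else "blue"

# Both hypotheses match every training label.
agree_on_training = all(h1(i) == h2(i) == label for i, label in training_set)

# On emeralds not seen before, they never agree.
unseen = range(100, 110)
agree_on_unseen = any(h1(i) == h2(i) for i in unseen)

print(agree_on_training, agree_on_unseen)
```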
What is Occam’s Razor?
Occam’s razor is a principle that favors the simplest hypothesis that explains a given set of observations well. It is better to have a simple hypothesis A than a very complex hypothesis B that explains the data about equally well
What does M mean in polynomial curve fitting?
M is the order of the polynomial, so for example M = 3 means f(x) = w0 + w1x + w2x^2 + w3x^3. The higher M is, the more bends the fitted curve can have (a polynomial of order M has at most M - 1 turning points)
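A small sketch of evaluating such a polynomial, assuming the weights are stored in a list w so that M = len(w) - 1 (the weight values below are hypothetical):

```python
def polynomial(w, x):
    """Evaluate f(x) = w[0] + w[1]*x + ... + w[M]*x**M, where M = len(w) - 1."""
    return sum(w_j * x ** j for j, w_j in enumerate(w))

w = [1.0, 0.0, -2.0, 0.5]   # hypothetical weights w0..w3
M = len(w) - 1              # order of the polynomial, here 3
value = polynomial(w, 2.0)  # 1 + 0*2 - 2*4 + 0.5*8
print(M, value)
```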
What is mean squared error?
It is the average of (predicted value - true value)^2
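Written as a formula, assuming N data points with predicted values \hat{y}_n and true values y_n (this notation is an assumption, not taken from the lecture):

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{y}_n - y_n \right)^2
```

For example, predictions (2, 4) against true values (1, 6) give squared errors 1 and 4, so the MSE is (1 + 4) / 2 = 2.5.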
What do linear models describe?
A linear relationship between a target attribute y and one or more input attributes x
How to find the best model?
- Understand the project -> restricts the model class
- Fitting criterion: how well does the model fit the data?
- Model complexity: simpler preferred
- Interpretability is desired
- Computational aspects