Supervised Learning Flashcards

1
Q

What is supervised learning?

A

A subcategory of M.L. defined by the use of labeled input/output pairs.

2
Q

What is the difference between regression and classification?

A

Regression is used to predict continuous values such as price or income; the goal is to find a best-fit line. Classification is used to predict a discrete class label; the goal is to find a decision boundary.
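
A minimal sketch of the contrast in Python, assuming numpy and scikit-learn are available (the data is made up for illustration):

    # Same inputs, different label types (assumes numpy and scikit-learn).
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one feature per instance

    y_reg = np.array([1.1, 1.9, 3.2, 3.9])      # continuous labels -> regression
    print(LinearRegression().fit(X, y_reg).predict([[2.5]]))    # best-fit line

    y_clf = np.array([0, 0, 1, 1])              # discrete labels -> classification
    print(LogisticRegression().fit(X, y_clf).predict([[2.5]]))  # decision boundary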

3
Q

What kind of problems can you solve with classification and regression?

A

Regression: weather prediction, housing price prediction
Classification: spam detection, speech recognition, cancer cell identification.

4
Q

Why is training set error performance unreliable?

A

Training error doesn’t reflect performance on unseen data; perfect performance on the training set usually signals overfitting rather than genuine learning.

5
Q

What is machine learning?

A

A field of artificial intelligence concerned with algorithms that can learn from data.

6
Q

Two main branches of Machine Learning?

A

Supervised learning
Unsupervised learning

7
Q

3 requirements for machine learning?

A

1) A pattern exists
2) It cannot be pinned down mathematically
3) We have data on it

8
Q

Define data (for M.L)

A

Pairs of inputs and correct outputs (feature, label)
input - real-valued or categorical
output - real-valued (regression) or categorical (classification)

9
Q

Goal of supervised learning?

A

To model dependency between features and labels.

10
Q

Goal of a supervised learning model?

A

To predict labels for new instances.

11
Q

What is a training set?

A

A set of input-output pairs used to train the model.

12
Q

Classification output value types?

A

Categorical or binary (-1,1)

13
Q

Regression output value type?

A

Real numbers.

14
Q

Examples of supervised learning problems?

A

Junk mail:
features - word frequencies
class - junk/not junk

Access Control System:
features - images
class - ID of the person

Medical diagnosis:
features - BMI, age, symptoms, test results
class - diagnostic code

15
Q

Formal components of learning.

A

Input (x) - e.g. a customer application
Output (y) - e.g. approval/denial of the application
Target function f: X -> Y (the ideal credit approval formula)
Data: {(x1, y1), … (xN, yN)} (historical records)
Hypothesis g: X -> Y
Hypothesis set (H): the group of functions in which we look for our solution
Supervised learning uses the training data to pick a hypothesis g from H that approximates f and can be applied to new data.
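
A minimal sketch of these components, with H arbitrarily chosen as the set of lines g(x) = w*x + b (illustrative data and names, assuming numpy):

    import numpy as np

    # Data {(x1, y1), ..., (xN, yN)}: samples of the unknown target function f.
    X = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])

    # Learning searches the hypothesis set H (here: all lines) for a g
    # that fits the data and hopefully approximates f.
    w, b = np.polyfit(X, y, deg=1)   # least-squares fit; returns slope, intercept

    def g(x):                        # the final hypothesis g: X -> Y
        return w * x + b

    print(g(5.0))                    # prediction for a new, unseen input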

16
Q

Building blocks for an M.L. algorithm?

A

Model class (hypothesis set), e.g.
- linear or quadratic function
- decision tree
- neural network, clustering
Error measure (score function)
Algorithm - searches the model class for a good model, where "good" is defined by the score function
Validation

17
Q

Dangers of overfitting

A

The model memorizes the training data and does not generalize beyond it: 100% accuracy on the training data, yet possibly no better than random guessing on new instances.

18
Q

Dangers of underfitting

A

The model is not expressive enough, e.g. linear functions applied to non-linear problems.

19
Q

Approximation-Generalization tradeoff

A

Goal: to approximate the target function as closely as possible.
More complex hypothesis set: better chance of approximating the target function f.
Less complex hypothesis set: better chance of generalizing outside of the training set.

20
Q

Ideal hypothesis set H

A

H = {f}, we already know the target function, no need for M.L.

21
Q

Occam’s Razor

A

The principle that favors the simplest hypothesis (set) that can explain a given set of observations well.

22
Q

Criteria for a good model

A

Interpretability
Computational complexity

23
Q

How to control Hypothesis set complexity?

A

With hyperparameters, e.g.
- max degree of polynomials
- number of nearest neighbors (k)
- regularization parameter
- depth of a decision tree
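
In scikit-learn, for instance, each of these complexity knobs is a constructor argument (a sketch, assuming scikit-learn; the values are arbitrary):

    from sklearn.preprocessing import PolynomialFeatures  # max polynomial degree
    from sklearn.neighbors import KNeighborsClassifier    # number of neighbors
    from sklearn.linear_model import Ridge                # regularization parameter
    from sklearn.tree import DecisionTreeClassifier       # tree depth

    PolynomialFeatures(degree=3)         # higher degree -> more complex H
    KNeighborsClassifier(n_neighbors=5)  # smaller k     -> more complex H
    Ridge(alpha=1.0)                     # smaller alpha -> weaker regularization
    DecisionTreeClassifier(max_depth=4)  # deeper tree   -> more complex H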

24
Q

What kind of methods should you start with?

A

Simple methods: linear regression, kNN, naive Bayes
- easier to understand
- less tuning, less risk of overfitting
- often just as good as more advanced methods

25
Q

K Nearest Neighbors

A

-Classic method (1951)
-Classification based on k most similar training instances
-parameter k tunes model complexity
-can learn complex non-linear functions

26
Q

kNN Classification

A

Choose the majority class among k nearest neighbors for prediction.
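
A minimal sketch of kNN classification in plain numpy (Euclidean distance; names and data are illustrative):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x_new, k=3):
        # Distance from the query point to every training instance.
        dists = np.linalg.norm(X_train - x_new, axis=1)
        # Labels of the k nearest neighbors.
        nearest = y_train[np.argsort(dists)[:k]]
        # Majority vote among those k labels.
        return Counter(nearest).most_common(1)[0][0]

    X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_classify(X_train, y_train, np.array([5, 6])))  # -> 1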

27
Q

kNN Regression

A

Take the mean value of the k nearest neighbors for prediction.
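
The same neighbor search, but averaging instead of voting (a sketch in numpy):

    import numpy as np

    def knn_regress(X_train, y_train, x_new, k=3):
        dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
        return y_train[np.argsort(dists)[:k]].mean()     # mean of k nearest labels

    X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
    y_train = np.array([1.5, 2.5, 3.5, 10.0])
    print(knn_regress(X_train, y_train, np.array([2.0])))  # -> 2.5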

28
Q

Disadvantages of kNN predictor

A

All k nearest neighbors have the same influence on prediction. Maybe closer neighbors should have more influence?
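
One common remedy, sketched below, is inverse-distance weighting, so that closer neighbors contribute more to the prediction:

    import numpy as np

    def weighted_knn_regress(X_train, y_train, x_new, k=3, eps=1e-9):
        dists = np.linalg.norm(X_train - x_new, axis=1)
        idx = np.argsort(dists)[:k]                 # k nearest neighbors
        w = 1.0 / (dists[idx] + eps)                # weight ~ 1/distance
        return np.average(y_train[idx], weights=w)  # weighted mean prediction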

29
Q

Distance measure in kNN

A

Standard: Euclidean distance
Others: Manhattan, Mahalanobis, Chebyshev, Hamming
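
Most of these can be written in one line of numpy for two feature vectors a and b (a sketch; Mahalanobis additionally needs the data's inverse covariance matrix and is omitted):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 0.0, 3.0])

    euclidean = np.linalg.norm(a - b)  # sqrt of summed squared differences
    manhattan = np.abs(a - b).sum()    # sum of absolute differences
    chebyshev = np.abs(a - b).max()    # largest single-coordinate difference
    hamming   = (a != b).mean()        # fraction of differing coordinates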

30
Q

Small vs large k in kNN

A

small: local, complex model that depends on a handful of instances
large: global, simpler model averaged over a large set of instances

31
Q

kNN, k=1?

A

Overfitting! 0% training error, but won’t generalize
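
A quick demonstration, assuming scikit-learn: with k=1 every training point is its own nearest neighbor, so even pure noise is "learned" perfectly:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.random.rand(50, 2)             # random features
    y = np.random.randint(0, 2, size=50)  # random labels: nothing to learn

    model = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(model.score(X, y))              # 1.0 on the training set,
                                          # yet useless on new data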

32
Q

Advantages of kNN

A

-simple
-non-linear modeling
-simple model complexity tuning (k)
-customizable (distance measure, feature/neighbor weighting)
-good results in many applications

33
Q

Disadvantages of KNN

A

-Large computational/memory complexity: O(nm) per prediction, where n is the number of training instances and m the dimensionality of the data
-sensitive to scaling
-Irrelevant features problematic
-black box
-not state of the art

34
Q

Criteria to be balanced in learning.

A

Fit to data (low error) vs model complexity

35
Q

4 main ingredients of a kNN algorithm.

A

Distance metric
Number of neighbors (k)
Weighting function for neighbors
Prediction function
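
The four ingredients made explicit as parameters of one function (a sketch; the defaults mirror plain, unweighted kNN regression, and the predict argument could be swapped for a voting function to get classification):

    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3,                        # 2) k
                    distance=lambda A, x: np.linalg.norm(A - x, axis=1), # 1) metric
                    weight=lambda d: np.ones_like(d),                    # 3) weighting
                    predict=lambda ys, w: np.average(ys, weights=w)):    # 4) prediction
        d = distance(X_train, x_new)
        idx = np.argsort(d)[:k]
        return predict(y_train[idx], weight(d[idx]))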

36
Q

Method to automatically determine the appropriate k value for kNN.

A

Cross-validation.
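
A minimal sketch with scikit-learn's GridSearchCV (assumed available; the iris dataset is just an example):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    search = GridSearchCV(KNeighborsClassifier(),
                          param_grid={"n_neighbors": list(range(1, 21))},
                          cv=5)           # 5-fold cross-validation
    search.fit(X, y)                      # tries every k on held-out folds
    print(search.best_params_)            # k with the best validation score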