Classification: Linear Models I Flashcards
What is the difference between classification and regression?
In a regression problem, we try to predict the value of a continuous-valued target. In classification, we try to find the correct class label for the given input. Learning from training data is common to both tasks.
Example of classification:
https://gyazo.com/7666548de81ed8f51209bf49a9620c07
Example of regression: predicting a numeric target, e.g. a house price from its size.
Explain K-nearest neighbors
Principle: near neighbors tend to have the same label
See figure
In the left figure (small k, e.g. k = 1), each training point is in effect its own nearest neighbor, so every point is classified by itself and the decision regions fit the training data exactly.
In the right figure (larger k), individual training points are not always classified correctly, but the split between the classes is more even.
Picking the best value for K is hard and depends on the data
KNN is not a linear model
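To make the principle concrete, here is a minimal k-NN sketch in plain Python; the toy points and the choice of k are made up for illustration:

```python
# Minimal k-nearest-neighbors classifier (toy data, standard library only).
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (2, 1), k=3))  # -> "A"
```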
Explain the perceptron
Neural Network with:
Input layer
No hidden layers
One output neuron
Sign activation function at output neuron
What can a perceptron be used for?
The perceptron can be used to classify a binary class variable based on (numeric) predictor attributes A1, ..., An.
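A minimal sketch of how such a perceptron could be trained with the classic perceptron learning rule; the toy data, learning rate, and epoch count are assumptions:

```python
# Perceptron with a sign activation, trained by updating only on mistakes.
def sign(z):
    return 1 if z >= 0 else -1

def train_perceptron(X, y, epochs=20, lr=1.0):
    """X: list of feature vectors, y: labels in {-1, +1}.
    Returns the weight vector w (last entry is the bias weight)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(X, y):
            xb = list(x) + [1.0]                           # append bias input
            pred = sign(sum(wi * xi for wi, xi in zip(w, xb)))
            if pred != t:                                  # update on mistakes
                w = [wi + lr * t * xi for wi, xi in zip(w, xb)]
    return w

# Linearly separable toy problem (AND-like): label is +1 only for (1, 1).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w = train_perceptron(X, y)
```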
What are the limitations of a perceptron?
The perceptron cannot produce the following classification (XOR of A and B):
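A short sketch of the standard argument for why XOR fails, assuming a sign activation with threshold at zero:

```latex
% A perceptron computes $\operatorname{sign}(w_1 A + w_2 B + b)$; XOR would need:
\begin{align*}
(0,0) \mapsto 0 &: \quad b < 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b \ge 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b \ge 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{align*}
% Adding the two middle constraints gives $w_1 + w_2 + 2b \ge 0$, i.e.
% $w_1 + w_2 + b \ge -b > 0$, contradicting the last constraint.
% Hence XOR is not linearly separable.
```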
Describe overfitting
Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h ′ ∈ H, such that h has smaller error than h ′ over the training examples, but h ′ has a smaller error than h over the entire distribution of instances [Mitchell, p. 67].
It can, however, be hard not to overfit!
https://gyazo.com/8a8d06ebc0b3dc3fbdee52bfd977e007
Generally, we can say the following:
- the hypothesis space (or model space) can be structured by some model parameter (k in KNN, the number of hidden layers in an NN, etc.)
- different parameter values lead to more or less complex decision regions (e.g. KNN: small k -> complex regions)
- a hypothesis h overfits if its decision regions are too closely fitted to the training data (see the sketch below).
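A small self-contained sketch of this pattern for KNN; the synthetic data, the 10% label-noise model, and the choice of k values are assumptions:

```python
# 1-NN memorizes the training set (zero training error) but may generalize
# worse than a larger k when the labels are noisy.
import random, math
from collections import Counter

random.seed(0)

def label(p):                       # true concept with 10% label noise
    clean = 1 if p[0] + p[1] > 1 else 0
    return clean if random.random() > 0.1 else 1 - clean

points = [(random.random(), random.random()) for _ in range(200)]
data = [(p, label(p)) for p in points]
train, test = data[:100], data[100:]

def knn(train, q, k):
    near = sorted(train, key=lambda d: math.dist(d[0], q))[:k]
    return Counter(l for _, l in near).most_common(1)[0][0]

for k in (1, 15):
    tr = sum(knn(train, p, k) == l for p, l in train) / len(train)
    te = sum(knn(train, p, k) == l for p, l in test) / len(test)
    print(f"k={k}: train acc {tr:.2f}, test acc {te:.2f}")
# Typically: k=1 gives train acc 1.00 but a lower test acc than k=15.
```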
What are the advantages of linear models?
Benefits
- Unlikely to overfit
- Easy to learn from data
- Well understood
Diversity
- Different types of linear classification models are based on different objective functions that are optimized in learning (both examples are written out below):
  - Perceptron: minimize an error function
  - Naive Bayes: maximize a likelihood function
- Different objective functions provide different learning methods/algorithms and return different results!
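Written out, the two objectives look roughly like this; these are standard textbook forms, and the notation is an assumption rather than taken from the course material:

```latex
% Perceptron criterion, summed over the set $\mathcal{M}$ of misclassified
% points with targets $t_n \in \{-1, +1\}$; training minimizes
E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} t_n \, \mathbf{w}^{\top} \mathbf{x}_n
% Naive Bayes instead maximizes likelihood under an attribute-independence
% assumption; at prediction time it picks
\hat{c} = \operatorname*{arg\,max}_{c} \; P(c) \prod_{i=1}^{n} P(A_i \mid c)
```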
Describe least squares regression
We want to find the line that minimizes the total area of the squares drawn from each data point vertically to the line; each square's side length is that point's error (residual). The following video shows it in detail (short):
https://www.youtube.com/watch?v=jEEJNz0RK4Q
The squares represent the squared errors.
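In symbols, for a line y = ax + b fitted to points (x_i, y_i) (notation assumed):

```latex
% Choose slope $a$ and intercept $b$ to minimize the sum of squared residuals
% (each square's side is the residual $y_i - (a x_i + b)$):
(a^*, b^*) = \operatorname*{arg\,min}_{a,\,b} \sum_{i=1}^{N} \bigl( y_i - (a x_i + b) \bigr)^2
% Setting both partial derivatives to zero yields the closed form:
a^* = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad b^* = \bar{y} - a^* \bar{x}
```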
Are there limitations for least squares regression?
Yes!
For some data sets, least squares regression may give a poor fit: a few points far from the rest (outliers) dominate the sum of squared errors and drag the line toward them.
See picture.
What are the approaches to use linear functions for classification with more than two different class labels?
Multiple binary “one against all” classifications
and
Multiple binary “one against one” classifications
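A sketch of the "one against all" scheme on top of an arbitrary binary learner; `fit_binary` and the `.score(x)` margin method are hypothetical placeholders, not a specific library API:

```python
# One-vs-all: train one binary model per class (class k vs. everything else).
def one_vs_all_train(X, y, classes, fit_binary):
    """`fit_binary(X, y01)` must return a model exposing a .score(x) margin."""
    return {k: fit_binary(X, [1 if yi == k else -1 for yi in y])
            for k in classes}

def one_vs_all_predict(models, x):
    # Pick the class whose binary model is most confident; comparing margins
    # also resolves points that several (or no) separators would claim.
    return max(models, key=lambda k: models[k].score(x))
```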
How can we handle undecidable zones for multiple classes?
We can use linear discriminant functions: one linear score per class, assigning each input to the class with the highest score.
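Roughly, with one discriminant per class (standard formulation, notation assumed):

```latex
% One linear score per class $k$:
y_k(\mathbf{x}) = \mathbf{w}_k^{\top} \mathbf{x} + w_{k0}
% Assign x to the class with the largest score:
\hat{k} = \operatorname*{arg\,max}_{k} \; y_k(\mathbf{x})
% Every input then receives exactly one label, so the ambiguous
% (undecidable) zones of the pairwise schemes disappear.
```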