Supervised Learning Flashcards
Bias error
Model does not / cannot correctly represent the concept (underfitting)
Variance error
Model over-specializes to the training set (overfitting)
Regularization (favoring smoother functions, whose output varies slowly with the input) helps mitigate the variance error
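A minimal sketch of the idea, assuming scikit-learn and an illustrative toy dataset (not from the card): ridge (L2) regularization shrinks the weights, favoring a smoother function and reducing variance.

```python
# Sketch: L2 (ridge) regularization to reduce variance on a small noisy dataset.
# The dataset and alpha=1.0 are illustrative assumptions, not values from the card.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 10))      # few examples, many features: high variance risk
y = X[:, 0] + 0.1 * rng.normal(size=30)    # only the first feature actually matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)         # penalizes large weights -> smoother function

print("unregularized weights:", np.round(plain.coef_, 2))
print("ridge weights:        ", np.round(ridge.coef_, 2))
```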
Multilinear Regression assumes
- The relation between each xi and y is linear
- All variables (x) have normal distributions
- Variables are independent and the residual / error variance is constant
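A minimal multiple linear regression sketch, assuming NumPy and illustrative data; the residuals can be inspected to check the assumptions above.

```python
# Sketch: fit y = b0 + b1*x1 + b2*x2 by ordinary least squares with NumPy.
# The true coefficients (3.0, 2.0, -1.0) and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=100)

Xb = np.column_stack([np.ones(len(X)), X])     # add intercept column
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # least-squares coefficients
residuals = y - Xb @ coef                      # residuals should look constant / uncorrelated

print("intercept and slopes:", np.round(coef, 2))
print("residual std:", round(residuals.std(), 3))
```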
The input of an artificial neuron
Comes from all neurons of the previous layer, or is an external input
The output of an artificial neuron
Is sent to all neurons of the next layer, or is (part of) the network output
Backpropagation
- Present each example (x(i), d(i))
- Calculate the network response to x(i): f(x(i))
- Propagate the error backwards (iteratively building the error derivative at each layer)
- Save the partial derivatives
- After all examples are processed, update the weights
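A minimal sketch of the loop above for a single hidden layer with sigmoid units and squared error; the shapes, activation, and learning rate are assumptions, not part of the card.

```python
# Sketch of batch backpropagation: forward pass, backward error propagation,
# accumulation of partial derivatives, and a weight update after all examples.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_epoch(X, D, W1, W2, lr=0.1):
    """X: (n, n_in) inputs, D: (n, n_out) targets, W1/W2: weight matrices (biases omitted for brevity)."""
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x, d in zip(X, D):                           # present each example (x(i), d(i))
        h = sigmoid(x @ W1)                          # hidden activations
        y = sigmoid(h @ W2)                          # network response f(x(i))
        delta_out = (y - d) * y * (1 - y)            # error derivative at the output layer
        delta_hid = (delta_out @ W2.T) * h * (1 - h) # propagate the error backwards
        dW2 += np.outer(h, delta_out)                # save (accumulate) partial derivatives
        dW1 += np.outer(x, delta_hid)
    W1 -= lr * dW1                                   # after all examples: update weights
    W2 -= lr * dW2
    return W1, W2
```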
Artificial Neural Networks are
- Robust to noise and approximations
- Based on a simplified model of a neuron
- Support incremental training
- Compress the information of many examples into a small model
Deep Learning
Alternates prediction layers with feature-detection and decorrelation layers
Deep Learning - network structure
- Convolutional layers: apply convolutions to get the feature maps
- Pooling (sub-sampling) layers: reduce feature maps’ dimensions (combine features and/or decorrelate)
- Dense layers: similar to the “hidden” layers in a classical neural network
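A minimal sketch of this convolution → pooling → dense structure, assuming Keras; the input shape and layer sizes are illustrative.

```python
# Sketch: convolutional + pooling + dense layers, mirroring the structure above.
# Input shape (28, 28, 1) and layer sizes are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolutional layer -> feature maps
    layers.MaxPooling2D(pool_size=2),                     # pooling layer -> smaller feature maps
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                  # dense ("hidden") layer
    layers.Dense(10, activation="softmax"),               # output layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```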
kNN problems
- Define distance
- Define class selection
- Non-linear problems
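A minimal sketch, assuming NumPy, that makes the two design choices explicit; Euclidean distance, majority-vote class selection, and k=3 are illustrative choices.

```python
# Sketch: k-nearest-neighbours classification with an explicit distance (Euclidean)
# and class-selection rule (majority vote). k=3 is an illustrative assumption.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """X_train: (n, d) array, y_train: (n,) array of labels, x: (d,) query point."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance definition
    nearest = np.argsort(dists)[:k]               # indices of the k closest examples
    votes = Counter(y_train[nearest].tolist())    # class selection: majority vote
    return votes.most_common(1)[0][0]
```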
A set has the largest entropy if
each of its elements belongs to a
different class
PlayTennis(no/yes) - entropy(S)
− ( P(no) x log2 (P(no)) + P(yes) x log2 (P(yes)) )
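A worked evaluation of the formula, assuming the textbook PlayTennis split of 9 "yes" and 5 "no" examples:

```python
# Sketch: evaluating the two-class entropy formula above for the textbook PlayTennis
# split of 9 "yes" and 5 "no" examples (the counts are the usual textbook example).
import math

p_yes, p_no = 9 / 14, 5 / 14
entropy_S = -(p_no * math.log2(p_no) + p_yes * math.log2(p_yes))
print(round(entropy_S, 3))   # ~0.940 bits
```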
Decision Tree - the best split
The best split is the split that results in the largest entropy reduction, that is, the largest information
gain (IG)
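A minimal sketch of information gain using the same entropy definition; the Outlook split counts are the textbook PlayTennis example.

```python
# Sketch: information gain of a split = entropy(S) minus the weighted entropy of the
# subsets the split produces; the attribute with the largest IG gives the best split.
# The Outlook split counts below are the textbook PlayTennis example.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, subsets):
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets)

S = ["yes"] * 9 + ["no"] * 5
outlook_subsets = [
    ["yes"] * 2 + ["no"] * 3,   # sunny
    ["yes"] * 4,                # overcast
    ["yes"] * 3 + ["no"] * 2,   # rain
]
print(round(information_gain(S, outlook_subsets), 3))   # ~0.247
```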
Decision Tree - C4.5 / C5.0
Similar to ID3, but...
* Support for continuous attributes - discretizes continuous attributes
* Allows missing values - examples not used when calculating entropy
* Allows different costs for attributes
* Pruning
Learning ensembles
Boosting (Kearns 88)
* Can a set of weak learners create a single strong learner?
* Classification combines the results of all the subtrees
* Misclassified examples are given more weight in the error at each iteration
* New trees are trained to fit the residual error
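A minimal sketch, assuming scikit-learn and toy data: AdaBoost re-weights misclassified examples, while gradient boosting fits new trees to the residual error.

```python
# Sketch: two boosting flavours mentioned above, via scikit-learn.
# The toy data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy classification target
y_reg = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)   # toy regression target

clf = AdaBoostClassifier(n_estimators=50).fit(X, y_class)        # re-weights misclassified examples
reg = GradientBoostingRegressor(n_estimators=100).fit(X, y_reg)  # fits trees to residual errors
print(clf.score(X, y_class), reg.score(X, y_reg))
```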
Bagging - Bootstrap aggregating (Breiman 96)
* Randomly selects the subsets (bootstrap samples, drawn with replacement)
* Trains one learner per subset
* Classification by voting, regression by averaging
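A minimal sketch, assuming scikit-learn and toy data; BaggingClassifier draws bootstrap subsets, trains one learner per subset, and classifies by voting.

```python
# Sketch: bootstrap aggregating with scikit-learn's BaggingClassifier
# (its default base learner is a decision tree). Toy data is an illustrative assumption.
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

bag = BaggingClassifier(n_estimators=25, bootstrap=True).fit(X, y)  # 25 learners, bootstrap subsets
print(bag.score(X, y))                                              # classification by voting
```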