Image classification Flashcards
Name some properties of good features in images
- Distinct and discriminative
- Local
- Invariant to translation and other geometric transformations (e.g. rotation, scale)
- Invariant to brightness change
- Efficient to compute
- Robust to noise/blurring
How can the sum of pixel values in a patch be calculated using the integral image?
Patch sum = integral image at the bottom-right corner - bottom-left - top-right + top-left.
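A minimal NumPy sketch of this lookup (the zero-padding and function names are my own, assuming an integer-valued image):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of all pixels in img[:y, :x]; the extra
    # leading row/column of zeros makes the corner lookups uniform
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def patch_sum(ii, top, left, bottom, right):
    # sum of img[top:bottom, left:right] in O(1):
    # bottom-right - bottom-left - top-right + top-left
    return ii[bottom, right] - ii[bottom, left] - ii[top, right] + ii[top, left]
```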
What is the idea behind ensemble learning?
Aggregate the results of several predictors into one prediction
In what case will ensemble learning not lead to improved results over a single predictor?
If all predictors are highly correlated, so they give the same output.
What kind of error do sequential and parallel learners each reduce?
Sequential learners mostly reduce bias, while parallel learners mostly reduce variance.
Name 4 different ensemble methods, and state which of them are parallel/sequential.
Parallel:
- Bagging (bootstrap aggregation)
- Voting
- Random forests
Sequential:
- Boosting
How does Bagging work?
Create N versions of the training set using sampling with replacement, and train a weak learner on each one. Use averaging for regression and majority voting for classification.
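A minimal sketch of bagging for classification, assuming scikit-learn decision trees as the weak learner and integer class labels 0..K-1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed weak learner

def bagging_fit(X, y, n_estimators=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: sample n indices with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    # majority vote per sample
    return np.array([np.bincount(votes[:, j].astype(int)).argmax()
                     for j in range(votes.shape[1])])
```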
What is Out-Of-Bag error for bagging?
For each training sample x_i:
- Find the predictions y_hat_i of all classifiers that did not contain x_i in their training set.
- Aggregate these into one out-of-bag prediction for x_i (averaging or majority vote).
- Repeat for all training samples and average the errors (a sketch follows this list).
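A sketch of the OOB estimate under the same assumptions as the bagging sketch above (classification, majority voting):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed weak learner

def oob_error(X, y, n_estimators=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models, in_bag = [], []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)                  # bootstrap sample
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        in_bag.append(np.bincount(idx, minlength=n) > 0)  # which x_i were drawn
    mistakes = counted = 0
    for i in range(n):
        # predictions of all classifiers NOT containing x_i in their training set
        votes = [m.predict(X[i:i + 1])[0]
                 for m, used in zip(models, in_bag) if not used[i]]
        if votes:  # x_i may appear in every bootstrap sample
            y_hat_i = np.bincount(np.asarray(votes, dtype=int)).argmax()
            mistakes += int(y_hat_i != y[i])
            counted += 1
    return mistakes / counted
```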
What is boosting?
We train classifiers sequentially, increasing the weight of misclassified examples each run so later classifiers focus on them.
Describe the AdaBoost algorithm
- Initialize all weights equally
- Training subsets are bootstrapped from the full dataset using weighted sampling
- Fit a classifier to the new set
- Up-weight misclassified examples, down-weight the rest, and repeat (a sketch follows this list)
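A minimal sketch of the resampling variant described above, assuming decision stumps as the weak learner and labels in {-1, +1}; the alpha weighting is the standard AdaBoost formula, which the card leaves implicit:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stumps as the assumed weak learner

def adaboost_fit(X, y, n_rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                   # initialize all weights equally
    models, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, p=w)      # bootstrap with weighted sampling
        stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = stump.predict(X)
        err = max(w[pred != y].sum(), 1e-12)  # weighted error; avoid log(0)
        if err >= 0.5:                        # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)        # up-weight mistakes, down-weight the rest
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    scores = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(scores)
```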
How can we calculate the probability that a decision tree prediction is correct?
Each leaf (prediction) stores how many training samples from each class reached it; the predicted class's fraction of those samples estimates the probability that the prediction is correct. E.g. a leaf with 8 samples of class A and 2 of class B predicts A with estimated probability 0.8.
Describe the decision tree optimization for creating a feature space partition
At each node S_j:
- for each feature and for each candidate value of that feature, evaluate I(S_j, A_j)
- choose the best feature and value for splitting
- repeat recursively on the children (see the sketch below)
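A sketch of one round of this search; `impurity` is a pluggable callback (entropy or Gini, defined on the following cards), and the function name is my own:

```python
import numpy as np

def best_split(X, y, impurity):
    # greedy search: maximize I(S_j, A_j) = impurity(parent)
    # minus the weighted average impurity of the two children
    n, d = X.shape
    parent = impurity(y)
    best_feature, best_value, best_gain = None, None, -np.inf
    for j in range(d):                    # for each feature ...
        for t in np.unique(X[:, j]):      # ... for each observed value
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue                  # split puts everything on one side
            p_left = left.mean()
            gain = parent - (p_left * impurity(y[left])
                             + (1 - p_left) * impurity(y[~left]))
            if gain > best_gain:
                best_feature, best_value, best_gain = j, t, gain
    return best_feature, best_value, best_gain
```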
What are the two common choices for the tree optimization cost function, I(S_j, A_j)?
- Information Gain
- Gini Index
Describe the information gain cost function, I(S_j, A_j)
I(S_j, A_j) = entropy of the parent - weighted average entropy of the children
Entropy: H(S) = - sum_k p(y_k) log(p(y_k))
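The same formula as a small sketch (log base 2 and integer class labels assumed); it plugs into `best_split` above as the `impurity` argument:

```python
import numpy as np

def entropy(y):
    # H(S) = -sum_k p(y_k) log2(p(y_k)); empty classes contribute 0
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```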
Describe the Gini Index
Indicates how mixed the classes in a node are: perfect separation gives a score of 0, a 50/50 split of two classes gives 0.5.
Gini = 1 - sum_k (p(y_k))**2
Final score = weighted average of the children's Gini indices
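And the Gini index under the same assumptions; like `entropy`, it can be passed to `best_split` as the `impurity` callback:

```python
import numpy as np

def gini(y):
    # Gini = 1 - sum_k p(y_k)^2: 0 for a pure node, 0.5 for a 50/50 two-class node
    p = np.bincount(y) / len(y)
    return float(1.0 - (p ** 2).sum())
```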