Image classification Flashcards

1
Q

Name some properties of good features in images

A
  1. Distinct and discriminative
  2. Local
  3. Invariant to translations and transformations
  4. Invariant to brightness change
  5. Efficient to compute
  6. Robust to noise/blurring
2
Q

How can the sum of pixel values in a patch be calculated using the integral image?

A

Integral image at the bottom-right corner - bottom-left - top-right + top-left.

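As a rough illustration (not part of the original card), here is a minimal NumPy sketch of this four-lookup rule; the helper names integral_image and patch_sum are invented for the example:

import numpy as np

def integral_image(img):
    # II[y, x] = sum of img[0..y, 0..x]
    return img.cumsum(axis=0).cumsum(axis=1)

def patch_sum(ii, top, left, bottom, right):
    # Sum over the inclusive patch: bottom-right - bottom-left - top-right + top-left
    total = ii[bottom, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0:
        total -= ii[top - 1, right]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
assert patch_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()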
3
Q

What is the idea behind ensemble learning?

A

Aggregate the results of several predictors into one prediction

4
Q

In what case will ensemble learning not lead to improved results over a single predictor?

A

If all predictors are highly correlated, so they give the same output.

5
Q

What kind of error do sequential and parallel ensemble learners reduce?

A

Sequential learners mostly reduce bias, while parallel learners mostly reduce variance.

6
Q

Name 4 different ensemble methods, and state which of them are parallel/sequential.

A

Parallel:
Bagging (bootstrap aggregation)
Voting
Random forests

Sequential:
Boosting

7
Q

How does Bagging work?

A

Create N versions of the training set using sampling with replacement (bootstrapping), and train a weak learner on each one. Use averaging for regression and majority voting for classification.

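A minimal sketch of this procedure (illustrative only, not from the card), using shallow decision trees on the iris data, with NumPy handling the bootstrap sampling and the majority vote:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Train N weak learners, each on a bootstrap sample (sampling with replacement).
N = 25
learners = []
for _ in range(N):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap indices
    learners.append(DecisionTreeClassifier(max_depth=2).fit(X[idx], y[idx]))

# Majority vote over the N predictions (averaging would be used for regression).
votes = np.stack([t.predict(X) for t in learners])      # shape (N, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the bagged ensemble:", (majority == y).mean())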
8
Q

What is Out-Of-Bag error for bagging?

A

For each training sample, xi:

  1. Find the predictions, y_hat_i, of all classifiers whose bootstrap training set does not contain xi.
  2. Average the y_hat_i to get the out-of-bag prediction for xi.
  3. Repeat for all training samples and average the resulting errors.
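For reference, scikit-learn's BaggingClassifier can report this estimate directly through its oob_score option; a short usage example (dataset and settings chosen arbitrarily):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)

# With oob_score=True, each sample is evaluated only by the trees whose
# bootstrap sample did not contain it, and the results are averaged.
bag = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy estimate:", bag.oob_score_)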
9
Q

What is boosting?

A

We train classifiers sequentially, increasing the weight of misclassified examples in each round.

10
Q

Describe the AdaBoost algorithm

A
  1. Initialize all weights equally
  2. Training subsets are bootstrapped from the full dataset using weighted sampling
  3. Fit a classifier to the new set
  4. Up-weight misclassified examples, down-weight the rest, and repeat
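An illustrative sketch of these steps (not from the card): it uses decision stumps and passes per-sample weights to fit() rather than explicitly resampling the training set, which is a common equivalent formulation; dataset and number of rounds are arbitrary.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
y = 2 * y - 1                               # relabel to {-1, +1}

n_rounds = 20
w = np.full(len(X), 1 / len(X))             # 1. equal initial weights
stumps, alphas = [], []
for _ in range(n_rounds):
    # 2./3. fit a weak learner to the weighted data
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum() / w.sum()
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    # 4. up-weight misclassified samples, down-weight the rest, re-normalize
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote of the weak learners.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(scores) == y).mean())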
11
Q

How can we calculate the probability that a decision tree prediction is correct?

A

We know how many samples from each class of the test set ended up in each leaf node (prediction); the fraction belonging to the predicted class gives the probability that the prediction is correct.

12
Q

Describe the decision tree optimization for creating a feature space partition

A
At each node Sj:
  for each feature:
    for each value of this feature:
      evaluate I(Sj, Aj)
  choose the best feature and value for splitting
Repeat for the resulting child nodes.
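A runnable sketch of this greedy search (illustrative only; Gini impurity is used as a concrete choice of I(Sj, Aj), and the function names are invented):

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(X, y):
    # For each feature and each candidate threshold, score the split
    # and keep the best (feature, value) pair.
    best = (None, None, np.inf)
    for feature in range(X.shape[1]):
        for value in np.unique(X[:, feature]):
            left, right = y[X[:, feature] <= value], y[X[:, feature] > value]
            if len(left) == 0 or len(right) == 0:
                continue
            # size-weighted child impurity plays the role of I(Sj, Aj)
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (feature, value, score)
    return best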
13
Q

What are the two common choices for the classification tree cost function, I(Sj, Aj)?

A
  1. Information Gain
  2. Gini Index

14
Q

Describe the information gain cost function, I(Sj, Aj)

A

I(Sj, Aj) = entropy of the parent - weighted average entropy of the children
Entropy = - sum_k p(x_k) log(p(x_k))

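A small worked example of these two formulas (illustrative only, using base-2 logarithms):

import numpy as np

def entropy(labels):
    # H(S) = -sum_k p_k log2(p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, children):
    # entropy of the parent minus the size-weighted average entropy of the children
    weighted = sum(len(c) / len(parent) * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(information_gain(parent, [parent[:4], parent[4:]]))   # perfect split -> 1.0 bit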
15
Q

Describe the Gini Index

A

Indicates how mixed the classes are: a pure (perfectly separated) node scores 0, a 50/50 mixture scores 0.5.

Gini = 1 - sum_k (p(y_k))**2
Final score = weighted average of the children's Gini values
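A tiny check of the two scores quoted above (illustrative example):

import numpy as np

def gini(labels):
    # Gini = 1 - sum_k p_k^2 for the class proportions p_k in the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

print(gini(np.array([1, 1, 1, 1])))      # pure node     -> 0.0
print(gini(np.array([0, 0, 1, 1])))      # 50/50 mixture -> 0.5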
16
Q

What loss do we normally use for regression trees?

A

Weighted average of the MSE (mean squared error) of the child nodes.

17
Q

What can we do to prevent overfitting in decision trees?

A

Combine them into an ensemble (forests).

18
Q
For random forests, how does the:
1. Number of features selected per node
2. Number of trees
3. Max depth
Affect bias and variance?
A
  1. More features per node will increase variance (the trees become more correlated), but might reduce bias.
  2. More trees will reduce variance
  3. Deeper trees will increase variance, but might reduce bias
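For reference, these three knobs correspond directly to scikit-learn's RandomForestClassifier parameters; the concrete values below are arbitrary:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# max_features  - features considered per split (more -> more correlated trees)
# n_estimators  - number of trees (more -> lower variance)
# max_depth     - maximum depth (deeper -> lower bias, higher variance)
forest = RandomForestClassifier(
    max_features=2, n_estimators=200, max_depth=4, random_state=0
).fit(X, y)
print("training accuracy:", forest.score(X, y))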
19
Q

What is the difference between a random forest and a boosting algorithm with decision trees as weak learners?

A

The trees differ because the split search is randomized (over a random feature subset), and the trees are trained independently in parallel rather than sequentially on reweighted data.

20
Q

Name some advantages and disadvantages of decision trees.

A

Advantages:

  1. Explainable model
  2. Can handle multi class problems
  3. Can handle categorical and continuous variables
  4. Requires little preprocessing
  5. The cost is logarithmic in the number of samples

Disadvantages:

  1. Prone to overfitting
  2. Biased towards classes with more data points