AdaBoost Flashcards

1
Q

What is the structure of AdaBoost?

A

An AdaBoost classifier is composed of many decision stumps, trees with a single decision node and two leaves, used as weak learners.

Stumps can use only one feature to make a decision.

Unlike a random forest, which weighs all of its bootstrapped trees equally, an AdaBoost forest of stumps gives some stumps more weight in the final classification than others.

While a random forest creates its bootstrapped trees independently, in AdaBoost the errors of each stump affect the weights of all following stumps, i.e. each stump is created by taking the previous stumps' errors into account.
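For orientation, here is a minimal sketch in Python using scikit-learn (a synthetic dataset stands in for the heart-disease data); scikit-learn's AdaBoostClassifier uses depth-1 decision trees, i.e. stumps, as its default weak learner and exposes each stump's vote weight:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    # toy binary-classification data standing in for the heart-disease example
    X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                               n_redundant=0, random_state=0)

    # the default weak learner is DecisionTreeClassifier(max_depth=1), i.e. a stump
    clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

    # each stump gets its own weight ("amount of say") in the final vote
    print(clf.estimator_weights_[:5])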

2
Q

Explain the Adaboost error reweighting process.

A
  1. Create a new sample (row) weight column and initialize it with uniform, equal weights across rows: 1/n for each of the n samples.
  2. Create the first stump by finding the feature [chest pain, blocked arteries, patient weight] which best classifies the target [heart disease] in training:

         [chest pain]
           yes branch (predict heart disease):    3 correct, 2 incorrect
           no branch  (predict no heart disease): 2 correct, 1 incorrect
           Gini index = 0.47

         [blocked arteries]
           yes branch (predict heart disease):    3 correct, 3 incorrect
           no branch  (predict no heart disease): 1 correct, 1 incorrect
           Gini index = 0.5

         [patient weight > 176]
           yes branch (predict heart disease):    3 correct, 0 incorrect
           no branch  (predict no heart disease): 4 correct, 1 incorrect
           Gini index = 0.2

The Gini index for patient weight is lowest, so AdaBoost selects patient weight as the first stump in the forest.
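As a sketch of where those Gini numbers come from (helper names are my own; the [correct, incorrect] counts are taken from the diagram above):

    def gini(counts):
        """Gini impurity of one leaf given its two counts, e.g. [3, 2]."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def split_gini(left_counts, right_counts):
        """Gini index of a split: leaf impurities weighted by leaf size."""
        n_left, n_right = sum(left_counts), sum(right_counts)
        n = n_left + n_right
        return (n_left / n) * gini(left_counts) + (n_right / n) * gini(right_counts)

    print(split_gini([3, 2], [2, 1]))  # chest pain       -> ~0.47
    print(split_gini([3, 3], [1, 1]))  # blocked arteries -> 0.5
    print(split_gini([3, 0], [4, 1]))  # patient weight   -> 0.2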

Next we assign the stump's weight in the final majority vote based on the stump's training error rate:

stump amount of say = 0.5 * log((1 - total error) / total error), where log is the natural log

The patient weight stump's error rate was 1/8, thus its amount of say is:
1/2 * log((7/8) / (1/8)) = 1/2 * log(7) ≈ 0.97

When a stump does a good job of classifying, then the amount of say is a relatively large value.

When a stump does no better than a coin flip (total error = 0.5), the amount of say is zero.

When a stump does a terrible job, say it misclassifies nearly every sample so the error rate approaches 1, the amount of say is a large NEGATIVE value.

KEY: so if a stump consistently misclassifies samples as "yes heart disease", then its negative weight flips this stump's vote to "no heart disease".
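A tiny numeric sketch of this behavior (the function name is my own; the eps clamp just avoids log(0) at error rates of exactly 0 or 1):

    import math

    def amount_of_say(total_error, eps=1e-10):
        """0.5 * ln((1 - err) / err), with err clamped to (0, 1)."""
        total_error = min(max(total_error, eps), 1 - eps)
        return 0.5 * math.log((1 - total_error) / total_error)

    print(amount_of_say(1 / 8))  # good stump -> ~0.97
    print(amount_of_say(0.5))    # coin flip  -> 0.0
    print(amount_of_say(0.99))   # bad stump  -> ~-2.3 (large and negative)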

Now, we need to know how to modify wts so that the next stump will take the errors that the current stump made into account.

If a stump incorrectly classified a sample in the last round, then we emphasize the need for the next stump to get it right by INCREASING that sample's weight while DECREASING all of the other sample weights.

note: new sample weight is not same as final vote amount of say, but they are interrelated as new sample weight uses amount of say in its calc.

new weight for misclassified sample = sample weight * exp(amount of say)

When the amount of say is large (the stump did a good job), we scale the previous sample weight by a large number, so the new sample weight is much larger than the old weight. When the amount of say is small (the stump did a poor job), exp(amount of say) is close to 1, so the new sample weight is only a little larger than the old weight.

new misclf’d sample weight = 1/8 * exp(amount of say)
= 1/8 * exp(0.97) = 1/8 * 2.64 = 0.33

notice 0.33 >> 1/8 initial weight, so we upweighted the misclf'd sample

now, we decrease sample weights for CORRECTLY clf’d samples:

new correct clf’d sample weights = sample weight * exp(-amount of say)

Notice the negative sign in front of amount of say; it is imposed to reduce the sample weight and is unrelated to the sign of the amount of say itself. A graph of exp(-amount of say) curves downward and asymptotically approaches 0 as amount of say -> inf, so the weight is decreased.

If the amount of say of the last stump is relatively small, then we scale the sample weight BY A VALUE CLOSE TO 1, which means the new sample weight will be only a little smaller than the old weight.

new sample weight that was correctly clf’d = 1/8 * exp(-amount of say)
= 1/8 * exp(-0.97) = 1/8 * 0.38 = 0.05

notice the new sample weight of 0.05 << 1/8

Finally, the new sample weights are normalized so that they sum to 1. In theory, we could then use these sample weights to calculate weighted Gini indexes to decide which feature to split on for the next stump. The weighted Gini index would put more emphasis on correctly classifying the sample that was misclassified by the last stump, since that sample now has the largest weight.
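Putting the reweighting step together for the 8-sample example (a sketch with NumPy; variable names are my own, and the last line is the normalization described above):

    import numpy as np

    n = 8
    weights = np.full(n, 1 / n)                      # uniform initial weights, 1/n each
    misclassified = np.array([False] * 7 + [True])   # the patient-weight stump got 1 of 8 wrong
    amount_of_say = 0.97                             # from the formula above

    # up-weight the misclassified sample, down-weight the correctly classified ones
    weights = np.where(misclassified,
                       weights * np.exp(amount_of_say),
                       weights * np.exp(-amount_of_say))

    weights /= weights.sum()                         # normalize so the weights sum to 1
    print(weights.round(3))                          # ~0.072 for correct samples, ~0.499 for the miss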

The final classification sums the amount of say of the stumps voting for each class; the class with the larger total wins.
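A hedged sketch of that final weighted vote for a binary problem with labels coded as -1/+1 (the stump votes and says below are made up for illustration):

    import numpy as np

    def adaboost_predict(stump_preds, says):
        """stump_preds: (n_stumps, n_samples) array of -1/+1 votes;
        says: amount of say per stump. The sign of the weighted sum is the prediction."""
        return np.sign(says @ stump_preds)

    preds = np.array([[ 1,  1, -1],   # stump 1 votes on 3 samples
                      [ 1, -1, -1],   # stump 2 votes
                      [-1,  1, -1]])  # stump 3 votes
    says = np.array([0.97, 0.40, 0.15])
    print(adaboost_predict(preds, says))  # -> [ 1.  1. -1.]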

3
Q

What formula in Adaboost updates the “amount of say” of a given weak learner stump?

A

stump “amount of say” =:

1/2 * log((1 - total err) / total err)

where total err (a float) is the weighted proportion of misclassified points.

e.g. say a stump had a 3/8 error rate, then 1/2 * ln((1 - .375) / .375) = 1/2 * ln((5/8) / (3/8)) = 1/2 * ln(5/3) ≈ .255

The “amount of say” function appears as follows:

as total err -> 0, amount of say -> +inf (already above 3 once total err drops to around 0.002)
total err = 0.5, amount of say = 0
as total err -> 1, amount of say -> -inf

i.e. when a stump classifies correctly, Adaboost upweights the stump and when a stump classifies poorly, Adaboost REDUCES the weight of a stump.

The graph of this function looks like a mirrored (falling) logit curve: the left-hand side asymptotically approaches +inf as total err -> 0, and the right-hand side approaches -inf as total err -> 1.
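A quick numeric check of the 3/8 example and of that shape (plain Python, natural log):

    import math

    def amount_of_say(err):
        return 0.5 * math.log((1 - err) / err)

    print(round(amount_of_say(0.375), 3))         # the 3/8 example -> 0.255
    for err in (0.01, 0.25, 0.5, 0.75, 0.99):     # trace out the falling, logit-like curve
        print(err, round(amount_of_say(err), 2))  # 2.3, 0.55, 0.0, -0.55, -2.3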

4
Q

How does AdaBoost reweight correctly and incorrectly classified samples for the next round?

A

for a sample the stump classified INCORRECTLY:

New sample weight = sample weight * exp(amount of say)

i.e. this formula INCREASES the sample weight for a sample that was INCORRECTLY clf'd.

As a function of the amount of say, y = exp(amount of say) curves upward like the right half of a parabola, so the more say the stump that made the mistake had, the more the misclassified sample is UP-weighted.

for a sample the stump classified CORRECTLY:

New sample weight = sample weight * exp(-amount of say)

As a function of the amount of say, y = exp(-amount of say) slopes downward: y = 1 when the amount of say is 0 and y -> 0 as the amount of say grows. Since this factor is < 1 for any positive amount of say, every CORRECTLY clf'd sample is DOWN-weighted, and the more say the stump had, the stronger the down-weighting.
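Both update rules side by side, as a minimal sketch (function name is my own):

    import math

    def update_weight(weight, amount_of_say, correct):
        """Down-weight correctly classified samples, up-weight misclassified ones."""
        sign = -1.0 if correct else 1.0
        return weight * math.exp(sign * amount_of_say)

    w = 1 / 8                                      # initial uniform weight
    print(update_weight(w, 0.97, correct=False))   # misclassified -> ~0.33
    print(update_weight(w, 0.97, correct=True))    # correct       -> ~0.05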
