9. Statistical Debugging Flashcards
Problem with random statistical testing
too slow
every kth site issue
violates independence
might miss predicates “out of place”
periodic hardware time or interrupt issue
might miss rare event
not portable across systems
amortized coin tossing
implement as countdown values selected from geometric distribution
F(P) is
S(P) is
# of failing runs where P is observed to be true # of successful runs where P is observed to be true
Failure(P) =
F(P)+S(P)
Context Statistic
background chance of failure regardless of P’s value
F(P observed) =
S(P observed) =
# of failing runs observing P # of successful runs observing P
Context(P) =
F(P observed) + S(P observed)
Increase()
Does P being true increase chance of failure over the background rate?
Increase(P) =
Failure(P) - Context(P)
Finding Bugs First Algorithm
- Discard predicates having Increase(P)<=0
2. Sort remaining predicates by Increase(P)
Revised Algorithm
- Compute Increase() for all predicates
- Rank predicates
- Add top ranked predicate P to the results list
- Remove P and discard all runs where P is true
sub-bug predictor
covers special cases of a more general bug
Ranking by Increase(P) problem
high increase() scores but few failing runs
Ranking by F(P) problem
many failing runs but low increase() scores
Precision =
fraction of retrieved instances that are relevant
recall =
fraction of relevant instances that are retrieved
retrieved instances
predicates reported as bug predictors
relevant instances
predicates that are actually bugs
It (is)/(is not) trivial to achieve only high precision or only high recall
is
Increase(P) has ___ precision, ___ recall
high precision
low recall
F(P) has ___ precision, ___ recall
high recall
low precision
Combining Precision and Recall: Harmonic Mean
1/Increase(P) + 1/F(P)
Rewards high scores in both dimensions
sorting by the harmonic mean
does good