Performance Evaluation Flashcards

Question 1

Q

What types of evaluation are there? (3)

Answer

A

performance
adequacy
diagnostic

Question 2

Q

What is performance evaluation (3)

Answer

A

based on a benchmark
organised around community/shared task
automated means of scoring

Question 3

Q

key points about gold standard data (3)

Answer

A

time consuming and costly
requires annotation guidelines to follow
annotation done by experts

Question 4

Q

why do we use multiple annotators

Answer

A

to ensure reliability

Question 5

Q

what does the kappa coefficient measure here

Answer

A

inter annotator agreement

Question 6

Q

how do we calculate kappa coefficient

Answer

A

(p(a) - p(e)) / (1 - p(e))

Question 7

Q

p(a) = …

Answer

A

observed agreement

p(a1=y, a2=y) + p(a1=n, a2=n)

Question 8

Q

p(e) = …

Answer

A

expected agreement

p(a1=y)p(a2=y) + p(a1=n)p(a2=n)
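
As a minimal sketch of how the kappa cards fit together, the snippet below computes p(a), p(e) and kappa for two annotators with binary y/n labels. The function name `cohen_kappa` and the two label lists are hypothetical, made up for illustration only.

```python
def cohen_kappa(a1, a2):
    """Kappa for two annotators with binary labels 'y'/'n' (illustrative helper)."""
    if len(a1) != len(a2) or not a1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a1)
    # p(a): fraction of items on which both annotators chose the same label
    p_a = sum(x == y for x, y in zip(a1, a2)) / n
    # p(e): chance agreement, from each annotator's own rate of 'y' and 'n'
    p_e = sum((a1.count(lab) / n) * (a2.count(lab) / n) for lab in ("y", "n"))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_a - p_e) / (1 - p_e)


# Made-up annotations for ten items
ann1 = ["y", "y", "y", "n", "n", "n", "y", "n", "y", "n"]
ann2 = ["y", "y", "n", "n", "n", "y", "y", "n", "y", "n"]
kappa = cohen_kappa(ann1, ann2)
print(round(kappa, 3))
```

Here the annotators agree on 8 of 10 items (p(a) = 0.8) and each says y half the time (p(e) = 0.5), so kappa comes out at about 0.6.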

Question 9

Q

how do we interpret the kappa coefficient

Answer

A

slight < 0.2 < fair < 0.4 < moderate < 0.6 < substantial < 0.8 < almost perfect

Question 10

Q

what can we use in the non binary annotation case

Answer

A

Scott's pi, Fleiss' kappa

Question 11

Q

precision =

Answer

A

TP / (TP + FP)

Question 12

Q

recall =

Answer

A

TP / (TP + FN)

Question 13

Q

f1 =

Answer

A

2PR / (P + R)
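
A short sketch of precision, recall and F1 computed from raw counts. The helper name `precision_recall_f1` and the counts are hypothetical, and treating 0/0 as 0.0 is an assumed convention rather than something stated on the cards.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    # Assumed convention: a 0/0 ratio is reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 8 correct positives, 2 false alarms, 8 misses
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(p, 3), round(r, 3), round(f1, 3))
```

With these counts precision is 0.8, recall is 0.5, and F1 is 8/13, roughly 0.615.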

Question 14

Q

why is the F1 score more informative than the arithmetic mean of precision and recall

Answer

A

It is the harmonic mean, so it is pulled towards the lower of precision and recall. It therefore exposes poor performance, e.g. when the prediction is always no.
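
To make the contrast concrete, here is a small sketch comparing the arithmetic mean of precision and recall with F1 for two degenerate classifiers. The test set (10 positive and 90 negative items), the helper name `mean_and_f1`, and the choice to treat 0/0 as 0.0 are all assumptions for illustration.

```python
def mean_and_f1(tp, fp, fn):
    """Return (arithmetic mean of P and R, F1) from raw counts."""
    # Assumed convention: a 0/0 ratio is reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    arithmetic = (precision + recall) / 2
    if precision + recall == 0:
        return arithmetic, 0.0
    return arithmetic, 2 * precision * recall / (precision + recall)


# Hypothetical test set: 10 positive items, 90 negative items
always_yes = mean_and_f1(tp=10, fp=90, fn=0)  # precision 0.1, recall 1.0
always_no = mean_and_f1(tp=0, fp=0, fn=10)    # no positive predictions at all
print(always_yes, always_no)
```

For the always-yes classifier the arithmetic mean is 0.55, which looks respectable, while F1 is only about 0.18. For the always-no classifier both come out as 0 under the assumed convention.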

Question 15

Q

when is accuracy useful

Answer

A

if all classes are equally important

Question 16

Q

what kinds of averages can we use when we have multiple categories

Answer

A

macro average, micro average

Question 17

Q

what is macro average

Answer

A

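Compute the metric (e.g. precision, recall, F1) separately for each class, then take the unweighted average over classes, so every class counts equally.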

Question 18

Q

what is micro average

Answer

A

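Pool the TPs, FPs and FNs over all classes, then compute the metric once. Every instance counts equally, so frequent classes dominate and poor scores on rare classes affect it less.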
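
The difference between the two averages can be sketched with a small two-class example. The class names and counts below are made up, and `f1_from_counts` is a hypothetical helper that uses 2TP / (2TP + FP + FN), which is algebraically the same as 2PR / (P + R).

```python
def f1_from_counts(tp, fp, fn):
    """F1 written directly in terms of counts; equals 2PR / (P + R)."""
    denom = 2 * tp + fp + fn
    # Assumed convention: report 0.0 when there is nothing to score
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-class counts as (tp, fp, fn): A is frequent, B is rare
counts = {"A": (90, 10, 10), "B": (1, 4, 4)}

# Macro: score each class separately, then average the scores
macro_f1 = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)

# Micro: pool the counts over classes, then score once
tp, fp, fn = (sum(col) for col in zip(*counts.values()))
micro_f1 = f1_from_counts(tp, fp, fn)

print(round(macro_f1, 3), round(micro_f1, 3))
```

Class A scores 0.9 and class B scores 0.2, so the macro average is 0.55. The pooled micro score is about 0.867, because the frequent class A supplies most of the counts.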

Question 19

Q

what is olympic judging

Answer

A

if there is not enough data for a gold standard, a committee of judges determines whether a proposal is relevant and close to the desired result.

not reproducible

Question 20

Q

what is adequacy evaluation

Answer

A

Evaluation as seen by users: judging the external quality of the system. The factors are not easily quantifiable and are interdependent.

Question 21

Q

what are some of the factors for adequacy evaluation

Answer

A

adaptability, integrity, efficiency, robustness, correctness, reliability, usability, accuracy

Question 22

Q

what is diagnostic evaluation

Answer

A

concerned with evaluation as seen by developers

Question 23

Q

what are some of the factors for diagnostic evaluation

Answer

A

portability, reusability, maintainability, testability, understandability, flexibility, readability

Question 24

Q

why can't we use an NLP test suite

Answer

A

the range of phenomena is hard to anticipate

(24 cards)