08 - Evaluation Flashcards

Question 1

Q

What do you need to know to assess how good or meaningful results are?

Answer

A

Which error metric was used
Which dataset was used and how it was split
The scale of the measured values
Context: how other algorithms perform

Question 2

Q

What can you use as a baseline? What are other algorithms that you can compare your own method with?

Answer

A

State-of-the-art algorithms, or the algorithm used so far
Simple algorithms (e.g. linear regression)
Mean or median
Highest class probability (modal class)
Some simple rules
Random
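
A minimal sketch of three of these baselines (mean, median, modal class). The function names and the toy data are hypothetical, invented only for illustration; mean absolute error is used here as an assumed example metric.

```python
from collections import Counter
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def constant_baseline_mae(train, test, summary):
    """Predict one constant (e.g. the training mean) for every test item."""
    constant = summary(train)
    return mae(test, [constant] * len(test))


def modal_class_accuracy(train_labels, test_labels):
    """Always predict the most frequent class seen in training."""
    modal = Counter(train_labels).most_common(1)[0][0]
    return sum(label == modal for label in test_labels) / len(test_labels)


# Hypothetical toy data, not taken from any real dataset.
train_ratings = [4, 5, 3, 4, 4, 2, 5]
test_ratings = [4, 3, 5, 4]

mean_mae = constant_baseline_mae(train_ratings, test_ratings, mean)
median_mae = constant_baseline_mae(train_ratings, test_ratings, median)
modal_acc = modal_class_accuracy(
    ["spam", "ham", "ham", "ham"], ["ham", "spam", "ham", "ham"]
)
```

An algorithm is only interesting if its error is clearly lower (or its accuracy clearly higher) than these trivial predictors on the same split.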

Question 3

Q

Why do you need a baseline?

Answer

A

Without a baseline, performance evaluations of an algorithm are typically of little or no relevance
A baseline gives meaning to the results

Question 4

Q

What is the ground truth for recommender systems?

Answer

A

Ratings, submitted ratings, relevance scores of a dataset
These are considered “true” but may well be false, biased, sparse or noisy

Question 5

Q

What are the problems with the ground truth of recommender systems?

Answer

A

Real ground truth is difficult to measure
Ground truth is therefore derived or approximated
It is only the best approximation that is available
Hard to find

Question 6

Q

What is the gold standard?

Answer

A

The best available reference you can get, even if it is not perfect

Question 7

Q

What are the assumptions of the Central Limit Theorem?

Answer

A

Large number of observations
The observations form a large random sample of size n
Observations are independent (each is unaffected by the previous ones)

Question 8

Q

What is the Central Limit Theorem?

Answer

A

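The mean (and sum) of a large random sample approximately follows a normal distribution, even if the individual observations do not
The larger n, the better the normal approximation, and the closer the sample mean tends to be to the true mean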
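
A small simulation can make this concrete. The sketch below assumes an exponential distribution (clearly not normal) with true mean 1; the function name, seed, and sample sizes are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev


def sample_means(n, repetitions, rng):
    """Return `repetitions` means, each computed from n independent exponential draws."""
    return [
        mean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

small = sample_means(5, 2000, rng)    # means of samples with n = 5
large = sample_means(100, 2000, rng)  # means of samples with n = 100

# Spread of the sample means; theory predicts roughly 1 / sqrt(n).
spread_small = stdev(small)
spread_large = stdev(large)
center_large = mean(large)
```

The means for n = 100 cluster much more tightly around the true mean of 1 than those for n = 5, and a histogram of `large` would look close to a bell curve.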

Question 9

Q

What is statistical significance?

Answer

A

Describes how unlikely an observed difference would be if it were caused by chance alone
Typical thresholds: a p value below 0.05 or 0.01
Statistically significant results can still be false or practically insignificant

Question 10

Q

What does statistical significance mean?

Answer

A

Experimental data giving a p value of 0.05 means that there is only a 5% chance of getting a result at least as extreme as the observed one if no real effect exists
The p value provides information about the probability of obtaining evidence. It does not quantify the strength of the evidence
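
One way to see what a p value measures is a permutation test, sketched below. This is an illustrative technique chosen here, not one named in the cards; the function name and the error values for the two hypothetical algorithms are made up.

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Estimate how often a mean difference at least as large as the
    observed one appears when group labels are shuffled at random."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-fold errors of two algorithms.
errors_a = [0.90, 0.85, 0.92, 0.88, 0.91, 0.87, 0.93, 0.89]
errors_b = [0.80, 0.78, 0.83, 0.79, 0.82, 0.77, 0.81, 0.80]

p_different = permutation_p_value(errors_a, errors_b)
p_identical = permutation_p_value(errors_a, errors_a)
```

A small `p_different` says the gap would rarely arise from shuffling alone; it says nothing about how large or useful the gap is.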

Question 11

Q

What is p-hacking?

Answer

A

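Trying many analyses, subsets, or metrics and reporting only those that turn out statistically significant
"If you torture your data long enough, they will confess"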
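
The effect can be simulated: the sketch below runs 1000 comparisons on pure noise, so no real effect exists in any of them, and counts how many still come out "significant". It assumes a two-sided z-test with known unit variance; the function name, seed, and sizes are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def noise_only_p_value(n, rng):
    """Two-sided z-test p value for two groups drawn from the SAME
    standard normal distribution, so any difference is pure chance."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / sqrt(2 / n)  # known variance of 1 per draw
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(42)  # fixed seed so the run is repeatable
p_values = [noise_only_p_value(30, rng) for _ in range(1000)]

# Comparisons that look "significant" at the 0.05 level by chance alone.
false_positives = sum(p < 0.05 for p in p_values)
```

Roughly 5% of the noise-only comparisons fall below 0.05, so reporting only those would present chance as a finding.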

Question 12

Q

Why is it important to analyze performance over time?

Answer

A

Standard but naive assumption: performance is always the same over time
In practice performance can change, so a single aggregate number may hide trends
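
A minimal sketch of checking this assumption by splitting one metric by time period. The function name, the month labels, and the toy log of hits and misses are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """records: iterable of (period, was_correct) pairs.
    Returns {period: accuracy}, with periods in sorted order."""
    totals = defaultdict(lambda: [0, 0])  # period -> [hits, count]
    for period, was_correct in records:
        totals[period][0] += int(was_correct)
        totals[period][1] += 1
    return {p: hits / count for p, (hits, count) in sorted(totals.items())}


# Hypothetical evaluation log: (month, whether the prediction was correct).
log = [
    ("2024-01", True), ("2024-01", True), ("2024-01", False), ("2024-01", True),
    ("2024-02", True), ("2024-02", False), ("2024-02", False), ("2024-02", False),
]

overall = sum(hit for _, hit in log) / len(log)
per_month = accuracy_by_period(log)
```

Here the overall accuracy of 0.5 hides a drop from 0.75 in the first month to 0.25 in the second.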

Question 13

Q

What is dataset pruning?

Answer

A

Removing data that does not suit your purpose

Question 14

Q

When is it good to remove data?

Answer

A

Wrong data
Noisy data
Missing data

(14 cards)