05 - Recommender Systems Evaluation Flashcards
What questions should you ask yourself if you develop a recommender system?
- Objective: What do you want to achieve with the model?
- How to measure: Evaluation methods and Evaluation metrics
- How good/relevant are the results?
What are the goals of the business world?
- A successful business
- Maximum profit, income, and user satisfaction
- Minimize costs
- Get as many users as possible
- To have the best product
What are possible costs that may arise?
- Labour costs
- Server
- Legal/Licenses
- etc.
What is Goodhart’s Law?
When a measure becomes a target, it ceases to be a good measure.
What are the three main evaluation methods and metrics?
- Online Evaluations
- Offline Evaluations
- User Studies
What is part of Online Evaluations?
- Sales
- Profit
- Clicks
What is part of Offline Evaluations?
- Errors
- Accuracy
What is part of User Studies?
- User feedback
- User observations
How does an A/B Test work?
- Typical Online Test
- 50% of the users see variant A
- 50% of the users see variant B
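A minimal sketch of the 50/50 split described above, assuming users are identified by a string ID. The function name `assign_variant` and the hash-based assignment are illustrative assumptions, not part of the flashcards; hashing is used here only so that the same user always lands in the same variant.

```python
import hashlib

def assign_variant(user_id: str) -> str:
    """Deterministically assign a user to variant "A" or "B" (roughly 50/50)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Even hash values go to A, odd ones to B, so a returning user
    # always sees the same variant.
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Hypothetical user IDs, only for illustration.
users = [f"user-{i}" for i in range(10_000)]
counts = {"A": 0, "B": 0}
for user in users:
    counts[assign_variant(user)] += 1
print(counts)  # the two groups should be of similar size
```

An online metric such as clicks or sales would then be compared between the two groups.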
How does Interleaving work?
- Randomize Rankings
- All kinds of variations (Random Mix, Top n Mix, Fixed amount Mix)
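One possible reading of interleaving can be sketched as follows, assuming two rankings are merged into a single list shown to the same user. The alternating strategy, the function name `interleave`, and the item names are assumptions chosen for illustration; the other mixing variations named above would replace the alternation step.

```python
import random

def interleave(ranking_a, ranking_b, k, seed=None):
    """Merge two rankings into one list of up to k (item, source) pairs."""
    rng = random.Random(seed)
    sources = [("A", list(ranking_a)), ("B", list(ranking_b))]
    rng.shuffle(sources)  # randomize which ranking contributes first
    mixed, seen = [], set()
    turn = 0
    while len(mixed) < k and any(items for _, items in sources):
        name, items = sources[turn % 2]
        # Skip items that the other ranking has already contributed.
        while items and items[0] in seen:
            items.pop(0)
        if items:
            item = items.pop(0)
            seen.add(item)
            mixed.append((item, name))
        turn += 1
    return mixed

result = interleave(["a", "b", "c"], ["b", "d", "e"], k=4, seed=0)
print(result)
```

Recording which source contributed each item makes it possible to attribute later clicks to ranking A or ranking B.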
What is a typical metric for classification?
Accuracy
What is a typical metric for Regression?
Error Metrics
What is a typical metric for Ranking?
Ranking Metrics
Is Regression = Classification?
- No, but a regression task can be reinterpreted as a classification or ranking problem
- Define intervals over the target value and treat them as classes (then use a classification algorithm instead of a regression algorithm)
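The interval idea can be sketched like this, assuming a rating scale from 1 to 5. The boundaries, the class labels, and the function name `rating_to_class` are hypothetical choices for illustration.

```python
import bisect

def rating_to_class(rating, boundaries=(2.0, 4.0), labels=("low", "medium", "high")):
    """Map a continuous rating to a class label via interval boundaries."""
    # bisect_right returns the index of the interval the rating falls into:
    # rating < 2.0 -> 0, 2.0 <= rating < 4.0 -> 1, rating >= 4.0 -> 2
    return labels[bisect.bisect_right(boundaries, rating)]

ratings = [1.5, 2.0, 3.2, 4.7]
classes = [rating_to_class(r) for r in ratings]
print(classes)  # ['low', 'medium', 'medium', 'high']
```

The resulting labels can serve as targets for a classifier, at the cost of losing the exact numeric value.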
What regression metrics do you know?
- Mean Absolute Error (MAE)
- (Root) Mean Squared Error ((R)MSE)
What is Mean Absolute Error (MAE)?
Mean of the absolute differences between predictions and observations
What is the benefit of Mean Absolute Error (MAE)?
Intuitive
What is the drawback of RMSE?
- Not very intuitive
- Penalizes large errors disproportionately, because errors are squared before averaging
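A small sketch of both error metrics, using made-up ratings, shows the difference: the two prediction sets below have the same MAE, but the one with a single large error has a higher RMSE. The function and variable names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |observation - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

observed = [4.0, 3.0, 5.0, 2.0]
predicted_even = [3.0, 4.0, 4.0, 3.0]     # four errors of size 1
predicted_outlier = [4.0, 3.0, 5.0, 6.0]  # one error of size 4

print(mae(observed, predicted_even), rmse(observed, predicted_even))        # 1.0 1.0
print(mae(observed, predicted_outlier), rmse(observed, predicted_outlier))  # 1.0 2.0
```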
What (Ranked) Retrieval Metrics do you know?
- Mean Reciprocal Rank (MRR)
- Mean Average Precision (MAP)
- Normalized Discounted Cumulative Gain (nDCG)
What is Mean Reciprocal Rank (MRR)?
- Measures at which rank the first relevant result is displayed
- Considers only the first relevant result; later relevant results are ignored
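A minimal sketch of MRR over several queries, assuming each query comes with a ranked result list and a set of relevant items. Scoring a query with no relevant result as 0 is a common convention and is assumed here; the data is made up.

```python
def reciprocal_rank(ranked_items, relevant):
    """Return 1 / rank of the first relevant item, or 0.0 if there is none."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """Average the reciprocal rank over all (ranking, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

queries = [
    (["a", "b", "c"], {"a"}),       # first relevant item at rank 1 -> 1.0
    (["a", "b", "c"], {"b", "c"}),  # first relevant item at rank 2 -> 0.5
    (["a", "b", "c"], {"z"}),       # no relevant item              -> 0.0
]
print(mean_reciprocal_rank(queries))  # 0.5
```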
What is Normalized Discounted Cumulative Gain (nDCG)?
Measures ranking quality with graded relevance: the score is high when more relevant items are ranked above less relevant items
Into which steps can the Normalized Discounted Cumulative Gain (nDCG) be divided?
- Step 1: Cumulative Gain: Sum of the relevance of the top n items
- Step 2: Discounted Cumulative Gain: Penalizes relevant items that appear lower in the ranking by discounting their gain
- Step 3: Normalized Discounted Cumulative Gain: Normalizes DCG to the interval 0 to 1 by dividing by the DCG of the ideal ranking
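The three steps can be sketched as follows. The logarithmic discount `rel / log2(rank + 1)` is a commonly used formulation and is assumed here, since the flashcards do not fix a discount function; the ideal ranking is derived from the same list of relevance values, which is a simplification.

```python
import math

def cumulative_gain(relevances):
    """Step 1: sum of the relevance values of the top n items."""
    return sum(relevances)

def dcg(relevances):
    """Step 2: discount each gain by its rank (assumed discount: log2(rank + 1))."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Step 3: divide by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

good_ranking = [3, 2, 1]  # relevance values in displayed order
bad_ranking = [1, 2, 3]   # same items, most relevant shown last

print(cumulative_gain(good_ranking), cumulative_gain(bad_ranking))  # 6 6
print(ndcg(good_ranking))  # 1.0
print(ndcg(bad_ranking))   # below 1.0
```

Cumulative gain alone cannot tell the two rankings apart, which is why the discount in step 2 is needed.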
What is Effectiveness?
Doing the right things
What is Efficiency?
Doing things right
What is Performance?
- Sometimes used as a synonym for effectiveness
- Sometimes used as a generic term