Recommender System Flashcards
What is “Top-N recommenders”? Give 3 Examples
Systems that produce a finite list of the best items to present to a given person.
What is Explicit feedback? Give 3 Examples
Ask users directly about their interests.
Example: Rating, Reviews, Like/Dislike
What is the disadvantage of Explicit Feedback?
Humans can get tired of providing so much feedback, and will not reliably provide accurate or truthful feedback
What is the advantage of Explicit Feedback?
Clear preferences: Explicit feedback provides clear indications of user preferences and opinions.
Interpretable: Ratings and reviews are straightforward to interpret and incorporate into recommendation models.
What is Implicit feedback? Give 3 Examples
Look at users’ behaviours and interpret them as indications of interest or disinterest.
Example: Purchase History, Click-Through Rate (CTR), Viewing Time
What is the disadvantage of Implicit Feedback?
Lack of explicit preferences: Implicit feedback might not precisely reflect users’ true preferences, leading to potential ambiguity.
Difficulty in interpreting: Understanding the exact meaning of implicit signals can be challenging.
What is the advantage of Implicit Feedback?
Abundant data: Implicit feedback is often more abundant than explicit feedback, as it is generated passively during user interactions.
Less user effort: Users don’t need to explicitly rate items, making implicit feedback collection less intrusive.
List some accuracy metrics to evaluate a recommender system.
Examples: Mean Absolute Error, Root Mean Square Error.
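Both metrics compare predicted ratings with the ratings users actually gave. A minimal sketch in plain Python, using made-up ratings and hypothetical function names:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average size of the rating errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error: like MAE, but large errors are penalized more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Illustrative data only: four predicted ratings and the true ratings.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]

print(mae(predicted, actual))   # 1.0
print(rmse(predicted, actual))  # about 1.22, higher than MAE because of the error of 2
```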
Why do accuracy metrics matter?
They provide an immediate, offline way to tune, compare, and benchmark models.
However, rating-prediction accuracy has limited value in practice, because the goal is not to minimize error but to find what people like. Fundamentally, the real value of a recommender system cannot be measured offline.
What is Hit Rate?
The proportion of users for whom the system made at least one relevant recommendation.
Why does Hit Rate matter?
A higher Hit Rate indicates that a larger portion of users received recommendations they found useful, contributing to overall user satisfaction.
What are considerations of using Hit Rate?
The longer the recommendation list, the higher the hit rate. Therefore, choose a reasonable list length L (the N in top-N) and keep it fixed when comparing systems.
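A minimal sketch of hit rate, assuming a leave-one-out setup in which one known-relevant item per user is held out of training. The function names, user names, and item ids are hypothetical. It also shows how a shorter list lowers the score.

```python
def hit_rate(top_n, held_out):
    """Fraction of users whose held-out item appears in their top-N list.

    top_n:    dict mapping user -> ranked list of recommended item ids
    held_out: dict mapping user -> the single item withheld from training
    """
    hits = sum(1 for user, item in held_out.items()
               if item in top_n.get(user, []))
    return hits / len(held_out)

def truncate(top_n, length):
    """Keep only the first `length` recommendations for every user."""
    return {user: items[:length] for user, items in top_n.items()}

# Illustrative data only.
top_n = {
    "alice": ["a", "b", "c"],
    "bob": ["d", "e", "f"],
    "carol": ["g", "h", "i"],
}
held_out = {"alice": "b", "bob": "x", "carol": "g"}

print(hit_rate(top_n, held_out))               # 2 of 3 users get a hit
print(hit_rate(truncate(top_n, 1), held_out))  # only 1 of 3 with a list of length 1
```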
What is Average reciprocal hit rate (ARHR)?
A variant of hit rate that accounts for where in the top-N list the hit appears: each hit contributes the reciprocal of its rank, so hits near the top count more.
Why does Average reciprocal hit rate (ARHR) matter?
A higher ARHR score implies that the system is effective in identifying and placing relevant items at the forefront of the recommendation list.
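A minimal sketch of ARHR under the same leave-one-out assumption as hit rate: a hit at rank 1 contributes 1, at rank 2 contributes 1/2, and so on. Names and data are hypothetical.

```python
def arhr(top_n, held_out):
    """Average reciprocal hit rank over all users.

    top_n:    dict mapping user -> ranked list of recommended item ids
    held_out: dict mapping user -> the single item withheld from training
    """
    total = 0.0
    for user, item in held_out.items():
        ranked = top_n.get(user, [])
        if item in ranked:
            rank = ranked.index(item) + 1  # ranks start at 1
            total += 1.0 / rank
    return total / len(held_out)

# Illustrative data only.
top_n = {
    "alice": ["a", "b", "c"],   # held-out item at rank 2 -> 1/2
    "bob": ["d", "e", "f"],     # no hit -> 0
    "carol": ["g", "h", "i"],   # held-out item at rank 1 -> 1
}
held_out = {"alice": "b", "bob": "x", "carol": "g"}

print(arhr(top_n, held_out))  # (0.5 + 0 + 1) / 3 = 0.5
```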
What is Cumulative hit rate (cHR)?
Hit rate computed with a rating threshold applied to the test set, so that only items with acceptable ratings are tested.
Why does Cumulative hit rate (cHR) matter?
We shouldn’t get credit for recommending items to a user that we think they won’t actually enjoy
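One way to sketch this, assuming each user's held-out test item comes with the rating the user actually gave it. The cutoff value, function name, and data are hypothetical.

```python
def cumulative_hit_rate(top_n, held_out, rating_cutoff):
    """Hit rate restricted to held-out items rated at or above the cutoff.

    top_n:    dict mapping user -> ranked list of recommended item ids
    held_out: dict mapping user -> (item id, actual rating)
    """
    hits = 0
    total = 0
    for user, (item, rating) in held_out.items():
        if rating < rating_cutoff:
            continue  # ignore test items the user did not actually enjoy
        total += 1
        if item in top_n.get(user, []):
            hits += 1
    return hits / total if total else 0.0

# Illustrative data only.
top_n = {
    "alice": ["a", "b", "c"],
    "bob": ["d", "e", "f"],
    "carol": ["g", "h", "i"],
}
held_out = {"alice": ("b", 2.0), "bob": ("x", 5.0), "carol": ("g", 4.5)}

# Alice's hit is on an item she rated 2.0, so it no longer earns credit.
print(cumulative_hit_rate(top_n, held_out, rating_cutoff=4.0))  # 0.5
```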
What is Coverage?
The percentage of (user, item) pairs for which a prediction can be made, or the percentage of possible recommendations that the recommender system is able to provide.
Why does Coverage matter?
It shows how much of the training-set catalog the recommender system is actually able to recommend to users.
What are considerations of using Coverage?
There can be a trade-off between coverage and accuracy. A system that reaches full coverage by recommending every item to every user is no better than random prediction.
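Coverage can be computed in more than one way. The sketch below takes one interpretation, the share of catalog items that appear in at least one user's top-N list. The function name and data are hypothetical.

```python
def catalog_coverage(top_n, catalog):
    """Fraction of catalog items recommended to at least one user.

    top_n:   dict mapping user -> list of recommended item ids
    catalog: iterable of all item ids in the training set
    """
    catalog = set(catalog)
    recommended = set()
    for items in top_n.values():
        recommended.update(items)
    return len(recommended & catalog) / len(catalog)

# Illustrative data only: items "d" and "e" are never recommended.
top_n = {"alice": ["a", "b"], "bob": ["b", "c"]}
catalog = ["a", "b", "c", "d", "e"]

print(catalog_coverage(top_n, catalog))  # 3 of 5 items are covered
```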
What is Diversity?
How broad a variety of items the system recommends. Diversity = 1 - S, where S is the average similarity between pairs of recommended items.
Why does Diversity matter?
Avoid repetitiveness and discover user’s new interests
What are considerations of using Diversity?
High diversity is not always good. You can achieve very high Diversity by just recommending completely random items.
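A minimal sketch of the 1 - S formula. The item similarity used here (Jaccard overlap of genre tags) is an assumption for illustration; any pairwise similarity in the range 0 to 1 could be substituted.

```python
from itertools import combinations

def tag_similarity(tags_a, tags_b):
    """Assumed similarity: shared tags divided by all tags (Jaccard)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def diversity(recommended, tags, sim=tag_similarity):
    """1 - S, where S is the mean similarity over all pairs of recommended items."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0  # fewer than two items: no pair to compare
    s = sum(sim(tags[i], tags[j]) for i, j in pairs) / len(pairs)
    return 1.0 - s

# Illustrative data only.
tags = {
    "m1": {"action", "scifi"},
    "m2": {"action", "scifi"},
    "m3": {"romance"},
}

print(diversity(["m1", "m2"], tags))  # 0.0, the two items are identical in tags
print(diversity(["m1", "m3"], tags))  # 1.0, the two items share nothing
```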
What is Novelty?
A measure of how popular the recommended items are: the less popular the items, the more novel the recommendations.
Why does Novelty matter?
Novelty can help to increase user engagement and satisfaction
What are considerations of using Novelty?
Need to find a balance between finding familiar popular items and the serendipitous discovery of new items.
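One common way to put a number on novelty, assumed here, is the mean popularity rank of the recommended items, where rank 1 is the most popular item. The interaction counts and function names are hypothetical.

```python
def popularity_ranks(interaction_counts):
    """Map each item to its popularity rank (1 = most interactions)."""
    ordered = sorted(interaction_counts, key=interaction_counts.get, reverse=True)
    return {item: rank for rank, item in enumerate(ordered, start=1)}

def novelty(recommended, ranks):
    """Mean popularity rank of the recommended items; higher means more novel."""
    return sum(ranks[item] for item in recommended) / len(recommended)

# Illustrative data only: number of interactions per item.
counts = {"a": 100, "b": 50, "c": 5, "d": 1}
ranks = popularity_ranks(counts)

print(novelty(["a", "b"], ranks))  # 1.5, only popular items
print(novelty(["c", "d"], ranks))  # 3.5, only long-tail items
```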
What is Long tail?
Item popularity follows a heavily skewed, steeply decaying distribution: most sales come from a very small number of popular items, but the long tail of niche items together still makes up a large share of sales.
Why does Long tail matter?
Recommender systems can help people discover those items in the long tail that are relevant to their own unique niche interests. If you can do that successfully, then the recommendations your system makes can help new authors get discovered, can help people explore their own passions, and make money for whoever you’re building the system for as well.
What is Responsiveness?
How quickly new user behavior influences the recommendations.
Why does Responsiveness matter?
A more responsive system can catch up with current trends and patterns more easily.
What are considerations of using Responsiveness?
There is a trade-off between responsiveness and complexity: reacting faster to new behavior usually requires a more complex system.
What is Perceived Quality?
Ask your users directly whether they think specific recommendations are good.
Why does Perceived Quality matter?
It gives explicit feedback on the recommender system itself.
What are considerations of using Perceived Quality?
The data is noisy, because there is no clear indicator or standard for what makes a good recommendation.
What is A/B Testing?
Put recommendations from different algorithms in front of different sets of users and measure how they react to the presented recommendations
Why does A/B Testing matter?
One of the best ways to tune a recommender system. The result of an online A/B test is regarded as the metric that matters most.
What are considerations of using A/B Testing?
Complex and expensive to execute and maintain.
What is Content-based filtering? Give 3 Examples
A technique that suggests items to users based on the characteristics or attributes of items users have rated, analyzing the content of the items themselves.
List some popular similarity metrics
Cosine Similarity, Pearson Correlation Coefficient, Jaccard Similarity, Euclidean Distance, etc.
What is Cosine Similarity?
Cosine similarity measures the cosine of the angle between two vectors. It ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction).
Why do you use Cosine Similarity?
Commonly used in text mining, collaborative filtering, and information retrieval.
Suitable for high-dimensional data where the direction of vectors matters more than the magnitude.
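A minimal plain-Python sketch of cosine similarity. Returning 0.0 for a zero vector is a convention assumed here, since the angle is undefined in that case.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)

# Same direction, different magnitude: similarity is (up to rounding) 1.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Orthogonal vectors: 0.0
print(cosine_similarity([1, 0], [0, 1]))
# Opposite directions: -1.0
print(cosine_similarity([1, 0], [-1, 0]))
```

The first pair shows why magnitude does not matter: scaling a vector leaves the angle unchanged.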
What is Euclidean similarity?
Euclidean distance measures the straight-line distance between two points in a multi-dimensional space. Smaller distances indicate greater similarity.
What is the usage of Euclidean similarity?
Suitable for continuous numerical data in a multidimensional space.
Often used in clustering, pattern recognition, and image analysis.
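A minimal sketch of Euclidean distance. The conversion to a similarity score, 1 / (1 + distance), is one common convention and is an assumption here, not the only option.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def euclidean_similarity(u, v):
    """Assumed convention: map distance into (0, 1], where 1 means identical."""
    return 1.0 / (1.0 + euclidean_distance(u, v))

print(euclidean_distance([0, 0], [3, 4]))    # 5.0
print(euclidean_similarity([1, 2], [1, 2]))  # 1.0, identical points
```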
What is Hamming Distance?
The dissimilarity between two strings of equal length. It is defined as the number of positions at which the corresponding symbols (characters or bits) are different.
What is the usage of Hamming Distance?
Primarily used for binary data, such as in error detection and correction codes. Also applicable to categorical data represented as equal-length sequences in which each position has a fixed meaning.
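A minimal sketch of Hamming distance that works on any two equal-length sequences (strings, bit strings, or lists of categories). The function name is hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("1011101", "1001001"))  # 2
```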
What is Jaccard Similarity?
Jaccard similarity measures the overlap between two sets: the size of their intersection divided by the size of their union. It ranges from 0 (no common elements) to 1 (identical sets).
Why do you use Jaccard Similarity?
Commonly used in set similarity, document similarity, and recommendation systems. Suitable for scenarios where the presence or absence of elements is important.
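A minimal sketch of Jaccard similarity on the sets of items two users have interacted with. Treating two empty sets as identical (similarity 1.0) is a convention assumed here.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)

# Illustrative data only: items each user has interacted with.
alice = {"a", "b", "c"}
bob = {"b", "c", "d"}

print(jaccard_similarity(alice, bob))         # 2 shared / 4 total = 0.5
print(jaccard_similarity(alice, {"x", "y"}))  # 0.0, nothing in common
```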