8. Recommender Systems Flashcards
What is the utility function for the formal model?
u: X x S -> R
Where X is a set of customers, S is a set of items, and R is a set of ratings (for example, 1 to 5 stars).
Essentially, we get a rating for each customer/item pairing.
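As a rough illustration of the utility function above, the known part of the utility matrix can be stored as a nested dictionary. The user names, item names, and ratings below are made up for the example, and `utility` is a hypothetical helper, not a standard API.

```python
# Hypothetical toy data: customer -> {item: rating}. Missing pairs are unknown.
known_ratings = {
    "alice": {"matrix": 5, "titanic": 1},
    "bob": {"matrix": 4},
}

def utility(customer, item):
    """Return the known rating u(customer, item), or None if it is unknown."""
    return known_ratings.get(customer, {}).get(item)

print(utility("alice", "matrix"))  # a known rating
print(utility("bob", "titanic"))   # None: this entry must be extrapolated
```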
What are the key problems associated with the formal model for recommender systems?
- Gathering known ratings to fill the utility matrix
- Extrapolating unknown ratings from known ones
- Evaluating extrapolation methods in terms of success or performance
What are the 2 ways we can collect ratings for the utility matrix?
- Explicitly asking people to rate items, which doesn’t work well in practice
- Implicitly learning ratings from user actions
What are the approaches to recommender systems for extrapolating utilities?
- Content-based
- Collaborative
- Latent factor based
Why is extrapolating utilities a problem?
Most people have not rated most items
New items have no ratings
New users have no history
Not much info to extrapolate from
What is the main idea behind a content-based recommendation system?
To recommend items to customer x that are similar to previous items rated highly by that customer
What is an item profile?
A set of features. It is convenient to think of it as a vector with one dimension per feature
What is the prediction heuristic for content-based recommendation systems?
Given a user profile x and item profile i, estimate u(x, i) using cosine similarity between x and i
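A minimal sketch of the heuristic above, assuming both profiles are plain lists with one entry per feature. The three features and their values are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero profile matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical features: [action, romance, sci-fi]
user_profile = [1.0, 0.0, 1.0]
item_profile = [1.0, 0.0, 0.0]

# Estimated utility u(x, i): higher means the item better matches the user.
score = cosine(user_profile, item_profile)
print(round(score, 4))
```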
What is a user profile and how can we calculate it?
A user profile summarizes the profiles of the items the user has rated. We can compute it as the weighted average of the rated item profiles or, as a variation, weight each item profile by the difference between the user’s rating and the average rating for that item
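One way the two weightings above might look in code. The function name and toy data are hypothetical. Dividing by the sum of absolute weights is an assumption made here so that the result stays well-defined when the centered weights are negative; with plain ratings as weights it reduces to an ordinary weighted average.

```python
def build_user_profile(item_profiles, weights):
    """Combine rated item profiles into one user profile vector."""
    total = sum(abs(w) for w in weights)  # assumption: normalize by |weights|
    dims = len(item_profiles[0])
    if total == 0:
        return [0.0] * dims
    return [
        sum(w * profile[d] for w, profile in zip(weights, item_profiles)) / total
        for d in range(dims)
    ]

# Hypothetical data: two rated items, two features each.
item_profiles = [[1.0, 0.0], [0.0, 1.0]]
user_ratings = [5, 1]
item_averages = [3.0, 3.0]  # average rating each item received overall

# Variant 1: weight each item profile by the user's rating.
by_rating = build_user_profile(item_profiles, user_ratings)

# Variant 2: weight by the difference from the item's average rating.
centered_weights = [r - avg for r, avg in zip(user_ratings, item_averages)]
centered = build_user_profile(item_profiles, centered_weights)
```

In the second variant a negative component signals a feature the user tends to dislike, which the plain weighted average cannot express.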
What are the pros of the content-based recommendation system?
- No need for data on other users
- Able to recommend to users with unique tastes
- Able to recommend new and unpopular items
- Able to provide explanations of recommendations by listing the content features that caused it to be selected
What are the cons of the content-based recommendation system?
- Finding the appropriate features is hard
- Recommending to new users is difficult because they have no profile yet
- Prone to overspecialization: it never recommends items outside of the user’s content profile
- Unable to exploit quality judgements from other users
What is the goal of a collaborative filtering system?
Finding a set N of other users whose ratings are similar to user x’s ratings. We estimate x’s ratings based on ratings of users in N
What is the formula for Jaccard Similarity when we have sets of ratings for users A and B?
Sim(A, B) = |rA ∩ rB| / |rA ∪ rB|
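A small sketch of the Jaccard formula, assuming each user's ratings are stored as a dictionary from item to rating. Only the keys (which items were rated) are used; the data is made up.

```python
def jaccard(ratings_a, ratings_b):
    """Jaccard similarity of the sets of items two users have rated."""
    items_a, items_b = set(ratings_a), set(ratings_b)
    union = items_a | items_b
    if not union:
        return 0.0  # assumption: two users with no ratings have similarity 0
    return len(items_a & items_b) / len(union)

# Hypothetical ratings: item -> stars
user_a = {"m1": 4, "m2": 5, "m3": 1}
user_b = {"m1": 5, "m4": 2}

print(jaccard(user_a, user_b))  # 1 shared item out of 4 distinct items
```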
What is the formula for cosine similarity when we have sets of ratings for users A and B?
Sim(A, B) = cos(rA, rB) = (rA · rB) / (||rA|| ||rB||)
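A sketch of the cosine formula on sparse ratings, assuming item-to-rating dictionaries. Items a user has not rated contribute 0, which is exactly the behavior that makes missing ratings look negative. The data is invented.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two rating vectors, missing ratings counted as 0."""
    shared = ratings_a.keys() & ratings_b.keys()
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical ratings: item -> stars
user_a = {"m1": 4, "m2": 5, "m3": 1}
user_b = {"m1": 5, "m4": 2}

sim_ab = cosine_similarity(user_a, user_b)
print(round(sim_ab, 3))
```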
What is centered cosine similarity (Pearson Correlation)?
Same as cosine similarity but we first normalize all ratings by subtracting the mean of the row (mean of a user’s ratings)
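A sketch of the centered version, assuming non-empty item-to-rating dictionaries. After centering, an unrated item's implicit 0 means "average" rather than "bad". The three users are made up so that two agree and one disagrees.

```python
import math

def center(ratings):
    """Subtract the user's mean rating from each rating they gave."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: r - mean for item, r in ratings.items()}

def centered_cosine(ratings_a, ratings_b):
    """Cosine similarity computed on mean-centered ratings."""
    ca, cb = center(ratings_a), center(ratings_b)
    dot = sum(ca[i] * cb[i] for i in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user who gives every item the same rating
    return dot / (norm_a * norm_b)

# Hypothetical users: A and B like m1 over m2, C prefers the opposite.
user_a = {"m1": 4, "m2": 2}
user_b = {"m1": 5, "m2": 1}
user_c = {"m1": 1, "m2": 5}

agree = centered_cosine(user_a, user_b)      # close to +1
disagree = centered_cosine(user_a, user_c)   # close to -1
```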
What is the issue with the Jaccard and cosine similarity measures?
Jaccard ignores the value of the rating and only looks at which items both users have rated
Cosine treats missing ratings as negative by giving them a 0
How do we translate a similarity metric to a recommendation?
rxi = (sum over y in N of sim(x, y) * ryi) / (sum over y in N of sim(x, y))
Where N is the set of the k users most similar to x who have rated item i, rx is the vector of user x’s ratings, and ryi is user y’s rating of item i
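The similarity-weighted average above could be sketched as follows. The function name, the precomputed similarities, and the ratings are all hypothetical, and the sketch assumes the similarities of the chosen neighbors are positive.

```python
def predict_rating(similarities, ratings, item, k=2):
    """Predict a rating for `item` from the k most similar users who rated it.

    similarities: other user -> sim(x, that user), assumed precomputed
    ratings: user -> {item: rating}
    """
    candidates = [
        (sim, ratings[user][item])
        for user, sim in similarities.items()
        if item in ratings.get(user, {})
    ]
    neighbors = sorted(candidates, reverse=True)[:k]  # top-k by similarity
    denom = sum(sim for sim, _ in neighbors)
    if denom == 0:
        return None  # no usable neighbors
    return sum(sim * r for sim, r in neighbors) / denom

# Hypothetical data: dave is similar to x but has not rated m1, so he is skipped.
similarities = {"bob": 0.9, "carol": 0.3, "dave": 0.8}
ratings = {"bob": {"m1": 5}, "carol": {"m1": 1}, "dave": {"m2": 4}}

predicted = predict_rating(similarities, ratings, "m1", k=2)
print(round(predicted, 2))
```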
What is item-item collaborative filtering?
Unlike user-user filtering where we compare user preferences, we want to find similar items to a given item
What is the process for item-item collaborative filtering?
- For item i, find other similar items
- Estimate rating for item i based on ratings for similar items
- We can use the same similarity metrics and prediction functions as the user-user model
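The steps above can be sketched as follows, using plain cosine similarity between item columns and the same weighted-average prediction as the user-user model. All users, items, and ratings are invented, and the helper names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5},
    "u3": {"a": 1, "b": 2, "c": 5},
    "x": {"b": 4, "c": 2},
}

def item_vector(item):
    """Column of the utility matrix: user -> rating for this item."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(v, w):
    dot = sum(v[u] * w[u] for u in v.keys() & w.keys())
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    norm_w = math.sqrt(sum(val * val for val in w.values()))
    return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0

def predict(user, item, k=2):
    """Estimate the user's rating of `item` from their ratings of similar items."""
    target = item_vector(item)
    scored = [
        (cosine(target, item_vector(other)), rating)
        for other, rating in ratings[user].items()
        if other != item
    ]
    top = sorted(scored, reverse=True)[:k]
    denom = sum(sim for sim, _ in top)
    return sum(sim * r for sim, r in top) / denom if denom else None

estimate = predict("x", "a")
print(round(estimate, 2))
```

Item "a" is rated much more like "b" than like "c", so the estimate is pulled toward x's rating of "b".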
What is the upside of the collaborative filtering system?
It works for any kind of item, no feature selection is needed
What are the cons of the collaborative filtering system?
- Cold start problem where there are not enough users in the system to find a match
- Sparsity of the user/ratings matrix means it is hard to find users that have rated the same items
- First rater problem where we cannot recommend an item that hasn’t been rated before
- Popularity bias meaning we cannot recommend items to someone with a unique taste
How do we compute a global baseline estimate and why would we need one?
Overall average rating + (movie’s average rating - overall average) + (user’s average rating - overall average)
We need this in case someone has not rated any movie similar to one we are trying to estimate a rating for
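A tiny worked version of the baseline estimate, assuming the three averages have already been computed. The numbers are illustrative only.

```python
def baseline_estimate(overall_mean, movie_mean, user_mean):
    """Overall mean plus the movie's deviation plus the user's deviation."""
    movie_bias = movie_mean - overall_mean  # how much better or worse the movie is rated
    user_bias = user_mean - overall_mean    # how generous or harsh the user is
    return overall_mean + movie_bias + user_bias

# Hypothetical numbers: the movie is rated 0.5 above average,
# and the user tends to rate 0.2 below average.
estimate = baseline_estimate(overall_mean=3.7, movie_mean=4.2, user_mean=3.5)
print(round(estimate, 2))
```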