7: Recommender Systems Flashcards
What is a recommender system ?
A system which attempts to recommend information items likely to be of interest to the users.
E.g. movies, TV shows, music, books, news or jokes, as recommended by services such as Netflix or Amazon.
What are the two ways to collect data from users ?
Explicit:
- Ask a user to rate an item
- Ask a user to create a (wish) list
Implicit:
- Keep a record of the items a user searches/views/purchases
- Analyse the user’s social network to discover likes and dislikes
What other data can be used ?
- Domain-based knowledge (category)
- Patterns and relations
- Higher level context
Types of recommendation algorithms ?
- Popularity-based
- Demographic-based
- Content-based
What is item co-occurrence ?
‘People who like this also like …’
- For each pair of items, count how often they are bought together (slow, but offline)
- For each item that is being purchased, rank the other items by their co-occurrence counts.
- Recommend the items from the top pairs that were bought together in the past.
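The three steps above can be sketched as follows. This is a minimal illustration, not a production design: the `baskets` data and the function names `build_cooccurrence` and `recommend` are hypothetical, and it assumes the purchase history fits in memory.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(baskets):
    """Offline step: count how many baskets contain each pair of items."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # De-duplicate so each pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, item, k=3):
    """Online step: rank the other items by their co-occurrence with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]

# Hypothetical purchase history: one list of items per order.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "bag"],
    ["mouse", "mousepad"],
    ["laptop", "mouse", "mousepad"],
]
counts = build_cooccurrence(baskets)
top = recommend(counts, "laptop", k=2)
```

Here "mouse" was bought with "laptop" three times and "bag" twice, so those two come out on top. The counting pass is the slow part, which is why it is done offline; the lookup at purchase time is cheap.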
Content-based recommendation ?
Offers products whose features are similar to those of a product the user already liked, e.g. another computer with more RAM at about the same price.
- Hard to guess which features are important for a particular user
- Hard to compare different product features in different product domains
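A rough sketch of the computer example, assuming each product is described by just two numeric features (RAM in GB, price). The catalog, the function names and the choice of equally weighted Euclidean distance are all assumptions made for illustration; choosing the weights well is exactly the "which features matter" difficulty noted above.

```python
import math

def normalise(catalog):
    """Rescale every feature to [0, 1] so that no single feature dominates."""
    columns = list(zip(*catalog.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        name: [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(features, lows, highs)
        ]
        for name, features in catalog.items()
    }

def most_similar(catalog, liked, k=2):
    """Return the k products whose features are closest to those of `liked`."""
    scaled = normalise(catalog)
    others = [name for name in scaled if name != liked]
    # Euclidean distance with equal feature weights (an assumption).
    others.sort(key=lambda name: math.dist(scaled[liked], scaled[name]))
    return others[:k]

# Hypothetical catalog: product -> (RAM in GB, price).
catalog = {
    "A": (8, 700),
    "B": (16, 720),
    "C": (8, 1500),
    "D": (32, 1600),
}
suggestions = most_similar(catalog, "A", k=2)
```

For a user who liked product A, product B ranks first: it has more RAM and costs about the same.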
Collaborative Filtering ?
- Used for spam control
E.g. if an email address is blacklisted by one user, future messages from it can be blocked for all users.
Give 2 types of CF ?
- User-based CF: find the most similar users and calculate their average rating for the new item
- Item-based CF: find the most similar items and calculate the current user’s average rating for them.
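Both variants can be sketched with one helper. The `ratings` data is made up, and the similarity measure (1 / (1 + mean absolute rating difference on co-rated entries)) is only one simple choice assumed here for illustration; cosine or Pearson similarity are common alternatives.

```python
def similarity(a, b):
    """Similarity of two rating dicts, based on the keys they share."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[key] - b[key]) for key in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)

def average_of_nearest(candidates, k):
    """Average the ratings attached to the k highest similarities."""
    nearest = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    if not nearest:
        return None
    return sum(rating for _, rating in nearest) / len(nearest)

def predict_user_based(ratings, user, item, k=2):
    """Average the ratings given to `item` by the k users most like `user`."""
    candidates = [
        (similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    return average_of_nearest(candidates, k)

def predict_item_based(ratings, user, item, k=2):
    """Average `user`'s own ratings of the k items most like `item`."""
    by_item = {}  # item -> {user: rating}
    for u, user_ratings in ratings.items():
        for i, value in user_ratings.items():
            by_item.setdefault(i, {})[u] = value
    target = by_item.get(item, {})
    candidates = [
        (similarity(target, by_item[i]), value)
        for i, value in ratings[user].items()
        if i != item
    ]
    return average_of_nearest(candidates, k)

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Casablanca": 1, "Dune": 4},
    "cat": {"Alien": 4, "Brazil": 4, "Casablanca": 2, "Dune": 5},
    "dan": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1},
}
user_based = predict_user_based(ratings, "ann", "Dune")
item_based = predict_item_based(ratings, "ann", "Dune")
```

In the user-based call, bob and cat are the users closest to ann, so their ratings for "Dune" are averaged. In the item-based call, "Alien" and "Brazil" are the items rated most like "Dune", so ann's own ratings for those two are averaged.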
What are the advantages and disadvantages with CF ?
- Pros: simplicity, because no content analysis is necessary
- Cons:
  - Sparsity (not enough data)
  - Cold-start (no past similar purchases)
  - Users may get recommendations for low-quality products bought by friends
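The first two cons can be measured directly on a ratings table. The sketch below uses made-up data and hypothetical function names; it reports the share of empty user-item cells (sparsity) and lists the users and items with no ratings at all (cold-start cases).

```python
def sparsity(ratings, items):
    """Fraction of user-item cells that hold no rating."""
    total = len(ratings) * len(items)
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - filled / total if total else 0.0

def cold_start_cases(ratings, items):
    """Users who rated nothing, and items nobody has rated."""
    cold_users = [user for user, user_ratings in ratings.items() if not user_ratings]
    rated = {item for user_ratings in ratings.values() for item in user_ratings}
    cold_items = [item for item in items if item not in rated]
    return cold_users, cold_items

# Hypothetical data: user -> {item: rating}, plus the full item catalog.
items = ["Alien", "Brazil", "Casablanca", "Dune"]
ratings = {
    "ann": {"Alien": 5, "Brazil": 4},
    "bob": {"Alien": 3},
    "cat": {},
}
share_empty = sparsity(ratings, items)
cold_users, cold_items = cold_start_cases(ratings, items)
```

With 3 ratings in a 3 x 4 table, three quarters of the cells are empty. User "cat" and the items "Casablanca" and "Dune" have no ratings, so CF has nothing to base a recommendation on for them.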
What factors should be taken into account for designing recommendation systems ?
- Explanations
- As few nonsense recommendations as possible
- Privacy
- Resistance to recommender spam
- No vandalism/offensive/explicit content
What data to collect and how ?
- Find ways to collect as much input as possible without being disruptive
- Many kinds of data can be used to train a system: votes, clicks, page-view time, purchases, tagging, adding a title.
Which of these is more important ?
- Data collection algorithm
- Data collected
Data collected