Exam Flashcards
What is the bottleneck of user-based CF, and how does item-based CF avoid it?
The real-time search for neighbours among a large population of candidate users.
Item-based CF avoids this by computing similarities between items instead of users, which can be done offline.
What is the intuition of item-based CF?
Users are interested in items similar to those they have previously experienced.
What advantage do item-item similarities have over user-user similarities, and why?
They are more “stable”: the set of items changes less than the set of users, allowing for less frequent system updates.
What is the benefit of adjusted cosine similarity in item-based CF?
It accounts for differences in how users rate items (their rating scales) by subtracting each user's mean rating before the cosine is computed.
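A minimal sketch of adjusted cosine similarity may make the card above concrete. The `ratings` dictionary and the function name `adjusted_cosine` are hypothetical, chosen only for illustration. The sketch assumes each user's mean is taken over all of that user's ratings.

```python
import math

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 3, "B": 2, "C": 1},
    "u3": {"A": 4, "B": 5},
}

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j.

    Only users who rated both items contribute; each rating is centred
    on that user's own mean to remove differences in rating scale.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[i] - mean
            dj = user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-raters, or no variation after centring
    return num / math.sqrt(den_i * den_j)

print(adjusted_cosine(ratings, "A", "B"))  # positive: rated alike
print(adjusted_cosine(ratings, "A", "C"))  # negative: rated oppositely
```

With this toy data, items A and B come out positively similar, while A and C come out negatively similar.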
What is the underlying heuristic of CF?
people who agreed or disagreed on items in the past are likely to agree or disagree on future items
What are the steps in the UBCF algorithm?
- data representation
- similarity computation
- neighbourhood formation
- prediction/top-N list
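The four steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not a reference implementation. The toy `ratings` data and all function names are hypothetical. Similarity is assumed to be Pearson over co-rated items, the neighbourhood is assumed to be the top-k positively correlated users, and the prediction is assumed to be a mean-centred weighted average.

```python
import math

# Step 1: data representation. Hypothetical toy data: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
    "dave":  {"A": 5, "B": 3, "D": 5},
}

def mean(user):
    vals = ratings[user].values()
    return sum(vals) / len(vals)

# Step 2: similarity computation (Pearson over co-rated items only).
def pearson(u, v):
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)
                    * sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

# Step 3: neighbourhood formation (top-k positively correlated raters of the item).
def neighbours(user, item, k=2):
    cands = [(pearson(user, v), v) for v in ratings
             if v != user and item in ratings[v]]
    cands = [(s, v) for s, v in cands if s > 0]
    return sorted(cands, reverse=True)[:k]

# Step 4a: prediction (mean-centred weighted average over the neighbourhood).
def predict(user, item, k=2):
    nbrs = neighbours(user, item, k)
    norm = sum(abs(s) for s, _ in nbrs)
    if norm == 0:
        return mean(user)  # fall back to the user's own mean
    offset = sum(s * (ratings[v][item] - mean(v)) for s, v in nbrs) / norm
    return mean(user) + offset

# Step 4b: top-N list (rank the user's unrated items by predicted rating).
def top_n(user, n=1):
    all_items = {i for r in ratings.values() for i in r}
    unseen = all_items - ratings[user].keys()
    return sorted(unseen, key=lambda i: predict(user, i), reverse=True)[:n]

print(neighbours("alice", "D"))
print(predict("alice", "D"))
print(top_n("alice"))
```

In this toy data, carol is negatively correlated with alice and is therefore excluded from alice's neighbourhood for item D.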
What is the main issue with the MSD similarity metric?
It assumes that users rate according to a similar distribution, so a difference in rating scale is treated as a difference in taste.
For MSD similarity, what are two important features of the metric?
- summations run over co-rated items only; with no co-rated items the similarity is set to 0
- results in a value in [0, 1]
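A small sketch of MSD similarity, illustrating both features above and the rating-distribution issue. The function name, the 1 to 5 rating scale, and the normalisation `1 - MSD / (r_max - r_min)^2` are assumptions made for this example. Other normalisations that map the MSD into [0, 1] exist.

```python
def msd_similarity(ru, rv, r_min=1, r_max=5):
    """Mean squared difference similarity between two users' rating dicts.

    Sums run over co-rated items only; with no co-rated items the
    similarity is set to 0. The result lies in [0, 1].
    """
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    msd = sum((ru[i] - rv[i]) ** 2 for i in common) / len(common)
    return 1.0 - msd / (r_max - r_min) ** 2

# Identical ratings give the maximum similarity.
print(msd_similarity({"A": 5, "B": 3}, {"A": 5, "B": 3}))  # 1.0

# Same preference order, but the second user rates one point lower
# throughout: MSD penalises the offset even though tastes agree.
print(msd_similarity({"A": 5, "B": 3}, {"A": 4, "B": 2}))  # 0.9375
```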
For Pearson similarity, what are two important features of the metric?
- summations run over co-rated items only; with no co-rated items the similarity is set to 0
- results in a value in [-1, 1]
What is the benefit of applying significance weighting to Pearson?
It adjusts for the number of co-rated items, devaluing similarities that are based on only a few co-rated items.
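One common form of significance weighting scales the similarity linearly until a threshold number of co-rated items is reached. The sketch below assumes that form. The function name and the default threshold of 50 are illustrative assumptions, not values taken from these notes.

```python
def significance_weighted(sim, n_corated, threshold=50):
    """Scale a similarity by min(n_corated, threshold) / threshold.

    Similarities built on fewer than `threshold` co-rated items are
    shrunk towards 0; the rest are left unchanged.
    """
    return sim * min(n_corated, threshold) / threshold

print(significance_weighted(0.9, 5))   # heavily discounted
print(significance_weighted(0.9, 80))  # unchanged
```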
What impacts the range of cosine similarity results?
The non-negativity of ratings: with non-negative ratings the result lies in [0, 1] rather than [-1, 1].
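A short sketch showing why non-negative ratings restrict the range. The vectors are hypothetical, and the `cosine` helper is written out here rather than taken from any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Raw ratings are non-negative, so every product in the dot product is
# non-negative and the similarity cannot drop below 0.
print(cosine([5, 1, 4], [1, 5, 2]))

# After mean-centring, entries can be negative and so can the similarity.
print(cosine([1, -1], [-1, 1]))
```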
Briefly describe some of the extensions to Pearson correlation.
- Jaccard index: scale similarity weights by the number of co-rated items divided by the number of items in the union of both users' rated items
- default voting: calculate over the union of items, applying a default rating to items that are not co-rated
- case amplification: emphasise weights which are close to 1 and reduce the influence of lower weights
- inverse user frequency (IUF): gives more weight to ratings for niche items (items rated by few users)
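Three of the extensions above can be sketched as small helper functions. The function names are hypothetical. The power-law form of case amplification with exponent `rho = 2.5` and the logarithmic form of IUF are commonly used definitions and are assumed here rather than taken from these notes.

```python
import math

def jaccard_weight(items_u, items_v):
    """|co-rated items| / |union of rated items| for two users' item sets."""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def case_amplify(w, rho=2.5):
    """Keep weights near 1 (or -1) almost intact and shrink small ones."""
    return w * abs(w) ** (rho - 1)

def inverse_user_frequency(n_users, n_raters):
    """log(n_users / n_raters): larger for items rated by few users."""
    return math.log(n_users / n_raters)

print(jaccard_weight({"A", "B", "C"}, {"B", "C", "D"}))  # 0.5
print(case_amplify(0.9), case_amplify(0.3))
print(inverse_user_frequency(100, 2), inverse_user_frequency(100, 50))
```

The Jaccard factor would multiply an existing similarity weight, and the IUF factor would multiply an item's contribution inside the similarity sums.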
CF advantages
- captures quality and taste
- needs no item descriptions/features
- serendipitous recommendations
CF Limitations
- cold start problem
- early rater problem
- sparsity problem
- scalability
What do recommender systems (RS) help drive?
Demand down the long tail, benefiting both consumers and retailers.
What does CF automate?
The “word-of-mouth” process
What is the key difference between CF and content-based recommendation?
The use of the item’s descriptions/features (content): content-based recommendation relies on them, whereas CF relies only on users' ratings.