Recommender systems Flashcards

1
Q

What is a recommender system? What is its task?

A
  • software tools and techniques that provide suggestions for items that are most
    likely of interest to a particular user
    • or predict the preference a user would give to an item
  • can be used for various tasks, but the main ones are still prediction and suggestion
2
Q

What types of data are used in RecSys?

A
  • input
    • user data
    • item data
    • interactions
  • the RecSys processes these, together with context, and outputs recommendations
3
Q

What is the taxonomy of RecSys?

A
  • personalized -> depends on data and user interactions
    • collaborative filtering -> recommend items liked by users with similar tastes (Item-item similarity, User-user similarity)
  • non-personalized -> same items suggested to all users
    • most popular
    • highest rated
4
Q

What are the types of interactions used in RecSys?

A
  • explicit feedback (likes, ratings)
    • hard to collect, requires user effort
    • reliable
  • implicit feedback (views, clicks)
    • easy to collect
    • noisy
5
Q

What is the rating matrix?

A
  • a way to represent rating information
    • rows: users
    • columns: items
    • explicit feedback -> rating value or 0 (missing), implicit -> boolean
  • the matrix is generally very sparse (density often < 0.01%)
  • the user and item distributions are generally long-tailed (most users interact with few items)
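As a sketch, the rating matrix and its density can be illustrated with a toy NumPy array (all values below are invented for illustration):

```python
import numpy as np

# Toy explicit rating matrix: rows = users, columns = items,
# 0 marks a missing (unobserved) rating.
R = np.array([
    [5, 0, 0, 1],
    [0, 4, 0, 0],
    [0, 0, 0, 2],
])

density = np.count_nonzero(R) / R.size  # fraction of observed entries
print(density)  # 4 observed ratings out of 12 cells -> ~0.33
```

A real rating matrix would have millions of rows/columns and far lower density, which is why sparse storage formats are used in practice.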
6
Q

What are some popular recommendation tasks?

A
  • Rating prediction
    • explicit feedback
    • predict missing ratings in the rating matrix
  • Top-N item recommendation
    • implicit feedback
    • predict the N items the user will like the most
    • uses a scoring function to estimate relevance
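A minimal sketch of Top-N selection given a scoring function, assuming relevance scores have already been computed (the numbers are made up):

```python
import numpy as np

def top_n(scores, seen, n=2):
    """Return the n highest-scoring items the user has not consumed yet."""
    s = scores.astype(float).copy()
    s[list(seen)] = -np.inf          # exclude already-consumed items
    return list(np.argsort(-s)[:n])

scores = np.array([0.9, 0.1, 0.7, 0.4])  # hypothetical relevance scores
print(top_n(scores, seen={0}))           # [2, 3]
```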
7
Q

What are the quality indicators for RecSys? Why are they used?

A
  • to tell whether a system is doing a good job
    • Relevance -> ability to recommend items that users like
    • Coverage -> ability to recommend most of the items in the catalogue
    • Novelty -> ability to recommend items unknown to the user
    • Diversity -> ability to diversify the recommended items
    • Serendipity -> ability to surprise the user (items that users would never have been able to discover by themselves)
8
Q

How can a recommender system be evaluated?

A
  • offline
    • does not require the involvement of users
    • used for years
    • based on benchmark datasets (quantitative)
    • user experience not considered
  • online
    • users directly involved
    • evaluation both qualitative and quantitative
    • user experience considered
    • users are not consistent
9
Q

How does online evaluation work?

A
  • direct user feedback (a form for users to fill out, giving feedback on the recommendations)
  • A/B testing (two sets of users, each given a version (base vs. new variation); improvement evaluated via metrics or feedback)
  • controlled (in-lab) experiments
10
Q

What does the cold-start problem refer to in RecSys?

A
  • users (or items) with no interactions in the training data cannot be handled; such users can appear at testing time due to train/test splitting
11
Q

In what ways can the ratings dataset be partitioned for offline evaluation?

A
  • per-user split (hold out some ratings from each user) -> avoids the cold-start problem, can work with users with few ratings
  • global random split of the ratings -> can leave cold-start users in the test set
  • temporal split -> good practice to split training/test on the basis of the timestamp
12
Q

What are some evaluation metrics used in offline evaluation?

A
  • explicit (rating prediction)
    • Mean Absolute Error
    • Mean Squared Error
    • Root Mean Squared Error
  • implicit (top-N)
    • Recall
    • Precision
    • Area Under Curve
    • Average Precision
    • Discounted Cumulative Gain
    • Mean Reciprocal Rank
  • Diversity
    • based on a (dis)similarity measure between recommended items
  • Novelty
    • approximately the inverse of the popularity of retrieved items
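Small hand-rolled versions of a few of these metrics (MAE, RMSE, precision/recall at N), as a sketch with invented values:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error between true and predicted ratings."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def precision_recall_at_n(recommended, relevant, n):
    """Precision@N and Recall@N over a ranked recommendation list."""
    hits = len(set(recommended[:n]) & set(relevant))
    return hits / n, hits / len(relevant)

print(mae([4, 3], [5, 3]))                            # 0.5
print(precision_recall_at_n([7, 2, 9], {2, 5}, n=2))  # (0.5, 0.5)
```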
13
Q

In non-personalized RS how is most popular computed?

A
  • the number of ratings in each column of the rating matrix is computed
  • the item with the highest number of ratings is selected
  • if the user has already interacted with that item, the next most popular one is presented
    • unless re-consumption is allowed
14
Q

In non-personalized RS how is highest rated computed?

A
  • the average rating of each column of the rating matrix is computed
  • the item with the highest average rating is selected
  • if the user has already interacted with that item, the next highest rated one is presented
    • unless re-consumption is allowed
  • generally a normalization (shrinkage) factor is added to give a bias towards popular items
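A sketch of the normalization idea using a damped mean; the shrinkage constant (5 here) and the toy ratings are arbitrary choices for illustration:

```python
import numpy as np

R = np.array([          # toy explicit rating matrix, 0 = missing
    [4, 0, 5],
    [4, 2, 0],
    [0, 2, 0],
], dtype=float)

counts = np.count_nonzero(R, axis=0)
sums = R.sum(axis=0)

plain_mean = sums / counts          # item 2 wins on a single 5-star rating
damped_mean = sums / (counts + 5)   # shrinkage constant 5: an arbitrary choice

print(int(np.argmax(plain_mean)))   # 2
print(int(np.argmax(damped_mean)))  # 0: normalization favours the popular item
```

Without damping, an item with one perfect rating beats an item with many good ratings; the added constant pulls low-count averages down.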
15
Q

What approaches exist to collaborative filtering?

A
  • similarity-wise
    • item-based: based on the similarity between items (items that share many users)
    • user-based: based on the similarity between users (users that share many items)
  • algorithm-wise
    • memory-based: compute the similarity between users or items
    • model-based: learn a model to predict users’ ratings of unrated items
16
Q

What are some similarity metrics used in CF?

A
  • implicit feedback
    • cosine similarity
  • explicit feedback
    • Pearson correlation -> users/items are similar if their ratings deviate from the mean in the same way
    • adjusted cosine similarity
      • accounts for differences in the rating scales; more appropriate to center ratings on the user mean
    • shrinkage: re-weights similarities, penalizing those computed on few ratings
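The Pearson and shrinkage ideas can be sketched as follows (the vectors and the shrinkage constant beta are invented):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation over co-rated entries (0 = missing rating)."""
    mask = (a > 0) & (b > 0)                  # co-rated items only
    a, b = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def shrunk(sim, n_common, beta=10):
    """Shrinkage: down-weight similarities computed on few co-ratings."""
    return sim * n_common / (n_common + beta)

u = np.array([5.0, 3.0, 4.0, 0.0])
v = np.array([4.0, 2.0, 5.0, 1.0])
print(pearson(u, v))            # similarity over the 3 co-rated items
print(shrunk(0.9, n_common=2))  # 0.15: heavily penalized, only 2 co-ratings
```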
17
Q

What are some memory based methods used in CF?

A
  • k-Nearest Neighbours
    • prediction as a weighted combination of the ratings of the most similar users/items
    • works with both implicit and explicit feedback
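A user-based kNN prediction sketch; the ratings and the precomputed user-user similarities below are invented:

```python
import numpy as np

def knn_predict(R, sims, user, item, k=2):
    """User-based kNN: weighted average of the ratings given to `item`
    by the k users most similar to `user` (0 = missing rating)."""
    raters = np.nonzero(R[:, item])[0]                 # users who rated the item
    raters = raters[raters != user]
    neigh = raters[np.argsort(-sims[user, raters])][:k]
    w = sims[user, neigh]
    return float(w @ R[neigh, item] / w.sum())

R = np.array([[5, 0], [4, 3], [1, 5]], dtype=float)
sims = np.array([[1.0, 0.8, 0.1],     # invented user-user similarities
                 [0.8, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
print(knn_predict(R, sims, user=0, item=1))  # ~3.22, dominated by user 1
```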
18
Q

What are some model based methods used in CF?

A
  • matrix factorization
    • latent representations of users and items are learnt from data
    • users and items are mapped into a joint latent factor space of dimensionality k
    • interactions are modeled as the scalar product between user and item vectors
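The scalar-product model can be written down directly (the factor values are invented):

```python
import numpy as np

k = 2                               # latent dimensionality
P = np.array([[0.9, 0.1]])          # user factors, shape (n_users, k)
Q = np.array([[0.8, 0.3]])          # item factors, shape (n_items, k)

r_hat = float(P[0] @ Q[0])          # predicted rating = scalar product
print(r_hat)  # 0.75
```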
19
Q

How is training for matrix factorization done?

A
  • Stochastic Gradient Descent
  • Alternating Least Squares
    • easy to parallelize
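A minimal SGD training loop for matrix factorization, as a sketch; the learning rate, regularization strength, and toy ratings are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]   # (user, item, rating)
n_users, n_items, k = 2, 2, 2
P = rng.normal(scale=0.1, size=(n_users, k))        # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))        # item factors

lr, reg = 0.05, 0.01
for epoch in range(500):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                       # prediction error
        P[u] += lr * (err * Q[i] - reg * P[u])      # gradient steps on the
        Q[i] += lr * (err * P[u] - reg * Q[i])      # regularized squared error

print(round(float(P[0] @ Q[0]), 1))  # close to the observed rating 5.0
```

Each observed rating contributes one stochastic step; ALS would instead fix P and solve for Q in closed form (and vice versa), which is what makes it easy to parallelize.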