C10: RecSys Flashcards

Question 1

Q

what is the CPC model?

Answer

A

the cost-per-click (CPC) model: advertisers pay the search engine for each click on their ad (goal: induce a transaction)

Question 2

Q

what is anchor text?

Answer

A

the visible, clickable text of a hyperlink, which describes the page (URL) that the link points to

This information can be used to bridge the vocabulary gap between query and document: anchor texts may contain query terms that are not in the document itself
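A toy illustration of the vocabulary-gap argument, as a minimal sketch assuming a simple bag-of-words match (the pages, links, and the functions `representation` and `matches` are hypothetical). The page ibm.com never contains the term "ibm", but the anchor text of an incoming link does, so the query only matches once anchor texts are added to the page's representation.

```python
from collections import Counter

# Hypothetical toy web: page body text, and hyperlinks as (source, target, anchor text)
pages = {
    "ibm.com": "international business machines corporation homepage",
    "blog.example": "my favourite computer makers",
}
links = [
    ("blog.example", "ibm.com", "IBM"),
    ("news.example", "ibm.com", "Big Blue"),
]


def representation(url, use_anchors=True):
    """Bag of words for a page: its own text, optionally plus the anchor
    texts of all links pointing to it."""
    terms = pages.get(url, "").lower().split()
    if use_anchors:
        for _source, target, anchor in links:
            if target == url:
                terms += anchor.lower().split()
    return Counter(terms)


def matches(url, query, use_anchors=True):
    """True if every query term occurs in the page representation."""
    rep = representation(url, use_anchors)
    return all(rep[term] > 0 for term in query.lower().split())


print(matches("ibm.com", "ibm", use_anchors=False))  # False: "ibm" is not in the page text
print(matches("ibm.com", "ibm"))                     # True: matched via anchor text
```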

Question 3

Q

2 intuitions about hyperlinks

Answer

A

the anchor text pointing to page B is a good description of page B (textual information)
the hyperlink from A to B represents an endorsement of page B by the creator of page A (quality signal)

both signals contain noise

Question 4

Q

what is PageRank?

Answer

A

a link-analysis technique that assigns every node in the web graph a numerical score between 0 and 1

main intuition: pages visited more frequently in a random walk on the web are the more important pages

Question 5

Q

3 PageRank intuitions

Answer

A

incoming link counts are an important signal: a page is useful if it is cited often
indirect citations also count: if important pages are pointing to a page, the page must be important
smoothing: smooth citations with a random jump step to account for potential citations that have not yet been observed

Question 6

Q

how to estimate PageRank scores?

Answer

A

1. start at a random page
2. jump to another page: with probability alpha to a random page, with probability 1 - alpha by following one of the current page's outgoing links (chosen at random)
3. repeat step 2 until the scores converge
4. the final score of a page is the long-run probability that the surfer is on that page
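The random-surfer procedure can be sketched in Python. Rather than simulating a surfer step by step, this sketch uses power iteration, which computes the same long-run visit probabilities directly. The toy graph, the function name `pagerank`, and the handling of pages without outgoing links (the surfer jumps to a random page) are assumptions for illustration.

```python
def pagerank(graph, alpha=0.15, tol=1e-10, max_iter=1000):
    """Stationary probabilities of the random surfer via power iteration.

    graph: dict mapping each page to the list of pages it links to
           (every link target must also be a key).
    alpha: probability of jumping to a uniformly random page.
    """
    nodes = list(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}  # start: surfer equally likely anywhere
    for _ in range(max_iter):
        # Random jump: every page receives alpha / n of the total mass
        new = {v: alpha / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                # Follow a uniformly chosen outgoing link with probability 1 - alpha
                share = (1 - alpha) * scores[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling page (no out-links): the surfer jumps to a random page
                for w in nodes:
                    new[w] += (1 - alpha) * scores[v] / n
        converged = sum(abs(new[v] - scores[v]) for v in nodes) < tol
        scores = new
        if converged:
            break
    return scores


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print({page: round(score, 3) for page, score in ranks.items()})
```

Here alpha is the random-jump probability, as on this card; many libraries instead parameterise by the damping factor 1 - alpha (commonly 0.85). C ends up highest because it receives links from both A and B.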

Question 7

Q

why do we want query result diversification?

Answer

A

queries are often short and ambiguous: we don’t know what the user wants
if we take query-document similarity as the most important ranking criterion, there may be a lot of redundancy in the top-ranked results

Question 8

Q

what is diversification?

Answer

A

do not consider the relevance of each document in isolation, but consider how relevant the document is in light of:
1. the multiple possible information needs underlying the query
2. the other retrieved documents

goals: maximum coverage and minimum redundancy

Question 9

Q

MMR

Answer

A

Maximal Marginal Relevance: score a relevant document as the document’s estimated relevance with respect to the query, discounted by the document’s maximum similarity with respect to the already selected documents in D_q

$$f_{MMR}(q, d, D_q) = \lambda \, f_1(q, d) - (1 - \lambda) \max_{d_j \in D_q} f_2(d, d_j)$$

f1(q,d) = relevance of d to q
f2(d,d_j) = similarity of d_j to d
lambda ∈ [0, 1] trades off relevance against diversity (lambda = 1: pure relevance ranking)
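A minimal greedy MMR re-ranker, assuming relevance and similarity scores are already available as functions (the toy scores, document ids, query, and the name `mmr` are hypothetical). At each step it picks the candidate with the highest f_MMR given the documents selected so far, so a near-duplicate of an already selected document is pushed down.

```python
def mmr(query, candidates, relevance, similarity, lam=0.5, k=3):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance(q, d)    plays the role of f1 (query-document relevance)
    similarity(d, dj)  plays the role of f2 (document-document similarity)
    lam                is lambda (renamed: `lambda` is a Python keyword)
    """
    selected = []            # D_q: documents chosen so far, in rank order
    remaining = list(candidates)

    def marginal_score(d):
        # Max similarity to any already-selected document (0 for the first pick)
        redundancy = max((similarity(d, dj) for dj in selected), default=0.0)
        return lam * relevance(query, d) - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy scores: d1 and d2 are near-duplicates, d3 covers another aspect
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
sim = {frozenset(("d1", "d2")): 0.95,
       frozenset(("d1", "d3")): 0.1,
       frozenset(("d2", "d3")): 0.1}


def relevance(q, d):
    return rel[d]


def similarity(d, dj):
    return sim[frozenset((d, dj))]


print(mmr("jaguar", rel, relevance, similarity, lam=1.0, k=2))  # ['d1', 'd2']
print(mmr("jaguar", rel, relevance, similarity, lam=0.5, k=2))  # ['d1', 'd3']
```

With lam=1.0 the two near-duplicates fill the top 2; with lam=0.5, d3 replaces d2 because its similarity to the already selected d1 is low.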

Question 10

Q

what is the purpose of recommendation?

Answer

A

economic: the more relevant the suggestion, the more consumption
user: prevent overchoice / information overload

Question 11

Q

what is the recommendation task?

Answer

A

create a ranked personalized list of items, taking the context, situation, and information need into consideration
if the user interacts with a recommended item, the system was successful

often based on previous interactions between users and items

Question 12

Q

5 goals of recommender systems

Answer

A

relevance: users are more likely to consume items they find interesting
novelty: the recommended item is something that the user has not seen in the past
serendipity: the items recommended are somewhat unexpected/surprising (‘lucky discovery’)
diversity: when the recommended list contains items of different types, it is more likely that the user likes one of these items
explainability: does the user understand why the items are recommended?

Question 13

Q

relevance signals: how to know what to recommend? (2 types of feedback)

Answer

A

explicit feedback: like, buy, positive review
implicit feedback: browse, click, watch/listen

Question 14

Q

what are the 3 types of recommender systems models?

Answer

A

collaborative filtering: use user-item interactions (ratings or buying behaviour in ratings matrix)
content-based recommender systems: use attribute information about users and the items
knowledge-based recommender systems: recommendations based on explicitly specified user requirements (explicit profiles, demographics)
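As an illustration of the first type, a minimal user-based collaborative filtering sketch (one of several CF variants; matrix factorisation is another). It predicts a missing entry of the ratings matrix from the ratings of similar users, using cosine similarity between rating vectors. The toy matrix, treating unrated items as 0, and k = 2 neighbours are assumptions.

```python
import math

# Hypothetical toy ratings matrix: rows = users, columns = items 0..3, 0 = not rated
R = {
    "alice": [5, 3, 0, 1],
    "bob":   [4, 0, 0, 1],
    "carol": [1, 1, 0, 5],
    "dave":  [0, 0, 5, 4],
}


def cosine(u, v):
    """Cosine similarity of two rating vectors (unrated entries count as 0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(user, item, k=2):
    """Predict a missing rating as the similarity-weighted average rating
    of the k most similar users who did rate the item."""
    neighbours = sorted(
        ((cosine(R[user], R[other]), R[other][item])
         for other in R if other != user and R[other][item] > 0),
        reverse=True,
    )[:k]
    weight = sum(s for s, _ in neighbours)
    if weight == 0:
        return None  # no informative neighbours
    return sum(s * r for s, r in neighbours) / weight


print(round(predict("bob", 1), 2))
```

Bob's predicted rating for item 1 lands between Carol's 1 and Alice's 3, closer to Alice's because Bob's rating vector is more similar to hers.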

Question 15

Q

content-based methods

Answer

A

do not use data from other users, but item descriptions combined with the user's own ratings
user and item embeddings in the same space
use the user profile as a query to retrieve the most relevant items

useful for new items, but not for new users (no ratings yet) => fall back to knowledge-based models or recommend the most popular items
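A minimal content-based sketch, assuming items are described by hand-made feature vectors (the features, titles, and the neutral rating of 3 are hypothetical). The user profile is built from the user's own ratings in the same feature space as the items and is then used as a query, ranking unrated items by cosine similarity.

```python
import math

# Hypothetical item descriptions as feature vectors: [action, comedy, romance]
items = {
    "die_hard":     [1.0, 0.0, 0.0],
    "mad_max":      [0.9, 0.1, 0.0],
    "notting_hill": [0.0, 0.4, 0.9],
    "hot_fuzz":     [0.7, 0.8, 0.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_profile(ratings, neutral=3):
    """Profile in the item feature space: sum of item vectors, each weighted
    by how far the user's rating is above (or below) a neutral rating."""
    dims = len(next(iter(items.values())))
    profile = [0.0] * dims
    for item, r in ratings.items():
        for i, x in enumerate(items[item]):
            profile[i] += (r - neutral) * x
    return profile


def recommend(ratings, k=2):
    """Use the profile as a query: rank the user's unrated items by similarity."""
    profile = user_profile(ratings)
    unseen = [i for i in items if i not in ratings]
    return sorted(unseen, key=lambda i: cosine(profile, items[i]), reverse=True)[:k]


print(recommend({"die_hard": 5, "notting_hill": 1}))  # ['mad_max', 'hot_fuzz']
```

A new item only needs a feature vector to become recommendable. A new user with no ratings gets an all-zero profile, so every item scores 0 and the ranking carries no information, which is the cold-start case above.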

Question 16

Q

evaluation of recommender systems

Answer

A

offline evaluation with benchmarks:
- RMSE for estimated ratings
- rank correlation between the recommender system's ranking and the ground truth
- ranking metrics for relevance
user studies
online evaluation (A/B testing)
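The offline measures listed above can be sketched as follows. Precision@k stands in here as one example of a ranking metric (others such as nDCG are also common), and the held-out data is hypothetical. The Spearman formula used is the simple version that assumes no tied values.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def spearman(x, y):
    """Spearman rank correlation; this simple formula assumes no tied values."""
    n = len(x)
    rank_x = [sorted(x).index(v) for v in x]
    rank_y = [sorted(y).index(v) for v in y]
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return len(set(ranking[:k]) & set(relevant)) / k


# Hypothetical held-out data for one user
predicted = [4.0, 3.5, 2.0]
actual = [5, 3, 2]
print(rmse(predicted, actual))
print(spearman(predicted, actual))
print(precision_at_k(["a", "b", "c"], {"a", "c"}, k=2))
```

The predictions are off in absolute value (RMSE > 0) yet order the items exactly like the ground truth (Spearman = 1), which is why rating-error and ranking-based measures are reported separately.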

Question 17

Q

disadvantages of offline evaluation

Answer

A

they do not measure the actual user response (the data and the users might evolve over time)
prediction accuracy does not capture important characteristics of recommendations, such as serendipity and novelty

Question 18

Q

filter bubbles / rabbit holes

Answer

A

if users are recommended the information that is most attractive to them, they may get more and more of the same
- more problematic in news recommender systems and social media than in entertainment systems
- important to aim for diversity in recommendations