C10: RecSys Flashcards

1
Q

what is the CPC model?

A

the cost-per-click model: advertisers pay the search engine and get clicks in return (goal: induce a transaction)

2
Q

what is anchor text?

A

descriptive text attached to a hyperlink, describing the page the link points to

This information can be used to bridge the vocabulary gap between query and document: anchor texts may contain query terms that are not in the document itself

3
Q

2 intuitions about hyperlinks

A
  1. the anchor text pointing to page B is a good description of page B (textual information)
  2. the hyperlink from A to B represents an endorsement of page B by the creator of page A (quality signal)

both signals contain noise

4
Q

what is PageRank?

A

technique for link analysis that assigns to every node in the web graph a numerical score between 0 and 1

main intuition: pages visited more frequently in a random walk on the web are the more important pages

5
Q

3 PageRank intuitions

A
  • incoming link counts are an important signal: a page is useful if it is cited often
  • indirect citations also count: if important pages are pointing to a page, the page must be important
  • smoothing: smooth citation counts with a random-jump step to accommodate potential citations that have not yet been observed
6
Q

how to estimate PageRank scores?

A
  1. start at a random page
  2. jump to another page: with probability alpha teleport to a random page, with probability 1 - alpha follow a random outgoing link
  3. repeat step 2 until the scores converge
  4. the final score of a page is the long-run probability that the random surfer is on that page (see the sketch below)
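
A minimal power-iteration sketch of this random-surfer procedure; the toy graph, alpha = 0.15, and the convergence tolerance below are illustrative assumptions, not part of the card:

```python
import numpy as np

def pagerank(out_links, alpha=0.15, tol=1e-8):
    """out_links: dict mapping each page to the pages it links to."""
    nodes = sorted(out_links)
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    # Column-stochastic matrix for the "follow a random outgoing link"
    # step; pages with no outgoing links jump uniformly instead.
    P = np.zeros((n, n))
    for v, targets in out_links.items():
        if targets:
            for t in targets:
                P[idx[t], idx[v]] = 1.0 / len(targets)
        else:
            P[:, idx[v]] = 1.0 / n
    r = np.full(n, 1.0 / n)  # start from the uniform distribution
    while True:
        # with probability alpha teleport, with 1 - alpha follow a link
        r_next = alpha / n + (1 - alpha) * (P @ r)
        if np.abs(r_next - r).sum() < tol:
            return dict(zip(nodes, r_next))
        r = r_next

# Toy web graph: scores sum to 1; heavily cited pages score highest.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```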
7
Q

why do we want query result diversification?

A
  1. queries are often short and ambiguous: we don’t know what the user wants
  2. if we take query-document similarity as the most important ranking criterion, there might be a lot of redundancy in the top-ranked results
8
Q

what is diversification?

A

do not consider the relevance of each document in isolation, but consider how relevant the document is in light of:
1. the multiple possible information needs underlying the query
2. the other retrieved documents

goals: maximum coverage and minimum redundancy

9
Q

MMR

A

Maximal Marginal Relevance: score a candidate document by its estimated relevance to the query, discounted by its maximum similarity to the documents already selected in D_q

f_MMR(q, d, D_q) = lambda * f1(q, d) - (1 - lambda) * max_{d_j in D_q} f2(d, d_j)

f1(q, d) = relevance of d to q
f2(d, d_j) = similarity of d to an already selected document d_j
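
A greedy selection loop is the usual way to apply this score. A minimal sketch, assuming placeholder relevance/similarity functions f1 and f2 and an illustrative lambda_ = 0.7 (none of these are fixed by the card):

```python
def mmr_select(query, candidates, f1, f2, k, lambda_=0.7):
    """Greedily pick k documents, trading relevance against redundancy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(d):
            # maximum similarity to anything already selected (0 if none)
            redundancy = max((f2(d, d_j) for d_j in selected), default=0.0)
            return lambda_ * f1(query, d) - (1 - lambda_) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The first pick is purely relevance-driven (D_q is still empty); every later pick is penalised for resembling documents already in the list.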

10
Q

what is the purpose of recommendation?

A

economic: the more relevant the suggestion, the more consumption
user: prevent overchoice / information overload

11
Q

what is the recommendation task?

A
  • create a ranked personalized list of items, taking the context, situation, and information need into consideration
  • if the user interacts with a recommended item, the system was successful

often based on previous interaction between users and items

12
Q

5 goals of recommender systems

A
  • relevance: users are more likely to consume items they find interesting
  • novelty: the recommended item is something that the user has not seen in the past
  • serendipity: the items recommended are somewhat unexpected/surprising (‘lucky discovery’)
  • diversity: when the recommended list contains items of different types, it is more likely that the user likes one of these items
  • explainability: the user can understand why the items were recommended
13
Q

relevance signals: how to know what to recommend? (2 types of feedback)

A

explicit feedback: like, buy, positive review
implicit feedback: browse, click, watch/listen

14
Q

what are the 3 types of recommender systems models?

A
  • collaborative filtering: use user-item interactions (ratings or buying behaviour collected in a ratings matrix); see the sketch after this list
  • content-based recommender systems: use attribute information about users and the items
  • knowledge-based recommender systems: recommendations based on explicitly specified user requirements (explicit profiles, demographics)
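
A toy user-based collaborative-filtering sketch, as referenced in the first bullet; the ratings matrix, the cosine neighbourhood, and treating 0 as "unrated" are all simplifying assumptions:

```python
import numpy as np

# rows = users, columns = items, 0 = unrated (toy data)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]], dtype=float)

def predict(user, item):
    # cosine similarity between the target user and every user
    sims = np.array([
        R[u] @ R[user] / (np.linalg.norm(R[u]) * np.linalg.norm(R[user]))
        for u in range(len(R))])
    # neighbours who actually rated the item (excluding the user)
    mask = (R[:, item] > 0) & (np.arange(len(R)) != user)
    # similarity-weighted average of the neighbours' ratings
    return R[mask, item] @ sims[mask] / sims[mask].sum()

print(predict(user=0, item=2))  # user 0 resembles user 1 -> low rating
```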
15
Q

content-based methods

A
  • do not use data from other users, but item descriptions combined with the user’s ratings
  • user and item embeddings in the same space
  • use the user profile as a query to retrieve the most relevant items

useful for new items, but not for new users => fall back on knowledge-based models or recommend the most popular items
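
A minimal content-based sketch under the assumptions above: item vectors live in one shared space, the user profile is the mean of the liked items' vectors, and items are ranked by cosine similarity to that profile (all names and vectors illustrative):

```python
import numpy as np

def recommend(item_vecs, liked_ids, k=3):
    # user profile = mean of the vectors of items the user liked
    profile = np.mean([item_vecs[i] for i in liked_ids], axis=0)
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # use the profile as a query; rank the unseen items by similarity
    scores = {i: cosine(v, profile)
              for i, v in item_vecs.items() if i not in liked_ids}
    return sorted(scores, key=scores.get, reverse=True)[:k]

items = {"a": np.array([1.0, 0.0]), "b": np.array([0.9, 0.1]),
         "c": np.array([0.0, 1.0])}
print(recommend(items, liked_ids=["a"], k=1))  # -> ['b']
```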

16
Q

evaluation of recommender systems

A
  1. offline evaluation with benchmarks:
    - RMSE for estimated ratings
    - rank correlation between the RecSys ranking and the ground truth
    - ranking metrics for relevance
  2. user studies
  3. online evaluation (A/B testing)
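
For illustration, the RMSE of predicted vs. held-out ratings can be computed as below; the numbers are made up:

```python
import numpy as np

def rmse(predicted, actual):
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

print(rmse([4.2, 3.1, 5.0], [4.0, 3.5, 4.5]))  # ~0.39
```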
17
Q

disadvantages of offline evaluation

A
  • they do not measure the actual user response (the data and the users might evolve over time)
  • prediction accuracy does not capture important characteristics of recommendations, such as serendipity and novelty
18
Q

filter bubbles / rabbit holes

A

if users keep getting recommended the information that is most attractive to them, they may get more and more of the same
- more problematic in news recommender systems and social media than in entertainment systems
- important to aim for diversity in recommendations