W10 Web Search & Recommender System Flashcards

1
Q

what is the CPC model?

A

the cost per click model: advertisers pay the search engine and get clicks in return (goal: induce a transaction)

2
Q

what is anchor text?

A

a descriptive text of the URL that the hyperlink points to

This information can be used to bridge the vocabulary gap between query and document: anchor texts may contain query terms that are not in the document itself

3
Q

2 intuitions about hyperlinks

A
  1. the anchor text pointing to page B is a good description of page B (textual information)
  2. the hyperlink from A to B represents an endorsement of page B by the creator of page A (quality signal)
  both signals contain noise
4
Q

what is PageRank?

A

technique for link analysis that assigns to every node in the web graph a numerical score between 0 and 1

main intuition: pages visited more frequently in a random walk on the web are the more important pages

5
Q

3 PageRank intuitions

A
  • incoming link counts are an important signal: a page is useful if it is cited often
  • indirect citations also count: if important pages are pointing to a page, the page must be important
  • smoothing: smooth citations with a random step to accommodate potential citations that have not yet been observed
6
Q

how to estimate PageRank scores?

A
  1. start at a random page
  2. jump to another page: with probability alpha to a random page, with probability 1 - alpha follow a random outgoing link
  3. repeat step 2 until convergence of the scores
  4. the final score of a page is the long-run probability that the surfer is at that page
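The four steps above can be sketched as a small Monte-Carlo simulation; the toy graph, alpha value, and step count below are illustrative choices, not from the card:

```python
import random
from collections import Counter

def pagerank_walk(graph, alpha=0.15, steps=100_000, seed=0):
    """Estimate PageRank scores by simulating the random surfer.

    graph maps each page to its list of outgoing links; alpha is the
    teleport ('random jump') probability.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    page = rng.choice(nodes)                 # 1. start at a random page
    visits = Counter()
    for _ in range(steps):                   # 2./3. keep jumping
        if rng.random() < alpha or not graph[page]:
            page = rng.choice(nodes)         # teleport (also escapes dead ends)
        else:
            page = rng.choice(graph[page])   # follow a random outgoing link
        visits[page] += 1
    # 4. score = fraction of steps the surfer spent on each page
    return {n: visits[n] / steps for n in nodes}

# toy graph: C is cited by both A and B, so C ends up with the highest score
scores = pagerank_walk({"A": ["C"], "B": ["C"], "C": ["A"]})
```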
7
Q

why do we want query result diversification?

A

queries are often short and ambiguous: we don’t know what the user wants

if we take query-document similarity as the most important ranking criterion, there may be a lot of redundancy in the top-ranked results

8
Q

what is diversification?

A

do not consider the relevance of each document in isolation,
but consider how relevant the document is in light of:
1. the multiple possible information needs underlying the query
2. the other retrieved documents

goals: maximum coverage and minimum redundancy

9
Q

MMR

A

Maximal Marginal Relevance: score a relevant document as the document’s estimated relevance with respect to the query, discounted by the document’s maximum similarity with respect to the already selected documents in D_q

f_MMR(q, d, D_q) = lambda * f1(q, d) - (1 - lambda) * max_{d_j in D_q} f2(d, d_j)

f1(q, d) = relevance of d to q
f2(d, d_j) = similarity of already-selected document d_j to d
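A minimal sketch of greedy MMR selection, using plain dot products as stand-ins for f1 and f2; the toy documents and lambda value are made up:

```python
def mmr_rank(query_vec, doc_vecs, lam=0.7, k=3):
    """Greedy MMR: repeatedly pick the document maximising
    lam * f1(q, d) - (1 - lam) * max_{d_j in selected} f2(d, d_j),
    with plain dot products standing in for f1 and f2."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    selected = []                         # D_q: documents chosen so far
    remaining = dict(doc_vecs)            # doc id -> feature vector
    while remaining and len(selected) < k:
        def score(d):
            rel = dot(query_vec, remaining[d])                 # f1(q, d)
            red = max((dot(remaining[d], doc_vecs[s])          # max f2(d, d_j)
                       for s in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# d2 duplicates d1; with lam = 0.3 the diverse d3 outranks the duplicate
docs = {"d1": (1, 1), "d2": (1, 1), "d3": (1, 0)}
order = mmr_rank((1, 1), docs, lam=0.3, k=3)   # ["d1", "d3", "d2"]
```

Lowering lambda weights the redundancy discount more heavily, which is what demotes the duplicate here.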

10
Q

what is the purpose of recommendation?

A

economic: the more relevant the suggestion, the more consumption
user: prevent overchoice / information overload

11
Q

what is the recommendation task?

A
  • create a ranked personalized list of items, taking the context, situation, and information need into consideration
  • if the user interacts with a recommended item, the system was successful
    often based on previous interaction between users and items
12
Q

5 goals of recommender systems

A

*relevance: users are more likely to consume items they find interesting
*novelty: the recommended item is something that the user has not seen in the past
*serendipity: the items recommended are somewhat unexpected/surprising (‘lucky discovery’)
*diversity: when the recommended list contains items of different types, it is more likely that the user likes one of these items
*explainability: does the user understand why the items are recommended?

13
Q

relevance signals: how to know what to recommend?

A

explicit feedback: like, buy, positive review
implicit feedback: browse, click, watch/listen

14
Q

what are the 3 types of recommender systems models?

A

collaborative filtering: use user-item interactions (ratings or buying behaviour in a ratings matrix)

content-based recommender systems: use attribute information about users and the items

knowledge-based recommender systems: recommendations based on explicitly specified user requirements (explicit profiles, demographics)

15
Q

content-based methods

A

do not use data from other users, but item descriptions combined with the user’s ratings

user and item embeddings in the same space

use the user profile as a query to retrieve the most relevant items

useful for new items, but not for new users => fall back on knowledge-based models or recommend the most popular items
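A minimal sketch of the "user profile as query" idea, assuming hypothetical genre-attribute vectors and cosine similarity (neither is specified in the card):

```python
import math

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def recommend(user_profile, items, k=2):
    """Treat the user profile as a query: rank items by cosine
    similarity of their attribute vectors to the profile."""
    return sorted(items, key=lambda i: cosine(user_profile, items[i]),
                  reverse=True)[:k]

# made-up attribute space: (action, comedy, drama)
items = {
    "film_a": (1.0, 0.0, 0.2),
    "film_b": (0.0, 1.0, 0.1),
    "film_c": (0.9, 0.1, 0.0),
}
profile = (1.0, 0.0, 0.1)   # this user likes action
top = recommend(profile, items)
```

A brand-new item only needs an attribute vector to be rankable, which is why content-based methods handle new items but not new users.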

16
Q

evaluation of recommender systems

A
  • offline evaluation with benchmarks:
    • RMSE for estimated ratings
    • rank correlation between the recommender's ranking and the ground truth
    • ranking metrics for relevance
  • user studies
  • online evaluation (A/B testing)
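The RMSE metric from the offline-evaluation bullet can be computed directly; the ratings below are invented:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and held-out ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

# predicted vs. held-out 1-5 star ratings
error = rmse([4.0, 3.5, 2.0], [5, 3, 2])   # ≈ 0.645
```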
17
Q

disadvantages of offline evaluation

A

they do not measure the actual user response (the data and the users might evolve over time)

prediction accuracy does not capture important characteristics of recommendations, such as serendipity and novelty

18
Q

filter bubbles / rabbit holes

A

if users are recommended the information that is most attractive to them, they may get more and more of the same
- more problematic in news recommender systems and social media than in entertainment systems
- important to aim for diversity in recommendations