W10 Web Search & Recommender System Flashcards
what is the CPC model?
the cost-per-click model: advertisers pay the search engine for each click on their ad (goal: induce a transaction)
what is anchor text?
the (clickable) text of a hyperlink, which describes the page (URL) that the hyperlink points to
This information can be used to bridge the vocabulary gap between query and document: anchor texts may contain query terms that are not in the document itself
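A minimal sketch of how anchor text can bridge the vocabulary gap, assuming a toy boolean index: the terms of every incoming anchor text are added to the target page's representation. The pages, links, and the helpers build_index and matches are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy web graph: (source page, target page, anchor text)
links = [
    ("a.html", "ibm.html", "IBM homepage"),
    ("b.html", "ibm.html", "big blue computer company"),
    ("c.html", "news.html", "latest news"),
]

# Hypothetical page bodies; ibm.html never mentions "big blue" itself
bodies = {
    "ibm.html": "welcome to our corporate site",
    "news.html": "today's headlines",
}

def build_index(bodies, links):
    """Map each page to the set of terms from its body plus its incoming anchor texts."""
    terms = defaultdict(set)
    for page, text in bodies.items():
        terms[page].update(text.lower().split())
    for _source, target, anchor in links:
        terms[target].update(anchor.lower().split())
    return terms

def matches(index, query):
    """Return the pages that contain every query term (simple boolean AND)."""
    q = query.lower().split()
    return sorted(p for p, t in index.items() if all(w in t for w in q))

index = build_index(bodies, links)
print(matches(index, "big blue"))  # ['ibm.html']
```

Without the anchor-text terms (build_index(bodies, [])), the query "big blue" matches nothing, because the body of ibm.html never uses those words.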
2 intuitions about hyperlinks
- the anchor text pointing to page B is a good description of page B (textual information)
- the hyperlink from A to B represents endorsement of page B, by the creator of page A (quality signal)
both signals contain noise
what is PageRank?
technique for link analysis that assigns to every node in the web graph a numerical score between 0 and 1
main intuition: pages visited more frequently in a random walk on the web are the more important pages
3 PageRank intuitions
- incoming link counts are an important signal: a page is useful if it is cited often
- indirect citations also count: if important pages are pointing to a page, the page must be important
- smoothing: smooth citations with a random jump step to account for potential citations that have not been observed yet
how to estimate PageRank scores?
- start at a random page
- jump to another page: with probability alpha to a random page, with probability 1 - alpha by following one of the current page's outgoing links (chosen uniformly)
- repeat step 2 until the scores converge
- final score: the long-run probability that the surfer is on the page
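The random-surfer procedure above can be sketched with power iteration, which computes the same long-run probabilities by repeatedly redistributing score mass instead of simulating one surfer step by step. Assumptions not stated on the card: alpha = 0.15 as a default, convergence measured as the total absolute change in scores, and pages without outgoing links (dangling pages) jumping to a random page. The function name pagerank is hypothetical.

```python
def pagerank(graph, alpha=0.15, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    graph: dict mapping each page to the list of pages it links to.
    alpha: probability of jumping to a random page (teleport).
    Assumption: dangling pages (no outgoing links) teleport uniformly.
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}  # start: surfer equally likely on any page
    for _ in range(max_iter):
        # score mass sitting on dangling pages is spread over all pages
        dangling = sum(rank[p] for p in nodes if not graph.get(p))
        new = {p: alpha / n + (1 - alpha) * dangling / n for p in nodes}
        for p in nodes:
            out = graph.get(p, [])
            for t in out:
                # follow an outgoing link, each chosen with equal probability
                new[t] += (1 - alpha) * rank[p] / len(out)
        if sum(abs(new[p] - rank[p]) for p in nodes) < tol:
            return new
        rank = new
    return rank

pr = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(max(pr, key=pr.get))  # C
```

The scores always sum to 1. Page C, which collects links from A, B, and D, ends up highest, while D, with no incoming links, only receives the teleport share alpha / n.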
why do we want query result diversification?
queries are often short and ambiguous: we don’t know what the user wants
if we take query-document similarity as the most important ranking criterion, there may be a lot of redundancy in the top-ranked results
what is diversification?
do not consider the relevance of each document in isolation, but consider how relevant the document is in light of:
1. the multiple possible information needs underlying the query
2. the other retrieved documents
goals: maximum coverage and minimum redundancy
MMR
Maximal Marginal Relevance: score a candidate document by its estimated relevance to the query, discounted by its maximum similarity to the documents already selected in D_q
f_MMR(q,d,D_q) = lambda * f1(q,d) - (1 - lambda) * max_{d_j in D_q} f2(d,d_j)
f1(q,d) = relevance of d to q
f2(d,d_j) = similarity of d_j to d
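MMR is commonly applied greedily: at each step the remaining candidate with the highest f_MMR is appended to D_q. A minimal sketch under that assumption, for the ambiguous query "jaguar", with hypothetical relevance scores and a hypothetical toy similarity (1 if two documents share a topic, else 0):

```python
def mmr_score(d, selected, rel, sim, lam):
    """f_MMR(q, d, D_q) = lam * f1(q, d) - (1 - lam) * max over d_j in D_q of f2(d, d_j)."""
    redundancy = max((sim(d, dj) for dj in selected), default=0.0)  # 0 while D_q is empty
    return lam * rel[d] - (1 - lam) * redundancy

def mmr_rerank(candidates, rel, sim, k, lam=0.5):
    """Greedily build D_q by repeatedly taking the candidate with the highest MMR score."""
    selected = []  # D_q, in rank order
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda d: mmr_score(d, selected, rel, sim, lam))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical f1(q, d) scores and topics
rel = {"jaguar_car_1": 0.9, "jaguar_car_2": 0.85, "jaguar_animal": 0.7}
topic = {"jaguar_car_1": "car", "jaguar_car_2": "car", "jaguar_animal": "animal"}

def sim(a, b):
    """Toy f2(d, d_j): 1.0 for documents on the same topic, else 0.0."""
    return 1.0 if topic[a] == topic[b] else 0.0

print(mmr_rerank(list(rel), rel, sim, k=3))
# ['jaguar_car_1', 'jaguar_animal', 'jaguar_car_2']
```

With lam = 1.0 the penalty disappears and the order is pure relevance (car, car, animal); with lam = 0.5 the second car page is pushed below the animal page, covering both interpretations of the query in the top two.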
what is the purpose of recommendation?
economic: the more relevant the suggestion, the more consumption
user: prevent overchoice / information overload
what is the recommendation task?
- create a ranked personalized list of items, taking the context, situation, and information need into consideration
- if the user interacts with a recommended item, the system was successful
often based on previous interactions between users and items
5 goals of recommender systems
*relevance: users are more likely to consume items they find interesting
*novelty: the recommended item is something that the user has not seen in the past
*serendipity: the items recommended are somewhat unexpected/surprising (‘lucky discovery’)
*diversity: when the recommended list contains items of different types, it is more likely that the user likes one of these items
*explainability: the user understands why the items were recommended
relevance signals: how to know what to recommend?
explicit feedback: like, buy, positive review
implicit feedback: browse, click, watch/listen
what are the 3 types of recommender systems models?
collaborative filtering: use user-item interactions (ratings or buying behaviour, stored in a ratings matrix)
content-based recommender systems: use attribute information about the users and items
knowledge-based recommender systems: recommendations based on explicitly specified user requirements (explicit profiles, demographics)
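A minimal sketch of collaborative filtering on a ratings matrix, assuming the user-based neighbourhood variant (the card does not specify one): a missing rating is predicted as the similarity-weighted average of other users' ratings for that item, using cosine similarity over co-rated items. The users, items, ratings, and helper names are hypothetical.

```python
import math

# Hypothetical ratings matrix: user -> {item: rating}; a missing key means "not rated"
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5, "titanic": 2, "interstellar": 5},
    "carol": {"matrix": 1, "inception": 2, "titanic": 5, "interstellar": 2},
}

def cosine(u, v):
    """Cosine similarity of two users' rating vectors over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        if s <= 0:
            continue  # skip neighbours with no (or negative) similarity
        num += s * r[item]
        den += s
    return num / den if den else None  # None: no neighbour rated the item

print(predict(ratings, "alice", "interstellar"))
```

Alice's predicted rating for interstellar lands close to Bob's 5 rather than Carol's 2, because Alice's past ratings resemble Bob's far more than Carol's.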
content-based methods
do not use data from other users, but item descriptions combined with the user's own ratings
user and item embeddings in the same space
use the user profile as a query to retrieve the most relevant items
useful for new items, but not for new users => for new users, fall back on knowledge-based models or recommend the most popular items
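A minimal sketch of the content-based approach, assuming the user embedding is the mean of the embeddings of items the user liked and that relevance is cosine similarity; the item vectors (hypothetical action/romance/sci-fi features) and helper names are illustrative only.

```python
import math

# Hypothetical item embeddings: [action, romance, sci-fi]
items = {
    "matrix":       [0.9, 0.0, 0.9],
    "interstellar": [0.3, 0.2, 1.0],
    "titanic":      [0.1, 1.0, 0.0],
    "notebook":     [0.0, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def user_profile(liked, items):
    """User embedding = mean of the embeddings of the items the user liked."""
    dims = len(next(iter(items.values())))
    return [sum(items[i][k] for i in liked) / len(liked) for k in range(dims)]

def recommend(liked, items, n=2):
    """Use the profile as a query: rank unseen items by cosine similarity to it."""
    if not liked:
        # new user: no profile to query with (fall back on popular or knowledge-based items)
        raise ValueError("no liked items: cannot build a content-based profile")
    profile = user_profile(liked, items)
    unseen = [i for i in items if i not in liked]
    return sorted(unseen, key=lambda i: cosine(profile, items[i]), reverse=True)[:n]

print(recommend(["matrix"], items))  # ['interstellar', 'titanic']
```

A new item only needs its own embedding (for example items["dune"] = [0.6, 0.1, 0.9]) to be recommendable immediately, with no ratings from anyone; a user who has liked nothing has no profile, which is why the sketch raises an error instead of guessing.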