Exam Flashcards
What is the bottleneck of user-based CF and how does item-based CF avoid it?
The real-time search for neighbours among a large population of potential neighbours.
Item-based CF avoids this by computing similarities between items instead of users.
What is the intuition of item-based CF?
users are interested in items similar to those previously experienced
What is the edge that item-item similarities have over user-based and why?
They are more “stable” as the domain of items changes less than users, allowing for less frequent system updates.
What is the benefit of adjusted cosine similarity in item-based CF?
It accounts for differences in how users rate items
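A minimal sketch of adjusted cosine similarity between two items, assuming ratings are stored as a `{user: {item: rating}}` dictionary (the data layout, function name, and toy ratings are illustrative assumptions). Each rating is centred on its user's own mean before the cosine is taken, which is how differences in rating habits are accounted for.

```python
from math import sqrt

def adjusted_cosine(ratings, item_i, item_j):
    """Adjusted cosine between two items; ratings is {user: {item: rating}}."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        # Only users who rated both items contribute.
        if item_i in user_ratings and item_j in user_ratings:
            user_mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[item_i] - user_mean
            dj = user_ratings[item_j] - user_mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-raters, or no variation around the user means
    return num / sqrt(den_i * den_j)

# Toy data (hypothetical): bob rates everything lower than alice and carol.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 3, "B": 2, "C": 1},
    "carol": {"A": 4, "B": 5, "C": 2},
}
print(adjusted_cosine(ratings, "A", "B"))  # positive: A and B are liked together
print(adjusted_cosine(ratings, "A", "C"))  # negative: C is disliked by fans of A
```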
What is the underlying heuristic of CF?
people who agreed or disagreed on items in the past are likely to agree or disagree on future items
What are the steps in the UBCF algorithm?
- Data representation
- similarity computation
- neighbourhood formation
- prediction/top-N list
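The four steps above can be sketched end to end as follows. This is an illustrative toy under stated assumptions, not a reference implementation: the ratings, the helper names, the choice of Pearson similarity, the neighbourhood size `k`, and the mean-offset (Resnick-style) prediction formula are all choices made for the example.

```python
from math import sqrt

# Step 1: data representation, a user -> {item: rating} mapping (toy data).
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5, "D": 5},
    "u3": {"A": 1, "B": 5, "C": 2, "D": 1},
    "u4": {"A": 5, "B": 2, "C": 4, "D": 4},
}

def mean(user):
    return sum(ratings[user].values()) / len(ratings[user])

def pearson(a, b):
    # Step 2: similarity computation over co-rated items only.
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    ma = sum(ratings[a][i] for i in common) / len(common)
    mb = sum(ratings[b][i] for i in common) / len(common)
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    da = sum((ratings[a][i] - ma) ** 2 for i in common)
    db = sum((ratings[b][i] - mb) ** 2 for i in common)
    return num / sqrt(da * db) if da and db else 0.0

def neighbours(target, item, k):
    # Step 3: neighbourhood formation, the k most similar users
    # (positive similarity only) who have rated the item.
    scored = [(pearson(target, u), u) for u in ratings
              if u != target and item in ratings[u]]
    return sorted((s for s in scored if s[0] > 0), reverse=True)[:k]

def predict(target, item, k=2):
    # Step 4a: prediction as the target's mean plus the similarity-weighted
    # average of the neighbours' deviations from their own means.
    nbrs = neighbours(target, item, k)
    norm = sum(s for s, _ in nbrs)
    if norm == 0:
        return mean(target)
    return mean(target) + sum(s * (ratings[u][item] - mean(u)) for s, u in nbrs) / norm

def top_n(target, n=3, k=2):
    # Step 4b: top-N list, unseen items ranked by predicted rating.
    unseen = {i for r in ratings.values() for i in r} - set(ratings[target])
    scored = sorted(((predict(target, i, k), i) for i in unseen), reverse=True)
    return [item for _, item in scored[:n]]

print(predict("u1", "D"), top_n("u1"))
```

In this toy data `u3` has opposite tastes to `u1` and is excluded from the neighbourhood, so the prediction for item `D` is driven by `u4` and `u2`.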
What is the main issue with the MSD similarity metric?
It assumes that users rate according to a similar distribution.
For MSD similarity, what are two important features of the metric?
- summations run over co-rated items only; with no co-rated items the similarity is set to 0
- results in a value in [0, 1]
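A small sketch of one common normalised form of MSD similarity, assuming a 1 to 5 rating scale and per-user `{item: rating}` dictionaries (the scale bounds, the normalisation by the squared rating range, and the function name are assumptions for illustration). It also shows the distribution issue: a user who rates everything two points lower is penalised despite having the same preferences.

```python
def msd_similarity(ra, rb, r_min=1, r_max=5):
    """1 minus the normalised mean squared difference over co-rated items."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0  # no co-rated items: similarity is set to 0
    msd = sum((ra[i] - rb[i]) ** 2 for i in common) / len(common)
    return 1.0 - msd / (r_max - r_min) ** 2

generous = {"A": 5, "B": 4}
harsh = {"A": 3, "B": 2}  # same ordering of items, shifted down by 2
print(msd_similarity(generous, generous))  # 1.0
print(msd_similarity(generous, harsh))     # 0.75
```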
For Pearson similarity, what are two important features of the metric?
- summations run over co-rated items only; with no co-rated items the similarity is set to 0
- results in a value in [-1, 1]
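A minimal sketch of Pearson similarity between two users, assuming per-user `{item: rating}` dictionaries and means taken over the co-rated items (some formulations use each user's overall mean instead; the function name and toy data are illustrative).

```python
from math import sqrt

def pearson(ra, rb):
    """Pearson correlation over co-rated items; 0 if it is undefined."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    da = sum((ra[i] - ma) ** 2 for i in common)
    db = sum((rb[i] - mb) ** 2 for i in common)
    if da == 0 or db == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return num / sqrt(da * db)

generous = {"A": 5, "B": 4, "C": 3}
harsh = {"A": 3, "B": 2, "C": 1}     # same preferences, lower scale
opposite = {"A": 1, "B": 2, "C": 3}  # reversed preferences
print(pearson(generous, harsh))      # close to 1
print(pearson(generous, opposite))   # close to -1
```

Unlike a squared-difference measure, the correlation is unaffected by one user rating consistently higher or lower than the other.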
What is the benefit of significance weighting to Pearson?
It adjusts for the number of co-rated items
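One way to sketch significance weighting is to scale the similarity linearly until the number of co-rated items reaches a threshold. The threshold of 50 and the function name below are assumptions for illustration.

```python
def significance_weighted(sim, n_corated, threshold=50):
    """Devalue similarities that rest on few co-rated items."""
    return sim * min(n_corated, threshold) / threshold

print(significance_weighted(0.9, 5))    # heavily discounted
print(significance_weighted(0.9, 200))  # unchanged once past the threshold
```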
What impacts the range of cosine similarity results?
the non-negativity of ratings
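A short illustration of why non-negativity matters, using made-up rating vectors: with raw non-negative ratings the dot product cannot be negative, so the cosine stays in [0, 1], whereas mean-centred ratings can produce negative values.

```python
from math import sqrt

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def centre(x):
    m = sum(x) / len(x)
    return [a - m for a in x]

raw_a, raw_b = [5, 1, 4], [1, 5, 2]  # two users with opposing tastes
print(cosine(raw_a, raw_b))                  # still positive on raw ratings
print(cosine(centre(raw_a), centre(raw_b)))  # negative once mean-centred
```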
Briefly describe some of the extensions to Pearson correlation
- Jaccard index: scale similarity weights by the number of co-rated items between two users divided by the size of the union of the items they rated
- default voting: calculate over the union of items applying a default to non-co-rated items
- case amplification: emphasise weights which are close to 1 and reduce the influence of lower weights
- inverse user frequency (IUF): gives more weight to ratings for niche items
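Three of these extensions are simple enough to sketch as small helper functions. The names, the amplification exponent `rho=2.5`, and the use of the natural logarithm in the IUF weight are assumptions for illustration.

```python
from math import log

def jaccard_weighted(sim, items_a, items_b):
    """Scale a similarity by |co-rated items| / |union of rated items|."""
    union = items_a | items_b
    return sim * len(items_a & items_b) / len(union) if union else 0.0

def case_amplify(weight, rho=2.5):
    """Keep weights near 1 almost intact, shrink smaller ones (sign preserved)."""
    return weight * abs(weight) ** (rho - 1)

def inverse_user_frequency(n_users, n_users_who_rated_item):
    """Items rated by fewer users (niche items) get a larger weight."""
    return log(n_users / n_users_who_rated_item)

print(jaccard_weighted(0.8, {"A", "B", "C"}, {"B", "C", "D"}))  # 0.8 * 2/4
print(case_amplify(0.9), case_amplify(0.1))
print(inverse_user_frequency(100, 5), inverse_user_frequency(100, 50))
```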
CF advantages
- captures quality and taste
- no item descriptions/features required
- serendipitous recommendations
CF Limitations
- cold start problem
- early rater problem
- sparsity problem
- scalability
What do RS help drive?
demand down the long-tail; benefits to both consumers and retailers alike
What does CF automate?
The “word-of-mouth” process
What is the key difference between CF and Content-based recommendation?
The use of the item’s descriptions/features (content)
How is document-document similarity calculated in Content-based?
The cosine of the angle between the document’s vectors
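A minimal sketch of document-document cosine similarity, assuming documents are represented as raw term-count vectors built by whitespace tokenisation (a real system would typically apply term weighting first; the function name and example texts are illustrative).

```python
from collections import Counter
from math import sqrt

def doc_cosine(doc_a, doc_b):
    """Cosine of the angle between two term-count vectors."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(doc_cosine("space adventure film", "space adventure comedy"))  # 2 of 3 terms shared
print(doc_cosine("space adventure film", "romantic drama"))          # no shared terms
```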
What is case-based recommendation?
A form of content-based recommendation which represents items using a well-defined set of features and feature values
List sources of recommendation knowledge and give examples of each
- transactional and behavioural data: clicks, purchases, likes.
- content and meta data: text, features, tags.
- experiential data: user-generated opinions.
List some properties of consumer reviews
- ubiquitous
- abundant
- usually independent
- often insightful
Do reviews matter?
Yes. Research shows that reviews help users to make better decisions. They increase conversion rates and improve satisfaction.
What are some considerations when making recommendations and ranking them
- business imperatives: e.g. promoting items
- domain
- the influence of particular items
How are non-personalised recommendations usually presented?
in the form of a top-N ranked list
Personalised recommendation considerations
- acquiring users’ personal information
- recommendation output
- personalisation: ephemeral or persistent
What is ephemeral personalisation?
matching current activity
What is persistent personalisation?
matching long-term interests
Benefits of RSs?
- turning web browsers into buyers
- cross/up-selling
- customer loyalty
What are the two main approaches to content-based recommendation and how do you distinguish between them?
- traditional content-based (unstructured)
- case-based (structured)
List term-weighting approaches
- term frequency
- normalised term frequency
- inverse document frequency (IDF)
- binary weighting
- NTF + IDF
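The weighting schemes listed above can be sketched on a toy corpus. The documents, function names, the choice to normalise term frequency by the most frequent term in the document, and the natural-log IDF are all assumptions for illustration; other normalisations are in use.

```python
from collections import Counter
from math import log

# Toy corpus (hypothetical): document id -> list of terms.
docs = {
    "d1": "space space adventure".split(),
    "d2": "space comedy".split(),
    "d3": "romantic comedy comedy drama".split(),
}

def tf(term, doc):
    """Term frequency: raw count of the term in the document."""
    return Counter(docs[doc])[term]

def ntf(term, doc):
    """Normalised TF: count divided by the count of the document's most frequent term."""
    counts = Counter(docs[doc])
    return counts[term] / max(counts.values())

def idf(term):
    """Inverse document frequency: rarer terms across the corpus score higher."""
    df = sum(1 for terms in docs.values() if term in terms)
    return log(len(docs) / df) if df else 0.0

def binary(term, doc):
    """Binary weighting: 1 if the term occurs in the document, else 0."""
    return 1 if term in docs[doc] else 0

def ntf_idf(term, doc):
    """NTF combined with IDF."""
    return ntf(term, doc) * idf(term)

print(tf("space", "d1"), ntf("adventure", "d1"), binary("drama", "d1"))
print(ntf_idf("space", "d1"), ntf_idf("adventure", "d1"))
```

In this corpus "adventure" occurs less often than "space" within `d1` but in fewer documents overall, so it ends up with the higher combined weight.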
What is term stemming?
Reducing terms to a common stem so that variants of the same word (e.g. "recommend", "recommended", "recommending") are treated as the same term for matching purposes
How are stop words handled?
They are omitted from the term-document matrix
Differences in making recommendations for non-personalised (NP) vs personalised (P) recommendation
- NP: rank recommendation candidates by similarity to the target item
- P: rank recommendation candidates by similarity to the target user’s profile
Reasons why case-based recommendation is a powerful approach to recommendation?
- facilitates the search and navigation of complex information spaces
- flexible user feedback options
- suitable for e-commerce applications
Underlying assumptions of case-based reasoning
- the world is a regular place and similar problems tend to have similar solutions
- the world is a repetitive place and similar problems tend to recur
CBR Cycle
- Retrieve
- Reuse
- Revise
- Retain
Key differences between case-based and content-based systems
- case representation
- similarity assessment
In case-based recommenders, what do the following symbols represent: Sim(T, C), w, v, Sim(v1, v2)
- similarity between the target case and candidate case
- relative importance of a feature
- $v_{c,i}$ is the value of feature i in case C
- feature-level similarity
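These symbols typically combine into a weighted average of feature-level similarities. The sketch below assumes that form, with a distance-based similarity for numeric features and exact matching for nominal ones; the feature names, weights, value range, and helper names are hypothetical.

```python
def numeric_sim(v1, v2, value_range):
    """Feature-level similarity for numeric values: 1 minus the normalised distance."""
    return 1.0 - abs(v1 - v2) / value_range

def nominal_sim(v1, v2):
    """Feature-level similarity for nominal values: exact match or nothing."""
    return 1.0 if v1 == v2 else 0.0

def case_similarity(target, candidate, weights, feature_sims):
    """Sim(T, C): weighted average of Sim(v_T, v_C) over all features."""
    total = sum(weights.values())
    return sum(
        w * feature_sims[f](target[f], candidate[f]) for f, w in weights.items()
    ) / total

weights = {"price": 2.0, "brand": 1.0}  # w: relative importance of each feature
feature_sims = {
    "price": lambda a, b: numeric_sim(a, b, value_range=1000),
    "brand": nominal_sim,
}
target = {"price": 500, "brand": "Acme"}
candidate = {"price": 600, "brand": "Acme"}
print(case_similarity(target, candidate, weights, feature_sims))  # (2*0.9 + 1*1) / 3
```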
What is a key issue for case-based recommenders
acquiring similarity knowledge (numerical vs non-numerical, symmetric vs asymmetric)
The ideal balance of similarity vs diversity in similarity-based recommendation
We want the top-k retrieved items to be equally similar to the target item/user profile, but similar in different ways from one another
Two algorithms used for balancing similarity vs diversity in similarity-based recommendation
- Bounded greedy selection
- Shimazu’s algorithm
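A hedged sketch of bounded greedy selection, assuming the usual formulation: restrict attention to the `b * k` candidates most similar to the target, then repeatedly pick the item with the best quality, where quality is similarity to the target multiplied by average dissimilarity to the items already selected. The one-dimensional items, the similarity function, and the parameter values are made up for the example.

```python
def bounded_greedy(target, candidates, sim, k, b=2):
    """Select k items that are similar to the target yet diverse among themselves."""
    # Bound: keep only the b*k candidates most similar to the target.
    bound = sorted(candidates, key=lambda c: sim(target, c), reverse=True)[: b * k]
    selected = []

    def quality(c):
        # Relative diversity is 1 for the first pick, otherwise the
        # average dissimilarity to everything selected so far.
        if not selected:
            rel_div = 1.0
        else:
            rel_div = sum(1 - sim(c, r) for r in selected) / len(selected)
        return sim(target, c) * rel_div

    while bound and len(selected) < k:
        best = max(bound, key=quality)
        selected.append(best)
        bound.remove(best)
    return selected

# Toy items on a number line; closer values are more similar (assumption).
def toy_sim(a, b):
    return 1 - abs(a - b) / 10

items = [5.1, 5.2, 4.0, 6.5, 9.0, 0.5]
print(bounded_greedy(5.0, items, toy_sim, k=2))
```

Ranking by similarity alone would return the near-duplicates 5.1 and 5.2; the greedy step swaps the second for a candidate that is still close to the target but different from the first pick.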
Advantages of content-based systems
Recommendations can be made early, without waiting for ratings to accumulate
Issues of content-based systems
- feature identification and extraction can be problematic
- content-based filters cannot distinguish between low and high-quality items
- a “more-like-this” approach -> low serendipity
Evaluation methods for systems
- live-user trials
- offline evaluations