Exam Flashcards

1
Q

What is the bottleneck of user-based CF and how does item-based CF avoid it?

A

the real-time search for neighbours among a large user population of potential neighbours.
item-based CF avoids this by computing similarities between items instead of users; item-item similarities can be precomputed offline

2
Q

What is the intuition behind item-based CF?

A

users are interested in items similar to those they have previously experienced

3
Q

What edge do item-item similarities have over user-user similarities, and why?

A

They are more “stable”, as the domain of items changes less than the user population does, allowing for less frequent system updates.

4
Q

What is the benefit of adjusted cosine similarity in item-based CF?

A

It accounts for differences in how users use the rating scale, by subtracting each user’s mean rating before computing the cosine
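
For reference, a standard formulation (per Sarwar et al.) sums over the set $U$ of users who rated both items $i$ and $j$, with $\bar{r}_u$ user $u$'s mean rating:

$$sim(i, j) = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_u)^2}\ \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_u)^2}}$$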

5
Q

What is the underlying heuristic of CF?

A

people who agreed (or disagreed) on items in the past are likely to agree (or disagree) on items in the future

6
Q

What are the steps in the UBCF algorithm?

A
  1. Data representation
  2. Similarity computation
  3. Neighbourhood formation
  4. Prediction/top-N list (see the prediction formula below)
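
For step 4, one widely used prediction rule is Resnick's weighted average of mean-deviations, where $N(a)$ is target user $a$'s neighbourhood:

$$pred(a, i) = \bar{r}_a + \frac{\sum_{b \in N(a)} sim(a, b)\,(r_{b,i} - \bar{r}_b)}{\sum_{b \in N(a)} |sim(a, b)|}$$
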
7
Q

What is the main issue with the MSD similarity metric?

A

it assumes that users rate according to similar distributions, i.e. that they use the rating scale in the same way

8
Q

For MSD similarity, what are two important features of the metric?

A
  • summations are over co-rated items only; if there are no co-rated items the similarity is set to 0
  • results in a value in [0, 1] (see the formulation below)
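
One common formulation, assuming ratings normalised to [0, 1] and $I_{ab}$ the set of items co-rated by users $a$ and $b$ (some course notes use a thresholded variant instead):

$$sim(a, b) = 1 - \frac{1}{|I_{ab}|} \sum_{i \in I_{ab}} (r_{a,i} - r_{b,i})^2$$
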
9
Q

For Pearson similarity, what are two important features of the metric?

A
  • summations are over co-rated items only; if there are no co-rated items the similarity is set to 0
  • results in a value in [-1, 1] (see the formulation below)
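
For reference, with $I_{ab}$ the co-rated items and $\bar{r}_a$, $\bar{r}_b$ the users' mean ratings:

$$sim(a, b) = \frac{\sum_{i \in I_{ab}} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in I_{ab}} (r_{a,i} - \bar{r}_a)^2}\ \sqrt{\sum_{i \in I_{ab}} (r_{b,i} - \bar{r}_b)^2}}$$
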
10
Q

What is the benefit of significance weighting applied to Pearson?

A

It adjusts for the number of co-rated items, devaluing similarities that are based on only a few co-rated items
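
A common scheme (due to Herlocker et al.) scales the similarity by the number of co-rated items $n$ up to a cut-off $\gamma$ (often 50):

$$sim'(a, b) = \frac{\min(n, \gamma)}{\gamma} \cdot sim(a, b)$$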

11
Q

What impacts the range of cosine similarity results?

A

the non-negativity of ratings: with non-negative ratings, cosine similarity lies in [0, 1] rather than [-1, 1]
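
For reference, over the users' rating vectors:

$$sim(a, b) = \frac{\sum_{i} r_{a,i}\, r_{b,i}}{\sqrt{\sum_{i} r_{a,i}^2}\ \sqrt{\sum_{i} r_{b,i}^2}}$$

With non-negative ratings, every term in the numerator is non-negative, so the ratio can never be negative.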

12
Q

Briefly describe some of the extensions to Pearson correlation

A
  • Jaccard index: modify similarity weights by the number of co-rated items divided by the size of the union of the two users’ rated items (see below)
  • default voting: calculate over the union of rated items, applying a default rating to items only one user has rated
  • case amplification: emphasise weights close to 1 and reduce the influence of lower weights (see below)
  • inverse user frequency (IUF): gives more weight to ratings for niche items
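
Two of these can be stated precisely. Jaccard weighting rescales the similarity by rating overlap, and case amplification (following Breese et al., with $\rho$ typically around 2.5) raises weights to a power:

$$sim'(a, b) = \frac{|I_a \cap I_b|}{|I_a \cup I_b|} \cdot sim(a, b), \qquad w' = w \cdot |w|^{\rho - 1}$$
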
13
Q

CF advantages

A
  • captures quality and taste, not just topical relevance
  • needs no item descriptions/features
  • serendipitous recommendations
14
Q

CF Limitations

A
  • cold start problem
  • early rater problem
  • sparsity problem
  • scalability
15
Q

What do RSs help drive?

A

demand down the long tail, with benefits to consumers and retailers alike

16
Q

What does CF automate?

A

The “word-of-mouth” process

17
Q

What is the key difference between CF and Content-based recommendation?

A

The use of the items’ descriptions/features (content); CF relies only on users’ ratings

18
Q

How is document-document similarity calculated in Content-based?

A

The cosine of the angle between the documents’ term vectors
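
A minimal sketch of that computation, assuming each document is represented as a dict mapping terms to weights (e.g. TF-IDF); the function name and representation are illustrative:

    import math

    def cosine(doc_a, doc_b):
        # Dot product over the terms the two documents share.
        shared = set(doc_a) & set(doc_b)
        dot = sum(doc_a[t] * doc_b[t] for t in shared)
        # Euclidean norm (vector length) of each document.
        norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
        norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)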

19
Q

What is case-based recommendation?

A

A form of content-based recommendation which represents items using a well-defined set of features and feature values

20
Q

List sources of recommendation knowledge and give examples of each

A
  • transactional and behavioural data: clicks, purchases, likes.
  • content and meta data: text, features, tags.
  • experiential data: user-generated opinions.
21
Q

List some properties of consumer reviews

A
  • ubiquitous
  • abundant
  • usually independent
  • often insightful
22
Q

Do reviews matter?

A

Yes. Research shows that reviews help users to make better decisions. They increase conversion rates and improve satisfaction.

23
Q

What are some considerations when making recommendations and ranking them?

A
  • business imperatives: e.g. promoting items
  • domain
  • the influence of particular items
24
Q

How are non-personalised recommendations usually presented?

A

in the form of a top-N ranked list

25
Q

Personalised recommendation considerations

A
  • acquiring users’ personal information
  • recommendation output
  • personalisation: ephemeral or persistent
26
Q

what is ephemeral personalisation?

A

matching current activity

27
Q

what is persistent personalisation?

A

matching long-term interests

28
Q

Benefits of RSs?

A
  • turning web browsers into buyers
  • cross/up-selling
  • customer loyalty
29
Q

What are the two main approaches to content-based recommendation and how do you distinguish between them?

A
  • traditional content-based (unstructured): works with free-text item content
  • case-based (structured): works with well-defined features and feature values
30
Q

List term-weighting approaches

A
  • term frequency (TF)
  • normalised term frequency (NTF)
  • inverse document frequency (IDF)
  • binary weighting
  • NTF × IDF (the combined weighting; see the formulations below)
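
The standard formulations, with $N$ the total number of documents and $n_i$ the number of documents containing term $i$:

$$tf_{i,j} = \text{frequency of term } i \text{ in document } j, \qquad ntf_{i,j} = \frac{tf_{i,j}}{\max_k tf_{k,j}}$$

$$idf_i = \log\frac{N}{n_i}, \qquad w_{i,j} = ntf_{i,j} \times idf_i$$
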
31
Q

What is term stemming?

A

reducing words to a common root form (stem) so that morphological variants (e.g. “compute”, “computing”, “computed”) are treated as the same term for matching purposes
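
As a concrete illustration, NLTK's PorterStemmer (one widely used stemmer) maps variants to a shared stem:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["compute", "computing", "computed", "computer"]:
        print(word, "->", stemmer.stem(word))
    # all four reduce to the stem "comput", so they match one another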

32
Q

How are stop words handled?

A

They are omitted from the term-document matrix

33
Q

Differences in making recommendations: non-personalised (NP) vs personalised (P)

A
  • NP: rank recommendation candidates by similarity to the target item
  • P: rank recommendation candidates by similarity to the target user’s profile
34
Q

Why is case-based recommendation a powerful approach to recommendation?

A
  • facilitates the search and navigation of complex information spaces
  • flexible user feedback options
  • suitable for e-commerce applications
35
Q

Underlying assumptions of case-based reasoning

A
  • the world is a regular place and similar problems tend to have similar solutions
  • the world is a repetitive place and similar problems tend to recur
36
Q

CBR Cycle

A
  • Retrieve
  • Reuse
  • Revise
  • Retain
37
Q

Key differences between case-based and content-based systems

A
  • case representation
  • similarity assessment
38
Q

In case-based recommenders, what do the following symbols represent: Sim(T, C), w, v, Sim(v1, v2)

A
  • Sim(T, C): the similarity between the target case T and a candidate case C
  • w: the relative importance (weight) of a feature
  • v: a feature value, e.g. $v_{c,i}$ is the value of feature i in case C
  • Sim(v1, v2): the feature-level similarity between two feature values (combined in the formula below)
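
These combine in the usual weighted-sum case similarity, where $t_i$ and $c_i$ are the values of feature $i$ in the target and candidate cases:

$$Sim(T, C) = \frac{\sum_i w_i \cdot sim(t_i, c_i)}{\sum_i w_i}$$
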
39
Q

What is a key issue for case-based recommenders?

A

acquiring similarity knowledge (numerical vs non-numerical, symmetric vs asymmetric)

40
Q

The ideal balance of similarity vs diversity in similarity-based recommendation

A

we want the top-k retrieved items to be equally similar to the target item/user profile but in different ways

41
Q

Two algorithms used for balancing similarity vs diversity in similarity-based recommendation

A
  • Bounded greedy selection (sketched below)
  • Shimazu’s algorithm
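
A sketch of bounded greedy selection, assuming Smyth & McClave's quality function Quality(t, c, R) = Sim(t, c) × RelDiversity(c, R); the parameter name bound is illustrative, and sim(a, b) is assumed to return a value in [0, 1]:

    def bounded_greedy(target, candidates, sim, k, bound=2):
        # Bound: shortlist the bound*k candidates most similar to the target.
        shortlist = sorted(candidates, key=lambda c: sim(target, c),
                           reverse=True)[:bound * k]
        result = []

        def rel_diversity(c):
            # Average dissimilarity between c and the items chosen so far.
            if not result:
                return 1.0
            return sum(1 - sim(c, r) for r in result) / len(result)

        # Greedy: repeatedly take the candidate with the best quality,
        # i.e. similarity to the target times diversity from the result set.
        while shortlist and len(result) < k:
            best = max(shortlist,
                       key=lambda c: sim(target, c) * rel_diversity(c))
            result.append(best)
            shortlist.remove(best)
        return result
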
42
Q

Advantages of content-based systems

A

early recommendations can be made, since no ratings from other users are required

43
Q

Issues of content-based systems

A
  • feature identification and extraction can be problematic
  • content-based filters cannot distinguish between low- and high-quality items
  • a “more-like-this” approach, which results in low serendipity
44
Q

Evaluation methods for systems

A
  • live-user trials
  • offline evaluations