RL + Rec Systems Flashcards

1
Q

Explain what RL is, and what we want to learn from it

A

Learning from interaction with an environment to achieve some long-term goal that is related to the state of the environment

  • we want to learn how to act to accomplish goals
  • given an environment that contains rewards, we want to learn a policy for acting
2
Q

Define a simple RL Setup And Goal

A

Setup: We have an agent interacting with an environment, which it can affect through actions. The agent may be able to sense the environment fully or only partially.
Goal: the agent tries to maximise the cumulative long-term reward, conveyed through a reward signal. A minimal sketch of the interaction loop follows below.
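
A minimal sketch of this loop, assuming a hypothetical `env` object with `reset()`/`step()` methods and a `policy` callable (illustrative names, not any specific library's API):

```python
# Minimal interaction loop: the agent senses a state, acts, and the
# environment returns the next state and a reward signal.
def run_episode(env, policy, max_steps=100):
    """Run one episode; return the total (undiscounted) reward."""
    state = env.reset()                          # initial observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward                   # accumulate the reward signal
        if done:                                 # episode ended
            break
    return total_reward
```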

3
Q

Explain the differences between Supervised and Reinforcement Learning

A
  • In SL, there’s an external “supervisor” which has knowledge of the environment and shares it with the agent (as labelled examples) to complete the task
  • Both paradigms learn mappings between inputs and outputs, but in RL a reward function provides feedback to the agent instead of correct answers
  • Supervised learning relies on labelled training data; RL learns from its own interaction experience
4
Q

Explain the differences between Unsupervised and Reinforcement Learning

A
  • In UL, there is no feedback (reward) signal from the environment
  • In UL, the task is to find the underlying patterns in the data rather than a mapping from inputs to outputs
5
Q

What is a policy

A

It states which action the agent takes when in a particular state. Thus, it is a function that maps states to actions (or, for stochastic policies, to probability distributions over actions).
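
Two toy forms of a policy, as a minimal sketch (the state and action names are made-up illustrations):

```python
import random

# Deterministic policy: a lookup table from states to actions.
policy_table = {"low_battery": "recharge", "high_battery": "explore"}

def deterministic_policy(state):
    return policy_table[state]

# Stochastic policy: a probability distribution over actions per state.
def stochastic_policy(state):
    if state == "high_battery":
        return random.choices(["explore", "wait"], weights=[0.9, 0.1])[0]
    return "recharge"

print(deterministic_policy("low_battery"))  # recharge
print(stochastic_policy("high_battery"))    # explore (most of the time)
```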

6
Q

Characteristics of RL

A
  • no supervisor, only a reward signal
  • feedback is delayed, not instantaneous
  • time really matters
  • Agent’s actions affect the subsequent data it receives, so consequences can be long-term
7
Q

What is the difference between Fully and Partially Observable environments

A

With full observability, the agent directly observes the environment state.
With partial observability, the agent only observes the environment indirectly (e.g. through incomplete or noisy signals), so its observation may differ from the true environment state

8
Q

What does expectimax search compute

A

The expected (average) utility of a state under optimal play: max nodes choose the best child, while chance nodes average over their possible outcomes. (Contrast with minimax, which assumes a worst-case adversary; classical chess engines such as Stockfish search that way.)
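
A minimal recursive sketch over a hand-rolled game tree (the tuple-based node encoding is an assumption for illustration):

```python
def expectimax(node):
    """Expected utility of `node`: optimal choice at max nodes,
    probability-weighted average at chance nodes."""
    kind, payload = node
    if kind == "leaf":
        return payload                                  # terminal utility
    if kind == "max":
        return max(expectimax(c) for c in payload)      # agent picks best child
    if kind == "chance":                                # payload: [(prob, child), ...]
        return sum(p * expectimax(c) for p, c in payload)

# Choose between a safe leaf worth 3 and a fair coin flip over 0 or 10.
tree = ("max", [("leaf", 3),
                ("chance", [(0.5, ("leaf", 0)), (0.5, ("leaf", 10))])])
print(expectimax(tree))  # 5.0: the gamble's average beats the safe 3
```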

9
Q

Discuss the differences between model-based and model-free RL techniques

A
  1. Learning: MB learns an internal model of the environment (transitions and rewards), while MF learns a policy or value function directly from experience
  2. Use: MB uses the learned model for simulating and planning, while MF adjusts its policy based on observed rewards without an explicit model (see the one-step sketch below)
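
As an example of the model-free side, a single tabular Q-learning update, sketched with assumed string-valued states and actions (note that no environment model appears anywhere):

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> estimated long-term return
alpha, gamma = 0.1, 0.99    # learning rate and discount factor

def q_update(s, a, r, s_next, actions):
    """Move Q(s, a) toward the observed reward plus the discounted
    value of the best action in the next state -- no model involved."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_update("s0", "left", 1.0, "s1", actions=["left", "right"])
```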
10
Q

Pros and cons of Model-based RL

A
  • Pros: sample-efficient; planning with the learned model can cope with complex dynamics
  • Cons: learning an accurate model can be challenging, and model errors propagate into the resulting plans
11
Q

Pros and cons of Model-Free RL

A
  • Pros: Flexible, applicable to a wide range of problems.
  • Cons: may require many more environment interactions (and hence computation); exploration is challenging in complex environments.
12
Q

What is collaborative filtering

A

Approach for making predictions about the preferences of a user by collecting information from many other users

13
Q

Briefly describe content-based recommenders

A

They analyse item descriptions and metadata to identify items likely to be of interest to the target user

14
Q

What are the differences between user-based and item-based collaborative filtering

A
  • In user-based CF, the assumption is that personal tastes correlate: the goal is to find users who share the target user’s tastes, and use their ratings to make predictions.
  • In item-based CF, the items the target user has already rated are matched to similar items, and predictions are made by combining those similar items (see the sketch below)
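
A miniature item-based prediction, assuming made-up ratings and precomputed item-to-item similarities (in practice these come from co-rating patterns):

```python
def predict_item_based(user_ratings, item_sims):
    """user_ratings: {item: rating} for the target user.
    item_sims: {item: similarity to the target item}.
    Weighted average of the user's ratings on similar items."""
    num = sum(item_sims[i] * r for i, r in user_ratings.items() if i in item_sims)
    den = sum(item_sims[i] for i in user_ratings if i in item_sims)
    return num / den if den else None

ratings = {"item_a": 4.0, "item_b": 2.0}   # the target user's past ratings
sims = {"item_a": 0.9, "item_b": 0.3}      # similarity of each to the target item
print(predict_item_based(ratings, sims))   # 3.5: pulled toward the more similar item_a
```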
15
Q

What are the differences between the Jaccard Index and Cosine similarity

A

The Jaccard Index is suitable for binary data (e.g. liked / not liked); cosine similarity is used with real-valued data such as ratings.
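
Both measures in a minimal sketch with toy data:

```python
import math

def jaccard(a, b):
    """a, b: sets of items the two users interacted with (binary data)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(u, v):
    """u, v: equal-length lists of real-valued ratings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared of 4 total -> 0.5
print(cosine([5, 3, 0], [4, 2, 1]))               # close to 1 => similar tastes
```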

16
Q

Give the steps of predicting ratings in User-Based CF

A
  1. Measure the similarity between the target user and all other users
  2. Rank the users based on similarity
  3. Aggregate the similar profiles in some way to get a predicted rating for the item of interest (sketched below)
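
The three steps in miniature, assuming toy rating dictionaries (a real system would mean-centre ratings and keep only the top-k neighbours):

```python
def overlap_cosine(r1, r2):
    """Cosine similarity over the items both users rated (simplistic)."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = sum(r1[i] ** 2 for i in common) ** 0.5
    n2 = sum(r2[i] ** 2 for i in common) ** 0.5
    return dot / (n1 * n2)

def predict_rating(target, others, item):
    # 1. measure similarity to every user who rated the item
    sims = {u: overlap_cosine(target, r) for u, r in others.items() if item in r}
    # 2. rank neighbours by similarity
    ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
    # 3. similarity-weighted average of the neighbours' ratings
    num = sum(s * others[u][item] for u, s in ranked)
    den = sum(s for _, s in ranked)
    return num / den if den else None

target = {"m1": 5, "m2": 3}
others = {"alice": {"m1": 4, "m2": 3, "m3": 5}, "bob": {"m1": 1, "m3": 2}}
print(predict_rating(target, others, "m3"))  # a weighted blend of 5 and 2
```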
17
Q

Give the steps of making recommendations in User-Based CF

A
  1. Measure the similarity between the target user and all other users
  2. Rank the users based on similarity
  3. Aggregate the similar profiles to get the top recommended items (sketched below)
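
A sketch of the aggregation step, assuming the neighbour similarities from steps 1–2 are already computed (toy data throughout):

```python
def recommend(target, others, sims, n=2):
    """Score each item unseen by the target user by the
    similarity-weighted ratings of the neighbours; return the top n."""
    scores = {}
    for user, sim in sims.items():
        for item, rating in others[user].items():
            if item not in target:                    # only unseen items
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]

target = {"m1": 5}
others = {"alice": {"m1": 4, "m2": 5}, "bob": {"m1": 5, "m3": 3}}
print(recommend(target, others, sims={"alice": 0.9, "bob": 0.8}))  # ['m2', 'm3']
```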
18
Q

Explicit v Implicit Data Collection

A

EDC actively asks users for explicit ratings of items
IDC infers preferences from the user’s observed activity (e.g. clicks, purchases, viewing time) without asking

19
Q

What is offline evaluation and its advantages

A

A previously collected dataset is used; no actual users are involved in the evaluation.

  • quick, cheap, easily repeatable
20
Q

What is online evaluation and its advantages

A

Users interact with a running system in a “live experiment”, and receive actual recommendations.
Feedback from the users is collected by observing their online behaviour and/or explicitly collecting their feedback

  • measures true customer satisfaction
21
Q

Briefly discuss serendipity and diversity in Recommendation systems evaluation

A

It is often not helpful to recommend obvious items or items that are too similar to one another.
Diversity measures how different the items within a recommendation list are from each other; serendipity captures how surprising yet still relevant a recommendation is.
An alternative evaluation goal is therefore to examine the extent to which a recommender can generate diverse recommendations among its top results
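
One common way to quantify diversity is intra-list diversity: the average pairwise distance between recommended items. A minimal sketch, assuming illustrative item feature vectors and cosine distance:

```python
from itertools import combinations

def cosine_sim(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = (sum(x * x for x in u) * sum(y * y for y in v)) ** 0.5
    return dot / norm if norm else 0.0

def intra_list_diversity(item_vectors):
    """Mean pairwise (1 - cosine similarity) over all item pairs."""
    pairs = list(combinations(item_vectors, 2))
    return sum(1 - cosine_sim(u, v) for u, v in pairs) / len(pairs)

# Two near-identical items score low; adding a dissimilar one raises it.
print(intra_list_diversity([[1, 0], [1, 0.1], [0, 1]]))  # ~0.64
```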