RL + Rec Systems Flashcards
Explain what RL is, and what we want to learn from it
Learning from interaction with an environment to achieve some long-term goal that is related to the state of the environment
- we want to learn how to act to accomplish goals
- given an environment that contains rewards, we want to learn a policy for acting
Define a simple RL Setup And Goal
Setup: We have an agent which is interacting with an environment which it can affect through actions. The agent may be able to sense the environment partially or fully.
Goal: the agent tries to maximise the cumulative long-term reward, which is conveyed to it through a reward signal
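A minimal sketch of this interaction loop, assuming a hypothetical `env` object with `reset`/`step` methods (in the style of Gymnasium) and a `policy` callable:

```python
def run_episode(env, policy, max_steps=100):
    """One episode of agent-environment interaction, accumulating reward."""
    state = env.reset()                       # initial (possibly partial) observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                # agent acts based on what it senses
        state, reward, done = env.step(action)  # action affects the environment
        total_reward += reward                # long-term reward the agent maximises
        if done:
            break
    return total_reward
```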
Explain the differences between Supervised and Reinforcement Learning
- In SL, there is an external “supervisor” with knowledge of the environment, which it shares with the agent to complete the task
- Both approaches learn mappings between inputs and outputs, but in RL a reward function acts as feedback to the agent
- Supervised learning relies on labelled training data
Explain the differences between Unsupervised and Reinforcement Learning
- In UL, there is no feedback (reward signal) from the environment
- In UL, the task is to find the underlying patterns in the data rather than a mapping from input to output
What is a policy
It states what action the agent takes when in a particular state. Thus, it is a function that maps states to actions
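For a small, discrete problem, a deterministic policy can be just a lookup table; a sketch with hypothetical gridworld states and actions:

```python
# A deterministic policy as a simple state -> action mapping (hypothetical gridworld).
policy = {
    (0, 0): "right",
    (0, 1): "right",
    (1, 1): "down",
}

def act(state):
    """pi(s) -> a: return the action the policy prescribes in this state."""
    return policy[state]
```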
Characteristics of RL
- no supervisor, only a reward signal
- feedback is delayed, not instantaneous
- time really matters (sequential, non-i.i.d. data)
- the agent's actions affect the subsequent data it receives (long-term consequences)
What is the difference between Fully and Partially Observable environments
With full observability, the agent directly observes the environment state.
With partial observability, the agent only indirectly observes the environment state (e.g. via noisy or incomplete observations).
What does expectimax search compute
The average score under optimal play: max nodes choose the best action, while chance nodes average over possible outcomes. It suits stochastic games (e.g. dice games such as backgammon), unlike minimax engines (e.g. Stockfish for chess), which assume an optimal adversary.
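A minimal recursive sketch, assuming a hypothetical node interface (`is_terminal`, `utility`, `is_max_node`, `children`, and `chance_children` returning (probability, child) pairs):

```python
def expectimax(node):
    """Value of a node: max over agent actions, expectation over chance outcomes."""
    if node.is_terminal():
        return node.utility()
    if node.is_max_node():                      # agent picks the best action
        return max(expectimax(c) for c in node.children())
    # chance node: average children weighted by their outcome probabilities
    return sum(p * expectimax(c) for p, c in node.chance_children())
```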
Discuss the differences between model-based and model-free RL techniques
- Learning: MB learns an internal model of the environment, MF learns a policy directly from experiences
- MB uses the learned model for simulating and planning, MF adjusts policy based on observed rewards without an explicit model
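To make the model-free side concrete, a sketch of the tabular Q-learning update (a standard model-free method; the hyperparameters and the state/action representation here are illustrative):

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99           # learning rate and discount factor (illustrative)
Q = defaultdict(float)             # Q[(state, action)] -> value estimate

def q_update(state, action, reward, next_state, actions):
    """Adjust Q(s, a) toward the observed reward plus the discounted value of
    the best next action; no model of the environment's dynamics is used."""
    best_next = max(Q[(next_state, a)] for a in actions)  # actions available next
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```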
Pros and cons of Model-based RL
- Pros: sample-efficient; the learned model supports simulation and planning, which can be effective in environments with complex dynamics
- Cons: learning an accurate model can be challenging, and model errors propagate into the policy
Pros and cons of Model-Free RL
- Pros: Flexible, applicable to a wide range of problems.
- Cons: may require many more interactions with the environment (sample-inefficient); exploration is challenging in complex environments.
What is collaborative filtering
Approach for making predictions about the preferences of a user by collecting information from many other users
Briefly describe content-based recommenders
They analyse item descriptions and metadata to identify items likely to be of interest to the target user
What are the differences between user-based and item-based collaborative filtering
- In user-based CF, personal tastes are correlated, so the goal is to find users who share the same tastes as the target user, and use their information to make predictions.
- In item-based CF, the items previously rated by the target user are matched to similar items, and predictions are made by combining the user's ratings for those similar items
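A sketch of the item-based prediction, assuming a hypothetical `item_sim(i, j)` similarity function and the target user's history as a dict `user_ratings` (item -> rating):

```python
def predict_item_based(user_ratings, target_item, item_sim):
    """Item-based CF: similarity-weighted average of the target user's
    ratings over items similar to the target item."""
    num = sum(item_sim(target_item, j) * r for j, r in user_ratings.items())
    den = sum(abs(item_sim(target_item, j)) for j in user_ratings)
    return num / den if den else 0.0
```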
What are the differences between the Jaccard Index and Cosine similarity
JI is suitable for binary data (e.g. sets of liked items); cosine similarity is used with real-valued data (e.g. numeric ratings).
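Sketches of both measures, Jaccard over sets of liked items and cosine over real-valued rating vectors:

```python
import math

def jaccard(a, b):
    """Jaccard index for binary data: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def cosine(u, v):
    """Cosine similarity for real-valued rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```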
Give the steps of predicting ratings in User-Based CF
- Measure the similarity between the target user and all other users
- Rank the users based on similarity
- Aggregate the similar profiles in some way to get a predicted rating for the item of interest
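A sketch following these three steps, assuming ratings stored as a hypothetical nested dict `ratings[user][item] -> rating`:

```python
import math

def predict_user_based(ratings, target_user, item, k=5):
    """(1) measure similarity to all other users, (2) rank them,
    (3) aggregate the top-k neighbours' ratings into a predicted rating."""
    def cosine_sim(u, v):
        shared = set(ratings[u]) & set(ratings[v])
        dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
        nu = math.sqrt(sum(r * r for r in ratings[u].values()))
        nv = math.sqrt(sum(r * r for r in ratings[v].values()))
        return dot / (nu * nv) if nu and nv else 0.0

    sims = [(cosine_sim(target_user, other), other)
            for other in ratings if other != target_user]           # step 1
    neighbours = sorted(sims, reverse=True)[:k]                     # step 2
    rated = [(s, ratings[u][item]) for s, u in neighbours if item in ratings[u]]
    total = sum(s for s, _ in rated)
    return sum(s * r for s, r in rated) / total if total else None  # step 3
```

In practice, ratings are often mean-centered per user before aggregation, so that differences in rating scale between users do not distort the prediction.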
Give the steps of making recommendations in User-Based CF
- Measure the similarity between the target user and all other users
- Rank the users based on similarity
- Aggregate the similar profiles to get the top recommended items
Explicit vs Implicit Data Collection
EDC actively asks users for explicit ratings of items
IDC gathers data indirectly, by observing the user's activity (e.g. clicks, purchases, viewing time)
What is offline evaluation and its advantages
A previously collected dataset is used, no actual users are involved in the evaluation.
- quick, cheap, easily repeatable
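A minimal offline-evaluation sketch: RMSE of predicted versus held-out ratings, assuming the test set is a list of (user, item, rating) triples and a `predict` function like the one sketched above:

```python
import math

def rmse(test_triples, predict):
    """Offline evaluation: compare predictions to held-out true ratings."""
    errs = []
    for user, item, true_rating in test_triples:
        pred = predict(user, item)
        if pred is not None:               # skip items we cannot predict
            errs.append((pred - true_rating) ** 2)
    return math.sqrt(sum(errs) / len(errs)) if errs else float("nan")
```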
What is online evaluation and its advantages
Users interact with a running system in a “live experiment”, and receive actual recommendations.
Feedback from the users is collected by observing their online behaviour and/or explicitly collecting their feedback
- measures true customer satisfaction
Briefly discuss serendipity and diversity in Recommendation systems evaluation
It is often not helpful to recommend obvious items (low serendipity) or items that are too similar to one another (low diversity).
An alternative evaluation goal is to examine the extent to which a recommender can generate diverse recommendations among its top results
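One common way to quantify this is intra-list diversity: the average pairwise dissimilarity within a top-N list. A sketch, assuming some pairwise `sim(a, b)` in [0, 1] (e.g. cosine over item features):

```python
from itertools import combinations

def intra_list_diversity(top_n_items, sim):
    """Diversity of a recommendation list: mean pairwise dissimilarity (1 - sim)."""
    pairs = list(combinations(top_n_items, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - sim(a, b) for a, b in pairs) / len(pairs)
```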