RL + Rec Systems Flashcards
Explain what RL is, and what we want to learn from it
Learning from interaction with an environment to achieve some long-term goal that is related to the state of the environment
- we want to learn how to act to accomplish goals
- given an environment that contains rewards, we want to learn a policy for acting
Define a simple RL Setup And Goal
Setup: We have an agent which is interacting with an environment which it can affect through actions. The agent may be able to sense the environment partially or fully.
Goal: the agent tries to maximise the cumulative long-term reward, which is conveyed to it through a reward signal
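A minimal sketch of this interaction loop, assuming a hypothetical `env` object with `reset`/`step` methods (in the style of Gymnasium) and a `policy` callable:

```python
def run_episode(env, policy, max_steps=100):
    """One episode of agent-environment interaction, accumulating reward."""
    state = env.reset()                       # initial (possibly partial) observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                # agent acts based on what it senses
        state, reward, done = env.step(action)  # action affects the environment
        total_reward += reward                # long-term reward the agent maximises
        if done:
            break
    return total_reward
```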
Explain the differences between Supervised and Reinforcement Learning
- In SL, there is an external “supervisor” with knowledge of the environment, which it shares with the agent to complete the task
- Both approaches learn mappings between inputs and outputs, but in RL a reward function acts as feedback to the agent
- Supervised learning relies on labelled training data
Explain the differences between Unsupervised and Reinforcement Learning
- In UL, there is no feedback (reward signal) from the environment
- In UL, the task is to find the underlying patterns in the data rather than a mapping from input to output
What is a policy
It states what action the agent takes when in a particular state. Thus, it is a function that maps states to actions
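For a small, discrete problem, a deterministic policy can be just a lookup table; a sketch with hypothetical gridworld states and actions:

```python
# A deterministic policy as a simple state -> action mapping (hypothetical gridworld).
policy = {
    (0, 0): "right",
    (0, 1): "right",
    (1, 1): "down",
}

def act(state):
    """pi(s) -> a: return the action the policy prescribes in this state."""
    return policy[state]
```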
Characteristics of RL
- no supervisor, only a reward signal
- feedback is delayed, not instantaneous
- time really matters (sequential, non-i.i.d. data)
- the agent's actions affect the subsequent data it receives (long-term consequences)
What is the difference between Fully and Partially Observable environments
With full observability, the agent directly observes the environment state.
With partial observability, the agent only indirectly observes the environment state (e.g. via noisy or incomplete observations).
What does expectimax search compute
The average score under optimal play: max nodes choose the best action, while chance nodes average over possible outcomes. It suits stochastic games (e.g. dice games such as backgammon), unlike minimax engines (e.g. Stockfish for chess), which assume an optimal adversary.
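A minimal recursive sketch, assuming a hypothetical node interface (`is_terminal`, `utility`, `is_max_node`, `children`, and `chance_children` returning (probability, child) pairs):

```python
def expectimax(node):
    """Value of a node: max over agent actions, expectation over chance outcomes."""
    if node.is_terminal():
        return node.utility()
    if node.is_max_node():                      # agent picks the best action
        return max(expectimax(c) for c in node.children())
    # chance node: average children weighted by their outcome probabilities
    return sum(p * expectimax(c) for p, c in node.chance_children())
```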
Discuss the differences between model-based and model-free RL techniques
- Learning: MB learns an internal model of the environment, MF learns a policy directly from experiences
- MB uses the learned model for simulating and planning, MF adjusts policy based on observed rewards without an explicit model
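To make the model-free side concrete, a sketch of the tabular Q-learning update (a standard model-free method; the hyperparameters and the state/action representation here are illustrative):

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99           # learning rate and discount factor (illustrative)
Q = defaultdict(float)             # Q[(state, action)] -> value estimate

def q_update(state, action, reward, next_state, actions):
    """Adjust Q(s, a) toward the observed reward plus the discounted value of
    the best next action; no model of the environment's dynamics is used."""
    best_next = max(Q[(next_state, a)] for a in actions)  # actions available next
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```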
Pros and cons of Model-based RL
- Pros: sample-efficient; the learned model supports simulation and planning, which can be effective in environments with complex dynamics
- Cons: learning an accurate model can be challenging, and model errors propagate into the policy
Pros and cons of Model-Free RL
- Pros: Flexible, applicable to a wide range of problems.
- Cons: may require many more interactions with the environment (sample-inefficient); exploration is challenging in complex environments.
What is collaborative filtering
Approach for making predictions about the preferences of a user by collecting information from many other users
Briefly describe content-based recommenders
They analyse item descriptions and metadata to identify items likely to be of interest to the target user
What are the differences between user-based and item-based collaborative filtering
- In user-based CF, personal tastes are correlated, so the goal is to find users who share the same tastes as the target user, and use their information to make predictions.
- In item-based CF, the items previously rated by the target user are matched to similar items, and predictions are made by combining the user's ratings for those similar items
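A sketch of the item-based prediction, assuming a hypothetical `item_sim(i, j)` similarity function and the target user's history as a dict `user_ratings` (item -> rating):

```python
def predict_item_based(user_ratings, target_item, item_sim):
    """Item-based CF: similarity-weighted average of the target user's
    ratings over items similar to the target item."""
    num = sum(item_sim(target_item, j) * r for j, r in user_ratings.items())
    den = sum(abs(item_sim(target_item, j)) for j in user_ratings)
    return num / den if den else 0.0
```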
What are the differences between the Jaccard Index and Cosine similarity
JI is suitable for binary data (e.g. sets of liked items); cosine similarity is used with real-valued data (e.g. numeric ratings).
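Sketches of both measures, Jaccard over sets of liked items and cosine over real-valued rating vectors:

```python
import math

def jaccard(a, b):
    """Jaccard index for binary data: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def cosine(u, v):
    """Cosine similarity for real-valued rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```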
Give the steps of predicting ratings in User-Based CF
- Measure the similarity between the target user and all other users
- Rank the users based on similarity
- Aggregate the similar profiles in some way to get a predicted rating for the item of interest
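A sketch following these three steps, assuming ratings stored as a hypothetical nested dict `ratings[user][item] -> rating`:

```python
import math

def predict_user_based(ratings, target_user, item, k=5):
    """(1) measure similarity to all other users, (2) rank them,
    (3) aggregate the top-k neighbours' ratings into a predicted rating."""
    def cosine_sim(u, v):
        shared = set(ratings[u]) & set(ratings[v])
        dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
        nu = math.sqrt(sum(r * r for r in ratings[u].values()))
        nv = math.sqrt(sum(r * r for r in ratings[v].values()))
        return dot / (nu * nv) if nu and nv else 0.0

    sims = [(cosine_sim(target_user, other), other)
            for other in ratings if other != target_user]           # step 1
    neighbours = sorted(sims, reverse=True)[:k]                     # step 2
    rated = [(s, ratings[u][item]) for s, u in neighbours if item in ratings[u]]
    total = sum(s for s, _ in rated)
    return sum(s * r for s, r in rated) / total if total else None  # step 3
```

In practice, ratings are often mean-centered per user before aggregation, so that differences in rating scale between users do not distort the prediction.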
Give the steps of making recommendations in User-Based CF
- Measure the similarity between the target user and all other users
- Rank the users based on similarity
- Aggregate the similar profiles to get the top recommended items
Explicit vs Implicit Data Collection
EDC actively asks users for explicit ratings of items
IDC gathers data indirectly, by observing the user's activity (e.g. clicks, purchases, viewing time)
What is offline evaluation and its advantages
A previously collected dataset is used, no actual users are involved in the evaluation.
- quick, cheap, easily repeatable
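A minimal offline-evaluation sketch: RMSE of predicted versus held-out ratings, assuming the test set is a list of (user, item, rating) triples and a `predict` function like the one sketched above:

```python
import math

def rmse(test_triples, predict):
    """Offline evaluation: compare predictions to held-out true ratings."""
    errs = []
    for user, item, true_rating in test_triples:
        pred = predict(user, item)
        if pred is not None:               # skip items we cannot predict
            errs.append((pred - true_rating) ** 2)
    return math.sqrt(sum(errs) / len(errs)) if errs else float("nan")
```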
What is online evaluation and its advantages
Users interact with a running system in a “live experiment”, and receive actual recommendations.
Feedback from the users is collected by observing their online behaviour and/or explicitly collecting their feedback
- measures true customer satisfaction
Briefly discuss serendipity and diversity in Recommendation systems evaluation
It is often not helpful to recommend obvious items (low serendipity) or items that are too similar to one another (low diversity).
An alternative evaluation goal is to examine the extent to which a recommender can generate diverse recommendations among its top results
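One common way to quantify this is intra-list diversity: the average pairwise dissimilarity within a top-N list. A sketch, assuming some pairwise `sim(a, b)` in [0, 1] (e.g. cosine over item features):

```python
from itertools import combinations

def intra_list_diversity(top_n_items, sim):
    """Diversity of a recommendation list: mean pairwise dissimilarity (1 - sim)."""
    pairs = list(combinations(top_n_items, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - sim(a, b) for a, b in pairs) / len(pairs)
```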