RL + Rec Systems Flashcards
Explain what RL is, and what we want to learn from it
Learning from interaction with an environment to achieve some long-term goal that is related to the state of the environment
- we want to learn how to act to accomplish goals
- given an environment that contains rewards, we want to learn a policy for acting
Define a simple RL Setup And Goal
Setup: We have an agent which is interacting with an environment which it can affect through actions. The agent may be able to sense the environment partially or fully.
Goal: the agent tries to maximise the cumulative long-term reward, which the environment conveys through a reward signal
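The setup above can be sketched as an interaction loop. This is a minimal illustration, not a standard API: the toy environment, `env_step`, and the walk-to-state-3 task are all made up for the example.

```python
def run_episode(env_step, policy, start_state, horizon=100):
    """Roll out one episode: the agent observes a state, picks an action
    via its policy, and the environment returns a reward and next state.
    The return value is the accumulated long-term reward."""
    state, total_reward = start_state, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward, done = env_step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward

# Toy environment: walk right from state 0 to reach state 3 (+1 reward).
def env_step(state, action):
    nxt = max(state + (1 if action == "right" else -1), 0)
    return nxt, (1.0 if nxt == 3 else 0.0), nxt == 3

policy = lambda s: "right"
print(run_episode(env_step, policy, start_state=0))  # -> 1.0
```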
Explain the differences between Supervised and Reinforcement Learning
- In SL, there’s an external “supervisor” which has knowledge of the environment and shares it with the agent (via labels) so the task can be completed
- Both strategies use mappings between inputs and outputs, but in RL there is a reward function which acts as a feedback to the agent
- Supervised learning relies on labelled training data
Explain the differences between Unsupervised and Reinforcement Learning
- In UL, there is no feedback from the environment
- In UL, the task is to find underlying patterns in the data rather than a mapping from inputs to outputs
What is a policy
It states what action the agent takes when in a particular state. Thus, it is a mapping from states to actions (deterministic policy) or from states to distributions over actions (stochastic policy)
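A deterministic policy can be sketched as nothing more than a lookup from state to action; the states and actions here are invented for illustration.

```python
# A deterministic policy is just a mapping state -> action.
policy = {
    "low_battery": "recharge",
    "full_battery": "explore",
}

def act(state):
    """Follow the policy: look up the action for the current state."""
    return policy[state]

print(act("low_battery"))  # -> recharge
```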
Characteristics of RL
- no supervisor, only a reward signal
- feedback is delayed, not instantaneous
- time really matters (data is sequential, not i.i.d.)
- the agent’s actions affect the subsequent data it receives, so they have long-term consequences
What is the difference between Fully and Partially Observable environments
With full observability, the agent directly observes environment state.
With partial observability, the agent receives observations that do not reveal the full environment state, so it must act on incomplete information
What does expectimax search compute
The average score under optimal play of the maximising agent, averaging over chance nodes, i.e. the expected utility when outcomes are stochastic (e.g. dice games, or Pacman with randomly moving ghosts). Unlike minimax (used by engines such as Stockfish), it does not assume an optimal adversary
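The computation can be sketched as a recursion over a small game tree; the tuple-based tree encoding is an assumption made for brevity.

```python
def expectimax(node):
    """node is ('max', children), ('chance', children), or ('leaf', value).
    Max nodes take the best child's value; chance nodes average over
    their children's values (uniform outcomes assumed here)."""
    kind, payload = node
    if kind == "leaf":
        return payload
    values = [expectimax(child) for child in payload]
    return max(values) if kind == "max" else sum(values) / len(values)

# The agent picks an action, then a fair coin decides the outcome.
tree = ("max", [
    ("chance", [("leaf", 8), ("leaf", 0)]),   # expected value 4.0
    ("chance", [("leaf", 3), ("leaf", 3)]),   # expected value 3.0
])
print(expectimax(tree))  # -> 4.0
```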
Discuss the differences between model-based and model-free RL techniques
- Learning: MB learns an internal model of the environment, MF learns a policy directly from experiences
- MB uses the learned model for simulating and planning, MF adjusts policy based on observed rewards without an explicit model
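The model-free side can be sketched with a single Q-learning update: the policy's value estimates are adjusted directly from an observed transition, with no transition model ever learned. The states, actions, and hyperparameters below are illustrative assumptions.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One model-free Q-learning step: move Q(s, a) toward the observed
    reward plus the discounted best next-state value. No transition
    probabilities are estimated anywhere."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(Q[(0, "right")])  # -> 0.1
```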
Pros and cons of Model-based RL
- Pros: sample-efficient, effective in complex dynamics
- Cons: learning an accurate model can be challenging, and model errors compound when planning with it
Pros and cons of Model-Free RL
- Pros: Flexible, applicable to a wide range of problems.
- Cons: May require more interactions (computation), exploration challenges in complex environments.
What is collaborative filtering
An approach that predicts a user’s preferences by collecting preference information (e.g. ratings) from many other users
Briefly describe content-based recommenders
They analyse item descriptions and metadata to identify items likely to be of interest to the target user
What are the differences between user-based and item-based collaborative filtering
- In user-based CF, personal tastes are correlated, so the goal is to find users who share the same tastes as the target user, and use their information to make predictions.
- In item-based CF, items the target user has previously rated are matched to similar items, and predictions are made by combining the ratings of those similar items
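User-based CF can be sketched as a similarity-weighted average of neighbours' ratings. The rating dictionaries, user names, and choice of cosine similarity here are illustrative assumptions, not a reference implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den

def predict(ratings, target, item):
    """Predict target's rating for item as a similarity-weighted
    average over the other users who rated it."""
    num = den = 0.0
    for user, r in ratings.items():
        if user == target or item not in r:
            continue
        sim = cosine(ratings[target], r)
        num += sim * r[item]
        den += abs(sim)
    return num / den if den else 0.0

ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob":   {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
# Alice agrees with bob more than with carol, so the prediction
# for m3 lands closer to bob's rating of 4 than carol's 2.
print(predict(ratings, "alice", "m3"))
```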
What are the differences between the Jaccard Index and Cosine similarity
JI is suitable for binary data, cosine similarity is used with real-valued data.
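The contrast can be shown directly: Jaccard compares binary "consumed or not" sets, while cosine compares real-valued rating vectors. The item sets and rating vectors are made up for illustration.

```python
import math

def jaccard(a, b):
    """Jaccard index: overlap of two binary item sets,
    |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(u, v):
    """Cosine similarity between two real-valued rating vectors."""
    num = sum(x * y for x, y in zip(u, v))
    den = (math.sqrt(sum(x * x for x in u))
           * math.sqrt(sum(y * y for y in v)))
    return num / den if den else 0.0

print(jaccard({"m1", "m2"}, {"m2", "m3"}))  # -> 0.333... (1 shared of 3 total)
print(cosine([5, 3, 0], [4, 0, 4]))         # uses magnitudes, not just overlap
```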