DL part 3 Flashcards
What is a policy in RL?
A strategy the agent uses to choose its next action, i.e. a mapping from states to actions (or to a probability distribution over actions).
What are value functions?
- Action-value function: estimates the value (expected cumulative reward) of taking a specific action in a given state and following a given policy afterwards.
- State-value function: estimates the value (expected cumulative reward) of being in a particular state and following a given policy from there.
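For reference, the two value functions can be written as follows. The notation is an assumption that follows common textbook convention and is not taken from these cards: $\pi$ is the policy, $S_t$ and $A_t$ are the state and action at time $t$, and $G_t$ is the cumulative (possibly discounted) reward collected from time $t$ onward.

```latex
\begin{aligned}
V^{\pi}(s)    &= \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s \,\right] \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s,\ A_t = a \,\right] \\
V^{\pi}(s)    &= \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)
\end{aligned}
```

The last line links the two: the value of a state is the policy-weighted average of the values of the actions available in it.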
What is the exploration-exploitation dilemma?
The need to strike a balance between exploiting actions that have yielded high rewards in the past and exploring new actions that might turn out to be even better.
What is exploitation? What is exploration?
- Exploitation: use the action already known to be good
- Exploration: try out a new action
What is a strategy used in RL to balance exploration and exploitation?
Epsilon-greedy (ϵ-greedy) policy
- Sample a discrete action from the policy:
+ With probability ϵ, the agent selects a random action to explore.
+ With probability 1−ϵ, it chooses the action with the highest estimated value based on past experience.
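The two cases above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy` and the representation of the value estimates as a plain list indexed by action are assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given a list of estimated action values.

    q_values: hypothetical list where q_values[a] is the current estimate for action a.
    epsilon:  probability of exploring (choosing a uniformly random action).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Example call: mostly greedy, with a 10% chance of a random action.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
```

With `epsilon=0.0` the choice is always greedy, and with `epsilon=1.0` it is always random; values in between trade off the two.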
What happens in a full reinforcement learning problem, such as a Markov Decision Process (MDP)?
- The agent interacts with an environment → its actions influence not only immediate rewards but also future states and rewards.
- Actions influence the state.
- Rewards depend on the state as well as on the action.
- The agent learns optimal behavior over time to maximize cumulative reward.
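A toy example may help make the points above concrete. The two-state environment below is entirely hypothetical (the states, actions, and reward numbers are invented for illustration), and it only evaluates two fixed policies rather than learning one. It shows that an action with a worse immediate reward can lead to a better state and a higher cumulative reward.

```python
# Hypothetical deterministic toy MDP.
# TRANSITIONS[state][action] -> (next_state, reward)
TRANSITIONS = {
    "low":  {"wait": ("low", 0.0),  "work": ("high", -1.0)},
    "high": {"wait": ("high", 2.0), "work": ("high", 1.0)},
}


def run_episode(policy, start_state="low", steps=5):
    """Follow `policy` for a fixed number of steps and return the summed reward."""
    state, total = start_state, 0.0
    for _ in range(steps):
        action = policy(state)                      # the policy maps state -> action
        state, reward = TRANSITIONS[state][action]  # action changes state and reward
        total += reward
    return total


def always_wait(state):
    # Takes the best immediate reward in "low" (0.0 instead of -1.0).
    return "wait"


def work_then_wait(state):
    # Accepts a cost now to reach the more rewarding "high" state.
    return "work" if state == "low" else "wait"


print(run_episode(always_wait))     # 0.0
print(run_episode(work_then_wait))  # 7.0
```

The second policy pays a reward of -1 on the first step, then collects 2 per step in the `high` state, which is why reasoning about future states matters in an MDP.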