DL part 3 Flashcards

1
Q

What is a policy in RL?

A

A strategy the agent uses to choose its next action, given the current state.

2
Q

What are value functions?

A
  • Action-value function: estimates the value (expected cumulative reward) of taking a specific action in a given state and following a given policy afterwards.
  • State-value function: estimates the value (expected cumulative reward) of being in a particular state and following a given policy from there.
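
The two definitions above are commonly written as expectations over discounted future rewards. The notation below is an assumption added for illustration and does not come from the cards: π is the policy, γ in [0, 1] is a discount factor, and R, S, A denote the reward, state, and action at a given time step.

```latex
V^{\pi}(s)   = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right]

Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\; A_t = a \right]
```

The only difference between the two is the extra conditioning on the first action a in the action-value function.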
3
Q

What is the exploration-exploitation dilemma?

A

The need to strike a balance between exploiting actions that have yielded high rewards in the past and exploring new actions that might turn out to be even better.

4
Q

What is exploitation? exploration?

A
  • Exploitation: use the known good action.
  • Exploration: try out a new action.
5
Q

What is a strategy used in RL to balance exploration and exploitation?

A

Epsilon-greedy policy. A discrete action is sampled as follows:
  • With probability ϵ, the agent selects a random action (exploration).
  • With probability 1−ϵ, it chooses the action with the highest estimated value based on past experience (exploitation).
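
The epsilon-greedy rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy`, the list `q_values` of estimated action values, and the optional `rng` parameter are all hypothetical names chosen for this example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    q_values, epsilon and rng are illustrative names (assumptions),
    not taken from the flashcards.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly at random among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the agent always exploits, and with `epsilon=1.0` it always explores; values in between trade the two off.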

6
Q

What happens in a full reinforcement learning problem, such as a Markov Decision Process (MDP)?

A
  • The agent interacts with an environment, so its actions influence not only immediate rewards but also future states and rewards.
  • Actions influence the state.
  • Rewards additionally depend on the state.
  • The agent learns optimal behavior over time to maximize cumulative reward.
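
The points above can be illustrated with a toy example. The two-state environment below is entirely hypothetical (the state names, action names, and reward numbers are made up for this sketch), but it shows how an action changes the next state and how the reward depends on the state the agent is in.

```python
# Hypothetical two-state MDP (all names and numbers are assumptions):
# TRANSITIONS[state][action] -> (next_state, reward)
TRANSITIONS = {
    "A": {"stay": ("A", 0.0), "move": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "move": ("A", 0.0)},
}

def step(state, action):
    """Environment step: the action determines next state and reward."""
    return TRANSITIONS[state][action]

def run_episode(policy, start="A", steps=5):
    """Follow a policy (a function state -> action) and sum the rewards."""
    state, total = start, 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = step(state, action)
        total += reward
    return total

def always_move(state):
    return "move"

def move_then_stay(state):
    # Move out of A once, then stay in B where "stay" pays more.
    return "move" if state == "A" else "stay"
```

Over five steps, `run_episode(always_move)` collects 3.0 while `run_episode(move_then_stay)` collects 9.0: the better policy accepts the same first reward but ends up in a state with higher future rewards.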
7
Q
A