DL part 3 Flashcards

Question 1

Q

What is a policy in RL?

Answer

A

A strategy the agent uses to determine its next action, i.e. a mapping from states to actions (or to probabilities over actions).

Question 2

Q

What are value functions?

Answer

A

Action-value function: estimates the value of taking a specific action in a given state under a given policy.
State-value function: estimates the value of being in a particular state under a given policy.
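
For reference, the two definitions above can be written out as formulas. This is a sketch in the common textbook notation, which the card itself does not use: the discount factor \gamma, the return G_t, and the symbols V^\pi and Q^\pi are assumptions added here, with "value" read as expected discounted cumulative reward.

```latex
% Return: discounted sum of future rewards (\gamma in [0, 1] is an assumed discount factor)
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% State-value function: expected return when starting in state s and following policy \pi
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]

% Action-value function: expected return when taking action a in state s, then following \pi
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]

% The two are linked by averaging over the policy's action choices
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \, Q^{\pi}(s, a)
```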

Question 3

Q

What is the exploration-exploitation dilemma?

Answer

A

The need to strike a balance between exploiting actions that have shown high rewards in the past and exploring new actions that might turn out to be even better.

Question 4

Q

What is exploitation? What is exploration?

Answer

A

Exploitation: use the action already known to be good.
Exploration: try out a new action.

Question 5

Q

What is a strategy used in RL to balance exploration and exploitation?

Answer

A

Epsilon-greedy policy
- sample a discrete action from the policy:
+ with probability ϵ, the agent selects a random action to explore
+ with probability 1−ϵ, it chooses the action with the highest estimated value based on past experience.
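
The two cases above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy`, the list `q_values` (one estimated value per action), and the optional `rng` argument are hypothetical names chosen for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given a list of estimated action values.

    q_values: hypothetical list of value estimates, one per discrete action.
    epsilon:  probability of exploring (0 = always greedy, 1 = always random).
    rng:      source of randomness; defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]
    # With epsilon = 0 the choice is purely greedy (index of the largest estimate).
    print(epsilon_greedy(estimates, epsilon=0.0))
    # With a small epsilon, most picks are greedy and a few are random.
    print([epsilon_greedy(estimates, epsilon=0.1) for _ in range(10)])
```

A larger ϵ means more exploration. In practice ϵ is often decreased over time, but that schedule is a design choice and not part of the card.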

Question 6

Q

What happens in a full reinforcement learning problem, such as a Markov Decision Process (MDP)?

Answer

A

The agent interacts with an environment: its actions influence not only immediate rewards but also future states and rewards.
Actions influence the state.
Rewards additionally depend on the state.
The agent learns optimal behavior over time to maximize cumulative reward.
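
The points above can be made concrete with a toy example. The sketch below assumes a made-up, deterministic two-state MDP: the state names, action names, and reward numbers are all hypothetical, and a general MDP would use transition probabilities instead of a fixed next state.

```python
# Hypothetical two-state MDP: TRANSITIONS[state][action] -> (next_state, reward).
# Rewards depend on the state: "wait" pays 0.0 in "low" but 2.0 in "high".
TRANSITIONS = {
    "low": {"wait": ("low", 0.0), "work": ("high", 1.0)},
    "high": {"wait": ("high", 2.0), "work": ("low", 0.0)},
}


def run_episode(policy, start_state="low", steps=5):
    """Roll out a policy and return (cumulative reward, list of visited states)."""
    state = start_state
    total = 0.0
    visited = [state]
    for _ in range(steps):
        action = policy(state)                       # the agent chooses an action
        state, reward = TRANSITIONS[state][action]   # the action determines the next state
        total += reward                              # rewards accumulate over time
        visited.append(state)
    return total, visited


def always_work(state):
    # Ignores the state and keeps bouncing between "low" and "high".
    return "work"


def work_then_wait(state):
    # Moves to "high" once, then stays there to collect the larger reward.
    return "work" if state == "low" else "wait"


if __name__ == "__main__":
    print(run_episode(always_work))
    print(run_episode(work_then_wait))
```

In this toy setup the second policy collects more cumulative reward (9.0 versus 3.0 over five steps) because its first action moves the agent into a state where later rewards are higher.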
