Lecture 13 Flashcards
What is reinforcement learning?
A learning framework where an agent interacts with an environment and learns from rewards.
How does RL differ from supervised learning?
RL learns from interaction with the environment, while supervised learning relies on labeled data.
What is the key objective of an RL agent?
To learn a policy that maximizes the expected cumulative (discounted) future reward.
What is a policy in RL?
A function that maps states to actions (or, if stochastic, to probability distributions over actions).
What is a reward in RL?
A numerical value indicating the immediate benefit of taking an action.
What is the environment in RL?
The external system with which the agent interacts.
What is a Markov Decision Process (MDP)?
A mathematical framework for modeling decision-making in RL.
What are the key components of an MDP?
States, actions, transition probabilities, rewards, and (typically) a discount factor.
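A minimal sketch of these components as plain Python data, assuming a hypothetical two-state toy MDP (the states, actions, and probabilities here are illustrative, not from the lecture):

    # transitions[state][action] -> list of (probability, next_state, reward)
    transitions = {
        "s0": {
            "stay": [(1.0, "s0", 0.0)],
            "move": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
        },
        "s1": {
            "stay": [(1.0, "s1", 0.0)],
            "move": [(1.0, "s0", 0.0)],
        },
    }
    states = list(transitions)
    actions = ["stay", "move"]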
What is the assumption of the Markov property?
The next state depends only on the current state and action, not on past states.
What is an episodic RL task?
A task where learning consists of repeated episodes that end in a terminal state.
What is a continuing (sometimes called continuous) RL task?
A task where interaction with the environment is ongoing, with no terminal state to end an episode.
What is the exploration-exploitation tradeoff?
Balancing between exploring new actions and exploiting known good actions.
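Epsilon-greedy action selection is one standard way to manage this tradeoff; a minimal sketch (the epsilon value is an illustrative choice):

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        # With probability epsilon, explore: pick a uniformly random action.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        # Otherwise exploit: pick the action with the highest estimated value.
        return max(range(len(q_values)), key=lambda a: q_values[a])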
What is a value function in RL?
A function that estimates the expected cumulative reward from a state.
What is the difference between state-value and action-value functions?
The state-value function V(s) estimates the expected return from a state, while the action-value function Q(s, a) estimates the expected return from taking a specific action in a state.
What is the Bellman equation?
A recursive equation that expresses the value of a state in terms of the expected immediate reward and the discounted values of successor states.
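In standard textbook notation (assumed here, not taken verbatim from the lecture), the Bellman equation for the value of state $s$ under policy $\pi$ is

    V^\pi(s) = \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \big[ R(s, a, s') + \gamma V^\pi(s') \big]

where $\gamma \in [0, 1)$ is the discount factor.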
What is temporal difference (TD) learning?
A method that updates value estimates toward a target built from the observed reward plus the current estimate of the next state's value (bootstrapping), rather than waiting for the final outcome.
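For example, the TD(0) update (standard notation, assumed rather than lecture-specific) moves the estimate toward a bootstrapped target:

    V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big]

Here $\alpha$ is the learning rate and the bracketed term is the TD error.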
What is Q-learning?
An off-policy TD algorithm that learns the optimal action-value function by iteratively updating Q-values from experience.
What is the role of the Q-table in Q-learning?
It stores the estimated action values (expected returns) for every state-action pair.
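A minimal tabular Q-learning sketch, reusing the hypothetical toy MDP defined earlier (transitions, actions); the hyperparameters alpha, gamma, and epsilon are illustrative choices, not values from the lecture:

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
    Q = defaultdict(float)                  # Q[(state, action)] -> estimated return

    def step(state, action):
        # Sample (next_state, reward) from the toy MDP's transition table.
        r, cumulative = random.random(), 0.0
        for prob, nxt, reward in transitions[state][action]:
            cumulative += prob
            if r <= cumulative:
                return nxt, reward
        return nxt, reward  # floating-point fallback: last outcome

    state = "s0"
    for _ in range(10_000):
        # Epsilon-greedy action selection from the current Q-table.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Off-policy update: bootstrap from the best action in the next state.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state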
What is deep Q-learning?
A variation of Q-learning that uses a neural network to approximate the Q-function.
What is policy gradient learning?
A method where the policy is optimized directly by gradient ascent on the expected return.
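A minimal REINFORCE sketch on a hypothetical two-armed bandit (the arm payout probabilities 0.2 and 0.8 are made up for illustration); the policy is a softmax over per-action preferences:

    import math, random

    theta = [0.0, 0.0]  # one preference parameter per action
    lr = 0.1            # illustrative learning rate

    def softmax(prefs):
        exps = [math.exp(p - max(prefs)) for p in prefs]
        return [e / sum(exps) for e in exps]

    for _ in range(5_000):
        probs = softmax(theta)
        action = random.choices([0, 1], weights=probs)[0]
        reward = 1.0 if random.random() < (0.8 if action == 1 else 0.2) else 0.0
        # Gradient ascent on expected reward:
        # d/d theta[k] of log pi(action) = 1[k == action] - probs[k]
        for k in range(2):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log_pi

    print(softmax(theta))  # probability mass should concentrate on arm 1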
What is actor-critic learning?
A hybrid method combining value-based and policy-based approaches: an actor (the policy) selects actions while a critic (a value function) evaluates them and guides the actor's updates.
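One common one-step actor-critic formulation (standard notation, assumed here): the critic's TD error drives both updates,

    \delta_t = r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t)
    w \leftarrow w + \alpha_w \, \delta_t \, \nabla_w V_w(s_t)
    \theta \leftarrow \theta + \alpha_\theta \, \delta_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)

where $w$ parameterizes the critic's value function and $\theta$ the actor's policy.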
What is reward shaping?
A technique where additional rewards are provided to guide learning.
What is the credit assignment problem in RL?
Determining which past actions contributed to an observed reward.
What is the sparse-reward problem in RL?
A setting where rewards arrive infrequently, which makes learning difficult.
What is an example of reinforcement learning in real-world applications?
Game playing (e.g., AlphaGo, DeepMind’s Atari games).
What is the difference between model-free and model-based RL?
Model-free learns from experience without a model, while model-based plans using an environment model.
What is the purpose of experience replay in deep RL?
To store past transitions and sample them randomly during training, which breaks correlations between consecutive samples and improves stability.
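A minimal replay buffer sketch (the capacity and interface are illustrative, not a specific library's API):

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=10_000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform random minibatch: decorrelates consecutive transitions
            # and lets each stored experience be reused many times.
            return random.sample(list(self.buffer), batch_size)

        def __len__(self):
            return len(self.buffer)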
What is an advantage of reinforcement learning?
It can learn complex behaviors without explicit supervision.
What is a limitation of reinforcement learning?
It can be very sample inefficient, often requiring large amounts of interaction data.