AI Final Study Guide PART 2 Flashcards

1
Q
What is the primary advantage of using Q-networks over Q-tables in reinforcement learning? a) Q-networks require less memory storage b) Q-networks can handle continuous action spaces c) Q-networks provide faster convergence d) Q-networks are more interpretable
A

Answer: b) Q-networks can handle continuous action spaces Explanation: Q-networks can approximate Q-values for continuous action spaces, which is not feasible with Q-tables.
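For context, a minimal sketch of the contrast (assuming PyTorch; the class name, layer sizes, and dimensions are illustrative, not from the card): a Q-table needs one entry per discrete (state, action) pair, while a Q-network can approximate Q(s, a) for continuous state and action vectors.

    # Minimal Q-network sketch (assumes PyTorch; sizes are arbitrary).
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        def __init__(self, state_dim=4, action_dim=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),      # a single Q-value estimate
            )

        def forward(self, state, action):
            # Both inputs may be continuous vectors, which a finite table cannot index.
            return self.net(torch.cat([state, action], dim=-1))

    q_net = QNetwork()
    q_value = q_net(torch.zeros(1, 4), torch.zeros(1, 2))   # shape (1, 1)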

2
Q
Which method in reinforcement learning aims to directly optimize the policy function? a) Q-learning b) Temporal-Difference methods c) Policy Gradient methods d) Monte Carlo methods
A

Answer: c) Policy Gradient methods Explanation: Policy Gradient methods directly optimize the policy function to maximize the expected reward.
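As a hedged aside (standard policy-gradient notation, assumed rather than taken from the card), the gradient these methods ascend can be estimated from N sampled trajectories:

    \nabla_\theta \bar{R}_\theta \approx \frac{1}{N} \sum_{n=1}^{N} R(\tau^{n}) \, \nabla_\theta \log p_\theta(\tau^{n}), \qquad \theta \leftarrow \theta + \eta \, \nabla_\theta \bar{R}_\theta

where R(\tau^{n}) is the total reward of trajectory n and \eta is the learning rate.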

3
Q
What does the actor represent in policy-based reinforcement learning? a) The Q-value function b) The probability distribution over actions given a state c) The loss function d) The gradient of the policy function
A

Answer: b) The probability distribution over actions given a state Explanation: The actor represents the policy function, which outputs the probability distribution over actions given a state.
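A minimal actor sketch (assuming PyTorch; the layer sizes and the two-action setup are illustrative): the final softmax makes the output a probability distribution over actions for the given state.

    import torch
    import torch.nn as nn

    actor = nn.Sequential(
        nn.Linear(4, 32),        # state vector -> hidden layer
        nn.Tanh(),
        nn.Linear(32, 2),        # hidden layer -> one logit per action
        nn.Softmax(dim=-1),      # logits -> probabilities over actions
    )

    state = torch.zeros(1, 4)
    action_probs = actor(state)                    # e.g. tensor([[0.5, 0.5]])
    action = torch.multinomial(action_probs, 1)    # sample one action from the distribution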

4
Q
What does R_θ represent in reinforcement learning? a) The state-value function b) The expected reward obtained by an actor c) The Q-value function d) The loss function
A

Answer: b) The expected reward obtained by an actor Explanation: R_θ represents the expected cumulative reward obtained by an actor under a particular policy.
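In the usual notation (assumed here), with \tau denoting a trajectory and p_\theta(\tau) its probability under the actor's parameters \theta:

    \bar{R}_\theta = \sum_{\tau} R(\tau) \, p_\theta(\tau) = \mathbb{E}_{\tau \sim p_\theta(\tau)}[R(\tau)]

i.e. the reward R(\tau) collected along each trajectory, weighted by how likely the current policy is to produce that trajectory.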

5
Q
What is the primary objective of the Monte Carlo method in reinforcement learning? a) To estimate the value of state-action pairs incrementally b) To update the Q-values based on experience at each time step c) To estimate the expected return from complete episodes/trajectories d) To optimize the policy using gradient ascent
A

Answer: c) To estimate the expected return from complete episodes/trajectories Explanation: The Monte Carlo method estimates the expected return by averaging the rewards obtained from complete episodes.
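A minimal sketch of the idea in plain Python (the function name and values are illustrative): wait until the episode is complete, then sweep backwards over the recorded rewards to obtain the discounted return for every step.

    # Compute the discounted return G_t for each step of one finished episode.
    def discounted_returns(rewards, gamma=0.99):
        returns = []
        g = 0.0
        for r in reversed(rewards):     # work backwards from the final reward
            g = r + gamma * g
            returns.append(g)
        return list(reversed(returns))

    print(discounted_returns([0.0, 0.0, 1.0]))   # [0.9801, 0.99, 1.0]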

6
Q
Which approach in reinforcement learning updates the value estimates incrementally at each step? a) Monte Carlo methods b) Temporal-Difference methods c) Policy Gradient methods d) Q-learning
A

Answer: b) Temporal-Difference methods Explanation: Temporal-Difference methods update the value estimates at each time step based on the current estimate and the observed reward.
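Concretely, the one-step TD(0) update for a state-value estimate is (standard notation, assumed; \alpha is the learning rate, \gamma the discount factor):

    V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]

so the estimate moves toward a target built from the observed reward plus the current estimate of the next state, without waiting for the episode to finish.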

7
Q
What does the Q-function represent in Q-learning? a) The probability distribution over actions given a state b) The expected cumulative reward obtained from a state-action pair c) The value of a state under a particular policy d) The gradient of the policy function
A

Answer: b) The expected cumulative reward obtained from a state-action pair Explanation: The Q-function estimates the expected cumulative reward obtained by taking a specific action in a given state.
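In standard notation (assumed, not from the card), the Q-function under a policy \pi is

    Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, \; a_t = a \right]

the expected discounted cumulative reward from taking action a in state s and following \pi afterwards.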

8
Q
Which technique is used to train a Q-network in reinforcement learning? a) Gradient ascent b) Experience replay c) Policy gradient d) Bellman equation
A

Answer: b) Experience replay Explanation: Experience replay involves training a Q-network using random minibatches of transitions stored in a replay memory.
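A minimal replay-memory sketch in plain Python (the class and method names are illustrative, not a specific library's API): transitions are appended as they occur, and training draws random minibatches from the stored history.

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=10000):
            self.buffer = deque(maxlen=capacity)     # oldest transitions are dropped first

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # A random minibatch breaks the temporal correlation between consecutive transitions.
            return random.sample(self.buffer, batch_size)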

9
Q
What problem does experience replay in reinforcement learning aim to solve? a) Correlated samples in training data b) Overfitting of the Q-network c) Exploration-exploitation trade-off d) Gradient vanishing problem
A

Answer: a) Correlated samples in training data Explanation: Experience replay helps to break the correlation between consecutive samples, improving the efficiency of training.

10
Q
What does the Bellman equation describe in reinforcement learning? a) The update rule for Q-values in Q-learning b) The gradient of the policy function c) The expected return from complete episodes d) The process of experience replay
A

Answer: a) The update rule for Q-values in Q-learning Explanation: The Bellman equation describes how the Q-values should be updated based on the observed reward and the estimate of future rewards.
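Written out (standard notation, assumed here), the Q-learning update that follows from the Bellman equation is

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

where \alpha is the learning rate and \gamma the discount factor: the target combines the observed reward with the current estimate of the best achievable future value.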

11
Q
Which approach in reinforcement learning is mainly used for processing closely related continuous events? a) Q-learning b) Temporal-Difference methods c) Policy Gradient methods d) Monte Carlo methods
A

Answer: c) Policy Gradient methods Explanation: Policy Gradient methods are suitable for processing continuous events where the entire trajectory is needed.

12
Q
In Q-learning, what does the term “exploration-exploitation trade-off” refer to? a) Balancing the exploration of new states with exploiting known information b) Balancing the loss function with the value function c) Balancing the Q-values with the state-action pairs d) Balancing the gradient descent with the gradient ascent
A

Answer: a) Balancing the exploration of new states with exploiting known information Explanation: Q-learning involves choosing between exploring new states to gather more information and exploiting known information to maximize reward.
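A common way to manage this trade-off is epsilon-greedy action selection; a minimal sketch in plain Python (the names and default epsilon are illustrative):

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        """q_values: one Q-value estimate per action."""
        if random.random() < epsilon:
            return random.randrange(len(q_values))                       # explore: try a random action
        return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit: pick the best-known action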

13
Q
What is the primary objective of experience replay in reinforcement learning? a) To store the entire trajectory of a game b) To break the correlation between consecutive samples c) To optimize the policy function d) To update the Q-values based on reward signals
A

Answer: b) To break the correlation between consecutive samples Explanation: Experience replay helps to decorrelate the training samples, making learning more efficient.

14
Q
What is the main limitation of using Q-tables in reinforcement learning? a) They are computationally expensive to update b) They cannot handle continuous action spaces c) They require prior knowledge of the environment dynamics d) They suffer from the curse of dimensionality
A

Answer: d) They suffer from the curse of dimensionality Explanation: Q-tables become impractical for large state spaces due to the exponential growth of the number of states.
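A rough worked illustration (numbers assumed purely for the sake of example): a state described by 10 discrete features, each taking 100 possible values, already gives 100^10 = 10^20 distinct states, so even with only 2 actions the Q-table would need 2 × 10^20 entries, far beyond practical storage or visitation.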

15
Q
Which method is used to estimate the value of state-action pairs through complete episodes? a) Q-learning b) Policy Gradient c) Temporal-Difference methods d) Monte Carlo methods
A

Answer: d) Monte Carlo methods Explanation: Monte Carlo methods estimate the value of state-action pairs by averaging the rewards obtained from complete episodes.
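For reference (standard notation, assumed), the Monte Carlo estimate can be kept as a running average over observed returns G for the pair (s, a):

    N(s, a) \leftarrow N(s, a) + 1, \qquad Q(s, a) \leftarrow Q(s, a) + \frac{1}{N(s, a)} \left[ G - Q(s, a) \right]

so each complete episode contributes one more sample to the average for every state-action pair it visited.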
