W4 Flashcards
- AlphaGo uses Monte Carlo Tree Search in combination with deep reinforcement learning to evaluate moves.
Answer: True
- Transfer learning allows AI systems to apply knowledge learned in one domain to new domains without training from scratch.
Answer: True
- Deep Blue relied heavily on machine learning to defeat human chess champions.
Answer: False
(Deep Blue primarily used brute-force search and hand-coded heuristics rather than machine learning.)
- The credit assignment problem in reinforcement learning occurs when it is unclear which action led to a delayed reward.
Answer: True
- Reinforcement learning is better suited than supervised learning to tasks with sparse and delayed rewards.
Answer: True
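A quick sketch makes the two cards above concrete. In the Python below (the episode length, rewards, and discount factor are illustrative assumptions), a single reward at the end of an episode gives early actions only weak, discounted credit, which is exactly the credit assignment problem.

```python
# Minimal sketch of delayed reward and credit assignment: every reward is
# zero until the final step, so earlier actions receive credit for the
# outcome only through discounting. All values here are illustrative.

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each timestep t."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

# A 100-step episode whose only reward arrives at the very end.
episode = [0.0] * 99 + [1.0]
returns = discounted_returns(episode)
print(round(returns[0], 2))   # ~0.37: the first action gets weak, indirect credit
print(returns[-1])            # 1.0: the final step gets full credit
```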
- What is the primary difference between Q-learning and Deep Q-learning?
a) Q-learning uses a Q-table, while Deep Q-learning uses neural networks to approximate Q-values.
b) Q-learning is used for supervised learning, while Deep Q-learning is used for unsupervised learning.
c) Deep Q-learning does not require reinforcement learning principles.
d) Q-learning uses backpropagation, while Deep Q-learning does not.
Answer: a
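A minimal sketch of option (a), with illustrative state/action counts and hyperparameters: tabular Q-learning stores one value per (state, action) pair, while Deep Q-learning keeps the same update rule but has a neural network predict the Q-values.

```python
import numpy as np

# Tabular Q-learning: Q-values live in an explicit table indexed by
# (state, action). Sizes and hyperparameters below are illustrative.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# Deep Q-learning replaces the table with a network q_net(state) that
# outputs one Q-value per action and is trained toward the same TD
# target, which is what lets it scale to raw-pixel states like Breakout.
```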
- Why is transfer learning considered crucial for advancing AI to general intelligence?
a) It eliminates the need for supervised learning entirely.
b) It allows AI to apply knowledge learned in one domain to another domain.
c) It guarantees perfect performance in all environments.
d) It enables AlphaGo to master Chess and Checkers without modifications.
Answer: b
- What is a major challenge when applying reinforcement learning to real-world problems?
a) It requires extensive human-designed evaluation functions.
b) Rewards in real-world tasks are often delayed and sparse.
c) Reinforcement learning cannot perform random actions.
d) It always leads to overfitting in neural networks.
Answer: b
- What is the main contribution of Monte Carlo Tree Search in AI systems like AlphaGo?
a) It provides a fixed evaluation function for decision-making.
b) It integrates supervised learning into reinforcement learning.
c) It uses statistical rollouts to evaluate moves probabilistically.
d) It eliminates the need for exploration during learning.
Answer: c
- Which of the following highlights a key limitation of AlphaGo in terms of general intelligence?
a) AlphaGo cannot use reinforcement learning techniques.
b) AlphaGo cannot transfer its knowledge of Go to other games or domains.
c) AlphaGo’s decision-making is not based on Monte Carlo simulations.
d) AlphaGo’s learning is completely unsupervised.
Answer: b
- Explain the concept of reinforcement learning and how it differs from supervised learning. Provide examples from games like Breakout or AlphaGo to illustrate your answer.
Reinforcement Learning (RL):
RL involves an agent learning to make decisions by interacting with its environment, receiving feedback in the form of rewards or penalties, and optimizing its actions to maximize cumulative rewards.
It does not rely on labeled input-output pairs but learns through trial and error.
Supervised Learning:
In supervised learning, a model learns from a dataset of labeled examples where the input-output relationship is explicitly defined.
The goal is to minimize prediction error on the training data and generalize to unseen data.
Example (Breakout):
In Deep Q-learning for Breakout, the system learns by playing the game repeatedly, improving its policy based on scores (rewards).
Unlike supervised learning, the AI isn’t told which actions are good; it learns this over many episodes.
Example (AlphaGo):
AlphaGo uses reinforcement learning to improve its gameplay through self-play, receiving rewards based on win/loss outcomes rather than labeled examples of good moves.
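The difference is visible in code. Below is a sketch of an RL loop on a toy chain environment standing in for Breakout (the environment, hyperparameters, and episode cap are all illustrative assumptions): nowhere does a label say which action is correct; the agent discovers that from the reward alone.

```python
import random

# Toy stand-in for a game: 4 states in a row; taking "right" in the last
# state scores 1 and ends the episode. Everything here is illustrative.
N_STATES, ACTIONS = 4, (0, 1)              # 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    if a == 1 and s == N_STATES - 1:
        return s, 1.0, True                # the only reward in the game
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, 0.0, False

for episode in range(300):
    s = 0
    for t in range(100):                   # cap episode length
        # epsilon-greedy: explore sometimes, otherwise exploit current Q
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # learn from reward alone: no labeled input-output pairs anywhere
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if done:
            break

# The learned policy's first action: 1 ("right"), never given as a label.
print(max(ACTIONS, key=lambda x: Q[(0, x)]))
```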
- Discuss the challenges of applying reinforcement learning to real-world tasks, using the dishwashing robot example. Why is it difficult for reinforcement learning to extend from games to these scenarios?
Challenges:
Sparse and Delayed Rewards: In real-world tasks, rewards are often neither immediate nor frequent (e.g., the reward of clean dishes arrives only at the end of a full cycle).
Complex State Spaces: Real-world environments have far more variables than controlled games like Breakout.
Uncertainty: Real-world tasks involve unpredictable variables (e.g., types of dishes, water pressure, detergent levels).
High Cost of Failure: Errors in physical tasks can lead to damage or safety issues, unlike virtual mistakes in games.
Dishwashing Robot Example:
The robot must navigate complex actions like identifying dirty dishes, understanding different dish types, and handling fragile items.
Unlike games, the task lacks well-defined rules or consistent feedback, making it harder to train the robot effectively.
- How does Monte Carlo Tree Search improve decision-making in games like Go, and why does it not rely on a fixed evaluation function? Explain its significance in AlphaGo’s performance.
Monte Carlo Tree Search (MCTS):
MCTS simulates many possible future game scenarios (rollouts) to gather statistical data on the likelihood of winning from different moves.
Instead of using a fixed evaluation function, MCTS relies on these statistics to guide decision-making.
Significance in AlphaGo:
MCTS allows AlphaGo to balance exploration (testing new moves) and exploitation (focusing on the best-known moves).
By combining MCTS with deep neural networks for move evaluation and rollout policy, AlphaGo achieved superhuman gameplay.
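As a concrete illustration, here is a compact MCTS with the standard UCT selection rule on a toy Nim-like game (players alternately take 1 or 2 stones; taking the last stone wins). The game, node structure, and constants are illustrative assumptions, not AlphaGo's setup; AlphaGo additionally steers this loop with learned policy and value networks instead of purely random rollouts.

```python
import math, random

# Minimal MCTS (selection, expansion, simulation, backpropagation) on a
# toy subtraction game: take 1 or 2 stones; taking the last stone wins.

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones, self.player = stones, player      # player to move
        self.parent, self.move = parent, move
        self.children, self.visits, self.wins = [], 0, 0.0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2) if m <= self.stones and m not in tried]

def uct_select(node, c=1.4):
    # Balance exploitation (win rate) against exploration (visit counts).
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    # Random playout to the end of the game; returns the winner.
    while True:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        if stones == 0:
            return player                  # this player took the last stone
        player = 1 - player

def mcts(stones, player, iters=2000):
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while node.stones > 0 and not node.untried_moves():
            node = uct_select(node)
        # 2. Expansion: add one untried child if the node is non-terminal.
        if node.stones > 0:
            m = random.choice(node.untried_moves())
            child = Node(node.stones - m, 1 - node.player, node, m)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout (terminal nodes need none).
        winner = node.parent.player if node.stones == 0 \
            else rollout(node.stones, node.player)
        # 4. Backpropagation: update statistics along the path.
        while node:
            node.visits += 1
            if node.parent and node.parent.player == winner:
                node.wins += 1             # credit from the mover's view
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(stones=7, player=0))  # 7 is a win for taking 1 (leave a multiple of 3)
```

Note that no position is ever scored by a hand-written evaluation function: the visit and win statistics accumulated from rollouts are the evaluation.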
- Compare and contrast the learning mechanisms of AlphaGo, Deep Blue, and Arthur Samuel’s Checkers Player. Highlight the role of reinforcement learning and self-play in these systems.
AlphaGo:
Uses reinforcement learning with self-play to iteratively improve its policy.
Combines deep neural networks with Monte Carlo Tree Search.
Relies on a mix of supervised learning and self-play to master the game of Go.
Deep Blue:
Relied on brute-force search and hand-coded chess heuristics.
Used specialized hardware for faster computation but did not use machine learning.
Played chess by evaluating millions of board positions rather than learning strategies.
Arthur Samuel’s Checkers Player:
Used an evaluation function to score board positions.
Improved through self-play, refining its strategies over time.
Foreshadowed modern reinforcement learning techniques.
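Samuel's mechanism, a tunable evaluation function refined through self-play, can be sketched in modern terms. The features, names, and update rule below are illustrative TD-style stand-ins (the approach his system foreshadowed), not his exact algorithm.

```python
import numpy as np

# A linear evaluation function over hand-chosen board features (e.g.
# piece advantage, king count, mobility), nudged during self-play so a
# position's score moves toward the score of the position that follows
# it, or toward the final outcome at the end of the game. All names and
# values here are illustrative assumptions.

weights = np.zeros(3)
alpha = 0.01                               # learning rate

def evaluate(features):
    """Score a board position as a weighted sum of its features."""
    return float(weights @ features)

def self_play_update(features_now, target):
    """Pull evaluate(features_now) toward `target`: either the evaluation
    of the successor position, or +1/-1 for a finished game."""
    global weights
    weights += alpha * (target - evaluate(features_now)) * features_now
```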
- Define the term “transfer learning” and discuss why it is essential for achieving general AI. How does the inability of current AI systems to transfer knowledge limit their application to broader domains?
Transfer Learning:
Transfer learning is the ability of a system to apply knowledge learned in one domain to another domain without starting from scratch.
For example, a system trained to identify cats could generalize this knowledge to recognize other animals.
Importance for General AI:
Transfer learning is crucial for building flexible systems capable of adapting to new tasks and environments.
Without it, AI remains narrow and task-specific, requiring retraining for each new problem.
Limitations of Current AI:
Systems like AlphaGo and Deep Blue are “idiot savants”—mastering one domain but incapable of generalizing to others.
This lack of generalization hinders applications in dynamic, real-world scenarios, such as robotics or medical diagnostics.
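To show transfer in its most common practical form today, here is a fine-tuning sketch using PyTorch and torchvision (the 10-class target task, dummy batch, and training details are illustrative assumptions, and running it requires downloading the pretrained weights): the pretrained backbone is frozen and only a new classification head is trained for the new domain.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal transfer-learning sketch: reuse an ImageNet-pretrained
# ResNet-18 as a feature extractor and train only a new classification
# head for a different task. Task size and batch are illustrative.

model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained backbone so its learned features are kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh head for the new, 10-class domain.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch; real code would loop
# over a DataLoader for the new task's dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

This realizes "not starting from scratch": only the small head is trained while the general visual features carry over, the kind of flexibility the cards above describe as missing at the level of whole systems like AlphaGo.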