W4 Flashcards
- AlphaGo uses Monte Carlo Tree Search in combination with deep reinforcement learning to evaluate moves.
Answer: True
- Transfer learning allows AI systems to apply knowledge learned in one domain to new domains without training from scratch.
Answer: True
- Deep Blue relied heavily on machine learning to defeat human chess champions.
Answer: False
(Deep Blue primarily used brute-force search and hand-coded heuristics rather than machine learning.)
- The credit assignment problem in reinforcement learning occurs when it is unclear which action led to a delayed reward.
Answer: True
- Reinforcement learning is better suited than supervised learning to tasks with sparse and delayed rewards.
Answer: True
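A quick sketch makes the two cards above concrete. In the Python below (the episode length, rewards, and discount factor are illustrative assumptions), a single reward at the end of an episode gives early actions only weak, discounted credit, which is exactly the credit assignment problem.

```python
# Minimal sketch of delayed reward and credit assignment: every reward is
# zero until the final step, so earlier actions receive credit for the
# outcome only through discounting. All values here are illustrative.

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each timestep t."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

# A 100-step episode whose only reward arrives at the very end.
episode = [0.0] * 99 + [1.0]
returns = discounted_returns(episode)
print(round(returns[0], 2))   # ~0.37: the first action gets weak, indirect credit
print(returns[-1])            # 1.0: the final step gets full credit
```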
- What is the primary difference between Q-learning and Deep Q-learning?
a) Q-learning uses a Q-table, while Deep Q-learning uses neural networks to approximate Q-values.
b) Q-learning is used for supervised learning, while Deep Q-learning is used for unsupervised learning.
c) Deep Q-learning does not require reinforcement learning principles.
d) Q-learning uses backpropagation, while Deep Q-learning does not.
Answer: a
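A minimal sketch of option (a), with illustrative state/action counts and hyperparameters: tabular Q-learning stores one value per (state, action) pair, while Deep Q-learning keeps the same update rule but has a neural network predict the Q-values.

```python
import numpy as np

# Tabular Q-learning: Q-values live in an explicit table indexed by
# (state, action). Sizes and hyperparameters below are illustrative.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# Deep Q-learning replaces the table with a network q_net(state) that
# outputs one Q-value per action and is trained toward the same TD
# target, which is what lets it scale to raw-pixel states like Breakout.
```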
- Why is transfer learning considered crucial for advancing AI to general intelligence?
a) It eliminates the need for supervised learning entirely.
b) It allows AI to apply knowledge learned in one domain to another domain.
c) It guarantees perfect performance in all environments.
d) It enables AlphaGo to master Chess and Checkers without modifications.
Answer: b
- What is a major challenge when applying reinforcement learning to real-world problems?
a) It requires extensive human-designed evaluation functions.
b) Rewards in real-world tasks are often delayed and sparse.
c) Reinforcement learning cannot perform random actions.
d) It always leads to overfitting in neural networks.
Answer: b
- What is the main contribution of Monte Carlo Tree Search in AI systems like AlphaGo?
a) It provides a fixed evaluation function for decision-making.
b) It integrates supervised learning into reinforcement learning.
c) It uses statistical rollouts to evaluate moves probabilistically.
d) It eliminates the need for exploration during learning.
Answer: c
- Which of the following highlights a key limitation of AlphaGo in terms of general intelligence?
a) AlphaGo cannot use reinforcement learning techniques.
b) AlphaGo cannot transfer its knowledge of Go to other games or domains.
c) AlphaGo’s decision-making is not based on Monte Carlo simulations.
d) AlphaGo’s learning is completely unsupervised.
Answer: b
- Explain the concept of reinforcement learning and how it differs from supervised learning. Provide examples from games like Breakout or AlphaGo to illustrate your answer.
Reinforcement Learning (RL):
RL involves an agent learning to make decisions by interacting with its environment, receiving feedback in the form of rewards or penalties, and optimizing its actions to maximize cumulative rewards.
It does not rely on labeled input-output pairs but learns through trial and error.
Supervised Learning:
In supervised learning, a model learns from a dataset of labeled examples where the input-output relationship is explicitly defined.
The goal is to minimize prediction error on the training data and generalize to unseen data.
Example (Breakout):
In Deep Q-learning for Breakout, the system learns by playing the game repeatedly, improving its policy based on scores (rewards).
Unlike supervised learning, the AI isn’t told which actions are good; it learns this over many episodes.
Example (AlphaGo):
AlphaGo uses reinforcement learning to improve its gameplay through self-play, receiving rewards based on win/loss outcomes rather than labeled examples of good moves.
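The difference is visible in code. Below is a sketch of an RL loop on a toy chain environment standing in for Breakout (the environment, hyperparameters, and episode cap are all illustrative assumptions): nowhere does a label say which action is correct; the agent discovers that from the reward alone.

```python
import random

# Toy stand-in for a game: 4 states in a row; taking "right" in the last
# state scores 1 and ends the episode. Everything here is illustrative.
N_STATES, ACTIONS = 4, (0, 1)              # 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    if a == 1 and s == N_STATES - 1:
        return s, 1.0, True                # the only reward in the game
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, 0.0, False

for episode in range(300):
    s = 0
    for t in range(100):                   # cap episode length
        # epsilon-greedy: explore sometimes, otherwise exploit current Q
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # learn from reward alone: no labeled input-output pairs anywhere
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if done:
            break

# The learned policy's first action: 1 ("right"), never given as a label.
print(max(ACTIONS, key=lambda x: Q[(0, x)]))
```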
- Discuss the challenges of applying reinforcement learning to real-world tasks, using the dishwashing robot example. Why is it difficult for reinforcement learning to extend from games to these scenarios?
Challenges:
Sparse and Delayed Rewards: In real-world tasks, rewards are often neither immediate nor frequent (e.g., the reward of clean dishes arrives only at the end of a full cycle).
Complex State Spaces: Real-world environments have far more variables than controlled games like Breakout.
Uncertainty: Real-world tasks involve unpredictable variables (e.g., types of dishes, water pressure, detergent levels).
High Cost of Failure: Errors in physical tasks can lead to damage or safety issues, unlike virtual mistakes in games.
Dishwashing Robot Example:
The robot must navigate complex actions like identifying dirty dishes, understanding different dish types, and handling fragile items.
Unlike games, the task lacks well-defined rules or consistent feedback, making it harder to train the robot effectively.
- How does Monte Carlo Tree Search improve decision-making in games like Go, and why does it not rely on a fixed evaluation function? Explain its significance in AlphaGo’s performance.
Monte Carlo Tree Search (MCTS):
MCTS simulates many possible future game scenarios (rollouts) to gather statistical data on the likelihood of winning from different moves.
Instead of using a fixed evaluation function, MCTS relies on these statistics to guide decision-making.
Significance in AlphaGo:
MCTS allows AlphaGo to balance exploration (testing new moves) and exploitation (focusing on the best-known moves).
By combining MCTS with deep neural networks for move evaluation and rollout policy, AlphaGo achieved superhuman gameplay.
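As a concrete illustration, here is a compact MCTS with the standard UCT selection rule on a toy Nim-like game (players alternately take 1 or 2 stones; taking the last stone wins). The game, node structure, and constants are illustrative assumptions, not AlphaGo's setup; AlphaGo additionally steers this loop with learned policy and value networks instead of purely random rollouts.

```python
import math, random

# Minimal MCTS (selection, expansion, simulation, backpropagation) on a
# toy subtraction game: take 1 or 2 stones; taking the last stone wins.

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones, self.player = stones, player      # player to move
        self.parent, self.move = parent, move
        self.children, self.visits, self.wins = [], 0, 0.0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2) if m <= self.stones and m not in tried]

def uct_select(node, c=1.4):
    # Balance exploitation (win rate) against exploration (visit counts).
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    # Random playout to the end of the game; returns the winner.
    while True:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        if stones == 0:
            return player                  # this player took the last stone
        player = 1 - player

def mcts(stones, player, iters=2000):
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while node.stones > 0 and not node.untried_moves():
            node = uct_select(node)
        # 2. Expansion: add one untried child if the node is non-terminal.
        if node.stones > 0:
            m = random.choice(node.untried_moves())
            child = Node(node.stones - m, 1 - node.player, node, m)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout (terminal nodes need none).
        winner = node.parent.player if node.stones == 0 \
            else rollout(node.stones, node.player)
        # 4. Backpropagation: update statistics along the path.
        while node:
            node.visits += 1
            if node.parent and node.parent.player == winner:
                node.wins += 1             # credit from the mover's view
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(stones=7, player=0))  # 7 is a win for taking 1 (leave a multiple of 3)
```

Note that no position is ever scored by a hand-written evaluation function: the visit and win statistics accumulated from rollouts are the evaluation.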
- Compare and contrast the learning mechanisms of AlphaGo, Deep Blue, and Arthur Samuel’s Checkers Player. Highlight the role of reinforcement learning and self-play in these systems.
AlphaGo:
Uses reinforcement learning with self-play to iteratively improve its policy.
Combines deep neural networks with Monte Carlo Tree Search.
Relies on a mix of supervised learning and self-play to master the game of Go.
Deep Blue:
Relied on brute-force search and hand-coded chess heuristics.
Used specialized hardware for faster computation but did not use machine learning.
Played chess by evaluating millions of board positions rather than learning strategies.
Arthur Samuel’s Checkers Player:
Used an evaluation function to score board positions.
Improved through self-play, refining its strategies over time.
Foreshadowed modern reinforcement learning techniques.
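Samuel's mechanism, a tunable evaluation function refined through self-play, can be sketched in modern terms. The features, names, and update rule below are illustrative TD-style stand-ins (the approach his system foreshadowed), not his exact algorithm.

```python
import numpy as np

# A linear evaluation function over hand-chosen board features (e.g.
# piece advantage, king count, mobility), nudged during self-play so a
# position's score moves toward the score of the position that follows
# it, or toward the final outcome at the end of the game. All names and
# values here are illustrative assumptions.

weights = np.zeros(3)
alpha = 0.01                               # learning rate

def evaluate(features):
    """Score a board position as a weighted sum of its features."""
    return float(weights @ features)

def self_play_update(features_now, target):
    """Pull evaluate(features_now) toward `target`: either the evaluation
    of the successor position, or +1/-1 for a finished game."""
    global weights
    weights += alpha * (target - evaluate(features_now)) * features_now
```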
- Define the term “transfer learning” and discuss why it is essential for achieving general AI. How does the inability of current AI systems to transfer knowledge limit their application to broader domains?
Transfer Learning:
Transfer learning is the ability of a system to apply knowledge learned in one domain to another domain without starting from scratch.
For example, a system trained to identify cats could generalize this knowledge to recognize other animals.
Importance for General AI:
Transfer learning is crucial for building flexible systems capable of adapting to new tasks and environments.
Without it, AI remains narrow and task-specific, requiring retraining for each new problem.
Limitations of Current AI:
Systems like AlphaGo and Deep Blue are “idiot savants”—mastering one domain but incapable of generalizing to others.
This lack of generalization hinders applications in dynamic, real-world scenarios, such as robotics or medical diagnostics.
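To show transfer in its most common practical form today, here is a fine-tuning sketch using PyTorch and torchvision (the 10-class target task, dummy batch, and training details are illustrative assumptions, and running it requires downloading the pretrained weights): the pretrained backbone is frozen and only a new classification head is trained for the new domain.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal transfer-learning sketch: reuse an ImageNet-pretrained
# ResNet-18 as a feature extractor and train only a new classification
# head for a different task. Task size and batch are illustrative.

model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained backbone so its learned features are kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh head for the new, 10-class domain.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch; real code would loop
# over a DataLoader for the new task's dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

This realizes "not starting from scratch": only the small head is trained while the general visual features carry over, the kind of flexibility the cards above describe as missing at the level of whole systems like AlphaGo.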