Game Playing AI Flashcards

The first case study

1
Q

What historical figure(s) contributed to early AI game playing systems in the 1950s?

A

Claude Shannon and Alan Turing worked on early chess-playing systems in the 1950s.

2
Q

What notable AI defeated a world champion in chess, and when?

A

Deep Blue, developed by IBM, defeated world chess champion Garry Kasparov in 1997.

3
Q

What AI agent achieved human-level performance on Atari games in 2015?

A

DeepMind’s Deep Q-Network (DQN) agent.

4
Q

Why are games considered interesting for AI research?

A

Games are difficult to solve and present challenges in decision-making, learning, and strategy, making them an ideal testing ground for AI.

5
Q

What is Moravec’s Paradox, and how does it relate to game-playing AI?

A

Moravec’s Paradox states that tasks humans find difficult (like chess) are easy for AIs, while tasks humans find easy (like vision and movement) are hard for AIs. This influences the design of generalized game-playing AIs.

6
Q

What are some commercial motivations for game-playing AIs?

A

Game-playing AIs can be used for playtesting, saving time and money by automating game testing processes and allowing multiple play sessions simultaneously.

7
Q

What approach is most effective for creating game-playing AIs?

A

Reinforcement learning is most effective, as it allows the AI to learn by interacting with the game environment and improving based on rewards.

8
Q

What are the three main components of a reinforcement learning algorithm in a game-playing AI?

A
  1. States: Input data like game pixels, score, and game-over status.
  2. Actions: Movements such as left, right, or doing nothing.
  3. Rewards: Points and game completion.
9
Q

What is the role of OpenAI Gym in training game-playing AIs?

A

OpenAI Gym provides a unified interface for AIs to interact with multiple tasks, helping in training and testing game-playing agents.
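
A minimal sketch of the interaction loop that Gym standardises, written against the classic (pre-0.26) API in which reset returns only the observation and step returns a four-tuple; the environment id is illustrative:

```python
# Classic Gym loop: observe, act, receive reward, repeat until done.
import gym

env = gym.make("CartPole-v1")  # any registered environment id works here
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()         # random policy, for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print(f"Episode reward: {total_reward}")
```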

10
Q

What are the ethical considerations involved in building game-playing AIs?

A

Concerns include potential job displacement in game testing and broader ethical issues related to AI surpassing human capabilities in decision-making and learning.

11
Q

When is an AI considered to have “beaten” a game?

A

When the AI consistently defeats human players and is publicly available for anyone to play against over a long period of time.

12
Q

What is the purpose of competitions in AI game playing research?

A

Competitions provide a way to compare different AIs on a standardized game, encouraging innovation and improving the quality of AI systems.

13
Q

What is reinforcement learning?

A

Reinforcement learning is a machine learning paradigm in which an agent learns what to do, in the absence of labelled data, by interacting with an environment and adjusting its behaviour according to the rewards it receives.

14
Q

What are the key components of a Markov Decision Process (MDP) used in game AI?

A

States, actions, rewards, and the state transition matrix.
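
In standard notation (an assumption here, not quoted from the course notes), an MDP is the tuple:

```latex
% Standard MDP notation; assumed, not quoted from the course notes.
\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}), \quad
\mathcal{P}^{a}_{ss'} = \Pr(s_{t+1}=s' \mid s_t=s,\, a_t=a), \quad
\mathcal{R}^{a}_{s} = \mathbb{E}[\, r_{t+1} \mid s_t=s,\, a_t=a \,]
```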

15
Q

How can a game like “Breakout” be described using a Markov Decision Process?

A

In “Breakout,” states are the frames (pixels), actions are moving the paddle left, right, or doing nothing, and rewards are the points earned for breaking bricks.

16
Q

What does it mean when a problem is “stochastic” in the context of game AI?

A

Stochastic means there is a probabilistic element in the system. For example, the ball in “Breakout” could move in different directions from the same initial state.

17
Q

What is Q-learning, and why is it necessary?

A

Q-learning is a reinforcement learning algorithm that learns an approximation of the action-value (Q) function from experience. It is necessary because in many cases, like Atari games, the complete state transition matrix and rewards are unknown, so values cannot be computed directly.
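
A minimal tabular sketch of the Q-learning update, assuming small discrete state and action spaces; alpha (learning rate) and gamma (discount factor) are illustrative values, not taken from the course:

```python
# Tabular Q-learning: nudge Q(s, a) toward the bootstrapped target.
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (illustrative)

def q_update(s, a, r, s_next, done):
    """Apply one Q-learning step for the transition (s, a, r, s_next)."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```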

18
Q

What is the Q-function in Q-learning?

A

The Q-function approximates the value of taking an action in a given state while considering future rewards.

19
Q

What is the Bellman equation used for in reinforcement learning?

A

The Bellman equation calculates the value of a given action in a state, considering immediate rewards and future discounted rewards. (See Week 2 Notes for the full equation)
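
The Week 2 notes are not reproduced here, but a standard statement of the Bellman optimality equation for the Q-function is:

```latex
% Standard Bellman optimality equation; assumed form, not copied
% from the Week 2 notes.
Q^{*}(s, a) \;=\; \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\; a_t = a \,\right]
```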

20
Q

What are the two main challenges addressed by the DQN agent architecture?

A
  1. The lack of a state transition matrix.
  2. The absence of an action policy.
21
Q

What is the replay buffer in a DQN agent, and what is its purpose?

A

The replay buffer stores the agent’s experiences (state, action, reward, next state). It helps the agent learn by revisiting past interactions with the environment.
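
A minimal replay-buffer sketch, assuming transitions are stored as (state, action, reward, next_state, done) tuples; the capacity and batch size are illustrative:

```python
# Fixed-capacity buffer of past transitions, sampled uniformly.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between
        # consecutive frames, which stabilises training.
        return random.sample(self.buffer, batch_size)
```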

22
Q

What is epsilon-greedy exploration in the context of DQN agents?

A

Epsilon-greedy exploration is a method where the agent starts by taking random actions to explore the environment, then gradually shifts to using the learned Q-function to make decisions.
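
A minimal epsilon-greedy sketch with linear decay, assuming a q_values(state) function that returns one estimate per action; the schedule constants are illustrative:

```python
# Epsilon-greedy: random action with probability epsilon, else greedy.
import random
import numpy as np

epsilon, eps_min, eps_step = 1.0, 0.1, 1e-5  # illustrative schedule

def select_action(state, q_values, n_actions):
    global epsilon
    if random.random() < epsilon:
        action = random.randrange(n_actions)      # explore
    else:
        action = int(np.argmax(q_values(state)))  # exploit learned values
    epsilon = max(eps_min, epsilon - eps_step)    # decay toward exploitation
    return action
```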

23
Q

What is the role of the loss function in training a DQN agent?

A

The loss function measures the difference between the current network’s output and a “proxy” target value derived from an older, periodically updated copy of the network (the target network). Minimising this difference guides the network’s learning.
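
A minimal sketch of that loss, assuming PyTorch, an online network q_net, and an older frozen copy target_net; all names are illustrative:

```python
# DQN loss: compare the online network's Q-values against targets
# bootstrapped from an older copy of the network.
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, states, actions, rewards,
             next_states, dones, gamma=0.99):
    # Q-values the online network assigns to the actions actually taken.
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # targets come from the frozen, older network
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)
    return F.smooth_l1_loss(q, targets)
```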

24
Q

How does a DQN agent process visual input from games like “Breakout”?

A

The DQN agent processes the game frames in greyscale, resizing them to 84x84 pixels, and uses three consecutive frames as input to understand movement.
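
A minimal preprocessing sketch, assuming OpenCV and frames supplied as RGB NumPy arrays; the 84x84 size and three-frame stack follow the card:

```python
# Extract the luminance (Y) channel, downscale, and stack frames.
import cv2
import numpy as np

def preprocess(frame):
    """Convert an RGB frame to its Y (luminance) channel and resize."""
    y = cv2.cvtColor(frame, cv2.COLOR_RGB2YUV)[:, :, 0]
    return cv2.resize(y, (84, 84), interpolation=cv2.INTER_AREA)

def stack_frames(frames):
    """Stack consecutive frames so the network can infer motion."""
    return np.stack([preprocess(f) for f in frames], axis=0)  # (3, 84, 84)
```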

25
Q

Why does the DQN agent use three consecutive frames as input instead of a single frame?

A

Using three frames helps the agent understand temporal changes, such as the direction of the ball’s movement in a game like “Breakout.”

26
Q

What is epsilon’s role in epsilon-greedy exploration, and how does it change over time?

A

Epsilon controls the balance between exploration (random actions) and exploitation (using the Q-function). Initially, epsilon is high (more exploration), but it decreases over time, favoring the Q-function’s decisions.

27
Q

Why do DQN agents use the Y channel (luminance) from an RGB frame for image processing?

A

The Y channel represents brightness and simplifies the input by converting the frame to greyscale, which reduces computational complexity while retaining important visual information.

28
Q

What problem arises from the way Atari games handle sprite rendering, and how does DQN handle it?

A

Atari games flicker because some sprites are rendered only on alternating frames. DQN handles this by taking the pixel-wise maximum over pairs of consecutive frames, producing a single, flicker-free input image.

29
Q

Why are convolutional layers important in reinforcement learning agents like DQNs?

A

Convolutional layers are crucial because they process image data efficiently, allowing agents to interpret visual input from games, such as pixel-based states.

30
Q

What does a convolution operation do to an image?

A

Convolution applies a filter to an image, modifying each pixel’s value based on its surrounding pixels to extract useful features like edges or patterns.
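
A minimal sketch of the operation, assuming SciPy; the 3x3 kernel is a simple edge detector applied to a toy image:

```python
# 2-D convolution: each output pixel is a weighted sum of its
# neighbourhood, with weights given by the kernel.
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(8, 8)
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])  # responds strongly to edges
features = convolve2d(image, kernel, mode="same")
```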

31
Q

What is a convolutional layer in a neural network?

A

A convolutional layer applies a set of trainable filters to input data (such as images), learning how to extract important features during training.

32
Q

What are the key properties of a convolutional layer?

A

Key properties include:
Filters: The number of filters applied in parallel.
Kernel size: The dimensions of each filter.
Stride: The step size used when sliding the filter over the input.
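
A minimal illustration of the three properties, assuming PyTorch; the values are illustrative:

```python
# One convolutional layer with its three key properties spelled out.
import torch.nn as nn

layer = nn.Conv2d(
    in_channels=3,    # depth of the input (e.g. three stacked frames)
    out_channels=32,  # filters: 32 feature maps computed in parallel
    kernel_size=8,    # each filter covers an 8x8 patch of the input
    stride=4,         # the filter moves 4 pixels per step
)
```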

33
Q

How is the DQN agent architecture expressed in a working program?

A

The DQN agent architecture is expressed as a neural network with convolutional layers for processing visual inputs, followed by fully connected layers to make decisions. Input frames from the game are processed through the network to predict the Q-values for each action.
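
A minimal network sketch, assuming PyTorch; the layer sizes follow the Nature DQN (Mnih et al., 2015), with an input depth of three to match the card's stacked 84x84 frames:

```python
# Convolutional feature extractor followed by fully connected layers
# that output one Q-value per action.
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 84x84 input -> 7x7 maps
            nn.Linear(512, n_actions),              # one Q-value per action
        )

    def forward(self, x):
        return self.head(self.features(x.float() / 255.0))  # normalise pixels
```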

34
Q

What is the role of convolutional layers in a DQN network?

A

Convolutional layers in a DQN network extract features from the input game frames, such as object edges or motion, which help the agent understand the game environment.

35
Q

What does the input to the DQN neural network consist of?

A

The input consists of a series of game frames (e.g., 84x84 pixel images) that provide a historical context to the network. This helps the network learn the game dynamics and predict future states.

36
Q

How does the DQN network determine the best action to take in a game?

A

The network outputs Q-values for each possible action, and the agent selects the action with the highest Q-value, indicating the most promising move based on current knowledge.

37
Q

How can you evaluate the performance of a DQN agent?

A

DQN agent performance is typically evaluated by monitoring its progress over episodes, tracking cumulative rewards, and comparing its performance to a baseline or random agent.

38
Q

Why is it important to use multiple frames as input in DQN?

A

Multiple frames provide a sense of movement and temporal context, allowing the agent to understand object velocities and directions in games that involve dynamic environments.

39
Q

What are episodes and frames in the context of DQN training?

A

An episode refers to a complete run of the game from start to finish, while frames are individual visual inputs fed into the neural network during the agent’s decision-making process.

40
Q

Why is there a trend towards general AI?

A

The trend towards general AI is driven by the desire to create systems that can solve a wide variety of tasks without being specifically programmed for each one. General AI has the potential to transform industries by automating complex decision-making, improving productivity, and solving problems that require human-level reasoning.

41
Q

How might general AI impact society?

A

General AI could revolutionize industries by automating complex tasks, improving healthcare, enhancing scientific research, and personalizing education. However, it also poses ethical challenges like job displacement, inequality, privacy concerns, and the potential misuse of powerful AI systems.

42
Q

What is human-competitive AI in gaming?

A

Human-competitive AI refers to AI systems capable of outperforming human players in complex games like StarCraft II or Dota 2. These AIs use advanced techniques, such as deep reinforcement learning, to learn strategies that rival or surpass those of professional human players.

43
Q

What are some examples of human-competitive AI game players?

A

Notable examples include:

AlphaStar: DeepMind’s AI that defeated professional players in StarCraft II.
OpenAI Five: An AI developed by OpenAI that defeated the world champion team in Dota 2.
Agent57: A reinforcement learning AI that outperforms human benchmarks across multiple Atari games.

44
Q

How do human-competitive AIs gather experience in games?

A

Human-competitive AIs gather experience by interacting with the game environment repeatedly, storing their experiences in replay buffers, and using reinforcement learning techniques to improve decision-making over time. They learn by maximizing rewards, refining strategies through trial and error.

45
Q

How does fairness apply to AI and human competitions?

A

Fairness in AI vs. human competitions involves ensuring that both sides have equal opportunities. This includes factors such as equal access to training data, similar computational power, and comparable motor and perceptual abilities, to create a level playing field between AI and human competitors.

46
Q

What are the six dimensions of fairness in AI vs. human competitions?

A

The six dimensions are:

Perceptual: Equal input capabilities.
Motoric: Equal output capabilities.
Historic: Equal time spent in training.
Knowledge: Equal access to declarative knowledge.
Compute: Equal computational resources.
Common-sense: Equal understanding of the broader context outside the game.

47
Q

How have human-competitive AIs impacted professional human players?

A

Human-competitive AIs have influenced human players by introducing new strategies, shifting how games are played. In some cases, AI dominance has led professional players to retire, feeling that they cannot surpass the AI’s performance, as seen in Go after the advent of AlphaGo.

48
Q

How does AlphaStar outperform human players in StarCraft II?

A

AlphaStar outperforms human players by using deep reinforcement learning to gather large amounts of experience through self-play. It learns complex strategies over millions of iterations and can execute precise, high-speed decisions that surpass human reaction times.

49
Q

What was the impact of OpenAI Five on Dota 2?

A

OpenAI Five demonstrated AI’s ability to compete with top-tier human teams, defeating the world champions OG in back-to-back games. This achievement showed that AI could master a complex, real-time strategy game like Dota 2, which requires teamwork, planning, and fast decision-making.

50
Q

How do AIs like Agent57 improve their understanding of an environment?

A

AIs like Agent57 improve their understanding by learning from a wide variety of experiences in different environments, using techniques like distributed reinforcement learning and recurrent neural networks to generalize across multiple games and adapt to unseen situations.

51
Q

What were the goals of the Mario AI competition?

A

The Mario AI competition aimed to encourage the development of AI that could navigate and solve platformer games like Mario. Competitors designed AIs capable of playing levels with varying difficulty, using strategies ranging from simple rule-based approaches to deep learning techniques.

52
Q

How do advancements like DreamerV2 differ from traditional DQNs?

A

DreamerV2 uses world models that simulate the environment internally, allowing the agent to plan actions by predicting future states, unlike traditional DQNs that rely on direct experience and a replay buffer to improve their policies.

53
Q

Why is AI vs. human fairness still being debated in games like Dota 2 and StarCraft II?

A

Fairness is debated because AI systems often have advantages in terms of computational power, training time, and access to resources that humans do not. These disparities make it difficult to create a truly fair competition between AIs and humans in these games.

54
Q

What is Rainbow DQN, and how does it improve over traditional DQN?

A

Rainbow DQN combines several advanced techniques, such as double Q-learning, prioritized experience replay, and dueling networks, to enhance stability and learning efficiency, resulting in improved performance across multiple Atari games compared to traditional DQNs.