Machine Learning in Games (part of exam 2/3) Flashcards
What applications for Reinforcement Learning are there in Games?
- Tic-Tac-Toe: MENACE (Michie 1963)
- Backgammon: TD-Gammon (Tesauro 1995)
- Chess: KnightCap (Baxter et al 2000)
Apart from in Games, in what other areas is Reinforcement Learning being applied?
- Elevator Dispatching
- Robot Control
- Job Scheduling
How did MENACE (Michie 1963) learn to play Tic-Tac-Toe?
287 matchboxes (one for each position in the game) and beads in nine different colors (one color per square) were used to represent the game positions and the placement of the symbols.
Every turn, the matchbox for the current position was selected and one of its beads was drawn at random. The bead's color indicated the square where the next symbol was to be placed.
If the game was lost, the beads that had been chosen over the course of the game were removed permanently (negative reinforcement).
If the game was a draw, the beads were put back into the matchboxes.
If the game was won, each chosen bead was put back together with a second bead of the same color (positive reinforcement).
This results in an increased likelihood that a successful move will be tried again.
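The bead mechanism can be sketched in code. This is a minimal, illustrative sketch (the names `boxes`, `choose_move`, and `reinforce` are my own, and the initial bead count of 3 per move is an arbitrary assumption, not from Michie's setup):

```python
import random

# state -> {move: bead_count}; one "matchbox" per position
boxes = {}

def choose_move(position, legal_moves):
    """Pick a move with probability proportional to its bead count."""
    beads = boxes.setdefault(position, {m: 3 for m in legal_moves})
    moves = list(beads)
    weights = [beads[m] for m in moves]
    return random.choices(moves, weights=weights)[0]

def reinforce(history, result):
    """history: list of (position, move) pairs played this game."""
    for position, move in history:
        if result == "win":
            boxes[position][move] += 1   # add an extra bead of that color
        elif result == "loss":
            # remove the chosen bead; keeping at least one bead is a
            # simplification so no move is ruled out forever
            boxes[position][move] = max(1, boxes[position][move] - 1)
        # on a draw the bead is simply put back (no change)
```

A losing move's weight shrinks and a winning move's weight grows, which is exactly the "increased likelihood of trying a successful move again" described above.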
What is the Credit Assignment Problem and how can it be solved?
In games that use reinforcement learning, the learner only finds out at the end of the game whether it has won or lost, and it doesn't know which move(s) were responsible for the win/loss (Delayed Reward).
This can be solved by rewarding/penalizing all moves of the game. Over many games, bad moves will rarely receive positive feedback.
What were TD-Gammon's improvements over MENACE?
- faster convergence because of Temporal Difference Learning
- uses neural networks: allows generalization
- use of positional characteristics as features
What were KnightCap's improvements over TD-Gammon?
- integration of Temporal Difference Learning with deep searches
- self-play training was replaced with training by playing against various partners on the internet
Why is Minimax or Alpha-Beta not applicable to games of chance?
The players can't directly maximize their gains because they don't know what the opponent's chance element (e.g. a dice roll) will yield. They don't know which moves the opponent can legally make until after they have finished their own move.
What search method is used for games of chance?
standard game trees extended with chance nodes
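Such a tree can be evaluated with a small recursive sketch (expectiminimax-style; the node encoding below is my own illustrative choice, not from the card):

```python
# A node is either a numeric leaf payoff, ("max", children),
# ("min", children), or ("chance", [(probability, child), ...]).
def value(node):
    if isinstance(node, (int, float)):      # leaf: final payoff
        return node
    kind, children = node
    if kind == "max":
        return max(value(c) for c in children)
    if kind == "min":
        return min(value(c) for c in children)
    # chance node: probability-weighted average of child values
    return sum(p * value(c) for p, c in children)
```

Max/min nodes behave as in Minimax; chance nodes replace the max/min with an expected value over their outcomes.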
In games of chance, what is the expected value and how is it calculated?
At each chance node an average of the expected value of the chance-element is calculated.
expected value(n) = sum over all outcomes of (probability P × outcome)
example: roll of a die
* every outcome has a chance of 1/6 –> probability P
* the number you roll –> outcome
so the expected value of a dice roll would be:
1/6 * 1 + 1/6 * 2 + 1/6 * 3 + 1/6 * 4 + 1/6 * 5 + 1/6 * 6 = 3.5
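The same calculation in a few lines of Python:

```python
# Expected value of a fair six-sided die:
# sum of (probability * outcome) over all outcomes.
outcomes = [1, 2, 3, 4, 5, 6]
expected_value = sum((1 / 6) * o for o in outcomes)  # 3.5
```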
In games of chance, how can the expected value be calculated if the exact probability of the outcomes is not known?
Then the computation of the expected value is replaced with the average outcome over a large number of random games.
The more random games are played, the better the approximation of the value.
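A sketch of this approximation, using die rolls as the "random games" (function name is illustrative):

```python
import random

# Approximate the expected value by the average outcome
# over many random trials instead of using known probabilities.
def estimated_value(trials):
    return sum(random.randint(1, 6) for _ in range(trials)) / trials
```

With more trials, `estimated_value` converges toward the true expected value of 3.5.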
What is Simulation Search?
When the complete tree is not searchable, the breadth of the search tree can be limited: only a few sampled lines of play are searched, but each is searched to its full depth.
What are some examples of Simulation Search in games?
- roll-out analysis in Backgammon: play a large number of games from the same position, each game has a different dice roll
- Scrabble: draw from the bag of remaining tiles, put the tiles back in, draw again. Repeat until you've reached a big enough sample size.
What is an algorithm sketch for Simulation Search?
Estimate the expected value of each move by counting the number of wins in a series of completed games.
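The win-counting estimate in code. This is a sketch under the assumption that a helper `simulate_game` exists which plays one random game after the given move and returns True on a win (the helper and all names are hypothetical):

```python
# Estimate a move's value as the fraction of wins over
# a series of simulated games starting from that move.
def estimate_move(position, move, simulate_game, n_games=1000):
    wins = sum(simulate_game(position, move) for _ in range(n_games))
    return wins / n_games
```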
What is the Monte-Carlo Search?
It's an extreme case of Simulation Search, where a large number of games is played completely randomly: both players choose their moves at random, and the average score of these games is calculated.
After the training phase, the machine plays the moves with the highest average scores.
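A minimal sketch of this move selection, assuming a hypothetical helper `random_playout` that plays one fully random game after the given move and returns its final score:

```python
# Monte-Carlo move selection: for each candidate move, average the
# scores of many fully random games and pick the best-scoring move.
def monte_carlo_choice(position, legal_moves, random_playout, n=500):
    def average_score(move):
        return sum(random_playout(position, move) for _ in range(n)) / n
    return max(legal_moves, key=average_score)
```

No evaluation function or domain knowledge is needed; the random playouts alone rank the moves.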
True or False: the Monte-Carlo Search can’t be used in deterministic games, only in games of chance.
False. It has been used with some success in Go.