Game Playing Flashcards
Definition of games
zero-sum, discrete, finite, deterministic, perfect information
zero-sum
one player’s gain is the other player’s loss; does not mean fair
perfect information
each player can see the complete game state; no simultaneous decisions
game playing as search
states - board configuration
actions - legal moves
initial state - starting board configuration
goal state - game over/terminal board configuration
Utility function
used to map each terminal state of the board to a score indicating the value of that outcome to the computer
Greedy Search Using an evaluation function
expand the search tree to the terminal states on each branch; evaluate the utility of each terminal board configuration; make the initial move that results in the board configuration with the maximum value
problem with greedy search using eval function
ignores what the opponent might do
Minimax principle
Assume both players play optimally; computer should choose maximizing moves (high utility), opponent = minimizing moves (low utility)
Minimax principle
computer chooses the best move considering both its move and the opponent’s optimal move
Minimax Propagation Up Tree
Start at leaves; assign value to parent - min if opponent’s move; max if computer’s move
General Minimax Algorithm
For each move available to the computer:
perform DFS, stop at terminal states, evaluate each terminal state, propagate the minimax values upward; at the root, choose the move with the maximum of the minimax values of its children
Time/space complexity of minimax algorithm
space O(bd) (DFS); time: O(b^d)
Static Evaluation function
impractical to search the game space to all terminal states and apply the utility function; SBE uses heuristics to estimate the value of non-terminal states
SBE used for
estimating how good the current board configuration is for the computer; reflects chances of winning from that node; must be easy to calculate
how is SBE calculated
subtract how good the configuration is for the opponent from how good it is for the computer; the SBE should agree with the utility function when calculated at terminal nodes
Minimax Algorithm
Max-Value(s): if s is a terminal state or the depth limit is reached, return SBE(s); else v = -infinity; for each s' in Successors(s): v = max(v, Min-Value(s')); return v. Min-Value is symmetric, using +infinity and min.
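The Max-Value/Min-Value recursion can be sketched in Python; `successors` and `sbe` are hypothetical stand-ins for a real game's move generator and static board evaluator:

```python
# Depth-limited minimax sketch. A node with no successors is treated
# as a terminal state; `sbe` scores states from the computer's view.
def max_value(state, depth, successors, sbe):
    kids = successors(state)
    if depth == 0 or not kids:          # depth limit or terminal state
        return sbe(state)
    return max(min_value(s, depth - 1, successors, sbe) for s in kids)

def min_value(state, depth, successors, sbe):
    kids = successors(state)
    if depth == 0 or not kids:
        return sbe(state)
    return min(max_value(s, depth - 1, successors, sbe) for s in kids)
```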
Alpha Beta Idea
At maximizing levels, keep track of the highest SBE value seen so far in each node
At minimizing levels, lowest SBE value seen so far in tree below
Alpha Cutoff
At each MIN node, keep track of the minimum value returned so far from its visited children; store it in v; each time v is updated, check it against the alpha value of all its MAX node ancestors; if alpha >= v, don't visit any more of the MIN node's children
Beta Cutoff
at each MAX node, keep track of the maximum value returned so far from its visited children; store it in v; each time v is updated at a MAX node, check it against the beta value of all its MIN node ancestors; if v >= beta, don't visit any more of the current MAX node's children
cutoff pruning occurs when
at a MAX node, if v >= beta for some MIN node ancestor
at a MIN node, if alpha >= v for some MAX node ancestor
alpha
largest value from its MAX node ancestors in search tree
beta
smallest (best value) from its min node ancestors in search tree
v at max
largest value from its children so far
beta at max
comes from min node ancestors
alpha at min
comes from max node ancestors
prune at max value function
when v >= beta
prune at min value function
when alpha >= v
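The alpha/beta bookkeeping and the two prune rules can be sketched as follows; `successors` and `sbe` are again a hypothetical game interface:

```python
# Alpha-beta pruning sketch. alpha = best value guaranteed to MAX
# ancestors so far; beta = best value guaranteed to MIN ancestors.
def ab_max(state, depth, alpha, beta, successors, sbe):
    kids = successors(state)
    if depth == 0 or not kids:
        return sbe(state)
    v = float('-inf')
    for s in kids:
        v = max(v, ab_min(s, depth - 1, alpha, beta, successors, sbe))
        if v >= beta:            # beta cutoff: a MIN ancestor won't allow this line
            return v
        alpha = max(alpha, v)
    return v

def ab_min(state, depth, alpha, beta, successors, sbe):
    kids = successors(state)
    if depth == 0 or not kids:
        return sbe(state)
    v = float('inf')
    for s in kids:
        v = min(v, ab_max(s, depth - 1, alpha, beta, successors, sbe))
        if alpha >= v:           # alpha cutoff: a MAX ancestor won't allow this line
            return v
        beta = min(beta, v)
    return v
```

Pruning never changes the value at the root; it only skips subtrees that cannot influence it.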
Time complexity of alpha-beta
In practice O(b^(d/2)) instead of O(b^d); same as having a branching factor of sqrt(b)
How to deal with limited time in game
use iterative deepening search - run alpha-beta search with DFS and increasing depth limit
How to avoid horizon effect
quiescence search
secondary search
quiescence search
when the SBE value is changing frequently, look deeper than the depth limit; search until a point where the game "quiets down"
secondary search
find best move looking to depth d
look k steps beyond to verify that it still looks good
if it doesn’t, repeat step 2 for the next best move
book moves
build a database of opening moves, end games, studied configurations; if current state is in database, use it to determine next move/board evaluation
linear evaluation function
a weighted sum of board features: Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s); more important features get larger weights
How to learn weights in linear evaluation function
play games; for every move or game, compute the error = true outcome - evaluation function value, and adjust the weights to reduce the error
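One way to sketch this is an error-driven update in the style of the LMS/perceptron rule (the update rule and learning rate are illustrative assumptions; the cards only specify the error term):

```python
# Linear evaluation: weighted sum of feature values.
def evaluate(weights, features):
    return sum(w * f for w, f in zip(weights, features))

# Nudge each weight toward reducing error = true outcome - evaluation.
# lr (learning rate) controls the step size; an assumed hyperparameter.
def update_weights(weights, features, true_outcome, lr=0.01):
    error = true_outcome - evaluate(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]
```

Repeated over many positions, weights of features that correlate with winning grow, matching the intuition that more important features get larger weights.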
Non-deterministic game representation
game tree representation is extended to include “chance nodes”: computer moves -> chance nodes -> opponent moves
How to score in non-deterministic games
weight score by the probability that move occurs
use the expected value of the move: instead of max or min, compute the average of the children's values weighted by their probabilities
expected value for move
compute average, weighted by the probabilities of each child; choose move with the highest expected value
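The chance-node rule above can be sketched as a small expectimax recursion; the tuple encoding of the tree is illustrative, not from the cards:

```python
# Expectimax over a tree of ('leaf', value), ('max', children),
# ('min', children), and ('chance', [(probability, child), ...]) nodes.
def expectimax(node):
    kind, payload = node
    if kind == 'leaf':
        return payload                                  # utility / SBE value
    if kind == 'max':
        return max(expectimax(c) for c in payload)
    if kind == 'min':
        return min(expectimax(c) for c in payload)
    # chance node: average child values weighted by their probabilities
    return sum(p * expectimax(c) for p, c in payload)
```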
non-deterministic games do what to branching factor
increase
lookahead in non-deterministic games
value diminishes - as depth increases, probability of reaching a given node decreases; alpha-beta pruning less effective
How to improve performance in games
reduce depth/breadth of search - use better SBEs (deep learning); sample possible moves instead of exploring all (use randomized exploration of search space)
Monte Carlo Tree Search
rely on repeated random sampling to obtain numerical results; stochastic simulation of the game; game must be finite
Pure Monte Carlo Tree Search
for each possible legal move of current player, simulate k random games with playouts, count how many were wins, move with most wins is selected
playouts
select moves at random for both players until game over
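Pure Monte Carlo search with random playouts can be sketched as below; the game interface (`legal_moves`, `apply_move`, `winner`) is a hypothetical stand-in:

```python
import random

# For each legal move, simulate k random games (playouts) and pick
# the move with the most wins for `player`.
def pure_mcts(state, player, legal_moves, apply_move, winner, k=100):
    def playout(s):
        # random moves for both players until the game is over
        while winner(s) is None:
            s = apply_move(s, random.choice(legal_moves(s)))
        return winner(s)

    best_move, best_wins = None, -1
    for move in legal_moves(state):
        wins = sum(playout(apply_move(state, move)) == player
                   for _ in range(k))
        if wins > best_wins:
            best_move, best_wins = move, wins
    return best_move
```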
exploitation
keep track of the average win rate for each child from previous searches; prefer the child that has previously led to more wins
exploration
allow for exploration of relatively unvisited children (moves)
exploitation and exploration
combined to compute a score for each child; pick the child with the highest score at each successive node in the search
Monte Carlo Tree Search Algorithm
Start at the root; successively select best child nodes using the scoring method until a leaf node L is reached; create and add a new child C of L; perform a random playout from C; update the scores at C and all of C's ancestors in the search tree
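A compact sketch of those four steps, using UCB1 as the scoring method (one common choice; the cards only say "scoring method"). The game interface is hypothetical, and for simplicity selection maximizes one player's win rate at every level; full two-player MCTS flips the perspective at opponent nodes:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.wins, self.visits = [], 0, 0

# UCB1 score: exploitation (win rate) + exploration (visit-count bonus).
def ucb1(node, c=1.4):
    if node.visits == 0:
        return float('inf')              # force a first visit to each child
    exploit = node.wins / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def mcts(root_state, player, legal_moves, apply_move, winner, iters=200):
    root = Node(root_state)
    for _ in range(iters):
        # 1. selection: descend by best UCB1 score until a leaf L
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. expansion: add children of L, pick a new child C
        if winner(node.state) is None:
            node.children = [Node(apply_move(node.state, m), node)
                             for m in legal_moves(node.state)]
            if node.children:
                node = random.choice(node.children)
        # 3. simulation: random playout from C
        s = node.state
        while winner(s) is None:
            s = apply_move(s, random.choice(legal_moves(s)))
        result = winner(s)
        # 4. backpropagation: update C and all of C's ancestors
        while node is not None:
            node.visits += 1
            node.wins += (result == player)
            node = node.parent
    # return the most-visited child's state as the chosen move
    return max(root.children, key=lambda n: n.visits).state
```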