Minimax Algorithm Flashcards
What is the setup for the minimax adversarial game?
> Two players:
- Player = Max
- Opponent = Min
Deterministic: No randomness involved in the game (e.g. no dice rolls)
Zero-sum: the sum of gains for both player
Perfect information: The move for each opponent is observed by the other one
A move means one turn for each player
What are 4 common adversarial games?
> Chess
Go
Noughts and crosses
Checkers
What makes an adverserial game different from say a search algorithm?
The player (Max) does not have full control over the entire game so can not find the optimal path beforehand
How does minimax work?
The player (Max) needs to maximise his reward for each turn. The opponent (Min) need sto minimise Max's reward for each turn
When is the reward for minimax given? What is the reward called?
> At the end of the game
The reward is named utility
[Picture4]
For each node, how does minimax calculate the which route is the best?
> It probes each route the game may take.
For each game route the sum of all the utility values that could result by taking that route are calculated
This gives a weighting to each route to find the route with the highest chance of finind a solution
The weightings are called minimax vaues
[Picture5]
What is the issue when calculating the minimax value?
To traverse the entire game tree using a depth first strategy takes too long
What is the time complexity equation for minimax?
O(b^m)
What is the space complexity for minimax?
O(bm)
How does pruning successors work?
Instead of traversing the entire game tree, the minimax alroithm cleverly ignores routes where the worst case scenario for that route is worse than another already traversed route. This allows it to pick routes that have a better chance of winning.
For the below example, explain the logic:
[Picture 6]
a) Firstly, Max probes all routes of the initial option (B) to calculate the worst case scenario of picking that route
- At this stage, it uses pruning for all stages after B
- It finds that the first route has a worst case scenario of a reward of 3
b) Then it probes another route of the initial option (B)
to calculate the worst case scnario of that route.
c) After probing all the worst case scenarios if Max picked B, it can be certain that the worst case scenario of picking B is 3
d) It then moves onto the option C and after fining the worst case scenario as being 2, it can conclude that this is worse than picking B and so moves on
e) When probing D it finds a worst case scenario of 14 that is better than picking B. This means that picking D is still a potential option and so it continues to probe D
f) After probing D again and fining a worst case scenario of 5, it can see that this is still better than B and so it continues to probe D.
It finds that one of the routes has a worst case scenario of 2 which is worse than if it picked B
FINAL SOLUTION: The best method would be to pick B and it has a worst case scenario of 3
When checking a node what is the maths expression for the logic deciding wheather to prune a route or not?
α ≤ v ≤ β
v = Value of the node being probed