Week 2 - Search with A* & MCTS Flashcards
Intelligent agents
Perceive their environment through sensors and act on it through actuators.
Fully/partially observable
Deterministic/stochastic
Episodic/sequential
Static/dynamic
Discrete/continuous
Static agents vs learning agents
Static agents have all their knowledge at the start.
Learning agents improve their behaviour from experience.
Breadth-first search
Expand all nodes at the current depth before moving on to the next layer.
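A minimal Python sketch of BFS; the `neighbors` callback and the state representation are assumptions for illustration. Popping from the right of the queue instead would turn it into depth-first search.

```python
from collections import deque

def bfs(start, goal, neighbors):
    """Breadth-first search: expand states layer by layer.

    `neighbors(state)` is an assumed callback returning successor states.
    """
    frontier = deque([(start, [start])])   # FIFO queue
    visited = {start}
    while frontier:
        state, path = frontier.popleft()   # using pop() here would give DFS
        if state == goal:
            return path
        for nxt in neighbors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None  # goal unreachable
```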
Uniform cost search
Expand node with lowest path cost.
Depth first search
Always expand the deepest unexpanded node.
Uses little memory, but can commit to a deep path and reduce exploration of the rest of the tree.
Bidirectional Search
Search from both the initial and goal state, hoping that they meet in the middle.
Informed (Heuristic) Search
Use an estimate of the cost from the current state to the goal.
E.g. a navigation heuristic from Loughborough to Edinburgh might be "head north".
A* Search
Combines the cost to reach a node with the estimated cost to the goal.
f(n) = g(n) + h(n)
(g(n) is the cost from the initial state to n)
(h(n) is the estimated cost from n to the goal)
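A minimal sketch of A* with a priority queue ordered by f(n) = g(n) + h(n); the `neighbors` and `h` callbacks are assumptions for illustration. With h(n) = 0 for all n, it behaves like uniform-cost search.

```python
import heapq
import itertools

def a_star(start, goal, neighbors, h):
    """A*: always expand the node with the lowest f(n) = g(n) + h(n).

    `neighbors(state)` is an assumed callback yielding (successor, step_cost)
    pairs; `h(state)` is the assumed heuristic estimate to the goal.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(h(start), next(counter), 0, start, [start])]  # (f, _, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        for nxt, cost in neighbors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):   # found a cheaper route to nxt
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h(nxt), next(counter), new_g, nxt, path + [nxt]))
    return None  # goal unreachable
```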
A* Search Conditions for optimality
h(n) must be an admissible heuristic (never overestimates the true cost).
Consistency: h(n) ≤ c(n, n′) + h(n′) for every successor n′, so f(n) never decreases along a path.
When is each (tree/graph) version of A* optimal?
Tree - if h(n) admissible
Graph - if h(n) is consistent
A* search always expands the… (cost)
node with the lowest estimated overall cost f(n)
Minimax
MAX tries to maximise the score.
MIN tries to minimise MAX's score.
MAX therefore maximises the worst-case outcome, assuming MIN plays optimally.
Alpha beta pruning
Prunes (skips) branches that cannot affect the final minimax decision.
Returns the same result as plain minimax, faster.
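A minimal sketch of minimax with alpha-beta pruning, covering the two cards above; the `moves` and `evaluate` callbacks are assumed game-specific helpers.

```python
def alphabeta(state, depth, alpha, beta, maximizing, moves, evaluate):
    """Minimax with alpha-beta pruning.

    `moves(state)` is an assumed callback returning successor states;
    `evaluate(state)` is an assumed scoring function.
    alpha = best value MAX can guarantee so far on this path,
    beta  = best value MIN can guarantee so far on this path.
    """
    successors = moves(state)
    if depth == 0 or not successors:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in successors:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, moves, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN would never allow this branch: prune it
        return value
    else:
        value = float("inf")
        for child in successors:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, moves, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break  # MAX would never allow this branch: prune it
        return value
```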
MCTS
Monte Carlo Tree Search
MCTS:
Explores the tree using a stochastic process (Monte Carlo, as in the casino) when the tree is too large to search exhaustively, even with pruning.
MCTS: Four main phases
Selection
Expansion
Simulation
Backpropagation
MCTS: Selection Phase
Algorithm chooses a node to expand.
Upper Confidence Bound (UCB1) is used to balance exploration and exploitation (see slides for the exact expression).
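For reference, the standard UCB1 form is UCB1(i) = w_i/n_i + c·sqrt(ln(N)/n_i); the slides may use a slightly different variant. A minimal sketch:

```python
import math

def ucb1(node_wins, node_visits, parent_visits, c=math.sqrt(2)):
    """UCB1: average reward (exploitation) plus an exploration bonus
    that shrinks as the node is visited more often."""
    if node_visits == 0:
        return float("inf")  # always try unvisited children first
    return (node_wins / node_visits
            + c * math.sqrt(math.log(parent_visits) / node_visits))
```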
MCTS: Expansion Phase
The node is expanded by adding child nodes.
Expansion can be guided by a heuristic or done at random.
MCTS: Simulation Phase
Simulate random moves until a terminal state is reached.
This allows the algorithm to reach the end of the game quickly.
MCTS: Backpropagation
The simulation result is propagated “backwards” up the tree to update the statistics of each ancestor node.
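A minimal sketch of the backpropagation step; `node.parent`, `node.visits` and `node.wins` are assumed attributes of a tree-node class, and the result flip assumes a two-player game with outcomes in {0, 1}.

```python
def backpropagate(node, result):
    """Walk from the simulated leaf back to the root, updating the
    statistics that UCB1 reads during the selection phase."""
    while node is not None:
        node.visits += 1
        node.wins += result   # e.g. 1 for a win, 0 for a loss
        result = 1 - result   # flip perspective for the opponent's nodes
        node = node.parent
```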
Supervised Learning
Examples of correct behaviour are provided.
Unsupervised Learning
Data is analysed to extract its properties and structure, without labels.
Self-supervised Learning
The model learns to reconstruct part of the data so that correlations and structures are learnt.
Reinforcement Learning
The system learns from rewards received for its actions.
Best-first search (greedy)
f(n) = h(n), i.e. expand the node that seems closest to the goal.