Chapter 9: Monte Carlo Tree Search, Current AI, Timeline Overview Flashcards

1
Q

What are the downsides of minimax, and how can Monte Carlo methods be used for tree search?

A

Downsides of Minimax:
- Game trees can get intractably big
- A large branching factor means minimax cannot search very deep
- The evaluation function may fail to capture the intricacies of the problem

In these cases, a generic estimation technique is useful. Monte Carlo methods can be incorporated into tree search via Monte Carlo Tree Search (MCTS): games are simulated from a particular node, and the results of these simulations are used to evaluate game states.

2
Q

Explain Monte Carlo Tree Search (MCTS), including the four general steps of the algorithm.

A

Monte Carlo Tree Search is an any-time, best-first search algorithm used in game-playing AI. The algorithm consists of four steps (a sketch of the main loop follows the list):
- Selection: the algorithm selects a new node for evaluation.
- Expansion: the algorithm adds new leaf nodes to the tree.
- Simulation: the algorithm performs a random playout (sampling) from a particular leaf node.
- Backpropagation: the algorithm propagates the result of the random playout to the nodes on the path to the root node.
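
As a rough illustration (an assumption, not the course's reference code), the four phases can be combined into one Python loop. The names select, expand, simulate, backpropagate and robust_child are hypothetical and are sketched in the cards that follow:

    def mcts(root, iterations=1000):
        for _ in range(iterations):
            leaf = select(root)           # Selection: descend via UCT to a leaf
            node = expand(leaf)           # Expansion: grow the tree if needed
            result = simulate(node)       # Simulation: random playout from node
            backpropagate(node, result)   # Backpropagation: update the path to the root
        return robust_child(root)         # final move choice (see the last MCTS card)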

3
Q

Explain the two main variables that a node stores within the tree in the case of MCTS.

A

n(v): the number of times the node was visited.
q(v): the total score obtained in the node. Here the sign depends on which player is to move in v.

v denotes the current node.
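
A minimal sketch (an assumption, not from the slides) of a node storing these two statistics, together with the tree bookkeeping the other phases need:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        state: object                                 # game state this node represents
        parent: Optional["Node"] = None
        children: list = field(default_factory=list)
        n: int = 0                                    # n(v): number of visits
        q: float = 0.0                                # q(v): total score; sign depends on the player to move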

4
Q

How does MCTS select a node for evaluation?

A

The selection procedure starts from the root of the tree, and the aim is to keep moving to a child until you reach a leaf of the tree. The algorithm selects a child by maximizing the Upper Confidence Bound for Trees (UCT):

uct(w) = q(w)/n(w) + c * sqrt(ln(n(v)) / n(w))

q(w)/n(w) is the average reward of the (child) node w.
n(v) is the visit count of v, the parent of w.
Parameter c > 0 determines the exploration/exploitation trade-off.
Setting c = 2 is a common choice, but the best value is problem-dependent in general.

At each node along the path, the child with the maximum UCT value is selected, until a leaf is reached.
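
A minimal sketch of this step in Python, assuming the Node class from the previous card (uct and select are illustrative names, not from the slides):

    import math

    def uct(q_w, n_w, n_v, c=2.0):
        # UCT value of a child with total score q_w, visits n_w, parent visits n_v.
        if n_w == 0:
            return math.inf   # unvisited children are tried first
        return q_w / n_w + c * math.sqrt(math.log(n_v) / n_w)

    def select(root):
        # Descend from the root, always moving to the child with maximal UCT,
        # until a node without children (a leaf of the tree) is reached.
        v = root
        while v.children:
            v = max(v.children, key=lambda w: uct(w.q, w.n, v.n))
        return v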

5
Q

Explain how the search tree is expanded in the MCTS algorithm.

A

The approach for expanding the game tree in MCTS involves four steps, assuming the node v reached during Selection is a leaf node (a sketch follows the list):

  1. If the leaf node v is a terminal state, move to the Backpropagation phase
  2. If n(v) = 0, move to the Simulation phase with node v
  3. If n(v) > 0, create new child nodes of v by adding the states that can be legally reached from the current state of v
  4. Move to the Simulation phase with a child of v.
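
A hypothetical sketch of the expansion step. Node is the class from the earlier card; is_terminal, legal_moves and apply_move are assumed game-specific helpers, not defined in the course material:

    def expand(v):
        # Steps 1 and 2: terminal leaf or unvisited leaf -> continue with v itself
        # (Backpropagation for a terminal state, Simulation otherwise).
        if is_terminal(v.state) or v.n == 0:
            return v
        # Step 3: visited leaf -> add a child for every legally reachable state.
        for move in legal_moves(v.state):
            v.children.append(Node(state=apply_move(v.state, move), parent=v))
        # Step 4: continue to the Simulation phase with one of the new children.
        return v.children[0]
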
6
Q

Explain how MCTS uses sampling to evaluate game states.

A

MCTS uses a simulation phase to obtain information about a game state. In the simulation phase the algorithm performs a random playout: it plays random (legal) moves until the game is finished, keeps track of the score during the playout, and determines the score difference at the end. The information about who won is then backpropagated up the tree.

It is important to note that the moves performed during the random playout are not added to the game tree!
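
A hypothetical sketch of a random playout, using the same assumed game helpers as above, plus a score function (also an assumption) returning the result of a terminal state:

    import random

    def simulate(v):
        # Play uniformly random legal moves until the game ends. None of these
        # moves are added to the game tree.
        state = v.state
        while not is_terminal(state):
            state = apply_move(state, random.choice(legal_moves(state)))
        return score(state)   # e.g. score difference, or +1/0/-1 for win/draw/loss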

7
Q

Explain what is meant by backpropagation in MCTS.

A

After performing Simulation from leaf v, we update all nodes (including v) on the path from v to the root node. Each node w on this path is updated as follows:

Update n(w) = n(w) + 1
Update q(w) = q(w) + the playout result (score difference, or who won, depending on the metric of choice)
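
A minimal sketch, assuming the Node class from before. The sign flip is an assumption for two-player zero-sum games, matching the earlier note that the sign of q(v) depends on which player is to move in v:

    def backpropagate(v, result):
        # Walk the parent pointers from the simulated leaf up to the root.
        while v is not None:
            v.n += 1           # n(w) = n(w) + 1
            v.q += result      # q(w) = q(w) + playout result
            result = -result   # flip perspective between the two players
            v = v.parent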

8
Q

How is an actual move selected in MCTS?

A

Max child: select the child w of the root that maximizes q(w)/n(w) (average win rate or score difference).

Robust child: select the child w of the root that maximizes n(w) (the number of visits).

The robust child method works because, under the UCT selection mechanism, a child can only accumulate a high n(w) by repeatedly proving to be a promising game state.
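
Both criteria are one-liners over the root's children, assuming the Node class from before and that every child has been visited at least once:

    def max_child(root):
        # Child with the best average score.
        return max(root.children, key=lambda w: w.q / w.n)

    def robust_child(root):
        # Most-visited child.
        return max(root.children, key=lambda w: w.n)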

9
Q

What were the main consequences of increased computing power (due to the introduction of more general workstations)?

A
  • “Old” methods were finally getting the power they needed (e.g. Kasparov vs. Deep Blue in chess)
  • Expensive approximate methods were becoming “good enough” (e.g. Monte Carlo methods and metaheuristics)
  • Existing methods were becoming practical enough to embed in applications (e.g. expert systems are now just small features embedded in Microsoft Excel)
10
Q

Which areas advanced with the increase in computing power and the shift to more approximate solutions?

A
  • Machine learning (Decision trees, SVMs, regression)
  • Combinatorial problems and optimization (planning, scheduling)
  • Reasoning under uncertainty (Bayesian networks and statistics)
  • Game playing (chess, backgammon, checkers)
11
Q

Name several high-profile successes as part of the resurgence of AI:

A

2011: IBM’s Watson defeats reigning Jeopardy! champions
2012: Google trains neural nets that “recognize cats on YouTube”
2011-2014: Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana bring natural-language AI to people’s smartphones
2016: Google DeepMind’s AlphaGo beats Go champion Lee Sedol.

12
Q

What led to the “deep learning revolution”?

A
  • GPU-accelerated neural nets enable practical deep learning
  • Specialized neural net topologies proved successful on certain tasks (e.g. convolutional neural nets for images and video, LSTM networks for handwriting, speech and temporal data, transformers for NLP)
  • Fueled by the availability of big data
  • Combination with reinforcement learning led to breakthroughs in dynamic settings (game playing remains relevant as a “microworld”, e.g. in robotics, autonomous vehicle control, etc.)

Keywords: GPU, Specialized, Big Data, Dynamics, Reinforcement
