Week 6 Flashcards
For what type of data are convolutional neural networks (CNN) optimized?
CNNs are optimized for grid-like data, such as images for face recognition and inputs for playing Atari games.
What is natural language processing (NLP)?
NLP is a field of artificial intelligence that enables machines to understand, interpret, and respond to human language.
What are the components of a recurrent neural network (RNN)?
RNNs contain feed-forward connections that produce output and feedback loops that feed hidden-state activity back into the network, giving it a “memory” of past inputs.
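The feedback loop can be sketched with a single scalar hidden state; the weights below are arbitrary illustrative values, not a trained network:

```python
import math

def rnn_step(x, h, w_in=0.5, w_rec=0.8):
    # One recurrence step: the new hidden state mixes the current
    # input with the previous hidden state (the network's "memory").
    return math.tanh(w_in * x + w_rec * h)

h = 0.0                      # initial hidden state
for x in [1.0, 0.0, 0.0]:    # a short input sequence
    h = rnn_step(x, h)
# h is still nonzero after the zero inputs: the first input
# persists through the feedback loop.
```

Because `h` is passed back into each step, information from earlier inputs decays gradually instead of disappearing, which is exactly the "memory" the card describes.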
What are the disadvantages of RNNs?
- RNNs can be difficult to implement and train; backpropagation through long sequences suffers from vanishing and exploding gradients.
- RNNs inherently process data sequentially, which can lead to slower computation compared to models that can process data in parallel.
What is the main functionality of RNNs?
RNNs capture temporal dependencies and patterns by processing sequences of data, making them suitable for tasks like speech recognition or video analysis.
What is the hierarchical structure of CNNs?
CNNs have a hierarchical structure with convolutional layers that apply convolution to obtain feature maps and pooling layers that reduce size by downsampling, increasing abstraction up the hierarchy.
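The two layer types can be sketched in a few lines of plain Python; the image and kernel values below are made up for illustration:

```python
def conv2d(image, kernel):
    # Valid convolution (cross-correlation, as in most DL libraries):
    # slide the kernel over the image to produce a feature map.
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def max_pool2x2(fmap):
    # 2x2 max pooling with stride 2: downsamples the feature map.
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 0, 1],
         [3, 1, 1, 0],
         [0, 2, 2, 3],
         [1, 0, 1, 2]]
kernel = [[1, 0], [0, -1]]     # toy 2x2 filter
fmap = conv2d(image, kernel)   # 4x4 image -> 3x3 feature map
pooled = max_pool2x2(fmap)     # 3x3 map  -> 1x1 after pooling
```

Stacking such conv/pool pairs is what shrinks spatial size and raises abstraction as you go up the hierarchy.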
How do CNNs compare to traditional ANNs?
CNNs have increased efficiency compared to traditional ANNs and allow for deeper networks due to their specialized structure that can capture hierarchical patterns.
What neural networks does AlphaGo use?
AlphaGo uses two neural networks: the value network, which predicts the winner of the game from the current position, and the policy network, which suggests the next move to play.
What learning methods are combined in AlphaGo?
AlphaGo combines supervised learning, where it initially learns from human expert games, with reinforcement learning, where it improves by playing games against itself.
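The reinforcement-learning half can be illustrated with a toy policy-gradient (REINFORCE) loop; the two-move "game", its rewards, and the learning rate are all invented for this sketch and have nothing to do with AlphaGo's actual networks:

```python
import math, random

def softmax(theta):
    exps = [math.exp(t) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical two-move game: move 0 always wins (+1), move 1 always loses (0).
rewards = [1.0, 0.0]
theta = [0.0, 0.0]   # policy parameters, initially uniform
lr = 0.1
random.seed(0)

for _ in range(500):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]  # sample a move from the policy
    r = rewards[a]
    # REINFORCE update: d/dtheta_k log pi(a) = 1[k == a] - p_k
    for k in range(2):
        theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])

probs = softmax(theta)  # policy now strongly prefers the winning move
```

AlphaGo applies the same idea at scale: game outcomes from self-play provide the reward signal that shifts the policy network toward winning moves.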
What is the first step in training AlphaGo?
The first step is to train the policy network on 30 million expert moves using supervised learning, reaching 57% accuracy in predicting expert moves at about 3 milliseconds per evaluation.
What is the second step in AlphaGo’s training?
The second step is to train a fast rollout policy: a smaller network that predicts expert moves with lower accuracy but evaluates much faster, making it usable inside search rollouts.
What is the third step in AlphaGo’s training?
The third step is to improve the policy network through policy gradient reinforcement learning (self-play), where the RL-policy network wins 80% against the SL-policy network and 85% against Monte Carlo Tree Search (MCTS) programs.
What is the fourth step in AlphaGo’s training?
The fourth step is to estimate the value function/network for position evaluation by generating a new dataset using the RL-policy network through self-play.
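A minimal tabular stand-in for position evaluation: estimate the value of a position as the mean game outcome observed from it across self-play games. The positions and outcomes below are made up; the real value network is a deep net trained by regression on such data:

```python
from collections import defaultdict

# Each game: (positions visited, final outcome for the player).
games = [
    (["a", "b", "c"], +1),
    (["a", "b", "d"], -1),
    (["a", "e", "c"], +1),
]

totals = defaultdict(float)
counts = defaultdict(int)
for positions, outcome in games:
    for pos in positions:
        totals[pos] += outcome   # accumulate outcomes per position
        counts[pos] += 1

# V(pos) = average outcome from that position
value = {pos: totals[pos] / counts[pos] for pos in totals}
```

The value network generalizes this averaging across unseen positions, which is why fresh self-play data from the RL-policy network is needed as its training set.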
What type of training data does AlphaGo Zero use?
AlphaGo Zero uses no human data for training. It learns entirely from self-play, starting from random play and improving over time.
How does AlphaGo Zero’s performance compare to AlphaGo?
AlphaGo Zero is faster (100x) and more efficient (10x) than the original AlphaGo and defeated AlphaGo in 100 out of 100 games.