Deep Learning Flashcards
Deep Learning
Areas where deep learning wins over traditional machine learning
image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bio-informatics, natural language processing (NLP), cybersecurity
DL areas
Deep Neural Network (DNN)
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)
Auto-Encoder (AE)
Deep Belief Network (DBN)
Generative Adversarial Network (GAN)
Deep Reinforcement Learning (DRL)
DL
Deep Learning (DL) is a subfield of Machine Learning (ML) based on Neural Networks (NN).
DL approaches
Like machine learning, deep learning approaches can be categorized as follows:
supervised,
semi-supervised or partially supervised,
unsupervised
Reinforcement Learning (RL) or Deep RL (DRL)
Supervised learning
Uses labeled data. The environment has a set of inputs and corresponding outputs (x_t, y_t) ~ ρ. For an input x_t, the intelligent agent predicts ŷ_t = f(x_t) and receives a loss value l(y_t, ŷ_t). The agent then iteratively modifies the network parameters to better approximate the desired outputs. After successful training, the agent will be able to get the correct answers to questions from the environment. Supervised learning approaches for deep learning include Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU).
- forward propagation algorithm
Code to do forward propagation (prediction) for a neural network; see the sketch below.
1st input: how many accounts the user has
2nd input: how many children they have
The model will predict how many transactions the user makes in the next year.
The input data is pre-loaded as input_data, and the weights are available in a dictionary called weights.
The array of weights for the 1st node in the hidden layer is in weights['node_0'].
The array of weights for the 2nd node in the hidden layer is in weights['node_1'].
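A minimal sketch of this forward pass. The weight names and the two-node hidden layer match the card above; the specific numbers are hypothetical example values (the exercise pre-loads its own):

```python
import numpy as np

# Hypothetical example values; the exercise pre-loads its own.
input_data = np.array([3, 5])            # [accounts, children]
weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}

# Forward propagation: multiply inputs by weights and sum at each node.
node_0_value = (input_data * weights['node_0']).sum()   # 3*2 + 5*4 = 26
node_1_value = (input_data * weights['node_1']).sum()   # 3*4 + 5*(-5) = -13
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Output: weighted sum of the hidden-layer outputs.
output = (hidden_layer_outputs * weights['output']).sum()
print(output)   # 26*2 + (-13)*7 = -39
```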
GluonTS
A toolkit for building time series models based on deep learning and probabilistic modeling techniques.
https://arxiv.org/pdf/1906.05264.pdf
- Rectified Linear Activation Function
An "activation function" is a function applied at each node; it converts the node's input into some output.
The rectified linear activation function (ReLU) has been shown to lead to very high-performance networks. It takes a single number as input and returns 0 if the input is negative, and the input itself if the input is positive.
The network's prediction: 52 transactions.
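A sketch of ReLU applied in the same forward pass. With the hypothetical weights from the previous sketch, the rectified network predicts 52, matching the answer on this card:

```python
def relu(x):
    """Return x if x is positive, 0 otherwise."""
    return max(x, 0)

# Same hypothetical input_data and weights as the previous sketch.
node_0_output = relu((input_data * weights['node_0']).sum())  # relu(26) = 26
node_1_output = relu((input_data * weights['node_1']).sum())  # relu(-13) = 0
hidden_layer_outputs = np.array([node_0_output, node_1_output])

model_output = (hidden_layer_outputs * weights['output']).sum()
print(model_output)   # 26*2 + 0*7 = 52
```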
- predict_with_network()
predict_with_network() will generate predictions for multiple data observations, which are pre-loaded as input_data. As before, weights are also pre-loaded. In addition, the relu() function you defined in the previous exercise has been pre-loaded.
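A sketch of the function, assuming the single-hidden-layer weight names used above and the relu() defined earlier:

```python
def predict_with_network(input_data_row, weights):
    # Hidden layer: weighted sum of inputs at each node, then ReLU.
    node_0_output = relu((input_data_row * weights['node_0']).sum())
    node_1_output = relu((input_data_row * weights['node_1']).sum())
    hidden_layer_outputs = np.array([node_0_output, node_1_output])

    # Output node: weighted sum of hidden outputs, then ReLU.
    model_output = relu((hidden_layer_outputs * weights['output']).sum())
    return model_output

# One prediction per row of the pre-loaded input_data.
results = [predict_with_network(row, weights) for row in input_data]
print(results)
```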
- Forward propagation in a deeper network
You now have a model with 2 hidden layers.
The values for an input data point are shown inside the input nodes.
The weights are shown on the edges/lines.
What prediction would this model make on this data point?
Assume the activation function at each node is the identity function.
That is, each node’s output will be the same as its input.
So the value of the bottom node in the first hidden layer is -1, not 0 as it would be if the ReLU activation function were used.
- Multi-layer neural networks
In this exercise, you'll write code to do forward propagation for a neural network with 2 hidden layers. Each hidden layer has two nodes. The input data has been pre-loaded as input_data. The nodes in the first hidden layer are called node_0_0 and node_0_1. Their weights are pre-loaded as weights['node_0_0'] and weights['node_0_1'] respectively.
The nodes in the second hidden layer are called node_1_0 and node_1_1. Their weights are pre-loaded as weights['node_1_0'] and weights['node_1_1'] respectively.
We then create a model output from the hidden nodes using weights pre-loaded as weights['output'].
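A sketch of the two-hidden-layer forward pass, using the weight names given above and the relu() defined earlier:

```python
def predict_with_network(input_data):
    # First hidden layer.
    node_0_0_output = relu((input_data * weights['node_0_0']).sum())
    node_0_1_output = relu((input_data * weights['node_0_1']).sum())
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])

    # Second hidden layer takes the first layer's outputs as its inputs.
    node_1_0_output = relu((hidden_0_outputs * weights['node_1_0']).sum())
    node_1_1_output = relu((hidden_0_outputs * weights['node_1_1']).sum())
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Output node: weighted sum of the second hidden layer's outputs.
    return (hidden_1_outputs * weights['output']).sum()

output = predict_with_network(input_data)
print(output)
```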
Representations are learned
How are the weights that determine the features/interactions in Neural Networks created?
The model training process sets them to optimize predictive accuracy.
Levels of representation
Which layers of a model capture more complex or “higher level” interactions?
The last layers capture the most complex interactions.
Calculating model errors
What is the error (predicted - actual) for the following network when the input data is [3, 2] and the actual value of the target (what you are trying to predict) is 5?
The network generates a prediction of 16, which results in an error of 11
Understanding how weights change model accuracy
Imagine you have to make a prediction for a single data point. The actual value of the target is 7.
The weight going from node_0 to the output is 2, as shown below.
If you increased it slightly, changing it to 2.01, would the predictions become more accurate, less accurate, or stay the same?
Increasing the weight to 2.01 would increase the resulting error from 9 to 9.08, making the predictions less accurate.
Whether increasing the weight leads to a greater or smaller error depends on the sign of the error and of the node's value: the value at node_0 is not 0, so increasing or decreasing the weight going from it to the output will have an effect on the accuracy of the predictions.
Coding how weight changes affect accuracy
0201
Now you'll change the weights in a real network and see how they affect model accuracy!
The network's weights have been pre-loaded as weights_0. Your task in this exercise is to update a single weight in weights_0 to create weights_1, which gives a perfect prediction (one in which the predicted value equals target_actual: 3).
You'll use the predict_with_network() function, which takes an array of data as the first argument and weights as the second argument.
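A sketch of the idea with hypothetical weights and a hypothetical data point (the network shape, target, and function signature are from the card; the numbers are illustrative):

```python
# Hypothetical data point and weights; the exercise pre-loads its own.
input_data = np.array([0, 3])
weights_0 = {'node_0': [2, 1], 'node_1': [1, 2], 'output': [1, 1]}
target_actual = 3

# Prediction and error with the original weights.
model_output_0 = predict_with_network(input_data, weights_0)
error_0 = model_output_0 - target_actual          # 9 - 3 = 6

# Zeroing the second output weight gives a perfect prediction here.
weights_1 = {'node_0': [2, 1], 'node_1': [1, 2], 'output': [1, 0]}
model_output_1 = predict_with_network(input_data, weights_1)
error_1 = model_output_1 - target_actual          # 3 - 3 = 0
print(error_0, error_1)
```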
0202
Scaling up to multiple data points
input_data is a list of arrays. Each item in that list contains the data to make a single prediction. target_actuals is a list of numbers. Each item in that list is the actual value we are trying to predict.
In this exercise, you’ll use the mean_squared_error() function from sklearn.metrics. It takes the true values and the predicted values as arguments.
You’ll also use the preloaded predict_with_network() function, which takes an array of data as the first argument, and weights as the second argument.
Different weights give different accuracies on a single prediction; to compare models fairly, you need to measure model accuracy on many points.
You’ll now write code to compare model accuracies for two different sets of weights, which have been stored as weights_0 and weights_1.
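A sketch of the comparison, assuming predict_with_network(), input_data, target_actuals, weights_0, and weights_1 are pre-loaded as described above:

```python
from sklearn.metrics import mean_squared_error

# Collect one prediction per data point for each set of weights.
model_output_0 = [predict_with_network(row, weights_0) for row in input_data]
model_output_1 = [predict_with_network(row, weights_1) for row in input_data]

# Compare accuracy across all points at once.
mse_0 = mean_squared_error(target_actuals, model_output_0)
mse_1 = mean_squared_error(target_actuals, model_output_1)
print("MSE with weights_0: %f" % mse_0)
print("MSE with weights_1: %f" % mse_1)
```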
0203
Calculating slopes
You'll use this slope to improve the weights of the model!
You're now going to practice calculating slopes. When plotting the mean-squared-error loss function against predictions, the slope is 2 * x * (xb - y), or 2 * input_data * error, where error = prediction - target. Note that x and b may each contain multiple numbers (x is a vector for each data point, and b is a vector of weights). In this case, the output will also be a vector, which is exactly what you want.
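A sketch of the slope calculation for a network with no hidden layer, using hypothetical values:

```python
import numpy as np

# Hypothetical values; the exercise pre-loads its own.
input_data = np.array([1, 2, 3])
weights = np.array([0, 2, 1])
target = 0

# Prediction is the dot product x . b; error = prediction - target.
preds = (weights * input_data).sum()     # 0 + 4 + 3 = 7
error = preds - target                   # 7

# Slope (gradient) of the squared error with respect to the weights.
slope = 2 * input_data * error
print(slope)                             # [14 28 42]
```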
0204
Improving model weights
Now you'll use those slopes to improve your model. The slope points uphill on the loss surface, so subtracting the slope from your weights moves you in the right direction. However, it's possible to move too far in that direction, so you'll want to take a small step first, using a low learning rate, and verify that the model is improving.
The weights have been pre-loaded as weights, the actual value of the target as target, and the input data as input_data. The predictions from the initial weights are stored as preds
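A sketch of one gradient-descent step, continuing the hypothetical values above (the 0.01 learning rate is an illustrative choice):

```python
learning_rate = 0.01

# Slope at the current weights.
preds = (weights * input_data).sum()
error = preds - target
slope = 2 * input_data * error

# Step downhill: subtract a small multiple of the slope.
weights_updated = weights - learning_rate * slope

# The error shrinks after the update.
preds_updated = (weights_updated * input_data).sum()
error_updated = preds_updated - target
print(error, error_updated)              # 7 vs. a smaller error (4.48 here)
```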
0205
Making multiple updates to weights
The mean squared error decreases as the number of iterations goes up.
You’re now going to make multiple updates so you can dramatically improve your model weights, and see how the predictions improve with each update.
To keep your code clean, there is a pre-loaded get_slope() function that takes input_data, target, and weights as arguments. There is also a get_mse() function that takes the same arguments. The input_data, target, and weights have been pre-loaded.
This network does not have any hidden layers, and it goes directly from the input (with 3 nodes) to an output node. Note that weights is a single array.
We have also pre-loaded matplotlib.pyplot, and the error history will be plotted after you have done your gradient descent steps.
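A sketch of the update loop, assuming the pre-loaded get_slope() and get_mse() helpers described above (20 iterations and the 0.01 learning rate are illustrative):

```python
import matplotlib.pyplot as plt

n_updates = 20
mse_hist = []

for i in range(n_updates):
    # Slope at the current weights, then a small step downhill.
    slope = get_slope(input_data, target, weights)
    weights = weights - 0.01 * slope

    # Record the error to watch it shrink.
    mse_hist.append(get_mse(input_data, target, weights))

plt.plot(mse_hist)
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.show()
```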
backpropagation
The relationship between forward and backward propagation
If you have gone through 4 iterations of calculating slopes (using backward propagation) and then updated weights, how many times must you have done forward propagation?
Answer: 4. Each round of backward propagation uses the node values from forward propagation, so every weight update is preceded by one forward pass; you cannot do backward propagation without having first done forward propagation.
Thinking about backward propagation
If your predictions were all exactly right and your errors were all exactly 0, the slope of the loss function with respect to your predictions would also be 0.
In that circumstance, the updates to all weights in the network would also be 0, since each weight update is proportional to the slope.
A round of backpropagation
In the network shown below, we have done forward propagation, and the node values calculated during forward propagation are shown in white. The weights are shown in black. The layers after the question mark show the slopes calculated as part of backpropagation rather than the forward-propagation values; those slope values are shown in purple.
This network again uses the ReLU activation function, so the slope of the activation function is 1 for any node receiving a positive value as input. Assume the node being examined had a positive value (so the activation function's slope is 1).
The slope needed to update this weight is indeed 6. You're now ready to start building deep learning models with Keras!
0300
Understanding your data
You will soon start building models in Keras to predict wages based on various professional and demographic factors. Before you start building a model, it’s good to understand your data by performing some exploratory analysis.
The data is pre-loaded into a pandas DataFrame called df. Use the .head() and .describe() methods in the IPython Shell for a quick overview of the DataFrame.
The target variable you’ll be predicting is wage_per_hour. Some of the predictor variables are binary indicators, where a value of 1 represents True, and 0 represents False.
Of the 9 predictor variables in the DataFrame, how many are binary indicators? The min and max values shown by .describe() will be informative here.
There are 6 binary indicators.
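A sketch of how you might count them programmatically; the column scan is an assumption (the exercise just reads the .describe() output), and wage_per_hour is excluded as the target:

```python
# Columns whose values are only 0 or 1 are binary indicators.
predictors = df.drop('wage_per_hour', axis=1)
binary_cols = [col for col in predictors.columns
               if predictors[col].isin([0, 1]).all()]
print(len(binary_cols), binary_cols)
```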