Deep Learning Flashcards

Deep Learning

1
Q

Areas where deep learning wins over traditional machine learning

A
image processing,
computer vision,
speech recognition,
machine translation,
art,
medical imaging,
medical information processing,
robotics and control,
bio-informatics,
natural language processing (NLP),
cybersecurity
2
Q

DL areas

A

Deep Neural Network (DNN)
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU),
Auto-Encoder (AE),
Deep Belief Network (DBN),
Generative Adversarial Network (GAN),
Deep Reinforcement Learning (DRL)

3
Q

DL

A

Deep Learning (DL) is the branch of Machine Learning (ML) built on Neural Networks (NN).

4
Q

DL approaches

A

Like machine learning, deep learning approaches can be categorized as follows:
supervised,
semi-supervised or partially supervised,
unsupervised

as well as Reinforcement Learning (RL) or Deep RL (DRL)

5
Q

Supervised learning

A

Supervised learning uses labeled data. The environment has a set of inputs and corresponding outputs (x_t, y_t) ~ ρ. Given an input x_t, the intelligent agent predicts ŷ_t = f(x_t) and receives a loss value l(y_t, ŷ_t).

The agent then iteratively modifies the network parameters to better approximate the desired outputs.

After successful training, the agent can answer questions from the environment correctly. Supervised learning approaches for deep learning include Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU).

6
Q
Forward propagation algorithm
A
Write code to do forward propagation (prediction) for a neural network; a sketch follows below.

1st input: how many accounts the user has.
2nd input: how many children they have.
The model will predict how many transactions the user makes in the next year.

The input is pre-loaded as input_data, and the weights are available in a dictionary called weights.

The array of weights for the 1st node in the hidden layer is in weights['node_0'].
The array of weights for the 2nd node in the hidden layer is in weights['node_1'].
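A minimal sketch of that forward pass, using made-up numbers in place of the pre-loaded input_data and weights:

```python
import numpy as np

# Hypothetical values; in the exercise, input_data and weights are pre-loaded
input_data = np.array([3, 5])                 # accounts, children
weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}

# Each hidden node's value is the dot product of the inputs and its weights
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()

# The output node treats the hidden layer's values as its inputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])
output = (hidden_layer_outputs * weights['output']).sum()
print(output)  # predicted number of transactions
```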

7
Q

GluonTS

A

toolkit for building time series models based on
deep learning and
probabilistic modeling techniques
https://arxiv.org/pdf/1906.05264.pdf

8
Q
Rectified Linear Activation Function (ReLU)
A

An "activation function" is a function applied at each node. It converts the node's input into some output.

The rectified linear activation function (called ReLU) has been shown to lead to very high-performance networks. This function takes a single number as an input, returning 0 if the input is negative, and the input itself if the input is positive.

(In the exercise, the resulting prediction with ReLU is 52 transactions.)
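A one-line definition as used in these exercises, applied to the node values from the sketch above:

```python
def relu(my_input):
    """ReLU: return 0 for negative inputs, the input itself otherwise."""
    return max(0, my_input)

# Applied during forward propagation, e.g.:
node_0_output = relu(node_0_value)
node_1_output = relu(node_1_value)
```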

9
Q
predict_with_network()
A

predict_with_network() will generate predictions for multiple data observations, which are pre-loaded as input_data. As before, weights are also pre-loaded. In addition, the relu() function you defined in the previous exercise has been pre-loaded.
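A sketch under the exercise's setup (relu() as defined above, weights with the same keys, and input_data as a list of observation arrays):

```python
import numpy as np

def predict_with_network(input_data_row, weights):
    """Forward propagation for one observation, applying relu() at each node."""
    node_0_output = relu((input_data_row * weights['node_0']).sum())
    node_1_output = relu((input_data_row * weights['node_1']).sum())
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    # The exercise applies relu at the output node as well
    return relu((hidden_layer_outputs * weights['output']).sum())

# One prediction per pre-loaded observation
results = [predict_with_network(row, weights) for row in input_data]
print(results)
```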

10
Q
Forward propagation in a deeper network
A

You now have a model with 2 hidden layers.

The values for an input data point are shown inside the input nodes.

The weights are shown on the edges/lines.
What prediction would this model make on this data point?

Assume the activation function at each node is the identity function.

That is, each node’s output will be the same as its input.

So the value of the bottom node in the first hidden layer is -1, and not 0, as it would be if the ReLU activation function were used.

11
Q
Multi-layer neural networks
A

In this exercise, you'll write code to do forward propagation for a neural network with 2 hidden layers. Each hidden layer has two nodes. The input data has been pre-loaded as input_data. The nodes in the first hidden layer are called node_0_0 and node_0_1. Their weights are pre-loaded as weights['node_0_0'] and weights['node_0_1'] respectively.

The nodes in the second hidden layer are called node_1_0 and node_1_1. Their weights are pre-loaded as weights['node_1_0'] and weights['node_1_1'] respectively.

We then create a model output from the hidden nodes using weights pre-loaded as weights['output']. A sketch follows below.
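A sketch of that two-hidden-layer forward pass, assuming relu() and the pre-loaded names described above:

```python
import numpy as np

def predict_with_network(input_data):
    # First hidden layer
    node_0_0_output = relu((input_data * weights['node_0_0']).sum())
    node_0_1_output = relu((input_data * weights['node_0_1']).sum())
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])

    # Second hidden layer: its inputs are the first layer's outputs
    node_1_0_output = relu((hidden_0_outputs * weights['node_1_0']).sum())
    node_1_1_output = relu((hidden_0_outputs * weights['node_1_1']).sum())
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Model output from the second hidden layer
    return (hidden_1_outputs * weights['output']).sum()

output = predict_with_network(input_data)
print(output)
```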

12
Q

Representations are learned

How are the weights that determine the features/interactions in Neural Networks created?

A

The model training process sets them to optimize predictive accuracy.

13
Q

Levels of representation

Which layers of a model capture more complex or “higher level” interactions?

A

The last layers capture the most complex interactions.

14
Q

Calculating model errors
What is the error (predicted - actual) for the following network when the input data is [3, 2] and the actual value of the target (what you are trying to predict) is 5?

A

The network generates a prediction of 16, which results in an error of 11

15
Q

Understanding how weights change model accuracy

Imagine you have to make a prediction for a single data point. The actual value of the target is 7.

The weight going from node_0 to the output is 2, as shown below.

If you increased it slightly, changing it to 2.01, would the predictions become more accurate, less accurate, or stay the same?

A

Increasing the weight to 2.01 would increase the resulting error from 9 to 9.08, making the predictions less accurate.

Think about whether increasing the weight will lead to a greater or smaller error: the value at node_0 is not 0, so increasing or decreasing the weight going from it to the output will have an effect on the accuracy of the predictions.

16
Q

0201 Coding how weight changes affect accuracy

A

Now you'll change weights in a real network and see how they affect model accuracy!

The network's weights have been pre-loaded as weights_0. Your task in this exercise is to update a single weight in weights_0 to create weights_1, which gives a perfect prediction (in which the predicted value is equal to target_actual: 3); one workable set of numbers is sketched below.

You’ll use the predict_with_network() function,
which takes an array of data as the first argument,
and weights as the second argument.
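A sketch with hypothetical weight values (the exercise's pre-loaded numbers may differ), reusing predict_with_network() from earlier:

```python
import numpy as np

# The data point and target from the exercise
input_data = np.array([0, 3])
target_actual = 3

# Hypothetical weights_0: the prediction is 9, so the error is 6
weights_0 = {'node_0': np.array([2, 1]),
             'node_1': np.array([1, 2]),
             'output': np.array([1, 1])}
error_0 = predict_with_network(input_data, weights_0) - target_actual

# Zeroing one output weight drops the prediction to exactly 3
weights_1 = {'node_0': np.array([2, 1]),
             'node_1': np.array([1, 2]),
             'output': np.array([1, 0])}
error_1 = predict_with_network(input_data, weights_1) - target_actual
print(error_0, error_1)  # 6 0
```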

17
Q

0202 Scaling up to multiple data points

input_data is a list of arrays. Each item in that list contains the data to make a single prediction. target_actuals is a list of numbers. Each item in that list is the actual value we are trying to predict.

In this exercise, you’ll use the mean_squared_error() function from sklearn.metrics. It takes the true values and the predicted values as arguments.

You’ll also use the preloaded predict_with_network() function, which takes an array of data as the first argument, and weights as the second argument.

A

Different weights give different accuracies on a single prediction; now you'll measure model accuracy on many points.

You’ll now write code to compare model accuracies for two different sets of weights, which have been stored as weights_0 and weights_1.
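A sketch, assuming input_data, target_actuals, weights_0, weights_1, and predict_with_network() are available as described:

```python
from sklearn.metrics import mean_squared_error

# One prediction per data point, for each set of weights
model_output_0 = [predict_with_network(row, weights_0) for row in input_data]
model_output_1 = [predict_with_network(row, weights_1) for row in input_data]

# Mean squared error: lower is better
mse_0 = mean_squared_error(target_actuals, model_output_0)
mse_1 = mean_squared_error(target_actuals, model_output_1)
print("MSE with weights_0: %f" % mse_0)
print("MSE with weights_1: %f" % mse_1)
```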

18
Q

0203 Calculating slopes

A

You're now going to practice calculating slopes, then use those slopes to improve the weights of the model. When plotting the mean squared error loss function against predictions, the slope is 2 * x * (y - xb), or 2 * input_data * error, where the error is defined as target minus prediction. Note that x and b may have multiple numbers (x is a vector of inputs for each data point, and b is a vector of weights). In this case, the output will also be a vector, which is exactly what you want.
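A sketch with made-up numbers standing in for the pre-loaded arrays:

```python
import numpy as np

# Hypothetical values; in the exercise these are pre-loaded
input_data = np.array([1, 2, 3])
weights = np.array([0, 2, 1])
target = 0

# No hidden layer here: the prediction is a single dot product
preds = (weights * input_data).sum()

# Error with this card's sign convention: target minus prediction
error = target - preds

# Slope of the loss with respect to each weight: 2 * x * (y - xb)
slope = 2 * input_data * error
print(slope)
```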

19
Q

0204 Improving model weights

A

Now you'll use those slopes to improve your model. With the sign convention above (error = target - prediction), adding the slopes to your weights moves them in the right direction. However, it's possible to move too far in that direction, so you will want to take a small step in that direction first, using a low learning rate, and verify that the model is improving.

The weights have been pre-loaded as weights, the actual value of the target as target, and the input data as input_data. The predictions from the initial weights are stored as preds.
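Continuing the same sketch (the hypothetical input_data, weights, target, and preds from the previous card):

```python
# A small step in the slope direction; 0.01 is a typical learning rate
learning_rate = 0.01

# Slope at the current weights, same sign convention as the previous card
slope = 2 * input_data * (target - preds)

# Adding the (scaled-down) slope moves the weights in the right direction
weights_updated = weights + learning_rate * slope

# Verify that the error shrank
preds_updated = (weights_updated * input_data).sum()
print(abs(target - preds), abs(target - preds_updated))
```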

20
Q

0205 Making multiple updates to weights

A

The mean squared error decreases as the number of iterations goes up.

You’re now going to make multiple updates so you can dramatically improve your model weights, and see how the predictions improve with each update.

To keep your code clean, there is a pre-loaded get_slope() function that takes input_data, target, and weights as arguments. There is also a get_mse() function that takes the same arguments. The input_data, target, and weights have been pre-loaded.

This network does not have any hidden layers, and it goes directly from the input (with 3 nodes) to an output node. Note that weights is a single array.

We have also pre-loaded matplotlib.pyplot, and the error history will be plotted after you have done your gradient descent steps.
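A sketch of the update loop, assuming the pre-loaded get_slope() returns the slope under the sign convention used above (so it is added to the weights):

```python
import matplotlib.pyplot as plt

n_updates = 20
mse_hist = []

for i in range(n_updates):
    # get_slope() and get_mse() are the exercise's pre-loaded helpers
    slope = get_slope(input_data, target, weights)
    weights = weights + 0.01 * slope
    mse_hist.append(get_mse(input_data, target, weights))

# The error should fall with each iteration
plt.plot(mse_hist)
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.show()
```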

21
Q

backpropagation

The relationship between forward and backward propagation
If you have gone through 4 iterations of calculating slopes (using backward propagation) and then updated weights, how many times must you have done forward propagation?

A

4. Each time you update the weights using backward propagation, you must first have generated predictions using forward propagation, so 4 rounds of backward propagation require 4 rounds of forward propagation. You cannot do backward propagation without having done forward propagation.

22
Q

Thinking about backward propagation

If your predictions were all exactly right, and your errors were all exactly 0,
the slope of the loss function with respect to your predictions would also be 0.
In that circumstance, which of the following statements would be correct?

A

The updates to all weights in the network would indeed also be 0.

23
Q

A round of backpropagation
In the network shown below, we have done forward propagation, and node values calculated as part of forward propagation are shown in white. The weights are shown in black. Layers after the question mark show the slopes calculated as part of back-prop, rather than the forward-prop values. Those slope values are shown in purple.

This network again uses the ReLU activation function, so the slope of the activation function is 1 for any node receiving a positive value as input. Assume the node being examined had a positive value (so the activation function’s slope is 1).

A

The slope needed to update this weight is indeed 6. You're now ready to start building deep learning models with Keras!

24
Q

0300 Understanding your data
You will soon start building models in Keras to predict wages based on various professional and demographic factors. Before you start building a model, it’s good to understand your data by performing some exploratory analysis.

The data is pre-loaded into a pandas DataFrame called df. Use the .head() and .describe() methods in the IPython Shell for a quick overview of the DataFrame.

The target variable you’ll be predicting is wage_per_hour. Some of the predictor variables are binary indicators, where a value of 1 represents True, and 0 represents False.

Of the 9 predictor variables in the DataFrame, how many are binary indicators? The min and max values as shown by .describe() will be informative here.

A

There are 6 binary indicators.
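One hedged way to check the count programmatically; the target column name is taken from the card, everything else is generic pandas:

```python
# Count predictor columns whose only values are 0 and 1
binary_cols = [col for col in df.columns
               if col != 'wage_per_hour'
               and set(df[col].unique()) <= {0, 1}]
print(len(binary_cols))  # 6
```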

25
Q

0301 Specifying a model

A

Now you’ll get to work with your first model in Keras, and will immediately be able to run more complex neural network models on larger datasets compared to the first two chapters.

To start, you’ll take the skeleton of a neural network and add a hidden layer and an output layer. You’ll then fit that model and see Keras do the optimization so your model continually gets better.

As a start, you'll predict workers' wages based on characteristics like their industry, education, and level of experience. You can find the dataset in a pandas DataFrame called df. For convenience, everything in df except for the target has been converted to a NumPy matrix called predictors. The target, wage_per_hour, is available as a NumPy matrix called target.

For all exercises in this chapter, we’ve imported the Sequential model constructor, the Dense layer constructor, and pandas.
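A sketch of such a specification; the 50- and 32-unit layer sizes are illustrative choices, not prescribed by the card:

```python
from keras.layers import Dense
from keras.models import Sequential

# predictors is the pre-loaded NumPy matrix from the exercise
n_cols = predictors.shape[1]

model = Sequential()
# Hidden layer; input_shape must match the number of predictive features
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
# A second hidden layer
model.add(Dense(32, activation='relu'))
# Single output node for the regression target (wage_per_hour)
model.add(Dense(1))
```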

26
Q

0302 Compiling the model

A

You're now going to compile the model you specified earlier. To compile the model, you need to specify the optimizer and loss function to use. In the video, Dan mentioned that the Adam optimizer is an excellent choice. You can read more about it, as well as other Keras optimizers, at https://keras.io/optimizers/, and if you are really curious, you can read the original paper that introduced the Adam optimizer.

In this exercise, you’ll use the Adam optimizer and the mean squared error loss function. Go for it!
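A minimal sketch, assuming model is the network specified in the previous card:

```python
# Adam adapts the learning rate as it performs gradient descent
model.compile(optimizer='adam', loss='mean_squared_error')
print("Loss function: " + model.loss)
```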

27
Q

0303 Fitting the model

A

You’re at the most fun part. You’ll now fit the model. Recall that the data to be used as predictive features is loaded in a NumPy matrix called predictors and the data to be predicted is stored in a NumPy matrix called target. Your model is pre-written and it has been compiled with the code from the previous exercise.
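The fit itself is a single call, assuming the compiled model and the pre-loaded predictors and target:

```python
# Keras splits the data into batches and reports the loss for each epoch
model.fit(predictors, target)
```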

28
Q

03 Understanding your classification data
Now you will start modeling with a new dataset for a classification problem. This data includes information about passengers on the Titanic. You will use predictors such as age, fare, and where each passenger embarked from to predict who will survive. This data is from a tutorial on data science competitions, which also provides descriptions of the features.

The data is pre-loaded in a pandas DataFrame called df.

It’s smart to review the maximum and minimum values of each variable to ensure the data isn’t misformatted or corrupted. What was the maximum age of passengers on the Titanic? Use the .describe() method in the IPython Shell to answer this question.

A

df.describe()

The count row gives the total number of entries in the data. The maximum age in the data is 80.

29
Q

0304 Last steps in classification models

A

You’ll now create a classification model using the titanic dataset, which has been pre-loaded into a DataFrame called df. You’ll take information about the passengers and predict which ones survived.

The predictive variables are stored in a NumPy array predictors. The target to predict is in df.survived, though you'll have to manipulate it for Keras. The number of predictive features is stored in n_cols.

Here, you’ll use the ‘sgd’ optimizer, which stands for Stochastic Gradient Descent. You’ll learn more about this in the next chapter!
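A sketch of the whole workflow; the 32-unit hidden layer is an illustrative size:

```python
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical

# categorical_crossentropy expects one column per class
target = to_categorical(df.survived)

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))
# Two output nodes with softmax: one probability per outcome
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(predictors, target)
```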

30
Q

Stochastic gradient descent

A

https://en.wikipedia.org/wiki/Stochastic_gradient_descent

31
Q

0305 Making predictions

A

The trained network from your previous coding exercise is now stored as model. New data to make predictions is stored in a NumPy array as pred_data. Use model to make predictions on your new data.

In this exercise, your predictions will be probabilities, which is the most common way for data scientists to communicate their predictions to colleagues.
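A sketch, assuming the two-node softmax output from the earlier classification model:

```python
# Each row of predictions holds one probability per class;
# column 1 is the predicted probability of survival
predictions = model.predict(pred_data)
predicted_prob_true = predictions[:, 1]
print(predicted_prob_true)
```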

32
Q

04 Diagnosing optimization problems

Which of the following could prevent a model from showing an improved loss in its first few epochs?

A

Any of these:
a learning rate that is too low,
a learning rate that is too high,
a poor choice of activation function.

33
Q

0401 Changing optimization parameters

A

It’s time to get your hands dirty with optimization. You’ll now try optimizing a model at a very low learning rate, a very high learning rate, and a “just right” learning rate. You’ll want to look at the results after running this exercise, remembering that a low value for the loss function is good.

For these exercises, we’ve pre-loaded the predictors and target values from your previous classification models (predicting who would survive on the Titanic). You’ll want the optimization to start from scratch every time you change the learning rate, to give a fair comparison of how each learning rate did in your results. So we have created a function get_new_model() that creates an unoptimized model to optimize.
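A sketch of the comparison loop; note that older Keras versions spell the learning-rate argument lr, while newer ones use learning_rate:

```python
from keras.optimizers import SGD

lr_to_test = [0.000001, 0.01, 1]

for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n' % lr)
    # A fresh, unoptimized model each time for a fair comparison
    model = get_new_model()
    my_optimizer = SGD(lr=lr)
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy')
    model.fit(predictors, target)
```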

34
Q

0402 Evaluating model accuracy on validation dataset

A

Now it’s your turn to monitor model accuracy with a validation data set. A model definition has been provided as model. Your job is to add the code to compile it and then fit it. You’ll check the validation score in each epoch.
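A sketch, assuming model is the provided definition and predictors and target are pre-loaded:

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Hold out 30% of the data; Keras reports validation scores each epoch
model.fit(predictors, target, validation_split=0.3)
```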

35
Q

adam

A

https://keras.io/optimizers/

36
Q

0403 Early stopping: Optimizing the optimization

A

Now that you know how to monitor your model performance throughout optimization, you can use early stopping to stop optimization when it isn’t helping any more. Since the optimization stops automatically when it isn’t helping, you can also set a high value for epochs in your call to .fit(), as Dan showed in the video.

The model you’ll optimize has been specified as model. As before, the data is pre-loaded as predictors and target.

Because optimization stops automatically when it is no longer helpful, it is okay to specify the maximum number of epochs as 30 rather than using the default of 10 that you've used so far. Here, the optimization stopped after 7 epochs.
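A sketch; patience=2 stops training after 2 consecutive epochs without improvement in the (default) validation loss:

```python
from keras.callbacks import EarlyStopping

early_stopping_monitor = EarlyStopping(patience=2)

model.fit(predictors, target,
          epochs=30,
          validation_split=0.3,
          callbacks=[early_stopping_monitor])
```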

37
Q

0404 Experimenting with wider networks

A

Now you know everything you need to begin experimenting with different models!

A model called model_1 has been pre-loaded. You can see a summary of this model printed in the IPython Shell. This is a relatively small network, with only 10 units in each hidden layer.

In this exercise you’ll create a new model called model_2 which is similar to model_1, except it has 100 units in each hidden layer.

After you create model_2, both models will be fitted, and a graph showing both models' loss scores at each epoch will be shown. We added the argument verbose=False in the fitting commands to print out fewer updates, since you will look at these graphically instead of as text.

Because you are fitting two models, it will take a moment to see the outputs after you hit run, so be patient.

The blue model is the one you made, the red is the original model. Your model had a lower loss value, so it is the better model. Nice job!
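A sketch of model_2, assuming input_shape is the pre-loaded tuple matching model_1's input:

```python
# model_2: the same shape as model_1, but 100 units per hidden layer
model_2 = Sequential()
model_2.add(Dense(100, activation='relu', input_shape=input_shape))
model_2.add(Dense(100, activation='relu'))
model_2.add(Dense(2, activation='softmax'))

model_2.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
```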

38
Q

0405 Adding layers to a network

A

You’ve seen how to experiment with wider networks. In this exercise, you’ll try a deeper network (more hidden layers).

Once again, you have a baseline model called model_1 as a starting point. It has 1 hidden layer, with 50 units. You can see a summary of that model’s structure printed out. You will create a similar network with 3 hidden layers (still keeping 50 units in each layer).

This will again take a moment to fit both models, so you’ll need to wait a few seconds to see the results after you run your code.
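A sketch of the deeper network, again assuming a pre-loaded input_shape and a two-class softmax output:

```python
# model_2: 3 hidden layers of 50 units instead of model_1's single layer
model_2 = Sequential()
model_2.add(Dense(50, activation='relu', input_shape=input_shape))
model_2.add(Dense(50, activation='relu'))
model_2.add(Dense(50, activation='relu'))
model_2.add(Dense(2, activation='softmax'))

model_2.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
```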

39
Q

04 Experimenting with model structures

You compared two networks that were identical except that the 2nd network had an extra hidden layer.

You see that this 2nd network (the deeper network) had better performance.
Given that, which of the following would be a good experiment to run next for even better performance?

A

Increasing the number of units in each hidden layer would be a good next step to try achieving even better performance.

40
Q

dl aws

A

https://www.datacamp.com/community/tutorials/deep-learning-jupyter-aws

41
Q

todo

A

https://www.datacamp.com/courses/advanced-deep-learning-with-keras-in-python
https://www.datacamp.com/courses/convolutional-neural-networks-for-image-processing

https://www.datacamp.com/tracks/machine-learning-with-python

42
Q

0406 Building your own digit recognition model

A

You’ve reached the final exercise of the course - you now know everything you need to build an accurate model to recognize handwritten digits!

We’ve already done the basic manipulation of the MNIST dataset shown in the video, so you have X and y loaded and ready to model with. Sequential and Dense from keras are also pre-imported.

To add an extra challenge, we’ve loaded only 2500 images, rather than 60000 which you will see in some published results. Deep learning models perform better with more data, however, they also take longer to train, especially when they start becoming more complex.
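A sketch of one such model, following the course's usual pattern (layer sizes are illustrative):

```python
# X holds the flattened pixel values and y the one-hot digit labels,
# both pre-loaded as described above
model = Sequential()
model.add(Dense(50, activation='relu', input_shape=(X.shape[1],)))
model.add(Dense(50, activation='relu'))
# 10 output nodes, one per digit 0-9
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, validation_split=0.3)
```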

If you have a computer with a CUDA-compatible GPU, you can take advantage of it to improve computation time. If you don't have a GPU, no problem! You can set up a deep learning environment in the cloud that can run your models on a GPU. A blog post by Dan (linked in the "dl aws" card above) explains how to do this; check it out after completing this exercise! It is a great next step as you continue your deep learning journey.

Ready to take your deep learning to the next level? Check out Advanced Deep Learning with Keras in Python to see how the Keras functional API lets you build domain knowledge to solve new types of problems. Once you know how to use the functional API, take a look at “Convolutional Neural Networks for Image Processing” to learn image-specific applications of Keras.