Advanced Learning Algorithms (From Practice Quizzes) Flashcards
Which of these are terms used to refer to components of an artificial neural network? (hint: three of these are correct)
A.) layers
B.) neurons
C.) activation function
D.) axon
A.), B.), and C.)
True/False? Neural networks take inspiration from, but do not very accurately mimic, how neurons in a biological brain learn.
True; Artificial neural networks use a very simplified mathematical model of what a biological neuron does.
Question 1
For the following code:
model = Sequential([
    Dense(units=25, activation="sigmoid"),
    Dense(units=15, activation="sigmoid"),
    Dense(units=10, activation="sigmoid"),
    Dense(units=1, activation="sigmoid")])
This code will define a neural network with how many layers?
A.) 4
B.) 5
C.) 3
D.) 25
A.) 4
Using TensorFlow, how do you define the second neural network layer with 4 neurons and a sigmoid activation?
A.) Dense(layer=2, units=4, activation='sigmoid')
B.) Dense(units=4, activation='sigmoid')
C.) Dense(units=4)
D.) Dense(units=[4], activation=['sigmoid'])
B.)
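As a sanity check on what such a layer computes, here is a minimal pure-Python sketch of a dense (fully connected) layer with 4 sigmoid units. This is a conceptual illustration, not TensorFlow's actual implementation; the weights and inputs are made-up values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(a_in, W, b):
    # each of the len(b) units computes sigmoid(w_j . a_in + b_j)
    return [sigmoid(sum(w * a for w, a in zip(w_row, a_in)) + b_j)
            for w_row, b_j in zip(W, b)]

# 4 units, 2 inputs; zero weights and biases give sigmoid(0) = 0.5 per unit
a_out = dense([1.0, 2.0], [[0.0, 0.0]] * 4, [0.0] * 4)
```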
Which of the following activation functions is the most common choice for the hidden layers of a neural network?
A.) Sigmoid
B.) Linear
C.) ReLU
D.) Most hidden layers do not use any activation function
C.) A ReLU is most often used because it is faster to train compared to the sigmoid. This is because the ReLU is only flat on one side (the left side) whereas the sigmoid goes flat (horizontal, slope approaching zero) on both sides of the curve.
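The flatness argument can be checked numerically. A small pure-Python sketch (function names are for illustration only): far from zero, the sigmoid's derivative is nearly zero, while ReLU's derivative stays 1 on the right side:

```python
import math

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # approaches 0 on BOTH sides of the curve

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # flat only on the left side

g_sig = sigmoid_grad(10.0)   # tiny gradient: learning slows to a crawl here
g_relu = relu_grad(10.0)     # still 1.0: gradient descent keeps moving
```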
For the task of predicting housing prices, which activation functions could you choose for the output layer? Choose the 2 options that apply.
A.) Linear
B.) Sigmoid
C.) ReLU
A.) and C.). A linear activation function can be used for a regression task where the output can be both negative and positive, but it’s also possible to use it for a task where the output is 0 or greater (like with house prices). ReLU outputs values 0 or greater, and housing prices are positive values.
True/False? A neural network with many layers but no activation function (in the hidden layers) is not effective; that’s why we should instead use the linear activation function in every hidden layer.
False; A neural network with many layers but no activation function is not effective. A linear activation is the same as "no activation function."
For a multiclass classification task that has 4 possible outputs, the sum of all the activations adds up to 1. For a multiclass classification task that has 3 possible outputs, the sum of all the activations should add up to ….
A.) Less than 1
B.) 1
C.) It will vary, depending on the input x
D.) More than 1
B.) 1
For multiclass classification, the cross entropy loss is used for training the model. If there are 4 possible classes for the output, and for a particular training example, the true class of the example is class 3 (y=3), then what does the cross entropy loss simplify to? [Hint: This loss should get smaller when a3 gets larger.]
A.) z_3/(z_1+z_2+z_3+z_4)
B.) z_3
C.) −log(a3)
C.)
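This simplification can be verified directly: the full cross entropy loss sums −1{y = j}·log(a_j) over all classes, and with a single true class y = 3 only the −log(a_3) term survives. A hedged pure-Python sketch (arbitrary logits):

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(a, y):
    # full loss: -sum over classes j of 1{y == j} * log(a_j);
    # with one true class, only the j == y term remains
    return -sum(math.log(a_j) for j, a_j in enumerate(a, start=1) if j == y)

a = softmax([0.5, 1.0, 2.0, -0.5])   # 4 possible classes
loss = cross_entropy(a, 3)           # true class y = 3
```

Note that as a_3 gets larger (closer to 1), −log(a_3) gets smaller, matching the hint.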
For multiclass classification, the recommended way to implement softmax regression is to set from_logits=True in the loss function, and also to define the model’s output layer with…
A.) a ‘softmax’ activation
B.) a ‘linear’ activation
B.) Set the output as linear, because the loss function handles the calculation of the softmax with a more numerically stable method.
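The stability issue can be illustrated in pure Python. The sketch below contrasts a naive "softmax first, then log" computation, which overflows for large logits, with the log-sum-exp rearrangement that from_logits=True enables conceptually (these helper functions are illustrative, not TensorFlow's internals):

```python
import math

def naive_loss(z, y):
    # softmax first, then log: math.exp can overflow for large logits
    exps = [math.exp(v) for v in z]
    return -math.log(exps[y - 1] / sum(exps))

def logits_loss(z, y):
    # log-sum-exp rearrangement: never forms the softmax explicitly
    m = max(z)
    return (m + math.log(sum(math.exp(v - m) for v in z))) - z[y - 1]

z = [1000.0, 0.0, -1000.0]
stable = logits_loss(z, 1)          # fine: approximately 0
# naive_loss(z, 1) would raise OverflowError from math.exp(1000.0)
```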
The Adam optimizer is the recommended optimizer for finding the optimal parameters of the model. How do you use the Adam optimizer in TensorFlow?
A.) The call to model.compile() will automatically pick the best optimizer, whether it is gradient descent, Adam or something else. So there’s no need to pick an optimizer manually.
B.) The call to model.compile() uses the Adam optimizer by default
C.) The Adam optimizer works only with Softmax outputs. So if a neural network has a Softmax output layer, TensorFlow will automatically pick the Adam optimizer.
D.) When calling model.compile, set optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3).
D.)
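To make the update rule concrete, here is a minimal single-parameter sketch of what Adam does (moving averages of the gradient and squared gradient, with bias correction). This is a conceptual illustration of the algorithm, not tf.keras's actual implementation, and the hyperparameter values are just the common defaults:

```python
import math

def adam_minimize(grad, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3)
x_opt = adam_minimize(lambda x: 2.0 * (x - 3.0), x=0.0)
```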
What is the name of the layer type in which each neuron looks at only a subset of the values of the input vector fed into that layer?
A.) convolutional layer
B.) A fully connected layer
C.) Image layer
D.) 1D layer or 2D layer (depending on the input dimension)
A.) For a convolutional layer, each neuron takes as input a subset of the vector that is fed into that layer.
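A minimal pure-Python sketch of this idea: a 1D convolution where each output value is computed from a sliding window of the input, rather than from the whole vector as in a fully connected layer (the filter values are arbitrary):

```python
def conv1d(x, kernel):
    # each output "neuron" sees only a window of len(kernel) inputs
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

# 5 inputs, window of 3 -> 3 outputs, each computed from a 3-value subset
out = conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0])
```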
In the context of machine learning, what is a diagnostic?
A.) This refers to the process of measuring how well a learning algorithm does on a test set (data that the algorithm was not trained on).
B.) A test that you run to gain insight into what is/isn’t working with a learning algorithm.
C.) An application of machine learning to medical applications, with the goal of diagnosing patients’ conditions.
D.) A process by which we quickly try as many different ways to improve an algorithm as possible, so as to see what works.
B.) A diagnostic is a test that you run to gain insight into what is/isn’t working with a learning algorithm, to gain guidance into improving its performance
True/False? It is always true that the better an algorithm does on the training set, the better it will do on generalizing to new data.
False; if a model overfits the training set, it may not generalize well to new data.
For a classification task, suppose you train three different models using three different neural network architectures. Which data do you use to evaluate the three models in order to choose the best one?
A.) The test set
B.) The cross validation set
C.) The training set
D.) All the data – training, cross validation and test sets put together.
B.) Use the cross-validation set to calculate the cross-validation error on all three models in order to compare which of the three models is best.
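The selection step itself is simple once the cross-validation errors are computed. A sketch with hypothetical error values (the model names and numbers are made up): pick the architecture with the lowest CV error, and keep the test set untouched for the final generalization estimate.

```python
def pick_best_model(cv_errors):
    # choose the architecture with the lowest cross-validation error;
    # the test set is reserved for estimating generalization afterwards
    return min(cv_errors, key=cv_errors.get)

# hypothetical CV errors for three architectures
cv_errors = {"model_1": 0.23, "model_2": 0.18, "model_3": 0.31}
best = pick_best_model(cv_errors)   # "model_2"
```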