Deep Learning Revision Flashcards
Types of activation functions
- Threshold Function
- Sigmoid Function
- Rectifier Function (most popular)
- Hyperbolic Tangent (tanh)
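A minimal NumPy sketch of these four activations (the threshold is assumed here to step at 0):
import numpy as np

def threshold(x):
    return np.where(x >= 0, 1.0, 0.0)   # step function

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # squashes to (0, 1)

def rectifier(x):
    return np.maximum(0.0, x)           # ReLU, the most popular

def hyperbolic_tangent(x):
    return np.tanh(x)                   # squashes to (-1, 1)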
Why is Stochastic Gradient Descent better?
This is because stochastic gradient descent evaluates the cost function and updates the weights one row at a time instead of over all rows at once; the noisier, more frequent updates help it avoid getting stuck in local minima.
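A rough sketch of the difference for a simple linear model with mean squared error (the model, X, y, w and lr are placeholder assumptions, not part of the course code):
import numpy as np

def batch_gd_step(w, X, y, lr):
    # Batch GD: one weight update from the gradient over ALL rows
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sgd_epoch(w, X, y, lr):
    # Stochastic GD: one weight update per row, in random order
    for i in np.random.permutation(len(y)):
        grad = 2 * (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad
    return w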
Steps for training the ANN with Stochastic Gradient Descent
- Randomly initialise the weights to small numbers close to 0.
- Input the first observation of your dataset in the input layer, each feature in one input node.
- FORWARD PROPAGATION: from left to right; the neurons are activated in a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until getting the predicted result y_pred.
- Compare the predicted result and the actual result. Measure the generated error.
- BACK PROPAGATION: from right to left; Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
- Repeat Steps 1-5, either:
(a) updating the weights after each observation (Reinforcement Learning), or
(b) updating the weights only after a batch of observations (Batch Learning).
When the whole training set has passed through the ANN, that makes ONE EPOCH. Redo more epochs. (See the Keras sketch below.)
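A hedged Keras illustration of options (a) and (b), assuming the ann model built further down these cards:
ann.fit(X_train, y_train, batch_size = 1, epochs = 100)    # (a) update the weights after every observation
# ann.fit(X_train, y_train, batch_size = 32, epochs = 100) # (b) update the weights after each batch of 32
# epochs = 100 -> the whole training set passes through the ANN 100 times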
Steps for Data Preprocessing ANN
- import tensorflow along with other libraries
- deal with missing data.
- encode any categorical data, if available.
- Split into train and test sets.
- Feature Scaling => Standardisation // COMPULSORY
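A minimal scikit-learn sketch of these steps (the file name, the categorical column index and the 80/20 split are assumptions):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

dataset = pd.read_csv('data.csv')                  # placeholder file name
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# (deal with missing data here, e.g. with sklearn.impute.SimpleImputer, if any)

X[:, 2] = LabelEncoder().fit_transform(X[:, 2])    # encode a categorical column (index assumed)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

sc = StandardScaler()                              # standardisation -> COMPULSORY
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)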
Code to Build and train the ANN
import tensorflow as tf

# Initialise the ANN as a sequence of layers
ann = tf.keras.models.Sequential()

# Add the input layer and the 1st hidden layer
ann.add(tf.keras.layers.Dense(
    units = 6,                # number of neurons -> need to experiment
    activation = 'relu'))

# Add the second hidden layer
ann.add(tf.keras.layers.Dense(
    units = 6,                # number of neurons -> need to experiment
    activation = 'relu'))

# Add the output layer (depends on the dimensions of the output)
ann.add(tf.keras.layers.Dense(
    units = 1,
    activation = 'sigmoid'))  # >= 2 output classes -> 'softmax'
                              # NO ACTIVATION FOR THE OUTPUT LAYER IN REGRESSION

# Training the ANN
ann.compile(
    optimizer = 'adam',               # a form of stochastic gradient descent
    loss = 'binary_crossentropy',     # vs 'categorical_crossentropy'
    metrics = ['accuracy'])           # can choose many
    # REGRESSION: loss = 'mean_squared_error', no metrics needed
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)
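A short follow-up for predicting on the test set (the 0.5 threshold on the sigmoid output is the usual choice, assumed here):
from sklearn.metrics import accuracy_score

y_prob = ann.predict(X_test)          # probabilities from the sigmoid output
y_pred = (y_prob > 0.5)               # convert to 0/1 class labels
print(accuracy_score(y_test, y_pred))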
Steps for Convolutional Neural Networks (CNN)
- Convolution: make multiple feature maps by applying feature detectors (sharpen, blur, edge enhance, edge detect and so on…).
- ReLU Layer: apply the rectifier to the feature maps, setting negative values to zero to add non-linearity.
- Max-Pooling: the CNN looks for features, so flexibility is key because the same feature can appear in various forms - shifted, distorted, etc. Pooling keeps the strongest responses, which reduces sensitivity to distortion and helps prevent overfitting while preserving the important information (see the numeric sketch after this list).
- Flattening: convert the pooled 2D feature maps into a single 1D vector to feed into the input layer.
- Full Connection: feed the flattened vector into a fully connected neural network.
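A tiny numeric sketch of the max-pooling step (2x2 pool, stride 2) on a made-up 4x4 feature map:
import numpy as np

feature_map = np.array([[1, 0, 2, 3],
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])

# Keep only the largest value in each non-overlapping 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis = (1, 3))
print(pooled)   # [[6 8]
                #  [3 4]]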
Softmax vs Cross-Entropy
- Both are applied at the output stage of a classification network to produce and score the final probabilities.
Softmax: squashes the output values so that they all lie between 0 and 1 and add up to 1.
Cross-Entropy:
- The cost (loss) function of choice for classification; it measures how far the predicted probabilities are from the true labels.
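A small NumPy sketch of both (the example scores and one-hot label are made up):
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))            # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(y_true, y_prob):
    return -np.sum(y_true * np.log(y_prob))

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)                  # ~[0.66, 0.24, 0.10], sums to 1
loss = cross_entropy(np.array([1, 0, 0]), probs)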
Cost function
For classification, cross-entropy is a better option for the cost function (here called the 'loss function') than the mean squared error used before; mean squared error remains the usual choice for regression.
Data Preprocessing for CNN
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Image augmentation on the training set to avoid overfitting
train_datagen = ImageDataGenerator(
    rescale = 1./255,        # feature scaling: divide each pixel by 255
    shear_range = 0.2,       # image augmentation, try other values
    zoom_range = 0.2,        # try other values
    horizontal_flip = True)
training_set = train_datagen.flow_from_directory(
    'path/of/training/set',  # provide path
    target_size = (64, 64),  # resize for faster computation
    batch_size = 32,
    class_mode = 'binary')   # depends on the problem, here binary classification

# For the test set: only rescale, no augmentation
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory(
    'path/of/test/set',      # same arguments as the training set (path is a placeholder)
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')
Code to Build and train the CNN
# Initialise the CNN as a sequence of layers
cnn = tf.keras.models.Sequential()

# Convolution
cnn.add(tf.keras.layers.Conv2D(
    filters = 32,                # number of feature detectors
    kernel_size = 3,
    activation = 'relu',
    input_shape = [64, 64, 3]))  # only required in the first layer;
                                 # 64x64 matches the target_size used when importing;
                                 # [64, 64, 1] if the images are B/W

# Pooling
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2,               # takes 2x2 grids
    strides = 2))

# Adding another convolutional layer (no input_shape this time)
cnn.add(tf.keras.layers.Conv2D(
    filters = 32,
    kernel_size = 3,
    activation = 'relu'))
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2,
    strides = 2))

# Flattening
cnn.add(tf.keras.layers.Flatten())

# Full connection
cnn.add(tf.keras.layers.Dense(
    units = 128,
    activation = 'relu'))

# Add the output layer (depends on the dimensions of the output)
cnn.add(tf.keras.layers.Dense(
    units = 1,                   # binary classification
    activation = 'sigmoid'))     # >= 2 classes -> 'softmax'
cnn.compile(
    optimizer = 'adam',              # a form of stochastic gradient descent
    loss = 'binary_crossentropy',    # vs 'categorical_crossentropy'
    metrics = ['accuracy'])          # can choose many
cnn.fit(
    x = training_set,
    validation_data = test_set,
    epochs = 25)
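A hedged follow-up for predicting a single image with the trained CNN (the file path and class names are placeholders; class_indices shows which label maps to which index):
import numpy as np
from tensorflow.keras.preprocessing import image

test_image = image.load_img('path/of/single/image.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)   # add the batch dimension
result = cnn.predict(test_image / 255.0)            # same rescaling as training
print(training_set.class_indices)                   # e.g. {'class_0': 0, 'class_1': 1}
prediction = 'class_1' if result[0][0] > 0.5 else 'class_0'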
The vanishing gradient problem and solution.
The problem is that, as the error is propagated back through many layers (or timesteps), the gradient can become vanishingly small, effectively preventing the earlier weights from changing their values.
Solution:
- Weight initialisation
- Echo State Networks
- Long Short Term Memory Networks (LSTMs) BEST
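The "weight initialisation" option can be sketched in Keras via the kernel_initializer argument (the initialiser names shown are standard Keras options, chosen here as an example):
import tensorflow as tf

layer = tf.keras.layers.Dense(
    units = 6,
    activation = 'relu',
    kernel_initializer = 'glorot_uniform')   # Xavier/Glorot initialisation
# 'he_normal' is another common choice for ReLU layers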
The exploding gradient problem and solution
Exploding gradients are a problem where large error gradients accumulate and result in very large updates to the neural network's weights.
Solution:
- Truncated Back-propagation
- Penalties
- Gradient clipping
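Gradient clipping can be sketched via the clipnorm / clipvalue arguments that Keras optimizers accept (the clip values are example choices):
import tensorflow as tf

clipped_adam = tf.keras.optimizers.Adam(clipnorm = 1.0)   # or clipvalue = 0.5
# regressor.compile(optimizer = clipped_adam, loss = 'mean_squared_error')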
RNN Data Preprocessing steps
- Feature Scaling
- Create a data structure to use
- Reshaping:
X_train = np.reshape(
    X_train,
    (X_train.shape[0], X_train.shape[1], 1))
# (batch_size, timesteps, indicators)
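A hedged sketch of the "create a data structure" and reshaping steps, assuming a scaled one-column series training_set_scaled and a 60-timestep window (both assumptions):
import numpy as np

window = 60                                  # assumed number of past timesteps per sample
X_train, y_train = [], []
for i in range(window, len(training_set_scaled)):
    X_train.append(training_set_scaled[i - window:i, 0])   # the previous `window` values
    y_train.append(training_set_scaled[i, 0])              # the value to predict
X_train, y_train = np.array(X_train), np.array(y_train)

# Reshape to (batch_size, timesteps, indicators) as on the card
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))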
RNN build and train
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

# Initialise the RNN as a sequence of layers
regressor = Sequential()

# Add the 1st LSTM layer and Dropout regularisation (prevents overfitting)
regressor.add(LSTM(
    units = 50,
    return_sequences = True,
    input_shape = (X_train.shape[1], 1)))  # shape from the reshaping step in data preprocessing
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = False))
regressor.add(Dropout(0.2))

# Add the output layer
regressor.add(Dense(units = 1))

# Compiling the RNN
regressor.compile(
    optimizer = 'adam',
    loss = 'mean_squared_error')

# Fitting the RNN to the training set (try different numbers of epochs)
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
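A short hedged follow-up for predictions, assuming sc is the scaler fitted during feature scaling and X_test was built the same way as X_train:
predicted = regressor.predict(X_test)
predicted = sc.inverse_transform(predicted)   # undo the feature scaling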