Deep Learning Revision Flashcards

1
Q

Types of activation functions

A
  1. Threshold Function
  2. Sigmoid Function
  3. Rectifier Function (ReLU, most popular)
  4. Hyperbolic Tangent (tanh)
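
A minimal NumPy sketch of these four activations, for intuition only (the sample inputs are arbitrary):

import numpy as np

def threshold(x):
    return np.where(x >= 0, 1.0, 0.0)   # step function: 1 if x >= 0, else 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # squashes x into (0, 1)

def relu(x):
    return np.maximum(0.0, x)           # rectifier: max(0, x)

def tanh(x):
    return np.tanh(x)                   # squashes x into (-1, 1)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (threshold, sigmoid, relu, tanh):
    print(f.__name__, f(x))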
2
Q

Why is Stochastic Gradient Descent better?

A

Stochastic gradient descent evaluates the cost and updates the weights for each row (observation) instead of for all rows at once. Each update is cheaper, and the resulting fluctuations help the weights escape local minima, whereas batch gradient descent can settle into one.
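
A toy sketch of the difference on a one-weight linear model with squared error (the data and learning rate are made up for illustration):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])          # toy data: y = 2x
lr = 0.01

# Batch gradient descent: ONE update per pass, computed over ALL rows
w_batch = 0.0
grad = np.mean(2 * (w_batch * X - y) * X)   # gradient of the mean squared error
w_batch -= lr * grad

# Stochastic gradient descent: one update PER ROW
w_sgd = 0.0
for xi, yi in zip(X, y):
    grad = 2 * (w_sgd * xi - yi) * xi       # gradient on a single row (noisy)
    w_sgd -= lr * grad

print(w_batch, w_sgd)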

3
Q

Steps for training the ANN with Stochastic Gradient Descent

A
  1. Randomly initialise the weights to small numbers close to 0.
  2. Input the first observation of your dataset in the input layer, each feature in one input node.
  3. FORWARD PROPAGATION: from left to right, the neurons are activated so that the impact of each neuron's activation is limited by its weights. Propagate the activations until the predicted result y_pred is obtained.
  4. Compare the predicted result and the actual result. Measure the generated error.
  5. BACK PROPAGATION: from right to left; Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
  6. Repeat Steps 2-5
    (a) and update the weights after each observation
    (Stochastic Learning)
    (b) and update the weights only after a batch of
    observations (Batch Learning)
  7. When the whole training set has passed through the ANN, that makes ONE EPOCH. Repeat for more epochs (a minimal sketch of this loop follows below).
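
A minimal NumPy sketch of this loop for a single sigmoid neuron (the toy data, learning rate and epoch count are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))                       # 100 observations, 3 features
y = (X.sum(axis=1) > 1.5).astype(float)        # toy binary target

w = rng.normal(0, 0.01, size=3)                # 1. small weights close to 0
b = 0.0
lr = 0.1

for epoch in range(10):                        # 7. several epochs
    for xi, yi in zip(X, y):                   # 2. one observation at a time
        y_pred = 1 / (1 + np.exp(-(xi @ w + b)))   # 3. forward propagation
        error = y_pred - yi                    # 4. compare prediction and target
        w -= lr * error * xi                   # 5. back propagation: update weights
        b -= lr * error                        #    (gradient of binary cross-entropy)

acc = np.mean(((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(acc)                                     # training accuracy after the final epoch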
4
Q

Steps for Data Preprocessing for an ANN

A
  1. Import TensorFlow along with the other libraries.
  2. Deal with missing data.
  3. Encode any categorical data, if present.
  4. Split into train and test sets.
  5. Feature Scaling => Standardisation (COMPULSORY for ANNs; a scikit-learn sketch follows below).
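
A hedged sketch of these steps with scikit-learn (the file 'data.csv', the DataFrame layout and the 'target' column name are assumptions):

import tensorflow as tf                              # 1. import TensorFlow
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('data.csv')                         # hypothetical dataset
df = df.fillna(df.mean(numeric_only=True))           # 2. missing numeric data -> column mean
df = pd.get_dummies(df, drop_first=True)             # 3. one-hot encode categorical columns

X = df.drop('target', axis=1).values                 # 'target' is an assumed column name
y = df['target'].values

X_train, X_test, y_train, y_test = train_test_split( # 4. train/test split
    X, y, test_size=0.2, random_state=0)

sc = StandardScaler()                                # 5. standardisation (compulsory for ANNs)
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)                        # scale the test set with the training fit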
5
Q

Code to Build and train the ANN

A
# Initialise ANN as a sequence of layers
ann = tf.keras.models.Sequential()
# Add the input layer and 1st hidden layer
ann.add(tf.keras.layers.Dense(
    units = 6, # number of neurons -> need to experiment
    activation = 'relu'))
# Add the second hidden layer
ann.add(tf.keras.layers.Dense(
    units = 6, # number of neurons -> need to experiment
    activation = 'relu'))
# Add the output layer
# Depends on the dimensions of the output
ann.add(tf.keras.layers.Dense(
    units = 1,
    activation = 'sigmoid')) # >= 2 output classes -> units = n_classes, activation = 'softmax'
# NO ACTIVATION FOR THE OUTPUT LAYER IN REGRESSION
# Training the ANN
ann.compile(
    optimizer = 'adam', # an optimiser based on stochastic gradient descent
    loss = 'binary_crossentropy', # vs 'categorical_crossentropy' for multi-class
    metrics = ['accuracy']) # can choose several metrics
# REGRESSION: loss = 'mean_squared_error', no metrics needed
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)
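
A short usage sketch once training finishes (the 0.5 threshold matches the binary sigmoid output above):

# Predicted probabilities on the (scaled) test set
y_prob = ann.predict(X_test)
# Convert probabilities to class labels with a 0.5 threshold
y_pred = (y_prob > 0.5).astype(int)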
6
Q

Steps for Convolutional Neural Networks (CNN)

A
  1. Convolution: slide feature detectors (filters) over the image to build multiple feature maps (sharpen, blur, edge enhance, edge detect, and so on…).
  2. ReLU Layer: apply the rectifier to the feature maps, replacing negative values with zero to increase non-linearity.
  3. Max-Pooling: the CNN looks for features, so spatial flexibility is key because a feature can appear shifted, rotated or distorted. Pooling reduces the size and distortion and helps prevent overfitting while preserving the important information.
  4. Flattening: convert the pooled 2D feature maps into a 1D vector that can be fed into the input layer.
  5. Full Connection: feed the flattened vector into a fully connected neural net (see the sketch of steps 1-4 below).
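
A toy NumPy sketch of steps 1-4 on a tiny 5x5 "image" (the filter values and sizes are arbitrary, for intuition only):

import numpy as np

rng = np.random.default_rng(0)
img = rng.random((5, 5))                              # tiny 5x5 "image"
kernel = np.array([[1., 0.], [0., -1.]])              # arbitrary 2x2 feature detector

# 1. Convolution: slide the kernel over the image (stride 1, no padding) -> 4x4 feature map
fmap = np.array([[np.sum(img[i:i+2, j:j+2] * kernel)
                  for j in range(4)] for i in range(4)])

# 2. ReLU layer: replace negative values with zero
fmap = np.maximum(fmap, 0.0)

# 3. Max-pooling: keep the maximum of each 2x2 region (stride 2) -> 2x2
pooled = np.array([[fmap[i:i+2, j:j+2].max() for j in range(0, 4, 2)]
                   for i in range(0, 4, 2)])

# 4. Flattening: 2D -> 1D vector for the fully connected layers
flat = pooled.flatten()
print(flat)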
7
Q

Softmax vs Cross-Entropy

A
  • Both are applied at the output stage of a classification network to produce the final probabilities.

Softmax: squashes the raw output values so that they all lie between 0 and 1 and add up to 1.

Cross-Entropy: the cost function applied to those probabilities.
- Best for classification, because its logarithm penalises confident wrong predictions far more heavily than mean squared error does.
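
A small NumPy sketch of both formulas (the logits and the one-hot label are made up):

import numpy as np

z = np.array([2.0, 1.0, 0.1])          # raw output values (logits)
y_true = np.array([1.0, 0.0, 0.0])     # one-hot encoded actual class

# Softmax: exponentiate and normalise so the outputs sum to 1
probs = np.exp(z) / np.sum(np.exp(z))
print(probs, probs.sum())              # e.g. [0.659 0.242 0.099] 1.0

# Cross-entropy: -sum(y_true * log(predicted probability))
loss = -np.sum(y_true * np.log(probs))
print(loss)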

8
Q

Cost function

A

The cost function (in Keras called the loss function) measures the error between the predicted and the actual result, and is what gradient descent minimises. Mean squared error is the standard choice for regression; for classification, cross-entropy is a better option than the mean squared error used before.
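
A quick numeric illustration of why cross-entropy suits classification better than mean squared error (the predicted probabilities are made up):

import numpy as np

y_true = 1.0
for y_prob in (0.9, 0.5, 0.1):                 # progressively worse predictions
    mse = (y_true - y_prob) ** 2               # mean squared error
    ce = -np.log(y_prob)                       # binary cross-entropy (true class = 1)
    print(f"p={y_prob}: mse={mse:.2f}, cross-entropy={ce:.2f}")
# Cross-entropy grows much faster as the prediction becomes confidently wrong.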

9
Q

Data Preprocessing for CNN

A
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# to avoid overfitting, augment the training set:
train_datagen = ImageDataGenerator(
    rescale = 1./255,       # feature scaling: divide each pixel value by 255
    shear_range = 0.2,      # image augmentation, try other values
    zoom_range = 0.2,       # try other values
    horizontal_flip = True)

training_set = train_datagen.flow_from_directory(
    'path/of/training/set', # provide path (one subfolder per class)
    target_size = (64, 64), # resize for faster computation
    batch_size = 32,
    class_mode = 'binary')  # depends on the problem; here binary classification

# for the test set: only rescale, no augmentation
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory(
    'path/of/test/set',     # same target_size, batch_size and class_mode as the training set
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')
10
Q

Code to Build and train the CNN

A

Training the CNN

# Initialise CNN as a sequence of layers
cnn = tf.keras.models.Sequential()
# Convolution
cnn.add(tf.keras.layers.Conv2D(
    filters = 32,        # number of feature detectors
    kernel_size = 3,
    activation = 'relu',
    input_shape = [64, 64, 3])) # only required in the first layer;
                                # 64, 64 matches the target_size used when importing the images;
                                # [64, 64, 1] if the images are B/W
# Pooling
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2,       # takes 2x2 grids
    strides = 2))
# Adding another convolutional layer
cnn.add(tf.keras.layers.Conv2D(
    filters = 32,        # number of feature detectors
    kernel_size = 3,
    activation = 'relu'))
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2,       # takes 2x2 grids
    strides = 2))
# Flattening
cnn.add(tf.keras.layers.Flatten())
# Full connection
cnn.add(tf.keras.layers.Dense(
    units = 128,
    activation = 'relu'))
# Add the output layer
# Depends on the dimensions of the output
cnn.add(tf.keras.layers.Dense(
    units = 1,           # binary classification
    activation = 'sigmoid')) # >= 2 classes -> units = n_classes, activation = 'softmax'

cnn.compile(
    optimizer = 'adam',  # an optimiser based on stochastic gradient descent
    loss = 'binary_crossentropy', # vs 'categorical_crossentropy' for multi-class
    metrics = ['accuracy']) # can choose several metrics

cnn.fit(
    x = training_set,
    validation_data = test_set,
    epochs = 25)
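
A hedged usage sketch for classifying a single new image after training (the file path is a placeholder; class_indices comes from the training generator above):

import numpy as np
from tensorflow.keras.preprocessing import image

# Load one image at the same size the CNN was trained on
test_image = image.load_img('path/of/single/image.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)   # add the batch dimension
test_image = test_image / 255.0                     # same rescaling as during training

result = cnn.predict(test_image)
print(training_set.class_indices)                   # mapping of class name -> index
prediction = 1 if result[0][0] > 0.5 else 0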

11
Q

The vanishing gradient problem and solution.

A

As the error gradient is propagated back through many layers (or many time steps in an RNN), it can become vanishingly small, effectively preventing the earlier weights from changing their values.

Solution:

  1. Weight initialisation
  2. Echo State Networks
  3. Long Short-Term Memory networks (LSTMs), the BEST option (see the Keras sketch below)
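
A brief Keras sketch of two of these mitigations, explicit weight initialisation and an LSTM layer (the layer sizes and input shape are placeholders):

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Careful weight initialisation (Glorot/Xavier is Keras's default for Dense layers)
    tf.keras.layers.Dense(
        units = 64,
        activation = 'relu',
        kernel_initializer = 'glorot_uniform',
        input_shape = (20,)),              # placeholder input size
    tf.keras.layers.Dense(units = 1, activation = 'sigmoid')])

# LSTM cells gate how the gradient flows through time, which is why they are
# the preferred fix for vanishing gradients in recurrent networks
lstm_layer = tf.keras.layers.LSTM(units = 50, return_sequences = True)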
12
Q

The exploding gradient problem and solution.

A

Exploding gradients are a problem where large error gradients accumulate during back-propagation and result in very large, unstable updates to the neural network's weights.

Solution:

  1. Truncated Back-propagation
  2. Penalties
  3. Gradient clipping
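
A short sketch of gradient clipping in Keras, which caps the size of each update (the clipvalue of 1.0 and the tiny model are arbitrary choices for illustration):

import tensorflow as tf

# clipvalue caps each gradient component; clipnorm would cap the overall gradient norm instead
clipped_adam = tf.keras.optimizers.Adam(learning_rate = 0.001, clipvalue = 1.0)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 1, activation = 'sigmoid', input_shape = (10,))])
model.compile(optimizer = clipped_adam, loss = 'binary_crossentropy')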
13
Q

RNN Data Preprocessing steps

A
  1. Feature Scaling
  2. Create a data structure: for each output value, a window of previous timesteps (e.g. the 60 previous values), as sketched below.
  3. Reshape into the 3D format the LSTM expects:
    X_train = np.reshape(
        X_train,
        (X_train.shape[0], X_train.shape[1], 1))
    # (batch_size, timesteps, number of indicators/features)
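
A hedged NumPy sketch of steps 1-3, assuming a 60-timestep window and a training_set array of shape (n, 1) (the placeholder series below stands in for real values):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.arange(100, dtype=float).reshape(-1, 1)  # placeholder series; replace with real values

# 1. Feature scaling (normalisation to [0, 1] is common for LSTMs)
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)

# 2. Data structure: each sample is the 60 previous values, the target is the next one
X_train, y_train = [], []
for i in range(60, len(training_set_scaled)):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

# 3. Reshape to (batch_size, timesteps, indicators)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))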
14
Q

RNN build and train

A

Add the LSTM layers with Dropout regularisation (to prevent overfitting):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Initialise the RNN as a regressor (it predicts a continuous value)
regressor = Sequential()

# 1st LSTM layer + Dropout
regressor.add(LSTM(
    units = 50,
    return_sequences = True,
    input_shape = (X_train.shape[1], 1))) # shape from the reshaping step in data preprocessing
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Last LSTM layer: return_sequences = False because only the final output is needed
regressor.add(LSTM(units = 50, return_sequences = False))
regressor.add(Dropout(0.2))

# Add the output layer
regressor.add(Dense(units = 1))
# Compiling the RNN
regressor.compile(
    optimizer = 'adam',
    loss = 'mean_squared_error')
# Fitting the RNN to the training set
# try different numbers of epochs
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
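
A short usage sketch for predicting after training, assuming X_test was windowed and scaled the same way as X_train and that sc is the scaler fitted during preprocessing:

# Predict the scaled values, then map them back to the original scale
predicted_scaled = regressor.predict(X_test)
predicted = sc.inverse_transform(predicted_scaled)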