Deep Learning Revision Flashcards
Types of activation functions
- Threshold Function
- Sigmoid Function
- Rectifier Function (most popular)
- Hyperbolic Tangent (tanh)
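A minimal NumPy sketch of these four activations (the threshold is assumed here to step at 0):
import numpy as np

def threshold(x):
    return np.where(x >= 0, 1.0, 0.0)   # step function

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # squashes to (0, 1)

def rectifier(x):
    return np.maximum(0.0, x)           # ReLU, the most popular

def hyperbolic_tangent(x):
    return np.tanh(x)                   # squashes to (-1, 1)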
Why is Stochastic Gradient Descent better?
This is because stochastic gradient descent evaluates the cost function and updates the weights one row at a time instead of over all rows at once; the noisier, more frequent updates help it avoid getting stuck in local minima.
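A rough sketch of the difference for a simple linear model with mean squared error (the model, X, y, w and lr are placeholder assumptions, not part of the course code):
import numpy as np

def batch_gd_step(w, X, y, lr):
    # Batch GD: one weight update from the gradient over ALL rows
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sgd_epoch(w, X, y, lr):
    # Stochastic GD: one weight update per row, in random order
    for i in np.random.permutation(len(y)):
        grad = 2 * (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad
    return w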
Steps for training the ANN with Stochastic Gradient Descent
- Randomly initialise the weights to small numbers close to 0.
- Input the first observation of your dataset in the input layer, each feature in one input node.
- FORWARD PROPAGATION: from left to right; the neurons are activated in a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until getting the predicted result y_pred.
- Compare the predicted result and the actual result. Measure the generated error.
- BACK PROPAGATION: from right to left; Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
- Repeat Steps 1-5, either:
(a) updating the weights after each observation (Reinforcement Learning), or
(b) updating the weights only after a batch of observations (Batch Learning).
When the whole training set has passed through the ANN, that makes ONE EPOCH. Redo more epochs. (See the Keras sketch below.)
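A hedged Keras illustration of options (a) and (b), assuming the ann model built further down these cards:
ann.fit(X_train, y_train, batch_size = 1, epochs = 100)    # (a) update the weights after every observation
# ann.fit(X_train, y_train, batch_size = 32, epochs = 100) # (b) update the weights after each batch of 32
# epochs = 100 -> the whole training set passes through the ANN 100 times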
Steps for Data Preprocessing ANN
- import tensorflow along with other libraries
- deal with missing data.
- encode any categorical data, if available.
- Split into train and test sets.
- Feature Scaling => Standardisation // COMPULSORY
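A minimal scikit-learn sketch of these steps (the file name, the categorical column index and the 80/20 split are assumptions):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

dataset = pd.read_csv('data.csv')                  # placeholder file name
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# (deal with missing data here, e.g. with sklearn.impute.SimpleImputer, if any)

X[:, 2] = LabelEncoder().fit_transform(X[:, 2])    # encode a categorical column (index assumed)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

sc = StandardScaler()                              # standardisation -> COMPULSORY
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)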
Code to Build and train the ANN
import tensorflow as tf

# Initialise the ANN as a sequence of layers
ann = tf.keras.models.Sequential()

# Add the input layer and the 1st hidden layer
ann.add(tf.keras.layers.Dense(
    units = 6,                # number of neurons -> need to experiment
    activation = 'relu'))

# Add the second hidden layer
ann.add(tf.keras.layers.Dense(
    units = 6,                # number of neurons -> need to experiment
    activation = 'relu'))

# Add the output layer (depends on the dimensions of the output)
ann.add(tf.keras.layers.Dense(
    units = 1,
    activation = 'sigmoid'))  # >= 2 output classes -> 'softmax'
                              # NO ACTIVATION FOR THE OUTPUT LAYER IN REGRESSION

# Training the ANN
ann.compile(
    optimizer = 'adam',               # a form of stochastic gradient descent
    loss = 'binary_crossentropy',     # vs 'categorical_crossentropy'
    metrics = ['accuracy'])           # can choose many
    # REGRESSION: loss = 'mean_squared_error', no metrics needed
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)
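A short follow-up for predicting on the test set (the 0.5 threshold on the sigmoid output is the usual choice, assumed here):
from sklearn.metrics import accuracy_score

y_prob = ann.predict(X_test)          # probabilities from the sigmoid output
y_pred = (y_prob > 0.5)               # convert to 0/1 class labels
print(accuracy_score(y_test, y_pred))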
Steps for Convolutional Neural Networks (CNN)
- Convolution: make multiple feature maps by applying feature detectors (sharpen, blur, edge enhance, edge detect and so on…).
- ReLU Layer: apply the rectifier to the feature maps, setting negative values to zero to add non-linearity.
- Max-Pooling: the CNN looks for features, so flexibility is key because the same feature can appear in various forms - shifted, distorted, etc. Pooling keeps the strongest responses, which reduces sensitivity to distortion and helps prevent overfitting while preserving the important information (see the numeric sketch after this list).
- Flattening: convert the pooled 2D feature maps into a single 1D vector to feed into the input layer.
- Full Connection: feed the flattened vector into a fully connected neural network.
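A tiny numeric sketch of the max-pooling step (2x2 pool, stride 2) on a made-up 4x4 feature map:
import numpy as np

feature_map = np.array([[1, 0, 2, 3],
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])

# Keep only the largest value in each non-overlapping 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis = (1, 3))
print(pooled)   # [[6 8]
                #  [3 4]]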
Softmax vs Cross-Entropy
- Both are applied at the output stage of a classification network to produce and score the final probabilities.
Softmax: squashes the output values so that they all lie between 0 and 1 and add up to 1.
Cross-Entropy:
- The cost (loss) function of choice for classification; it measures how far the predicted probabilities are from the true labels.
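A small NumPy sketch of both (the example scores and one-hot label are made up):
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))            # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(y_true, y_prob):
    return -np.sum(y_true * np.log(y_prob))

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)                  # ~[0.66, 0.24, 0.10], sums to 1
loss = cross_entropy(np.array([1, 0, 0]), probs)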
Cost function
For classification, cross-entropy is a better option for the cost function (here called the 'loss function') than the mean squared error used before; mean squared error remains the usual choice for regression.
Data Preprocessing for CNN
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Image augmentation on the training set to avoid overfitting
train_datagen = ImageDataGenerator(
    rescale = 1./255,        # feature scaling: divide each pixel by 255
    shear_range = 0.2,       # image augmentation, try other values
    zoom_range = 0.2,        # try other values
    horizontal_flip = True)
training_set = train_datagen.flow_from_directory(
    'path/of/training/set',  # provide path
    target_size = (64, 64),  # resize for faster computation
    batch_size = 32,
    class_mode = 'binary')   # depends on the problem, here binary classification

# For the test set: only rescale, no augmentation
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory(
    'path/of/test/set',      # same arguments as the training set (path is a placeholder)
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')
Code to Build and train the CNN
# Initialise the CNN as a sequence of layers
cnn = tf.keras.models.Sequential()

# Convolution
cnn.add(tf.keras.layers.Conv2D(
    filters = 32,                # number of feature detectors
    kernel_size = 3,
    activation = 'relu',
    input_shape = [64, 64, 3]))  # only required in the first layer;
                                 # 64x64 matches the target_size used when importing;
                                 # [64, 64, 1] if the images are B/W

# Pooling
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2,               # takes 2x2 grids
    strides = 2))

# Adding another convolutional layer (no input_shape this time)
cnn.add(tf.keras.layers.Conv2D(
    filters = 32,
    kernel_size = 3,
    activation = 'relu'))
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2,
    strides = 2))

# Flattening
cnn.add(tf.keras.layers.Flatten())

# Full connection
cnn.add(tf.keras.layers.Dense(
    units = 128,
    activation = 'relu'))

# Add the output layer (depends on the dimensions of the output)
cnn.add(tf.keras.layers.Dense(
    units = 1,                   # binary classification
    activation = 'sigmoid'))     # >= 2 classes -> 'softmax'
cnn.compile(
    optimizer = 'adam',              # a form of stochastic gradient descent
    loss = 'binary_crossentropy',    # vs 'categorical_crossentropy'
    metrics = ['accuracy'])          # can choose many
cnn.fit(
    x = training_set,
    validation_data = test_set,
    epochs = 25)
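A hedged follow-up for predicting a single image with the trained CNN (the file path and class names are placeholders; class_indices shows which label maps to which index):
import numpy as np
from tensorflow.keras.preprocessing import image

test_image = image.load_img('path/of/single/image.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)   # add the batch dimension
result = cnn.predict(test_image / 255.0)            # same rescaling as training
print(training_set.class_indices)                   # e.g. {'class_0': 0, 'class_1': 1}
prediction = 'class_1' if result[0][0] > 0.5 else 'class_0'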
The vanishing gradient problem and solution.
The problem is that, as the error is propagated back through many layers (or timesteps), the gradient can become vanishingly small, effectively preventing the earlier weights from changing their values.
Solution:
- Weight initialisation
- Echo State Networks
- Long Short Term Memory Networks (LSTMs) BEST
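The "weight initialisation" option can be sketched in Keras via the kernel_initializer argument (the initialiser names shown are standard Keras options, chosen here as an example):
import tensorflow as tf

layer = tf.keras.layers.Dense(
    units = 6,
    activation = 'relu',
    kernel_initializer = 'glorot_uniform')   # Xavier/Glorot initialisation
# 'he_normal' is another common choice for ReLU layers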
The exploding gradient problem and solution
Exploding gradients are a problem where large error gradients accumulate and result in very large updates to the neural network's weights.
Solution:
- Truncated Back-propagation
- Penalties
- Gradient clipping
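Gradient clipping can be sketched via the clipnorm / clipvalue arguments that Keras optimizers accept (the clip values are example choices):
import tensorflow as tf

clipped_adam = tf.keras.optimizers.Adam(clipnorm = 1.0)   # or clipvalue = 0.5
# regressor.compile(optimizer = clipped_adam, loss = 'mean_squared_error')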
RNN Data Preprocessing steps
- Feature Scaling
- Create a data structure to use
- Reshaping:
X_train = np.reshape(
    X_train,
    (X_train.shape[0], X_train.shape[1], 1))
# (batch_size, timesteps, indicators)
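A hedged sketch of the "create a data structure" and reshaping steps, assuming a scaled one-column series training_set_scaled and a 60-timestep window (both assumptions):
import numpy as np

window = 60                                  # assumed number of past timesteps per sample
X_train, y_train = [], []
for i in range(window, len(training_set_scaled)):
    X_train.append(training_set_scaled[i - window:i, 0])   # the previous `window` values
    y_train.append(training_set_scaled[i, 0])              # the value to predict
X_train, y_train = np.array(X_train), np.array(y_train)

# Reshape to (batch_size, timesteps, indicators) as on the card
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))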
RNN build and train
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

# Initialise the RNN as a sequence of layers
regressor = Sequential()

# Add the 1st LSTM layer and Dropout regularisation (prevents overfitting)
regressor.add(LSTM(
    units = 50,
    return_sequences = True,
    input_shape = (X_train.shape[1], 1)))  # shape from the reshaping step in data preprocessing
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 50, return_sequences = False))
regressor.add(Dropout(0.2))

# Add the output layer
regressor.add(Dense(units = 1))

# Compiling the RNN
regressor.compile(
    optimizer = 'adam',
    loss = 'mean_squared_error')

# Fitting the RNN to the training set (try different numbers of epochs)
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
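A short hedged follow-up for predictions, assuming sc is the scaler fitted during feature scaling and X_test was built the same way as X_train:
predicted = regressor.predict(X_test)
predicted = sc.inverse_transform(predicted)   # undo the feature scaling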