Lecture 9 Flashcards
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is particularly well-suited for image recognition and processing tasks. It is made up of multiple layers, including convolutional layers, pooling layers, and fully connected layers.
How do we process and recognize images?
For visual perception, our neuronal cells are tuned to different orientations. For example, some respond to vertical edges, some to horizontal edges, some to diagonal edges, etc. These neuronal cells are organized in a columnar architecture and function together to fulfill the visual perception task.
Key Insights from Mammalian Vision
- An image is not processed, perceived, or understood in one huge lump
- The vision system considers small chunks of the visual field and extracts key features from each
- Features are combined at later stages of processing into something recognizable as an object
- This insight suggests that at the lowest level we can slide a small “receptive window” over input data – convolution – to process small chunks of input
What is Happening in a Convolutional Layer?
Filters are composed of two parts:
* A set of weights
* An activation function
Convolution
Convolution is the summation of the element-wise product of two matrices.
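As a minimal NumPy sketch of that idea (the patch and filter values here are made up for illustration), one convolution step multiplies a small patch of the input element-wise by the filter's weights and sums the result:

```python
import numpy as np

# Hypothetical 3x3 image patch and 3x3 filter (weights); the values are made up.
patch = np.array([[1, 3, 2],
                  [2, 9, 1],
                  [1, 3, 2]])
kernel = np.array([[ 1, 0, -1],
                   [ 1, 0, -1],
                   [ 1, 0, -1]])  # a simple vertical-edge-style filter

# One convolution step: element-wise product, then summation.
response = np.sum(patch * kernel)
print(response)  # a single number for this filter position
```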
Sets of Layers in Typical Sequences
The convolution, non-linear, and pooling layers are typically used as a set. Multiple sets of the above three layers can appear in a CNN design.
Sets of Layers in Typical Sequences
Input -> Conv. -> Non-linear -> Pooling -> Conv. -> Non-linear -> Pooling -> … -> Output
Sets of Layers in Typical Sequences
After a few sets, the output is typically sent to one or two fully
connected (dense) hidden layers.
* A fully connected layer is an ordinary neural network layer as in other neural networks.
* Typical activation function is the sigmoid function.
* Output is typically class (classification) or real number (regression).
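A hedged Keras sketch of this pattern is shown below; the input shape, filter counts, and layer sizes are illustrative assumptions rather than values from the lecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative stack: two Conv -> Non-linear -> Pooling sets, then a dense head.
# The 28x28x1 input and the filter/unit counts are assumptions.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolution + non-linear
    layers.MaxPooling2D(pool_size=2),                     # pooling
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="sigmoid"),               # fully connected hidden layer
    layers.Dense(10, activation="softmax"),               # class output (classification)
])
model.summary()
```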
Keras/TensorFlow in Python
Many different software platforms support neural network analysis generally, and CNNs in particular. Python was used to build some of the earliest tools, but as an interpreted language Python is far too slow to actually fit neural models at scale. Instead, we use a “front end” – “back end” arrangement to take advantage of the efficiency of languages like C++ and CUDA (a GPU language). Here, we are using the Keras package as the “front end” for setting up our model and data, and then Keras passes this to the TensorFlow backend to do the actual model fitting.
Two Keras Model Types
Sequential
(Functional) Model
Sequential
- Simplest approach and used in the majority of examples
- Allows for one “input tensor” and one “output tensor”
- Each successive layer of the model is “stacked” on the previous layer
- The layers are connected in the order in which they are invoked, and the connections between layers are made automatically
(Functional) Model
- More complex and flexible approach – addresses difficult “non-standard” computing problems
- Allows for more than one “input tensor” and more than one “output tensor”
- The output of a layer can be connected to more than one subsequent layer (think of this like parallel branches)
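The following sketch shows the same small network written both ways; the layer sizes are arbitrary examples:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential: one input tensor, one output tensor, layers stacked in order.
seq_model = keras.Sequential([
    layers.Input(shape=(16,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])

# (Functional) Model: the same network with explicit tensor connections,
# which also allows multiple inputs/outputs or parallel branches.
inputs = keras.Input(shape=(16,))
x = layers.Dense(32, activation="relu")(inputs)
outputs = layers.Dense(1)(x)
func_model = keras.Model(inputs=inputs, outputs=outputs)
```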
What is a Tensor?
- A tensor is a multi-dimensional data structure
- A first-rank tensor can be a vector
- A second-rank tensor can be a matrix
- Is a matrix the same as a second-rank tensor? “All squares are rectangles, but not all rectangles are squares”: tensors obey specific transformation rules as part of the structure they have, but matrices do not necessarily have this.
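A quick TensorFlow illustration of tensor rank (the values are arbitrary):

```python
import tensorflow as tf

vector = tf.constant([1.0, 2.0, 3.0])            # first-rank tensor (a vector)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # second-rank tensor (matrix-shaped)

print(tf.rank(vector).numpy())  # 1
print(tf.rank(matrix).numpy())  # 2
```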
Many Types of Layers Supported
- Each layer has a particular architectural configuration meant to accomplish a particular kind of task
- For example, we know that pooling layers do data reduction while highlighting strong features
- Each layer has options for size, initialization, and activation function
Many Types of Layers Supported
- Partial list:
- Preprocessing layers (e.g., text)
- Core layers (basic types, e.g., “Dense”)
- Convolution layers (1D, 2D, and 3D)
- Pooling layers (1D, 2D, and 3D; max or average)
- Recurrent layers (e.g., LSTM)
- Normalization and regularization layers
- Attention layers (multi-head)
- Reshaping/merging
- Activation layers
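For illustration, a few of these layer types as Keras constructors, showing the size, initialization, and activation options; the specific values are arbitrary examples:

```python
from tensorflow.keras import layers

# Arbitrary example constructors showing size, initialization, and activation options.
dense = layers.Dense(64, activation="relu", kernel_initializer="he_normal")  # core layer
conv = layers.Conv2D(32, kernel_size=(3, 3), activation="relu")              # 2D convolution
pool = layers.MaxPooling2D(pool_size=(2, 2))                                 # 2D max pooling
lstm = layers.LSTM(128)                                                      # recurrent layer
drop = layers.Dropout(0.5)                                                   # regularization layer
```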
Activation Function Reminder
- The “secret sauce” of neural networks is non-linear activation functions
- Linear functions model linear phenomena; anything more complex and we get predictions that only work in a narrow range
- After the inputs to a neural node are summed, the activation function produces an output value (Y) based on the sum of the input values (X) according to non-linear curves such as the sigmoid or ReLU
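A small NumPy sketch of two common activation curves (the input values are arbitrary):

```python
import numpy as np

# Sketch of common non-linear activations applied to a node's summed input X.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary summed inputs
print(sigmoid(x))  # squashed into (0, 1)
print(relu(x))     # zero for negative inputs, identity for positive
```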
Loss Function, Optimizer, Metrics
- A loss function (AKA “cost” or “error” function) is an expression that produces a value for “how wrong we are” with a set of predictions
- There are two big groups of loss functions, one for classification tasks (probabilistic losses) and one for metric prediction tasks (regression losses)
- The most well-known (and widely used) regression loss is “mean squared error” – the mean of the squared differences between predicted and actual y values (small example below)
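A tiny NumPy example of mean squared error with made-up predicted and actual y values:

```python
import numpy as np

# Mean squared error: mean of the squared differences between predicted and actual y.
y_true = np.array([3.0, 5.0, 2.5])  # made-up actual values
y_pred = np.array([2.5, 5.0, 4.0])  # made-up predictions

mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # (0.25 + 0.0 + 2.25) / 3 ≈ 0.833
```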
Loss Function, Optimizer, Metrics
- Optimizers (in Keras) control the practicalities of how model fitting pursues the loss function
- “Stochastic gradient descent” – imagine a skier making small random turns to go downhill as quickly as possible
- The AdaDelta optimizer can adjust the learning rate dynamically to make model fitting more efficient
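A hedged Keras sketch of how the optimizer, loss, and metrics come together at compile time; the tiny model and its sizes are placeholder assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A tiny regression model just to show the compile step; the sizes are arbitrary.
model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(1),
])

# Stochastic gradient descent: repeated small downhill steps on the loss surface.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="mean_squared_error", metrics=["mae"])

# Adadelta adapts the learning rate dynamically during fitting.
model.compile(optimizer=keras.optimizers.Adadelta(),
              loss="mean_squared_error", metrics=["mae"])
```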
Embedding Layer - Tweet Matrix
- Each tweet t_i consists of a sequence of tokens w_1, w_2, …, w_{n_i}. L1 is the maximum tweet length. Short tweets are padded using zero padding.
- Every word is represented as a d-dimensional word vector
- Word vectors come from the publicly available pre-trained GloVe word vectors for Twitter (Pennington et al., 2014)
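A sketch of building a zero-padded Tweet Matrix and an embedding layer under assumed values for L1, d, and the vocabulary size; the random matrix stands in for the pre-trained GloVe Twitter vectors:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

L1 = 30            # assumed maximum tweet length (tokens)
d = 100            # assumed word-vector dimension
vocab_size = 5000  # assumed vocabulary size

# Placeholder token-id sequences for two tweets; shorter tweets are zero-padded to L1.
tweet_ids = [[12, 7, 431], [5, 99, 7, 2, 88, 3]]
tweet_matrix = np.zeros((len(tweet_ids), L1), dtype="int32")
for row, ids in enumerate(tweet_ids):
    n = min(len(ids), L1)
    tweet_matrix[row, :n] = ids[:n]

# Embedding layer; in the lecture's setup the weights would come from the
# pre-trained GloVe Twitter vectors rather than this random stand-in matrix.
glove_stand_in = np.random.normal(size=(vocab_size, d))
embedding = layers.Embedding(
    input_dim=vocab_size,
    output_dim=d,
    embeddings_initializer=keras.initializers.Constant(glove_stand_in),
    trainable=False,
)
embedded_tweets = embedding(tweet_matrix)  # shape: (2, L1, d)
```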
Embedding Layer - Hash-Emo Matrix
- Hashtags, emoticons, and emojis
- For each tweet t_i, we extract hashtags h_1, h_2, … and emoticons/emojis e_1, e_2, … and concatenate the hashtag and emoticon/emoji vectors
- L2 is the height of the Hash-Emo Matrix. Tweets with fewer than L2 hash-emo features are padded with zeros, while tweets with more than L2 hash-emo features are truncated.
- d-dimensional word vectors come from GloVe
- Random initialization is used when no word vector is found for a particular word or emoticon
- For emojis, we first map each one to something descriptive and then generate random word vectors
Convolutional Layer
- Apply m filters of varying window sizes over the Tweet Matrix from the embedding layer
- The window size (k) refers to the number of adjacent word vectors in the Tweet Matrix that are filtered together (when k > 1); a Keras sketch follows
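A possible Keras (Functional) sketch of this step, with assumed values for L1, d, m, and the window sizes; parallel Conv1D branches apply filters of different k over the embedded Tweet Matrix:

```python
from tensorflow import keras
from tensorflow.keras import layers

L1, d = 30, 100   # assumed max tweet length and word-vector dimension
m = 64            # assumed number of filters per window size

inputs = keras.Input(shape=(L1, d))            # the embedded Tweet Matrix
branches = []
for k in (2, 3, 4):                            # assumed window sizes
    x = layers.Conv1D(m, kernel_size=k, activation="relu")(inputs)  # filter k adjacent word vectors
    x = layers.GlobalMaxPooling1D()(x)         # max value per filter (see the next card)
    branches.append(x)

conv_features = layers.concatenate(branches)   # parallel branches merged
feature_extractor = keras.Model(inputs=inputs, outputs=conv_features)
```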
Dropout and Max Pooling Layer
- ReLU is applied before the dropout layer
- Dropout is used as a regularization strategy to avoid overfitting
- Max-pooling: the maximum value for each filter
Dropout and Max Pooling Layer
Input (5x5):               Output (3x3):

1 3 2 1 3
2 9 1 1 5                  9 9 5
1 3 2 3 2        ->        9 9 5
8 3 5 1 0                  8 6 9
5 6 1 2 9

The filter moves through the layer as a 3x3 window (here with stride 1), extracting the highest value at each position.
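The same walk can be reproduced with a few lines of NumPy; the output matches the 3x3 result above:

```python
import numpy as np

x = np.array([[1, 3, 2, 1, 3],
              [2, 9, 1, 1, 5],
              [1, 3, 2, 3, 2],
              [8, 3, 5, 1, 0],
              [5, 6, 1, 2, 9]])

# Slide a 3x3 window over the 5x5 input with stride 1, keeping the max of each window.
out = np.array([[x[i:i+3, j:j+3].max() for j in range(3)] for i in range(3)])
print(out)
# [[9 9 5]
#  [9 9 5]
#  [8 6 9]]
```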
Fully Connected Layer
- Maps the inputs to a number of outputs corresponding to the number of classes we have
- Emotion recognition: a multi-class classification task
- Softmax as the activation function and categorical cross-entropy as the loss function
- The output of the softmax function is equivalent to a categorical probability distribution, which indicates the probability of each class being the true one
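A minimal sketch of this final layer, assuming six emotion classes and a 128-dimensional feature vector from the earlier layers (both are placeholder assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 6    # assumed number of emotion classes
feature_dim = 128  # assumed size of the feature vector from the earlier layers

# Final fully connected layer: one output per class, with softmax turning the
# outputs into a categorical probability distribution.
inputs = keras.Input(shape=(feature_dim,))
outputs = layers.Dense(num_classes, activation="softmax")(inputs)
classifier = keras.Model(inputs, outputs)

# Categorical cross-entropy as the loss for the multi-class task.
classifier.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
```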