Lecture 9 Flashcards
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is particularly well-suited for image recognition and processing tasks. It is made up of multiple layers, including convolutional layers, pooling layers, and fully connected layers.
How do we process and recognize images?
In visual perception, different neuronal cells respond to different orientations. For example, some respond to vertical edges, some to horizontal edges, some to diagonal edges, and so on. These neuronal cells are organized in a columnar architecture and function together to fulfill visual perception tasks.
Key Insights from Mammalian Vision
- An image is not processed, perceived, or understood in one huge lump
- The vision system considers small chunks of the visual field and extracts key features from each
- Features are combined at later stages of processing into something recognizable as an object
- This insight suggests that at the lowest level we can slide a small “receptive window” over input data (convolution) to process small chunks of input
What is Happening in a Convolutional Layer?
Filters are composed of two parts:
* A set of weights
* An activation function
Convolution
Convolution is the summation of the element-wise product of two matrices of the same size: the filter weights and the chunk of input that the filter currently covers.
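As a minimal sketch of this definition, the pure-Python code below multiplies a small filter element-wise with each same-sized window of an input and sums the products. The function names `window_sum` and `slide_filter`, the stride of 1, the absence of padding, and the example values are illustrative assumptions, not part of the lecture.

```python
def window_sum(patch, kernel):
    """Sum of the element-wise product of two equally sized 2-D lists."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


def slide_filter(image, kernel):
    """Slide the kernel over the image (assumed: stride 1, no padding)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Cut out the window of the image that the kernel currently covers.
            patch = [line[c:c + k_cols] for line in image[r:r + k_rows]]
            row.append(window_sum(patch, kernel))
        output.append(row)
    return output


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = slide_filter(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the result marks where the window sits on the edge, which is the sense in which a filter "extracts a feature" from each chunk of input.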
Sets of Layers in Typical Sequences
The convolution, non-linear, and pooling layers are typically used as a set. Multiple sets of the above three layers can appear in a CNN design.
Input -> Conv. -> Non-linear -> Pooling -> Conv. -> Non-linear -> Pooling -> … -> Output
After a few sets, the output is typically sent to one or two fully connected (dense) hidden layers.
* A fully connected layer is an ordinary neural network layer, as in other neural networks.
* A typical activation function is the sigmoid function.
* The output is typically a class (classification) or a real number (regression).
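To make the effect of stacking these sets concrete, here is a rough sketch that traces how the spatial size of the data shrinks through two Conv. -> Non-linear -> Pooling sets. The 28x28 input, 3x3 filters, stride 1 without padding, and 2x2 non-overlapping pooling are all assumptions chosen for illustration; the helper names are hypothetical.

```python
def conv_output_size(size, kernel_size):
    """Spatial size after a convolution (assumed: stride 1, no padding)."""
    return size - kernel_size + 1


def pool_output_size(size, pool_size):
    """Spatial size after non-overlapping pooling (stride == pool size)."""
    return size // pool_size


# Assumed for illustration: a 28x28 input, 3x3 filters, 2x2 pooling.
size = 28
trace = [("Input", size)]
for block in (1, 2):
    size = conv_output_size(size, kernel_size=3)
    trace.append((f"Conv. {block}", size))
    # The non-linear step acts element-wise, so the size is unchanged.
    trace.append((f"Non-linear {block}", size))
    size = pool_output_size(size, pool_size=2)
    trace.append((f"Pooling {block}", size))

for name, s in trace:
    print(f"{name}: {s} x {s}")
```

Under these assumptions the side length goes 28, 26, 26, 13, 11, 11, 5, so each set leaves a smaller, more condensed representation for the fully connected layers.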
Keras/TensorFlow in Python
Many different software platforms support neural network analysis in general and CNNs in particular. Python was used to build some of the earliest tools, but as an interpreted language, Python is far too slow to fit neural models at scale. Instead, we use a “front end”/“back end” arrangement to take advantage of the efficiency of languages like C++ and CUDA (a GPU programming platform). Here, we use the Keras package as the “front end” for setting up our model and data; Keras then passes these to the TensorFlow “back end”, which does the actual model fitting.
Two Keras Model Types
Sequential
(Functional) Model
Sequential
- Simplest approach and used in the majority of examples
- Allows for one “input tensor” and one “output tensor”
- Each successive layer of the model is “stacked” on the previous layer
- The layers are connected in the order in which they are invoked, and the connections between layers are made automatically
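A minimal sketch of a Sequential model might look like the following. It assumes TensorFlow 2.x is installed; the input shape (28x28 grayscale images), the layer sizes, the 10 output classes, and the choice of ReLU and softmax activations are illustrative assumptions rather than values from the lecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed for illustration: 28x28 grayscale images and 10 output classes.
model = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),          # the single input tensor
        layers.Conv2D(16, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),  # the single output tensor
    ]
)

# Each layer is stacked on the previous one; no connections are written by hand.
model.summary()
```

The list order alone defines the connections, which is why this style suits models with exactly one input and one output.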
(Functional) Model
- More complex and flexible approach that addresses difficult “non-standard” computing problems
- Allows for more than one “input tensor” and more than one “output tensor”
- The output of a layer can be connected to more than one subsequent layer (think of this like parallel branches)
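For comparison, here is a hedged sketch of the functional style with two inputs and two outputs. It assumes TensorFlow 2.x; the input shapes, layer sizes, and the names `image`, `extra_features`, `label`, and `value` are hypothetical and chosen only to show the branching.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed for illustration: an image input plus a small vector of extra features.
image_in = keras.Input(shape=(28, 28, 1), name="image")
extra_in = keras.Input(shape=(4,), name="extra_features")

# Image branch: one convolution/pooling set, then flatten.
x = layers.Conv2D(8, kernel_size=3, activation="relu")(image_in)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Flatten()(x)

# Merge the image branch with the second input.
merged = layers.Concatenate()([x, extra_in])
shared = layers.Dense(32, activation="relu")(merged)

# The output of `shared` feeds two parallel branches (two output tensors).
class_out = layers.Dense(10, activation="softmax", name="label")(shared)
value_out = layers.Dense(1, name="value")(shared)

model = keras.Model(inputs=[image_in, extra_in], outputs=[class_out, value_out])
model.summary()
```

Here each layer is called explicitly on the tensor it consumes, so the wiring is written out by hand instead of being implied by list order.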
What is a Tensor?
- A tensor is a multidimensional data structure
- A first-rank tensor can be a vector
- A second-rank tensor can be a matrix
Is a matrix the same as a second-rank tensor?
“All squares are rectangles, but not all rectangles are squares.” In the same way, a second-rank tensor can be written as a matrix, but not every matrix is a tensor.
Tensors obey specific transformation rules as part of their structure, but matrices do not necessarily obey them.
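In Keras/TensorFlow code, "tensor" is used more loosely to mean an n-dimensional array, and the rank is simply the number of axes. The sketch below uses NumPy arrays as stand-ins to show ranks 1, 2, and 4; the array contents and the image-batch shape are assumptions made for illustration.

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])             # rank 1: shape (3,)
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])    # rank 2: shape (2, 2)
# Assumed example: a batch of 32 RGB images of 64x64 pixels.
image_batch = np.zeros((32, 64, 64, 3))        # rank 4

for name, array in [
    ("vector", vector),
    ("matrix", matrix),
    ("image_batch", image_batch),
]:
    print(name, "rank:", array.ndim, "shape:", array.shape)
```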
Many Types of Layers Supported
- Each layer has a particular architectural configuration meant to accomplish a particular kind of task
- For example, we know that pooling layers do data reduction while highlighting strong features
- Each layer has options for size, initialization, and activation function
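The pooling example can be sketched in a few lines of pure Python. This is an illustrative max-pooling routine, not Keras's implementation; the function name `max_pool_2d`, the 2x2 non-overlapping windows, and the input values are assumptions.

```python
def max_pool_2d(feature_map, pool_size=2):
    """Non-overlapping max pooling over a 2-D list (stride == pool_size)."""
    pooled = []
    for r in range(0, len(feature_map) - pool_size + 1, pool_size):
        row = []
        for c in range(0, len(feature_map[0]) - pool_size + 1, pool_size):
            window = [
                feature_map[r + i][c + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            # Keep only the strongest response in each window.
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 0, 2],
    [4, 2, 1, 0],
    [0, 1, 5, 6],
    [2, 0, 7, 1],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[4, 2], [2, 7]]
```

Sixteen values are reduced to four, and each surviving value is the strongest response in its window.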
- Partial list:
- Preprocessing layers (e.g., text)
- Core layers (basic types, e.g., “Dense”)
- Convolution layers (1D, 2D, and 3D)
- Pooling layers (1D, 2D, and 3D; max or average)
- Recurrent layers (e.g., LSTM)
- Normalization and regularization layers
- Attention layers (multi-head)
- Reshaping/merging
- Activation layers
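As a rough illustration of several categories in this list, the sketch below instantiates one Keras layer per category. It assumes TensorFlow 2.x; the specific sizes, initializer, dropout rate, and target shape are arbitrary choices for the example, and preprocessing layers are left out for brevity.

```python
from tensorflow.keras import layers

# Hypothetical settings; each constructor takes options such as size,
# initializer, and activation function.
examples = {
    "core": layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    "convolution": layers.Conv2D(16, kernel_size=3, activation="relu"),
    "pooling (max)": layers.MaxPooling2D(pool_size=2),
    "pooling (average)": layers.AveragePooling2D(pool_size=2),
    "recurrent": layers.LSTM(32),
    "normalization": layers.BatchNormalization(),
    "regularization": layers.Dropout(0.5),
    "attention": layers.MultiHeadAttention(num_heads=2, key_dim=16),
    "reshaping": layers.Reshape((7, 7, 16)),
    "merging": layers.Concatenate(),
    "activation": layers.Activation("relu"),
}

for category, layer in examples.items():
    print(f"{category}: {type(layer).__name__}")
```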