Assignment 4: CNNs & 5: behavioural data science Flashcards
When are FCNNs better than CNNs?
CNNs are especially appropriate when there is some “structure” in the input data. Examples of such structured input data are images (spatial structure) or speech (temporal structure). They are better for feature detection! FCNNs are better for processing complex, unstructured (“abstract”) data, where every part of the input gets fully analyzed.
What is the difference between 2D images and 3D images in regards to these networks?
2D images are grayscale images (height × width), whereas 3D images are color (RGB) images, which have a third channel dimension (height × width × 3).
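A minimal NumPy sketch of this shape difference (the 28×28 image size is just an illustrative assumption):

```python
import numpy as np

grayscale = np.random.rand(28, 28)   # 2D: height x width
color = np.random.rand(28, 28, 3)    # 3D: height x width x 3 (RGB channels)

print(grayscale.shape)  # (28, 28)
print(color.shape)      # (28, 28, 3)
```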
What is meant by convolution?
A key concept in CNNs. It refers to a multiplication-and-summing operation similar to the one in FCNNs.
Convolution in CNNs involves multiplication and summing between the units’ activations and their corresponding weights. Like in FCNNs, the output of the convolution operation forms the elements of the next layer. Importantly, the shape of the data is largely retained across layers in CNNs. So, if the input layer consists of rectangular images, the subsequent layers will also consist of rectangular grids of units. In FCNNs, each unit in the current layer has a weight for every unit in the next layer; for most modern convolutional neural networks, this is not the case.
What is the notion of weight sharing in CNNs?
Convolution introduces “weight sharing”: instead of having separate weights for each input unit (as in FCNNs), it uses a small set of weights, often called a “kernel”, that is applied across the entire input! Usually, these kernels are small 2D rectangular grids of values; a 3×3 kernel, for example, has nine values, which are that kernel’s weights. One way to interpret kernels is as “feature detectors”.
Importantly, in case of multiple channels (such as with color images in the input layer), there is a separate kernel for each channel. So, for color images in the input layer, you need three different 2D kernels.
What is meant by the dot product?
At every location in the layer, each element of the kernel is multiplied by the corresponding element in the layer, and the products are subsequently summed (i.e., the dot product = multiplication and summing). This dot product is repeated for every location in the layer, which represents the essence of convolution. Each time the dot product is performed, its output (a single value) is stored. In the interpretation of kernels as feature detectors, you can think of this sliding dot product as repeating the feature-detection process at all locations in the layer!
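A minimal NumPy sketch of this sliding dot product (stride 1, no padding; the image and kernel sizes are illustrative assumptions). Note that, like most deep-learning libraries, this computes cross-correlation, i.e., the kernel is not flipped:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and compute the dot product
    at every location (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # the dot product at (i, j)
    return out

image = np.random.rand(5, 5)    # toy grayscale input
kernel = np.random.rand(3, 3)   # a kernel with nine weights
print(convolve2d(image, kernel).shape)  # (3, 3): one output value per location
```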
How many times is this convolution operation repeated?
This convolution operation is repeated for each kernel and its corresponding channel. For example, in the first convolutional layer, convolution is performed three times (with three different kernels): once for each color channel in the input layer. This results in three intermediate maps, which are summed together elementwise (and to which, finally, the bias term is added) to produce the next layer of units! The resulting 2D map from this process is sometimes called an “activation map” or “feature map”.
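A sketch of this step for a toy RGB input, using SciPy’s correlate2d for the sliding dot product (the sizes and bias value are illustrative assumptions):

```python
import numpy as np
from scipy.signal import correlate2d  # the sliding dot product from above

rgb_image = np.random.rand(5, 5, 3)   # toy 5x5 RGB input
kernels = np.random.rand(3, 3, 3)     # one 3x3 kernel per color channel
bias = 0.1                            # illustrative bias value

# One intermediate map per channel; sum elementwise and add the bias:
intermediate = [correlate2d(rgb_image[:, :, c], kernels[:, :, c], mode="valid")
                for c in range(3)]
feature_map = np.sum(intermediate, axis=0) + bias
print(feature_map.shape)  # (3, 3): a single 2D activation map
```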
What is a filter bank and why are they useful?
Convolutional layers in CNNs often contain multiple 3D filters! A set of 3D filters is sometimes called a “filter bank”. By including multiple 3D filters, the network may extract different features from the previous layer, to be processed and combined in the following layers. As each 3D filter results in only a single 2D activation map, when multiple 3D filters are used, the 2D output of each 3D filter is stacked such that the resulting layer becomes not a 2D activation map but a 3D activation tensor! In other words, each 3D filter creates a new channel in the next layer! For example, if the first convolutional layer in a network uses six 3D filters, the resulting layer will have six channels (i.e., it will be a tensor with dimensions width × height × 6). The use of “channel” here is the same as the three channels in the input layer when dealing with color images; technically, “channel” just refers to the third dimension of tensors in the context of CNNs.
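A sketch of a filter bank on the same kind of toy input (six filters is an arbitrary choice, matching the example above):

```python
import numpy as np
from scipy.signal import correlate2d

rgb_image = np.random.rand(5, 5, 3)       # toy RGB input
filter_bank = np.random.rand(6, 3, 3, 3)  # six 3D filters, each 3x3 x 3 channels

# Each 3D filter yields one 2D activation map; stacking the six maps
# produces a 3D activation tensor with six channels.
maps = [sum(correlate2d(rgb_image[:, :, c], f[:, :, c], mode="valid")
            for c in range(3))
        for f in filter_bank]
activation_tensor = np.stack(maps, axis=-1)
print(activation_tensor.shape)  # (3, 3, 6): width x height x 6
```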
What is a 3D filter?
The stack of 2D kernels for a multi-channel image is often referred to as a (3D) “filter”.
Why do we use activation functions?
Like in FCNNs, activation functions are always included in CNNs to add non-linearity to the network, often right after the convolutional layer.
Consider the following vector of activations, x, from a particular activation map: [-0.05, 0.42, 0.57, -0.35, -0.81, 0.00, 0.63]. After passing each value of this vector through a ReLU activation function, what does the resulting vector look like? Make sure to answer with two decimal precision.
0.00; 0.42; 0.57; 0.00; 0.00; 0.00; 0.63. The ReLU activation function sets all negative values to zero and leaves non-negative values unchanged.
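A quick check of this answer in NumPy (ReLU(x) = max(0, x)):

```python
import numpy as np

x = np.array([-0.05, 0.42, 0.57, -0.35, -0.81, 0.00, 0.63])
print(np.maximum(0, x))  # [0.   0.42 0.57 0.   0.   0.   0.63]
```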
What is pooling and what is its use?
Pooling is an operation that is sometimes added after a convolutional layer (and the corresponding activation function) in order to reduce the size (width and height) of the layer. This not only reduces the computational cost of training the network, but also allows the network to focus on more coarse-grained and global features. Pooling usually computes either the mean (mean pooling) or, more commonly, the maximum (max pooling) of small blocks of units.
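A minimal sketch of non-overlapping max pooling (the 2×2 pool size and 4×4 input are illustrative assumptions):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Keep only the maximum of each size-by-size block,
    halving the width and height when size=2."""
    h, w = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

fmap = np.random.rand(4, 4)    # toy activation map
print(max_pool2d(fmap).shape)  # (2, 2)
```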
How do CNN’s go from the penultimate (second last) layer to the output layer?
First, the penultimate layer is flattened, which refers to the process of transforming a multidimensional tensor or matrix into a 1D vector. The second step is a fully-connected wiring between each unit in the flattened layer and the units in the output layer (so this step works just like a layer in an FCNN). The third step is the softmax function (see the sketch after this list):
- Penultimate layer is flattened
- A fully connected wiring between each unit in the flattened layer and the units in the output layer
- Softmax function
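A sketch of these three steps on a toy penultimate layer (the 3×3×6 shape and the 10 output classes are illustrative assumptions):

```python
import numpy as np

penultimate = np.random.rand(3, 3, 6)  # toy penultimate layer

flat = penultimate.flatten()           # step 1: flatten to a 1D vector (54 units)
W = np.random.rand(10, flat.size)      # step 2: fully-connected weights (10 classes)
b = np.random.rand(10)
logits = W @ flat + b

exp = np.exp(logits - logits.max())    # step 3: softmax (max subtracted for stability)
probs = exp / exp.sum()
print(probs.sum())  # 1.0 (up to floating-point precision)
```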
What are flattened layers in CNNs equivalent to in FCNNs?
Flattened layers in CNNs are equivalent to regular (hidden) layers in FCNNs.
What does the softmax function do?
This function normalizes the activity of each output unit to be between 0 and 1 and makes sure that the sum of the normalized activity of all output units equals 1. This way, the (softmax-normalized) activity in each output unit can be interpreted as the probability for the corresponding class. As such, most CNNs make probabilistic predictions.
Importantly, while the softmax function is applied to the activity of each output unit separately, it uses the (not-yet-softmax-normalized) activity of all the other output units as well! This is unlike activation functions such as ReLU, which do not use information from the other units.
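A small sketch that makes this point concrete: because of the shared denominator, changing one unit’s activity changes every unit’s softmax output (the input values are arbitrary):

```python
import numpy as np

def softmax(z):
    """Normalize activities to (0, 1) so that they sum to 1; note the
    shared denominator, which uses the activity of all units."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow
    return e / e.sum()

a = softmax(np.array([1.0, 2.0, 3.0]))
b = softmax(np.array([1.0, 2.0, 5.0]))  # only the last activity changed...
print(a[0], b[0])                       # ...yet the first probability differs too
```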
What do psychologists bring to the field of behavioral data science?
Understanding human behavior (expertise in designing experiments, measuring, and testing human behavior) and psychological methods (statistics and mathematical models of behavior, the basis of data science methods).