Assignment 4: CNNs & 5: Behavioural Data Science Flashcards

1
Q

When are FCNNs better than CNNs?

A

CNNs are especially appropriate when there is some “structure” in the input data, such as images (spatial structure) or speech (temporal structure); they are better at feature detection. FCNNs are better suited to complex data without such structure (“abstract” data), because every unit is connected to every part of the input, so every part gets fully analyzed.

2
Q

What is the difference between 2D images and 3D images with regard to these networks?

A

In this context, 2D images are grayscale images (a single channel), whereas 3D images are color (RGB) images, which have a third dimension with three color channels.
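
As a small illustration (assuming the images are stored as NumPy arrays, which is just one common convention), the grayscale image has two dimensions and the RGB image has a third dimension for the channels:

    import numpy as np

    gray = np.zeros((28, 28))     # 2D: height x width (grayscale)
    rgb = np.zeros((28, 28, 3))   # 3D: height x width x 3 color channels (RGB)
    print(gray.ndim, rgb.ndim)    # prints: 2 3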

3
Q

What is meant by convolution?

A

A key concept in CNNs. Like the weighted sum in FCNNs, convolution involves multiplying the units’ activations by their corresponding weights and summing the results; the outputs of this operation form the elements of the next layer. Importantly, the shape of the data is largely retained across layers in CNNs. So, if the input layer consists of rectangular images, the subsequent layers will also consist of rectangular grids of units. In FCNNs, each unit in the current layer has a weight for every unit in the next layer; for most modern convolutional neural networks this is not the case.
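
A minimal sketch of this shape retention, assuming NumPy and SciPy are available (strictly speaking, deep-learning “convolution” is cross-correlation, hence correlate2d); the image and kernel sizes are made up:

    import numpy as np
    from scipy.signal import correlate2d

    image = np.random.rand(8, 8)    # a small single-channel input layer
    kernel = np.random.rand(3, 3)   # a 3 x 3 kernel (nine shared weights)

    feature_map = correlate2d(image, kernel, mode="valid")
    print(feature_map.shape)        # (6, 6): still a rectangular grid of units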

4
Q

What is the notion of weight sharing in CNNs?

A

Convolution introduces “weight sharing”: instead of having separate weights for each input unit (as in an FCNN), a small set of weights, often called a “kernel”, is used across the entire input! Usually, these kernels are small 2D rectangular grids of values; a 3 x 3 kernel, for example, has nine values, which are that kernel’s weights. One way to interpret kernels is as “feature detectors”.
Importantly, in the case of multiple channels (such as color images in the input layer), there is a separate kernel for each channel. So, for color images in the input layer, you need three different 2D kernels.
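
A back-of-the-envelope comparison (with made-up layer sizes) of how many weights this saves compared to fully connected wiring:

    # Hypothetical 32 x 32 RGB input and 1024 units in the next layer
    fcnn_weights = (32 * 32 * 3) * 1024  # a separate weight for every input-output pair
    cnn_weights = 3 * (3 * 3)            # one shared 3 x 3 kernel per input channel

    print(fcnn_weights)  # 3145728
    print(cnn_weights)   # 27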

5
Q

What is meant by the dot product?

A

At every location in the layer, each element of the kernel is multiplied with the corresponding element in the layer, and the products are then summed (i.e., the dot product = multiplication and summing). This dot product is repeated for every location in the layer, which is the essence of convolution. Each time the dot product is performed, its output (a single value) is stored. In the interpretation of kernels as feature detectors, you can think of this sliding dot product as repeating the feature-detection process at every location in the layer!
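
A naive NumPy sketch of this sliding dot product (a readable loop rather than an efficient implementation; the sizes are made up):

    import numpy as np

    def sliding_dot_product(layer, kernel):
        kh, kw = kernel.shape
        h, w = layer.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = layer[i:i + kh, j:j + kw]
                out[i, j] = np.sum(patch * kernel)  # the dot product at this location
        return out

    layer = np.random.rand(8, 8)
    kernel = np.random.rand(3, 3)
    print(sliding_dot_product(layer, kernel).shape)  # (6, 6)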

6
Q

How many times is this convolution operation repeated?

A

This convolution operation is repeated for each kernel and its corresponding channel. For example, in the first convolutional layer, convolution is performed three times (with three different kernels): once for each color channel in the input layer. This results in three intermediate maps, which are then summed together elementwise (and finally the bias term is added) to produce the next layer of units! The resulting 2D map from this process is sometimes called an “activation map” or “feature map”.
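
A sketch of this per-channel convolution for a made-up RGB input (again using SciPy's correlate2d as a stand-in for the convolution step):

    import numpy as np
    from scipy.signal import correlate2d

    rgb_image = np.random.rand(8, 8, 3)   # hypothetical RGB input layer
    kernels = np.random.rand(3, 3, 3)     # one 3 x 3 kernel per color channel
    bias = 0.1                            # single bias term for this filter

    # One intermediate map per channel, summed elementwise, plus the bias
    maps = [correlate2d(rgb_image[:, :, c], kernels[:, :, c], mode="valid") for c in range(3)]
    activation_map = np.sum(maps, axis=0) + bias
    print(activation_map.shape)           # (6, 6): a single 2D activation / feature map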

7
Q

What is a filter bank and why are they useful?

A

Convolutional layers in CNNs often contain multiple 3D filters! A set of 3D filters is sometimes called a “filter bank”. By including multiple 3D filters, the network can extract different features from the previous layer, to be processed and combined in the following layers. Because each 3D filter results in only a single 2D activation map, when multiple 3D filters are used the 2D output of each 3D filter is stacked, so that the resulting layer is not a 2D activation map but a 3D activation tensor! In other words, each 3D filter creates a new channel in the next layer. For example, if the first convolutional layer in a network uses six 3D filters, the resulting layer will have six channels (i.e., it will be a tensor with dimensions: width x height x 6). The use of “channel” here is the same as the three channels in the input layer when dealing with color images. Technically, “channel” just refers to the third dimension of tensors in the context of CNNs.
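
A small sketch of a filter bank using PyTorch (assuming torch is installed; the sizes are made up): a convolutional layer with six 3D filters turns a three-channel input into a six-channel layer.

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)  # a filter bank of six 3D filters
    x = torch.rand(1, 3, 32, 32)   # a batch of one 32 x 32 RGB image
    y = conv(x)
    print(y.shape)                 # torch.Size([1, 6, 30, 30]): one channel per 3D filter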

8
Q

What is a 3D filter?

A

The stack of 2D kernels for a multi-channel image is often referred to as a (3D) “filter”.

9
Q

Why do we use activation functions?

A

Like in FCNNs, activation functions are always included in CNNs to add non-linearity to the network, often right after the convolutional layer.

10
Q

Consider the following vector of activations, x, from a particular activation map: [-0.05, 0.42, 0.57, -0.35, -0.81, 0.00, 0.63]. After passing each value of this vector through a ReLU activation function, what does the resulting vector look like? Make sure to answer with two decimal precision.

A

0.00; 0.42; 0.57; 0.00; 0.00; 0.00; 0.63. The ReLU activation function sets all negative values to zero and leaves non-negative values unchanged.
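
A quick check of this answer with NumPy:

    import numpy as np

    x = np.array([-0.05, 0.42, 0.57, -0.35, -0.81, 0.00, 0.63])
    relu = np.maximum(x, 0)    # negatives become 0, the rest stay unchanged
    print(np.round(relu, 2))   # [0.   0.42 0.57 0.   0.   0.   0.63]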

11
Q

What is pooling and what is its use?

A

Pooling is an operation that is sometimes added after a convolutional layer (and the corresponding activation function) to reduce the spatial size (width and height) of the layer. This not only reduces the computational cost of training the network, but also allows the network to focus on more coarse-grained, global features. Pooling usually computes either the mean (mean pooling) or, more commonly, the maximum (max pooling) within each pooling window.
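
A minimal NumPy sketch of 2 x 2 max pooling (assuming the width and height are divisible by 2; the sizes are made up):

    import numpy as np

    def max_pool_2x2(activation_map):
        h, w = activation_map.shape
        windows = activation_map.reshape(h // 2, 2, w // 2, 2)
        return windows.max(axis=(1, 3))  # the maximum within each 2 x 2 window

    fmap = np.random.rand(8, 8)
    print(max_pool_2x2(fmap).shape)      # (4, 4): width and height halved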

12
Q

How do CNNs go from the penultimate (second-to-last) layer to the output layer?

A

First, the penultimate layer is flattened, which refers to transforming a multidimensional tensor or matrix into a 1D vector. The second step is fully-connected wiring between each unit in the flattened layer and the units in the output layer (so this last part of the network works just like an FCNN). The third step is the softmax function. In short (a small sketch follows the list below):

  1. Penultimate layer is flattened
  2. A fully connected wiring between each unit in the flattened layer and the units in the output layer
  3. Softmax function
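A toy NumPy sketch of these three steps, with made-up sizes (a 4 x 4 x 6 penultimate layer and three output classes):

    import numpy as np

    penultimate = np.random.rand(4, 4, 6)   # last convolutional / pooled layer
    flat = penultimate.flatten()            # step 1: flatten to a 1D vector of 96 units

    W = np.random.rand(3, flat.size)        # step 2: fully connected weights for 3 output classes
    b = np.random.rand(3)
    logits = W @ flat + b

    probs = np.exp(logits) / np.sum(np.exp(logits))  # step 3: softmax
    print(flat.shape, round(probs.sum(), 2))         # (96,) 1.0
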
13
Q

What are flattened layers in CNNs equivalent to in FCNNs?

A

Flattened layers in CNNs are equivalent to regular (hidden) layers in FCNNs.

14
Q

What does the softmax function do?

A

This function normalizes the activity of each output unit to be between 0 and 1 and makes sure that the sum of the normalized activity of all output units equals 1. This way, the (softmax-normalized) activity in each output unit can be interpreted as the probability for the corresponding class. As such, most CNNs make probabilistic predictions.
Importantly, while the softmax function is applied to the activity of each output unit separately, it uses the (not-yet-softmax-normalized) activity of all the other output units as well! This is unlike element-wise activation functions such as ReLU, which do not use information from other units.
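
A small NumPy sketch of the softmax that makes this explicit: the denominator sums over the pre-softmax activity of all output units.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))  # subtract the max for numerical stability
        return e / np.sum(e)       # each unit is normalized by the sum over ALL units

    logits = np.array([2.0, 1.0, 0.1])
    p = softmax(logits)
    print(np.round(p, 3), p.sum())  # [0.659 0.242 0.099] 1.0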

15
Q

What do psychologists bring to the field of behavioral data science?

A

Understanding human behavior (expertise in designing experiments, measuring, and testing human behavior) and psychological methods (statistics and mathematical models of behavior, the basis of data science methods).

16
Q

What is meant by

1) supervised regression
2) unsupervised regression
3) unsupervised classification
4) supervised classification

A

SUPERVISED REGRESSION
You want to predict how many likes a post will have based on information about the poster and the content of the post itself. You have a large dataset containing the number of likes for each post and have extracted a number of features about the poster and from the post’s content.

UNSUPERVISED CLASSIFICATION
You want to create groups of movies so that after someone finishes watching one movie from a group, you can recommend another from the same group. Your dataset contains features extracted from the title and content of the movies, but doesn’t have predefined labels.

SUPERVISED CLASSIFICATION
You want to decide whether your idea for a sinterklaas surprise is creative or not. You have a dataset containing features extracted from a number of surprises you, your friends and family have created in the past, and also a variable that labels them as creative or not.

UNSUPERVISED REGRESSION
Does not exist.

17
Q

What do random forests consist of? What is the method that randomizes rows? What is the method that randomizes columns?

A

Random forests consist of many decision trees. Bagging randomizes the rows (samples) and random subspaces randomize the columns (features).
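
A hedged sketch with scikit-learn (assuming sklearn is installed): bootstrap=True resamples the rows (bagging) and max_features limits the columns considered at each split (the random-subspaces idea).

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    forest = RandomForestClassifier(
        n_estimators=100,     # number of decision trees in the forest
        bootstrap=True,       # bagging: each tree sees a random sample of the rows
        max_features="sqrt",  # random subspaces: a random subset of columns per split
        random_state=0,
    )
    forest.fit(X, y)
    print(forest.score(X, y))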

18
Q

In word2vec, are the dimensions the same as the nodes in the hidden layer?

A

Yes
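
A hedged sketch with gensim (assuming gensim 4.x, where the parameter is called vector_size): the number of dimensions equals the size of the hidden layer and of the resulting word vectors.

    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat"], ["the", "dog", "barked"]]  # toy corpus
    model = Word2Vec(sentences, vector_size=100, min_count=1)      # 100 hidden nodes / dimensions
    print(model.wv["cat"].shape)                                   # (100,)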

19
Q

What happens to the accuracy and the training speed of your network when you change the number of dimensions to 300 instead of 100?

A

The accuracy increases and the training speed decreases.