Questions Flashcards

1
Q

Which of the following statements is/are true about word embeddings?
A) Pre-learned embeddings exist.
B) Embeddings can potentially capture additional information compared to a one-hot encoded representation.
C) Embeddings are useful for large dictionary/vocabulary sizes.
D) A word vector of an embedding has the same size as the word vector of a one-hot encoded representation, given a fixed-size dictionary/vocabulary.

A

A, B, C

2
Q

k-means …
A) … does not need a fixed number of cluster centers as input.
B) … needs a fixed number of cluster centers as input.
C) … is a dimensionality reduction method.
D) … is a clustering method.

A

B, D

3
Q

Given an input of size 15x15 and a kernel size of 5x5 with a stride of 1, what is the output size after the convolution operation?
A) 14x14
B) 11x11
C) 10x10
D) 13x13

A

B
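The 11x11 result follows from the standard output-size formula for a convolution without padding. A minimal sketch, where the function name `conv_output_size` and its `padding` parameter are illustrative assumptions and not part of the card:

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 15x15 input, 5x5 kernel, stride 1, no padding (assumed, as the card does not mention padding)
side = conv_output_size(15, 5, stride=1)
print(f"{side}x{side}")  # 11x11
```

With a padding of 2 on each side, the same call returns 15, i.e., the input size is preserved.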

4
Q

Which of the following statements is/are true about an 8-bit grayscale image?
A) Can be converted into an RGB image without additional information.
B) It has only a single channel (brightness).
C) Every pixel is represented by 8 channels.
D) The channel information size is 8 bits, which means that 8 values can be stored.

A

B

5
Q

Which of the following statements is/are true about the term ‘hyperparameters’?
A) There are models without any hyperparameters.
B) Hyperparameters are user-specifiable settings that control the model complexity or the training.
C) Hyperparameters can strongly influence the final model performance.
D) Hyperparameters are those model parameters that are adjusted during training.

A

B, C

6
Q

Which of the following statements is/are true about loss functions?
A) Loss functions are used to obtain the final model prediction.
B) The output of loss functions is in the range [0, 1].
C) Loss functions can have an impact on the training process.
D) Loss functions are used to measure the difference between a model prediction and the true target.

A

C, D

7
Q

Which of the following is/are useful loss functions for regression problems?
A) Cross entropy
B) Softmax
C) Sigmoid
D) Mean-squared error

A

D

8
Q

Standard gradient descent performs an update step based on some step size/learning rate η. Which of the following statements is/are true?
A) If η is negative, we would move in the opposite direction (gradient ascent).
B) If η is too small, the update progress can be very slow.
C) If η is too large, the algorithm might not properly converge to some minimum.
D) If η is 0, no update is performed at all.

A

A, B, C, D
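All four cases can be observed on a toy problem. The sketch below assumes the function f(x) = x**2 (gradient 2*x) and a hypothetical helper `gradient_descent`; neither comes from the card itself.

```python
def gradient_descent(eta: float, x0: float = 1.0, steps: int = 20) -> float:
    """Minimize f(x) = x**2 with a fixed learning rate eta; returns the final x."""
    x = x0
    for _ in range(steps):
        gradient = 2 * x
        x = x - eta * gradient  # standard update: step against the gradient
    return x


print(gradient_descent(0.0))    # 1.0 (eta = 0: no update at all)
print(gradient_descent(0.001))  # still close to 1: progress is very slow
print(gradient_descent(0.1))    # close to the minimum at 0
print(gradient_descent(1.5))    # magnitude explodes: step size too large
print(gradient_descent(-0.1))   # moves away from the minimum: gradient ascent
```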

9
Q

Which of the following is/are typically used activation functions?
A) Cross entropy
B) Sigmoid
C) Tanh
D) ReLU

A

B, C, D

10
Q

Logistic regression …
A) … has an output in the range [0, 1].
B) … is a regression model.
C) … is never a good model choice.
D) … is a classification model.

A

A, D

11
Q

Which of the following statements is/are true about pretrained models?
A) Using pretrained models might improve the prediction performance.
B) Pretrained models can be directly used for every task without having to adjust their architecture.
C) Using pretrained models always improves the prediction performance.
D) Pretrained models might be biased.

A

A, D

12
Q

Which aspects have to be taken into consideration when dealing with high-dimensional input data?
A) Often difficult to visualize.
B) More features take up more space in memory.
C) Dimensionality reduction techniques might be useful.
D) More features might lead to longer model training times.

A

A, B, C, D

13
Q

Consider the following vocabulary in the fixed order: cat dog wolf cow. Which of the following one-hot-encodings is the correct one for the word ‘wolf’?
A) (1, 1, 0, 1)
B) (3)
C) (1, 2, 3, 4)
D) (0, 0, 1, 0)

A

D

14
Q

Assume you have the following input text that you want to encode with one-hot-encoding: ‘a cat and a dog and a wolf’. What is the dictionary/vocabulary size?
A) 6
B) 7
C) 5
D) 8

A

C
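The vocabulary size is the number of distinct words, not the number of tokens. A small sketch of this and of the one-hot encoding from the previous card, assuming whitespace tokenization; the helper names `build_vocabulary` and `one_hot` are illustrative:

```python
def build_vocabulary(text: str) -> list:
    """Collect the distinct words of a text in order of first appearance."""
    vocab = []
    for word in text.split():
        if word not in vocab:
            vocab.append(word)
    return vocab


def one_hot(word: str, vocab: list) -> list:
    """Vector of zeros with a single 1 at the position of `word` in `vocab`."""
    return [1 if entry == word else 0 for entry in vocab]


vocab = build_vocabulary("a cat and a dog and a wolf")
print(len(vocab))  # 5
print(one_hot("wolf", ["cat", "dog", "wolf", "cow"]))  # [0, 0, 1, 0]
```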

15
Q

The bias-variance tradeoff …
A) … is about finding the best ratio of training set size vs. test set size.
B) … is about finding the most underfitting and most overfitting model.
C) … is about finding the best loss functions.
D) … is about finding a compromise between model underfitting and overfitting.

A

D

16
Q

Which of the following statements is/are true about convolutional neural networks (CNNs)?
A) Because of 2D input data, CNNs cannot be trained using gradient descent.
B) CNNs are the same as fully-connected neural networks, just for 2D data.
C) Weight sharing is an essential part in CNNs.
D) CNNs take advantage of the ‘local structure’ in image data (neighboring pixels are often highly correlated).

A

C, D

17
Q

In the forward pass of a neural network, the input vector is …
A) … passed through an element-wise non-linearity, added to bias weights and multiplied by a weight matrix.
B) … added to bias weights, multiplied by a weight matrix and passed through an element-wise non-linearity.
C) … multiplied by a weight matrix, added to bias weights and passed through an element-wise non-linearity.
D) … passed through an element-wise non-linearity, multiplied by a weight matrix and added to bias weights.

A

C
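The order in option C (weight matrix, then bias, then non-linearity) can be sketched for a single fully-connected layer as follows. The function `dense_forward`, the example weights, and the choice of ReLU as the non-linearity are assumptions for illustration only.

```python
import math


def dense_forward(x, weights, biases, activation=math.tanh):
    """One layer: multiply by the weight matrix, add the bias, apply the non-linearity."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + bias  # weighted sum plus bias
        outputs.append(activation(z))                    # element-wise non-linearity
    return outputs


relu = lambda z: max(0.0, z)
print(dense_forward([1.0, 2.0], [[1.0, 0.0], [0.0, -1.0]], [0.5, 0.5], activation=relu))
# [1.5, 0.0]
```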

18
Q

Which of the following statements is/are true about the logistic function (sigmoid)?
A) It is a common loss function.
B) It is used in logistic regression.
C) It introduces non-linearity.
D) It is used in linear regression.

A

B, C

19
Q

Assume a multi-class classification problem with four classes (1, 2, 3, 4). Further assume that you have a model with a softmax function at the end which produced (0.3, 0.32, 0.35, 0.03). Which class should be chosen as the final classification prediction?
A) Class 4
B) Class 3
C) Class 2
D) Class 1

A

B
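Choosing the final class amounts to taking the index of the largest softmax output. A minimal sketch using the probabilities from the card; the `softmax` helper is included only to show that its outputs sum to 1 and is not taken from the source:

```python
import math


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.3, 0.32, 0.35, 0.03]                  # softmax outputs given in the card
predicted_class = probs.index(max(probs)) + 1    # classes are numbered 1..4
print(predicted_class)  # 3
```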

20
Q

Which of the following statements is/are true about the softmax function?
A) The sum of all outputs equals 1.
B) It is suitable for multi-class classification problems.
C) It is a generalization of the sigmoid function.
D) The output is always 1 for the predicted class and 0 for all others.

A

A, B, C

21
Q

Which of the following statements is/are true about padding in convolutional neural networks?
A) Padding is optional.
B) Padding can only be applied to the original input data, i.e., before the first network layer.
C) Padding can be used to keep the input size and output size the same.
D) Padding of size n is the same as using a kernel that is smaller by n compared to a bigger kernel.

A

A, C

22
Q

In a fully-connected neural network …
A) … activation functions should be used in between layers to prevent multiple linear transformations from collapsing into a single one.
B) … all inputs are connected to all nodes of the following layer.
C) … the output layer is used for the final model prediction.
D) … each hidden layer can have arbitrarily many nodes.

A

A, B, C, D

23
Q

Batch normalization …
A) … is not applicable in convolutional neural networks.
B) … is only used in the last network layer.
C) … is performed for each mini-batch of training samples.
D) … is performed once for the dataset before training the network.

A

C

24
Q

Which problems might arise when data augmentation is not done carefully?
A) The input data might no longer correlate with/represent the original target values.
B) The model performance might be worse than without augmentation.
C) There are no problems, data augmentation is always safe.
D) The target values might change too much.

A

A, B, D

25
Q

What is meant by the term ‘underfitting’?
A) A model fits the training data (too) well but not the test data.
B) A model neither fits the training nor the test data well.
C) A model with too few hyperparameters was selected.
D) A model fits the training and the test data (too) well.

A

B

26
Q

Which techniques can be used to potentially improve a neural network model in terms of prediction performance?
A) Loss function schedules
B) Deep networks
C) Hyperparameter augmentation
D) Batch normalization

A

B, D

27
Q

Which of the following statements is/are true regarding the receptive field in convolutional neural networks?
A) The receptive field always remains constant throughout the depth of the network.
B) The receptive field is the (part of the) input that is connected to a node/neuron.
C) The receptive field is closely related to the terms ‘kernel’ or ‘filter’.
D) The receptive field is often bigger than the original input size.

A

B, C

28
Q

Assume you have grayscale images with width=20 and height=20. What is the dimensionality when you want to train a model with such input data?
A) 400
B) 1200
C) 20
D) 40

A

A

29
Q

Which of the following statements is/are true about empirical risk minimization (ERM)?
A) ERM is typically performed on a dedicated test set.
B) ERM is a method of hyperparameter optimization.
C) ERM is typically performed on a dedicated training set.
D) ERM is a method of estimating the generalization error/risk.

A

C

30
Q

Considering labeled tabular data, assume you have a feature vector x and a target y for each table entry. Which of the following statements is/are true?
A) y can be numerical.
B) The x of one table entry might be identical to another x table entry.
C) y can be a class label.
D) x and y together form a sample.

A

A, B, C, D

31
Q

t-distributed stochastic neighbor embedding (t-SNE) …
A) … is a dimensionality reduction method.
B) … is a data augmentation method.
C) … enables visualization of high-dimensional data.
D) … is a clustering method.

A

A, C

32
Q

Which of the following statements is/are true about the result of loss functions (the ‘loss’)?
A) Typically, the higher the loss, the better the prediction.
B) When comparing the loss of two different loss functions, one should choose the function that yielded the lower loss.
C) Different loss functions might have different loss value ranges.
D) Typically, the lower the loss, the better the prediction.

A

C, D

33
Q

Given the following dataset in tabular form (| intensity | color | value | gamma | …), what is the dimensionality of this dataset?
A) 5
B) 4
C) 9
D) 20

A

B

34
Q

Which of the following statements is/are true about classification?
A) In classification, the target values are numerical values.
B) In classification, the target values are class labels.
C) In classification, there should be at least two different classes.
D) In classification, the target values cannot be numbers.

A

B, C

35
Q

Assume you have an n-dimensional input that you want to apply to a logistic regression model. Which of the following statements is/are true?
A) The weights of the logistic regression model are multiplied with the input, a bias is added, the logistic function (sigmoid) is applied, and the result is the final model output.
B) The weights of the logistic regression model are multiplied with the input, a bias is added, and the result is the final model output.
C) The weights of the logistic regression model must be n-dimensional as well.
D) The number of computations is independent of n since it is still only a single layer in the logistic regression model.

A

A, C
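Statements A and C can be made concrete with a few lines: one weight per input dimension, a bias, and the sigmoid on top. This is a sketch with made-up weights; `logistic_regression` is a hypothetical helper, not a library function.

```python
import math


def logistic_regression(x, w, b):
    """Sigmoid of the weighted sum w.x + b; requires one weight per input feature."""
    if len(x) != len(w):
        raise ValueError("weights must have the same dimensionality n as the input")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # cost grows with n
    return 1.0 / (1.0 + math.exp(-z))


print(logistic_regression([2.0, 4.0, 0.0], [0.5, -0.25, 1.0], 0.0))  # 0.5
```

The loop over all n weights also shows why statement D is false: the number of multiplications depends on n.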

36
Q

A Random Forest model …
A) … is a supervised learning model.
B) … incorporates randomness to reduce overfitting.
C) … is composed of multiple decision trees.
D) … can be used for classification.

A

A, B, C, D

37
Q

Assume you have a classification task where you want to distinguish between cat and dog images. Which of the following is/are potentially meaningful data augmentations with respect to this data?
A) Applying input dropout.
B) Swapping target labels.
C) Adding images of wolves.
D) Adding a slight blur.

A

D

38
Q

Assume you have the following input of size 4x4: [[8 2 0 7],[0 3 3 3],[4 6 9 8],[5 7 4 1]]. What is the output after performing max pooling of size 2x2 with a stride of 2?
A) [[8 7], [7 9]]
B) [9]
C) [[8],[3],[9],[7]]
D) [[8 3 7],[6 9 9],[7 9 9]]

A

A
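The pooled output can be checked by sliding a 2x2 window over the input. A minimal pure-Python sketch; `max_pool_2d` is an illustrative helper and assumes a rectangular list-of-lists input without padding:

```python
def max_pool_2d(matrix, size=2, stride=2):
    """Take the maximum of each size x size window, moving by `stride` pixels."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


image = [[8, 2, 0, 7], [0, 3, 3, 3], [4, 6, 9, 8], [5, 7, 4, 1]]
print(max_pool_2d(image, size=2, stride=2))  # [[8, 7], [7, 9]]
```

With `stride=1` the same function yields the 3x3 result listed as option D, which is why the stride matters for the answer.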

39
Q

A convex function …
A) … always has a closed-form solution.
B) … usually occurs when training neural networks.
C) … only has one (global) minimum.
D) … sometimes has a closed-form solution.

A

C, D

40
Q

Which of the following statements is/are true about regression?
A) In regression, the target values must be between 0 and 1.
B) In regression, the target values are class labels.
C) In regression, the target values are numerical values.
D) In regression, the input values are used to predict the corresponding target values.

A

C, D

41
Q

Which of the following techniques can be used for image data augmentation?
A) Blurring.
B) Flipping.
C) Zooming/Cropping.
D) Adding random noise.

A

A, B, C, D

42
Q

Which of the following statements is/are true regarding terminology?
A) Parameters represent a concrete model (within some model class).
B) Model selection/training is the process of finding a model from the model class.
C) Hyperparameters control the model complexity or training procedure.
D) The feature vector matrix contains all samples from the dataset, i.e., all labeled data.

A

A, B, C

43
Q

Principal Component Analysis …
A) … enables visualization of high-dimensional data.
B) … is a dimensionality reduction method.
C) … is a clustering method.
D) … is a data augmentation method.

A

A, B

44
Q

The bias-variance trade-off is closely related to …
A) … empirical risk minimization.
B) … principal components.
C) … over- and underfitting.
D) … training and test sets.

A

C

45
Q

What is typical for a supervised machine learning task?
A) Learning a mapping from input to target values.
B) Learning with known input and target values.
C) Learning target values without knowing the input values.
D) Learning without knowing the input and target values.

A

A, B

46
Q

Given a list of unique words, what does one-hot encoding do?
A) It transforms each word into a unique number.
B) It transforms each word into a vector, where all entries are 1 except for the entry that represents a specific word which is set to 0.
C) It transforms each word into a vector, where all entries are 0 except for the entry that represents a specific word which is set to 1.
D) It transforms each word into a value between 0 and 1.

A

C

47
Q

Which of the following statements is/are true about a grayscale 8-bit image?
A) Every channel can encode 8 different values.
B) Every channel can encode 2^8 different values.
C) Can be converted to a color image without additional information.
D) Every pixel is represented by 8 channels.

A

B

48
Q

How many peaks are visible in a Fourier spectrum of a sine wave of 440 Hertz?
A) 440, one per Hertz.
B) Infinitely many.
C) None.
D) Only one.

A

D

49
Q

Given the following labeled sample, x = (0.9, 1.4, -2.5), y = 1. Which of the following statements is/are true?
A) There cannot be another sample with the same data.
B) y is called a label.
C) x is called a feature vector.
D) There are two classes, 0 and 1.

A

B, C

50
Q

Which of the following statements is/are true about data augmentation?
A) Data augmentation can be used to create/generate new samples.
B) Data augmentation can only be applied to image data.
C) Data augmentation can have a negative impact on generalization if done carelessly.
D) Every change to the input data is a useful data augmentation.

A

A, C

51
Q

Affinity Propagation …
A) … is a clustering method.
B) … needs a fixed number of cluster centers as input.
C) … is a dimensionality reduction method.
D) … does not need a fixed number of cluster centers as input.

A

A, D

52
Q

Given the following labeled dataset in tabular form (only the header is shown, y represents the class label column): | x0 | x1 | x2 | y | What is the dimensionality of this dataset?
A) 1.
B) 2.
C) 3.
D) 4.

A

C

53
Q

Which of the following statements is/are true about the generalization error?
A) It is straightforward to calculate if the distribution of future, unseen data is known.
B) It is straightforward to calculate if the loss function was chosen wisely.
C) It is defined as the expected error on the training data.
D) It is defined as the error on future, unseen data.

A

D*

54
Q

Which of the following statements is/are true about under- and overfitting?
A) If you run into overfitting, the model complexity is probably too high.
B) If you run into underfitting, the model most probably has problems fitting the training data.
C) If you run into overfitting, the model has most probably fitted the training data pretty well.
D) If you run into underfitting, the model complexity is probably too low.

A

A, B, C, D

55
Q

Which of the following statements is/are true about the test set method?
A) The test set is used to estimate the risk.
B) The underlying assumption is that the problem at hand has to be a classification problem.
C) Empirical risk minimization is performed on the training set.
D) The underlying assumption is that samples are identically and independently distributed (i.i.d.).

A

A, C, D

56
Q

What does a Fourier transform of a sound signal do?
A) It clusters the constituent frequencies of the signal.
B) It randomly samples the constituent frequencies from the signal.
C) It decomposes the signal into its constituent frequencies.
D) It downprojects the constituent frequencies of the signal.

A

C

57
Q

Which of the following statements is/are true about classification?
A) In classification, the target values are class labels.
B) In classification, there should be at least two different classes.
C) In classification, the target values are numerical values.
D) In classification, the target values cannot be numbers.

A

A, B

58
Q

Which of the following statements is/are true about labeled datasets?
A) A labeled dataset contains only the samples.
B) A labeled dataset contains both the samples as well as their corresponding targets/labels.
C) A labeled dataset can only be tabular data.
D) Datasets from real-world scenarios are always labeled.

A

B

59
Q

Which of the following is/are meaningful data augmentations?
A) Rotating an image by 270 degrees.
B) Rotating an image by 360 degrees.
C) Rotating an image by 180 degrees.
D) Rotating an image by 90 degrees.

A

A, C, D

60
Q

What is a hyperparameter in the k-nearest-neighbor classification algorithm?
A) The input features of the nearest neighbors.
B) The number of nearest neighbors.
C) The number of principal components of the nearest neighbors.
D) The class labels of the nearest neighbors.

A

B

61
Q

Which of the following statements is/are true about the ReLU activation function?
A) ReLU sets all positive values to zero.
B) ReLU sets all negative values to zero.
C) ReLU leaves all positive values unchanged.
D) ReLU leaves all negative values unchanged.

A

B, C
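Both true statements follow directly from the definition ReLU(x) = max(0, x). A short sketch with arbitrary example values:

```python
def relu(values):
    """Element-wise ReLU: negatives become zero, positives pass through unchanged."""
    return [max(0.0, v) for v in values]


print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0.0, 0.0, 0.0, 1.5, 3.0]
```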

62
Q

Which of the following statements is/are true about convolutions?
A) A convolutional layer in a neural network involves a kernel.
B) A convolutional layer is often followed by an activation function and a pooling layer.
C) A convolution operation in a neural network always keeps the shape of the inputs unchanged.
D) A convolution is a mathematical operation on two tensors.

A

A, B, D

63
Q

Which of the following statements is/are true about a 2x2 max-pooling layer in a convolutional neural network?
A) It is a form of non-linear downsampling.
B) It takes the maximum value of 2x2 input values.
C) It will lead to loss of information.
D) It aggregates information from all channels into 2x2 = 4 scalars.

A

A, B, C

64
Q

What are strengths of frameworks like PyTorch?
A) Developed without any influence from industry.
B) Automatic differentiation.
C) Easy switching of computations between CPU and GPU.
D) Straightforward construction of neural networks.

A

B, C, D

65
Q

Which of the following statements is/are true about the mean-squared error?
A) The mean-squared error is a suitable criterion for splitting into training and test datasets.
B) The mean-squared error is a suitable loss function for regression tasks.
C) The mean-squared error is the mean of the squared differences between model predictions and the corresponding target values.
D) The mean-squared error is a suitable loss function for classification tasks.

A

B, C
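Statement C is the definition itself and translates directly into code. A minimal sketch with made-up predictions and targets; `mean_squared_error` here is a hand-written helper, not a library call:

```python
def mean_squared_error(predictions, targets):
    """Mean of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)


print(mean_squared_error([2.0, 4.0], [1.0, 6.0]))  # 2.5
```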

66
Q

Which of the following statements is/are true?
A) The softmax, instead of the logistic function, can be used if more than 2 different target classes exist.
B) Logistic regression always applies a softmax function on top of a linear regression model.
C) Linear regression is a regression model because it estimates the probability of class membership.
D) Logistic regression is a regression model because it estimates the probability of class membership.

A

A

67
Q

Which of the following statements is/are true about gradient descent?
A) Gradient descent is commonly used when training neural networks.
B) Gradient descent is an optimization algorithm.
C) Gradient descent is not guaranteed to find the global minimum.
D) Gradient descent takes steps in the direction of the negative gradient of the function to minimize.

A

A, B, C, D

68
Q

Residual connections …
A) … reduce the spatial size of the input.
B) … create shortcuts for gradients.
C) … can only be used in convolutional neural networks.
D) … allow the creation of deeper neural networks while maintaining trainability.

A

B, D

69
Q

The logistic function y = sigma(x) …
A) … is linear.
B) … has output values y between -1 and +1.
C) … has output values y between 0 and +1.
D) … is non-linear.

A

C, D

70
Q

Which of the following statements is/are true about a vanishing gradient?
A) Vanishing gradient can also occur in random forest models.
B) Vanishing gradient is a desired behavior when training neural networks.
C) The error signal backpropagated through the network vanishes.
D) The stronger the vanishing gradient effect, the better for training.

A

C

71
Q

Downsampling of an input image may be achieved by …
A) … using a flat layer after a convolutional layer.
B) … 1x1 max-pooling.
C) … 3x1 max-pooling.
D) … 3x3 max-pooling.

A

C, D

72
Q

Which of the following statements is/are true about a regression task?
A) The mean-squared error is a suitable loss function.
B) The cross-entropy error is a suitable loss function.
C) The target is a categorical value.
D) The target is a numerical value.

A

A, D

73
Q

Which of the following statements is/are true?
A) Data augmentation during training may reduce overfitting.
B) The bigger the learning rate, the better the resulting validation performance.
C) Residual connections do not make sense in convolutional networks.
D) Dropout during training may reduce overfitting.

A

A, D

74
Q

Which of the following statements is/are true about the cross-entropy error?
A) The cross-entropy error is a suitable loss function for classification tasks.
B) The cross-entropy error is a suitable criterion for splitting into training and test datasets.
C) The cross-entropy is an error measure between model predictions and the corresponding target values.
D) The cross-entropy error is a suitable loss function for regression tasks.

A

A, C

75
Q

A linear regression model …
A) … does not have a closed-form solution.
B) … is, e.g., a polynomial of degree 0.
C) … has a closed-form solution.
D) … is, e.g., a polynomial of degree 3.

A

B, C

76
Q

Which of the following statements is/are true about the concept of ‘strides’ in convolutional networks?
A) The stride specifies the number of pixels by which a filter is shifted.
B) Striding cannot be used in conjunction with pooling.
C) The stride specifies the number of pixels to be aggregated by a filter.
D) The stride influences the output size of a convolutional layer.

A

A, D

77
Q

Dropout …
A) … is always used during training as well as validation time.
B) … increases the validation loss in order to decrease the training loss.
C) … is only used during validation time.
D) … may increase the validation performance.

A

D

78
Q

A non-convex function …
A) … has at least one global minimum but potentially several local minima.
B) … is uncommon in deep learning.
C) … usually occurs when training neural networks.
D) … has a closed-form solution.

A

A, C

79
Q

Which of the following modules is normally NOT found in a fully-connected neural network?
A) A max-pooling layer.
B) A convolutional layer.
C) A ReLU activation function.
D) A linear layer.

A

A, B

80
Q

Which of the following statements is/are true about pretrained models?
A) Pretrained models can be directly used for every task without having to adjust their architecture.
B) Using pretrained models always improves the prediction performance.
C) Using pretrained models might improve the prediction performance.
D) Pretrained models might be biased.

A

C, D

81
Q

Why do convolutional neural networks (CNNs) generally perform well on image data?
A) Because all CNNs are pretrained on image data.
B) Because CNNs utilize fully-connected layers at the end of the model architecture.
C) Because CNNs can deal with high-dimensional data.
D) Because CNNs take advantage of the ‘local structure’ in image data (neighboring pixels are often highly correlated).

A

D