Questions Flashcards

Question 1

Q

Which of the following statements is/are true about word embeddings?
A) Pre-learned embeddings exist.
B) Embeddings can potentially capture additional information compared to a one-hot encoded representation.
C) Embeddings are useful for large dictionary/vocabulary sizes.
D) A word vector of an embedding has the same size as the word vector of a one-hot encoded representation, given a fixed-size dictionary/vocabulary.

Question 2

Q

k-means …
A) … does not need a fixed number of cluster centers as input.
B) … needs a fixed number of cluster centers as input.
C) … is a dimensionality reduction method.
D) … is a clustering method.

Question 3

Q

Given an input of size 15x15 and a kernel size of 5x5 with a stride of 1, what is the output size after the convolution operation?
A) 14x14
B) 11x11
C) 10x10
D) 13x13
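
The output size in Question 3 can be checked with a minimal sketch. It assumes no padding and no dilation, since the question mentions neither, and the helper name `conv_output_size` is hypothetical. Per dimension, the output length is floor((n + 2p - k) / s) + 1.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension of a convolution (no dilation)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 15x15 input, 5x5 kernel, stride 1, no padding (assumed)
side = conv_output_size(15, 5, stride=1)
print(f"{side}x{side}")  # prints 11x11
```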

Question 4

Q

Which of the following statements is/are true about an 8-bit grayscale image?
A) Can be converted into an RGB image without additional information.
B) It has only a single channel (brightness).
C) Every pixel is represented by 8 channels.
D) The channel information size is 8 bits, which means that 8 values can be stored.

Question 5

Q

Which of the following statements is/are true about the term ‘hyperparameters’?
A) There are models without any hyperparameters.
B) Hyperparameters are user-specifiable settings that control the model complexity or the training.
C) Hyperparameters can strongly influence the final model performance.
D) Hyperparameters are those model parameters that are adjusted during training.

Question 6

Q

Which of the following statements is/are true about loss functions?
A) Loss functions are used to obtain the final model prediction.
B) The output of loss functions is in the range [0, 1].
C) Loss functions can have an impact on the training process.
D) Loss functions are used to measure the difference between a model prediction and the true target.

Question 7

Q

Which of the following is/are useful loss functions for regression problems?
A) Cross entropy
B) Softmax
C) Sigmoid
D) Mean-squared error

Question 8

Q

Standard gradient descent performs an update step based on some step size/learning rate η. Which of the following statements is/are true?
A) If η is negative, we would move in the opposite direction (gradient ascent).
B) If η is too small, the update progress can be very slow.
C) If η is too large, the algorithm might not properly converge to some minimum.
D) If η is 0, no update is performed at all.

Answer

A

A, B, C, D
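
The four cases in Question 8 can be tried out on a toy problem. The sketch below assumes the objective f(w) = w^2 (gradient 2w), a start value of 1.0, and 20 steps; all of these are illustrative choices, not part of the question.

```python
def gradient_descent(eta: float, w0: float = 1.0, steps: int = 20) -> float:
    """Run fixed-step gradient descent on f(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        gradient = 2 * w          # derivative of w**2
        w = w - eta * gradient    # standard update rule
    return w


# eta = 0: no update; tiny eta: slow progress; moderate eta: converges;
# large eta: diverges; negative eta: moves away from the minimum (ascent)
for eta in (0.0, 0.001, 0.1, 1.5, -0.1):
    print(f"eta={eta:>6}: final w = {gradient_descent(eta):.4f}")
```

The minimum of this toy objective is at w = 0, so a final value close to 0 indicates convergence.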

Question 9

Q

Which of the following is/are commonly used activation functions?
A) Cross entropy
B) Sigmoid
C) Tanh
D) ReLU

Question 10

Q

Logistic regression …
A) … has an output in the range [0, 1].
B) … is a regression model.
C) … is never a good model choice.
D) … is a classification model.

Question 11

Q

Which of the following statements is/are true about pretrained models?
A) Using pretrained models might improve the prediction performance.
B) Pretrained models can be directly used for every task without having to adjust their architecture.
C) Using pretrained models always improves the prediction performance.
D) Pretrained models might be biased.

Question 12

Q

Which aspects have to be taken into consideration when dealing with high-dimensional input data?
A) Often difficult to visualize.
B) More features take up more space in memory.
C) Dimensionality reduction techniques might be useful.
D) More features might lead to longer model training times.

Answer

A

A, B, C, D

Question 13

Q

Consider the following vocabulary in the fixed order: cat dog wolf cow. Which of the following one-hot-encodings is the correct one for the word ‘wolf’?
A) (1, 1, 0, 1)
B) (3)
C) (1, 2, 3, 4)
D) (0, 0, 1, 0)
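
One-hot encoding for the vocabulary in Question 13 can be sketched as follows. The function name `one_hot` is hypothetical, and the vocabulary order is taken from the question.

```python
def one_hot(word: str, vocabulary: list[str]) -> tuple[int, ...]:
    """Return a vector with a 1 at the word's position and 0 everywhere else."""
    if word not in vocabulary:
        raise ValueError(f"{word!r} is not in the vocabulary")
    return tuple(1 if entry == word else 0 for entry in vocabulary)


vocabulary = ["cat", "dog", "wolf", "cow"]  # fixed order from the question
print(one_hot("wolf", vocabulary))  # prints (0, 0, 1, 0)
```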

Question 14

Q

Assume you have the following input text that you want to encode with one-hot-encoding: ‘a cat and a dog and a wolf’. What is the dictionary/vocabulary size?
A) 6
B) 7
C) 5
D) 8
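
The vocabulary size in Question 14 is the number of distinct words, not the number of tokens. A minimal sketch, assuming whitespace tokenization:

```python
text = "a cat and a dog and a wolf"

tokens = text.split()             # whitespace tokenization (assumed)
vocabulary = sorted(set(tokens))  # distinct words only

print(len(tokens))      # prints 8
print(vocabulary)       # prints ['a', 'and', 'cat', 'dog', 'wolf']
print(len(vocabulary))  # prints 5
```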

Question 15

Q

The bias-variance tradeoff …
A) … is about finding the best ratio of training set size vs. test set size.
B) … is about finding the most underfitting and most overfitting model.
C) … is about finding the best loss functions.
D) … is about finding a compromise between model underfitting and overfitting.

Question 16

Q

Which of the following statements is/are true about convolutional neural networks (CNNs)?
A) Because of 2D input data, CNNs cannot be trained using gradient descent.
B) CNNs are the same as fully-connected neural networks, just for 2D data.
C) Weight sharing is an essential part of CNNs.
D) CNNs take advantage of the ‘local structure’ in image data (neighboring pixels are often highly correlated).

Question 17

Q

In the forward pass of a neural network, the input vector is …
A) … passed through an element-wise non-linearity, added to bias weights and multiplied by a weight matrix.
B) … added to bias weights, multiplied by a weight matrix and passed through an element-wise non-linearity.
C) … multiplied by a weight matrix, added to bias weights and passed through an element-wise non-linearity.
D) … passed through an element-wise non-linearity, multiplied by a weight matrix and added to bias weights.
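
The usual order of operations in one layer of a forward pass (weight matrix, then bias, then non-linearity) can be sketched in plain Python. The names `dense_layer` and `relu` and all numbers are illustrative assumptions.

```python
def relu(z: float) -> float:
    """Element-wise non-linearity used in this sketch."""
    return max(0.0, z)


def dense_layer(x, weights, bias, activation):
    """Compute activation(W x + b) for a single fully-connected layer."""
    pre_activation = [
        sum(w * xi for w, xi in zip(row, x)) + b  # multiply by weights, add bias
        for row, b in zip(weights, bias)
    ]
    return [activation(z) for z in pre_activation]  # non-linearity comes last


x = [1.0, -2.0]
weights = [[0.5, 0.25], [-1.0, 1.0]]
bias = [0.5, 0.0]
print(dense_layer(x, weights, bias, relu))  # prints [0.5, 0.0]
```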

Question 18

Q

Which of the following statements is/are true about the logistic function (sigmoid)?
A) It is a common loss function.
B) It is used in logistic regression.
C) It introduces non-linearity.
D) It is used in linear regression.

Question 19

Q

Assume a multi-class classification problem with four classes (1, 2, 3, 4). Further assume that you have a model with a softmax function at the end which produced (0.3, 0.32, 0.35, 0.03). Which class should be chosen as the final classification prediction?
A) Class 4
B) Class 3
C) Class 2
D) Class 1
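
Picking the final class from the softmax output in Question 19 amounts to an argmax over the probabilities. A minimal sketch using the values from the question:

```python
classes = (1, 2, 3, 4)
probabilities = (0.3, 0.32, 0.35, 0.03)  # softmax output from the question

# Index of the largest probability (argmax)
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
predicted_class = classes[best_index]
print(predicted_class)  # prints 3
```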

Question 20

Q

Which of the following statements is/are true about the softmax function?
A) The sum of all outputs equals 1.
B) It is suitable for multi-class classification problems.
C) It is a generalization of the sigmoid function.
D) The output is always 1 for the predicted class and 0 for all others.
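
Two of the softmax properties in Question 20 can be checked numerically. The sketch below assumes arbitrary example logits; it shows that the outputs sum to 1 and that a two-class softmax over the logits (z, 0) matches the sigmoid of z.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)  # subtracting the maximum avoids overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


outputs = softmax([2.0, 1.0, 0.1])  # example logits (assumed)
print(sum(outputs))                 # approximately 1.0

z = 0.7
print(softmax([z, 0.0])[0], sigmoid(z))  # the two values agree
```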

Question 21

Q

Which of the following statements is/are true about padding in convolutional neural networks?
A) Padding is optional.
B) Padding can only be applied to the original input data, i.e., before the first network layer.
C) Padding can be used to keep the input size and output size the same.
D) Padding of size n is the same as using a kernel that is smaller by n compared to a bigger kernel.

Question 22

Q

In a fully-connected neural network …
A) … activation functions should be used in between layers to prevent multiple linear transformations from collapsing into a single one.
B) … all inputs are connected to all nodes of the following layer.
C) … the output layer is used for the final model prediction.
D) … each hidden layer can have arbitrarily many nodes.

Answer

A

A, B, C, D

Question 23

Q

Batch normalization …
A) … is not applicable in convolutional neural networks.
B) … is only used in the last network layer.
C) … is performed for each mini-batch of training samples.
D) … is performed once for the dataset before training the network.

Question 24

Q

Which problems might arise when data augmentation is not done carefully?
A) The input data might no longer correlate with/represent the original target values.
B) The model performance might be worse than without augmentation.
C) There are no problems, data augmentation is always safe.
D) The target values might change too much.

Question 25

Q

What is meant by the term ‘underfitting’?
A) A model fits the training data (too) well but not the test data.
B) A model neither fits the training nor the test data well.
C) A model with too few hyperparameters was selected.
D) A model fits the training and the test data (too) well.

Question 26

Q

Which techniques can be used to potentially improve a neural network model in terms of prediction performance?
A) Loss function schedules
B) Deep networks
C) Hyperparameter augmentation
D) Batch normalization

Question 27

Q

Which of the following statements is/are true regarding the receptive field in convolutional neural networks?
A) The receptive field always remains constant throughout the depth of the network.
B) The receptive field is the (part of the) input that is connected to a node/neuron.
C) The receptive field is closely related to the terms ‘kernel’ or ‘filter’.
D) The receptive field is often bigger than the original input size.

Question 28

Q

Assume you have grayscale images with width=20 and height=20. What is the dimensionality when you want to train a model with such input data?
A) 400
B) 1200
C) 20
D) 40
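
The input dimensionality in Question 28 can be illustrated by flattening an image into a feature vector. The sketch assumes a single channel (grayscale) and a model that takes one flat vector per image; the pixel values are placeholders.

```python
width, height = 20, 20

# Placeholder grayscale image: one brightness value per pixel (single channel)
image = [[0 for _ in range(width)] for _ in range(height)]

# Flatten row by row into one feature vector
feature_vector = [pixel for row in image for pixel in row]
print(len(feature_vector))  # prints 400
```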

Question 29

Q

Which of the following statements is/are true about empirical risk minimization (ERM)?
A) ERM is typically performed on a dedicated test set.
B) ERM is a method of hyperparameter optimization.
C) ERM is typically performed on a dedicated training set.
D) ERM is a method of estimating the generalization error/risk.

Question 30

Q

Considering labeled tabular data, assume you have a feature vector x and a target y for each table entry. Which of the following statements is/are true?
A) y can be numerical.
B) The x of one table entry might be identical to the x of another table entry.
C) y can be a class label.
D) x and y together form a sample.

Answer

A

A, B, C, D

Question 31

Q

t-distributed stochastic neighbor embedding (t-SNE) …
A) … is a dimensionality reduction method.
B) … is a data augmentation method.
C) … enables visualization of high-dimensional data.
D) … is a clustering method.

Question 32

Q

Which of the following statements is/are true about the result of loss functions (the ‘loss’)?
A) Typically, the higher the loss, the better the prediction.
B) When comparing the loss of two different loss functions, one should choose the function that yielded the lower loss.
C) Different loss functions might have different loss value ranges.
D) Typically, the lower the loss, the better the prediction.

Question 33

Q

Question 34

Q

Which of the following statements is/are true about classification?
A) In classification, the target values are numerical values.
B) In classification, the target values are class labels.
C) In classification, there should be at least two different classes.
D) In classification, the target values cannot be numbers.

Question 35

Q

Assume you have an n-dimensional input that you want to apply to a logistic regression model. Which of the following statements is/are true?
A) The weights of the logistic regression model are multiplied with the input, a bias is added, the logistic function (sigmoid) is applied, and the result is the final model output.
B) The weights of the logistic regression model are multiplied with the input, a bias is added, and the result is the final model output.
C) The weights of the logistic regression model must be n-dimensional as well.
D) The number of computations is independent of n since it is still only a single layer in the logistic regression model.
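
The computation described in Question 35 can be sketched for an n-dimensional input. The function name `logistic_regression` and the example numbers are assumptions; note that the weight vector needs one entry per input dimension, so the work grows with n.

```python
import math


def logistic_regression(x, weights, bias: float) -> float:
    """Return sigmoid(w . x + b) for an n-dimensional input x."""
    if len(x) != len(weights):
        raise ValueError("weights must have the same dimension as the input")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # n multiplications
    return 1.0 / (1.0 + math.exp(-z))                    # logistic function


x = [0.9, 1.4, -2.5]       # example input with n = 3
weights = [0.2, -0.1, 0.4]  # illustrative weights, also 3-dimensional
output = logistic_regression(x, weights, bias=0.1)
print(output)  # a value strictly between 0 and 1
```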

Question 36

Q

A Random Forest model …
A) … is a supervised learning model.
B) … incorporates randomness to reduce overfitting.
C) … is composed of multiple decision trees.
D) … can be used for classification.

Answer

A

A, B, C, D

Question 37

Q

Assume you have a classification task where you want to distinguish between cat and dog images. Which of the following is/are potentially meaningful data augmentations with respect to this data?
A) Applying input dropout.
B) Swapping target labels.
C) Adding images of wolves.
D) Adding a slight blur.

Question 38

Q

Assume you have the following input of size 4x4: [[8 2 0 7],[0 3 3 3],[4 6 9 8],[5 7 4 1]]. What is the output after performing max pooling of size 2x2 with a stride of 2?
A) [[8 7], [7 9]]
B) [9]
C) [[8],[3],[9],[7]]
D) [[8 3 7],[6 9 9],[7 9 9]]
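
The max pooling in Question 38 can be reproduced with a short sketch. The helper name `max_pool_2d` is hypothetical, and no padding is assumed.

```python
def max_pool_2d(matrix, size: int = 2, stride: int = 2):
    """Max pooling over a 2D list without padding."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


grid = [
    [8, 2, 0, 7],
    [0, 3, 3, 3],
    [4, 6, 9, 8],
    [5, 7, 4, 1],
]
print(max_pool_2d(grid, size=2, stride=2))  # prints [[8, 7], [7, 9]]
```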

Question 39

Q

A convex function …
A) … always has a closed-form solution.
B) … usually occurs when training neural networks.
C) … only has one (global) minimum.
D) … sometimes has a closed-form solution.

Question 40

Q

Which of the following statements is/are true about regression?
A) In regression, the target values must be between 0 and 1.
B) In regression, the target values are class labels.
C) In regression, the target values are numerical values.
D) In regression, the input values are used to predict the corresponding target values.

Question 41

Q

Which of the following techniques can be used for image data augmentation?
A) Blurring.
B) Flipping.
C) Zooming/Cropping.
D) Adding random noise.

Answer

A

A, B, C, D

Question 42

Q

Which of the following statements is/are true regarding terminology?
A) Parameters represent a concrete model (within some model class).
B) Model selection/training is the process of finding a model from the model class.
C) Hyperparameters control the model complexity or training procedure.
D) The feature vector matrix contains all samples from the dataset, i.e., all labeled data.

Question 43

Q

Principal Component Analysis …
A) … enables visualization of high-dimensional data.
B) … is a dimensionality reduction method.
C) … is a clustering method.
D) … is a data augmentation method.

Question 44

Q

The bias-variance trade-off is closely related to …
A) … empirical risk minimization.
B) … principal components.
C) … over- and underfitting.
D) … training and test sets.

Question 45

Q

What is typical for a supervised machine learning task?
A) Learning a mapping from input to target values.
B) Learning with knowing the input and target values.
C) Learning target values without knowing the input values.
D) Learning without knowing the input and target values.

Question 46

Q

Given a list of unique words, what does one-hot encoding do?
A) It transforms each word into a unique number.
B) It transforms each word into a vector, where all entries are 1 except for the entry that represents a specific word which is set to 0.
C) It transforms each word into a vector, where all entries are 0 except for the entry that represents a specific word which is set to 1.
D) It transforms each word into a value between 0 and 1.

Question 47

Q

Which of the following statements is/are true about a grayscale 8-bit image?
A) Every channel can encode 8 different values.
B) Every channel can encode 2^8 different values.
C) Can be converted to a color image without additional information.
D) Every pixel is represented by 8 channels.

Question 48

Q

How many peaks are visible in a Fourier spectrum of a sine wave of 440 Hertz?
A) 440, one per Hertz.
B) Infinitely many.
C) None.
D) Only one.
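
The spectrum of a pure 440 Hz sine can be inspected with a naive discrete Fourier transform. The sketch assumes a sample rate of 2000 Hz and 200 samples (so the sine fits a whole number of periods), looks only at non-negative frequencies, and treats any bin above half the maximum magnitude as a peak. All of these are illustrative choices.

```python
import cmath
import math

sample_rate = 2000  # Hz (assumed)
n = 200             # number of samples, giving 10 Hz per frequency bin
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]


def dft_magnitudes(samples):
    """Magnitudes of the one-sided DFT (bins 0 .. N/2), computed naively."""
    count = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / count)
                for i, s in enumerate(samples)))
        for k in range(count // 2 + 1)
    ]


magnitudes = dft_magnitudes(signal)
threshold = 0.5 * max(magnitudes)
peak_frequencies = [k * sample_rate / n
                    for k, m in enumerate(magnitudes) if m > threshold]
print(peak_frequencies)  # prints [440.0]
```

A full two-sided spectrum would also show the mirrored component at the negative frequency; the one-sided view above is the one usually plotted.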

Question 49

Q

Given the following labeled sample, x = (0.9, 1.4, -2.5), y = 1. Which of the following statements is/are true?
A) There cannot be another sample with the same data.
B) y is called a label.
C) x is called a feature vector.
D) There are two classes, 0 and 1.

Question 50

Q

Which of the following statements is/are true about data augmentation?
A) Data augmentation can be used to create/generate new samples.
B) Data augmentation can only be applied to image data.
C) Data augmentation can have a negative impact on generalization if done carelessly.
D) Every change to the input data is a useful data augmentation.

Question 51

Q

Affinity Propagation …
A) … is a clustering method.
B) … needs a fixed number of cluster centers as input.
C) … is a dimensionality reduction method.
D) … does not need a fixed number of cluster centers as input.

Question 52

Q

Given the following labeled dataset in tabular form (only the header is shown, y represents the class label column): | x0 | x1 | x2 | y | What is the dimensionality of this dataset?
A) 1.
B) 2.
C) 3.
D) 4.

Question 53

Q

Which of the following statements is/are true about the generalization error?
A) It is straightforward to calculate if the distribution of future, unseen data is known.
B) It is straightforward to calculate if the loss function was chosen wisely.
C) It is defined as the expected error on the training data.
D) It is defined as the error on future, unseen data.

Question 54

Q

Which of the following statements is/are true about under- and overfitting?
A) If you run into overfitting, the model complexity is probably too high.
B) If you run into underfitting, the model most probably has problems fitting the training data.
C) If you run into overfitting, the model has most probably fit the training data quite well.
D) If you run into underfitting, the model complexity is probably too low.

Answer

A

A, B, C, D

Question 55

Q

Which of the following statements is/are true about the test set method?
A) The test set is used to estimate the risk.
B) The underlying assumption is that the problem at hand has to be a classification problem.
C) Empirical risk minimization is performed on the training set.
D) The underlying assumption is that samples are independent and identically distributed (i.i.d.).

Question 56

Q

What does a Fourier transform of a sound signal do?
A) It clusters the constituent frequencies of the signal.
B) It randomly samples the constituent frequencies from the signal.
C) It decomposes the signal into its constituent frequencies.
D) It downprojects the constituent frequencies of the signal.

Question 57

Q

Which of the following statements is/are true about classification?
A) In classification, the target values are class labels.
B) In classification, there should be at least two different classes.
C) In classification, the target values are numerical values.
D) In classification, the target values cannot be numbers.

Question 58

Q

Which of the following statements is/are true about labeled datasets?
A) A labeled dataset contains only the samples.
B) A labeled dataset contains both the samples as well as their corresponding targets/labels.
C) A labeled dataset can only be tabular data.
D) Datasets from real-world scenarios are always labeled.

Question 59

Q

Which of the following is/are meaningful data augmentations?
A) Rotating an image by 270 degrees.
B) Rotating an image by 360 degrees.
C) Rotating an image by 180 degrees.
D) Rotating an image by 90 degrees.

Question 60

Q

What is a hyperparameter in the k-nearest-neighbor classification algorithm?
A) The input features of the nearest neighbors.
B) The number of nearest neighbors.
C) The number of principal components of the nearest neighbors.
D) The class labels of the nearest neighbors.

Question 61

Q

Which of the following statements is/are true about the ReLU activation function?
A) ReLU sets all positive values to zero.
B) ReLU sets all negative values to zero.
C) ReLU leaves all positive values unchanged.
D) ReLU leaves all negative values unchanged.
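
The behavior of ReLU described in Question 61 can be checked directly from its definition, max(0, z). The example values are arbitrary.

```python
def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


values = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(v) for v in values])  # prints [0.0, 0.0, 0.0, 1.5, 3.0]
```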

Question 62

Q

Which of the following statements is/are true about convolutions?
A) A convolutional layer in a neural network involves a kernel.
B) A convolutional layer is often followed by an activation function and a pooling layer.
C) A convolution operation in a neural network always keeps the shape of the inputs unchanged.
D) A convolution is a mathematical operation on two tensors.

Question 63

Q

Which of the following statements is/are true about a 2x2 max-pooling layer in a convolutional neural network?
A) It is a form of non-linear downsampling.
B) It takes the maximum value of 2x2 input values.
C) It will lead to loss of information.
D) It aggregates information from all channels into 2x2 = 4 scalars.

Question 64

Q

What are strengths of frameworks like PyTorch?
A) Developed without any influence by industry.
B) Automatic differentiation.
C) Easy switching of computations between CPU and GPU.
D) Straightforward construction of neural networks.