Previous Questions Flashcards

1
Q

What is a pooling layer? What does it do?

A
  • Usually placed after a convolutional layer: a downsampling operation that reduces the spatial dimensions of the input feature map
  • Use
    o Extracts the most important features
    o Reduces computational complexity
    o Reduces overfitting
    o Neighboring pixels are strongly correlated, so it makes sense to combine them (see the sketch below)
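
A minimal sketch of what a pooling layer does, assuming TensorFlow/Keras (the shapes are illustrative):

```python
# Minimal sketch, assuming TensorFlow/Keras; shapes are illustrative.
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 8))                # one 28x28 feature map with 8 channels
pool = tf.keras.layers.MaxPooling2D(pool_size=2)    # 2x2 max pooling, stride 2
y = pool(x)
print(y.shape)  # (1, 14, 14, 8): spatial dimensions halved, channels unchanged
```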
2
Q

Would it be good practice to multiply the number of features by 2 after every pooling layer?

A
  • Images lose their spatial resolution when going through a regular CNN, which can be an issue for semantic segmentation / object detection. Solution: recover the spatial information that was lost in earlier pooling layers by upsampling the output image by a factor of 2 (see the sketch below).
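
A minimal sketch of upsampling a feature map by a factor of 2, assuming TensorFlow/Keras (the shapes are illustrative):

```python
# Minimal sketch, assuming TensorFlow/Keras; shapes are illustrative.
import tensorflow as tf

x = tf.random.normal((1, 14, 14, 16))        # a downsampled feature map
up = tf.keras.layers.UpSampling2D(size=2)    # repeats rows/columns -> factor 2
# A learnable alternative: tf.keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same")
y = up(x)
print(y.shape)  # (1, 28, 28, 16): spatial resolution doubled
```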
3
Q

For a neural network, what works best against overfitting: Ridge regularization, dropout, or early stopping?

A

* Answer: Dropout (a Keras sketch follows the explanation below)
* Explanation:
o CNN: weights are organized in filter kernels that slide across the input image to extract features -> the filters are shared across different spatial locations of the input.
o Ridge regularization: adds a penalty term to each individual weight in the network.
o Ridge regularization: disrupts the sharing of weights and the spatial relationships captured by the filters.
o Adding a penalty to each weight independently potentially alters the balance and importance of the shared weights; the regularization penalty may affect the weights differently at different spatial locations -> it undermines the shared knowledge encoded in the weights; it also encourages all weights to be small.
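
A minimal sketch of dropout in a network, assuming TensorFlow/Keras (the architecture and rate are illustrative):

```python
# Minimal sketch, assuming TensorFlow/Keras; architecture and dropout rate are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),                    # randomly drops 30% of activations, training only
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```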

4
Q

Why is dropout preferred over early stopping in a neural network?

A

o “Neurons trained with dropout cannot co-adapt with their neighboring neurons; they have to be as useful as possible on their own. They also cannot rely excessively on just a few input neurons; they must pay attention to each of their input neurons. They end up being less sensitive to slight changes in the inputs. In the end, you get a more robust network that generalizes better.”
o “A unique neural network is generated at each training step. Since each neuron can be either present or absent, there are a total of 2^N possible networks (where N is the total number of droppable neurons). This is such a huge number that it is virtually impossible for the same neural network to be sampled twice. Once you have run 10,000 training steps, you have essentially trained 10,000 different neural networks (each with just one training instance). These neural networks are obviously not independent because they share many of their weights, but they are nevertheless all different. The resulting neural network can be seen as an averaging ensemble of all these smaller neural networks.”
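
A tiny NumPy illustration (my own sketch, not code from the quoted book) of the point that a different sub-network is sampled at every training step:

```python
# Minimal sketch, assuming NumPy; shows that every step drops a different random subset of neurons.
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones((1, 8))                    # toy layer output with 8 neurons
rate = 0.5
for step in range(3):
    mask = rng.random(activations.shape) > rate  # each neuron kept with probability 1 - rate
    dropped = activations * mask / (1 - rate)    # inverted dropout: rescale the kept activations
    print(step, mask.astype(int))                # a different "sub-network" at every step
```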

5
Q

How would you handle underfitting when using dropout? (i.e. what should you do if you use dropout and the model underfits?)

A
  • Decrease the dropout rate
  • Related question:
    o If the model underfits: decrease or increase dropout? -> decrease
  • “If you observe that the model is overfitting, you can increase the dropout rate. Conversely, you should try decreasing the dropout rate if the model underfits the training set. It can also help to increase the dropout rate for large layers, and reduce it for small ones. Moreover, many state-of-the-art architectures only use dropout after the last hidden layer, so you may want to try this if full dropout is too strong. Dropout does tend to significantly slow down convergence, but it usually results in a much better model when tuned properly. So, it is generally well worth the extra time and effort.”
6
Q

Why use both early stopping & dropout? Is it good to use both at the same time?

A
  • GPT: using both can be effective because they address different aspects of overfitting
  • Early stopping helps control the capacity of the model by stopping training before it starts overfitting
  • Dropout introduces randomness during training, preventing the network from relying too heavily on any specific set of features or neurons (see the sketch below)
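
A minimal sketch of combining both, assuming TensorFlow/Keras (X_train/y_train are placeholder names):

```python
# Minimal sketch, assuming TensorFlow/Keras; layer sizes are illustrative.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True   # stop once validation loss stops improving
)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.2),                                # dropout regularizes every training step
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```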
7
Q

What would you choose for image dimensionality reduction: PCA or SVD? (a deep question about the underlying reasons)

A
  • SVD for images, PCA for text classification -> rule of thumb: PCA for numeric data and SVD for image data
  • SVD takes the matrix directly as input (computationally less expensive), without requiring the calculation of the covariance matrix as PCA does
  • PCA focuses on linear relationships between variables, which can be derived for images but is a bit counterintuitive; it is more natural for numerical inputs with actual variables and observations (see the sketch below)
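
A minimal sketch contrasting the two, assuming scikit-learn (the "image" matrix is random, for illustration only):

```python
# Minimal sketch, assuming scikit-learn; the image matrix is illustrative random data.
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

X = np.random.rand(100, 64 * 64)      # 100 flattened 64x64 "images"

pca = PCA(n_components=50)            # centers the data (equivalent to eigendecomposition of the covariance matrix)
X_pca = pca.fit_transform(X)

svd = TruncatedSVD(n_components=50)   # factorizes X directly, no covariance matrix needed
X_svd = svd.fit_transform(X)

print(X_pca.shape, X_svd.shape)       # (100, 50) (100, 50)
```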
8
Q

PCA: how are the PCs calculated? How does it work?

A
  • Calculate the covariance matrix
  • Calculate & order the eigenvalues & eigenvectors of the covariance matrix
    o Eigenvalues (lambda) = amount of variance along each PC
    o Eigenvectors = directions of the axes along which most of the variance lies
  • Choose the number of dimensions
    o Calculate the proportion of variance explained: eigenvalue / sum of all eigenvalues
    o Decide on a threshold
  • Create the feature vector (a matrix with the chosen eigenvectors as columns) & rotate the dataset (in practice done via SVD); see the sketch below
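
A minimal sketch of these steps, assuming NumPy (the data is random and only for illustration):

```python
# Minimal sketch, assuming NumPy; data is illustrative.
import numpy as np

X = np.random.rand(200, 5)                           # 200 observations, 5 variables
Xc = X - X.mean(axis=0)                              # center the data

cov = np.cov(Xc, rowvar=False)                       # 1) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)               # 2) eigenvalues & eigenvectors
order = np.argsort(eigvals)[::-1]                    #    order by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()                  # 3) proportion of variance explained
k = np.searchsorted(np.cumsum(explained), 0.95) + 1  #    keep enough PCs for e.g. 95% variance

W = eigvecs[:, :k]                                   # 4) feature vector: chosen eigenvectors as columns
X_reduced = Xc @ W                                   #    rotate/project the dataset
print(k, X_reduced.shape)
```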
9
Q

How to find optimal K in KMeans

A
  • Answer: Silhouette Score
  • Explanation
    o Silhouette score measures the compactness & separation of clusters
    o Calculation: average of the silhouette scores of all data points
      ▪ Silhouette score for one instance: (b - a) / max(a, b)
        • a: intra-cluster distance = mean distance to the other instances in the same cluster
        • b: inter-cluster distance = mean distance to the instances in the nearest other cluster
    o Range: -1 to 1 (closer to 1 = better-defined cluster)
    o Try different k values & choose the one that maximizes the score (see the sketch below)
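
A minimal sketch, assuming scikit-learn (the data is synthetic, for illustration only):

```python
# Minimal sketch, assuming scikit-learn; the data is illustrative synthetic blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # mean of (b - a) / max(a, b) over all points

best_k = max(scores, key=scores.get)          # k that maximizes the silhouette score
print(scores, "-> best k:", best_k)
```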
10
Q

Why is RNN better than ARIMA for time series data?

A
  • RNNs are specifically designed to handle sequential data
  • Good for capturing long-term dependencies and patterns across time steps (LSTM)
  • ARIMA requires preprocessing to remove the trend and seasonality and you need to find the right parameters for the components (AR, I, MA). In contrast, RNNs can automatically learn relevant features and representations directly from the data.
  • Advantage of ARIMA: interpretability -> insights into the underlying process generating the data (an RNN sketch follows below)
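
A minimal sketch of an RNN for a univariate series, assuming TensorFlow/Keras (X_windows/y_next are placeholder names):

```python
# Minimal sketch, assuming TensorFlow/Keras; window length and units are illustrative.
import tensorflow as tf

# Input: windows of the last 30 time steps with 1 feature each; output: the next value.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(30, 1)),  # learns temporal dependencies directly from the raw series
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_windows, y_next, epochs=20)  # X_windows/y_next are placeholders for the windowed series
```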
11
Q

What is a TLU?

A
  • Answer: Threshold Logic Unit = a type of neuron or unit used in neural networks -> computes a weighted sum of its input values & compares it to a predefined threshold
    o sum > threshold -> output 1
    o sum < threshold -> output 0
  • Background: used especially for binary classification (see the sketch below)
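
A minimal sketch of a TLU, assuming NumPy (weights and threshold are illustrative):

```python
# Minimal sketch, assuming NumPy; weights and threshold are illustrative.
import numpy as np

def tlu(x, w, threshold=0.0):
    """Threshold Logic Unit: weighted sum of inputs compared to a threshold -> 0 or 1."""
    return int(np.dot(w, x) > threshold)

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([1.0, 0.5, 0.25])   # weights
print(tlu(x, w))                 # 1, because 0.5 - 0.5 + 0.5 = 0.5 > 0
```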
12
Q

AdaBoost: Adaptive Boosting

A

o Focuses on difficult-to-classify samples by assigning higher weights to them during training
o Goal: by iteratively adjusting the weights and training weak learners -> improve the overall model performance and handle complex classification tasks
o Able to handle imbalanced data
o Steps (see the sketch after this list)
 ▪ Initialize the weights (the same weight for all training samples)
 ▪ Train a weak learner, e.g. a shallow decision tree
 ▪ Compute the error: sum of the weights of the misclassified samples
 ▪ Update the weights (increase the weights of misclassified samples)
 ▪ Repeat until the desired performance is achieved or a fixed number of iterations is reached
 ▪ Aggregate predictions: combine the predictions of all weak learners by assigning a weight to their predictions based on their performance during training
 ▪ Final prediction: the weighted vote or average of the weak learners' predictions
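A minimal sketch of AdaBoost in practice, assuming scikit-learn (data and hyperparameters are illustrative; the default weak learner is a depth-1 decision tree):

```python
# Minimal sketch, assuming scikit-learn; data and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Default weak learner is a decision stump; samples are reweighted between the 100 boosting rounds.
clf = AdaBoostClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # weighted vote of all weak learners
```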

13
Q

Can you do multi-class classification with an SVM?

A
  • Answer: an SVM is inherently binary, but you can combine several binary classifiers to handle multiple classes
  • Background
    o One-vs-All approach: train multiple binary SVM classifiers, where each classifier is trained to distinguish one class from the rest. For a problem with N classes, N binary SVM classifiers are trained; each classifier treats instances of one class as positive and instances of the other N-1 classes as negative. During prediction, all classifiers are applied to the test instance, and the class associated with the classifier that produces the highest confidence or decision score is assigned to the test instance (see the sketch below).
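
A minimal sketch of the One-vs-All approach, assuming scikit-learn (iris gives a 3-class problem):

```python
# Minimal sketch, assuming scikit-learn; iris is a 3-class dataset.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One-vs-All: one binary SVM per class; prediction picks the class with the highest decision score.
ovr_svm = OneVsRestClassifier(SVC(kernel="rbf"))
ovr_svm.fit(X, y)
print(ovr_svm.predict(X[:5]))
```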
14
Q

What are autoencoders? How do they work and what types are there?

A
  • Unsupervised models: they aim to learn efficient representations of the input data by encoding it into a lower-dimensional latent space and then decoding it back to the original input format.
  • Some autoencoders (e.g. variational autoencoders) are generative: capable of randomly generating new data that looks very similar to the training data.
  • Consist of two parts: an encoder (converts the inputs into a latent representation) and a decoder (converts the internal representation back to the outputs)
  • Can be used for dimensionality reduction, feature extraction and visualization
  • Types: Convolutional Autoencoder (for images), Recurrent Autoencoder (for sequences), Denoising Autoencoder (learns useful features by adding noise to the inputs and reconstructing the clean inputs); see the sketch below
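
A minimal sketch of a plain autoencoder, assuming TensorFlow/Keras (layer sizes are illustrative; X is a placeholder):

```python
# Minimal sketch, assuming TensorFlow/Keras; layer sizes are illustrative.
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))                                 # e.g. a flattened 28x28 image
encoded = tf.keras.layers.Dense(32, activation="relu")(inputs)        # encoder -> latent representation
decoded = tf.keras.layers.Dense(784, activation="sigmoid")(encoded)   # decoder -> reconstruction

autoencoder = tf.keras.Model(inputs, decoded)
encoder = tf.keras.Model(inputs, encoded)          # reusable on its own for dimensionality reduction
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=20)  # trained to reconstruct its own input; X is a placeholder
```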
15
Q

Why did we use Colab and not UCloud?

A
  • Use of GPUs in Google Colab (no GPUs available in UCloud)
  • UCloud sometimes had performance issues; processing was very slow at times
  • UCloud had access issues when many people were using it (the session would not start for a few hours)
16
Q

Cross-Validation: Effects & How to pick number of folds

A
  • More reliable evaluation of a model's performance compared to a single train-test split.
  • Allows you to get an idea of how precise the estimate is (standard deviation) by looking at the performance in each fold.
  • How to decide on the number of folds:
    o The computational cost of training the model several times is high with a large number of folds.
    o A large number of folds is only possible if you have enough data, as the validation set would otherwise get very small -> both sets should contain sufficient variation such that the underlying distribution is represented.
  • We used 3 folds (cv=3) to reduce the CPU time needed (we could have used 5); see the sketch below.
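
A minimal sketch, assuming scikit-learn (the model and data are illustrative; only cv=3 matches the card):

```python
# Minimal sketch, assuming scikit-learn; model and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=42)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3)   # 3 folds, as in the project
print(scores, "mean:", scores.mean(), "std:", scores.std())               # std gives an idea of the precision
```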