DL test 2023 Flashcards by Roelle Banffer

You have to choose an activation function for a fully connected nn. Which of the following is most likely to lead to dead neurons?

Rectified Linear Unit
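
A minimal sketch of why ReLU can produce dead neurons, assuming a single hypothetical neuron whose bias has drifted so far negative that its pre-activation is below zero for every input. The weight, bias and inputs are made up for illustration.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive pre-activations, 0 otherwise.
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron: w * x + b is negative for every input below.
w, b = 0.5, -10.0
inputs = [0.1, 1.0, 3.0, 7.5]

pre_activations = [w * x + b for x in inputs]
outputs = [relu(z) for z in pre_activations]
# d(output)/dw = relu'(z) * x, which is zero whenever z <= 0.
grads_w = [relu_grad(z) * x for z, x in zip(pre_activations, inputs)]

print(outputs)  # every output is zero
print(grads_w)  # every gradient is zero, so gradient descent never updates w
```

Because both the output and the gradient are zero on all inputs, the neuron cannot recover through gradient updates.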

At what stage of the life of a nn model is backpropagation used?

Training

Which one is true? 1. The final activation in CBOW is the logistic sigmoid function since we only predict one word, 2. CBOW predicts words based only on the previous words, 3. CBOW and Skipgram ignore the order of words, 4. Skipgram predicts the context based on a word

CBOW and Skipgram ignore the order of words, Skipgram predicts the context based on a word
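
The two true statements can be illustrated by how training examples are built. This is a sketch under assumptions: the function names and the toy sentence are hypothetical, and sorting the context is only a device to show that CBOW treats it as an unordered bag (in practice the context embeddings are averaged or summed).

```python
def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs: the centre word is the input, each context word a target."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context bag, centre) examples: words on both sides of the centre, order discarded."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Sorting makes the bag canonical: any ordering of the context gives the same example.
        context = sorted(tokens[j] for j in range(lo, hi) if j != i)
        examples.append((tuple(context), centre))
    return examples

sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(skipgram_pairs(sentence, window=1)[:4])
print(cbow_examples(sentence, window=1)[2])
```

Note that the CBOW context includes words after the centre word as well as before it, which is why statement 2 is false.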

A model for element classification of set data (i.e. assign a class for each element in the set) needs to be:

equivariant to permutation

A fully connected (dense) nn is used to model data that resides on a grid domain (such as a 2D image). This model is:

Neither translationally invariant nor equivariant

Choose the best structure for a data efficient model that operates on graph data and assigns a class to a graph

Permutationally equivariant layer(s) followed by a global pooling layer(s) and a softmax layer

Select the layer(s) of a CNN that inputs a tensor with dimensionality 64x64x28 and outputs an activation map with the dimensionality of 60x60x32

2D Convolutional layer, with 32 filters, each with a kernel of size (5x5), stride 1 and VALID padding
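
The output shape can be checked with the standard size formula for convolutions. The helper below is a hypothetical illustration, not a library function; it follows the usual VALID/SAME padding conventions.

```python
def conv2d_output_shape(height, width, kernel, stride=1, padding="VALID", filters=1):
    """Spatial output size of a 2D convolution; the channel count equals the number of filters."""
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        out_h = (height - kernel) // stride + 1
        out_w = (width - kernel) // stride + 1
    elif padding == "SAME":
        # Padded so that the output size is ceil(input / stride).
        out_h = -(-height // stride)
        out_w = -(-width // stride)
    else:
        raise ValueError(f"unknown padding: {padding}")
    return out_h, out_w, filters

print(conv2d_output_shape(64, 64, kernel=5, stride=1, padding="VALID", filters=32))
# (60, 60, 32)
```

The number of input channels does not appear in the formula: each filter spans all input channels, so only the number of filters determines the output depth.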

The number of times a parameter is re-used in an RNN cell is proportional to:

The length of the sequence

Which of the following activation functions have the right properties to be suitable for serving as a gating mechanism?

The cumulative distribution function of the standard normal distribution, hard sigmoid, logistic sigmoid
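
What makes these three functions usable as gates is that they map any real input into [0, 1] and increase monotonically, so the output can act as a soft "how much to let through" factor. A small sketch, assuming one common definition of the hard sigmoid (the slope and offset differ between libraries):

```python
import math

def logistic_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    # Piecewise-linear approximation; 0.2 * x + 0.5 is one common choice, assumed here.
    return min(1.0, max(0.0, 0.2 * x + 0.5))

def normal_cdf(x):
    # CDF of the standard normal distribution, written with the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

GATES = (logistic_sigmoid, hard_sigmoid, normal_cdf)
POINTS = (-6.0, -1.0, 0.0, 1.0, 6.0)

for gate in GATES:
    print(gate.__name__, [round(gate(x), 3) for x in POINTS])
```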

What are the required characteristic(s) of the aggregation function of message passing graph nn?

Produce the same result for any permutation of the input values, deal with a varying number of input elements
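
Sum, mean and max are three common aggregation functions that are invariant to the order of their inputs and accept any number of inputs. The sketch below checks both properties on made-up message values (chosen to be exactly representable so that floating-point sums do not depend on order).

```python
def aggregate_sum(messages):
    return sum(messages)

def aggregate_mean(messages):
    return sum(messages) / len(messages)

def aggregate_max(messages):
    return max(messages)

AGGREGATORS = (aggregate_sum, aggregate_mean, aggregate_max)

# Hypothetical incoming messages for a node with four neighbours.
messages = [0.5, 2.0, -1.0, 4.0]
permuted = [4.0, 0.5, -1.0, 2.0]  # same values, different order

for aggregate in AGGREGATORS:
    # Same result for any ordering, and works for a node with a single neighbour too.
    print(aggregate.__name__, aggregate(messages), aggregate(permuted), aggregate([2.0]))
```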

Select which statements are true for the message passing graph nn model

the model can learn a representation of a graph with a variable number of unordered edges and nodes, the model can learn a representation of a graph with a fixed number of nodes and bidirectional edges

What is the number of iterations that a message passing graph nn needs to implement to guarantee that information from each node will reach each other node in a fully connected graph?

One, since in a fully connected graph every node is a direct neighbour of every other node

The depth of the message passing graph nn model is proportional to the:

number of iterations of message passing
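
Each iteration of message passing moves information one hop further, which is why depth and iteration count go together. The sketch below simulates this by tracking which nodes' information has reached each node; the function name and the two toy graphs are hypothetical, and the graph is assumed to be undirected and connected.

```python
def rounds_until_all_informed(adjacency):
    """Synchronous message passing rounds until every node has heard from every other node."""
    nodes = list(adjacency)
    known = {v: {v} for v in nodes}  # each node starts knowing only itself
    rounds = 0
    while any(len(known[v]) < len(nodes) for v in nodes):
        if rounds >= len(nodes) - 1:
            raise ValueError("graph is not connected")
        # Every node merges what its neighbours knew at the start of the round.
        known = {v: known[v].union(*(known[u] for u in adjacency[v])) for v in nodes}
        rounds += 1
    return rounds

fully_connected = {v: [u for u in range(5) if u != v] for v in range(5)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(rounds_until_all_informed(fully_connected))  # 1
print(rounds_until_all_informed(path))             # 4
```

In general the number of rounds needed equals the diameter of the graph, which is 1 for a fully connected graph.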

When choosing an activation function for a fully connected neural network, which of the following is more likely to cause vanishing gradients during training?

Logistic Sigmoid
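
The derivative of the logistic sigmoid never exceeds 0.25, so the activation derivatives alone shrink the backpropagated gradient by at least a factor of 4 per layer. A small sketch of that upper bound (it ignores the weight matrices, which is a simplifying assumption):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0.
max_grad = sigmoid_grad(0.0)
print(max_grad)  # 0.25

# Best-case product of activation derivatives through `depth` sigmoid layers.
for depth in (1, 5, 10, 20):
    print(depth, max_grad ** depth)
```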

When choosing an activation function for a fully connected neural network, which of the following is most likely to lead to “dead” neurons?

Rectified Linear Unit

At what stage of the life of a neural network model is backpropagation used?

Training

Word Embedding
What probability distribution does the model in diagram (a) above estimate? Give a formula for the correct probability distribution, e.g

What probability distribution does the model in diagram (b) above estimate? Give a formula for the correct probability distribution, e.g.

Word embeddings (e.g. CBOW or Skipgram) can be used to train embeddings of other types of data than just words. Which of the following input data types are suitable for the use of these techniques?

DNA sequences of genes in terms of their four bases (Adenine A, Cytosine C, Guanine G, Thymine T), tap dance choreography in “Kahnotation” (see figure below), time series representing the values of stocks over time

A model consists of 5 convolutional layers operating on a grid. This model is:

Equivariant to translation
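
Translation equivariance means that shifting the input and then applying the layers gives the same result as applying the layers and then shifting the output. The sketch below checks this for five stacked 1D convolutions, assuming circular padding so that the property holds exactly (with zero padding it holds only away from the borders). The signal and kernel are made up.

```python
def circular_conv1d(signal, kernel):
    """1D cross-correlation with wrap-around indexing, so the output has the input's length."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel))) for i in range(n)]

def shift(signal, steps):
    """Circularly shift the signal to the right by `steps` positions."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps] if steps else signal[:]

def five_layer_model(signal, kernel):
    # Five convolutional layers with a shared, hypothetical kernel and no nonlinearity.
    for _ in range(5):
        signal = circular_conv1d(signal, kernel)
    return signal

signal = [1, 3, 0, 2, 5, 4]
kernel = [1, -2, 1]

shift_then_model = five_layer_model(shift(signal, 2), kernel)
model_then_shift = shift(five_layer_model(signal, kernel), 2)
print(shift_then_model == model_then_shift)  # True
```

Element-wise nonlinearities between the layers would not change the outcome, since they act on each position independently.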

Select the layer(s) of a CNN that inputs a tensor with dimensionality 64x64x128 and outputs an activation map with the dimensionality of 60x60x32.

2D Convolutional layer, with 32 filters, each with a kernel of size (5x5), stride 1 and VALID padding.

The task at hand is signature verification. You have access to a dataset of 50000 images of signatures. You only have 2-3 signatures per person.
During the operation of the system, a person provides identification and they sign a document. The task of the model is to verify that the provided signature corresponds to the one in the database.
Describe:
1. The data domain and its symmetries
2. The type of model that is motivated well for the given data domain and problem formulation
3. The loss function
4. How your model computes a verification score during the operation of the system

The task is metric learning, since we have many signature images but only a few per person (class). The data are images (a 2D grid domain). They have translation symmetry, where the position of the signature within the image does not affect verification, and rotational symmetry, where the orientation of the signature does not affect it. We use a Siamese network: the same encoder with shared weights embeds both images of a pair, and the distance between the two embeddings measures how similar the signatures are. Training pairs consist of two signatures from the same person (genuine pairs) or one signature paired with a signature from a different person (impostor pairs). The loss function is the contrastive loss, which aims to maximize the similarity between genuine signatures while minimizing the similarity between genuine and impostor signatures. During operation, we use the distance between the embedding of the provided signature and the embeddings of the stored signatures as the verification score, where a lower distance means a higher score.
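
A minimal sketch of the contrastive loss and the verification step described above. The embeddings are hand-written stand-ins for the output of the shared encoder, and the margin, threshold and function names are hypothetical; some formulations also include a factor of 1/2 in the loss.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Pull genuine pairs together; push impostor pairs at least `margin` apart."""
    d = euclidean_distance(emb_a, emb_b)
    if same_person:
        return d ** 2
    return max(0.0, margin - d) ** 2

def verify(query_embedding, reference_embeddings, threshold=0.5):
    """Score = smallest distance to the stored signatures of the claimed identity."""
    score = min(euclidean_distance(query_embedding, ref) for ref in reference_embeddings)
    return score, score <= threshold

# Distance between these two embeddings is exactly 5.
a, b = [0.0, 0.0], [3.0, 4.0]
print(contrastive_loss(a, b, same_person=True))               # 25.0
print(contrastive_loss(a, b, same_person=False, margin=6.0))  # 1.0
print(contrastive_loss(a, b, same_person=False, margin=1.0))  # 0.0
print(verify([0.0, 0.0], [a, b]))                             # (0.0, True)
```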

Suppose the recurrent weight matrix W in a vanilla RNN (i.e. without gates) has matrix norm
How fast does the sensitivity of an output o to an input x increase or decrease in terms of l?

It decreases exponentially in l
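
The answer can be motivated with a short bound. This is a sketch under assumptions that the card does not state: the hidden state follows h_k = φ(W h_{k-1} + U x_k), the activation satisfies |φ'| ≤ 1 (as tanh does), the norm is a submultiplicative one such as the spectral norm with ‖W‖ < 1, and l is the number of time steps between the input and the output.

```latex
h_k = \phi(a_k), \qquad a_k = W h_{k-1} + U x_k

\frac{\partial h_{t+l}}{\partial h_t}
  = D_{t+l} W \, D_{t+l-1} W \cdots D_{t+1} W,
  \qquad D_k = \operatorname{diag}\big(\phi'(a_k)\big)

\left\lVert \frac{\partial h_{t+l}}{\partial h_t} \right\rVert
  \le \prod_{k=t+1}^{t+l} \lVert D_k \rVert \, \lVert W \rVert
  \le \lVert W \rVert^{l}
```

Since ‖W‖^l shrinks geometrically when ‖W‖ < 1, the sensitivity of a later state (and of any output computed from it) to an earlier input decays exponentially in l.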

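What is the number of iterations that a Message Passing Graph Neural network needs to implement to guarantee that information from each node will reach each other node in a fully connected graph?

One iteration, since in a fully connected graph every node is a direct neighbour of every other node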

To which of the following is the depth of a Message Passing Graph Neural Network model proportional?

Number of iterations of message passing

The task is molecule classification. The molecules are represented as graphs, where the nodes are the atoms of the molecule and the edges are the bonds of the molecules. Each molecule in the training data is assigned a class label out of a fixed set of 10 labels. Describe the components of a Message Passing Graph Neural network that is suitable for this task.

Molecules are graphs, so the data fit a message passing GNN. Each node (atom) carries a feature vector, and the edges (bonds) determine which nodes exchange messages. At each iteration, every node aggregates the messages from its neighbours with a permutation-invariant function and updates its own representation by combining the aggregate with its current state. Repeating this over several iterations lets the node representations capture both local and more global information. The graph readout function then summarizes all node representations into a single fixed-size vector, which is necessary for the final classification. Further layers on top of this vector learn the high-level features and patterns of the molecule. The final softmax layer produces a probability distribution over the 10 class labels, and the predicted class is the one with the highest probability.
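
The components above can be sketched in a few lines. Everything here is a simplified assumption rather than a trained model: the update rule is a fixed tanh mix instead of a learned network, the readout is a plain sum, and the atom features and class weights are made up.

```python
import math

def message_passing_round(node_states, adjacency):
    """Each node sums its neighbours' states and combines the sum with its own state."""
    new_states = {}
    for v, state in node_states.items():
        agg = [0.0] * len(state)
        for u in adjacency[v]:
            agg = [a + s for a, s in zip(agg, node_states[u])]
        # Hypothetical fixed update standing in for a learned update function.
        new_states[v] = [math.tanh(0.5 * s + 0.5 * a) for s, a in zip(state, agg)]
    return new_states

def readout(node_states):
    """Permutation-invariant readout: element-wise sum over all nodes."""
    dim = len(next(iter(node_states.values())))
    return [sum(state[i] for state in node_states.values()) for i in range(dim)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(node_features, adjacency, class_weights, rounds=2):
    states = node_features
    for _ in range(rounds):
        states = message_passing_round(states, adjacency)
    graph_vector = readout(states)
    logits = [sum(w * g for w, g in zip(row, graph_vector)) for row in class_weights]
    return softmax(logits)

# Toy molecule (water): one-hot atom types [is_oxygen, is_hydrogen].
features = {"O": [1.0, 0.0], "H1": [0.0, 1.0], "H2": [0.0, 1.0]}
bonds = {"O": ["H1", "H2"], "H1": ["O"], "H2": ["O"]}
weights = [[0.1 * c, -0.05 * c] for c in range(10)]  # 10 classes, made-up values

probs = classify(features, bonds, weights)
print(max(range(10), key=lambda c: probs[c]), round(sum(probs), 6))
```

Listing the atoms in a different order leaves the class probabilities unchanged (up to floating-point rounding), because both the neighbour aggregation and the readout are sums.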

Why do we need the reparameterization trick to train the VAE?

Without it, we cannot backpropagate through a stochastic node
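
A minimal sketch of the trick for a one-dimensional Gaussian latent: instead of sampling z directly from N(mu, sigma^2), sample eps from N(0, 1) and compute z deterministically, so the derivatives with respect to mu and sigma are well defined. The function name and the numbers are hypothetical.

```python
import random

def sample_latent(mu, sigma, rng):
    """Reparameterised sample: the randomness lives in eps, which does not depend on mu or sigma."""
    eps = rng.gauss(0.0, 1.0)
    z = mu + sigma * eps
    # Given eps, z is a deterministic function of (mu, sigma), so gradients can flow through it.
    dz_dmu = 1.0
    dz_dsigma = eps
    return z, eps, dz_dmu, dz_dsigma

rng = random.Random(0)  # fixed seed so the sketch is reproducible
z, eps, dz_dmu, dz_dsigma = sample_latent(mu=1.5, sigma=0.5, rng=rng)
print(z, dz_dmu, dz_dsigma)
```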

Mode collapse is an effect that occurs when training GAN models where

The model learns to represent only a small part of the distribution of the data

The task at hand is signature verification. You have access to a dataset of 50000 images of signatures. You only have 2-3 signatures per person. During the operation of the system, a person provides identification and signs a document. The task of the model is to verify that the provided signature corresponds to the one in the database. Describe: 1. The problem/model formulation that you will use for this task 2. The model architecture and loss function 3. How you set up the training data and the training process 4. How you use your model (describe steps of the algorithm)

metric learning, few shot learning

Describe a method for translating sentences from one language to another using a sequential ml model. You have access to a training set of pairs of sentences from the source and target language. The training sentences have a variable number of words. The model should be capable of producing more than one correct translation for a given input sentence. (You should not assume conditional independence of output words) Describe: 1. The problem/model formulation that you will use for this task 2. The model architecture and loss function 3. How you set up the training data and the training process 4. How you use your model

sequential prediction problem

The task is predicting the level of bioactivity of molecular compounds. In the dataset, the molecules are represented as graphs, where the nodes are the atoms of the molecule and the edges are the bonds between the atoms. Each molecule is assigned a continuous bioactivity value in [0,1]. Describe: 1. The problem/model formulation that you will use for this task 2. The model architecture and loss function 3. How you set up the training data and the training process 4. How you use your model

Graph structure with a GNN (graph-level regression)

Your task is to repair music recordings. The music recordings are 1D wave signals. Some of these music recordings are injected with very short interruptions (no sound) during transmission. The interruptions can be easily identified with a simple thresholding step and distinguished from quiet music parts. The solution should fill in the gaps and restore the full music recordings. You are given a training set of music recordings without interruptions. Specify: 1. The problem/model formulation that you will use for this task 2. The model architecture and loss function 3. How you set up the training data and the training process 4. How you use your model

sequence

(33 cards)