Neural networks Flashcards
Here are some flashcards with questions strictly from the sources:
Flashcard 1
Front: What topics were covered in Week 1 to Week 11?
Back: Week 1 covered “Neural Language Modelling”. Week 2 covered “Neural Machine Translation & Transformers”. Week 3 covered “Multi-lingual Machine Translation”. Week 4 covered “Low-resource & Multi-modal Machine Translation”. Week 5 covered “Overhype versus reality: When to use machine translation…and when not to”. Week 6 is “Overview”. Weeks 7, 8, …
Flashcard 2
Front: On what level did the translation model have to be defined according to the text?
Back: The model has to be defined on the word level instead of the sentence level.
Flashcard 3
Front: What is introduced into the translation model as a “hidden variable”?
Back: Underlying connection between source and target words is introduced into the translation model as a so-called “hidden variable”.
Flashcard 4
Front: What is a hidden variable?
Back: A hidden variable is a variable which has an influence on the model but is not actually seen.
Flashcard 5
Front: What does ‘a’ represent in the context of translation probability?
Back: ‘a’ represents the alignment “sentence”: a sequence of alignments (target positions), one for each source word.
Flashcard 6
Front: Provide an example of translation probability with word alignment given in the source.
Back: Source: “I like red bicycles”, target: “me gustan bicicletas rojas”, alignment: “1 2 4 3”.
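The alignment example above can be sketched in code: the sequence “1 2 4 3” gives, for each source word, the (1-based) position of the target word it is connected to.

```python
# The alignment from the flashcard example: one target position per source word.
source = ["I", "like", "red", "bicycles"]
target = ["me", "gustan", "bicicletas", "rojas"]
alignment = [1, 2, 4, 3]

# Pair each source word with its aligned target word.
pairs = [(s, target[a - 1]) for s, a in zip(source, alignment)]
print(pairs)  # [('I', 'me'), ('like', 'gustan'), ('red', 'rojas'), ('bicycles', 'bicicletas')]
```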
Flashcard 7
Front: How many target words can source words be connected to in one type of alignment?
Back: Source words can be connected to exactly one target word.
Flashcard 8
Front: What is the term for a target word without connections in alignment?
Back: A target word without connections (one generated with no corresponding source word) is called a spurious word.
Flashcard 9
Front: What is “zero fertility” in word alignment referring to?
Back: “Zero fertility” refers to a source word that is not translated, i.e. it generates zero target words.
Flashcard 10
Front: What do phrases in the context of SMT allow?
Back: Phrases allow translation from a word group to a word group.
Flashcard 11
Front: What is a limitation of word-based translation models compared to phrase-based models?
Back: Word-based translation models only allow translation from a single word into a word group.
Flashcard 12
Front: What are some advantages of using phrases over words in translation?
Back: Longer context can generally be captured, and there is better handling of idioms and other multi-word expressions.
Flashcard 13
Front: What constitutes an inconsistent phrase pair according to the example?
Back: In the example figure, the middle case is the inconsistent phrase pair: a word inside the phrase pair is aligned to a word outside it.
Flashcard 14
Front: What is the goal of decoding in the context discussed?
Back: Decoding aims to find the best hypothesis.
Flashcard 15
Front: What type of data was mentioned in relation to the neural network forward pass in the lecture overview?
Back: Complex “unstructured” data.
Flashcard 16
Front: What will be covered in a later part of the lecture regarding neural networks?
Back: Neural network forward pass with images and Backpropagation.
Flashcard 17
Front: What is naive text input for a neural network?
Back: Naive text input means feeding the words of a text directly into the network, for example as one-hot vectors over the vocabulary (a naïve, sparse word representation).
Flashcard 18
Front: What is a common non-linear function used in neural networks after getting a value for each node?
Back: One of the most common is called the Rectified Linear Unit (ReLU).
Flashcard 19
Front: What is the mathematical definition of the ReLU function?
Back: f(x) = max(0, x).
Flashcard 20
Front: What happens to the value at a node if it is less than zero when using the ReLU function?
Back: If the value at a node is less than zero, the ReLU function f(x) = max(0, x) replaces it with zero.
Flashcard 21
Front: What is the significance of the values 0.875, 0.004, …?
Back: These values sum to 1.
Flashcard 22
Front: What is involved in the backward pass of a neural network?
Back: Inputs, outputs (o), a loss function, and the gradient of the loss with respect to the weights.
Flashcard 23
Front: What is a common loss function mentioned in the context of neural networks?
Back: Cross-entropy is a common loss function.
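As a quick sketch of the cross-entropy loss mentioned above: with a one-hot target it reduces to minus the log of the probability the network assigns to the correct class. The probabilities below are made-up illustrative values.

```python
import math

# Cross-entropy for one prediction: H(p, q) = -sum_i p_i * log(q_i).
# With a one-hot target p, only the correct class contributes.
def cross_entropy(target, predicted):
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)

# Target class is the second one; the model assigns it probability 0.7.
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])  # -log(0.7)
```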
Flashcard 24
Front: Is cross-entropy the only loss function?
Back: No
*This is not the only loss function
Flashcard 25
Front: What is the gradient calculated with respect to in the context of neural networks?
Back: Gradient with respect to the weights of the network.
Flashcard 26
Front: What are some terms associated with calculating the gradient?
Back: partial derivative and learning rate.
Flashcard 27
Front: What are the sets used in training and evaluating a model mentioned in the lecture?
Back: A training set (used to train the model) and a test set (used to evaluate it).
Flashcard 28
Front: What type of error surfaces do neural networks have?
Back: Neural networks have non-convex error surfaces.
Flashcard 29
Front: What is a consequence of neural networks having non-convex error surfaces in terms of finding minima?
Back: Neural networks have non-convex error surfaces, so there is no guarantee of finding the global minimum; instead, we want to find a good local minimum.
Flashcard 30
Front: What are some methods of gradient descent mentioned in the lecture?
Back: Stochastic gradient descent and batch gradient descent.
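The difference between the two variants can be sketched with made-up scalar values: batch gradient descent averages the gradients over the whole dataset before one update, while stochastic gradient descent updates after every single example.

```python
# Minimal sketch of the two update styles (illustrative numbers only).
def batch_step(w, grads, lr=0.1):
    # One update using the average gradient over all examples.
    return w - lr * sum(grads) / len(grads)

def stochastic_steps(w, grads, lr=0.1):
    # One update per example, in sequence.
    for g in grads:
        w = w - lr * g
    return w

w_batch = batch_step(1.0, [0.5, 1.5])       # 1.0 - 0.1 * 1.0 = 0.9
w_sgd = stochastic_steps(1.0, [0.5, 1.5])   # 1.0 - 0.05 - 0.15 = 0.8
```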
Flashcard 31
Front: What is “one-hot encoding” used for in the context of neural networks?
Back: It is a naïve word representation: each word is a vector with a single 1 (at that word’s index) and zeros everywhere else.
Flashcard 32
Front: What is a problem with using “one-hot encoding” for word representation?
Back: It is sparse (lots of zeros) and becomes even sparser as the vocabulary grows.
Flashcard 33
Front: What are artificial neural networks inspired by?
Back: Artificial neural networks (or simply neural networks) are inspired by the neurons in the human brain.
Flashcard 34
Front: What are neural networks essentially from a mathematical perspective?
Back: …nothing more than a bunch of mathematical functions involving a large number of matrix multiplications.
Flashcard 35
Front: What is a key capability of neural networks?
Back: The power of neural networks (NNs) lies in their ability to create complex mappings (functions) between their inputs and outputs.
Flashcard 36
Front: Why are derivatives of functions important for neural networks?
Back: …as they measure the sensitivity of the function’s output value to a change of its input value. This is very important for training neural networks.
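That sensitivity can be estimated numerically, as a small sketch (the helper below is illustrative, not from the source):

```python
# Numerical estimate of a derivative: how much does f's output change
# for a small change of its input?
def numerical_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

slope = numerical_derivative(lambda x: x * x, 3.0)  # d/dx x^2 = 2x, so about 6
```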
Flashcard 37
Front: What does an artificial neuron do with its inputs?
Back: An artificial neuron takes several inputs, for example three.
Flashcard 38
Front: What are weights in an artificial neuron?
Back: w1, w2 and w3 are weights.
Flashcard 39
Front: What is the role of the function z(x) in an artificial neuron?
Back: First, a function z(x) takes all the inputs and converts them into a weighted sum: z(x) = w1x1 + w2x2 + w3x3 + b.
Flashcard 40
Front: What is ‘b’ in the weighted sum of an artificial neuron?
Back: b is the “bias” included in each neuron to avoid the weighted sum of the inputs becoming equal to 0; the bias gives the network something to work with in case all input values are 0.
Flashcard 41
Front: What is the general formula for the weighted sum in a neuron?
Back: z(x) = ∑i xiwi + b.
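The weighted-sum formula in plain Python, with made-up example values for the inputs, weights and bias:

```python
# z(x) = sum_i x_i * w_i + b
def weighted_sum(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# 1.0*0.5 + 2.0*(-0.25) + 3.0*0.1 + 0.2 = 0.5
z = weighted_sum([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
```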
Flashcard 42
Front: What happens to the weighted sum after it is calculated in a neuron?
Back: Then, this weighted sum is converted by another function σ(z) into the output of the neuron.
Flashcard 43
Front: What is the function σ(z) called?
Back: The function σ(z) is called the “activation function”.
Flashcard 44
Front: Describe the basic operation of an artificial neuron.
Back: So, basically, an artificial neuron first converts its inputs into a weighted sum z(x) and then applies an activation function σ(z) to produce its output.
Flashcard 45
Front: What is the Heaviside function and what is it based on?
Back: The activation function can be based on a threshold (the “Heaviside” function): σ(z) = 0 if z ≤ threshold, σ(z) = 1 if z > threshold, where the threshold is a real number, a parameter of the neuron.
Flashcard 46
Front: What is a “perceptron”?
Back: A simple type of artificial neuron which takes one or several binary inputs (with values 0 or 1) and has a threshold-based activation function is called a “perceptron”.
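A perceptron can be sketched directly from these definitions: binary inputs, a weighted sum, and a threshold activation. The weights and bias below are illustrative values (not from the source), chosen so that this particular perceptron computes a logical AND.

```python
# A perceptron: weighted sum of binary inputs plus a threshold ("Heaviside")
# activation that outputs 0 or 1.
def perceptron(inputs, weights, bias, threshold=0.0):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > threshold else 0

def and_gate(a, b):
    # Fires only when both inputs are 1: z = a + b - 1.5 > 0 requires a = b = 1.
    return perceptron([a, b], [1.0, 1.0], bias=-1.5)
```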
Flashcard 47
Front: Are non-linear activation functions typically used in artificial neurons?
Back: They are usually non-linear functions (the reason for this will be explained later) so that an artificial neuron transforms its inputs by a linear weighted sum and a non-linear activation function.
Flashcard 48
Front: Define the sigmoid function.
Back: sigmoid(x) = 1 / (1 + e^(−x)).
Flashcard 49
Front: What is the output range of the sigmoid function?
Back: It converts its input into an output in the range from 0 to 1.
Flashcard 50
Front: Define the hyperbolic tangent function.
Back: tanh(x) = (e^(2x) − 1) / (e^(2x) + 1).
Flashcard 51
Front: What is the output range of the hyperbolic tangent function?
Back: It converts the input into an output in the range from -1 to 1.
Flashcard 52
Front: Define the Rectified Linear Unit (ReLU) function.
Back: ReLU(x) = max(0, x).
Flashcard 53
Front: Describe how the ReLU function transforms inputs.
Back: It is basically a linear transformation for inputs greater than zero, while inputs below zero are transformed to zero. The output range is from 0 to ∞.
Flashcard 54
Front: Define the Softmax function for multiple inputs xi.
Back: softmax(xi) = e^(xi) / ∑j e^(xj).
Flashcard 55
Front: What is the output range of the Softmax function?
Back: Its output range is from 0 to 1.
Flashcard 56
Front: For what purpose is the Softmax function convenient in modelling?
Back: …it is very convenient for modelling probabilities of different classes xi.
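The four activation functions defined in the cards above, written out in plain Python so their output ranges can be checked:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))                         # range (0, 1)

def tanh(x):
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)  # range (-1, 1)

def relu(x):
    return max(0.0, x)                                    # range [0, inf)

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]  # each in (0, 1); they sum to 1
```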
Flashcard 57
Front: Where are Softmax functions used in neural machine translation models?
Back: …a softmax function taking into account all target words in order to decide which of them has the highest probability.
Flashcard 58
Front: What happens when many neurons are connected together?
Back: When many neurons are connected, these operations become a powerful tool.
Flashcard 59
Front: What is a feed-forward network sometimes called, and under what condition is this name argued to be appropriate?
Back: This type of network is sometimes called a multilayer perceptron, although it is argued that the name should be used only if its neurons are actually perceptrons (neurons with a threshold activation function).
Flashcard 60
Front: What is the input layer in a neural network?
Back: The input layer consists of (one or more) input neurons. Inputs of this layer are inputs to the entire neural network. The input layer receives the inputs, performs the calculations in its neurons and transmits the output to the subsequent layer. Each neural network must have an input layer.
Flashcard 61
Front: What is the output layer in a neural network?
Back: The output layer consists of (one or more) output neurons. The output layer receives its input from the previous layer. Outputs of this layer represent the outputs of the entire network. The output layer is responsible for producing the final result by performing calculations in its neurons. Each neural network must have an output layer.
Flashcard 62
Front: What is a hidden layer in a neural network?
Back: The hidden layer is in the middle and connects the input and output layer. The word “hidden” implies that they are not visible from outside the network.
Flashcard 63
Front: How many hidden layers can a neural network have?
Back: A neural network can have an arbitrary number of hidden layers, from zero to many.
Flashcard 64
Front: What is a “deep neural network”?
Back: If a neural network has more than one hidden layer, it is called a “deep neural network”.
Flashcard 65
Front: What is “deep learning”?
Back: If such a neural network (more than one hidden layer) is used for machine learning, it is called “deep learning”.
Flashcard 66
Front: What is learned by the first hidden layer in a deep neural network?
Back: In a multi-layer (“deep”) neural network, the first hidden layer is able to learn some relatively simple patterns.
Flashcard 67
Front: What is learned by each additional hidden layer in a deep neural network?
Back: …each additional hidden layer is able to learn progressively more complicated patterns.
Flashcard 68
Front: What is a theoretical capability of neural networks according to the “Universal Approximation Theorem”?
Back: The “Universal Approximation Theorem” states that a neural network with one hidden layer can approximate any continuous function for inputs within a specific range.
Flashcard 69
Front: Are there strict rules for building neural networks?
Back: Knowing that there are no strict rules for building neural networks and that there are many possibilities to arrange the neurons and define their functions, you should be better able to imagine that neural networks really can model practically any function.
Flashcard 70
Front: Is it necessary to use the same activation function for all neurons in a network?
Back: …it is also not necessary to use the same activation function for all neurons in a network! Usually, all neurons in one layer have the same activation function.
Flashcard 71
Front: Why are the important activation functions mentioned in the text non-linear?
Back: If all neurons in a network have linear activation functions, then no matter how many layers we have, the whole network collapses into a single linear function and cannot model non-linear mappings.
Flashcard 72
Front: How do recurrent neural networks differ from feed-forward neural networks in terms of information flow?
Back: In recurrent neural networks, outputs of some neurons do not pass further to the neurons in the subsequent layer but return to the same neuron as its input.
Flashcard 73
Front: For a feed-forward network with one hidden layer, how are the dependencies between layers formulated?
Back: * H = F(X), meaning that the values in the hidden layer are a function of the values in the input layer. * Y = F(H), meaning that the values in the output layer are a function of the values in the hidden layer.
Flashcard 74
Front: How are the dependencies defined for a recurrent neural network?
Back: Hn = F(Xn, Hn−1), where n refers to the current position (“time frame”) in a sequence. This means that the current values (at position n) in the hidden layer Hn are not dependent only on the current values of the input layer Xn (as in feed-forward networks), but also on the hidden layer values Hn−1 from the previous position.
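The recurrence Hn = F(Xn, Hn−1) can be sketched with scalar values: the new hidden value mixes the current input with the previous hidden value. The weights w_x and w_h below are made-up illustrative numbers.

```python
import math

# One recurrent step: the hidden state depends on the current input AND
# the previous hidden state.
def rnn_step(x_n, h_prev, w_x=0.5, w_h=0.5):
    return math.tanh(w_x * x_n + w_h * h_prev)

h = 0.0                     # initial hidden state
for x in [1.0, 0.5, -1.0]:  # a short input sequence
    h = rnn_step(x, h)      # h carries information from earlier positions
```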
Flashcard 75
Front: What is the structure of RNNs well-suited for modelling?
Back: The structure of RNNs is well suited for modelling sequences.
Flashcard 76
Front: What does the output of an RNN at a given time step/position depend on?
Back: In total, the output depends not only on the current input at the current time step/position Xt, but also on the inputs at all previous time steps/positions.
Front: What is a prominent type of network architecture used in Natural Language Processing nowadays?
Back: Nowadays, almost everything in Natural Language Processing is based on so-called “attention networks” (which include the most modern transformer architecture).
Flashcard 78
Front: What do attention networks represent?
Back: They are complex networks which represent how different inputs relate to different outputs.
Flashcard 79
Front: Were neural networks used for machine translation mentioned as being simple?
Back: No; it is noted that neural networks used for machine translation are very large and complex, involving a large number of neurons organised in many layers.
Flashcard 80
Front: What is characteristic of a feed-forward neural network regarding the direction of the input signal?
Back: It is called “feed-forward” because the input signal is always going forward, from the input layer through the hidden layer(s) to the output layer.