Neural networks Flashcards
Here are some flashcards with questions strictly from the sources:
Flashcard 1
Front: What topics were covered in Week 1 to Week 11?
Back: Week 1 covered “Neural Language Modelling”. Week 2 covered “Neural Machine Translation & Transformers”. Week 3 covered “Multi-lingual Machine Translation”. Week 4 covered “Low-resource & Multi-modal Machine Translation”. Week 5 covered “Overhype versus reality: When to use machine translation…and when not to”. Week 6 is “Overview”. Weeks 7, 8, …
Flashcard 2
Front: On what level did the translation model have to be defined according to the text?
Back: The model has to be defined on the word level instead of the sentence level.
Flashcard 3
Front: What is introduced into the translation model as a “hidden variable”?
Back: Underlying connection between source and target words is introduced into the translation model as a so-called “hidden variable”.
Flashcard 4
Front: What is a hidden variable?
Back: A hidden variable is a variable which has an influence on the model but is not actually seen.
Flashcard 5
Front: What does ‘a’ represent in the context of translation probability?
Back: ‘a’ represents the alignment “sentence”: a sequence of alignments (target positions), one for each source word.
Flashcard 6
Front: Provide an example of translation probability with word alignment given in the source.
Back: Source: “I like red bicycles”, target: “me gustan bicicletas rojas”, alignment: “1 2 4 3”.
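The alignment example above can be sketched in code: the sequence “1 2 4 3” gives, for each source word, the (1-based) position of the target word it is connected to.

```python
# The alignment from the flashcard example: one target position per source word.
source = ["I", "like", "red", "bicycles"]
target = ["me", "gustan", "bicicletas", "rojas"]
alignment = [1, 2, 4, 3]

# Pair each source word with its aligned target word.
pairs = [(s, target[a - 1]) for s, a in zip(source, alignment)]
print(pairs)  # [('I', 'me'), ('like', 'gustan'), ('red', 'rojas'), ('bicycles', 'bicicletas')]
```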
Flashcard 7
Front: How many target words can source words be connected to in one type of alignment?
Back: Source words can be connected to exactly one target word.
Flashcard 8
Front: What is the term for a target word without connections in alignment?
Back: A target word without connections (one generated with no corresponding source word) is called a spurious word.
Flashcard 9
Front: What is “zero fertility” in word alignment referring to?
Back: “Zero fertility” refers to a source word that is not translated, i.e. it generates zero target words.
Flashcard 10
Front: What do phrases in the context of SMT allow?
Back: Phrases allow translation from a word group to a word group.
Flashcard 11
Front: What is a limitation of word-based translation models compared to phrase-based models?
Back: Word-based translation models only allow translation from a single word into a word group.
Flashcard 12
Front: What are some advantages of using phrases over words in translation?
Back: Longer context can generally be captured, and there is better handling of idioms and other multi-word expressions.
Flashcard 13
Front: What constitutes an inconsistent phrase pair according to the example?
Back: In the example figure, the middle case is the inconsistent phrase pair: a word inside the phrase pair is aligned to a word outside it.
Flashcard 14
Front: What is the goal of decoding in the context discussed?
Back: Decoding aims to find the best hypothesis.
Flashcard 15
Front: What type of data was mentioned in relation to the neural network forward pass in the lecture overview?
Back: Complex “unstructured” data.
Flashcard 16
Front: What will be covered in a later part of the lecture regarding neural networks?
Back: Neural network forward pass with images and Backpropagation.
Flashcard 17
Front: What is naive text input for a neural network?
Back: Naive text input means feeding the words of a text directly into the network, for example as one-hot vectors over the vocabulary (a naïve, sparse word representation).
Flashcard 18
Front: What is a common non-linear function used in neural networks after getting a value for each node?
Back: One of the most common is called the Rectified Linear Unit (ReLU).
Flashcard 19
Front: What is the mathematical definition of the ReLU function?
Back: f(x) = max(0, x).
Flashcard 20
Front: What happens to the value at a node if it is less than zero when using the ReLU function?
Back: If the value at a node is less than zero, the ReLU function f(x) = max(0, x) replaces it with zero.
Flashcard 21
Front: What is the significance of the values 0.875, 0.004, …?
Back: These values sum to 1.
Flashcard 22
Front: What is involved in the backward pass of a neural network?
Back: Inputs, outputs (o), a loss function, and the gradient of the loss with respect to the weights.
Flashcard 23
Front: What is a common loss function mentioned in the context of neural networks?
Back: Cross-entropy is a common loss function.
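As a quick sketch of the cross-entropy loss mentioned above: with a one-hot target it reduces to minus the log of the probability the network assigns to the correct class. The probabilities below are made-up illustrative values.

```python
import math

# Cross-entropy for one prediction: H(p, q) = -sum_i p_i * log(q_i).
# With a one-hot target p, only the correct class contributes.
def cross_entropy(target, predicted):
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)

# Target class is the second one; the model assigns it probability 0.7.
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])  # -log(0.7)
```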
Flashcard 24
Front: Is cross-entropy the only loss function?
Back: No
*This is not the only loss function
Flashcard 25
Front: What is the gradient calculated with respect to in the context of neural networks?
Back: Gradient with respect to the weights of the network.
Flashcard 26
Front: What are some terms associated with calculating the gradient?
Back: partial derivative and learning rate.
Flashcard 27
Front: What are the sets used in training and evaluating a model mentioned in the lecture?
Back: A training set (used to train the model) and a test set (used to evaluate it).
Flashcard 28
Front: What type of error surfaces do neural networks have?
Back: Neural networks have non-convex error surfaces.
Flashcard 29
Front: What is a consequence of neural networks having non-convex error surfaces in terms of finding minima?
Back: Neural networks have non-convex error surfaces, so there is no guarantee of finding the global minimum; instead, we want to find a good local minimum.
Flashcard 30
Front: What are some methods of gradient descent mentioned in the lecture?
Back: Stochastic gradient descent and batch gradient descent.
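The difference between the two variants can be sketched with made-up scalar values: batch gradient descent averages the gradients over the whole dataset before one update, while stochastic gradient descent updates after every single example.

```python
# Minimal sketch of the two update styles (illustrative numbers only).
def batch_step(w, grads, lr=0.1):
    # One update using the average gradient over all examples.
    return w - lr * sum(grads) / len(grads)

def stochastic_steps(w, grads, lr=0.1):
    # One update per example, in sequence.
    for g in grads:
        w = w - lr * g
    return w

w_batch = batch_step(1.0, [0.5, 1.5])       # 1.0 - 0.1 * 1.0 = 0.9
w_sgd = stochastic_steps(1.0, [0.5, 1.5])   # 1.0 - 0.05 - 0.15 = 0.8
```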
Flashcard 31
Front: What is “one-hot encoding” used for in the context of neural networks?
Back: It is a naïve word representation: each word is a vector with a single 1 (at that word’s index) and zeros everywhere else.
Flashcard 32
Front: What is a problem with using “one-hot encoding” for word representation?
Back: It is sparse (lots of zeros) and becomes even sparser as the vocabulary grows.
Flashcard 33
Front: What are artificial neural networks inspired by?
Back: Artificial neural networks (or simply neural networks) are inspired by the neurons in the human brain.
Flashcard 34
Front: What are neural networks essentially from a mathematical perspective?
Back: …nothing more than a bunch of mathematical functions involving a large number of matrix multiplications.
Flashcard 35
Front: What is a key capability of neural networks?
Back: The power of neural networks (NNs) lies in their ability to create complex mappings (functions) between their inputs and outputs.
Flashcard 36
Front: Why are derivatives of functions important for neural networks?
Back: …as they measure the sensitivity of the function’s output value to a change of its input value. This is very important for training neural networks.
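That sensitivity can be estimated numerically, as a small sketch (the helper below is illustrative, not from the source):

```python
# Numerical estimate of a derivative: how much does f's output change
# for a small change of its input?
def numerical_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

slope = numerical_derivative(lambda x: x * x, 3.0)  # d/dx x^2 = 2x, so about 6
```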
Flashcard 37
Front: What does an artificial neuron do with its inputs?
Back: An artificial neuron takes several inputs, for example three.
Flashcard 38
Front: What are weights in an artificial neuron?
Back: w1, w2 and w3 are weights.
Flashcard 39
Front: What is the role of the function z(x) in an artificial neuron?
Back: First, a function z(x) takes all the inputs and converts them into a weighted sum: z(x) = w1x1 + w2x2 + w3x3 + b.
Flashcard 40
Front: What is ‘b’ in the weighted sum of an artificial neuron?
Back: b is the “bias” included in each neuron to avoid the weighted sum of the inputs becoming equal to 0; the bias gives the network something to work with in case all input values are 0.
Flashcard 41
Front: What is the general formula for the weighted sum in a neuron?
Back: z(x) = ∑i xiwi + b.
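The weighted-sum formula in plain Python, with made-up example values for the inputs, weights and bias:

```python
# z(x) = sum_i x_i * w_i + b
def weighted_sum(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# 1.0*0.5 + 2.0*(-0.25) + 3.0*0.1 + 0.2 = 0.5
z = weighted_sum([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
```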
Flashcard 42
Front: What happens to the weighted sum after it is calculated in a neuron?
Back: Then, this weighted sum is converted by another function σ(z) into the output of the neuron.
Flashcard 43
Front: What is the function σ(z) called?
Back: The function σ(z) is called the “activation function”.
Flashcard 44
Front: Describe the basic operation of an artificial neuron.
Back: So, basically, an artificial neuron first converts its inputs into a weighted sum z(x) and then applies an activation function σ(z) to produce its output.
Flashcard 45
Front: What is the Heaviside function and what is it based on?
Back: The activation function can be based on a threshold (the “Heaviside” function): σ(z) = 0 if z ≤ threshold, σ(z) = 1 if z > threshold, where the threshold is a real number, a parameter of the neuron.
Flashcard 46
Front: What is a “perceptron”?
Back: A simple type of artificial neuron which takes one or several binary inputs (with values 0 or 1) and has a threshold-based activation function is called a “perceptron”.
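A perceptron can be sketched directly from these definitions: binary inputs, a weighted sum, and a threshold activation. The weights and bias below are illustrative values (not from the source), chosen so that this particular perceptron computes a logical AND.

```python
# A perceptron: weighted sum of binary inputs plus a threshold ("Heaviside")
# activation that outputs 0 or 1.
def perceptron(inputs, weights, bias, threshold=0.0):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > threshold else 0

def and_gate(a, b):
    # Fires only when both inputs are 1: z = a + b - 1.5 > 0 requires a = b = 1.
    return perceptron([a, b], [1.0, 1.0], bias=-1.5)
```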
Flashcard 47
Front: Are non-linear activation functions typically used in artificial neurons?
Back: They are usually non-linear functions (the reason for this will be explained later) so that an artificial neuron transforms its inputs by a linear weighted sum and a non-linear activation function.
Flashcard 48
Front: Define the sigmoid function.
Back: sigmoid(x) = 1 / (1 + e^(−x)).
Flashcard 49
Front: What is the output range of the sigmoid function?
Back: It converts its input into an output in the range from 0 to 1.
Flashcard 50
Front: Define the hyperbolic tangent function.
Back: tanh(x) = (e^(2x) − 1) / (e^(2x) + 1).
Flashcard 51
Front: What is the output range of the hyperbolic tangent function?
Back: It converts the input into an output in the range from -1 to 1.
Flashcard 52
Front: Define the Rectified Linear Unit (ReLU) function.
Back: ReLU(x) = max(0, x).
Flashcard 53
Front: Describe how the ReLU function transforms inputs.
Back: It is basically a linear transformation for inputs greater than zero, while inputs below zero are transformed to zero. The output range is from 0 to ∞.
Flashcard 54
Front: Define the Softmax function for multiple inputs xi.
Back: softmax(xi) = e^(xi) / ∑j e^(xj).
Flashcard 55
Front: What is the output range of the Softmax function?
Back: Its output range is from 0 to 1.
Flashcard 56
Front: For what purpose is the Softmax function convenient in modelling?
Back: …it is very convenient for modelling probabilities of different classes xi.
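The four activation functions defined in the cards above, written out in plain Python so their output ranges can be checked:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))                         # range (0, 1)

def tanh(x):
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)  # range (-1, 1)

def relu(x):
    return max(0.0, x)                                    # range [0, inf)

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]  # each in (0, 1); they sum to 1
```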
Flashcard 57
Front: Where are Softmax functions used in neural machine translation models?
Back: …a softmax function taking into account all target words in order to decide which of them has the highest probability.
Flashcard 58
Front: What happens when many neurons are connected together?
Back: When many neurons are connected, these operations become a powerful tool.
Flashcard 59
Front: What is a feed-forward network sometimes called, and under what condition is this name argued to be appropriate?
Back: This type of network is sometimes called a multilayer perceptron, although it is argued that the name should be used only if its neurons are actually perceptrons (neurons with a threshold activation function).
Flashcard 60
Front: What is the input layer in a neural network?
Back: The input layer consists of (one or more) input neurons. Inputs of this layer are inputs to the entire neural network. The input layer receives the inputs, performs the calculations in its neurons and transmits the output to the subsequent layer. Each neural network must have an input layer.
Flashcard 61
Front: What is the output layer in a neural network?
Back: The output layer consists of (one or more) output neurons. The output layer receives its input from the previous layer. Outputs of this layer represent the outputs of the entire network. The output layer is responsible for producing the final result by performing calculations in its neurons. Each neural network must have an output layer.
Flashcard 62
Front: What is a hidden layer in a neural network?
Back: The hidden layer is in the middle and connects the input and output layer. The word “hidden” implies that they are not visible from outside the network.
Flashcard 63
Front: How many hidden layers can a neural network have?
Back: A neural network can have an arbitrary number of hidden layers, from zero to many.
Flashcard 64
Front: What is a “deep neural network”?
Back: If a neural network has more than one hidden layer, it is called a “deep neural network”.
Flashcard 65
Front: What is “deep learning”?
Back: If such a neural network (more than one hidden layer) is used for machine learning, it is called “deep learning”.
Flashcard 66
Front: What is learned by the first hidden layer in a deep neural network?
Back: In a multi-layer (“deep”) neural network, the first hidden layer is able to learn some relatively simple patterns.
Flashcard 67
Front: What is learned by each additional hidden layer in a deep neural network?
Back: …each additional hidden layer is able to learn progressively more complicated patterns.
Flashcard 68
Front: What is a theoretical capability of neural networks according to the “Universal Approximation Theorem”?
Back: The “Universal Approximation Theorem” states that a neural network with one hidden layer can approximate any continuous function for inputs within a specific range.
Flashcard 69
Front: Are there strict rules for building neural networks?
Back: Knowing that there are no strict rules for building neural networks and that there are many possibilities to arrange the neurons and define their functions, you should be better able to imagine that neural networks really can model practically any function.
Flashcard 70
Front: Is it necessary to use the same activation function for all neurons in a network?
Back: …it is also not necessary to use the same activation function for all neurons in a network! Usually, all neurons in one layer have the same activation function.
Flashcard 71
Front: Why are the important activation functions mentioned in the text non-linear?
Back: If all neurons in a network have linear activation functions, then no matter how many layers we have, the whole network collapses into a single linear function and cannot model non-linear mappings.
Flashcard 72
Front: How do recurrent neural networks differ from feed-forward neural networks in terms of information flow?
Back: In recurrent neural networks, outputs of some neurons do not pass further to the neurons in the subsequent layer but return to the same neuron as its input.
Flashcard 73
Front: For a feed-forward network with one hidden layer, how are the dependencies between layers formulated?
Back: * H = F(X), meaning that the values in the hidden layer are a function of the values in the input layer. * Y = F(H), meaning that the values in the output layer are a function of the values in the hidden layer.
Flashcard 74
Front: How are the dependencies defined for a recurrent neural network?
Back: Hn = F(Xn, Hn−1), where n refers to the current position (“time frame”) in a sequence. This means that the current values (at position n) in the hidden layer Hn are not dependent only on the current values of the input layer Xn (as in feed-forward networks), but also on the hidden layer values Hn−1 from the previous position.
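The recurrence Hn = F(Xn, Hn−1) can be sketched with scalar values: the new hidden value mixes the current input with the previous hidden value. The weights w_x and w_h below are made-up illustrative numbers.

```python
import math

# One recurrent step: the hidden state depends on the current input AND
# the previous hidden state.
def rnn_step(x_n, h_prev, w_x=0.5, w_h=0.5):
    return math.tanh(w_x * x_n + w_h * h_prev)

h = 0.0                     # initial hidden state
for x in [1.0, 0.5, -1.0]:  # a short input sequence
    h = rnn_step(x, h)      # h carries information from earlier positions
```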
Flashcard 75
Front: What is the structure of RNNs well-suited for modelling?
Back: The structure of RNNs is well suited for modelling sequences.
Flashcard 76
Front: What does the output of an RNN at a given time step/position depend on?
Back: In total, the output depends not only on the current input at the current time step/position Xt, but also on the inputs at all previous time steps/positions.
Front: What is a prominent type of network architecture used in Natural Language Processing nowadays?
Back: Nowadays, almost everything in Natural Language Processing is based on so-called “attention networks” (which include the most modern transformer architecture).
Flashcard 78
Front: What do attention networks represent?
Back: They are complex networks which represent how different inputs relate to different outputs.
Flashcard 79
Front: Were neural networks used for machine translation mentioned as being simple?
Back: No; it is noted that neural networks used for machine translation are very large and complex, involving a large number of neurons organised in many layers.
Flashcard 80
Front: What is characteristic of a feed-forward neural network regarding the direction of the input signal?
Back: It is called “feed-forward” because the input signal is always going forward, from the input layer through the hidden layer(s) to the output layer.