Neural networks Flashcards

Flashcard 1

Front: What topics were covered in Week 1 to Week 11?

Back: Week 1 covered “Neural Language Modelling”. Week 2 covered “Neural Machine Translation & Transformers”. Week 3 covered “Multi-lingual Machine Translation”. Week 4 covered “Low resource & Multi-modal machine translation”. Week 5 covered “Overhype versus reality: When to use machine translation…and when not to”. Week 6 is “Overview”. Weeks 7, 8 …

Flashcard 2

Front: On what level did the translation model have to be defined according to the text?

Back: The model has to be defined on the word level instead of the sentence level.

Flashcard 3

Front: What is introduced into the translation model as a “hidden variable”?

Back: The underlying connection between source and target words is introduced into the translation model as a so-called “hidden variable”.

Flashcard 4

Front: What is a hidden variable?

Back: A hidden variable is a variable which has an influence on the model but is not actually seen.

Flashcard 5

Front: What does ‘a’ represent in the context of translation probability?

Back: ‘a’ represents the alignment “sentence”: a sequence of alignments (positions), one for each source word.

Flashcard 6

Front: Provide an example of translation probability with word alignment given in the source.

Back: “I like red bicycles” = “me gustan bicicletas rojas”, with alignment “1 2 4 3”.
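The alignment “1 2 4 3” can be read as data: for each source word, the number gives the position of the target word it is connected to. A minimal sketch of that reading (the variable names are illustrative, and the direction — Spanish-to-English here — is an assumption consistent with “1 2 4 3”):

```python
# Alignment example from the card: each source word aligns to exactly
# one target position (1-based), as in the word-based models discussed.
source = ["me", "gustan", "bicicletas", "rojas"]
target = ["I", "like", "red", "bicycles"]
alignment = [1, 2, 4, 3]  # source word i aligns to target position alignment[i]

aligned_pairs = [(s, target[a - 1]) for s, a in zip(source, alignment)]
# e.g. "bicicletas" aligns to position 4 -> "bicycles"
```

Note how the crossed positions (4, 3) capture the reordering of “red bicycles” / “bicicletas rojas”.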

Flashcard 7

Front: How many target words can source words be connected to in one type of alignment?

Back: Source words can be connected to exactly one target word.

Flashcard 8

Front: What is the term for a source word without connections in alignment?

Back: A source word without connections is called a spurious word.

Flashcard 9

Front: What is “zero fertility” in word alignment referring to?

Back: “Zero fertility” refers to a word that is not translated.

Flashcard 10

Front: What do phrases in the context of SMT allow?

Back: Phrases allow translation from a word group to a word group.

Flashcard 11

Front: What is a limitation of word-based translation models compared to phrase-based models?

Back: Word-based translation models only allow translation from a single word into a word group.

Flashcard 12

Front: What are some advantages of using phrases over words in translation?

Back: Longer context can generally be captured, and there is better handling of idioms and other multi-word expressions.

Flashcard 13

Front: What constitutes an inconsistent phrase pair according to the example?

Back: The middle example in the source figure shows an inconsistent phrase pair.

Flashcard 14

Front: What is the goal of decoding in the context discussed?

Back: Decoding aims to find the best hypothesis.

Flashcard 15

Front: What type of data was mentioned in relation to the neural network forward pass in the lecture overview?

Back: Complex “unstructured” data.

Flashcard 16

Front: What will be covered in a later part of the lecture regarding neural networks?

Back: The neural network forward pass with images, and backpropagation.

Flashcard 17

Front: What is naive text input for a neural network?

Back: Naive text input for a neural network.

Flashcard 18

Front: What is a common non-linear function used in neural networks after getting a value for each node?

Back: One of the most common is called a Rectified Linear Unit (ReLU).

Flashcard 19

Front: What is the mathematical definition of the ReLU function?

Back: f(x) = max(0, x).

Flashcard 20

Front: What happens to the value at a node if it is less than zero when using the ReLU function?

Back: If the value at a node is less than zero, f(x) = max(0, x) sets it to zero.

Flashcard 21

Front: What is the significance of the values 0.875, 0.004, …?

Back: These values sum to 1.

Flashcard 22

Front: What is involved in the backward pass of a neural network?

Back: Inputs, outputs (o), …

Flashcard 23

Front: What is a common loss function mentioned in the context of neural networks?

Back: Cross-entropy is a common loss function.
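A minimal sketch of cross-entropy for a single classification example: the loss is the negative log of the probability the model assigned to the correct class (the function name and sample numbers are illustrative, not from the lecture):

```python
import math

def cross_entropy(predicted, target_index):
    """Cross-entropy loss for one example: -log of the probability
    assigned to the correct class. Low when the model is confident
    and right, large when it is confident and wrong."""
    return -math.log(predicted[target_index])

# A model that puts probability 0.7 on the correct class (index 1):
loss = cross_entropy([0.2, 0.7, 0.1], 1)
```

If the model assigned probability 1.0 to the correct class, the loss would be 0.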

Flashcard 24

Front: Is cross-entropy the only loss function?

Back: No, this is not the only loss function.

Flashcard 25

Front: What is the gradient calculated with respect to in the context of neural networks?

Back: The gradient with respect to the weights of the network.

Flashcard 26

Front: What are some terms associated with calculating the gradient?

Back: The partial derivative and the learning rate.

Flashcard 27

Front: What are the sets used in training and evaluating a model mentioned in the lecture?

Back: The training set (used to train the model) and the test set.

Flashcard 28

Front: What type of error surfaces do neural networks have?

Back: Neural networks have non-convex error surfaces.

Flashcard 29

Front: What is a consequence of neural networks having non-convex error surfaces in terms of finding minima?

Back: Neural networks have non-convex error surfaces (no global minima). We want to get a good local minimum.

Flashcard 30

Front: What are some methods of gradient descent mentioned in the lecture?

Back: Stochastic gradient descent, batch gradient descent, …
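The update rule behind these methods is the same: move each weight a small step against its partial derivative, scaled by the learning rate. A minimal sketch (the function name and sample numbers are illustrative, not from the lecture):

```python
def sgd_step(weights, gradients, learning_rate=0.1):
    """One gradient-descent update: w <- w - learning_rate * dL/dw.
    Stochastic GD computes the gradients from one example, batch GD
    from the whole training set; the update itself is identical."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

w = sgd_step([0.5, -0.3], [1.0, -2.0], learning_rate=0.1)
# each weight moves opposite the sign of its gradient
```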

Flashcard 31

Front: What is “one-hot encoding” used for in the context of neural networks?

Back: It is used as a (naïve) word representation.

Flashcard 32

Front: What is a problem with using “one-hot encoding” for word representation?

Back: It is sparse (lots of zeros), and it will get even more sparse as the vocabulary grows.
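A minimal sketch of one-hot encoding and its sparsity problem (the function name and toy vocabulary are illustrative, not from the lecture):

```python
def one_hot(word, vocabulary):
    """Naive one-hot word representation: a vector of zeros with a
    single 1 at the word's index in the vocabulary."""
    return [1 if w == word else 0 for w in vocabulary]

vocab = ["I", "like", "red", "bicycles"]
vec = one_hot("red", vocab)
# With a realistic vocabulary of, say, 50,000 words, each vector would
# hold 49,999 zeros -- the sparsity problem the card describes.
```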

Flashcard 33

Front: What are artificial neural networks inspired by?

Back: Artificial neural networks (or simply neural networks) are inspired by the neurons in the human brain.

Flashcard 34

Front: What are neural networks essentially from a mathematical perspective?

Back: …nothing more than a bunch of mathematical functions involving a large number of matrix multiplications.

Flashcard 35

Front: What is a key capability of neural networks?

Back: The power of neural networks (NNs) lies in their ability to create complex mappings (functions) between their inputs and outputs.

Flashcard 36

Front: Why are derivatives of functions important for neural networks?

Back: …as they measure the sensitivity to change of the function output value with respect to a change of its input value. This is very important for training neural networks.

Flashcard 37

Front: What does an artificial neuron do with its inputs?

Back: An artificial neuron takes several inputs, for example three.

Flashcard 38

Front: What are weights in an artificial neuron?

Back: w1, w2 and w3 are weights.

Flashcard 39

Front: What is the role of the function z(x) in an artificial neuron?

Back: First, a function z(x) takes all the inputs and converts them into a weighted sum: z(x) = w1x1 + w2x2 + w3x3 + b.

Flashcard 40

Front: What is ‘b’ in the weighted sum of an artificial neuron?

Back: b represents the “bias” included in each neuron to avoid the weighted sum of the inputs becoming equal to 0. Bias gives the network something to work with in case all input values are 0.

Flashcard 41

Front: What is the general formula for the weighted sum in a neuron?

Back: z(x) = ∑i xiwi + b.
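A minimal sketch of this weighted sum in code (the function name and sample numbers are illustrative, not from the lecture):

```python
def z(inputs, weights, bias):
    """Weighted sum of a neuron: z(x) = sum_i(x_i * w_i) + b."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Three inputs, three weights, and a bias, as in the w1x1 + w2x2 + w3x3 + b example:
value = z([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], bias=0.1)
```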

Flashcard 42

Front: What happens to the weighted sum after it is calculated in a neuron?

Back: Then, this weighted sum is converted by another function σ(z) into the output of the neuron.

Flashcard 43

Front: What is the function σ(z) called?

Back: The function σ(z) is called the “activation function”.

Flashcard 44

Front: Describe the basic operation of an artificial neuron.

Back: So, an artificial neuron first converts its inputs into a weighted sum z(x), and then converts that sum with an activation function σ(z) into its output.

Flashcard 45

Front: What is the Heaviside function and what is it based on?

Back: The activation function can be based on a threshold (the “Heaviside” function): σ(z) = 0 if z ≤ threshold, σ(z) = 1 if z > threshold, where the threshold is a real number, a parameter of the neuron.

Flashcard 46

Front: What is a “perceptron”?

Back: A simple type of artificial neuron which takes one or several binary inputs (with values 0 or 1) and has a threshold-based activation function is called a “perceptron”.
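A minimal sketch of a perceptron with its Heaviside activation (the function name, weights, and the AND example are illustrative, not from the lecture):

```python
def perceptron(inputs, weights, bias, threshold=0.0):
    """A perceptron: binary inputs, a weighted sum plus bias, and a
    threshold ("Heaviside") activation that outputs 0 or 1."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > threshold else 0

# A perceptron computing logical AND of two binary inputs:
and_11 = perceptron([1, 1], [1.0, 1.0], bias=-1.5)  # fires
and_10 = perceptron([1, 0], [1.0, 1.0], bias=-1.5)  # stays off
```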

Flashcard 47

Front: Are non-linear activation functions typically used in artificial neurons?

Back: They are usually non-linear functions (the reason for this will be explained later), so that an artificial neuron transforms its inputs by a linear weighted sum and a non-linear activation function.

Flashcard 48

Front: Define the sigmoid function.

Back: sigmoid(x) = 1 / (1 + e^(−x)).

Flashcard 49

Front: What is the output range of the sigmoid function?

Back: It converts its input into an output in the range from 0 to 1.

Flashcard 50

Front: Define the hyperbolic tangent function.

Back: tanh(x) = (e^(2x) − 1) / (e^(2x) + 1).

Flashcard 51

Front: What is the output range of the hyperbolic tangent function?

Back: It converts the input into an output in the range from -1 to 1.

Flashcard 52

Front: Define the Rectified Linear Unit (ReLU) function.

Back: ReLU(x) = max(0, x).

Flashcard 53

Front: Describe how the ReLU function transforms inputs.

Back: It is basically a linear transformation for inputs greater than zero, while inputs below zero are transformed to zero. The output range is from 0 to ∞.
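The three activation functions defined in these cards can be sketched directly from their formulas (using only the standard library; the function names are the usual ones, not from the lecture):

```python
import math

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^(-x)); output range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """tanh(x) = (e^(2x) - 1) / (e^(2x) + 1); output range (-1, 1)."""
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)

def relu(x):
    """ReLU(x) = max(0, x); linear above zero, zero below."""
    return max(0.0, x)
```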

Flashcard 54

Front: Define the Softmax function for multiple inputs xi.

Back: softmax(xi) = e^(xi) / ∑j e^(xj).

Flashcard 55

Front: What is the output range of the Softmax function?

Back: Its output range is from 0 to 1.

Flashcard 56

Front: For what purpose is the Softmax function convenient in modelling?

Back: …it is very convenient for modelling probabilities of different classes xi.
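A minimal sketch of softmax and why it suits class probabilities: the outputs lie in (0, 1) and sum to 1 (the sample numbers are illustrative, not from the lecture):

```python
import math

def softmax(xs):
    """softmax(x_i) = e^(x_i) / sum_j e^(x_j): turns arbitrary scores
    into values in (0, 1) that sum to 1, i.e. a probability distribution."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# the largest score gets the largest probability
```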

Flashcard 57

Front: Where are Softmax functions used in neural machine translation models?

Back: …a softmax function taking into account all target words in order to decide which of them has the highest probability.

Flashcard 58

Front: What happens when many neurons are connected together?

Back: When many neurons are connected, these operations become a powerful tool.

Flashcard 59

Front: What is a feed-forward network sometimes called, and under what condition is this name argued to be appropriate?

Back: This type of network is sometimes called a multilayer perceptron, although it is argued that the name should be used only if its neurons are actually perceptrons (neurons with a threshold activation function).

Flashcard 60

Front: What is the input layer in a neural network?

Back: The input layer consists of (one or more) input neurons. Inputs of this layer are inputs to the entire neural network. The input layer receives the inputs, performs the calculations in its neurons and transmits the output to the subsequent layer. Each neural network must have an input layer.

Flashcard 61

Front: What is the output layer in a neural network?

Back: The output layer consists of (one or more) output neurons. The output layer receives its input from the previous layer. Outputs of this layer represent the outputs of the entire network. The output layer is responsible for producing the final result by performing calculations in its neurons. Each neural network must have an output layer.

Flashcard 62

Front: What is a hidden layer in a neural network?

Back: The hidden layer is in the middle and connects the input and output layer. The word “hidden” implies that they are not visible from outside the network.

Flashcard 63

Front: How many hidden layers can a neural network have?

Back: A neural network can have an arbitrary number of hidden layers, from zero to many.

Flashcard 64

Front: What is a “deep neural network”?

Back: If a neural network has more than one hidden layer, it is called a “deep neural network”.

Flashcard 65

Front: What is “deep learning”?

Back: If such a neural network (more than one hidden layer) is used for machine learning, it is called “deep learning”.

Flashcard 66

Front: What is learned by the first hidden layer in a deep neural network?

Back: In a multi-layer (“deep”) neural network, the first hidden layer is able to learn some relatively simple patterns.

Flashcard 67

Front: What is learned by each additional hidden layer in a deep neural network?

Back: …each additional hidden layer is able to learn progressively more complicated patterns.

Flashcard 68

Front: What is a theoretical capability of neural networks according to the “Universal Approximation Theorem”?

Back: The “Universal Approximation Theorem” states that a neural network with one hidden layer can approximate any continuous function for inputs within a specific range.

Flashcard 69

Front: Are there strict rules for building neural networks?

Back: Knowing that there are no strict rules for building neural networks and there are many possibilities to arrange the neurons and define their functions, you should be better able to imagine that neural networks really can model practically any function.

Flashcard 70

Front: Is it necessary to use the same activation function for all neurons in a network?

Back: …it is also not necessary to use the same activation function for all neurons in a network! Usually, all neurons in one layer have the same activation function.

Flashcard 71

Front: Why are the important activation functions mentioned in the text non-linear?

Back: If all neurons in a network have linear activation functions, then no matter how many layers we have, the whole network can only compute a linear function of its inputs.

Flashcard 72

Front: How do recurrent neural networks differ from feed-forward neural networks in terms of information flow?

Back: In recurrent neural networks, outputs of some neurons do not pass further to the neurons in the subsequent layer but return to the same neuron as its input.

Flashcard 73

Front: For a feed-forward network with one layer, how are the dependencies between layers formulated?

Back: H = F(X), meaning that the values in the hidden layer are a function of the values in the input layer. Y = F(H), meaning that the output values are a function of the values in the hidden layer.
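The H = F(X), Y = F(H) dependency chain can be sketched with one tiny fully connected layer function. This is a minimal sketch, assuming a ReLU hidden layer and an identity output layer; the weights and inputs are made-up numbers, not from the lecture:

```python
def layer(values, weights, biases, activation):
    """One fully connected layer: a weighted sum per neuron, then an activation."""
    return [activation(sum(v * w for v, w in zip(values, row)) + b)
            for row, b in zip(weights, biases)]

relu = lambda z: max(0.0, z)
identity = lambda z: z

X = [1.0, 2.0]                                               # input layer values
H = layer(X, [[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0], relu)  # H = F(X)
Y = layer(H, [[1.0, -1.0]], [0.5], identity)                 # Y = F(H)
```

Each layer's output depends only on the previous layer's values, which is what makes the network “feed-forward”.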

Flashcard 74

Front: How are the dependencies defined for a recurrent neural network?

Back: Hn = F(Xn, Hn−1), where n refers to the current position (“time frame”) in a sequence. This means that the current values (at position n) in the hidden layer Hn are not dependent only on the current values of the input layer Xn (as in feed-forward networks), but also on the hidden-layer values Hn−1 from the previous position.
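The recurrence Hn = F(Xn, Hn−1) can be sketched with a single scalar hidden state. This is a minimal sketch, assuming a ReLU activation and made-up weights (w_x, w_h are illustrative parameters, not from the lecture):

```python
def rnn_step(x_n, h_prev, w_x=0.5, w_h=0.8):
    """H_n = F(X_n, H_{n-1}): the new hidden value depends on the
    current input AND the previous hidden value."""
    return max(0.0, w_x * x_n + w_h * h_prev)  # ReLU activation

h = 0.0                     # initial hidden state H_0
for x in [1.0, 0.0, 2.0]:   # a short input sequence
    h = rnn_step(x, h)
```

Note that at the second step the input is 0, yet the hidden state stays non-zero because it carries information forward from the first step.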

Flashcard 75

Front: What is the structure of RNNs well-suited for modelling?

Back: The structure of RNNs is well suited for modelling of sequences.

Flashcard 76

Front: What does the output of an RNN at a given time step/position depend on?

Back: In total, the output depends not only on the current input at the current time step/position Xt, but also on the inputs from the previous time steps/positions.

Flashcard 77

Front: What is a prominent type of network architecture used in Natural Language Processing nowadays?

Back: Nowadays, almost everything in Natural Language Processing is based on so-called “attention networks” (which include the most modern transformer architecture).

Flashcard 78

Front: What do attention networks represent?

Back: They are complex networks which represent how different inputs relate to different outputs.

Flashcard 79

Front: Were neural networks used for machine translation mentioned as being simple?

Back: It should be noted that neural networks used for machine translation are very large and complex, involving a large number of neurons organised in many layers.

Flashcard 80

Front: What is characteristic of a feed-forward neural network regarding the direction of the input signal?

Back: It is called “feed-forward” because the input signal is always going forward, from the input layer through the hidden layer(s) to the output layer.