MLPs Flashcards

1
Q

Who came up with NETtalk?

A

Sejnowski & Rosenberg, 1987

2
Q

What does NETtalk do?

A

The task is to learn to pronounce English text from examples (text-to-speech)

3
Q

What is the training data for NETtalk?

A

A list of aligned example pairs: written English text together with its phonetic transcription (the correct phoneme for each letter)

4
Q

What are the inputs and outputs of NETtalk?

A

Input: 7 consecutive characters from written text, presented in a moving window that scans the text

Output: phoneme code giving the pronunciation of the letter at the
center of the input window
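
As an illustration of this input/output pairing, here is a small sketch (not the original NETtalk code) of how 7-character windows and their centre-letter phonemes could be paired up; the alignment notation and helper name are assumptions.

```python
# Illustrative sketch: build (window, target-phoneme) pairs by sliding a
# 7-character window over text that is letter-aligned with its phonemes.
def make_examples(text, phonemes, window=7):
    pad = window // 2                      # 3 characters of padding on each side
    padded = " " * pad + text + " " * pad
    examples = []
    for i, target in enumerate(phonemes):
        chars = padded[i:i + window]       # 7 consecutive characters
        examples.append((chars, target))   # target = phoneme of the centre letter
    return examples

# Example: "phone" aligned with "f-on-" ('-' marking silent letters is an assumed notation)
pairs = make_examples("phone", "f-on-")
print(pairs[0])                            # ('   phon', 'f')
```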

5
Q

Describe the network topology of NETtalk

A

7x29 binary inputs (26 letters plus 3 punctuation/word-boundary symbols per window position),

80 hidden units, and

26 output units (phoneme code).

Sigmoid units in the hidden and output layers
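
A minimal numpy sketch of a forward pass through a network of this shape (randomly initialised weights, not the trained NETtalk weights; the input encoding is simplified):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, n_out = 7 * 29, 80, 26               # 203 inputs, 80 hidden, 26 outputs

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))    # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))   # hidden -> output weights
b2 = np.zeros(n_out)

def forward(x):
    """x: binary vector of length 203 (one active symbol per window position)."""
    h = sigmoid(W1 @ x + b1)     # 80 sigmoid hidden units
    y = sigmoid(W2 @ h + b2)     # 26 sigmoid output units (phoneme code)
    return y

x = np.zeros(n_in)
x[::29] = 1.0                    # dummy input: first symbol active in each of the 7 slots
print(forward(x).shape)          # (26,)
```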

6
Q

What is the activation function of the hidden and output layers of NETtalk?

A

Sigmoid units in both the hidden and output layers
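
For reference, the logistic sigmoid and its derivative (the derivative is what backpropagation multiplies in at each unit):

σ(z) = 1 / (1 + e^(−z)),   σ'(z) = σ(z)·(1 − σ(z))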

7
Q

What is the accuracy of NETtalk after 50 epochs?

A

95% accuracy on the training set after 50 epochs of training by full (batch) gradient descent; 78% accuracy on a test set
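
To make "full gradient descent" concrete: every epoch computes the gradient over the entire training set before making one weight update (no mini-batches). A toy, self-contained sketch with random data and a squared-error loss (the loss choice, learning rate and data are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((500, 203)) < 0.05).astype(float)    # fake binary input windows
T = (rng.random((500, 26)) < 0.5).astype(float)      # fake phoneme-code targets

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(scale=0.1, size=(203, 80)); b1 = np.zeros(80)
W2 = rng.normal(scale=0.1, size=(80, 26));  b2 = np.zeros(26)
lr = 0.1                                              # illustrative learning rate

for epoch in range(50):                               # "50 epochs"
    H = sigmoid(X @ W1 + b1)                          # forward pass on the whole set
    Y = sigmoid(H @ W2 + b2)
    dY = (Y - T) * Y * (1 - Y)                        # squared-error grad times sigmoid'
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ dY) / len(X); b2 -= lr * dY.mean(axis=0)
    W1 -= lr * (X.T @ dH) / len(X); b1 -= lr * dH.mean(axis=0)
```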

8
Q

Give an example of another MLP similar to NETtalk

A

DEC-talk

9
Q

Give 3 examples of MLPs

A

NETtalk: converts text to speech (English)

ALVINN: Autonomous Land Vehicle In a Neural Network (learned to steer a vehicle from camera images)

Falcon: fraud detection in credit card transactions

10
Q

Why can adding layers to multi-layer networks be harmful?

A

– OVERFITTING: the increased number of weights quickly leads to overfitting of the training data (lack of generalization)

– SGD trap: a huge number of (bad) local minima can trap the gradient descent algorithm

– Vanishing or exploding gradients (the update rule involves products of many numbers) cause additional problems; see the sketch below
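
A small numerical illustration of the vanishing-gradient point: each sigmoid layer multiplies the backpropagated signal by σ'(z) ≤ 0.25 (and by the layer's weights), so in a deep stack the gradient reaching the early layers tends to shrink geometrically. The deep chain below is purely illustrative:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
depth, width = 20, 50
Ws = [rng.normal(scale=0.3, size=(width, width)) for _ in range(depth)]

# Forward pass, keeping the activations needed for the backward pass.
a, acts = rng.normal(size=width), []
for W in Ws:
    a = sigmoid(W @ a)
    acts.append(a)

# Backpropagate a unit error from the top and watch its norm per layer.
g = np.ones(width)
for W, a in zip(reversed(Ws), reversed(acts)):
    g = W.T @ (g * a * (1 - a))     # multiply by sigmoid'(z) = a(1-a), then by the weights
    print(np.linalg.norm(g))        # typically shrinks layer after layer
```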

11
Q

Why do we need deeper networks (more layers)?

A

With only one hidden layer, the number of nodes and weights required to represent a complicated function can grow exponentially fast

The deeper the network, the fewer nodes are required to model “complicated” functions (see the example below)
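
A classical illustration of this depth-vs-width trade-off (a standard circuit-complexity example, not taken from this deck): the parity of n bits. Written as a flat two-level AND/OR expression, parity needs on the order of 2^(n-1) terms, while a deep chain of small XOR blocks computes it with O(n) units. The sketch below shows the deep construction, with each XOR gate built from three threshold units:

```python
def xor_gate(a, b):
    # XOR from three threshold units: an OR unit and an AND unit feed an output unit.
    h1 = int(a + b >= 1)           # OR
    h2 = int(a + b >= 2)           # AND
    return int(h1 - h2 >= 1)       # fires iff OR is true and AND is false

def deep_parity(bits):
    acc = 0
    for b in bits:                 # one small XOR block per step of depth
        acc = xor_gate(acc, b)
    return acc

print(deep_parity([1, 0, 1, 1]))   # 1: an odd number of ones
```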
