Lecture 1 Flashcards

Question

When do we get errors?

Answer 1

When we misclassify.

Answer 2

The error [See flashcard]

Answer 3

The probability of getting a wrong answer

Answer 4

The error is not a continuous function of the weights.

Answer 5

Perceptron learning algorithm.

Answer 6

[See flashcard]

Answer 7

v is the learning rate; it controls how quickly the weights change.

Answer 8

Each time we select a training point

Answer 9

When all training points are used in order.

Answer 10

At every update, choose a new training point at random.

Answer 11

Logic gates. These perform some operation on a pair of binary inputs.

Answer 12

Performs some operation on a pair of binary inputs.

Answer 13

- Mean squared error [see flashcard for formula]
- Sum-of-squares error [see flashcard for formula]

Answer 14

Mean squared error.

Answer 15

Sum-of-squares error

Answer 16

The cross-entropy error [See flashcard for formula]
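The formulas themselves are on the flashcards and are not reproduced here. As a rough sketch only, the commonly used textbook definitions of the three error functions named above can be written as follows. The function names, the absence of a factor of 1/2, and the binary form of the cross-entropy are assumptions and may differ from the flashcard versions.

```python
import math

def sum_of_squares_error(targets, outputs):
    # Sum over training points of the squared difference (assumed form, no 1/2 factor).
    return sum((t - z) ** 2 for t, z in zip(targets, outputs))

def mean_squared_error(targets, outputs):
    # Sum-of-squares error divided by the number of training points.
    return sum_of_squares_error(targets, outputs) / len(targets)

def cross_entropy_error(targets, outputs, eps=1e-12):
    # Binary cross-entropy (assumed form); outputs are clipped to avoid log(0).
    total = 0.0
    for t, z in zip(targets, outputs):
        z = min(max(z, eps), 1.0 - eps)
        total -= t * math.log(z) + (1 - t) * math.log(1 - z)
    return total

# Hypothetical targets and network outputs, for illustration only.
targets = [1, 0, 1, 1]
outputs = [0.9, 0.2, 0.6, 0.4]
sse = sum_of_squares_error(targets, outputs)
mse = mean_squared_error(targets, outputs)
cee = cross_entropy_error(targets, outputs)
```

All three shrink towards zero as the outputs approach the targets.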

Answer 17

When we can draw a straight line in 2D separating two classes.

Answer 18

Transform the problem so it is linear, by choosing better features. Alternatively, a problem may become solvable when we choose more features, e.g. using 3 features in 3D space instead of 2 features in 2D space.
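One illustration of adding a feature, chosen here as an assumption rather than taken from the lecture: XOR cannot be separated by a straight line in 2D, but adding the product x1*x2 as a third feature makes it separable by a plane. The weights and bias below are hand-picked for the sketch.

```python
def step(m):
    # Step activation: output 1 when the weighted sum is positive.
    return 1 if m > 0 else 0

def add_product_feature(x1, x2):
    # Move from 2 features to 3 by appending the product x1 * x2.
    return (x1, x2, x1 * x2)

# Hand-picked (hypothetical) weights and bias for the 3-feature problem.
weights = (1.0, 1.0, -2.0)
bias = -0.5

def classify(x1, x2):
    features = add_product_feature(x1, x2)
    m = sum(w * f for w, f in zip(weights, features)) + bias
    return step(m)

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
predictions = {inputs: classify(*inputs) for inputs in xor_table}
```

Here a single linear unit reproduces every row of the XOR table, which it cannot do on the two original features alone.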

Answer 19

Based on the activation, zi = f(m)

Answer 20

The activation must be a nonlinear function of the input features. This leads to the multilayer perceptron (neural network)

Answer 21

Rectified linear unit (ReLU)

Answer 22

The model is too complex. It may be fitted to the training data perfectly but is unable to generalise.

Answer 23

Highly flexible and non-linear. Multi-layer neural networks are capable of representing any functional mapping: they are universal approximators. In principle, any mapping can be learned.

Answer 24

Start with a random set of weights. The idea is then to iterate and refine the model, using the training points to update the weights and bias so that the error decreases. Because of this, where you end up depends on where you started.

Answer 25

When the model makes an incorrect classification. If classified correctly, we don't need to adjust the weights. No change in the weights / boundary of the line means the model is classifying correctly.
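A minimal sketch of this update rule, assuming a step activation, random selection of training points, and the AND gate as training data. The function names, the learning rate of 0.1 and the specific form of the update (weight plus learning rate times error times feature) are illustrative assumptions, not necessarily the version on the flashcard.

```python
import random

def step(m):
    return 1 if m > 0 else 0

def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(data, learning_rate=0.1, max_updates=10000, seed=0):
    rng = random.Random(seed)
    # Start from a random set of weights and a random bias.
    weights = [rng.uniform(-1, 1) for _ in data[0][0]]
    bias = rng.uniform(-1, 1)
    for _ in range(max_updates):
        # Stop once every training point is classified correctly.
        if all(predict(weights, bias, x) == t for x, t in data):
            break
        x, t = rng.choice(data)  # choose a training point at random
        error = t - predict(weights, bias, x)
        if error != 0:  # only adjust the weights after a misclassification
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# AND gate: linearly separable, so the perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
```

A different seed gives different starting weights and usually a different final boundary, even though each one classifies all four points correctly.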

Answer 26

- Step function
- Sigmoid function

Answer 27

[See flashcard] The version with e is bounded between 0 and 1, making it suitable for classification.

Answer 28

The sigmoid function is smooth (continuous)

Answer 29

It is continuous - we can get gradient information to improve weight optimisation
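To make the contrast concrete, here is a small sketch of the two activation functions, assuming the sigmoid is the standard logistic function 1 / (1 + e^(-m)). The step function has no useful gradient, while the sigmoid's gradient can be computed from its own output.

```python
import math

def step(m):
    # Jumps from 0 to 1 at m = 0; flat everywhere else, so no gradient information.
    return 1.0 if m > 0 else 0.0

def sigmoid(m):
    # Smooth and bounded between 0 and 1 (assumed logistic form).
    return 1.0 / (1.0 + math.exp(-m))

def sigmoid_gradient(m):
    # Derivative of the logistic function: s * (1 - s).
    s = sigmoid(m)
    return s * (1.0 - s)

values = [(m, step(m), sigmoid(m), sigmoid_gradient(m)) for m in (-2.0, 0.0, 2.0)]
```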

Answer 30

[See flashcard]

Answer 31

It is a highly nonlinear function of the weights. Therefore we cannot solve for the best weights analytically; we need to optimise the weights numerically, i.e. reduce the error iteratively.
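Numerical optimisation of this kind is often done by gradient descent. The sketch below uses a hypothetical one-weight error function, E(w) = (w - 3)^2, purely for illustration; a real network error has many weights and no such simple form.

```python
def error(w):
    # Hypothetical toy error with its minimum at w = 3.
    return (w - 3.0) ** 2

def error_gradient(w):
    # dE/dw for the toy error above.
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    for _ in range(steps):
        # Positive gradient: decrease the weight. Negative gradient: increase it.
        w -= learning_rate * error_gradient(w)
    return w

w_final = gradient_descent(w=0.0)
```

Each step moves the weight against the gradient, and the steps get smaller as the gradient shrinks near the minimum.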

Answer 32

We are near a maximum of the curve. We need to decrease the weights to get closer to the minimum.

Answer 33

We need to decrease the weights by a smaller amount.

Answer 34

[See flashcard]

Answer 35

The weights are not explicitly in the error function.

Answer 36

The features.

Answer 37

It needs to increase to get the correct answer.

Answer 38

If the value of the feature is positive, increasing the weight will increase zi. If the value of the feature is negative, decreasing the weight will increase zi. In this way, we have already learnt something about the network without doing any math.
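This sign argument can be checked numerically. The sketch assumes a single weight, a single feature and a sigmoid activation; the numbers are arbitrary.

```python
import math

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

def output(weight, feature):
    # Output of one neuron with one input and no bias (illustrative assumption).
    return sigmoid(weight * feature)

# Positive feature: increasing the weight increases the output.
positive_before = output(0.5, 2.0)
positive_after = output(0.6, 2.0)

# Negative feature: decreasing the weight increases the output.
negative_before = output(0.5, -2.0)
negative_after = output(0.4, -2.0)
```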

Answer 39

Linearly separable

Answer 40

They are not linearly separable, i.e. they do not have linear decision boundaries.

Answer 41

The activation, z = f(m). We are using a straight line to make decisions: this activation is linear in the input features, which means the decision boundary is a straight line.

Answer 42

The activation must be a nonlinear function of the input features. This leads to the multilayer perceptron (or neural network).

Answer 43

The perceptron is a one-layer network, which usually has a step function as the activation function.

Answer 44

The information passes through more than one neuron before it gets to the output. As well as input and output neurons (or nodes), there are hidden layers. These perform non-linear transformations on their inputs.
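As a sketch of information passing through a hidden layer, the small network below has two hidden neurons and one output neuron, all with step activations. The weights are hand-picked (not learned) so that the network computes XOR; this example is an assumption for illustration, not taken from the lecture.

```python
def step(m):
    return 1 if m > 0 else 0

def neuron(weights, bias, inputs):
    # Weighted sum of the inputs plus bias, passed through the step function.
    m = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(m)

def two_layer_network(x1, x2):
    # Hidden layer: one neuron behaves like OR, the other like AND.
    h_or = neuron((1.0, 1.0), -0.5, (x1, x2))
    h_and = neuron((1.0, 1.0), -1.5, (x1, x2))
    # Output neuron: fires when OR is on and AND is off.
    return neuron((1.0, -1.0), -0.5, (h_or, h_and))

outputs = {(a, b): two_layer_network(a, b) for a in (0, 1) for b in (0, 1)}
```

No single neuron of this kind can produce the XOR table, but the hidden layer's non-linear transformation makes the final step linearly separable.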

Answer 45

Non-linear in features.

Answer 46

They allow for more complexity.

Answer 47

They are very flexible. They can have lots of layers and lots of nodes, thereby having a large number of adjustable parameters (weights and bias parameters). The architecture can be tuned. They can represent arbitrarily complex decision boundaries, and do regression for arbitrarily complex functions.

Answer 48

Vector notation. The vector of weights - each row represents the weights for an input. This is then used in matrix multiplication with the transposed x vector to give the activations.
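In code, the weighted sum for one neuron is a dot product between the weight vector and the input vector. The second function extends this to several neurons, under the assumption that each row of the weight matrix holds the weights feeding one neuron; the flashcard's own layout convention may differ.

```python
def dot(w, x):
    # Weighted sum for a single neuron: w1*x1 + w2*x2 + ...
    return sum(wi * xi for wi, xi in zip(w, x))

def layer_activations(weight_matrix, x, biases):
    # One dot product per row (assumed: one row of weights per neuron).
    return [dot(row, x) + b for row, b in zip(weight_matrix, biases)]

x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
m = dot(w, x)  # 0.5 - 2.0 + 0.75

W = [[0.5, -1.0, 0.25],
     [1.0, 0.0, -1.0]]
ms = layer_activations(W, x, [0.0, 1.0])
```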

Answer 49

It is not linearly separable - cannot be modelled with a perceptron.

Answer 50

No, we can combine different activation functions.

Answer 51

- Linear activation
- Rectified linear unit (ReLU) [See flashcard]
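For reference, a sketch of these two activation functions using their usual definitions: the linear activation returns its input unchanged, and ReLU returns max(0, m).

```python
def linear(m):
    # Linear (identity) activation: output equals the input.
    return m

def relu(m):
    # Rectified linear unit: zero for negative inputs, identity for positive ones.
    return max(0.0, m)

samples = [(m, linear(m), relu(m)) for m in (-2.0, 0.0, 3.0)]
```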

Answer 52

m, the value of the activation

Answer 53

In neural networks, we can have more than one output. The advantage is that this gives us another way of classifying, e.g. probabilities of different classes.

Answer 54

The softmax function
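A minimal sketch of the softmax function in its standard form: exponentiate each output and divide by the sum, so the results are positive and add up to 1. Subtracting the largest value first is a common numerical-stability step and is an addition here, not something stated on the card.

```python
import math

def softmax(activations):
    # Subtract the maximum so that exp() cannot overflow; the result is unchanged.
    largest = max(activations)
    exps = [math.exp(a - largest) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
```

The largest raw output always receives the largest probability.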

Answer 55

When the model is fitted perfectly to the training data but does not generalise well to unseen data. We may expect the plot of an overfit model to be jagged and rough, passing through each individual point, whereas a well-fitted model should give a smooth curve. We want a trade-off between complexity and goodness of fit.

Answer 56

Splitting the data into training and testing sets.
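A simple way to do this split, sketched with the standard library only. The function name, the 25% test fraction and the fixed seed are assumptions for the example.

```python
import random

def train_test_split(points, test_fraction=0.25, seed=0):
    # Shuffle a copy so the original order is left untouched.
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled points are held out for testing.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))  # stand-in for 20 labelled data points
train, test = train_test_split(data)
```

The model is fitted on `train` only, and its error on `test` estimates how well it generalises to unseen data.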

(84 cards)