Lecture 1 Flashcards

1
Q

Define artificial intelligence (AI)

A

Methods where a computer mimics human (or other animal) behaviour

The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.

2
Q

Define machine learning.

A

A subfield of artificial intelligence, using statistical models that let machines get better at tasks with experience.

3
Q

Define deep learning.

What are its features?

A

Machine learning using multilayer [deep] neural networks.

  • Highly flexible and non-linear
  • Capable of approximating any (continuous) functional mapping, given enough hidden units

A type of machine learning based on artificial neural networks in which multiple layers of processing are used to extract progressively higher level features from data.

4
Q

What are the different types of machine learning?

A

Supervised learning and unsupervised learning.

5
Q

Describe supervised learning.

A

Training a model by showing it inputs and outputs. When you show it a new input, it will predict an output.

6
Q

Describe unsupervised learning.

A

Discovering patterns in the data that were not known before.

7
Q

What are the two types of supervised learning?

A

Classification and regression.

8
Q

Describe classification.

A

Given an input, we assign it to a particular class.

9
Q

Describe regression.

A

Given an input, we assign to it a number (or set of numbers).

10
Q

Briefly, how do we conduct machine learning?

A
  • Start off with a training set: for each piece of training data we know the right answer (i.e. what we want to predict)
  • Build a model for the process that generated the data
  • Use this model to make predictions about data we have not seen before
11
Q

What are some examples of machine learning?

A
  • Face recognition
  • Text recognition
  • Voice assistance
  • Autonomous driving
  • Drug discovery
  • Quantum chemistry
12
Q

Regardless of the machine learning algorithm we are interested in or the objects we want to make predictions about, what kind of data do we need?

A

The object must be converted to numerical data.

The learning problem then boils down to a set of equations / mathematical functions acting on those numbers.

13
Q

What is the data represented by?

A

A set of numbers, x

14
Q

What is each xi referred to as?

A

A feature. Therefore x is a feature vector.

15
Q

In image recognition, what could the vectors represent?

A

We get a vector xi for each image i, containing the darkness of each pixel.

16
Q

Why do we need to select features?

A

Not every piece of data we have relates to the target variable.

Some may not change at all with the target variable or some may be poor predictors.

17
Q

What is the step function?

A

H(x)

A function that increases or decreases abruptly from one constant value to another. Used in the case of classification to distinguish between two classes.
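
As a rough illustration, the step function can be sketched in Python as below. The value at exactly zero is a convention; H(0) = 1 is assumed here, and the lecture may use a different choice.

```python
def step(x: float) -> int:
    """Heaviside step H(x): 0 for negative input, 1 otherwise."""
    # Assumed convention: H(0) = 1. Some texts use H(0) = 0 or 0.5.
    return 1 if x >= 0 else 0

# Values below the threshold map to class 0, the rest to class 1.
labels = [step(m) for m in (-2.0, -0.1, 0.0, 3.5)]
```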

18
Q

How can we rewrite our mathematical learning problems?

A

Rewrite in terms of neurons.

19
Q

What are neurons?

A

In machine learning, a “neuron” refers to the basic processing unit within an artificial neural network, essentially a mathematical model that receives input signals, performs calculations based on assigned weights, and produces an output signal.

20
Q

What do neurons do?

A

Take inputs, transform them and give output (m). We can apply a function H(m) to this, to decide whether we are in class 1 or 2.

21
Q

What is a perceptron?

A

In machine learning, a “perceptron” is a simple model of a biological neuron, considered the most basic form of an artificial neural network. It is used for supervised learning of binary classification tasks, where it takes multiple weighted inputs and produces a single binary output (either 0 or 1) based on a linear decision boundary.

  • Perceptron takes inputs
  • These are combined to give the activation m
  • An activation function f(.) is applied to m
  • The output is z = f(m)
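
The four steps above can be sketched as follows. This is a minimal sketch, assuming the inputs are combined as a weighted sum plus a bias and that f is a step function; the names `perceptron_output`, `w` and `b` are illustrative, not taken from the lecture.

```python
def perceptron_output(x, w, b):
    """Return the activation m and the output z = f(m) for one input x."""
    # Combine the inputs: weighted sum plus bias (assumed form of m).
    m = sum(wj * xj for wj, xj in zip(w, x)) + b
    # Apply the activation function f; a step function is assumed here.
    z = 1 if m >= 0 else 0
    return m, z

m, z = perceptron_output(x=[1.0, 0.0], w=[0.5, 0.5], b=-0.7)
```

With these example numbers the activation is negative, so the input is assigned to class 0.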
22
Q

How do we find the weights and bias of the activation m?

A
  • We have a training set with N data points
  • Each data point has an input xi and target output (ti)
  • We fix the weights of the perceptron by tuning them so that we get the right answer for our inputs

i.e. for the network, we are finding the best vector of weights.

23
Q

For a network, we want to find the best vector of weights. What does this mean?

A

Mostly, this means the set of weights that results in the fewest wrong answers.

We start with some set of weights w(0) and improve them to get w(1), w(2)…. Each step changes the weights to decrease the error.

24
Q

How do we get from w(i) to w(i+1)?

A

By changing the weights in a way that decreases the error.

25
Q

When do we get errors?

A

When we misclassify.

26
Q

What is the formula for the probability of a wrong answer?

A

The error
[See flashcard]

27
Q

How many possibilities are there for each input?

A

Four

28
Q

What does the error, E represent?

A

The probability of getting a wrong answer

29
Q

If answers are always correct, what is the error?

A

E = 0

30
Q

Why is it difficult to derive an algorithm for updating the weights?

A

The error is not a continuous function of the weights.

31
Q

What algorithm do we use for updating the weights?

A

Perceptron learning algorithm.

32
Q

What steps are followed to update the weights?

A

[See flashcard]

33
Q

In the formula for perceptron learning, what is v?

A

v is the learning rate; it controls how quickly the weights change.
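
The flashcard formula is not reproduced in these notes, so the sketch below assumes a common form of the perceptron learning rule: each weight changes by v * (t - z) * x whenever a point is misclassified. It is trained on the AND logic gate. The weights start at zero (rather than at random values) only to keep the run reproducible, and all function names are illustrative.

```python
def predict(x, w, b):
    """Perceptron output with a step activation (H(0) = 1 assumed)."""
    m = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if m >= 0 else 0

def train_perceptron(data, v=1.0, epochs=50):
    """Cycle through the training points, updating only on mistakes."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in data:
            z = predict(x, w, b)
            if z != t:
                errors += 1
                # Assumed update rule: move each weight by v * (t - z) * x_j.
                w = [wj + v * (t - z) * xj for wj, xj in zip(w, x)]
                b += v * (t - z)
        if errors == 0:  # every training point is classified correctly
            break
    return w, b

# AND gate: output is 1 only when both binary inputs are 1.
AND_GATE = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND_GATE)
predictions = [predict(x, w, b) for x, _ in AND_GATE]
```

Because AND is linearly separable, the loop stops once a full pass produces no misclassifications.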

34
Q

When are the weights updated?

A

Each time we select a training point

35
Q

What is batch learning?

A

When all training points are used in order.

36
Q

What is on-line learning?

A

At every update, choose a new training point at random.

37
Q

What is a popular example for the perceptron?

A

Logic gates. These perform some operation on a pair of binary inputs.

38
Q

What does a logic gate do?

A

Performs some operation on a pair of binary inputs.

39
Q

What are possibilities for the definition of the error function?

A

Mean squared error [see flashcard for formula]
Sum-of-squares error [see flashcard for formula]
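
For reference, the standard textbook forms of these two errors are sketched below, writing z_i for the model output and t_i for the target of training point i, with N training points. The flashcard's own formulas may differ, for example in whether the factor of 1/2 is included.

```latex
E_{\text{SSE}} = \frac{1}{2} \sum_{i=1}^{N} (z_i - t_i)^2
\qquad
E_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} (z_i - t_i)^2
```

Dividing by N is what makes the mean squared error comparable between training sets of different sizes.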

40
Q

Which error function allows models with different number of training points to be compared?

A

Mean squared error.

41
Q

Which error function is often used in the literature?

A

Sum-of-squares error

42
Q

For classification problems, what error do we use?

A

The cross-entropy error
[See flashcard for formula]

43
Q

What is a linearly separable problem?

A

When we can draw a straight line in 2D separating two classes (more generally, a flat plane or hyperplane in higher dimensions).

44
Q

If a problem is not linearly separable, what can you try to do?

A

Transform the problem so it is linear, by choosing better features.

Alternatively, a problem may become solvable when we choose more features, e.g. using 3 features in 3D space instead of 2 features in 2D space.

45
Q

How does the perceptron make decisions?

A

Based on the output zi = f(m), where m is the activation.

46
Q

How do we get a different type of decision boundary (not linear)?

A

The activation must be a nonlinear function of the input features.

This leads to the multilayer perceptron (neural network)

47
Q

What is an alternative to the sigmoid function that is nowadays quite popular?

A

Rectified linear unit (ReLU)

48
Q

What does overfitting indicate?

A

The model is too complex.

It may be fitted to the training data perfectly but is unable to generalise.

49
Q

How do multi-layer (deep) neural networks compare to the perceptron?

A

Highly flexible and non-linear

Multi-layer neural networks are capable of representing essentially any functional mapping. They are universal approximators: with enough hidden units, they can approximate any continuous function to arbitrary accuracy.

50
Q

When you are modelling with a perceptron, what set of weights do you start with?

A

Start with a random set of weights.

The idea is then to iterate and refine the model, using the data points to update the weights and bias so that the error decreases.

For this reason, where you end up depends on where you started.

51
Q

When are the weights of the perceptron updated?

A

When the model makes an incorrect classification.

If a point is classified correctly, we don't need to adjust the weights. If the weights (and hence the decision boundary) stop changing, the model is classifying every training point correctly.

52
Q

What are possible activation functions for neurons in a classification perceptron?

A
  • Step function
  • Sigmoid function
53
Q

What are the sigmoid function equations? What are the differences?

A

[See flashcard]
The version with e is bounded between 0 and 1, making it suitable for classification.
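
The flashcard equations are not reproduced here. As a hedged sketch, the version with e is assumed to be the logistic function 1 / (1 + e^(-m)), and the other version is assumed to be tanh, which has the same S shape but is bounded between -1 and 1.

```python
import math

def logistic(m: float) -> float:
    """Logistic sigmoid: output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-m))

def tanh_sigmoid(m: float) -> float:
    """tanh: same S shape, but output lies between -1 and 1 (assumed alternative)."""
    return math.tanh(m)

midpoint = logistic(0.0)      # halfway between the two bounds
low, high = logistic(-5.0), logistic(5.0)
```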

54
Q

What is the difference between the sigmoid function and the step function?

A

The sigmoid function is smooth (continuous)

55
Q

What is the benefit of a sigmoid activation?

A

It is continuous - we can get gradient information to improve weight optimisation

56
Q

Describe which terms of the cross-entropy function are retained when ti = 1 or ti = 0.

A

[See flashcard]
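
As a reference sketch, the standard binary cross-entropy is assumed below, with z_i the network output and t_i the target; the flashcard's version may differ, e.g. by a 1/N normalisation.

```latex
E = -\sum_{i=1}^{N} \left[ t_i \ln z_i + (1 - t_i) \ln (1 - z_i) \right]

t_i = 1: \quad \text{only } -\ln z_i \text{ is retained}
\qquad
t_i = 0: \quad \text{only } -\ln (1 - z_i) \text{ is retained}
```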

57
Q

What is a feature of the cross-entropy function?

A

It is a highly nonlinear function of the weights.

Therefore we cannot solve for the minimum analytically (setting the derivative to zero gives no closed-form solution); we need to optimise the weights numerically, i.e. iteratively reduce the error.

58
Q

Describe what is the case if dE/dw is large and positive?

A

We are on a steep part of the error curve, far from the minimum, and the error grows as the weight increases.
We need to decrease the weight by a large amount to get closer to the minimum.

59
Q

Describe what is the case if dE/dw is small and positive?

A

We need to decrease the weights by a smaller amount.

60
Q

What is the change in weights proportional to if we want to change a weight so that the error decreases?

A

[See flashcard]
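
A minimal sketch of this idea, assuming the usual gradient descent rule in which the change in a weight is proportional to minus dE/dw. The quadratic error curve and the learning rate below are illustrative choices, not taken from the lecture.

```python
def d_error(w: float) -> float:
    """Derivative dE/dw of the toy error E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def gradient_descent(w: float, learning_rate: float = 0.1, steps: int = 50):
    """Repeatedly step against the gradient; return every visited weight."""
    history = [w]
    for _ in range(steps):
        # Large positive gradient -> large decrease; small gradient -> small step.
        w = w - learning_rate * d_error(w)
        history.append(w)
    return history

history = gradient_descent(w=10.0)
first_step = abs(history[1] - history[0])
last_step = abs(history[-1] - history[-2])
```

The first step is large because the gradient is large far from the minimum; the steps shrink as the weight approaches w = 3.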

61
Q

Why do we need to use the chain rule when finding the derivative of the error with respect to the weight for our perceptron?

A

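The weights do not appear explicitly in the error function: the error depends on the output z, which depends on the activation m, which in turn depends on the weights.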

62
Q

What is the change in weights proportional to?

A

The features.
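
A hedged worked derivation for a single training point, assuming a sigmoid output z = 1 / (1 + e^{-m}), an activation m = \sum_j w_j x_j + b, and the binary cross-entropy error; the lecture's own derivation may use different assumptions.

```latex
\frac{\partial E}{\partial w_j}
  = \frac{\partial E}{\partial z}\,
    \frac{\partial z}{\partial m}\,
    \frac{\partial m}{\partial w_j}
  = \frac{z - t}{z (1 - z)} \cdot z (1 - z) \cdot x_j
  = (z - t)\, x_j

\Delta w_j \propto -\frac{\partial E}{\partial w_j} = (t - z)\, x_j
```

Under these assumptions the weight change is proportional to the feature x_j, and its sign depends on whether the output is below or above the target.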

63
Q

If Zi < ti, what is true for the output of the network?

A

It needs to increase to get the correct answer.

64
Q

If Zi < ti, what value must the feature be to increase Zi?

A

If the value of the feature is positive, increasing the weight will increase Zi.

If the value of the feature is negative, decreasing the weight will increase Zi.

In this way, we have already learnt something about the network without doing any math.

65
Q

What kinds of problems are suitable to the perceptron?

A

Linearly separable

66
Q

Why are some problems not suitable for a perceptron?

A

They are not linearly separable, i.e. they do not have linear decision boundaries.

67
Q

What does the perceptron make decisions based upon?

A

The output z = f(m), where m is the activation.

The activation m is linear in the input features, which means the decision boundary is a straight line.

68
Q

How do we get a different type (non-linear) of decision boundary?

A

The activation must be a nonlinear function of the input features.

This leads to the multilayer perceptron (or neural network).

69
Q

Describe the architecture of a perceptron.

A

The perceptron is a one-layer network, which usually has a step function as the activation function.

70
Q

Describe the architecture of a multilayer perceptron (or NN)?

A

The information passes through more than one neuron before it gets to the output.

As well as input and output neurons (or nodes), there are hidden layers. These perform non-linear transformations on their inputs.

71
Q

What arrives at the output of a NN?

A

A signal that is a non-linear function of the input features.

72
Q

What is the advantage of non-linear functions?

A

They allow for more complexity.

73
Q

What is the advantage of neural networks?

A

They are very flexible. They can have lots of layers and lots of nodes, thereby having a large number of adjustable parameters (weights and bias parameters).

The architecture can be tuned.

They can represent arbitrarily complex decision boundaries, and do regression for arbitrarily complex functions.

74
Q

What notation should you use for neural networks?

A

Vector notation.

The vector of weights - each row represents the weights for an input. This is then used in matrix multiplication with the transposed x vector to give the activations.
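
A small sketch of the matrix form, using plain Python lists. It assumes the convention that each row of W holds the weights feeding one neuron, so that the activations are m = W x + b; the lecture's own layout of the weights may differ.

```python
def layer_activations(W, x, b):
    """Matrix-vector product W x plus bias: one activation per row of W."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]

W = [[1.0, 1.0],    # weights into the first neuron (assumed row layout)
     [1.0, -1.0]]   # weights into the second neuron
x = [2.0, 3.0]      # feature vector
b = [0.0, 0.5]      # one bias per neuron
m = layer_activations(W, x, b)
```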

75
Q

Why is the XOR problem suited for a neural network?

A

It is not linearly separable - cannot be modelled with a perceptron.
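
To illustrate, the sketch below solves XOR with one hidden layer of two step-function neurons. The weights are chosen by hand (not learned) and are only one possible choice: the hidden neurons act as OR and AND gates, and the output fires when OR is on but AND is off.

```python
def step(m: float) -> int:
    return 1 if m >= 0 else 0

def xor_network(x1: int, x2: int) -> int:
    """Two-layer network with hand-picked weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)    # hidden neuron 1: fires if either input is 1
    h_and = step(x1 + x2 - 1.5)   # hidden neuron 2: fires only if both are 1
    return step(h_or - h_and - 0.5)  # output: OR but not AND

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_network(x1, x2) for x1, x2 in inputs]
```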

76
Q

In a NN with several layers, do all the activation functions have to be the same?

A

No, we can combine different activation functions.

77
Q

What might the final layer be in a NN for classification?

78
Q

What might the final layer be in a NN for regression?

A

Linear activation
Rectified linear unit (ReLU)

[See flashcard]

79
Q

In a node with ReLU activation function, what value is output if there is a negative activation?

A

Zero

80
Q

In a node with ReLU activation function, what value is output if there is a positive activation?

A

m, the value of the activation
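
A minimal sketch of ReLU, using the standard definition max(0, m):

```python
def relu(m: float) -> float:
    """Rectified linear unit: zero for negative activations, m otherwise."""
    return max(0.0, m)

negative_case = relu(-2.0)   # negative activation is clipped to zero
positive_case = relu(3.0)    # positive activation passes through unchanged
```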

81
Q

In terms of outputs, what is a difference between the perceptron and a neural network?

A

In neural networks, we can have more than one output.

The advantage is that this gives us another way of classifying, e.g. probabilities of different classes.

82
Q

What function do we apply to guarantee that the outputs add to 1 (and thus represent probabilities)?

A

The softmax function
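
A short sketch of softmax using its standard definition: exponentiate each output activation and divide by the sum. Subtracting the maximum first is a common numerical-stability step and does not change the result.

```python
import math

def softmax(activations):
    """Map a list of activations to positive values that sum to 1."""
    shift = max(activations)  # avoids overflow in exp for large activations
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest activation receives the largest probability, and the outputs can be read as class probabilities.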

83
Q

What is overfitting?

A

When the model is fit perfectly to the training data, but does not generalise well to unseen data.

We may expect the plot of an overfit model to be jagged and rough, passing through each individual point. A well-fitted model should instead give a smooth curve.

We want to have a trade off between complexity and goodness of fit.

84
Q

What is one way to overcome overfitting?

A

Splitting the data into training and testing sets, so that performance on held-out test data shows whether the model generalises.
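
A minimal sketch of such a split using only the standard library. The 25% test fraction, the fixed seed, and the toy data are illustrative assumptions.

```python
import random

def train_test_split(points, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy data set: (input, target) pairs.
data = [(x, 2 * x) for x in range(20)]
train, test = train_test_split(data)
```

The model is fitted on `train` only; its error on `test` then estimates how well it generalises to unseen data.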