SECTION 3: Learning Flashcards

1
Q

Describe, in simple terms, what Supervised Learning is. (2p)

A

Supervised learning is when you know what you want the target to be, i.e. you know what you want your network to do. You provide your network with input data as well as the desired output data.

The goal in supervised learning is to learn a function that, given a sample of data and desired outputs, best approximates the relationship between input and output observable in the data. That function can then be applied to new, unseen data.

Supervised learning is typically done in the context of classification, where we want to map inputs to output labels, or regression, where we want to map inputs to a continuous output. The goal, in both contexts, is to find specific relationships or structure in the input data that allow us to effectively produce correct outputs.
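As a minimal sketch of the regression case: fit a function to labeled (input, output) pairs by minimizing the error between predictions and desired outputs. The data, learning rate, and iteration count below are illustrative assumptions.

```python
# Toy supervised learning: fit y = w*x + b to labeled (input, output) pairs
# by gradient descent on the mean squared error.

# Training data: inputs with known desired outputs (the "ground truth")
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1

w, b = 0.0, 0.0             # model parameters, start at zero
lr = 0.05                   # learning rate (illustrative choice)

for _ in range(2000):
    # Gradients of mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # → 2.0 1.0, recovering the true relationship
```

The learned function generalizes: for a new input x = 10 it predicts roughly 2·10 + 1 = 21, even though that sample was never in the training set.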

2
Q

Describe the difference between Supervised and Unsupervised Learning. (2p)

A

The main difference between supervised and unsupervised learning is that in supervised learning we have some kind of ground truth, i.e. we have prior knowledge of what the output values for our samples should be. Unsupervised learning, on the other hand, does not have labeled outputs, so its goal is to infer the natural structure present within a set of data points.

Clustering is one of the most common tasks within unsupervised learning: we wish to learn the inherent structure of our data without using explicitly provided labels. Common approaches include, for instance, autoencoders. This makes unsupervised learning very useful, e.g. in exploratory analysis, because it can automatically identify structure in data and can help us learn relationships between individual features within a dataset.
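A minimal clustering sketch makes the "no ground truth" point concrete. Below is a tiny 1-D k-means run (k-means is used here as a standard clustering example; the data and k = 2 are illustrative assumptions):

```python
# Unsupervised learning sketch: 1-D k-means clustering with k = 2.
# No labels are given; the algorithm infers structure from the data alone.

data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]   # two obvious groups, but unlabeled
centers = [0.0, 5.0]                     # initial guesses

for _ in range(10):
    # Assignment step: each point joins its nearest center
    clusters = [[], []]
    for x in data:
        idx = min((0, 1), key=lambda i: abs(x - centers[i]))
        clusters[idx].append(x)
    # Update step: each center moves to the mean of its cluster
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]

print([round(c, 2) for c in centers])  # → [1.0, 8.07]
```

The two discovered centers correspond to the two natural groups in the data, even though no sample was ever labeled.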

3
Q

What is the Hebbian Learning Rule (as an unsupervised learning example)? (1p)

A

The Hebbian learning rule states the famous principle that neurons that fire together wire together. This means that the more frequently two neurons are activated simultaneously, the stronger their (inter)connection becomes. The rule applies to unsupervised learning algorithms and is based on there being no stated/desired target. The neurons update their connections only according to the patterns found within the dataset.

So, for example, if the algorithm finds within a dataset, say grocery shopping lists, that oranges and apples are often bought together, it will register this pattern and update the weight between the orange neuron and the apple neuron (very simplified), so the interconnection between these neurons gets stronger.
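The shopping example above can be sketched as a plain Hebbian update, Δw = η · x_i · x_j: the weight only grows when both units are active at the same time. The basket encoding and learning rate are illustrative assumptions.

```python
# Hebbian update sketch: the connection strengthens on co-activity.
# Delta w = eta * x_apple * x_orange, with no target output anywhere.

eta = 0.1                       # learning rate
w_apple_orange = 0.0            # connection strength between two "neurons"

# Each basket: (apple bought?, orange bought?) encoded as 1/0 activity
baskets = [(1, 1), (1, 1), (1, 0), (0, 1), (1, 1)]

for apple, orange in baskets:
    # The update is the product of activities, so only simultaneous
    # activity strengthens the connection
    w_apple_orange += eta * apple * orange

print(round(w_apple_orange, 2))  # three co-occurrences → 0.3
```

Baskets containing only one of the two items contribute nothing, which is exactly the "fire together, wire together" behaviour.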

4
Q

What is the Perceptron Learning Rule (as a supervised learning example)? (1p)

A

The perceptron learning rule, in contrast to Hebbian learning, updates the weights according to the error between the desired and the actual output.

Perceptrons are trained on examples of desired behavior, summarized by a set of input-output pairs. The goal is to reduce the error e, which is the difference between the neuron response a and the target vector t. The perceptron learning rule calculates the desired changes to the perceptron’s weights and biases given an input vector p and the associated error e. The target vector t must contain values of either 0 or 1, as perceptrons can only output such values. With each pass over the training set, the perceptron has a better chance of producing the correct outputs. The perceptron rule is proven to converge on a solution in a finite number of iterations, if a solution exists.
The perceptron learning rule is very simple and can be stated as follows:

  1. Start with random weights for the connections
  2. Select an input vector x from the set of training samples
  3. If y ≠ d(x), i.e. the perceptron gives an incorrect response, then modify all connections wᵢ according to Δwᵢ = d(x)·xᵢ
  4. Go back to step 2.
    (This means that when the network responds correctly no connection weights are modified)

So, for example, if we identify a certain pattern which we would like the algorithm to learn, we start off with randomized weights and carefully, little by little, adjust the weights according to the calculated error e for each run. Eventually we end up with a set of weights that classifies all training samples correctly.
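The steps above can be sketched on the AND function (linearly separable, so the rule is guaranteed to converge). The error-driven update form Δwᵢ = η·e·xᵢ with e = t − a, the learning rate, and the random seed are illustrative assumptions.

```python
# Sketch of the perceptron learning rule, trained on the AND function.
# Targets are 0/1, as the rule requires.

import random

random.seed(0)
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Step 1: start with random weights (bias treated as an extra weight)
w = [random.uniform(-1, 1) for _ in range(2)]
b = random.uniform(-1, 1)
eta = 0.1

def respond(x):
    # Hard-threshold activation: a perceptron outputs only 0 or 1
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(100):                    # repeat steps 2-4
    for x, t in samples:                # step 2: pick a training sample
        e = t - respond(x)              # step 3: error; zero if correct
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]
        b += eta * e                    # bias update (its "input" is 1)

print([respond(x) for x, _ in samples])  # → [0, 0, 0, 1]
```

Note that when the response is already correct, e = 0 and no weight changes, matching the parenthetical remark in step 4.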

5
Q

When training a multi-layer perceptron by gradient descent describe the problem of local minima. (2p)

A

When training an MLP with gradient descent we want to find the global minimum of the error function, i.e. the point with the lowest error. But gradient descent only follows the local slope of the error surface, so it can get stuck in a so-called local minimum: a point where the error is lower than at all neighbouring points, and where the gradient is zero so no further updates are made, even though a better solution exists elsewhere on the surface.
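A one-dimensional sketch shows the problem: on an error surface with two minima, plain gradient descent ends up wherever its starting point leads it. The function, learning rate, and starting points are illustrative assumptions.

```python
# Gradient descent on a function with two minima: where it ends up
# depends entirely on where it starts.

def f(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=5000):
    # Follow the negative gradient until (approximately) stationary
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_left = descend(-2.0)    # reaches the global minimum near x ≈ -1.30
x_right = descend(2.0)    # gets stuck in the local minimum near x ≈ 1.13
print(round(x_left, 2), round(x_right, 2), f(x_left) < f(x_right))
```

Both runs stop at points with zero gradient, but only one of them is the global minimum; the other run has no way to know a lower error exists.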

6
Q

A. What is the credit assignment problem for multi-layer perceptrons?

B. Describe in detail (don’t necessarily have to use equations) how the credit assignment problem is dealt with giving reference to the backward pass of the backpropagation algorithm.

C. What problems can be encountered with MLPs with multiple hidden layers using standard back propagation with gradient descent, why? (6p)

A

A. Since you have (a) hidden layer(s), it is more difficult to decide which weight is responsible for the error. Is it a weight from the hidden layer to the output, or one further back in the network?

B. Backpropagation partly fixes this. The error is computed at the output layer and then propagated backwards, layer by layer, using the chain rule; in the backward pass each weight’s update is proportional to how much that weight contributed to the output, so each node is updated according to how much relevance it had. For example, if a node’s weight is 0, we know that node did not affect the output and therefore did not contribute to the error, so that connection receives no blame.

C. One problem is the vanishing gradient. The weights closer to the output layer will be updated more than the ones further away from the output. If you only have a few hidden layers this is not much of a problem, but in a large network the first layers might be updated only very, very little, even though they might bear a big responsibility for the error. This can be a problem with, for example, facial recognition, where the first layers compute the most basic features. If those weights are wrong, then the more complex features will be built on faulty simple features.
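The vanishing-gradient effect can be sketched numerically: backpropagating through many sigmoid layers multiplies the gradient by σ′(z) ≤ 0.25 at every layer, so the gradient reaching the first layer shrinks roughly exponentially with depth. The one-unit-per-layer network, weight value, and input below are simplifying assumptions.

```python
# Vanishing gradients through a chain of sigmoid layers.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

n_layers = 10
w = 1.0          # same weight at every layer, for illustration
a = 0.5          # input activation

# Forward pass: record pre-activations
zs = []
for _ in range(n_layers):
    z = w * a
    zs.append(z)
    a = sigmoid(z)

# Backward pass: the chain rule multiplies by sigma'(z) * w per layer,
# and sigma'(z) = sigma(z) * (1 - sigma(z)) is at most 0.25
grad = 1.0
grads = []
for z in reversed(zs):
    s = sigmoid(z)
    grad *= s * (1 - s) * w
    grads.append(grad)

print(f"gradient at last layer:  {grads[0]:.4f}")   # roughly 0.22
print(f"gradient at first layer: {grads[-1]:.2e}")  # orders of magnitude smaller
```

After only ten layers the first-layer gradient is smaller than the last-layer gradient by a factor of around a million, which is why the earliest weights barely move under standard gradient descent.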
