Week 1: Introduction & Overview Flashcards

1
Q

What is a labelled dataset?

A

A dataset where we know the output y corresponding to each input x.

2
Q

What are input features?

A

The variables x (also called predictors, covariates, or attributes) that describe each data point and that the model uses to predict the output y.

3
Q

What is the Euclidean distance?

A

Euclidean distance is the straight-line distance between two points in p-dimensional space: for points x and x', d(x, x') = sqrt((x1 - x'1)^2 + ... + (xp - x'p)^2).

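A minimal Python sketch of this formula (assuming numpy; the function name is illustrative, not from the course):

```python
import numpy as np

def euclidean_distance(x, x_prime):
    """Straight-line distance between two points in p-dimensional space."""
    x, x_prime = np.asarray(x, dtype=float), np.asarray(x_prime, dtype=float)
    return np.sqrt(np.sum((x - x_prime) ** 2))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```
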
4
Q

What is the loss function?

A

A function that quantifies how bad a prediction is: it maps the true output y and the model's prediction ŷ to a number that is small when the prediction is close to the truth and large otherwise. Training then amounts to choosing the model parameters that minimise the average loss over the training data, e.g., the squared error (y - ŷ)^2 in regression.

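
A small illustration in Python (assuming numpy; the squared-error loss here is one common choice, not necessarily the course's exact definition):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error between true outputs and predictions."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

print(mse_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1) / 3 ≈ 0.4167
```
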
5
Q

Why is kNN with k = 1 more erratic than with k = 3?

A

Because we view the output value y as a random variable prone to (modeling) noise. With k = 1 the prediction relies on a single training point, so the noise in that point carries over directly; with k = 3 the prediction averages over three points, which smooths out some of the noise.

6
Q

Why is kNN a nonparametric method?

A

Because kNN does not summarise the training data with a fixed, finite set of parameters learned during training; instead, the prediction is computed directly from the stored training data points, so the model's flexibility grows with the amount of data.

7
Q

Why is k in kNN a hyperparameter?

A

Because k isn’t learned by the kNN algorithm itself, but rather chosen beforehand. (Regular parameter values, by contrast, are learned when training the model.)

8
Q

Will k = 1 in kNN most often lead to over- or underfitting?

A

Overfitting. With k = 1 the model reproduces every training point exactly, noise included, so it adapts to peculiarities of the training data rather than the underlying signal.

9
Q

Why does increasing the hyperparameter k in kNN often lead to better generalization beyond the training data?

A

Because the predictions will be less sensitive to peculiarities of the training data and therefore less overfitted.

10
Q

If k in kNN is sufficiently large, what will be the (negative) consequence for the predictions?

A

The k-neighbourhood will include all training data points, so the model reduces to predicting the mean of the training outputs for any new input; no input-dependent structure is captured (underfitting).

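A small sketch with scikit-learn's KNeighborsRegressor (assuming sklearn is available; not course code) showing that k equal to the training set size yields the training mean everywhere:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])

# With k equal to the number of training points, every query
# averages over the entire training set.
knn = KNeighborsRegressor(n_neighbors=len(X)).fit(X, y)
print(knn.predict([[-5.0], [10.0]]))  # [3.5 3.5], i.e. y.mean() for any input
```
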
11
Q

What is a systematic way of choosing a good k in kNN?

A

To use cross-validation: evaluate candidate values of k on held-out parts of the training data and pick the value with the best average validation performance.

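A sketch of such a selection with scikit-learn (assuming sklearn; GridSearchCV is one common way to cross-validate over k, not necessarily the course's):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=60)

# 5-fold cross-validation over a grid of candidate k values.
search = GridSearchCV(KNeighborsRegressor(),
                      {"n_neighbors": [1, 3, 5, 10, 20]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)  # the k with the best average validation score
```
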
12
Q

Why and when should we re-scale the input variables when using kNN?

A

Re-scaling should be done when the intervals within which the input variables range are on very different scales. An input ranging within, e.g., [1000, 1500] will contribute far more to the Euclidean distance than an input ranging within, e.g., [0, 2], so the smaller-scale input would have almost no influence on which points count as nearest neighbours.

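A quick numeric illustration in plain numpy (the feature values are made up):

```python
import numpy as np

# Two inputs on very different scales: one in [1000, 1500], one in [0, 2].
a = np.array([1000.0, 0.0])
b = np.array([1500.0, 2.0])

squared_diffs = (a - b) ** 2
print(squared_diffs)                 # [250000. 4.]: the first input dominates
print(np.sqrt(squared_diffs.sum()))  # ≈ 500.004: essentially feature 1 alone
```
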
13
Q

(General) Should we perform re-scaling (normalisation) on the training and/or test data?

A

Normalisation should be performed on the training data only: compute the scaling (e.g., the per-input means and standard deviations) from the training data, then apply this same scaling to future test data points as well. Never re-scale the full data set all at once, since that would leak information about the test data into training.

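A sketch with scikit-learn's StandardScaler (assuming sklearn; the scaler is fit on the training data only and then reused):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1000.0, 0.5], [1200.0, 1.5], [1500.0, 2.0]])
X_test = np.array([[1100.0, 1.0]])

scaler = StandardScaler().fit(X_train)  # statistics from the training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)   # the same scaling reused on test data
print(X_test_std)
```
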
14
Q

What are the normal equations and how do we write them?

A

The normal equations are the equations obtained by setting the partial derivatives of the sum of squared errors equal to zero. With data matrix X and output vector y they read X^T X θ = X^T y, and their solution θ = (X^T X)^(-1) X^T y (when X^T X is invertible) gives the least-squares estimates of the regression model coefficients (theta).

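A numpy sketch solving the normal equations on toy data (np.linalg.lstsq would be the numerically safer route; this mirrors the equations directly):

```python
import numpy as np

# Toy data roughly following y = 1 + 2x; a column of ones models the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Normal equations: (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # ≈ [1.09, 1.94], the least-squares intercept and slope
```
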
15
Q

What is the key goal of supervised (machine) learning?

A

Using training data with examples of how an input x is related to an output y, learn a mathematical model that can predict the output for NEW test data where only x is known.

16
Q

What is the difference between supervised learning (our course) and unsupervised learning?

A

The main distinction between the two approaches is the use of labelled datasets. Supervised learning uses labelled input and output data, while an unsupervised learning algorithm does not.

In supervised learning, the algorithm “learns” from the training dataset by iteratively making predictions on the data and adjusting for the correct answer. While supervised learning models tend to be more accurate than unsupervised learning models, they require upfront human intervention to label the data appropriately.

17
Q

Why must the kNN predictions be piecewise constant?

A

Because in kNN we assume that the regression function is well approximated by a locally constant function. The prediction is piecewise constant, with each piece corresponding to a region of the input space that shares the same set of nearest neighbours (within such a region, the exact input location does not change the prediction).

Local decision boundaries: kNN makes predictions based on the majority class or average value of the k nearest neighbours of a data point, so the prediction only changes when the set of nearest neighbours changes. Small movements of the input leave that set, and hence the prediction, unchanged; once the set switches, the predicted class or value jumps. With a small value of k (e.g., k = 1) the regions are small and the boundaries are highly sensitive to individual training points.
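
A small demonstration (assuming sklearn) that 1-NN regression is a step function of the input:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 10.0, 20.0])

knn = KNeighborsRegressor(n_neighbors=1).fit(X, y)
grid = np.linspace(0.0, 2.0, 9).reshape(-1, 1)
# The prediction is constant within each nearest-neighbour region and
# jumps where the nearest neighbour changes (around 0.5 and 1.5).
print(np.column_stack([grid, knn.predict(grid)]))
```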

18
Q

What happens if we pick a k that is too large?

A

The computation becomes heavier and (for kNN regression) the prediction reduces to the average of the training outputs, so the model won't be able to capture the potential signal in the data.

19
Q

What happens if we pick a k that is too small?

A

The predictions become too sensitive to noise and outliers in the training data.

20
Q

How do we standardize the training and test data sets, respectively?

A

1) Compute the standardization from, and apply it to, the observations in the training data set ONLY. 2) Use the same training-set scaling operator on the observations of the test data.
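
A manual numpy sketch of these two steps (the data values are made up):

```python
import numpy as np

X_train = np.array([[1000.0, 0.5], [1200.0, 1.5], [1500.0, 2.0]])
X_test = np.array([[1100.0, 1.0]])

# 1) Compute the scaling from the training data only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_std = (X_train - mu) / sigma

# 2) Reuse the same training-set mean and std on the test data.
X_test_std = (X_test - mu) / sigma
print(X_test_std)
```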