Week 1: Introduction & Overview Flashcards
What is a labelled dataset?
A dataset where we know the output y corresponding to each input x
What are input features?
The variables that make up the input x (its components x1, ..., xp), which the model uses to predict the output y.
What is the Euclidean distance?
Euclidean distance is a measure of the straight-line distance between two points in a p-dimensional space: the square root of the sum of the squared differences between corresponding coordinates.
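A minimal sketch of the definition above (function and variable names are illustrative, not from the course material):

```python
import math

def euclidean_distance(x, z):
    """Straight-line distance between two points in p-dimensional space:
    the square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

# Classic 3-4-5 right triangle: distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance([0, 0], [3, 4]))  # → 5.0
```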
What is the loss function?
A function that measures the discrepancy between a model's prediction and the true output y; training typically amounts to minimising the average loss (e.g. squared error) over the training data.
Why is kNN with k = 1 more erratic than with k = 3?
Because we view the output value y as a random variable prone to (modeling) noise. With k = 1 the prediction relies on a single data point and is thus more erratic; with k = 3 averaging over three neighbours smooths out some of that noise.
Why is kNN a nonparametric method?
Because its predictions are computed directly from the training data rather than from a fixed, finite set of parameters learned during training; the effective complexity of the model grows with the amount of training data.
Why is k in kNN a hyperparameter?
Since k isn’t learned by the kNN algorithm itself, but rather chosen beforehand. (Regular parameter values are learned when training the model).
Will k = 1 in kNN most often lead to over- or underfitting?
Overfitting.
Why does increasing the hyperparameter k in kNN often improve generalization beyond the training data?
Because the predictions will be less sensitive to peculiarities of the training data and therefore less overfitted.
If k in kNN is sufficiently large, what will be the (negative) consequence of the predictions?
The neighbourhood of k points will include all training data points, and the model reduces to predicting the mean of the training outputs for any new input.
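The two behaviours above (erratic k = 1 predictions, and k = N collapsing to the training mean) can be sketched with a tiny kNN regressor; data and names here are illustrative, not from the course:

```python
def knn_regress(X_train, y_train, x_new, k):
    """Predict by averaging the outputs of the k nearest training points
    (squared Euclidean distance gives the same ordering as Euclidean)."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(X_train[i], x_new)))
    neighbours = order[:k]
    return sum(y_train[i] for i in neighbours) / k

X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 4.0]

print(knn_regress(X, y, [1.2], 1))  # nearest point only → 1.0
print(knn_regress(X, y, [1.2], 4))  # k = N reduces to the mean of y → 2.5
```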
What is a systematic way of choosing a good k in kNN?
To use cross-validation.
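As a sketch of what "use cross-validation" means in practice, here is leave-one-out cross-validation on a toy, roughly linear dataset (all names and data are illustrative assumptions, not from the course):

```python
def loo_cv_error(X, y, k):
    """Leave-one-out CV: hold out each point in turn, predict it with kNN
    trained on the rest, and average the squared errors."""
    errors = []
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        order = sorted(range(len(X_tr)),
                       key=lambda j: sum((a - b) ** 2
                                         for a, b in zip(X_tr[j], X[i])))
        pred = sum(y_tr[j] for j in order[:k]) / k
        errors.append((pred - y[i]) ** 2)
    return sum(errors) / len(errors)

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.1, 1.2, 1.9, 3.1, 3.9, 5.0]

# Pick the k with the smallest estimated test error.
best_k = min(range(1, 5), key=lambda k: loo_cv_error(X, y, k))
```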
Why and when should we re-scale the input variables when using kNN?
Re-scaling should be done if the intervals within which the individual inputs range are on very different scales. An input ranging within, e.g., [1000, 1500] will contribute far more to the Euclidean distance than an input ranging within, e.g., [0, 2].
(General) Should we perform re-scaling (normalisation) on the training and/or test data?
Normalisation should be performed on the training data only. Then apply that same scaling to future test data points as well; never re-scale the full data set all at once.
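A minimal sketch of this train-only workflow with min-max scaling (function names and data are illustrative assumptions): the scaling parameters are fitted on the training data and then reused, unchanged, on test points.

```python
def fit_minmax(X_train):
    """Learn per-feature min and max from the TRAINING data only."""
    cols = list(zip(*X_train))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(X, mins, maxs):
    """Apply the previously fitted scaling to any data (e.g. test points)."""
    return [[(v - lo) / (hi - lo) for v, lo, hi in zip(row, mins, maxs)]
            for row in X]

X_train = [[1000.0, 0.0], [1500.0, 2.0]]  # features on very different scales
X_test = [[1250.0, 1.0]]

mins, maxs = fit_minmax(X_train)          # fit on training data only
print(apply_minmax(X_test, mins, maxs))   # → [[0.5, 0.5]]
```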
What are the normal equations and how do we write them?
The normal equations are obtained by setting the partial derivatives of the sum of squared errors equal to zero. They read X^T X theta = X^T y, and their solution, theta_hat = (X^T X)^(-1) X^T y (when X^T X is invertible), gives the least-squares estimates of the regression model coefficients (theta).
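A sketch of solving the normal equations numerically on a toy dataset lying exactly on the line y = 1 + 2x (the data here is an illustrative assumption):

```python
import numpy as np

# Design matrix with a column of ones for the intercept term.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])  # exactly y = 1 + 2x

# Solve the normal equations X^T X theta = X^T y for theta.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # recovers [intercept, slope], approximately [1, 2]
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming X^T X explicitly, for numerical stability, but the normal equations are the conceptual starting point.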
What is the key goal of supervised (machine) learning?
Using training data containing examples of how an input x is related to an output y, learn a mathematical model that can predict the output for NEW test data where only x is known.