Week 8 DSE Flashcards

Question

What is the goal of ML from predictive modelling?

Answer 1

“Learn” f(X) from data in a way that yields good out-of-sample forecasting performance: get a forecasting rule that minimizes expected loss, or get a good estimate of E(Y|X).

Answer 2

The optimal conditional point forecast Ŷ(X) is the function of the conditional predictive distribution F(y|x) that minimizes the risk (expected loss).

Answer 3

conditional mean

Answer 4

Flexible nonlinear functions (kernels, series) for estimating the structure of the data.

Answer 5

Signal extraction (bias) vs. overfitting (variance).

Answer 6

Must know the true F(y|x), so it is unattainable. In practice, estimate F(y|x) and classify based on the highest estimated conditional probability.

Answer 7

Parametric: “fixed” model, number of parameters fixed, faster computation, but stronger assumptions on F(y|x).
Nonparametric: “flexible” model, number of parameters may grow with available data, computation becomes harder (or intractable) with larger datasets.

Answer 8

Nonparametric (lazy / instance-based learner)

Answer 9

FIND THE K MOST SIMILAR POINTS AND LET THEM VOTE:
1. Calculate the Euclidean distance from each training data point to the point to classify.
2. Find the 3 closest points and place them in the set of K nearest neighbours.
3. Estimate the conditional probabilities from the classes in that set.
4. Classify the test observation to the class with the highest probability.
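The voting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `knn_classify`, the toy data, and the class labels "A" and "B" are all made up for the example, and ties are broken arbitrarily.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(dists, kind="stable")[:k]
    # Estimated conditional probabilities: class shares among the neighbours
    votes = Counter(y_train[i] for i in nearest)
    probs = {label: count / k for label, count in votes.items()}
    # Predict the class with the highest estimated probability
    predicted = max(probs, key=probs.get)
    return predicted, probs

# Hypothetical toy data: three points of class "A", two of class "B"
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [4.0, 4.0], [4.2, 3.9]]
y_train = ["A", "A", "A", "B", "B"]

label, probs = knn_classify(X_train, y_train, [3.8, 4.1], k=3)
print(label, probs)
```

Note that every call scans the whole training set, which is why the method is slow at prediction time.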

Answer 10

look at the entire data

Answer 11

Slow (need to sort through the data every time to determine the nearest points)

Answer 12

SCALING (esp KNN)

Answer 13

Typically uses Euclidean distance, so we would get a very different answer depending on the scaling of X (e.g., income in SGD vs. in thousands of SGD). Solution: standardize or scale to [0,1].
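A small sketch of why the units matter and how rescaling removes the problem. The data are invented (column 0 is income, column 1 is age), and the helper names `standardize`, `min_max_scale`, and `nearest_to_first` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max_scale(X):
    """Rescale each column to the interval [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def nearest_to_first(X):
    """Index of the row closest (Euclidean) to row 0, excluding row 0."""
    d = np.sqrt(((X[1:] - X[0]) ** 2).sum(axis=1))
    return 1 + int(np.argmin(d))

# Hypothetical data: income in SGD, age in years
X_sgd = np.array([[30000.0, 25.0], [31000.0, 60.0], [35000.0, 26.0]])
# Same people, income now in thousands of SGD
X_thousands = X_sgd / np.array([1000.0, 1.0])

# The nearest neighbour of person 0 changes with the units of income
print(nearest_to_first(X_sgd), nearest_to_first(X_thousands))

# After standardizing, the units no longer matter
same = np.allclose(standardize(X_sgd), standardize(X_thousands))
print(same)
```

With raw SGD the income column dominates the distance; in thousands of SGD the age column dominates. Standardizing (or scaling to [0,1]) gives the same scaled data under either unit.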

Answer 14

CATEGORICAL PREDICTORS: assign zero if classes coincide with the test observation, and 1 if not.

Answer 15

assign zero if classes coincide with the test observation, and 1 if not. (1 can be replaced by a higher number for some mismatches based on domain knowledge.)
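One way to write the 0/1 mismatch rule as a distance is sketched below. The function name `mismatch_distance`, the `penalties` argument, and the category names are hypothetical; the higher penalty for one pair stands in for a domain-knowledge choice.

```python
def mismatch_distance(a, b, penalties=None):
    """Sum of per-feature mismatch costs between two categorical records.

    A matching feature costs 0 and a mismatch costs 1, unless the unordered
    pair of categories appears in `penalties` with a different cost.
    """
    penalties = penalties or {}
    total = 0.0
    for va, vb in zip(a, b):
        if va == vb:
            continue  # classes coincide: contributes zero
        total += penalties.get(frozenset((va, vb)), 1.0)
    return total

# Hypothetical records: (housing type, marital status)
test_obs = ("HDB", "single")
d1 = mismatch_distance(test_obs, ("HDB", "married"))    # one mismatch
d2 = mismatch_distance(test_obs, ("condo", "married"))  # two mismatches

# Assumed domain knowledge: an HDB vs. landed mismatch counts double
custom = {frozenset(("HDB", "landed")): 2.0}
d3 = mismatch_distance(test_obs, ("landed", "single"), penalties=custom)
print(d1, d2, d3)
```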

Answer 16

readily extends to predicting continuous variables.

Answer 17

Out-of-sample MSE

Answer 18

Using data not used in estimation to compute an out-of-sample risk estimate.
1. Training sample: data used to estimate prediction rule(s).
2. Validation sample: data used to test estimated rule(s) on NEW data (a.k.a. hold-out sample).

Answer 19

FALSE. Any function that passes through all the data points would set it to 0, but would likely result in terrible out-of-sample performance: such a model would be overfit to the training data and would generalize poorly, since we chase noise specific to the sample.
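The point can be checked numerically with a rule that passes through every training point. The sketch below assumes a KNN regression that averages the neighbours' y values (with K = 1 it reproduces each training point exactly) and uses simulated data; the sine signal, noise level, and sample sizes are arbitrary choices.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    """Predict each row of X_new as the mean y of its k nearest training points."""
    # Pairwise Euclidean distances, shape (n_new, n_train)
    d = np.sqrt(((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)

def mse(y, y_hat):
    """Mean squared error."""
    return float(np.mean((y - y_hat) ** 2))

# Simulated data (an assumption for illustration): noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.5, size=200)

# Training sample and validation (hold-out) sample
X_train, y_train = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

in_sample = mse(y_train, knn_predict(X_train, y_train, X_train, k=1))
out_sample = mse(y_val, knn_predict(X_train, y_train, X_val, k=1))
print(in_sample, out_sample)
```

The in-sample MSE is exactly zero because each training point is its own nearest neighbour, while the hold-out MSE is not.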

Answer 20

Introduce some penalty for complexity; this is referred to as regularization. In this example, regularization comes in through the choice of K.
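A sketch of how K might be chosen in practice, assuming the selection criterion is MSE on a validation sample. The simulated data, the candidate grid, and the helper `knn_predict` are all illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    """Predict each row of X_new as the mean y of its k nearest training points."""
    d = np.sqrt(((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)

# Simulated data (an assumption for illustration): noisy sine curve
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.5, size=200)
X_train, y_train = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

# Small K = flexible rule; large K = smooth, heavily regularized rule
candidates = [1, 3, 5, 10, 20, 50, 100]
val_mse = {}
for k in candidates:
    y_hat = knn_predict(X_train, y_train, X_val, k)
    val_mse[k] = float(np.mean((y_val - y_hat) ** 2))

best_k = min(val_mse, key=val_mse.get)
print(val_mse, best_k)

# With K equal to the training sample size, every forecast is the training mean
flat = knn_predict(X_train, y_train, X_val, 100)
```

At the extreme K = 100 (the whole training sample here) the rule ignores X entirely, which is the most regularized choice available.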

Answer 21

bias variance

Answer 22

By conditioning on available information we can make the forecasts more accurate. Conditioning reduces the risk of the forecast. Ignoring estimation error considerations, conditioning on more information is always better in the sense of reducing risk.
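For the special case of quadratic loss (an assumption here; the card does not name a loss), the claim follows from the law of total variance. The risk of the unconditional forecast E[Y] is Var(Y), and the risk of the conditional forecast E[Y|X] is E[Var(Y|X)], so:

```latex
\operatorname{Var}(Y)
  = \mathbb{E}\!\left[\operatorname{Var}(Y \mid X)\right]
  + \operatorname{Var}\!\left(\mathbb{E}[Y \mid X]\right)
  \;\ge\; \mathbb{E}\!\left[\operatorname{Var}(Y \mid X)\right].
```

The gap between the two risks is Var(E[Y|X]), which is zero only when the conditional mean does not vary with X.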

Answer 23

Under absolute loss, the optimal point forecast is the median. Other loss functions lead to different solutions. Working out optimal forecasts under nonstandard losses could be tricky. Sometimes we do not have an explicit loss function, so we take the simplest approach and use the quadratic loss. In some real-world applications (e.g., policymaking) a very explicit loss function may arise. In that case, it would be best to use the estimators and forecasts tailored to the loss.
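A quick numerical check of the first claim, on a simulated skewed sample so that the mean and median differ. The exponential distribution, sample size, and grid are arbitrary choices made for the illustration.

```python
import numpy as np

# Simulated skewed sample (an assumption for illustration)
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=501)

# Candidate point forecasts on a fine grid (spacing 0.001)
grid = np.linspace(0.0, 3.0, 3001)

# Empirical risk of each candidate under the two loss functions
abs_risk = np.abs(y[None, :] - grid[:, None]).mean(axis=1)
sq_risk = ((y[None, :] - grid[:, None]) ** 2).mean(axis=1)

best_abs = grid[np.argmin(abs_risk)]  # minimizer under absolute loss
best_sq = grid[np.argmin(sq_risk)]    # minimizer under quadratic loss

print(best_abs, np.median(y))
print(best_sq, y.mean())
```

Up to the grid spacing, the absolute-loss minimizer matches the sample median and the quadratic-loss minimizer matches the sample mean.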

Answer 24

Get a good estimate of E(Y|X)

Answer 25

Versus traditional parametrics: fit flexible nonlinear functions (because most econometric models assume E(Y|X) is approximately linear, at least in parameters), while trading off signal extraction (bias) against overfitting (variance).

Answer 26

Parametric and nonparametric

Answer 27

Problem: must know the true F(y|x), so it is unattainable.
Solution: estimate it and classify based on the highest estimated CONDITIONAL PROBABILITY.

Answer 28

doesn’t produce any model and hence any “understanding” of how X relates to Y: just lets the K most similar training data points “vote” on the class of the test observation.

Answer 29

Euclidean distance doesn’t make much sense when applied to categorical predictors. If we assign numbers to the categories, the error might differ depending on which categories are assigned to which numbers, although all mismatches should give the same error.
Solution: 0-1 loss

Answer 30

Yes, depending on domain knowledge

Answer 31

A penalty for complexity, to restrict the choice of available f(X). Comes in through the choice of K.

Answer 32

Very stable from sample to sample: low variance.
Misses many X values: high bias.

(57 cards)