1. Statistical Learning Flashcards

1
Q

What is the difference between supervised and unsupervised learning?

A

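Supervised: each observation of the predictors has an associated response variable, and the goal is to model the relationship between them.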

Unsupervised: there is no response variable. The analysis looks only at the observations or the variables, and the main idea is to identify patterns that may exist in the data.

2
Q

What is the difference between a parametric method and a non-parametric method?

A

Parametric: specifies a functional form for f that includes free parameters (parameters that we estimate).

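Non-parametric: makes no explicit assumption about the functional form of f. f-hat is then produced mainly by an algorithm (e.g. averaging nearby points) rather than by estimating a fixed set of parameters.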
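A minimal sketch of the contrast, using only the Python standard library on hypothetical simulated data (y = 2 + 3x plus noise, not from the cards). The parametric fit assumes f is a straight line, so the whole model reduces to two estimated numbers; the non-parametric k-nearest-neighbours average assumes no form for f and has to keep every training point. `fit_line` and `knn_predict` are illustrative helpers, not a library API.

```python
import random

# Hypothetical simulated data: y = 2 + 3x + Gaussian noise
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(50)]
ys = [2 + 3 * x + random.gauss(0, 1) for x in xs]


def fit_line(xs, ys):
    """Parametric: assume f(x) = b0 + b1*x and estimate the two parameters by least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def knn_predict(x0, xs, ys, k=5):
    """Non-parametric: assume no form for f; average the responses of the k nearest x-values."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


b0, b1 = fit_line(xs, ys)
print(f"line: f_hat(x) = {b0:.2f} + {b1:.2f}x")           # the whole model is two numbers
print("KNN at x=5:", round(knn_predict(5.0, xs, ys), 2))  # needs all 50 training points
```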

3
Q

What are the two main objectives to supervised learning?

A

Inference and prediction

4
Q

A method's ability to fit the training data increases with its _______

A

Flexibility

5
Q

What does one’s ability to make inferences depend on?

A

The interpretability of the model

6
Q

Why are flexibility and interpretability inversely related?

A

Because a very flexible model (one that can follow the training data very closely) is usually complicated, which makes it hard to see how each predictor relates to the response.

7
Q

Methods that are less flexible, but more interpretable?

A

Lasso and subset selection

8
Q

Methods that are moderately flexible and interpretable?

A

Least squares
Regression trees
Classification trees

9
Q

Methods that are very flexible, but not interpretable?

A

Bagging

Boosting

10
Q

Do flexibility and predictive accuracy go hand in hand? Why or why not?

A

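They do not. A highly flexible method can follow the training data very closely, including its random noise (overfitting). Its training error is therefore low, but its error on new test data can be high.

Highly flexible = near-perfect predictions on past (training) data, not necessarily on new data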
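A small simulation of this point, assuming a hypothetical true function f(x) = sin(x) with Gaussian noise (sd 0.3) and using k-nearest-neighbours regression as the flexible method. With k = 1 every training point is its own nearest neighbour, so the training error is exactly zero; in this setup the test error for k = 1 typically comes out larger than for the smoother k = 10 fit. All names here are illustrative.

```python
import math
import random

random.seed(1)


def f(x):
    # Hypothetical true regression function, chosen only for illustration
    return math.sin(x)


def simulate(n):
    """Draw n (x, y) pairs with y = f(x) + noise."""
    xs = [random.uniform(0, 6) for _ in range(n)]
    return xs, [f(x) + random.gauss(0, 0.3) for x in xs]


def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def mse(x_eval, y_eval, xs, ys, k):
    """Mean squared error of a KNN fit on (xs, ys), evaluated at (x_eval, y_eval)."""
    errors = [(y - knn_predict(x, xs, ys, k)) ** 2 for x, y in zip(x_eval, y_eval)]
    return sum(errors) / len(errors)


x_train, y_train = simulate(100)
x_test, y_test = simulate(1000)

for k in (1, 10):
    print(f"k={k:2d}  train MSE={mse(x_train, y_train, x_train, y_train, k):.3f}  "
          f"test MSE={mse(x_test, y_test, x_train, y_train, k):.3f}")
```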

11
Q

What does the bias of a model speak to?

A

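The bias is the error introduced by approximating the true f with the chosen model: it measures how far f-hat is from f on average, where the average is taken over many possible training sets.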
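For reference, the standard definitions at a test point x0, assuming y0 = f(x0) + ε with ε independent of the training data and of mean zero, and with expectations taken over repeated training sets:

```latex
\operatorname{Bias}\big(\hat f(x_0)\big) = E\big[\hat f(x_0)\big] - f(x_0)

E\big[(y_0 - \hat f(x_0))^2\big]
  = \operatorname{Var}\big(\hat f(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```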

12
Q

What is the difference between prediction and inference?

A

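Prediction: use f-hat to produce a prediction of Y for new values of X; f-hat can be treated as a black box.
Inference: understand f itself, i.e. how Y is related to each of the predictors.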

13
Q

In KNN regression, which of the following are true as k increases?
A. Flexibility increases
B. Squared bias increases
C. Variance decreases

A

As k increases, each prediction averages over more neighbours, so the model becomes less flexible (smoother): squared bias rises while variance falls.
A. False
B. True
C. True
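A Monte Carlo sketch of this behaviour under assumed settings (hypothetical f(x) = sin(2x), 50 training points on [0, 3], noise sd 0.5, evaluation point x0 = 1). The KNN fit is refitted on many fresh training sets, and the average prediction at x0 is compared with f(x0). Squared bias should grow with k and variance should shrink; for small k the estimated squared bias is close to zero but is itself noisy. The helper names are illustrative.

```python
import math
import random

random.seed(2)


def f(x):
    # Hypothetical true regression function (illustration only)
    return math.sin(2 * x)


def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias2_and_variance(k, x0=1.0, n=50, reps=300, sigma=0.5):
    """Estimate squared bias and variance of f_hat(x0) over `reps` fresh training sets."""
    preds = []
    for _ in range(reps):
        xs = [random.uniform(0, 3) for _ in range(n)]
        ys = [f(x) + random.gauss(0, sigma) for x in xs]
        preds.append(knn_predict(x0, xs, ys, k))
    mean_pred = sum(preds) / reps
    variance = sum((p - mean_pred) ** 2 for p in preds) / reps
    return (mean_pred - f(x0)) ** 2, variance


results = {k: bias2_and_variance(k) for k in (1, 5, 25)}
for k, (b2, var) in results.items():
    print(f"k={k:2d}  squared bias={b2:.4f}  variance={var:.4f}")
```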

14
Q

Rank these three in terms of flexibility, in decreasing order.
Linear regression
Ridge regression
Regression tree

A

Most flexible: regression tree
Linear regression
Least flexible: Ridge regression
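Ridge regression minimises RSS + λ Σ βj², so it is ordinary least squares plus a penalty on coefficient size. A one-predictor sketch on hypothetical simulated data (centred, so no intercept is needed): with λ = 0 ridge reproduces the least squares slope, and increasing λ shrinks the slope toward zero, which is what makes ridge less flexible than plain linear regression. `ridge_slope` is an illustrative helper.

```python
import random

random.seed(3)
# Hypothetical data: one predictor, true slope 2
xs = [random.gauss(0, 1) for _ in range(40)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

# Centre both variables so the model needs no intercept
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
xc = [x - mx for x in xs]
yc = [y - my for y in ys]


def ridge_slope(lam):
    """Minimise sum (yc - b*xc)^2 + lam*b^2; for one predictor this has a closed form."""
    sxy = sum(x * y for x, y in zip(xc, yc))
    sxx = sum(x * x for x in xc)
    return sxy / (sxx + lam)


for lam in (0, 10, 100, 1000):
    print(f"lambda={lam:4d}  slope={ridge_slope(lam):.3f}")  # lambda=0 is least squares
```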

15
Q

Rank in decreasing order of flexibility.
Linear regression
Lasso regression
Boosting

A

Boosting
Linear regression
Lasso regression

16
Q
Which of these modelling techniques performs variable selection?
Lasso
Partial least squares
PCA
Ridge regression

A

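Only lasso. Its penalty can set some coefficients exactly to zero, which removes those variables from the model.

PCA (principal components) and partial least squares: both use all of the variables when constructing the principal components and the partial least squares directions, so no variable is dropped.

Ridge: shrinks coefficients toward zero but does not set any of them exactly to zero.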
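A bare-bones coordinate-descent sketch comparing lasso and ridge, on hypothetical data in which only the first two of five predictors matter (no intercept, since the simulated predictors and response have mean zero). It uses one common scaling of the objectives, (1/2n)·RSS + λ Σ |βj| for the lasso and (1/2n)·RSS + (λ/2) Σ βj² for ridge, so the λ values are not comparable between the two. The lasso update is a soft-threshold, which returns exactly 0 when a predictor's association with the partial residual is small; the ridge update only rescales, so it never produces an exact zero. All function names are illustrative, not a library API.

```python
import random

random.seed(4)
n, p = 100, 5
# Hypothetical data: only predictors 0 and 1 affect y
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + random.gauss(0, 1) for row in X]


def soft_threshold(z, gamma):
    """Shrink z toward 0 by gamma, returning exactly 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def coordinate_descent(X, y, lam, penalty, sweeps=100):
    """Cycle through coefficients, minimising the penalised objective in one at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Partial residual: current fit with predictor j left out
            r = [y[i] - sum(X[i][m] * beta[m] for m in range(p) if m != j) for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n)) / n
            sxx = sum(X[i][j] ** 2 for i in range(n)) / n
            if penalty == "lasso":
                beta[j] = soft_threshold(z, lam) / sxx
            else:  # ridge
                beta[j] = z / (sxx + lam)
    return beta


lasso = coordinate_descent(X, y, lam=0.5, penalty="lasso")
ridge = coordinate_descent(X, y, lam=0.5, penalty="ridge")
print("lasso:", [round(b, 3) for b in lasso])  # noise predictors expected to be exactly 0.0
print("ridge:", [round(b, 3) for b in ridge])  # small but non-zero
```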

17
Q

Is unsupervised learning used to draw inferences from datasets without a specified response variable?

A

Yes

18
Q

Does the accuracy of a prediction for Y depend on the irreducible and reducible error?

A

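Yes. The reducible error can be lowered by estimating f more accurately; the irreducible error, which comes from ε, cannot be reduced by any choice of f-hat.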
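The split can be written out explicitly, assuming Y = f(X) + ε with E(ε) = 0 and treating f-hat and X as fixed, so that the only randomness comes from ε:

```latex
E\big(Y - \hat Y\big)^2
  = E\big[f(X) + \varepsilon - \hat f(X)\big]^2
  = \underbrace{\big[f(X) - \hat f(X)\big]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
```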

19
Q

What are the formulas for covariance and correlation, and what are the bounds on each?

A

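Sample covariance: cov(x, y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1). It is unbounded: it can take any real value, and its size depends on the units of x and y.

Sample correlation: r = cov(x, y) / (s_x s_y), where s_x and s_y are the sample standard deviations. It is unit-free and always lies between −1 and 1.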
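A short check of both formulas in plain Python, on made-up height and weight values (hypothetical, for illustration only). Rescaling height from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, which is why only the correlation has fixed bounds.

```python
import math


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data, made up for illustration
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55, 65, 70, 72, 85]
height_cm = [100 * h for h in height_m]

print(covariance(height_m, weight_kg), covariance(height_cm, weight_kg))    # about 1.2125 and 121.25
print(correlation(height_m, weight_kg), correlation(height_cm, weight_kg))  # equal up to rounding, in [-1, 1]
```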

20
Q

Which statements are true regarding scatter plots?
A. If it shows a quadratic relationship, the variables’ sample correlation will be around 0.
B. They are not ideal for detecting non-linear relationships between two variables

A

Both false.

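A quadratic relationship gives a sample correlation near 0 only when the points are roughly symmetric around the vertex of the parabola. If the observed x-values lie mostly on one side of the vertex, the relationship is close to monotone, and the correlation can be strongly positive or negative.

Scatter plots are well suited to detecting non-linear relationships between two variables, which a correlation coefficient alone can miss.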
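A quick illustration with an exact, noise-free quadratic y = x² on evenly spaced grids (the grids are hypothetical choices). On a range symmetric about the vertex the correlation is essentially 0; on one side of the vertex it is strongly positive, and on the other strongly negative, even though the relationship is the same parabola in every case.

```python
import math


def correlation(xs, ys):
    """Pearson sample correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def grid(lo, hi, n=61):
    """n evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


for lo, hi in ((-3, 3), (0, 3), (-3, 0)):
    xs = grid(lo, hi)
    ys = [x ** 2 for x in xs]  # exact quadratic, no noise
    print(f"x in [{lo}, {hi}]: r = {correlation(xs, ys):+.3f}")
```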

21
Q

True or false. Categorical variables can take on an unlimited number of values.

A

False. A categorical (qualitative) variable takes values in one of a limited number of classes or categories.