1. Statistical Learning Flashcards
What is the difference between supervised and unsupervised learning?
Supervised: has a response variable
Unsupervised: analyzes the observations or the variables without a response variable. Main idea is to identify patterns that may exist in the data.
What is the difference between a parametric method and a non-parametric method?
Parametric: specifies a functional form for f that includes free parameters (parameters that we estimate).
Non-parametric: makes no assumption about f’s functional form; the estimate of f is then mainly algorithmic.
What are the two main objectives to supervised learning?
Inference and prediction
A method’s predictive strength coincides with its _______
Flexibility
What does one’s ability to make inferences depend on?
The interpretability of the model
Why are flexibility and interpretability inversely related?
Because if a model is very flexible (fits the data very closely), then it is likely that the model is complicated (not easily interpreted)
Methods that are less flexible, but more interpretable?
Lasso and subset selection
Methods that are moderately flexible and interpretable?
Least squares
Regression trees
Classification trees
Methods that are very flexible, but not interpretable?
Bagging
Boosting
Do flexibility and predictive accuracy go hand in hand? Why or why not?
They do not. A highly flexible method fits the training data closely, but that does not guarantee a close fit on the test data.
Highly flexible = near-perfect predictions on past (training) data
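A minimal sketch of this point, assuming a made-up data-generating process (y = x^2 plus Gaussian noise) and a hand-rolled 1-nearest-neighbor regressor standing in for "a highly flexible method". The function names are illustrative, not from any library.

```python
import random

def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs_eval, ys_eval, xs_train, ys_train, k):
    """Mean squared error of KNN predictions on (xs_eval, ys_eval)."""
    errors = [(y - knn_predict(x, xs_train, ys_train, k)) ** 2
              for x, y in zip(xs_eval, ys_eval)]
    return sum(errors) / len(errors)

def simulate(n, rng):
    """Draw n points from y = x^2 + noise (an assumed, made-up process)."""
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = simulate(100, rng)
x_test, y_test = simulate(100, rng)

# With k = 1, each training point is its own nearest neighbor,
# so the fit on the training data is perfect.
train_mse_k1 = mse(x_train, y_train, x_train, y_train, k=1)
test_mse_k1 = mse(x_test, y_test, x_train, y_train, k=1)
print("k=1 train MSE:", train_mse_k1)
print("k=1 test MSE:", test_mse_k1)
```

The training error is zero because the method memorizes the noise, while the test error stays well above zero.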
What does the bias of a model speak to?
The bias relates to the average closeness between f-hat and f.
What is the difference between prediction and inference?
Prediction: output of f-hat
Inference: comprehension of f
In KNN regression, which of the following are true as k increases?
A. Flexibility increases
B. Squared bias increases
C. Variance decreases
As k increases, the model becomes less flexible (each prediction averages over more neighbors, so the fit is smoother)
A. False
B. True
C. True
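The answers above can be checked with a small simulation. This is a sketch under assumptions: the true function (x^2), the noise level, the evaluation point x0 = 1.5, and the helper names are all made up for illustration.

```python
import random

def true_f(x):
    """Assumed true regression function for the simulation."""
    return x ** 2

def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def bias2_and_variance(k, x0=1.5, n=50, reps=300, seed=0):
    """Estimate squared bias and variance of the KNN prediction at x0
    by refitting on many independently simulated training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(reps):
        xs = [rng.uniform(-2, 2) for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, 1) for x in xs]
        preds.append(knn_predict(x0, xs, ys, k))
    mean_pred = sum(preds) / reps
    variance = sum((p - mean_pred) ** 2 for p in preds) / reps
    bias2 = (mean_pred - true_f(x0)) ** 2
    return bias2, variance

bias2_small_k, var_small_k = bias2_and_variance(k=1)
bias2_large_k, var_large_k = bias2_and_variance(k=50)
print("k=1 : bias^2 =", bias2_small_k, " variance =", var_small_k)
print("k=50: bias^2 =", bias2_large_k, " variance =", var_large_k)
```

With k equal to the training-set size, every prediction is just the overall mean of the responses: very stable across training sets (low variance) but far from f(x0) (high bias).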
Rank these three in terms of flexibility, in decreasing order.
Linear regression
Ridge regression
Regression tree
Most flexible: regression tree
Linear regression
Least flexible: Ridge regression
Rank in decreasing order of flexibility.
Linear regression
Lasso regression
Boosting
Most flexible: Boosting
Linear regression
Least flexible: Lasso regression
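Which of these modelling techniques perform variable selection? Lasso, partial least squares, PCA, ridge
Only lasso
PCA and partial least squares: both use all variables in determining the partial least squares directions and the PCs
Ridge: shrinks coefficients toward zero but does not set any of them exactly to zero, so all variables stay in the model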
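To see why only lasso selects variables, here is a sketch of the special case where the design matrix is orthonormal, so both penalized fits have closed forms. It assumes the penalties are written as lam * sum(|b|) for lasso and lam * sum(b^2) for ridge, added to the residual sum of squares. The coefficient values and function names are made up for illustration.

```python
import math

def lasso_orthonormal(beta_ols, lam):
    """Soft-thresholding: coefficients within lam/2 of zero become exactly 0."""
    out = []
    for b in beta_ols:
        magnitude = max(abs(b) - lam / 2, 0.0)
        out.append(math.copysign(magnitude, b) if magnitude > 0 else 0.0)
    return out

def ridge_orthonormal(beta_ols, lam):
    """Proportional shrinkage: every coefficient is scaled by 1 / (1 + lam)."""
    return [b / (1 + lam) for b in beta_ols]

beta_ols = [3.0, -0.4, 0.2, -2.0]  # hypothetical least squares estimates
lam = 1.0

lasso = lasso_orthonormal(beta_ols, lam)
ridge = ridge_orthonormal(beta_ols, lam)
print("lasso:", lasso)
print("ridge:", ridge)
```

Lasso drops the two small coefficients entirely, while ridge shrinks all four but keeps each one nonzero.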
Is unsupervised learning used to draw inferences from datasets without a specified response variable?
Yes
Does the accuracy of a prediction for Y depend on the irreducible and reducible error?
Yes
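For reference, the usual decomposition behind this answer, assuming Y = f(X) + ε with f-hat and X treated as fixed and Y-hat = f-hat(X):

```latex
E\big[(Y - \hat{Y})^2\big]
  = \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible error}}
  + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{irreducible error}}
```

A better estimate of f can shrink the first term, but no choice of f-hat can reduce the second.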
Correlation and covariance formula. What are the boundaries for each of these values?
Formula
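The card leaves the formulas as a placeholder. Assuming it refers to the usual sample versions, they can be written as:

```latex
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{xy} = \frac{s_{xy}}{s_x \, s_y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

-\infty < s_{xy} < \infty,
\qquad
-1 \le r_{xy} \le 1
```

Covariance is unbounded because it depends on the units of x and y; correlation is unit-free and always lies between -1 and 1.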
Which statements are true regarding scatter plots?
A. If it shows a quadratic relationship, the variables’ sample correlation will be around 0.
B. They are not ideal for detecting non-linear relationships between two variables
Both false.
The quadratic curve could sit anywhere within the scatter plot, so the sample correlation can be clearly negative or positive rather than near 0.
They are ideal for detecting any relationship between two variables.
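A quick sketch of why statement A is false: the same quadratic relationship y = x^2 gives very different sample correlations depending on which part of the curve the data cover. The x ranges and the `pearson` helper are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

centered = [i / 10 for i in range(-20, 21)]  # x from -2 to 2, symmetric about 0
right = [i / 10 for i in range(0, 41)]       # x from 0 to 4, right arm only
left = [-x for x in right]                   # x from -4 to 0, left arm only

r_centered = pearson(centered, [x ** 2 for x in centered])
r_right = pearson(right, [x ** 2 for x in right])
r_left = pearson(left, [x ** 2 for x in left])
print("centered:", r_centered)
print("right arm:", r_right)
print("left arm:", r_left)
```

Only the range centered on the vertex gives a correlation near 0; the right arm is strongly positive and the left arm strongly negative.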
True or false. Categorical variables can take on an unlimited number of values.
False. They take on a limited number of values.