Chapters 1 & 2 - Overview Flashcards
Statistical Learning refers to…
A vast set of tools for understanding data. These tools can be unsupervised or supervised.
Supervised vs Unsupervised Learning:
Supervised: Predicting/estimating an output based on one or more inputs. We have labels for the data already.
Unsupervised: inputs with no supervising output; we can still learn relationships and structure from the data.
When we want to predict or estimate a continuous or quantitative variable, what kind of problem is this?
A regression problem. Linear regression is the classic method for this setting.
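A minimal sketch in Python (invented numbers, using scikit-learn's LinearRegression) of treating a quantitative response as a regression problem:

```python
# Predicting a quantitative response (e.g. a price) is a regression problem.
# The data below are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[50.0], [80.0], [120.0], [200.0]])  # single predictor, e.g. floor area
y = np.array([150.0, 230.0, 340.0, 560.0])        # quantitative response, e.g. price

model = LinearRegression().fit(X, y)
print(model.predict([[100.0]]))                   # predicted response for a new observation
```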
What is a Classification Problem?
When we are trying to predict a non-numerical (qualitative) value. E.g. will a stock price go UP or DOWN (ignoring by how much)?
What is a clustering problem?
Where we want to group observations based on their observed characteristics.
No output variables for the corresponding input variables; however, we still want to find groups, structure, and relationships in the data.
Unsupervised.
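As a rough sketch of the unsupervised setting, k-means (one of many clustering methods) groups observations using only the inputs; the data here are invented:

```python
# Clustering: no response variable, only inputs; we look for groups of observations.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.2, 1.8],    # two observations close together
              [8.0, 9.0], [8.5, 9.5]])   # two more forming a second group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster assignment for each observation, e.g. [0 0 1 1]
```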
What model/method could we use to predict a qualitative variable?
E.g. a patient lives or dies, or the stock market goes up or down?
Logistic Regression or Linear Discriminant Analysis
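A minimal sketch (invented data, using the scikit-learn implementations of the two methods named above) of predicting a qualitative response:

```python
# Two classifiers for a qualitative response such as Up/Down.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1.2], [-0.5], [0.3], [1.1], [1.8], [-2.0]])  # e.g. previous day's % return
y = np.array(["Down", "Down", "Up", "Up", "Up", "Down"])     # qualitative response

print(LogisticRegression().fit(X, y).predict([[0.7]]))
print(LinearDiscriminantAnalysis().fit(X, y).predict([[0.7]]))
```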
In statistics, what does this notation stand for: n, p, x_ij, X, ^T, y_i?
n = total number of distinct data points (observations)
p = number of variables
x_ij = value of the jth variable for the ith observation
X = the n x p matrix whose (i, j)th element is x_ij
^T = the transpose of a matrix or vector
y_i = ith observation of the variable we wish to make predictions on (the response)
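The same notation mapped onto a NumPy array, as a rough aid (note that NumPy indexing is 0-based, while the book's subscripts start at 1):

```python
import numpy as np

n, p = 4, 3                                        # n observations, p variables
X = np.arange(n * p, dtype=float).reshape(n, p)    # the n x p data matrix
y = np.array([1.0, 0.0, 3.5, 2.2])                 # the response, one value per observation

print(X.shape)    # (n, p)
print(X[1, 2])    # x_ij with i = 2, j = 3 in the book's 1-based notation
print(X.T.shape)  # (p, n): X transposed
print(y[0])       # y_i with i = 1
```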
What are some of the names given to input vs output variables in ISLR?
Input: predictors, independent variables, features, or sometimes just variables
Output: response or dependent variable, often denoted by the symbol Y
Describe Y = f(X) + ε
Assumptions:
f is some fixed but unknown function
ε is a random error term, which is independent of X and has a mean of zero
In this formulation, f represents the systematic information that X provides about Y.
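A quick simulation of this model in Python, where the "true" f is something we make up for illustration (in practice f is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Stand-in for the fixed but unknown function; chosen arbitrarily here.
    return 3.0 + 2.0 * x

x = rng.uniform(0.0, 10.0, size=100)
eps = rng.normal(loc=0.0, scale=1.0, size=100)  # error term: independent of x, mean zero
y = f(x) + eps                                  # Y = f(X) + epsilon
```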
In essence, statistical learning refers to a set of approaches for estimating:
f
Some unknown function f that we can use to predict/estimate Y
What are the two main reasons for estimating f (the unknown function)?
Two main reasons:
Prediction and Inference
The accuracy of Y-hat as a prediction for Y depends on two quantities…
Reducible error: in general, f-hat is not a perfect estimate of f, and this inaccuracy will introduce some error. This error is reducible because we can potentially improve the accuracy of f-hat by using the most appropriate statistical learning technique to estimate f.
Irreducible error:
Even if we could estimate f perfectly, so that Y-hat = f(X), our predictions would still contain some error. No matter how well we estimate f, we cannot reduce the error introduced by ε.
This is because Y is also a function of ε, which by definition cannot be predicted using X.
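ISLR expresses this split by treating f-hat and X as fixed: the expected squared prediction error breaks into a reducible part and an irreducible part.

```latex
\mathbb{E}\,(Y - \hat{Y})^2
  = \mathbb{E}\,\bigl[f(X) + \epsilon - \hat{f}(X)\bigr]^2
  = \underbrace{\bigl[f(X) - \hat{f}(X)\bigr]^2}_{\text{reducible}}
  \;+\; \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```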
Inference vs Prediction to estimate f
Prediction: estimate a value for Y based on the inputs X.
Inference: what effect does a change in an input X have on Y?
We want to understand the relationship between X and Y
Inference vs Prediction example
Inference:
How much extra will a house be worth if it has a view of the river?
Prediction:
Is the house undervalued or overvalued?
If we want to find a function f-hat such that Y ~= f-hat(X) for any observation (X, Y), what types of statistical learning approach can we take?
Parametric:
Involves a two-step, model-based approach.
1. Make an assumption about the functional form of f. E.g. assume that f is linear in X (a linear model).
2. After selecting a model, use a procedure that fits or trains the model on the training data.
Non-parametric: no assumption about the functional form of f. Seeks to estimate f by getting as close to the data points as possible.
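A sketch contrasting the two approaches on simulated data, with linear regression standing in for the parametric approach and k-nearest neighbours for the non-parametric one (both from scikit-learn; the data-generating function is invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)   # non-linear truth plus noise

linear = LinearRegression().fit(X, y)                 # parametric: assumes f is linear in X
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)   # non-parametric: no functional form assumed

x_new = [[2.5]]
print(linear.predict(x_new))  # constrained by the linearity assumption
print(knn.predict(x_new))     # follows the local behaviour of the data
```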