Chapters 1 & 2 - Overview Flashcards
Statistical Learning refers to…
A vast set of tools for understanding data. These tools can be unsupervised or supervised.
Supervised vs Unsupervised Learning:
Supervised: Predicting/estimating an output based on one or more inputs. We have labels for the data already.
Unsupervised: inputs with no supervising output; we can still learn relationships and structure from the data.
When we want to predict or estimate a continuous or quantitative variable, what kind of problem is this?
A regression problem. Linear regression is the classic method for this setting.
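A minimal sketch in Python (invented numbers, using scikit-learn's LinearRegression) of treating a quantitative response as a regression problem:

```python
# Predicting a quantitative response (e.g. a price) is a regression problem.
# The data below are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[50.0], [80.0], [120.0], [200.0]])  # single predictor, e.g. floor area
y = np.array([150.0, 230.0, 340.0, 560.0])        # quantitative response, e.g. price

model = LinearRegression().fit(X, y)
print(model.predict([[100.0]]))                   # predicted response for a new observation
```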
What is a Classification Problem?
When we are trying to predict a non-numerical (qualitative) value. E.g. will a stock price go UP or DOWN (ignoring by how much)?
What is a clustering problem?
Where we want to group observations based on their observed characteristics.
No output variables for the corresponding input variables; however, we still want to find groups, structure, and relationships in the data.
Unsupervised.
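As a rough sketch of the unsupervised setting, k-means (one of many clustering methods) groups observations using only the inputs; the data here are invented:

```python
# Clustering: no response variable, only inputs; we look for groups of observations.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.2, 1.8],    # two observations close together
              [8.0, 9.0], [8.5, 9.5]])   # two more forming a second group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster assignment for each observation, e.g. [0 0 1 1]
```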
What model/method could we use to predict a qualitative variable?
E.g. a patient lives or dies, or the stock market goes up or down?
Logistic Regression or Linear Discriminant Analysis
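A minimal sketch (invented data, using the scikit-learn implementations of the two methods named above) of predicting a qualitative response:

```python
# Two classifiers for a qualitative response such as Up/Down.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1.2], [-0.5], [0.3], [1.1], [1.8], [-2.0]])  # e.g. previous day's % return
y = np.array(["Down", "Down", "Up", "Up", "Up", "Down"])     # qualitative response

print(LogisticRegression().fit(X, y).predict([[0.7]]))
print(LinearDiscriminantAnalysis().fit(X, y).predict([[0.7]]))
```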
In statistics, what does this notation stand for: n, p, x_ij, X, ^T, y_i?
n = total number of distinct data points (observations)
p = number of variables
x_ij = value of the jth variable for the ith observation
X = the n x p matrix whose (i, j)th element is x_ij
^T = the transpose of a matrix or vector
y_i = ith observation of the variable we wish to make predictions on (the response)
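The same notation mapped onto a NumPy array, as a rough aid (note that NumPy indexing is 0-based, while the book's subscripts start at 1):

```python
import numpy as np

n, p = 4, 3                                        # n observations, p variables
X = np.arange(n * p, dtype=float).reshape(n, p)    # the n x p data matrix
y = np.array([1.0, 0.0, 3.5, 2.2])                 # the response, one value per observation

print(X.shape)    # (n, p)
print(X[1, 2])    # x_ij with i = 2, j = 3 in the book's 1-based notation
print(X.T.shape)  # (p, n): X transposed
print(y[0])       # y_i with i = 1
```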
What are some of the names given to input vs output variables in ISLR?
Input: predictors, independent variables, features, or sometimes just variables
Output: response or dependent variable, often denoted by the symbol Y
Describe Y = f(X) + ε
Assumptions:
f is some fixed but unknown function
ε is a random error term, which is independent of X and has a mean of zero
In this formulation, f represents the systematic information that X provides about Y.
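A quick simulation of this model in Python, where the "true" f is something we make up for illustration (in practice f is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Stand-in for the fixed but unknown function; chosen arbitrarily here.
    return 3.0 + 2.0 * x

x = rng.uniform(0.0, 10.0, size=100)
eps = rng.normal(loc=0.0, scale=1.0, size=100)  # error term: independent of x, mean zero
y = f(x) + eps                                  # Y = f(X) + epsilon
```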
In essence, statistical learning refers to a set of approaches for estimating:
f
Some unknown function f that we can use to predict/estimate Y
What are the two main reasons for estimating f (the unknown function)?
Two main reasons:
Prediction and Inference
The accuracy of Y-hat as a prediction for Y depends on two quantities…
Reducible error: in general, f-hat is not a perfect estimate of f, and this inaccuracy will introduce some error. This error is reducible because we can potentially improve the accuracy of f-hat by using the most appropriate statistical learning technique to estimate f.
Irreducible error:
Even if we could estimate f perfectly, so that Y-hat = f(X), our predictions would still contain some error. No matter how well we estimate f, we cannot reduce the error introduced by ε.
This is because Y is also a function of ε, which by definition cannot be predicted using X.
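ISLR expresses this split by treating f-hat and X as fixed: the expected squared prediction error breaks into a reducible part and an irreducible part.

```latex
\mathbb{E}\,(Y - \hat{Y})^2
  = \mathbb{E}\,\bigl[f(X) + \epsilon - \hat{f}(X)\bigr]^2
  = \underbrace{\bigl[f(X) - \hat{f}(X)\bigr]^2}_{\text{reducible}}
  \;+\; \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```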
Inference vs Prediction to estimate f
Prediction: estimate a value for Y based on the inputs X.
Inference: what effect does a change in an input X have on Y?
We want to understand the relationship between X and Y
Inference vs Prediction example
Inference:
How much extra will a house be worth if it has a view of the river?
Prediction:
Is the house undervalued or overvalued?
If we want to find a function f-hat such that Y ~= f-hat(X) for any observation (X, Y), what types of statistical learning approach can we take?
Parametric:
Involves a two-step, model-based approach.
1. Make an assumption about the functional form of f. E.g. assume that f is linear in X (a linear model).
2. After selecting a model, use a procedure that fits or trains the model on the training data.
Non-parametric: no assumption about the functional form of f. Seeks to estimate f by getting as close to the data points as possible.
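A sketch contrasting the two approaches on simulated data, with linear regression standing in for the parametric approach and k-nearest neighbours for the non-parametric one (both from scikit-learn; the data-generating function is invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)   # non-linear truth plus noise

linear = LinearRegression().fit(X, y)                 # parametric: assumes f is linear in X
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)   # non-parametric: no functional form assumed

x_new = [[2.5]]
print(linear.predict(x_new))  # constrained by the linearity assumption
print(knn.predict(x_new))     # follows the local behaviour of the data
```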