01 Intro Flashcards
Basic Probability and New Terminology
Regression Analysis
investigates quantitative, predictive relationships
Response Variable (Y)
dependent variable which serves as a response, quantity we are trying to predict
(X1,Y1),(X2,Y2),…,(Xn,Yn)
Predictor Variable (X)
independent variable which serves as a predictor, often known as a covariate or feature
(X1,Y1),(X2,Y2),…,(Xn,Yn)
Conditional Expectation in Regression
E[Y|X = x], the mean of the response Y given that the predictor X takes the value x
Regression Function
x |-> E[Y|X = x]
in other words, how the mean response changes across the possible values of the covariate X
Expanded Regression Function and Conditional Expectation Form
x = (x1,x2,…,xp) |-> E[Y|X = x] = B0 + B1x1 + B2x2 + … + Bpxp
Regression Parameters
the p + 1 coefficients B0, B1, …, Bp (one for each covariate, plus B0 for the intercept)
with a single covariate: x |-> E[Y|X = x] = a + Bx, where a is the intercept and B is the slope
Linear Model to Estimate Regression Parameters
x = (x1,…,xp) |-> u(hat)(x) = B(hat)0 + B(hat)1x1 + B(hat)2x2 + … + B(hat)pxp
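A minimal sketch of how the estimates B(hat)0 and B(hat)1 might be computed when there is a single covariate (p = 1). It assumes ordinary least squares, which these cards do not name as the estimation method. The function names and the data are made up for illustration.

```python
def fit_simple_linear_model(xs, ys):
    """Least-squares estimates (b0_hat, b1_hat) for one covariate."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance and variance terms (the common 1/n factor cancels)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sxy / sxx               # estimated slope
    b0_hat = y_bar - b1_hat * x_bar  # estimated intercept
    return b0_hat, b1_hat


def u_hat(x, b0_hat, b1_hat):
    """Fitted regression function: estimated mean response at x."""
    return b0_hat + b1_hat * x


# Made-up data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

b0_hat, b1_hat = fit_simple_linear_model(xs, ys)
print(b0_hat, b1_hat)
print(u_hat(6.0, b0_hat, b1_hat))
```

Because the toy data has no noise, the estimates recover the intercept 1 and slope 2 of the generating line. With real data the estimates would differ from the true parameters.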
Prediction
quantify how well a linear regression model fits the data, in the sense of being able to predict an unforeseen data point (Xnew, Ynew)
Predictive Mean Squared Error
E[(Ynew - u(hat)(Xnew))^2]
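This expectation cannot be computed exactly from a finite sample. One common approximation is to average the squared prediction errors over data points that were not used to fit the model. The sketch below takes that approach as an assumption; the fitted line and the new data points are hypothetical.

```python
def mean_squared_prediction_error(predict, new_xs, new_ys):
    """Average of (Ynew - u_hat(Xnew))^2 over held-out points."""
    squared_errors = [(y - predict(x)) ** 2 for x, y in zip(new_xs, new_ys)]
    return sum(squared_errors) / len(squared_errors)


def u_hat(x):
    # Hypothetical fitted line with intercept 1 and slope 2
    return 1.0 + 2.0 * x


# Made-up data points that were not used for fitting
new_xs = [6.0, 7.0, 8.0]
new_ys = [13.5, 14.0, 17.0]

pmse_estimate = mean_squared_prediction_error(u_hat, new_xs, new_ys)
print(pmse_estimate)
```

The predictions here are 13, 15 and 17, so the squared errors are 0.25, 1 and 0, and the estimate is their average.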
Inference
assess how accurate the estimators of the regression coefficients are
Observational vs Experimental Data
randomized experiment - the researcher manipulates the value of the predictor variable to isolate its effect on the response variable
observational study - the researcher still cares about the effect the predictor has on the response, but only observes the predictor and response as they occur rather than manipulating the predictor
Confounder variables
Ex) people who take multivitamins have fewer heart attacks. But if people who take multivitamins also smoke less and exercise more than people who don't, the difference may be due to those habits rather than to the multivitamins.
the association between multivitamins and heart attacks is then confounded by exercise and smoking
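A small worked example of how a confounder can create an association. The counts are made up for illustration, and only smoking is used as the confounder to keep the arithmetic short.

```python
# Made-up counts: (smoker?, takes multivitamins?) -> (people, heart attacks)
counts = {
    (False, True): (800, 40),
    (False, False): (200, 10),
    (True, True): (200, 40),
    (True, False): (800, 160),
}


def heart_attack_rate(cells):
    """Pooled heart attack rate over a list of (people, heart attacks) cells."""
    people = sum(n for n, _ in cells)
    attacks = sum(a for _, a in cells)
    return attacks / people


# Crude comparison that ignores smoking
crude_takers = heart_attack_rate([c for (_, vit), c in counts.items() if vit])
crude_non_takers = heart_attack_rate([c for (_, vit), c in counts.items() if not vit])

# Comparison within each smoking group: (rate for takers, rate for non-takers)
within = {
    smoker: (
        heart_attack_rate([counts[(smoker, True)]]),
        heart_attack_rate([counts[(smoker, False)]]),
    )
    for smoker in (False, True)
}

print(crude_takers, crude_non_takers)
print(within)
```

In these counts, multivitamin takers have a lower overall heart attack rate (8% against 17%), yet within the non-smokers and within the smokers the rates for takers and non-takers are identical (5% and 20%). The crude difference comes entirely from the fact that takers are less likely to smoke.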
Regression Analysis in Observational Studies
regression analysis allows for statistically sound conclusions about the association between the response and the covariates - NOT causation
The Capital-Asset Pricing Model (CAPM)
univariate regression model that relates the excess return of an asset to the excess return of the market
Formula 1:
R(t) = (P(t) - P(t-1)) / P(t-1)
Formula 2:
R(t) - v(t) = a + B(M(t) - v(t)) + E(t)
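A rough sketch of the two formulas in code. It assumes P(t) is the asset price at time t, M(t) is the market return, v(t) is the risk-free rate, and that a and B are estimated by least squares; the cards do not state these details. All numbers are synthetic.

```python
def simple_returns(prices):
    """Formula 1: R(t) = (P(t) - P(t-1)) / P(t-1) for consecutive prices."""
    return [(p - prev) / prev for prev, p in zip(prices, prices[1:])]


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Formula 1 on hypothetical prices
asset_prices = [100.0, 110.0, 99.0]
example_returns = simple_returns(asset_prices)
print(example_returns)

# Formula 2 on synthetic returns generated with no error term E(t),
# so the fit should recover the chosen a and B
market_returns = [0.02, -0.01, 0.03, 0.00, 0.015]
risk_free = [0.001] * len(market_returns)
true_a, true_B = 0.002, 1.5
asset_returns = [
    v + true_a + true_B * (m - v) for m, v in zip(market_returns, risk_free)
]

# Excess returns: subtract the risk-free rate from both series
excess_market = [m - v for m, v in zip(market_returns, risk_free)]
excess_asset = [r - v for r, v in zip(asset_returns, risk_free)]

a_hat, B_hat = fit_line(excess_market, excess_asset)
print(a_hat, B_hat)
```

With real return data the error term E(t) is not zero, so a_hat and B_hat would only approximate the underlying parameters.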