Regression Flashcards
Regression is…
used to understand the relationship between variables
Independent variable (X)
Predictor or regressor
Dependent variable (Y)
Outcome or response
Goal of regression
Predict changes in Y based on X
Correlation vs regression
Correlation - measures the strength and direction of a linear relationship
Regression - predicts Y based on X
Shared variance (r^2): proportion of Y's variance explained by X
Simple linear Regression Equation
Y = B0 + B1 X + e
B0
Intercept: value of Y when X is 0
B1
Slope: Change in Y per unit X
e
Error: Difference between observed and predicted Y
Regression using a SAMPLE of the population
Sample estimates intercept and slope and predicted values of Y
Predicted values of Y are…
points on the regression line that correspond to the given values of X
Residuals (ê, "e-hat") are…
distances between observed and predicted values of Y for corresponding X
Equations for the Slope (B1)
Equation for the Intercept (B0)
Correlation equation
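The formulas for the three cards above are missing from this deck. As a stand-in, here is a minimal Python sketch using the standard least-squares forms: b1 = Sxy / Sxx, b0 = mean(Y) - b1 * mean(X), and r = Sxy / sqrt(Sxx * Syy). The original cards may write these differently. The data and the function name `simple_regression` are made up for illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def simple_regression(x, y):
    """Return (b0, b1, r) for the least-squares line Y = b0 + b1 * X."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-deviation of X and Y
    sxx = sum((xi - mx) ** 2 for xi in x)                     # sum of squares of X
    syy = sum((yi - my) ** 2 for yi in y)                     # sum of squares of Y
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r = sxy / sqrt(sxx * syy)
    return b0, b1, r

# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, r = simple_regression(x, y)

# Predicted values lie on the line; residuals are observed minus predicted.
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]
print(b0, b1, r, r ** 2)
```

Squaring `r` here gives the proportion of shared variance, and the residuals of a least-squares line with an intercept sum to zero.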
What do you do with the r to find the proportion of shared variance?
rxy^2
Square it
1 - rxy^2 is the…
Proportion of variance in Y that is independent of X
Suppose we observe a high correlation between a child’s weight and their reading ability. This correlation is likely due to age. How can we control for this confound?
We can control for the hypothesized influence of age on reading ability by removing the shared variance between age and weight
Squared Multiple Correlation (R-squared) formula:
The SMC represents the proportion of variance in Y shared with (or “explained by”) the set of all X variables
Numerator: proportion of non-redundant variance in Y shared with X1 and X2
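The formula itself is missing from the card above. A common two-predictor form, consistent with the numerator described here, is R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2). The sketch below assumes that form. The data and helper names are hypothetical.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

def smc_two_predictors(y, x1, x2):
    """R^2 for Y regressed on X1 and X2, from the three pairwise correlations."""
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    # Subtracting the cross term removes variance counted twice
    # because X1 and X2 overlap with each other.
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)

# Illustrative data only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1, 3, 2, 5, 4, 6]
r_squared = smc_two_predictors(y, x1, x2)
print(r_squared)
```

The formula is undefined when X1 and X2 are perfectly correlated (r_12 = ±1), since the denominator becomes zero.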
Shared variance in Prediction
In two-predictor regression, we are interested in imposing statistical control over X2 to test the unique effects of X1
Goal of Multiple Regression
- Evaluate the unique effect of X predictors on Y outcomes (holding constant other X)
- Determine the incremental contribution of new X predictors to estimating variance in Y (in addition to X already in the model)
- Determine the amount of variance explained in Y from a set of X predictors
To determine Incremental contribution to the model we use…
Squared semipartial correlation
To determine variance explained in Y we use…
Squared multiple correlation
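To make the link between these two cards concrete, the sketch below treats the squared semipartial correlation of a new predictor as the increase in R^2 when it is added to a model that already holds one other predictor. It assumes the two-predictor case only. The function and variable names are hypothetical and the data are made up.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

def squared_semipartial(y, x_new, x_old):
    """Increase in R^2 when x_new joins a model that already contains x_old."""
    r_yn, r_yo, r_no = corr(y, x_new), corr(y, x_old), corr(x_new, x_old)
    r2_full = (r_yn ** 2 + r_yo ** 2 - 2 * r_yn * r_yo * r_no) / (1 - r_no ** 2)
    r2_reduced = r_yo ** 2  # with one predictor, R^2 is just r^2
    return r2_full - r2_reduced

# Illustrative data only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1, 3, 2, 5, 4, 6]
sr2_x1 = squared_semipartial(y, x_new=x1, x_old=x2)
print(sr2_x1)
```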
Regression is a method of finding an equation to describe…
The line of best fit for a set of data
How to define “best fitting” line when there are so many possibilities?
A line that is best fit for the actual data minimizes prediction errors
Error of prediction is…
the vertical distance of each point from the regression line (Y - Ŷ)
Least-squared-error solution
Procedure that produces the line that minimizes the sum of squared errors of prediction
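As a rough check on what "minimizes" means here, the sketch below fits the least-squares line to made-up data and then nudges the intercept and slope. Every nudged line should have a larger sum of squared errors. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b1 * mx, b1

def sse(b0, b1, x, y):
    """Sum of squared errors of prediction for the line Y = b0 + b1 * X."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
best = sse(b0, b1, x, y)

# Try eight nearby lines; none should beat the least-squares line.
nudged = [
    sse(b0 + d0, b1 + d1, x, y)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
    if (d0, d1) != (0.0, 0.0)
]
print(best, min(nudged))
```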
Linear model with several predictors
The linear model can be expanded to include as many predictors as you like
Expanded formula:
Yi = (b0 + b1 X1i + b2 X2i) + ei
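The expanded formula can be read as "model part plus error". The sketch below computes the model part for any number of predictors and then the error for one observation. The coefficients and the observation are hypothetical values chosen for easy arithmetic.

```python
def predict(b0, slopes, predictors):
    """Model part of the equation: b0 + b1 * X1 + b2 * X2 + ..."""
    return b0 + sum(b * x for b, x in zip(slopes, predictors))

# Hypothetical coefficients and one hypothetical observation i.
b0, b1, b2 = 1.0, 2.0, 0.5
x1_i, x2_i, y_i = 3.0, 4.0, 10.0

y_hat_i = predict(b0, [b1, b2], [x1_i, x2_i])  # 1.0 + 2.0 * 3.0 + 0.5 * 4.0
e_i = y_i - y_hat_i                            # error: observed minus predicted
print(y_hat_i, e_i)
```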
r can be thought of as a standardized version of…
b (slope)
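One way to see this: in simple regression, converting X and Y to z-scores before fitting gives a slope equal to r. A minimal sketch with made-up data and hypothetical helper names:

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope of Y on X."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)

def z_scores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    m = mean(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_raw = slope(x, y)                                # slope in the original units
b_standardized = slope(z_scores(x), z_scores(y))   # slope on z-scores, equal to r
print(b_raw, b_standardized)
```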
Model Estimation
Total
Residual Sums of Squares (SSR)
Gauge of how well a particular line fits the data
Limitation of the Residual Sums of Squares (SSR)
Tells us how much error there is in the model but not whether it is a better fit than nothing
Need to compare our model against a baseline model
Mean is a model of no relationship
Sums of Squares Total (SST)
The differences between observed values and the values predicted by the mean
Sums of Squares Model (SSM)
The difference between SST and SSR
SSY and SST notation is
Sums of Squares total
dfy = n-1
Consists of adding
Sums of squares Regression
df regression = 1
and
Sums of Squares residual
df residual = n-2
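The additive structure can be checked numerically. The sketch below, for simple regression with one predictor, computes the total, model (regression), and residual sums of squares on made-up data, along with their degrees of freedom. Plain names are used instead of abbreviations to avoid confusing the two "SSR" readings. The function name is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sums_of_squares(x, y):
    """Return (ss_total, ss_model, ss_residual) for the least-squares line."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    predicted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)                         # observed vs. mean
    ss_model = sum((yh - my) ** 2 for yh in predicted)                 # predicted vs. mean
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, predicted))  # observed vs. predicted
    return ss_total, ss_model, ss_residual

# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_total, ss_model, ss_residual = sums_of_squares(x, y)

n = len(y)
df_total, df_model, df_residual = n - 1, 1, n - 2  # one predictor
print(ss_total, ss_model, ss_residual)
```

Both the sums of squares and the degrees of freedom add up: total = model + residual.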
If SSM is large, the regression model…
is very different from using the mean to predict the outcome variable
This implies that the regression model has made a big improvement to how well the outcome variable can be predicted. If it is small, then using the regression model is little better than using the mean
Variance explained by the regression model (R^2) formula
Mean Squares Regression Formula
Mean Squares Residual Formula
F Statistic Formula
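The formulas for the four cards above are not shown in this deck. The sketch below assumes the usual definitions: R^2 = SS model / SS total, each mean square is a sum of squares divided by its degrees of freedom, and F is the model mean square over the residual mean square. The function name, the parameter `k` (number of predictors), and the input values are illustrative.

```python
def regression_anova(ss_model, ss_residual, n, k=1):
    """Return (r_squared, ms_model, ms_residual, f_stat) for k predictors and n cases."""
    ss_total = ss_model + ss_residual
    r_squared = ss_model / ss_total            # proportion of variance explained
    ms_model = ss_model / k                    # df for the model = k
    ms_residual = ss_residual / (n - k - 1)    # df for the residual = n - k - 1
    f_stat = ms_model / ms_residual
    return r_squared, ms_model, ms_residual, f_stat

# Illustrative values: SS model = 3.6, SS residual = 2.4, n = 5, one predictor.
r_squared, ms_model, ms_residual, f_stat = regression_anova(3.6, 2.4, n=5)
print(r_squared, ms_model, ms_residual, f_stat)
```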
Assessing individual predictors
Bivariate observations variable measurement scale
Interval
Different notation for sample and population regression statistics
A test of (rho=0)
If rho = 0, then the sampling distribution of r is approximately normal with an expected value of rho and an estimated standard error (Sr), given below (where n is the number of bivariate ‘pairs’ of observations)…
To test the hypothesis about rho (formula)
df = n - 2
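The standard error and test formulas are missing from these cards. A commonly used version, assumed here, is Sr = sqrt((1 - r^2) / (n - 2)) and t = r / Sr with n - 2 degrees of freedom. The function name and the input values are hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """Return (t, df) for testing rho = 0 from a sample correlation r and n pairs."""
    s_r = sqrt((1 - r ** 2) / (n - 2))  # estimated standard error of r
    return r / s_r, n - 2

# Hypothetical sample: r = 0.5 from n = 27 pairs.
t_stat, df = t_for_r(0.5, 27)
print(t_stat, df)
```

The resulting t is compared against a t distribution with `df` degrees of freedom.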
Example of
Confidence Interval on r (Formula)
Confidence Interval on r (example)
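The confidence interval formula and example are not shown on these cards. One common approach, which may differ from the one the cards intended, is Fisher's z transformation: convert r to z = atanh(r), build the interval with standard error 1 / sqrt(n - 3), and convert the limits back with tanh. The function name and inputs below are hypothetical.

```python
from math import atanh, sqrt, tanh

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via Fisher's z (1.96 gives about 95%)."""
    z = atanh(r)             # Fisher's z transformation of r
    se = 1 / sqrt(n - 3)     # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

# Hypothetical sample: r = 0.5 from n = 28 pairs.
lower, upper = fisher_ci(0.5, 28)
print(lower, upper)
```

The interval is not symmetric around r, because the back-transformation compresses values near ±1.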