Study Session 3 - Correlation and Regression Flashcards
Formula for sample covariance
cov(X,Y) = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / (n - 1)
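A minimal Python sketch of the sample covariance formula, assuming two plain lists of equal length. The function name `sample_covariance` and the five-point data set are illustrative, not from the source.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))  # 1.5
```

Python 3.10+ also provides `statistics.covariance`, which uses the same n - 1 denominator.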
Formula for correlation
r = cov(X,Y) / (s_X × s_Y), where s_X and s_Y are the sample standard deviations of X and Y. r ranges from -1 to +1.
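A sketch of the correlation formula in Python, computing the sample covariance and both sample standard deviations with the n - 1 denominator. The function name and the data are hypothetical.

```python
import math


def correlation(x, y):
    """Sample correlation r = cov(X,Y) / (s_X * s_Y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov_xy / (s_x * s_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```

Swapping x and y gives the same r, and because covariance is divided by both standard deviations, r has no units.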
What are the three limitations of correlation analysis?
Impact of outliers, spurious correlation, and nonlinear relationships
What is an outlier?
An observation that is extraordinarily large or small relative to the rest of the data. Outliers can distort the computed correlation, suggesting a relationship where none exists or hiding one that does.
What is spurious correlation?
The appearance of a causal linear relationship when, in fact, there is none; for example, two unrelated variables that are highly correlated purely by chance.
What is a nonlinear relationship in correlation analysis?
Correlation measures only linear relationships. Two variables can have a strong nonlinear relationship (e.g., a parabola) and still show a correlation near zero.
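To illustrate the nonlinear limitation, here is a small sketch on hypothetical data where Y is completely determined by X through a parabola, Y = X², yet the sample correlation is exactly zero because the relationship is symmetric rather than linear.

```python
import math


def correlation(x, y):
    """Pearson sample correlation; the n - 1 terms cancel, so plain sums suffice."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # a perfect parabolic relationship
print(correlation(x, y))  # 0.0
```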
What is the hypothesis test for testing correlation?
H₀: ρ = 0 versus Hₐ: ρ ≠ 0
Tests whether the population correlation between the two variables is equal to zero.
Formula for the test statistic t in the correlation hypothesis test
t = r√(n - 2) / √(1 - r²)
with n - 2 degrees of freedom, where r is the sample correlation.
Reject H₀ if t > +t_c or t < -t_c, where t_c is the two-tailed critical value; otherwise, fail to reject H₀.
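A sketch of the correlation t-test. Python's standard library has no t-distribution, so this assumes the critical value is looked up in a t-table and passed in; 3.182 is the two-tailed 5% value for 3 degrees of freedom. The inputs r = 0.7746 and n = 5 are hypothetical.

```python
import math


def correlation_t_stat(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def reject_h0(t_stat, t_critical):
    """Two-tailed decision: reject if t falls outside [-t_critical, +t_critical]."""
    return abs(t_stat) > t_critical


# Hypothetical example: r = 0.7746 from n = 5 observations.
t = correlation_t_stat(0.7746, 5)
# 3.182 = two-tailed 5% critical t for 3 degrees of freedom (t-table value).
print(round(t, 3), reject_h0(t, 3.182))  # 2.121 False
```

Here t ≈ 2.121 lies inside ±3.182, so H₀: ρ = 0 is not rejected at the 5% level.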
What is the purpose of a simple linear regression?
To explain the variation in a dependent variable in terms of the variation in a single independent variable. The dependent variable is the explained (predicted) variable; the independent variable is the explanatory variable (the predictor).
What are the underlying assumptions of linear regression?
- A linear relationship exists between the dependent and independent variables.
- The independent variable is uncorrelated with the residuals.
- The expected value of the residual term is zero: E(ε) = 0.
- The variance of the residual term is constant for all observations (homoskedasticity).
- The residual terms are independently distributed; the residual for one observation is not correlated with that of another.
- The residual term is normally distributed.
What is the regression model formula?
Yᵢ = b₀ + b₁Xᵢ + εᵢ, i = 1, …, n
Yᵢ = ith observation of the dependent variable
Xᵢ = ith observation of the independent variable
b₀ = regression intercept term
b₁ = regression slope coefficient
εᵢ = residual (error term) for the ith observation
The regression line is the line through the scatter plot that "best" explains the values of Y in terms of X (the line of "best fit").
What is the estimated regression equation?
Ŷᵢ = b̂₀ + b̂₁Xᵢ, i = 1, …, n
Same form as the regression model, but Ŷ, b̂₀, and b̂₁ are ESTIMATED values, and there is no residual term.
What is Sum of Squared Errors (SSE)?
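The sum of the squared vertical distances between the actual and estimated Y-values: SSE = Σ(Yᵢ - Ŷᵢ)². The estimated regression line is the one that minimizes SSE, which is why the method is called ordinary least squares (OLS).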
Formula for the estimated slope coefficient
Describes the change in Y for a one-unit change in X.
b̂₁ = cov(X,Y) / s_X²
The slope is estimated as the sample covariance of X and Y divided by the sample variance of X.
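A minimal sketch of the slope estimate as the sample covariance divided by the sample variance of X, on hypothetical data; `slope` is an illustrative name.

```python
def slope(x, y):
    """Estimated slope b1_hat = cov(X,Y) / var(X), both with n - 1 denominators."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - x_bar) ** 2 for a in x) / (n - 1)
    return cov_xy / var_x


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(slope(x, y))  # 0.6
```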
What is the intercept term, and what is its formula?
The value of Y where the regression line crosses the Y-axis (X = 0).
b̂₀ = Ȳ - b̂₁X̄
where b̂₀ and b̂₁ are ESTIMATES and X̄ and Ȳ are the sample MEANS.
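A sketch that estimates both coefficients, assuming ordinary least squares on hypothetical data. It uses the raw sums Σ(Xᵢ - X̄)(Yᵢ - Ȳ) and Σ(Xᵢ - X̄)² because the n - 1 terms in the covariance and the variance cancel. `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Return (b0_hat, b1_hat) for the least-squares line Y = b0 + b1 X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx            # slope (the n - 1 factors cancel)
    b0 = y_bar - b1 * x_bar   # intercept from the sample means
    return b0, b1


b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

Because b̂₀ = Ȳ - b̂₁X̄, the fitted line always passes through the point of means (X̄, Ȳ).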
What is the standard error of estimate (SEE), and what does it measure?
SEE measures the variability of the actual Y-values around the estimated Y-values. It is the standard deviation of the residuals: SEE = √(SSE / (n - 2)). The smaller the SEE, the better the fit.
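A sketch computing SEE from the residuals of a least-squares fit, dividing SSE by n - 2. The function name and data are hypothetical.

```python
import math


def standard_error_of_estimate(x, y):
    """SEE = sqrt(SSE / (n - 2)) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # SSE: sum of squared residuals (actual minus fitted)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return math.sqrt(sse / (n - 2))


print(round(standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.8944
```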
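What is the coefficient of determination?
R² = RSS / SST = 1 - SSE / SST: the percentage of the total variation in the dependent variable that is explained by the independent variable. The higher the R², the more of the variation in Y is explained by X. In simple linear regression, R² equals the squared correlation, r².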
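A sketch of R² computed as 1 - SSE / SST on hypothetical data. For this data set the sample correlation is about 0.7746, so R² should equal r² = 0.6, as expected in simple linear regression.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE / SST for a simple least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((b - y_bar) ** 2 for b in y)                     # total variation
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 4))  # 0.6 (equals r squared, with r about 0.7746)
```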
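What is the common hypothesis test for a regression coefficient?
To test whether the slope coefficient is different from zero.
H₀: b₁ = 0 versus Hₐ: b₁ ≠ 0
Confidence interval: b̂₁ ± t_c × s(b̂₁), where t_c is the two-tailed critical t-value with n - 2 degrees of freedom and s(b̂₁) is the standard error of the slope coefficient. Reject H₀ if the interval does not include 0.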
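A sketch of the slope confidence interval. It assumes the standard formula for the standard error of the slope, s(b̂₁) = SEE / √Σ(Xᵢ - X̄)², which is not stated on the card, and takes the critical t-value as an input because Python's standard library has no t-distribution (3.182 is the two-tailed 5% value for 3 degrees of freedom). Data and names are hypothetical.

```python
import math


def slope_confidence_interval(x, y, t_critical):
    """Return (lower, upper) = b1_hat -/+ t_critical * s_b1, n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    see = math.sqrt(sse / (n - 2))
    s_b1 = see / math.sqrt(sxx)  # assumed standard error of the slope coefficient
    return b1 - t_critical * s_b1, b1 + t_critical * s_b1


# 3.182 = two-tailed 5% critical t for n - 2 = 3 degrees of freedom (t-table value).
low, high = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
print(round(low, 3), round(high, 3))  # -0.3 1.5
```

The interval (about -0.3 to 1.5) contains 0, so H₀: b₁ = 0 is not rejected at the 5% level.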
Explain the hypothesis test for the true slope coefficient.
A t-test can be used to test whether the true slope coefficient b₁ is statistically different from a hypothesized value,
with n - 2 degrees of freedom:
t = (b̂₁ - b₁) / s(b̂₁), where b̂₁ is the estimate, b₁ is the hypothesized value, and s(b̂₁) is the standard error of the slope estimate.
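A sketch of the slope t-statistic, assuming the standard formula s(b̂₁) = SEE / √Σ(Xᵢ - X̄)² for the standard error of the slope. The parameter `hypothesized_b1` (a hypothetical name) defaults to 0, the common test; the data are hypothetical.

```python
import math


def slope_t_stat(x, y, hypothesized_b1=0.0):
    """t = (b1_hat - b1) / s_b1, with n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1_hat = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((b - (b0_hat + b1_hat * a)) ** 2 for a, b in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of b1_hat
    return (b1_hat - hypothesized_b1) / s_b1


print(round(slope_t_stat([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 2.121
```

In simple linear regression, testing H₀: b₁ = 0 gives the same t-statistic as the correlation test of H₀: ρ = 0 on the same data.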
What is the formula for PREDICTED VALUES?
Ŷ = b̂₀ + b̂₁Xp
Ŷ = predicted value of the dependent variable
Xp = FORECASTED value of the independent variable
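A one-line sketch of the prediction formula. The coefficient estimates 2.2 and 0.6 and the forecast Xp = 6 are hypothetical inputs, not values from the source; `predict` is an illustrative name.

```python
def predict(b0_hat, b1_hat, x_p):
    """Predicted Y for a forecasted value x_p of the independent variable."""
    return b0_hat + b1_hat * x_p


# Hypothetical estimates b0_hat = 2.2, b1_hat = 0.6 and forecast X_p = 6.
print(round(predict(2.2, 0.6, 6), 4))  # 5.8
```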
What is ANOVA?
Analysis of variance (ANOVA) analyzes the total variability of the dependent variable by dividing it into explained and unexplained components.
What is Total Sum of Squares (SST)?
SST is the total variation in the dependent variable.
SST = RSS + SSE
SST = explained variation + unexplained variation
SST = Σ(Yᵢ - Ȳ)²
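A sketch that splits total variation into explained and unexplained parts for a least-squares fit on hypothetical data, so the identity SST = RSS + SSE can be checked numerically. `sum_of_squares` is an illustrative name.

```python
def sum_of_squares(x, y):
    """Return (SST, RSS, SSE) for a simple least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    sst = sum((b - y_bar) ** 2 for b in y)             # total variation
    rss = sum((f - y_bar) ** 2 for f in y_hat)         # explained variation
    sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))  # unexplained variation
    return sst, rss, sse


sst, rss, sse = sum_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sst, 4), round(rss, 4), round(sse, 4))  # 6.0 3.6 2.4
```

Here 3.6 + 2.4 = 6.0, and RSS / SST = 0.6 is the R² for this data.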
What is the regression sum of squares (RSS)?
Measures the variation in the dependent variable that is explained by the independent variable: the sum of the squared distances between the predicted Y-values and the mean of Y, RSS = Σ(Ŷᵢ - Ȳ)².
What is the mean regression sum of squares (MSR)?
MSR = RSS / k
The regression sum of squares divided by its degrees of freedom, k, the number of independent variables (k = 1 in simple linear regression).
What is the mean squared error (MSE)?
MSE = SSE / (n - 2) in simple linear regression; in general, MSE = SSE / (n - k - 1).
How is F calculated? What is it?
F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1))
Measures how well the independent variables, as a group, explain the variation in the dependent variable.
Always a one-tailed test, with k and n - k - 1 degrees of freedom.
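A sketch of the ANOVA quantities and the F-statistic for a simple regression (k = 1) on hypothetical data. The function name and dictionary keys are illustrative. The resulting F would be compared with a one-tailed critical value from an F-table with (k, n - k - 1) degrees of freedom.

```python
def anova_table(x, y):
    """MSR, MSE, F, and degrees of freedom for a simple linear regression (k = 1)."""
    n, k = len(x), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    rss = sum((f - y_bar) ** 2 for f in y_hat)         # explained
    sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))  # unexplained
    msr = rss / k
    mse = sse / (n - k - 1)
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "df": (k, n - k - 1)}


result = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(result["F"], 3), result["df"])  # 4.5 (1, 3)
```

With one independent variable, F = 4.5 equals the square of the slope t-statistic (2.121² ≈ 4.5).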