Regression Analysis, Time Series Analysis Flashcards
How is the correlation coefficient notated?
If measured for a population, called ρ (rho)
If estimated from a sample, called r; i.e. r estimates ρ
What is true of the correlation coefficient?
- Correlation and covariance are appropriate for use with continuous variables whose distributions have the same shape (e.g. both normally distributed)
- If these assumptions are not met, r will be ‘deflated’ and will underestimate ρ.
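To make the relationship between r and ρ concrete, here is a minimal sketch of computing the sample correlation r from paired data. The function name `sample_correlation` and the five data points are illustrative assumptions, not part of the original cards.

```python
import math

def sample_correlation(x, y):
    """Sample correlation r: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(x, y)  # r is the sample estimate of rho
print(round(r, 4))
```

The value of r always lies between -1 and 1; data lying exactly on an upward-sloping line give r = 1.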
What are the data in a regression analysis?
- One continuous response variable (called y - dependent variable, response variable)
- One or more continuous explanatory variables (called x - independent variable, explanatory variable, predictor variable, regressor variable)
What is true of εi?
Mean of εi will be zero
What does E(Y|X=x) = β0 + β1x mean?
The expected value of Y when X = x is β0 + β1x
What is the true/population regression line?
- Yi = β0 + β1xi + εi
- β0 and β1 are constants to be estimated
- εi is a random variable with mean = 0 if our line is going through the middle of our data
How is the population regression line estimated?
- ŷ = b0 + b1x
- b0 and b1 are estimated values
- ŷ is a fitted value of the response
What is a residual?
vertical distance between observed response and fitted value of response
How are residuals estimated?
ri estimates εi, the error variable
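The fitted line and its residuals can be sketched in a few lines. This assumes b0 and b1 are the ordinary least-squares estimates (the cards do not name the estimation method), and the helper `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for y-hat = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                # estimated slope
    b0 = mean_y - b1 * mean_x     # estimated intercept
    return b0, b1

# Hypothetical sample (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # y-hat for each observation
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # r_i = y_i - y-hat_i
print(b0, b1, residuals)
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding), which mirrors the assumption that the errors εi have mean 0.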
What is SSE?
The error sum of squares
SSE = Σ_{i=1}^{n} (yi - ŷi)^2
What error assumptions do we make in regression analysis?
- In our fitting we assume the errors have a particular distribution - that is, ε ~ N(0, σε^2)
- Normal distribution
- Mean = 0
- Constant variance = σε^2
- Errors associated with any two y values are independent
What is sε?
- sε = standard error of the estimate
- Interpretation - the standard deviation of the residuals, i.e. the standard error in predicting Y from the regression equation
- Best definition - the standard deviation around the prediction line
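As a rough illustration, SSE and sε can be computed together. The sketch assumes simple linear regression fitted by least squares, so the divisor is n - 2 (two estimated coefficients); the function name `sse_and_s_eps` and the data are hypothetical.

```python
import math

def sse_and_s_eps(x, y):
    """Return (SSE, s_eps) for a least-squares line fitted to x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Error sum of squares: squared vertical distances from the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the estimate, using n - 2 degrees of freedom
    s_eps = math.sqrt(sse / (n - 2))
    return sse, s_eps

# Hypothetical sample (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
sse, s_eps = sse_and_s_eps(x, y)
print(round(sse, 4), round(s_eps, 4))
```

A smaller sε means the observations sit closer to the fitted line, in the units of y.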
What are the t stats in regression analysis output?
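T = test statistic for the null hypothesis that the population intercept/slope = 0, against a two-sided alternative. It is compared to the t distribution with n-2 degrees of freedom; a P-value reported as 0 (i.e. very small) is strong evidence that the intercept/slope is not 0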
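A minimal sketch of the slope's t statistic, assuming simple linear regression and the usual standard error formula SE(b1) = sε / sqrt(Sxx). The function name `slope_t_statistic` and the data are hypothetical, and the P-value itself is left to a t table or statistics package, since the Python standard library has no t distribution.

```python
import math

def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: population slope = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s_eps / math.sqrt(sxx)     # standard error of the slope
    return b1 / se_b1, n - 2           # t statistic and its degrees of freedom

# Hypothetical sample (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
t_stat, df = slope_t_statistic(x, y)
print(round(t_stat, 4), df)
```

The further |t| lies from 0 relative to the t distribution with n - 2 degrees of freedom, the smaller the two-sided P-value.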
What is S in regression analysis output?
Standard Error of the Regression (S) = approximately the average distance that observed values fall from the regression line
What is R^2?
- Measures the strength of the association (significance is judged from the test statistics, not from R^2 itself)
- coefficient of determination
- measures the proportion of total variation explained, i.e. R^2 = explained variation / total variation = SSreg / SSy
- in simple linear regression, R^2 = (correlation coefficient)^2
- Will be between 0 and 1; a value close to 1 indicates most of the variation in y is explained by the regression equation
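The identity R^2 = SSreg / SSy = r^2 can be checked numerically. This is a sketch for simple linear regression fitted by least squares; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y):
    """Return (R^2, r^2) so the two can be compared for a fitted line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)      # total variation
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Explained variation: spread of the fitted values about the mean of y
    ss_reg = sum((b0 + b1 * xi - mean_y) ** 2 for xi in x)
    r_sq_from_corr = sxy ** 2 / (sxx * ss_y)        # squared correlation
    return ss_reg / ss_y, r_sq_from_corr

# Hypothetical sample (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r2, r2_corr = r_squared(x, y)
print(round(r2, 4), round(r2_corr, 4))
```

Both values agree for this sample, which is the relationship the card states between R^2 and the correlation coefficient.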