Correlation And Regression Flashcards
Regression line
Y = b0 + b1x
- b1 = slope
- b0 = intercept
- the line predicts the average value of y for a given x
- the slope gives the change in the predicted y as x goes up by one unit
- if the scatterplot is curved, a straight regression line should not be used for predictions
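A minimal sketch (Python with NumPy, using made-up illustrative data) of fitting the line and predicting within the observed x range:

```python
import numpy as np

# illustrative data (assumed for the example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)   # slope, intercept of the least-squares line
print(f"y-hat = {b0:.2f} + {b1:.2f}x")

# prediction at x = 3.5 (inside the observed x range, so not extrapolation)
print(b0 + b1 * 3.5)
```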
Correlations
- between +1 and -1
- strength and directionality
- correlation r = cov(x, y) / (SD(x) × SD(y))
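A quick check of the formula above with the same illustrative data; the hand-computed r matches NumPy's built-in correlation:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# r = cov(x, y) / (SD(x) * SD(y)); use the sample versions of both
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(r)                          # between -1 and +1
print(np.corrcoef(x, y)[0, 1])    # same value from NumPy's built-in
```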
Coefficient of determination
- R^2
- strength only
- explained variance: the proportion of variation in y explained by the model; shows how well the data fit the regression line
- value ranges from 0 to 1; values near 1 (e.g., 0.9) indicate a good fit
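A sketch computing R² as explained variation over total variation (same assumed data); in simple linear regression R² equals r²:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)        # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
r_squared = 1 - ss_res / ss_tot          # explained share, between 0 and 1
print(r_squared)
```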
Regression line and correlations
- r and the slope always have the same sign.
Least squares regression line
- minimizes the sum of squares of the residuals
- residual = observed value - predicted value
- always passes through the mean of x and y
- if r = 0, the slope of the LSR line is 0 (a horizontal line at the mean of y)
- the mean of the residuals is always 0
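A small sketch verifying the two properties above: the residuals sum to (essentially) 0, and the fitted line passes through (mean of x, mean of y):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)            # observed - predicted

print(np.isclose(residuals.sum(), 0))              # residuals (and their mean) are ~0
print(np.isclose(y.mean(), b0 + b1 * x.mean()))    # line passes through (x-bar, y-bar)
```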
Influential points
- outliers (often extreme in x) whose removal noticeably changes the fitted regression line
- lurking variables: neither explanatory nor response but influences interpretation.
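A hedged illustration of an influential point: refitting without a single extreme observation (made-up data) changes the slope substantially:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])   # last point is extreme in x
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 5.0])

slope_all, _ = np.polyfit(x, y, 1)               # fit with the extreme point
slope_without, _ = np.polyfit(x[:-1], y[:-1], 1) # fit without it
print(slope_all, slope_without)                  # the single point pulls the slope sharply
```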
Cause and effect relationship
- can only be established through experimental designs, not observational studies.
SPSS output
- Constant = intercept
- BTU (name of variable) = slope
- interpretation: for every one-unit increase in x, the predicted y changes by the slope (the coefficient on the named variable).
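SPSS itself can't be run here, but a hedged Python/statsmodels sketch produces an analogous coefficients table, where the `const` row is the intercept and the named-variable row is the slope:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # stands in for the BTU variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())   # "const" row = intercept, "x1" row = slope
```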
Distributions
- joint distribution: dividing the count in each cell by the total number of all observations
- marginal distribution: row total (or column total) / grand total
- conditional distribution: cell count / total of the group being conditioned on (e.g., cell / column total)
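A sketch with an assumed 2×2 table of counts showing the three distributions:

```python
import numpy as np

# two-way table of counts (rows = one variable, columns = the other; assumed data)
counts = np.array([[20, 30],
                   [10, 40]])
total = counts.sum()

joint = counts / total                              # each cell / grand total
marginal_rows = counts.sum(axis=1) / total          # row totals / grand total
marginal_cols = counts.sum(axis=0) / total          # column totals / grand total
conditional_on_cols = counts / counts.sum(axis=0)   # each cell / its column total
print(joint, marginal_rows, marginal_cols, conditional_on_cols, sep="\n")
```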
T-test for the slope
- degrees of freedom: n - p - 1 (p = # of predictors)
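A sketch of the slope t-test for simple regression (one predictor, so df = n − 1 − 1); the hand-computed p-value matches `scipy.stats.linregress`:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
n = len(x)
t = res.slope / res.stderr            # test statistic for H0: slope = 0
df = n - 1 - 1                        # n - p - 1 with p = 1 predictor
p_value = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
print(t, df, p_value, res.pvalue)     # res.pvalue matches the hand computation
```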
Multiple regression line
Y = b0 + b1x1 + b2x2 + … + bpxp + ei
- for simple regression (one predictor): b0 = mean(y) − b1 × mean(x)
- b1 = r(sy/sx)
- the ei sum to 0 (residuals / vertical deviations from the least-squares line)
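A sketch (assumed data, two predictors) fitting a multiple regression by least squares; adding a column of 1s gives the intercept b0, and the residuals ei sum to roughly 0:

```python
import numpy as np

# assumed data: n = 6 observations, two predictors
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.0, 2.5, 6.8, 6.1, 10.2, 9.7])

# column of 1s so the first coefficient is the intercept b0
X_design = np.column_stack([np.ones(len(y)), X])
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coefs

residuals = y - X_design @ coefs          # the ei terms
print(b0, b1, b2)
print(np.isclose(residuals.sum(), 0))     # residuals sum to ~0
```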
Confidence interval
- the band is narrowest at the mean of x and widens toward the extremes
- check whether the critical value puts 5% in each tail (90% CI) or 5% total, i.e., 2.5% per tail (95% CI)
- if 0 is in the confidence interval => cannot reject h0
- if 0 is not in the confidence interval => reject h0; the coefficient (slope or intercept) is significantly different from 0.
- for 99% = p-value needs to be less than 0.01 to reject the null
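A sketch of a 95% confidence interval for the slope (2.5% in each tail) using the same assumed data; if 0 falls outside the interval, reject h0: slope = 0:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
df = len(x) - 2                            # n - p - 1 with one predictor

t_star = stats.t.ppf(0.975, df)            # 95% CI: 2.5% in each tail
lower = res.slope - t_star * res.stderr
upper = res.slope + t_star * res.stderr
print(lower, upper)                        # 0 outside this interval => reject h0
```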
Variables
Explanatory variables influence the outcome (response variables)
Regression coefficients
- estimates of the unknown population parameters; they describe the relationship between each predictor and the response.
- coefficients are the values that multiply the predictor values.
F test
- MSM is the same as MSR on the formula sheet
- 2 degrees of freedom: p for the numerator (model) and n − p − 1 for the denominator (error), where p is the number of predictors and n the number of observations
- critical value for F: look it up using the two df values and the alpha level.
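A sketch with assumed sums of squares showing the F statistic, its two degrees of freedom, and the critical-value lookup:

```python
from scipy import stats

# assumed values for illustration only
n, p = 30, 2                 # observations, predictors
ssm, sse = 180.0, 60.0       # model and error sums of squares

msm = ssm / p                # MSM (a.k.a. MSR)
mse = sse / (n - p - 1)      # MSE
F = msm / mse

f_crit = stats.f.ppf(0.95, dfn=p, dfd=n - p - 1)   # critical value at alpha = 0.05
print(F, f_crit, F > f_crit)                       # F beyond the critical value => reject h0
```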
Residuals
Observed value - predicted value
- positive = values are being underpredicted.
- negative = values are being overpredicted
- residuals are positive when the observed point lies above the regression line and negative when it lies below.
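A small sketch of the sign convention: a positive residual means the point sits above the line (underpredicted), a negative residual means it sits below (overpredicted):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)   # observed - predicted
for xi, ri in zip(x, residuals):
    print(xi, "underpredicted" if ri > 0 else "overpredicted")
```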
Extrapolation
Making predictions outside the range of the observed x values (unreliable).
Correlation vs causation
- causation: x directly influences y
- common response: x and y are both influenced by a lurking variable z; x does not cause y
- confounding: x influences y, but a lurking variable z also influences y and is entangled with x, so their effects cannot be separated.