Mod 13, Correlation/Regression Flashcards
Regression:
logical extension of correlation; moves beyond describing the strength of the association by making more specific predictions based on that association (for any given value of x, we can predict the value of y)
A significant correlation tells us that there is a significant positive or negative relationship between two variables; REGRESSION allows us to predict one variable from another variable
After verifying that a correlation is significant, you can determine the equation of the line that best fits the data
Regression Line: “line of best fit” can be used to predict the value of y for a given value of x
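A minimal sketch of fitting a line of best fit and using it to predict y for a given x, using only the standard library; the data points are made up for illustration:

```python
from statistics import mean

x = [1, 2, 3, 4, 5]             # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical outcome values

mx, my = mean(x), mean(y)
# least-squares slope: b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx                 # intercept: line passes through (mx, my)

def predict(x_new):
    """Predict y for a given x using the fitted regression line."""
    return a + b * x_new
```

Once `a` and `b` are fitted, `predict` can be called for any x, including values not in the original data.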
Correlation Coefficient:
quantifies the strength and direction of an association between two variables
Line of Best Fit
The closer the dots are to the line of best fit, the STRONGER THE RELATIONSHIP
Straight line drawn through the center of the data points that best expresses the association between the two variables
Conditions for Pearson Correlation
Random sampling
Continuous interval or ratio data
Normally distributed variables
No outliers
Linear association between variables (cannot detect non-linear associations)
Pearson coefficients and standardized comparisons
Because Pearson coefficients possess a common metric (standard deviation units), they allow us to compare the strengths of relationships with one another (STANDARDIZES IT LIKE A Z SCORE)
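A sketch of the "standardized like a z score" idea: Pearson's r can be computed as the mean product of paired z-scores. The data are illustrative:

```python
from statistics import mean, pstdev

x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 7, 9]

def z_scores(values):
    # convert raw scores to standard-deviation units (z-scores)
    m, s = mean(values), pstdev(values)  # population SD for this form
    return [(v - m) / s for v in values]

# r = average of the products of paired z-scores
r = mean(zx * zy for zx, zy in zip(z_scores(x), z_scores(y)))
```

Because both variables are converted to the same metric first, r is unit-free and always falls between -1 and +1.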
Coefficient of Determination
how much of the variance in y we are actually able to explain; THE MORE VARIATION WE CAN EXPLAIN, THE STRONGER THE ASSOCIATION BETWEEN VARIABLES (fewer third-variable problems) and the closer the dots are to the line
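The arithmetic here is simple: the coefficient of determination is r squared, read as a proportion of explained variance. A hypothetical r is used below:

```python
r = 0.8              # hypothetical Pearson correlation
r_squared = r ** 2   # proportion of variance in y explained by x
# about 0.64: the predictor explains ~64% of the variance in y,
# leaving ~36% unexplained (third variables, measurement error, etc.)
```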
Rank Correlation conditions/ monotonicity
Used when variables are not an interval or ratio measurement, or if their association is not linear
Used when Pearson’s cannot be used
Spearman’s Correlation: instead of using raw data, uses ranked values
CONDITIONS: random sampling, both variables must be at least ordinal, variables must increase monotonically with one another
Monotonicity: refers to whether or not one set of scores tends to increase or decrease alongside another set
Linear associations ARE MONOTONIC (but not all monotonic associations are linear)
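A sketch of Spearman's correlation as described above: rank each variable, then apply the Pearson formula to the ranks. The example assumes no tied scores, and the curved-but-monotonic data are made up:

```python
from statistics import mean, pstdev

def ranks(values):
    """Rank scores from 1 (smallest) upward; no tie handling."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pearson(x, y):
    # standard Pearson formula: covariance over the product of SDs
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (sx * sy)

# monotonic but NON-linear: y always grows as x grows, along a curve
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # x squared

rho = pearson(ranks(x), ranks(y))
```

Pearson's r on the raw data would be below 1 because the association is curved, but Spearman's rho is a perfect 1.0 here because the ranks line up exactly.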
Cohen’s d effect sizes
0.2= small
0.5=medium
0.8=large
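A sketch of how Cohen's d is computed so the benchmarks above have something to attach to: the difference between two group means divided by a pooled standard deviation. The groups are hypothetical:

```python
from statistics import mean, stdev
from math import sqrt

group_a = [5, 6, 7, 8, 9]
group_b = [7, 8, 9, 10, 11]

na, nb = len(group_a), len(group_b)
# pooled SD: each group's sample variance weighted by its df
pooled_sd = sqrt(((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2))
d = (mean(group_b) - mean(group_a)) / pooled_sd
# d is about 1.26 here: well past the 0.8 "large" benchmark
```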
Residuals
the difference between observed values of y (dependent variable) and predicted values of y
Standard Error of Estimate
The standard deviation of the observed y values about the predicted y value for a given x value: a COMMON MEASURE OF THE ACCURACY OF PREDICTIONS. The closer the observed y values are to the predicted y values, the smaller the standard error of estimate will be
WANT IT TO BE AS SMALL AS POSSIBLE = LESS ERROR IS ALWAYS BETTER
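A sketch tying residuals and the standard error of estimate together: fit a line, take each observed-minus-predicted residual, and summarize their spread. Data are illustrative:

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# fit the least-squares line y_hat = a + b*x
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# residual = observed y minus predicted y
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
n = len(x)
see = sqrt(ss_res / (n - 2))   # smaller = more accurate predictions
```

Because these points sit tightly around the line, `see` comes out small (around 0.17 in the units of y); widely scattered points would inflate it.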
Difference between simple and multiple regression
Simple Regression: predicting values on one variable using info from one predictor variable
Multiple Regression: PREDICTING values on an outcome variable from values on MORE than one predictor variable
Beta weights:
weighting of each of the factors; how much the outcome changes with one unit change in that predictor
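A sketch of how the weights enter a multiple-regression prediction equation; the intercept and weights below are hypothetical, not fitted from data:

```python
b0 = 1.0           # intercept
b1, b2 = 0.5, 2.0  # weight per one-unit change in each predictor

def predict(x1, x2):
    """Outcome changes by b1 per unit of x1 and by b2 per unit of x2."""
    return b0 + b1 * x1 + b2 * x2

# Raising x1 by one unit (holding x2 constant) raises the
# prediction by exactly b1:
# predict(3, 4) - predict(2, 4) == 0.5
```

Note that each weight describes the change in the outcome holding the other predictors constant.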
Assumptions of Multiple Regressions
Linearity between each of the predictors and the outcome
Residuals are normally distributed about 0, with constant spread (homoscedasticity): the points aren't narrowing together or widening apart from one another down the line
No extreme multicollinearity: can't have predictors that are too highly correlated; we want our predictors to be explaining DIFFERENT aspects of our outcome, so their correlation SHOULDN'T BE ABOVE .8 OR BELOW -.8
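A sketch of a simple multicollinearity check: correlate two predictors and flag them if |r| exceeds the .8 rule of thumb from the flashcard above. The predictor values are made up (and deliberately redundant):

```python
from statistics import mean, pstdev

def pearson(x, y):
    # Pearson's r: covariance over the product of the SDs
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (sx * sy)

pred1 = [1, 2, 3, 4, 5]
pred2 = [2, 4, 6, 8, 10]   # exactly double pred1, so r is 1.0

r = pearson(pred1, pred2)
too_collinear = abs(r) > 0.8   # True: these predictors are redundant
```

In practice you would run this check on every pair of predictors before fitting the model, and drop or combine any pair that fails it.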