Lecture 20 (19) Flashcards
What is regression?
Using correlations between variables to make predictions
What is the formula for regression?
Y = a + b1X
What are you making a prediction about with regression?
The outcome (Y) is what we're making the prediction about. For example, predicting who is going to win an election (the percentage of votes the candidate will get).
What does the Y represent in the regression formula?
dependent variable, aka criterion variable
What does the x represent in the regression formula?
X = independent variable, aka predictor variable
What does a represent in the regression formula?
a = intercept, aka alpha. This indicates what Y is equal to when X equals zero.
In other words: when X = 0, what is Y?
What does b1 represent in the regression formula?
b1 = slope coefficient, aka beta weight
This tells you how much Y is expected to change when X increases by a certain amount. The larger the beta is, the steeper the relationship between X and Y.
If X increases by 1 unit, Y increases by b1 (beta) units.
What does it usually mean when you have a lot of variability around the line of best fit?
X and Y aren't closely related.
What was the midterm and SmartBook quizzes example for regression?
He wanted to see whether the number of SmartBook quizzes a student completed correlates with performance on the exam. Predictor variable: how many quizzes the student completed. Criterion variable: percentage scored on the exam. The intercept is the best prediction of how you will do on the exam if you complete none of the SmartBook activities. Then, for each activity you complete, you get on average a 1.2 percentage point increase in the score.
Midterm = Intercept + Slope*(# completed SmartBook)
Y = 60 + 1.2x
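The prediction rule above can be sketched in a few lines of Python. This is only an illustration using the intercept (60) and slope (1.2) from the card; the function name predict_midterm and the constant names are made up for the example.

```python
# Sketch of Y = a + b1*X using the numbers from the card.
# The names below are hypothetical, chosen only for this illustration.
INTERCEPT = 60.0  # a: predicted midterm % with zero SmartBook activities
SLOPE = 1.2       # b1: predicted change in midterm % per completed activity


def predict_midterm(completed):
    """Return the predicted midterm percentage for a number of completed activities."""
    return INTERCEPT + SLOPE * completed


# Print the prediction for a few example values of X.
for n in (0, 1, 10):
    print(n, predict_midterm(n))
```

With X = 0 the prediction is just the intercept, and each extra activity moves the prediction up by the slope.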
What is the effect size for regression?
What about partial R squared?
Effect size = R2, or total variance [in Y] explained
Partial R2 = variance explained by a single predictor
…given chosen regression model
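As a rough sketch of what "variance in Y explained" means, R2 can be computed as 1 minus the ratio of leftover (residual) variation to total variation in Y. The scores below are invented for illustration (they are not from the lecture) and were picked so that Y = 60 + 1.2x is their line of best fit; the function name r_squared is also made up.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


# Invented data for illustration only (NOT from the lecture).
quizzes = [0, 5, 10, 15, 20]
scores = [61.0, 64.0, 74.0, 76.0, 85.0]

# Predictions from the card's line Y = 60 + 1.2x.
predicted = [60 + 1.2 * x for x in quizzes]

print(round(r_squared(scores, predicted), 3))
```

An R2 near 1 means the points sit close to the line; an R2 near 0 means the line explains little of the spread in Y.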
What was the multiple regression example provided? How could you identify which predictor is the strongest?
Y = Midterm score
x1 = SmartBook
x2 = Article Review
x3 = Lab completion
The intercept is the predicted midterm exam score for a student who got a 0 on all of these predictors.
Y = 6.0 + 0.4x1 + 0.7x2 + 0.6x3
The model predicts that for every SmartBook activity you complete, you should expect a 0.4 increase on the exam, and so on for the other predictors.
Translation = getting more points on xj correlates with higher performance on midterm!
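The multiple regression formula works the same way as the single-predictor one, just with one beta per predictor. A minimal sketch, using the intercept and raw-unit weights from the card; the dictionary keys and function name are hypothetical labels for this example.

```python
# Raw-unit coefficients from the card: Y = 6.0 + 0.4*x1 + 0.7*x2 + 0.6*x3.
# The key names are hypothetical labels for x1, x2, x3.
INTERCEPT = 6.0
WEIGHTS = {
    "smartbook": 0.4,       # x1
    "article_review": 0.7,  # x2
    "lab_completion": 0.6,  # x3
}


def predict_midterm(points):
    """Return the predicted midterm score from a dict of predictor values."""
    return INTERCEPT + sum(WEIGHTS[name] * points[name] for name in WEIGHTS)


# A student with 0 on every predictor gets the intercept as the prediction.
print(predict_midterm({"smartbook": 0, "article_review": 0, "lab_completion": 0}))
```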
Y = 6.0 + 0.4x1 + 0.7x2 + 0.6x3
This formula uses raw units
To compare different predictors, we need to standardize
Y = 6.0 + 0.04x1 + 0.04x2 + 0.22x3
This formula uses standardized units
When each of these is standardized, we can compare them against each other.
Pearson's r is a standardized unit. Cohen's d is a standardized unit. When we standardize something, we are dividing that value by the estimate of the standard deviation.
You want to standardize to account for the variability of scores.
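A small sketch of what standardizing looks like in practice. It converts raw scores to z-scores (subtract the mean, divide by the sample standard deviation) and rescales a raw slope into a standardized one with the usual conversion b * (SD of X / SD of Y). The data and the function names are made up for illustration; this is not necessarily how the lecture's standardized weights were computed.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # the estimate of the standard deviation
    return [(v - m) / s for v in values]


def standardized_slope(raw_slope, sd_x, sd_y):
    """Rescale a raw-unit slope into standard-deviation units."""
    return raw_slope * sd_x / sd_y


# Invented raw scores for illustration only.
raw_scores = [2, 4, 4, 6, 9]
z_scores = standardize(raw_scores)
print([round(z, 2) for z in z_scores])
```

After standardizing, every variable has a mean of 0 and a standard deviation of 1, which is what makes the weights comparable across predictors measured on different scales.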
What is multiple regression?
To extend the example, we could include more predictors. There is no limit on how many predictors you can have. This then becomes multiple regression. Each predictor variable gets its own beta, and each will have a different relationship with the criterion variable. As we add more predictor variables, our prediction will improve. However, a lot of those correlations can be noise, and that's a problem: random variability in the predictor variables can correlate with random variability in the criterion variable. We want a tool to correct for this. The one he really wants us to know is R squared. With a single predictor, this is the square of Pearson's r (Pearson's r multiplied by itself). We use it because it's standardized, and the standardization allows us to compare this R squared with other R squareds. By standardized we mean it is scaled by the standard deviation. The higher the R squared, the better your predictions about a student's midterm score will be, and the more tightly the scores will cluster around the line of best fit.
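For the single-predictor case, the "Pearson's r multiplied by itself" idea can be checked by hand. The sketch below computes r from scratch and squares it; the data are invented for illustration (not from the lecture) and the function name pearson_r is made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data for illustration only (NOT from the lecture).
quizzes = [0, 5, 10, 15, 20]
scores = [61.0, 64.0, 74.0, 76.0, 85.0]

r = pearson_r(quizzes, scores)
r_sq = r * r  # share of the variance in scores accounted for by quizzes
print(round(r, 3), round(r_sq, 3))
```

Because r is always between -1 and 1, r squared is always between 0 and 1, which is what lets it be compared across different models and data sets.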