Regression and Correlation Flashcards
What is correlation?
Correlation quantifies the strength of the linear association between two quantitative variables
What is Pearson’s correlation coefficient a measure of?
The degree of scatter of the points about an underlying linear trend between two quantitative variables
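A minimal sketch of how Pearson's coefficient could be computed by hand. The function name pearson_r and the data are illustrative assumptions, not part of the cards. Both data sets share an upward trend; only the scatter about that trend differs.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
perfect = [2, 4, 6, 8, 10]    # points lie exactly on a line
scattered = [2, 5, 5, 9, 9]   # same upward trend, more scatter

r_perfect = pearson_r(x, perfect)      # at the top of the -1 to 1 scale
r_scattered = pearson_r(x, scattered)  # smaller, because of the scatter
```

The more the points scatter about the line, the further r moves from 1 (or -1) towards 0.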
What is linear regression?
It studies the linear relationship between two quantitative variables when one (dependent variable) is modelled as depending on the other (independent variable)
A linear regression model allows predictions about the dependent variable to be made among individuals. T/F?
True
Correlation allows predictions about the dependent variable to be made among individuals. T/F?
False
Correlation is quantified on a scale from -1 to 1. T/F?
True
What are the assumptions in calculating correlation?
Independent observations
Bivariate Normal distribution
Relationship between X and Y is linear
The values and units of measurement of the variables are unimportant for measuring correlation. T/F?
True
The values and units of measurement of the variables are unimportant in regression models. T/F?
False - they matter, because the regression coefficients are expressed in the units of the variables
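A small sketch of the contrast between the two cards above, using hypothetical height and weight data (the variable and function names are assumptions for illustration). Changing height from centimetres to metres leaves the correlation unchanged but rescales the regression slope.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data, for illustration only.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 75, 86]
height_m = [h / 100 for h in height_cm]

r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)   # same as r_cm: r has no units
slope_cm = slope(height_cm, weight_kg) # kg per cm
slope_m = slope(height_m, weight_kg)   # kg per m, 100 times larger
```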
Which variable is classified as X and which as Y is significant in correlation. T/F?
False
Which variable is classified as X and which as Y is significant in regression. T/F?
True - Y should be the dependent variable and X should be the independent variable
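A sketch, on hypothetical data, of why the choice of X and Y matters for regression but not for correlation. Swapping the two variables leaves r unchanged, while regressing Y on X and regressing X on Y give different lines. Function names are assumptions for illustration.

```python
import math

def sums(x, y):
    """Return (sxx, syy, sxy): sums of squares and cross-products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def pearson_r(x, y):
    sxx, syy, sxy = sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least squares slope when y is the dependent variable."""
    sxx, _, sxy = sums(x, y)
    return sxy / sxx

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 5, 5, 9, 9]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)        # identical: correlation is symmetric

slope_y_on_x = slope(x, y)    # change in y per unit of x
slope_x_on_y = slope(y, x)    # change in x per unit of y

# If both regressions described the same line, this would equal slope_y_on_x.
implied_slope = 1 / slope_x_on_y
```

The two slopes describe the same line only when the points lie exactly on a straight line (r = 1 or r = -1).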
Why is the calculation for a CI or hypothesis test for correlation not the same as for tests of associations?
Because the sampling distribution of the correlation coefficient does not follow a Normal distribution
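The cards do not name a method for handling this. One widely used approach is Fisher's z transformation, sketched below as an assumption rather than as the method the cards have in mind. The function name fisher_ci and the inputs r = 0.6, n = 30 are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate CI for a correlation via Fisher's z transformation."""
    z = math.atanh(r)                 # transformed value is approximately Normal
    se = 1 / math.sqrt(n - 3)         # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Hypothetical sample correlation and sample size.
lower, upper = fisher_ci(r=0.6, n=30)
```

The resulting interval is not symmetric about r, which reflects the skewed sampling distribution of the correlation coefficient.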
How can a straight line be described mathematically?
Y = alpha + beta X, where alpha is the y-intercept and beta is the gradient of the line
In a regression model, what is a residual?
The vertical distance of a data point from the line of best fit
How is the best fit line plotted in linear regression models?
The best fit line is taken as the one which makes the sum of the squares of the residuals as small as possible, i.e. it minimises the variance of the residuals
What is the other name for the line of best fit in linear regression models?
The Least Squares Linear Regression Line
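A minimal sketch of the least squares calculation for the line Y = alpha + beta X, using the standard closed-form estimates. The function names and data are assumptions for illustration. The final comparison shows that moving the line away from the fitted one increases the sum of squared residuals.

```python
def least_squares_line(x, y):
    """Return (alpha, beta) for the least squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx                  # gradient
    alpha = mean_y - beta * mean_x    # y-intercept
    return alpha, beta

def sum_squared_residuals(x, y, alpha, beta):
    """Sum of squared vertical distances of the points from the line."""
    return sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 5, 5, 9, 9]

alpha, beta = least_squares_line(x, y)
best = sum_squared_residuals(x, y, alpha, beta)

# Any other line, such as this slightly shifted one, does worse.
nudged = sum_squared_residuals(x, y, alpha + 0.1, beta - 0.05)
```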
In the Minitab/SPSS output for a linear regression model, what does the row labelled ‘constant’ represent?
This gives a test of the null hypothesis that the intercept of the population regression line is 0
In the Minitab/SPSS output for a linear regression model, what does the row labelled ‘weight’ represent?
This gives a test of the null hypothesis that the slope of the population regression line is 0, where weight is the independent variable
In the Minitab/SPSS output for a linear regression model, what does the column labelled ‘SE coefficient’ represent?
The standard errors of the coefficients, allowing calculation of the CIs of the coefficients
In the Minitab/SPSS output for a linear regression model, what does the quantity ‘s’ represent?
The standard deviation of the points around the regression line
In the Minitab/SPSS output for a linear regression model, what does the row labelled ‘R-sq’ represent?
The coefficient of determination, which gives the percentage of the variability in Y that is explained by variation in X
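The quantities in the output cards above can be reproduced by hand. The sketch below uses the standard simple linear regression formulas on hypothetical data. The dictionary keys are illustrative names and do not reproduce the exact Minitab or SPSS layout.

```python
import math

def regression_summary(x, y):
    """Quantities that typically appear in a simple linear regression output."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares about the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    # s: standard deviation of the points about the line (n - 2 degrees of freedom).
    s = math.sqrt(sse / (n - 2))

    # Standard errors of the two coefficients.
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

    return {
        "constant": intercept,
        "slope": slope,
        "se_constant": se_intercept,
        "se_slope": se_slope,
        # t statistics for the null hypotheses intercept = 0 and slope = 0.
        "t_constant": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "s": s,
        # Proportion of variability in y explained; multiply by 100 for a percentage.
        "r_sq": 1 - sse / syy,
    }

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 5, 5, 9, 9]
summary = regression_summary(x, y)
```

A confidence interval for a coefficient is then the estimate plus or minus a t multiplier (n - 2 degrees of freedom) times its standard error.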
What are the assumptions for linear regression?
Constant variance - the spread of the response, Y, about its average value is the same for all values of X
Linearity - the average of the response, Y, is a linear function of the explanatory variable, X
Independent observations
Normality of residuals
Error-free values for X - for each pair of observations, the predictor X needs to be known with no error and the response Y is a random observation
X and Y must follow a normal distribution in linear regression. T/F?
False
A prediction interval (in linear regression) will be wider than the confidence interval. T/F?
True
What is a prediction interval?
An estimate of the interval in which a future observation will fall, with a given probability
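A sketch of why the prediction interval is the wider of the two, using the standard formulas for simple linear regression. The data and the function name intervals_at are hypothetical. The multiplier t_crit is assumed to be looked up from t tables (3.182 is the 97.5% point for 3 degrees of freedom).

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (confidence interval, prediction interval) at x = x0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x

    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    fitted = alpha + beta * x0
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx

    # Uncertainty in the average response at x0.
    se_mean = s * math.sqrt(leverage)
    # Uncertainty in one future observation: adds the scatter of individuals.
    se_pred = s * math.sqrt(1 + leverage)

    ci = (fitted - t_crit * se_mean, fitted + t_crit * se_mean)
    pi = (fitted - t_crit * se_pred, fitted + t_crit * se_pred)
    return ci, pi

# Hypothetical data; n = 5, so n - 2 = 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2, 5, 5, 9, 9]
ci, pi = intervals_at(x, y, x0=3, t_crit=3.182)
```

The extra 1 inside the square root for se_pred is the variability of an individual about the line, so the prediction interval always contains the confidence interval.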