Week 2 SCM (regressions) Flashcards
What general formula is basic statistical modelling based on?
outcome = model + error
What are 2 features that we should aim for with a statistical model?
we should aim for a statistical model that minimises error and can be generalised beyond the dataset
what statistical test should we use when there are two levels of our independent variable?
- a t-test
what statistical test should we use when our independent variable is continuous, with more than two levels?
- a linear regression
what is a general description of the general linear model?
the general linear model is a model for which the DV is composed of a linear combination of independent variables
each independent variable has a weight given by b
this weight determines the relative contribution each variable makes to the overall prediction
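As a rough illustration of "a linear combination of independent variables, each weighted by b" (hypothetical numbers in Python/numpy, not anything from the course materials):

```python
import numpy as np

# hypothetical scores for 3 participants on 2 predictors (e.g. hours studied, hours slept)
X = np.array([[2.0, 7.0],
              [5.0, 6.0],
              [8.0, 8.0]])

b0 = 1.0                  # intercept
b = np.array([0.5, 0.3])  # one weight (b) per predictor

# GLM prediction: the DV is a linear combination of the predictors,
# each weighted by its b, plus the intercept
predicted_dv = b0 + X @ b
print(predicted_dv)       # approx [4.1, 5.3, 7.4]
```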
what can/can't we use the correlation coefficient for?
we can use the coefficient to describe the relationship between two variables. We can then test this relationship for significance
we cannot use the coefficient to make predictions
what can/can't we use the GLM for?
we can use the GLM to describe a relationship, to decide on significance (p) and also to make predictions
what are the differences between correlation and linear regression?
- correlation quantifies the direction and strength of the relationship between two numeric variables (x & y). correlation always lies between -1 & 1.
- simple linear regression relates the two variables x & y to each other through an equation y = a + bx
- therefore, if visualised on a graph, a linear regression appears as a straight line. This line can then be used to make predictions
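A small sketch of this contrast (made-up x and y values, assuming scipy is available): pearsonr only describes the relationship, while linregress returns the line y = a + bx, which can then predict new values.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# correlation: direction and strength only, always between -1 and 1
r, p = stats.pearsonr(x, y)

# regression: an equation y = a + b*x that can be used for prediction
fit = stats.linregress(x, y)
prediction_at_6 = fit.intercept + fit.slope * 6.0

print(r, p)
print(fit.slope, fit.intercept, prediction_at_6)
```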
what is the equation of a simple linear regression, and which terms represent the slope and the intercept of the line?
- if x and y are the variables a linear regression equation is: y= a + bx
- b is the slope of the line and a is the intercept
what is the difference between the value predicted by the line and the observed data value called?
the residuals
what would the best line from a linear regression analysis show?
minimised residuals
what letter/symbol is used to define the slope and intercept of the line in a linear regression analysis
the slope is represented by b1
the intercept is represented by b0
what is the (more complicated) equation of a linear regression analysis?
Yi = (b0 + b1Xi) + Ei
Yi is the outcome that we want to predict
Xi is the ith participant's score on the predictor variable
b1 is the gradient of the regression line
b0 is the intercept of the regression line
Ei is the residual, which represents the difference between the score predicted by the line and the score that the participant actually obtained
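A short numeric illustration of the equation above (invented scores): each residual Ei is the participant's actual score minus the score the fitted line predicts for them.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # predictor scores (Xi)
y = np.array([2.0, 4.5, 5.5, 8.0])   # observed outcomes (Yi)

b1, b0 = np.polyfit(x, y, 1)         # slope and intercept of the fitted line

predicted = b0 + b1 * x              # the (b0 + b1*Xi) part of the equation
residuals = y - predicted            # Ei: observed score minus predicted score

print(residuals)
```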
how can you determine the type of relationship from the gradient of a line?
if the gradient is a positive value there is a positive relationship
if the gradient is a negative value there is a negative relationship
how do we assess the fit of a line?
- if the residuals are smaller, the line is a better fit
- therefore to assess the fit of a line we look at the values of the residuals (the vertical deviations)
- because the residuals can either be positive or negative, we must square them in this analysis
- therefore the line with the smallest sum of squared residuals is the best fitting line
- when conducting a linear regression, the mathematics will give us the line with the smallest sum of squared residuals
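A sketch with hypothetical data showing that the least-squares line really does have a smaller sum of squared residuals than an arbitrary alternative line:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.8, 4.1, 5.2])

def sum_squared_residuals(b0, b1):
    residuals = y - (b0 + b1 * x)    # vertical deviations from the line
    return np.sum(residuals ** 2)    # squared so positive and negative deviations don't cancel

# least-squares fit vs an arbitrary guessed line
b1_ols, b0_ols = np.polyfit(x, y, 1)
print(sum_squared_residuals(b0_ols, b1_ols))  # the smallest achievable sum of squared residuals
print(sum_squared_residuals(0.0, 1.2))        # any other line does at least as badly
```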
what is a simple definition of regression towards the mean
if a variable is extreme the first time you measure it, it will be closer to the average the next time you measure it
this is because an extreme first measurement is likely to have been pushed towards that extreme partly by chance
and the next time you measure it, chance is unlikely to push the value to the same extreme, so it will be closer to the mean
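A tiny simulation sketch of this idea (made-up parameters): people selected for extreme scores at time 1 score closer to the group mean at time 2, with no intervention at all.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = rng.normal(100, 10, size=10_000)        # stable underlying scores
time1 = true_score + rng.normal(0, 10, size=10_000)  # measurement 1 = truth + chance
time2 = true_score + rng.normal(0, 10, size=10_000)  # measurement 2 = truth + new chance

extreme = time1 > 125                                # select people who scored extremely high

print(time1[extreme].mean())   # well above 125, by construction
print(time2[extreme].mean())   # noticeably closer to the overall mean of ~100
```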
how can regression to the mean trick us and how can we counteract this?
it can make it seem like an intervention is working but actually it is just the effect of regression to the mean
we can avoid being tricked by this by adding a control group
what is the difference between Pearson's correlation coefficient and the regression coefficient?
- Pearson's correlation coefficient is the covariance / (SD of x * SD of y)
- the regression coefficient is the covariance / (SD of x * SD of x)
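A sketch (invented data, numpy assumed) computing both coefficients directly from the covariance, to make the shared structure explicit; the only difference is the denominator.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of x and y
sd_x = np.std(x, ddof=1)
sd_y = np.std(y, ddof=1)

r = cov_xy / (sd_x * sd_y)            # Pearson's correlation coefficient
b1 = cov_xy / (sd_x * sd_x)           # regression coefficient (slope)

print(r, b1)
```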
how is the slope of a regression related to the correlation coefficient?
the slope (b1) = R * SDy/SDx
so the slope is equal to R * the ratio of the standard deviation of y to the standard deviation of x
- this is because the covariance is R * SDx * SDy
- and the slope/regression coefficient is the covariance / (SDx * SDx)
so if you substitute the covariance formula into the formula for the slope, it simplifies to the equation above
- this means that if the SD of x is the same as the SD of y, the correlation coefficient is equal to the regression coefficient
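The substitution described above, written out as a short derivation:

```latex
b_1 = \frac{\operatorname{cov}(x, y)}{SD_x \times SD_x}
    = \frac{R \times SD_x \times SD_y}{SD_x \times SD_x}
    = R \times \frac{SD_y}{SD_x}
```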
in what situation would the correlation coefficient (R) be equal to the regression coefficient
if the SD of x is equal to the SD of y
what would the variability of a regression model show and how would we calculate it?
- it would show how much variability in the outcome is not explained by the model
- we would calculate it by looking at the sum of squared errors
- each error is also known as a residual, and is the difference between the measured value and the value predicted by the regression line
- we then square these differences and add them up to get the sum of squared errors
how do you calculate the mean squared error from the sum of squared errors?
- MSE = SSE/df
- the degrees of freedom = N - 2 for a simple regression
how do you calculate the standard error of the model from the mean squared error of a linear regression?
the standard error of the model is the square root of the mean squared error
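A sketch with made-up data tying the last three cards together: the sum of squared errors, then the mean squared error using N - 2 degrees of freedom, then the standard error of the model as its square root.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.9, 4.8, 5.1, 6.9, 7.8])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)    # measured value minus the value on the regression line

sse = np.sum(residuals ** 2)     # sum of squared errors
df = len(x) - 2                  # N - 2 for a simple regression
mse = sse / df                   # mean squared error
se_model = np.sqrt(mse)          # standard error of the model

print(sse, mse, se_model)
```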
what does b0 represent in regression?
the intercept of the regression line