Week 2 SCM (regressions) Flashcards
What general formula is basic statistical modelling based on?
outcome = model + error
What are 2 features that we should aim for in a statistical model?
we should aim for a statistical model that minimises error and can be generalised beyond the dataset
what statistical test should we use when there are two levels of our independent variable?
- a statistical t-test
what statistical test should we use when our independent variable is continuous (so it has many more than two levels)?
- a linear regression
what is a general description of the general linear model?
the general linear model is a model for which the DV is composed of a linear combination of independent variables
each independent variable has a weight given by b
this weight determines the relative contribution each variable makes to the overall prediction
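The idea of a weighted linear combination can be sketched in a few lines of Python. This is only an illustration: the function name predict and the weights used (b0 = 2.0, b1 = 0.5, b2 = -1.0) are made up for the example and do not come from any real dataset.

```python
def predict(intercept, weights, values):
    """Return intercept + b1*x1 + b2*x2 + ... for one observation."""
    return intercept + sum(b * x for b, x in zip(weights, values))


# Hypothetical weights for two independent variables.
b0 = 2.0
weights = [0.5, -1.0]

# Hypothetical scores on the two independent variables.
scores = [10.0, 3.0]

prediction = predict(b0, weights, scores)
print(prediction)
```

With these made-up numbers the prediction is 2.0 + (0.5 x 10) + (-1.0 x 3) = 4.0, so the first variable pushes the prediction up and the second pulls it down.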
what can and can't we use the correlation coefficient for?
we can use the coefficient to describe the relationship between two variables. We can then test this relationship for significance
we cannot use the coefficient to make predictions
what can/can't we use the GLM for?
we can use the GLM to describe a relationship, to decide whether it is significant (p) and also to make predictions
what are the differences between correlation and linear regression?
- correlation quantifies the direction and strength of the linear relationship between two numeric variables (x & y). the correlation coefficient always lies between -1 & 1.
- simple linear regression relates the two variables x & y to each other through an equation y = a + bx
- therefore if visualised on a graph, a linear regression would include a straight line. This line can then be used to make predictions
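As a rough illustration of this difference, the sketch below computes both the correlation coefficient and the regression line for the same small dataset. The x and y values are invented for the example, and the formulas used are the standard textbook ones (Pearson's r and the least-squares slope and intercept), which these flashcards do not spell out.

```python
from math import sqrt

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Correlation: one number between -1 and 1 describing the relationship.
r = sxy / sqrt(sxx * syy)

# Regression: an equation y = a + bx that can be used to predict.
b = sxy / sxx
a = mean_y - b * mean_x

new_x = 6.0
predicted_y = a + b * new_x

print("r =", round(r, 3))
print("y =", round(a, 3), "+", round(b, 3), "* x")
print("predicted y at x = 6:", round(predicted_y, 3))
```

The correlation gives a single summary number, while the regression gives values for a and b that can be plugged into y = a + bx to predict y for a new x.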
what is the equation of a simple linear regression and what represents the slope and the intercept of the line?
- if x and y are the variables, the linear regression equation is: y = a + bx
- b is the slope of the line and a is the intercept
what is the difference between the value predicted by the line and the actual data value called?
the residual
what would the best line from a linear regression analysis show?
minimised residuals
what letter/symbol is used to define the slope and intercept of the line in a linear regression analysis?
the slope is represented by b1
the intercept is represented by b0
what is the (more complicated) equation of a linear regression analysis?
Yi = (b0 + b1Xi) + Ei
Yi is the outcome that we want to predict for the ith participant
Xi is the ith participant's score on the predictor variable
b1 is the gradient of the regression line
b0 is the intercept of the regression line
Ei is the residual, which represents the difference between the score predicted by the line and the score that the ith participant actually obtained
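The sketch below shows how each observed score splits into a predicted part and a residual. The data and the values b0 = 2.2 and b1 = 0.6 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical intercept and gradient.
b0 = 2.2
b1 = 0.6

# Made-up predictor scores (X) and outcomes (Y) for five participants.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

# The model part: the score predicted by the line for each participant.
predicted = [b0 + b1 * xi for xi in X]

# The error part: observed score minus predicted score.
residuals = [yi - pi for yi, pi in zip(Y, predicted)]

for i, (yi, pi, ei) in enumerate(zip(Y, predicted, residuals), start=1):
    print(f"participant {i}: observed={yi}, "
          f"predicted={pi:.1f}, residual={ei:+.1f}")
```

Adding each residual back onto its prediction recovers the observed score, which is the outcome = model + error idea applied to one participant at a time.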
how can you determine the type of relationship from the gradient of a line?
if the gradient is a positive value there is a positive relationship
if the gradient is a negative value there is a negative relationship
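This rule is simple enough to write as a small function. The name relationship_type is made up for this sketch, and the zero-gradient case (a flat line) is an added assumption that the flashcards do not cover.

```python
def relationship_type(gradient):
    """Classify the relationship from the sign of the gradient (b1)."""
    if gradient > 0:
        return "positive"
    if gradient < 0:
        return "negative"
    # Assumption: a gradient of exactly zero means a flat line,
    # so the predictor does not change the prediction at all.
    return "no linear relationship"


print(relationship_type(0.6))
print(relationship_type(-1.3))
```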
how do we assess the fit of a line?
- the smaller the residuals, the better the line fits
- therefore to assess the fit of a line we look at the values of the residuals (the vertical deviations)
- because the residuals can be either positive or negative, they would cancel each other out if we simply added them up, so we square them first
- therefore the line with the smallest sum of squared residuals is the best fitting line
- when conducting a linear regression, the mathematics will give us the line with the smallest sum of squared residuals
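The steps above can be sketched in Python. The data are made up, the function names are hypothetical, and the slope and intercept come from the standard least-squares formulas, which are assumed here rather than taken from the flashcards. The fitted line is then compared with a few arbitrary alternative lines.

```python
def sum_squared_residuals(a, b, x, y):
    """Sum of squared vertical deviations from the line y = a + bx."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares_line(x, y)
best = sum_squared_residuals(a, b, x, y)
print("fitted line SSR:", round(best, 3))

# Arbitrary alternative lines (intercept, slope) to compare against.
alternatives = [(2.0, 0.6), (2.2, 0.8), (3.0, 0.3)]
for alt_a, alt_b in alternatives:
    ssr = sum_squared_residuals(alt_a, alt_b, x, y)
    print(f"line y = {alt_a} + {alt_b}x SSR:", round(ssr, 3))
```

Each alternative line gives a larger sum of squared residuals than the fitted line, which is what "best fitting" means here.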