Research methods and statistics 3 (year two) Flashcards
Explain and define P-Hacking
Manipulating data or analyses until a significant result is obtained, e.g.:
Running multiple analyses and reporting only the significant ones
Omitting information
Selectively controlling for variables
Analysing partway through, then collecting more data if the result is not yet significant (optional stopping; see the sketch below)
Changing the DV
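A minimal simulation of the "analyse partway through, then collect more" tactic (optional stopping), assuming two groups drawn from identical distributions; the sample sizes, step size, and number of peeks are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
false_positives = 0
n_sims = 2000

for _ in range(n_sims):
    # Two groups drawn from the SAME distribution: any "effect" is noise
    a = list(rng.normal(0, 1, 20))
    b = list(rng.normal(0, 1, 20))
    # Test at n = 20 per group, then keep adding 10 per group and
    # re-testing, stopping as soon as p < .05 (5 peeks in total)
    for _ in range(5):
        p = stats.ttest_ind(a, b).pvalue
        if p < .05:
            false_positives += 1
            break
        a.extend(rng.normal(0, 1, 10))
        b.extend(rng.normal(0, 1, 10))

# Nominal alpha is .05, but optional stopping inflates the Type I error rate
print(f"False-positive rate: {false_positives / n_sims:.3f}")  # well above .05
```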
Explain how outliers can be an issue
Outliers in small samples can be the difference between a significant and a non-significant result
Non-parametric correlations can combat this (e.g. if the original test is a Pearson correlation, the non-parametric equivalent is Spearman's rho; see the sketch below)
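A minimal sketch of the contrast, using made-up data where a single outlier drives the Pearson correlation far more than Spearman's rank-based rho:

```python
import numpy as np
from scipy import stats

# Ten essentially unrelated points plus one extreme outlier (made-up data)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50])
y = np.array([5, 2, 4, 1, 3, 5, 2, 4, 1, 3, 60])

r, p_r = stats.pearsonr(x, y)        # pulled strongly toward the outlier
rho, p_rho = stats.spearmanr(x, y)   # ranks blunt the outlier's influence
print(f"Pearson r = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```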
Define regression and what it tells us
A test of whether one or more predictor variables can predict variance in an outcome variable
E.g. a clinical psychologist may want to know which variables are associated with psychosis symptoms
Tells us:
If the model is a good fit
If there are significant relationships between a predictor variable and the outcome variable
The direction of those relationships
Can then make predictions beyond our data
Fits a line of best fit for the association between the variables
Give the linear regression equation
Yi = (b0 + b1Xi) + ei
Yi = the outcome/variable you're predicting
b0 = intercept (constant): the mean value of the outcome variable when the predictor in the model is 0; positions the line at the intercept
b1Xi = the predictor variable (Xi) multiplied by its coefficient (b1, also called the parameter estimate); b1 gives the slope of the line of best fit
ei = error term: the amount of variance left over in the model (see the sketch below)
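A minimal sketch of estimating b0 and b1 by least squares; the data are made up, and numpy's polyfit is just one of several ways to fit the line:

```python
import numpy as np

# Made-up data: outcome y and one predictor x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b1, b0 = np.polyfit(x, y, deg=1)   # slope and intercept
predicted = b0 + b1 * x
residuals = y - predicted          # the ei term: leftover variance

print(f"Yi = {b0:.2f} + {b1:.2f}*Xi + ei")
```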
Define beta slope
Slope (aka beta): the number of units the dependent variable changes for every 1-unit change in the IV
Give the assumptions for regression
Normally distributed continuous outcome
Independent data
Ratio/interval predictors
Nominal predictors with two categories (dichotomous)
No multicollinearity for multiple regression
Be careful of influential cases (a check is sketched below)
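A minimal sketch of checking two of these concerns – normality (here checked on the residuals, a common variant of the assumption) and influential cases – using statsmodels on made-up data:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100)
y = 2 + 0.5 * x + rng.normal(0, 1, 100)   # made-up data

model = sm.OLS(y, sm.add_constant(x)).fit()

# Normality of residuals: Shapiro-Wilk (p > .05 suggests no violation)
w, p = stats.shapiro(model.resid)
print(f"Shapiro-Wilk p = {p:.3f}")

# Influential cases: Cook's distance (values far above the rest are suspect)
cooks_d = model.get_influence().cooks_distance[0]
print(f"Max Cook's distance = {cooks_d.max():.3f}")
```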
Give the parameters needed to work out how well the regression model fits the data
To work out how well the model fits the data we need to know:
Sum of squares total (SST): uses the differences between the observed data and the mean value of the outcome
Sum of squares residual (SSR): uses the differences between the observed data and the regression line
Sum of squares model (SSM): uses the differences between the mean value of Y and the regression line
SSM represents the improvement due to the model and is used to generate the test statistic – ideally as high as possible (see the sketch below)
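A minimal sketch of the three sums of squares, reusing the made-up data from the earlier fitting example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
b1, b0 = np.polyfit(x, y, deg=1)
predicted = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)           # observed vs mean of outcome
ssr = np.sum((y - predicted) ** 2)          # observed vs regression line
ssm = np.sum((predicted - y.mean()) ** 2)   # regression line vs mean of Y

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSM = {ssm:.2f}")
print(f"SST = SSM + SSR? {np.isclose(sst, ssm + ssr)}")   # True for OLS
```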
Give the equation and components for generating a regression test statistic
Test statistic tells us the ratio of explained vs unexplained variance in the outcome
F test (model fit) = MSm / MSr
MSm = mean squares of the model (SSM divided by its degrees of freedom)
MSr = mean squares of the residual (SSR divided by its degrees of freedom)
F test tells us if the model is a good fit of the data – are we explaining variance? (computed in the sketch below)
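Continuing the same sketch: with one predictor, df for the model is 1 and df for the residual is n − 2:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
b1, b0 = np.polyfit(x, y, deg=1)
predicted = b0 + b1 * x

ssm = np.sum((predicted - y.mean()) ** 2)
ssr = np.sum((y - predicted) ** 2)

df_model, df_resid = 1, len(y) - 2
f = (ssm / df_model) / (ssr / df_resid)   # MSm / MSr
p = stats.f.sf(f, df_model, df_resid)     # p-value from the F distribution
print(f"F({df_model}, {df_resid}) = {f:.2f}, p = {p:.4f}")
```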
Define proportion of total variation and give the equation
The proportion of the total variation (SST) that is explained by the model (SSM) is known as the coefficient of determination and referred to as R²
R² = SSM / SST
R² can vary between 0 and 1 and is often expressed as a %
R² is not that useful if you have more than one predictor variable – with more than one, report adjusted R²
Adjusted R² = how effective the model is, corrected for the number of predictors (see the sketch below)
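Continuing the same sketch: R² from SSM/SST, plus the standard adjustment for the number of predictors k:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
b1, b0 = np.polyfit(x, y, deg=1)
predicted = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssm = np.sum((predicted - y.mean()) ** 2)

n, k = len(y), 1                    # sample size, number of predictors
r2 = ssm / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalises extra predictors
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```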
Explain when multiple regression is needed
Two or more variables to predict our outcome
To improve explanatory potential – examine which predictors are statistically significant
Give the equation for multiple regression
Yi = (b0 + b1X1i + b2X2i) + ei
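A minimal sketch of estimating b0, b1, and b2 by least squares on made-up data (the column of 1s in the design matrix carries the intercept):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 1, n)   # made-up data

X = np.column_stack([np.ones(n), x1, x2])   # column of 1s estimates b0
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"Yi = {b0:.2f} + {b1:.2f}*X1i + {b2:.2f}*X2i + ei")
```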
Explain the spss output for simple regression
Variables entered/removed: allows you to double-check the info you put in
Model summary: gives the R² statistic – always report adjusted R²
ANOVA: tells us about our model fit (is the model a better fit than just using the mean?) – the F-test
Coefficients: tells us about the individual predictors in our model – whether they are significant and their direction (unstandardized coefficients)
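For comparison outside SPSS, statsmodels' OLS summary reports the same quantities – R² and adjusted R², the F-test for model fit, and per-predictor coefficients with significance – a sketch on made-up data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x1 = rng.normal(0, 1, 50)
x2 = rng.normal(0, 1, 50)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 1, 50)   # made-up data

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.summary())       # R^2, adjusted R^2, F-test, coefficient table
print(model.rsquared_adj)    # the value to report
```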
Give an example APA style writeup for simple regression
A simple regression was carried out to investigate the relationship between ——- and ——. The regression model was significant and predicted approximately % of the variance (adjusted R² = .-; F(X, Y) = -, p = -). ——– was a significant/non-significant predictor of ——– (b = .- (s.e. = .-); 95% CI - to -; t = -, p = -)
Define multicollinearity
Multicollinearity: occurs when independent variables in a regression model are highly correlated
If two or more predictor variables in the model are highly correlated with each other, they do not provide unique/independent information to the model
Can adversely affect regression estimates
A telltale sign: large amounts of variance explained but no significant individual predictors
Explain how to identify multicollinearity
Identifying multicollinearity:
Look for high correlations between variables in a correlation matrix (r > .8)
r = 1 is perfect multicollinearity – a data issue
Tolerance statistic:
The proportion of variance in an IV not accounted for by the other IVs
Tolerance = 1 – R² (from regressing that IV on the other IVs)
High tolerance = low multicollinearity
Low tolerance = high multicollinearity
Variance inflation factor (VIF):
VIF = 1 / tolerance
Indicates how much the variance of a coefficient estimate is inflated by multicollinearity (the standard error is inflated by √VIF; see the sketch below)
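A minimal sketch of computing tolerance and VIF by hand: regress one IV on the other(s), take that R², then apply the two formulas above; the two correlated predictors are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(0, 1, n)
x2 = 0.9 * x1 + rng.normal(0, 0.3, n)   # deliberately correlated with x1

# Regress x1 on x2 to get the R^2 used in the tolerance formula
X = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
predicted = X @ coef
r2 = 1 - np.sum((x1 - predicted) ** 2) / np.sum((x1 - x1.mean()) ** 2)

tolerance = 1 - r2      # low tolerance = high multicollinearity
vif = 1 / tolerance
print(f"R^2 = {r2:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.2f}")
```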