Research methods and statistics 3 (year two) Flashcards
Explain and define P-Hacking
Methods of manipulating data or analyses to get significant results, e.g.:
Running multiple analyses
Omitting information
Selectively controlling for variables
Analysing partway through, then collecting more data ("optional stopping"; see the simulation below)
Changing the DV
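A minimal simulation (not from the lectures) of the "analyse partway through, then collect more data" strategy; group sizes and batch sizes are arbitrary. Both groups come from the same distribution, so every "significant" result is a false positive:

```python
# Optional stopping: test after every new batch of data and stop as soon
# as p < .05. The false-positive rate ends up well above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
false_positives = 0
n_simulations = 2000

for _ in range(n_simulations):
    a, b = list(rng.normal(size=10)), list(rng.normal(size=10))
    for _ in range(10):                  # up to 10 extra looks at the data
        t, p = stats.ttest_ind(a, b)
        if p < .05:                      # stop and "publish"
            false_positives += 1
            break
        a.extend(rng.normal(size=5))     # otherwise collect more data
        b.extend(rng.normal(size=5))

print(f"False-positive rate: {false_positives / n_simulations:.2%}")
```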
Explain how outliers can be an issue
Outliers in small sample sizes can be the difference between a significant and non-significant result
Non-parametric correlations can combat this (e.g. if the original test is a Pearson correlation, the non-parametric equivalent is Spearman's rho; see the sketch below)
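A quick sketch of this point using made-up data with one extreme case; a single outlier can push a Pearson correlation towards "significance" while the rank-based Spearman's rho is far less affected:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=15)
y = rng.normal(size=15)          # unrelated to x
x[-1], y[-1] = 6.0, 6.0          # one extreme case in a small sample

r, p_r = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, y)
print(f"Pearson    r = {r:.2f}, p = {p_r:.3f}")
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.3f}")
```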
Define regression and what it tells us
A test of whether one or more predictor variables can predict variance in an outcome variable
E.g. a clinical psychologist may want to know which variables are associated with psychosis symptoms
Tells us:
If model is a good fit
If there are significant relationships between a predictor variable and an outcome variable
The direction of the relationships
Can then make predictions beyond our data
Predicts a line of best fit for association between variables
Give the linear regression equation
Yi = (b0 + b1Xi) + ei
Yi = the outcome/variable you're predicting
b0 = intercept (constant) – the mean value of the outcome variable when the predictor in the model is 0; positions the line at the intercept
b1 = the regression coefficient for the predictor Xi (also called the parameter estimate) – tells you the slope of the line of best fit
ei = error term – the amount of variance left over (unexplained) by the model
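A minimal sketch of fitting this equation with statsmodels; the data are simulated and the true b0/b1 values are arbitrary:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)   # true b0 = 2, b1 = 0.5

X = sm.add_constant(x)                     # adds the intercept column (b0)
model = sm.OLS(y, X).fit()
b0, b1 = model.params
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
print("residuals e_i:", model.resid[:3])   # variance the model leaves over
```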
Define beta slope
Slope (aka beta): the number of units of change in the dependent variable for every 1-unit change in the IV
Give the assumptions for regression
Normally distributed continuous outcome
Independent data
Ratio/interval predictors
Nominal predictors with two categories (dichotomous)
No multicollinearity for multiple regression
Be careful of influential cases
Give the parameters needed to work out how well the regression model fits the data
To work out how well the model fits the data we need to know:
Sum of squares total (SST) – uses the difference between the observed data and the mean value of the outcome
Sum of squares residual (SSR) – uses the difference between the observed data and the regression line
Sum of squares model (SSM) – uses the difference between the mean value of Y and the regression line; represents the improvement due to the model (ideally as high as possible) and is used to generate the test statistic
Note that SST = SSM + SSR (see the sketch below)
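A sketch of the three sums of squares computed by hand on simulated data, checking that SST = SSM + SSR:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 0.8 * x + rng.normal(size=50)
fit = sm.OLS(y, sm.add_constant(x)).fit()

y_hat = fit.fittedvalues
sst = np.sum((y - y.mean()) ** 2)      # observed vs mean of outcome
ssr = np.sum((y - y_hat) ** 2)         # observed vs regression line
ssm = np.sum((y_hat - y.mean()) ** 2)  # regression line vs mean of Y
print(sst, ssm + ssr)                  # the two match
```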
Give the equation and components for generating a regression test statistic
The test statistic tells us the ratio of explained to unexplained variance in the outcome
F test (model fit) = MSm / MSr
MSm = mean squares of the model (SSM divided by its degrees of freedom)
MSr = mean squares of the residual (SSR divided by its degrees of freedom)
The F test tells us if the model is a good fit of the data – are we explaining variance? (See the sketch below.)
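A sketch of the F ratio using hypothetical sums of squares and degrees of freedom (one predictor, n = 50, so df_model = 1 and df_residual = 48):

```python
from scipy import stats

ssm, ssr = 30.0, 45.0          # made-up values for illustration
df_m, df_r = 1, 48
ms_m, ms_r = ssm / df_m, ssr / df_r
F = ms_m / ms_r
p = stats.f.sf(F, df_m, df_r)  # survival function = upper-tail p value
print(f"F({df_m},{df_r}) = {F:.2f}, p = {p:.4f}")
```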
Define proportion of total variation and give the equation
The proportion of total variation (SST) that is explained by the model (SSM) is known as the coefficient of determination and referred to as R^2
R^2 = SSM / SST
R^2 can vary between 0 and 1 and is often expressed as a %
R^2 is not that useful if you have more than one predictor variable – with more than one, use adjusted R^2
Adjusted R^2 = how effective the model is once the number of predictors is taken into account
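A sketch of R^2 and the standard adjusted-R^2 correction (which penalises extra predictors), using hypothetical values:

```python
n, k = 50, 2                   # sample size and number of predictors
ssm, sst = 30.0, 75.0          # made-up sums of squares
r2 = ssm / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R^2 = {r2:.2f}, adjusted R^2 = {r2_adj:.2f}")
```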
Explain when multiple regression is needed
Two or more variables to predict our outcome
To improve explanatory potential – examine which predictors are statistically significant
Give the equation for multiple regression
Yi = (b0 + b1X1i + b2X2i) + ei
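A sketch of this equation fitted as a multiple regression in statsmodels; the variable names x1/x2 are placeholders and the data are simulated:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 1 + 0.5 * df.x1 - 0.3 * df.x2 + rng.normal(size=100)

X = sm.add_constant(df[["x1", "x2"]])
fit = sm.OLS(df["y"], X).fit()
print(fit.summary())   # model fit (F), adjusted R^2, per-predictor b, t, p
```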
Explain the spss output for simple regression
Variables entered/removed allows you to double check the info you put in
Model summary: gives R2 statistic – always report adjusted R square
ANOVA: tells us about our model fit (is model a better fit than just using the mean) – F-test
Coefficients: tells us about the individual predictors in our model – whether they are significant and their direction (unstandardized coefficients)
Give an example APA style writeup for simple regression
A simple regression was carried out to investigate the relationship between ——- and ——. The regression model was significant and predicted approximately -% of variance (adjusted R^2 = .-; F(-, -) = -, p = -). ——– was a significant/non-significant predictor of ——– (b = .- (s.e. = .-); -% CI - to -; t = -, p = -)
Define multicollinearity
Multicollinearity: occurs when independent variables in a regression model are highly correlated
If two or more predictor variables in the model are highly correlated with each other, they do not provide unique/independent information to the model
Can adversely affect regression estimates
Large amounts of variance explained but no significant predictors
Explain how to identify multicollinearity
Identifying multicollinearity:
Look for high correlations between predictor variables in a correlation matrix (r > .8)
r = 1 is perfect multicollinearity – a data issue
Tolerance statistic
The percentage of variance in an IV not accounted for by the other IVs
Tolerance = 1 − R^2 (from regressing that IV on the other IVs)
High tolerance = low multicollinearity
Low tolerance = high multicollinearity
Variance inflation factor (VIF)
VIF = 1/tolerance
Indicates how much the standard error will be inflated by (see the sketch below)
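A sketch of tolerance and VIF on simulated data where one predictor nearly duplicates another, so both should show low tolerance / high VIF (variance_inflation_factor is from statsmodels):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # near-duplicate of x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i, name in enumerate(["x1", "x2"], start=1):  # skip the constant
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```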
Give ways of fixing multi-collinearity issues
Fixing multi-collinearity issues
Increase sample size
Remove redundant variable
If two or more variables are important, create a variable that takes both of them into account
Give an example APA style writeup for multiple regression
A multiple regression was conducted to investigate the roles of –, – and – on —. The regression model was – and predicted -% of variance (adjusted R^2 = -; F(-, -) = -, p = -). Variance inflation factors suggest multicollinearity was not/was a concern (– = -, – = -, – = -). – was a significant/non-significant predictor of — (b = - (s.e. = -); -% CI – to -; t = -, p = -) and – was/was not a significant predictor (b = - (s.e. = -); -% CI – to -; t = -, p = -)
- Explain what a mediator is
- Links two variables
- A mediator is a variable that is affected by the IV; the mediator in turn influences the DV
- The effect of the IV on the DV (IV-DV) is partially dependent on the mediator (IV-M-DV)
- IV-DV = direct effect (c-path)
- IV-M = a-path
- M-DV = b-path
- Full mediation: inclusion of the mediator renders the direct IV-DV effect non-significant
- Partial mediation: inclusion of the mediator renders the direct IV-DV effect weaker but still significant
- Explain the difference between mediation and moderation
Mediation: a variable that accounts for an association between a predictor and a DV
- Moderator: affects the strength of the relationship between a predictor and a DV
o A moderator does not have to be associated with the IV or DV
- A mediator MUST be something that can change, e.g. age cannot be a mediator, but craving can
o The IV has to influence the mediator, and nothing can influence age
- Give some issues of the causal steps approach
- Has little or no sensitivity (needs a huge sample)
- Mathematically incorrect (a significant IV-DV association is not actually necessary for mediation)
- Is unable to detect suppression effects
- Explain why the Sobel test is a bad solution
Gives a p value for the indirect effect
- Based upon a product of the coefficients calculation
- Assumes the product of the coefficient is normally distributed – this is almost never the case
- This method also requires more participants to detect indirect effects than the methods used today
- Explain why the joint significance test is a good solution to mediation
This method ignores the IV-DV association (it doesn't have to be significant)
o If the a-path and the b-path are both significant, there is evidence of mediation
- Also gives confidence intervals for the indirect effect
- Explain how to run an SPSS analysis for mediation
Example: IV = personality disorder, M = enhancement, DV = alcohol units consumed
- Firstly we produce two regressions
1. IV-M Personality disorder to enhancement
2. M-DV enhancement (+personality disorder) to alcohol units consumed
- We control for personality disorder in the second regression so we can be sure that the mediator is predicting variance beyond that accounted for by the IV
- Analyse – regression – linear
- Regression 1: IV to M
- To do our additional test for mediation we need to take the unstandardized regression coefficient and its standard error and use them in RMediation (we can use three d.p. when entering values into the program)
- Then run the second regression M-DV but controlling for PDQ-4 (personality disorder)
o Enhancement (and PDQ-4) to Units consumed
o If enhancement is significant in this regression then there is evidence of mediation
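A sketch of these two regressions in statsmodels on simulated data; the column names just echo the example's labels (pdq4, enhance, units) and the effect sizes are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
pdq4 = rng.normal(size=200)                      # IV (personality disorder)
enhance = 0.5 * pdq4 + rng.normal(size=200)      # M, driven by IV (a-path)
units = 0.6 * enhance + rng.normal(size=200)     # DV, driven by M (b-path)
df = pd.DataFrame({"pdq4": pdq4, "enhance": enhance, "units": units})

# Regression 1: IV -> M (the a-path)
fit_a = sm.OLS(df.enhance, sm.add_constant(df.pdq4)).fit()
# Regression 2: M -> DV, controlling for the IV (the b-path)
fit_b = sm.OLS(df.units, sm.add_constant(df[["enhance", "pdq4"]])).fit()

# Both paths significant => evidence of mediation (joint significance logic)
print(fit_a.pvalues["pdq4"], fit_b.pvalues["enhance"])
```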
- Explain how to writeup mediation in APA format
The first regression (IV-M):
o The regression was significant (adjusted R² = 0.15, F(1, 225) = 41.32, p < .001)
- Explain how to use confidence intervals
A 95% confidence interval reflects how confident we can be in our regression coefficient: it expresses the precision of our estimate – 95% of samples from this population will fall within this range (if we give 95% CIs we could equally give 99% CIs, etc.)
- High precision = a "tighter" CI; this is a good thing, as it shows consistency in an effect.
- If the CI overlaps with 0 there will not be a significant effect, as the range of predicted values overlaps with no effect (0 = no change)
- If it doesn't overlap 0 you have a significant effect (p < .05)
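A sketch of building a 95% CI from a hypothetical coefficient and standard error (exactly, the multiplier is the t critical value for the residual df, roughly 1.96 for large samples):

```python
from scipy import stats

b, se, df_resid = 0.42, 0.15, 97        # made-up values
t_crit = stats.t.ppf(0.975, df_resid)
lo, hi = b - t_crit * se, b + t_crit * se
print(f"95% CI: {lo:.2f} to {hi:.2f}")  # excludes 0, so p < .05
```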
Explain the function of some spss values given in regression
Model fit: F test
Amount of variance explained: adjusted R^2
Significance of individual predictors - betas
Explain the difference between simple multiple and hierarchical regression
Explain why we might use hierarchical regression over simple multiple
Simple/multiple give us model fit and R squared which accounts for all predictors in our model
Simple/multiple is a simultaneous model
Hierarchical models – some strategy (or specified hierarchy) which is dictated in advance by purpose/logic of research
Allows us to be more theory driven
Allows us to adjust for variables
Partitions our explained variance
Define and explain steps in a hierarchical regression
Groups of variables that we want to look at as a distinct set of predictors
Often people have variables they want to adjust for in one step e.g age and gender
May want to put questionnaire measures in one step and behavioural measures in another
Known predictors before hypothesised predictors
Fine to tabulate the full regression analysis (including all steps), but report the final model in the text
You should also describe the set-up of the regression model (i.e. which variables were entered on each step and why)
Define and explain adjusted r^2 change
This tells us how much adjusted R^2 (i.e. the amount of variance in the outcome predicted by the model) changes with the addition of a new step, after controlling for each previous step
SPSS calculates an F-change, simply an ANOVA F value telling you whether the step (R^2 change) predicts a significant amount of variance after controlling for previous steps
These should be reported
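A sketch of a two-step hierarchical model with a hand-computed R^2-change F-test; all variables are simulated placeholders:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 120
age, gender = rng.normal(size=n), rng.integers(0, 2, size=n)
predictor = rng.normal(size=n)
y = 0.3 * age + 0.5 * predictor + rng.normal(size=n)

# Step 1: covariates only; step 2: add the predictor of interest
step1 = sm.OLS(y, sm.add_constant(np.column_stack([age, gender]))).fit()
step2 = sm.OLS(y, sm.add_constant(np.column_stack([age, gender, predictor]))).fit()

r2_change = step2.rsquared - step1.rsquared
k_added = 1                                   # predictors added at step 2
f_change = (r2_change / k_added) / ((1 - step2.rsquared) / step2.df_resid)
p_change = stats.f.sf(f_change, k_added, step2.df_resid)
print(f"R^2 change = {r2_change:.3f}, F-change p = {p_change:.4f}")
```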
Explain why outliers are an issue and how they can be identified
Outliers/influential cases can have a considerable effect on our regression parameters
An outlier is not conditional on any other variable: it is extreme only on the dependent variable
Can check for outliers using box plots
Define leverage and explain how it works
Tells us about extreme data points on the X variable
Distance between Xi and all other X data points
Between 0 and 1 (lower = better)
Sum of the leverage values = number of parameters in the model (including the intercept)
Cut-off: 3 times the number of parameters divided by the number of data points, i.e. 3(k + 1)/n
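A sketch of leverage (hat) values from statsmodels on simulated data, applying the 3(k + 1)/n cut-off above; one X value is made extreme on purpose:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=40)
x[0] = 8.0                                  # extreme on the X variable
y = 0.5 * x + rng.normal(size=40)

fit = sm.OLS(y, sm.add_constant(x)).fit()
leverage = fit.get_influence().hat_matrix_diag
cutoff = 3 * 2 / len(x)                     # 3(k + 1)/n with k = 1 predictor
print(np.where(leverage > cutoff)[0])       # flags the extreme case
print(leverage.sum())                       # sums to number of parameters (2)
```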
Define and explain Cook’s distance
If any cases have high residuals, they may distort the model's accuracy
Tells us how much the predicted Y values would move on average if the data point were removed (a lot of change indicates a large amount of influence)
As with the variance inflation factor, there is no single accepted cut-off – many different ones are used
E.g. > 1, 4/number of participants, or 3 times larger than the mean
Define DFBETAs and give the cut-offs
The difference in the regression coefficient with the data point included vs. excluded
Expressed as change in standard deviations
Cut-offs:
2/√n
Value = 1
If Cook's distance comes up on the exam: 4/(n − k − 1) (k = number of independent variables)
If DFBETAs: 2/√n
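A sketch of both diagnostics from statsmodels on simulated data, applying the two cut-offs above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
y[0] += 5                                    # one influential case

influence = sm.OLS(y, sm.add_constant(x)).fit().get_influence()
cooks_d = influence.cooks_distance[0]        # first element = the distances
dfbetas = influence.dfbetas                  # one column per coefficient

k = 1                                        # number of predictors
print(np.where(cooks_d > 4 / (n - k - 1))[0])            # Cook's flags
print(np.where(np.abs(dfbetas) > 2 / np.sqrt(n))[0])     # DFBETA flags (rows)
```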
Explain the SPSS output for hierarchical regression
Unstandardized B (and confidence intervals) – how much the outcome (BMI in this example) increases for every 1-unit increase in our IV
Standardised B – allows us to make comparisons across predictors
R2 change is used to assess how much (more) variance is explained at each step.
F-change tells us whether the amount of extra variance explained is significant.
Give an example APA style writeup for hierarchical regression
A hierarchical linear regression was conducted to examine the effects of age, gender, self-control, eating restraint and childhood attachment on BMI. Age and gender were added at step one of the model, and self-control, eating restraint and childhood attachment at step two. Variance inflation factors suggest multicollinearity was not a concern. The final regression model was significant and explained 54% of variance (F(5, 94) = 24.60, p < .001). Age was a significant positive predictor of BMI and self-control was a significant negative predictor. Gender, eating restraint, and childhood attachment were not significant predictors (see table…)
Describe reliability
• Refers to the consistency of a measure – it is essentially about whether a measure is consistent.
• Psychologists consider three types of reliability:
1. Over time (test-retest reliability)
2. Across items (internal consistency)
3. Across different researchers (inter-rater reliability)
• Reliability measures commonly take the form of correlation coefficients, but there are different methods available.
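A sketch of one of these, internal consistency, via Cronbach's alpha computed straight from its definition: (k / (k − 1)) × (1 − sum of item variances / variance of the total score); the item data are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + rng.normal(scale=0.8, size=(200, 5))   # 5 correlated items

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```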