Research methods and statistics 3 (year two) Flashcards

1
Q

Explain and define P-Hacking

A

P-hacking: manipulating data or analyses until a statistically significant result is obtained, e.g. by:

Running multiple analyses

Omitting information

Selectively controlling for variables

Analysing partway through, then collecting more data

Changing the DV

2
Q

Explain how outliers can be an issue

A

In small samples, an outlier can be the difference between a significant and a non-significant result

Non-parametric correlations can combat this (e.g. if the original test is a Pearson correlation, the non-parametric alternative is Spearman's rho)
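As an illustration (invented data, not from the course), a minimal Python sketch showing one outlier flipping a Pearson correlation while the rank-based Spearman's rho barely moves:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)
x = rng.normal(size=15)
y = x + rng.normal(scale=0.5, size=15)  # genuine positive association

x_out = np.append(x, 10.0)   # one extreme point added
y_out = np.append(y, -10.0)

print(pearsonr(x_out, y_out))   # Pearson r dragged down by the outlier
print(spearmanr(x_out, y_out))  # rank-based rho barely affected
```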

3
Q

Define regression and what it tells us

A

A test of whether one or more variables can predict variance in an outcome variable

e.g. a clinical psychologist may want to know which variables are associated with psychosis symptoms

Tells us:

If model is a good fit

If there are significant relationships between a predictor variable and an outcome variable

The direction of the relationships

Can then make predictions beyond our data

Predicts a line of best fit for association between variables

4
Q

Give the linear regression equation

A

Yi = (B0 + B1Xi) + ei

Yi = the outcome (the variable you're predicting)

B0 = the intercept (constant): the mean value of the outcome variable when the predictor in the model is 0; positions the line at the intercept

B1 = the slope (also called the parameter estimate): tells you the shape of the line of best fit; Xi = the value of the predictor variable

ei = the error term: the amount of variance left over in the model
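A minimal Python/statsmodels sketch (invented data; the course itself uses SPSS) fitting this equation and recovering B0 and B1:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)  # true B0 = 2, B1 = 0.5

X = sm.add_constant(x)    # adds the intercept column (B0)
fit = sm.OLS(y, X).fit()
print(fit.params)         # estimated [B0, B1]
print(fit.resid[:5])      # ei: residuals, the variance left over
```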

5
Q

Define beta slope

A

Slope (aka beta): the number of units of change in the dependent variable for every 1-unit change in the IV

6
Q

Give the assumptions for regression

A

Normally distributed continuous outcome

Independent data

Ratio/interval predictors

Nominal predictors with two categories (dichotomous)

No multicollinearity for multiple regression

Be careful of influential cases

7
Q

Give the parameters needed to work out how well the regression model fits the data

A

To work out how well the model fits the data we need to know:

Sum of squares total (SST): uses the differences between the observed data and the mean value of the outcome

Sum of squares residual (SSR): uses the differences between the observed data and the regression line

Sum of squares model (SSM): uses the differences between the mean value of Y and the regression line; represents the improvement due to the model, is used to generate the test statistic, and should ideally be as high as possible
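A short sketch (invented data, statsmodels rather than the course's SPSS) computing the three sums of squares and confirming that SST = SSM + SSR:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 0.8 * x + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
y_hat = fit.fittedvalues

sst = np.sum((y - y.mean()) ** 2)      # observed data vs mean of outcome
ssr = np.sum((y - y_hat) ** 2)         # observed data vs regression line
ssm = np.sum((y_hat - y.mean()) ** 2)  # regression line vs mean of Y

print(np.isclose(sst, ssm + ssr))      # True: SST = SSM + SSR
```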

8
Q

Give the equation and components for generating a regression test statistic

A

The test statistic tells us the ratio of explained to unexplained variance in the outcome

F (model fit) = MSM / MSR

MSM = the mean squares of the model

MSR = the mean squares of the residual

The F-test tells us whether the model is a good fit to the data – are we explaining variance?
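A self-contained sketch (same kind of invented setup as above) computing F = MSM / MSR by hand and checking it against statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 0.8 * x + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
y_hat = fit.fittedvalues

n, k = len(y), 1                              # sample size, number of predictors
msm = np.sum((y_hat - y.mean()) ** 2) / k     # mean squares of the model
msr = np.sum((y - y_hat) ** 2) / (n - k - 1)  # mean squares of the residual

print(msm / msr, fit.fvalue)                  # F = MSM / MSR, matches statsmodels
```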

9
Q

Define proportion of total variation and give the equation

A

The proportion of the total variation (SST) that is explained by the model (SSM) is known as the coefficient of determination, referred to as R^2

R^2 = SSM / SST

R^2 varies between 0 and 1 and is often expressed as a percentage

R^2 is not that useful if you have more than one predictor variable – with more than one, use adjusted R^2

Adjusted R^2 indicates how effective the model is
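A short sketch (invented data) computing R^2 = SSM/SST and the standard adjusted-R^2 formula, checked against statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 1.0 + 0.8 * x + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
sst = np.sum((y - y.mean()) ** 2)
ssm = np.sum((fit.fittedvalues - y.mean()) ** 2)

n, k = len(y), 1
r2 = ssm / sst                                 # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors
print(r2, fit.rsquared)
print(adj_r2, fit.rsquared_adj)
```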

10
Q

Explain when multiple regression is needed

A

Used when we have two or more variables to predict our outcome

To improve explanatory potential – examine which predictors are statistically significant

11
Q

Give the equation for multiple regression

A

Yi = (B0 + B1X1i + B2X2i) + ei
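A minimal two-predictor sketch in Python/statsmodels (invented data; the course uses SPSS):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2]))  # B0, B1, B2 columns
fit = sm.OLS(y, X).fit()
print(fit.params)    # estimated [B0, B1, B2]
print(fit.pvalues)   # significance of each predictor
```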

12
Q

Explain the spss output for simple regression

A

Variables entered/removed allows you to double check the info you put in

Model summary: gives R2 statistic – always report adjusted R square

ANOVA: tells us about our model fit (is model a better fit than just using the mean) – F-test

Coefficients: tells us about the individual predictors in our model – whether they are significant and their direction (unstandardized coefficients)

13
Q

Give an example APA style writeup for simple regression

A

A simple regression was carried out to investigate the relationship between —— and ——. The regression model was significant and predicted approximately —% of variance (adjusted R^2 = .—; F(—, —) = —, p = —). —— was a significant/non-significant predictor of —— (b = .— (s.e. = .—); 95% CI — to —; t = —, p = —).

14
Q

Define multicollinearity

A

Multicollinearity occurs when independent variables in a regression model are highly correlated

If two or more predictor variables in the model are highly correlated with each other, they do not provide unique/independent information to the model

This can adversely affect the regression estimates

A typical symptom: large amounts of variance explained but no significant individual predictors

15
Q

Explain how to identify multicollinearity

A

Identifying multicollinearity:

Look for high correlations between predictor variables in a correlation matrix (r > .8)

r = 1 is perfect multicollinearity – a data issue

Tolerance statistic: the proportion of variance in an IV not accounted for by the other IVs

Tolerance = 1 – R^2 (from regressing that IV on the other IVs)

High tolerance = low multicollinearity

Low tolerance = high multicollinearity

Variance inflation factor (VIF) = 1 / tolerance

Indicates how much the standard error will be inflated
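A sketch (invented data) of the VIF check using statsmodels; x2 is built as a near-copy of x1 so the collinearity is deliberate:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1: collinear
x3 = rng.normal(size=200)                  # independent predictor

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in (1, 2, 3):                        # column 0 is the constant
    print(i, variance_inflation_factor(X, i))
# x1 and x2 show very large VIFs (low tolerance); x3 stays near 1
```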

16
Q

Give ways of fixing multi-collinearity issues

A

Fixing multi-collinearity issues

Increase sample size

Remove redundant variable

If two or more variables are important, create a variable that takes both of them into account

17
Q

Give an example APA style writeup for multiple regression

A

A multiple regression was conducted to investigate the roles of ——, —— and —— on ——. The regression model was —— and predicted —% of variance (adjusted R^2 = —; F(—, —) = —, p = —). Variance inflation factors suggest multicollinearity was/was not a concern (—— = —, —— = —, —— = —). —— was a significant/non-significant predictor of —— (b = — (s.e. = —); 95% CI — to —; t = —, p = —) and —— was a significant/non-significant predictor (b = — (s.e. = —); 95% CI — to —; t = —, p = —).

18
Q
Explain what a mediator is
A
  • Links two variables
  • A mediator is a variable that is affected by the IV; the mediator in turn influences the DV
  • The effect of the IV on the DV (IV-DV) is partially dependent on the mediator (IV-M-DV)
  • IV-DV = direct effect (c)
  • IV-M = a-path
  • M-DV = b-path
  • Full mediation: inclusion of the mediator renders the direct IV-DV effect non-significant
  • Partial mediation: inclusion of the mediator reduces the direct IV-DV effect, but it remains significant
19
Q
Explain the difference between mediation and moderation
A

  • Mediator: a variable that accounts for an association between a predictor and a DV
  • Moderator: affects the strength of a relationship between a predictor and a DV; a moderator does not have to be associated with the IV or the DV
  • A mediator MUST be something that can change, e.g. age cannot be a mediator but craving can; the IV has to influence the mediator, and nothing can influence age

20
Q
Give some issues of the causal steps approach
A
  • Has little or no sensitivity (needs a huge sample)
  • Mathematically incorrect (an IV-DV association is not actually necessary for mediation)
  • Is unable to detect suppression effects
21
Q
Explain why the Sobel test is a bad solution
A

Gives a p value for the indirect effect

  • Based on a product-of-coefficients calculation
  • Assumes the product of the coefficient is normally distributed – this is almost never the case
  • This method also requires more participants to detect indirect effects than the methods used today
22
Q
Explain why the joint significance test is a good solution to mediation
A

This method ignores the IV-DV association (it doesn't have to be significant)

  • If the a-path and the b-path are both significant, there is evidence of mediation
  • Also gives confidence intervals for the indirect effect
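A minimal Python/statsmodels sketch (invented data; the course uses SPSS plus a separate tool) of the joint-significance idea, with a percentile-bootstrap CI for the indirect effect, a common modern way of getting that interval:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
iv = rng.normal(size=n)
m = 0.5 * iv + rng.normal(size=n)             # a-path
dv = 0.4 * m + 0.1 * iv + rng.normal(size=n)  # b-path plus small direct effect

a_fit = sm.OLS(m, sm.add_constant(iv)).fit()
b_fit = sm.OLS(dv, sm.add_constant(np.column_stack([m, iv]))).fit()
print(a_fit.pvalues[1], b_fit.pvalues[1])     # both significant -> mediation

# percentile-bootstrap CI for the indirect effect a*b
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    a = sm.OLS(m[idx], sm.add_constant(iv[idx])).fit().params[1]
    b = sm.OLS(dv[idx], sm.add_constant(np.column_stack([m[idx], iv[idx]]))).fit().params[1]
    boots.append(a * b)
print(np.percentile(boots, [2.5, 97.5]))      # CI excluding 0 -> mediation
```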

23
Q
Explain how to run an SPSS analysis for mediation
A

Example: IV = personality disorder, M = enhancement, DV = alcohol units consumed

Firstly we produce two regressions:

1. IV-M: personality disorder to enhancement
2. M-DV: enhancement (+ personality disorder) to alcohol units consumed

We control for personality disorder in the second regression so we can be sure that the mediator is predicting variance beyond that accounted for by the IV

Analyse – Regression – Linear

Regression 1: IV to M

For the additional test for mediation, take the unstandardised regression coefficient and its standard error and use them in RMediation (three decimal places is fine when using the program)

Then run the second regression, M-DV, but controlling for PDQ-4 (personality disorder): enhancement (and PDQ-4) to units consumed

If enhancement is significant in this regression then there is evidence of mediation

24
Q
Explain how to write up mediation in APA format
A

The first regression, IV-M:

The regression is significant (adjusted R² = 0.15, F(1, 225) = 41.32, p < .001)

25
Q
Explain how to use confidence intervals
A

A 95% confidence interval reflects how confident we can be in our regression coefficient; it expresses the precision of our estimate: 95% of samples from this population would fall in this range (if we give 95% CIs we could give 99% CIs, etc.)

  • High precision = a "tighter" CI; this is a good thing as it shows consistency in an effect
  • If the CI overlaps with 0 there will not be a significant effect, as the range of predicted values overlaps with no effect (0 = no change)
  • If it doesn't overlap 0, you have a significant effect (p < .05)
26
Q

Explain the function of some spss values given in regression

A

Model fit: F test

Amount of variance explained: adjusted R^2

Significance of individual predictors - betas

27
Q

Explain the difference between simple/multiple and hierarchical regression

Explain why we might use hierarchical regression over simple multiple

A

Simple/multiple regression gives us model fit and R squared, which accounts for all predictors in our model

Simple/multiple regression is a simultaneous model (all predictors entered at once)

Hierarchical models follow some strategy (a specified hierarchy) dictated in advance by the purpose/logic of the research

Allows us to be more theory driven

Allows us to adjust for variables

Partitions our explained variance

28
Q

Define and explain steps in a hierarchical regression

A

Groups of variables that we want to look at as a distinct set of predictors

Often people have variables they want to adjust for in one step, e.g. age and gender

May want to put questionnaire measures in one step and behavioural measures in another

Known predictors before hypothesised predictors

Fine to tabulate full regression analysis but report final model (including all steps) in the text

You should also describe the set-up of the regression model (i.e. which variables were entered on each step and why)

29
Q

Define and explain adjusted r^2 change

A

This tells us how much adjusted R^2 (i.e. the amount of variance in the outcome predicted by the model) changes with the addition of a new step, after controlling for each previous step

SPSS calculates an F-change, simply an ANOVA F value telling you whether the step (R^2 change) predicts a significant amount of variance after controlling for previous steps

These should be reported
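A minimal sketch (invented data; SPSS does this via its blocks interface) of a two-step hierarchical regression, using statsmodels' compare_f_test for the F-change:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 150
age = rng.normal(size=n)          # step 1: adjustment variable
selfcontrol = rng.normal(size=n)  # step 2: hypothesised predictor
bmi = 0.3 * age - 0.5 * selfcontrol + rng.normal(size=n)

step1 = sm.OLS(bmi, sm.add_constant(age)).fit()
step2 = sm.OLS(bmi, sm.add_constant(np.column_stack([age, selfcontrol]))).fit()

print(step2.rsquared_adj - step1.rsquared_adj)  # adjusted R^2 change
f_change, p_value, df_diff = step2.compare_f_test(step1)
print(f_change, p_value)  # does step 2 explain significant extra variance?
```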

30
Q

Explain why outliers are an issue and how they can be identified

A

Outliers/influential cases can have a considerable effect on our regression parameters

An outlier is unconditional: it is extreme on the dependent variable alone, not conditional on any other variable

Outliers can be checked using box plots

31
Q

Define leverage and explain how it works

A

Tells us about extreme data points on the X variable

Based on the distance between Xi and all the other X data points

Ranges between 0 and 1 (lower = better)

The leverage values sum to the number of parameters in the model

Cut-off: 3 times the number of parameters divided by the number of data points
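A sketch (invented data, one deliberately extreme X value) computing leverage from statsmodels' influence diagnostics and applying that cut-off:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = np.append(rng.normal(size=49), 8.0)  # one extreme X value
y = 0.5 * x + rng.normal(size=50)

fit = sm.OLS(y, sm.add_constant(x)).fit()
lev = fit.get_influence().hat_matrix_diag
print(lev.sum())                  # sums to the number of parameters (2)
cutoff = 3 * 2 / len(x)           # 3 * parameters / data points
print(np.where(lev > cutoff)[0])  # flags the extreme X point (index 49)
```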

32
Q

Define and explain Cook’s distance

A

If any cases have high residuals, they may distort the accuracy of the model

Cook's distance tells us how much the predicted Y values would move, on average, if the data point were removed (a lot of change indicates a large amount of influence)

As with the variance inflation factor there is no single accepted cut-off; many different ones exist, e.g. > 1, 4/number of participants, or 3 times larger than the mean
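A sketch (invented data with one planted influential case) pulling Cook's distance from statsmodels and applying the 4/n rule:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)
y[0] += 8.0                             # plant one influential case

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks, _ = fit.get_influence().cooks_distance
print(np.where(cooks > 4 / len(x))[0])  # the 4/n cut-off flags case 0
```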

33
Q

Define DFBETAs and give the cut-offs

A

The difference in the regression coefficient with the data point included vs excluded

Expressed as a change in standard deviations

Cut-offs:

2/SQRT(n)

or value = 1

Exam note: for Cook's distance the cut-off is 4/(n - k - 1), where k = the number of independent variables; for DFBETAs it is 2/SQRT(n)
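A sketch (invented data, one distorting case) computing DFBETAs with statsmodels and applying the 2/SQRT(n) cut-off:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)
y[0] += 8.0                        # one distorting case

fit = sm.OLS(y, sm.add_constant(x)).fit()
dfb = fit.get_influence().dfbetas  # rows: cases; columns: B0, B1
cutoff = 2 / np.sqrt(len(x))
print(np.where(np.abs(dfb).max(axis=1) > cutoff)[0])  # flags case 0
```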

34
Q

Explain the SPSS output for hierarchical regression

A

Unstandardised B (and confidence intervals) – how much the outcome (e.g. BMI) increases for every 1-unit increase in our IV

Standardised B – allows us to make comparisons across predictors

R2 change is used to assess how much (more) variance is explained at each step.

F-change tells us whether the amount of extra variance explained is significant.

35
Q

Give an example APA style writeup for hierarchical regression

A

A hierarchical linear regression was conducted to examine the effects of age, gender, self-control, eating restraint and childhood attachment on BMI. Age and gender were added at step one of the model, and self-control, eating restraint and childhood attachment at step two. Variance inflation factors suggest multicollinearity was not a concern. The final regression model was significant and explained 54% of variance (F(5, 94) = 24.60, p < .001). Age was a significant positive predictor of BMI and self-control was a significant negative predictor. Gender, eating restraint, and childhood attachment were not significant predictors (see table…)

36
Q

Describe reliability

A

Refers to the consistency of a measure.

Psychologists consider three types of reliability:

1. Over time (test-retest reliability)
2. Across items (internal consistency)
3. Across different researchers (inter-rater reliability)

Reliability measures commonly take the form of correlation coefficients, but there are different methods available.