SPSS TIPS Flashcards by Nina de Goede

DATA FILES

Always ensure variables are clearly named and labelled. PROCESS limits variable names to 8 characters, so you may as well stick to this limit throughout SPSS.
Always ensure value labels have been added.
Always ensure computed/recoded variables are clearly defined.
Always make sure the variables, labels, etc. are easy for you to work with, and remember that others need to be able to work with the file in your absence.

NOTEPAD

There is a notepad function under
> Utilities
> Data File Comments
It is helpful to add notes here about what you have done to your file, then save it. The notes are date stamped and saved as part of the data file, which helps you and others keep track.
Make sure to TICK > Display comments in output.

DATA FILE ORGANISATION

Name variables by scale and subscale for ease of identification. Be as specific as you can with variable names. Make sure you can readily recognise which variables you need without having to refer to your questionnaire all the time.
e.g. Depression Anxiety Stress Scales (DASS). Label questions according to both questionnaire and subscale (each question will be a variable):
e.g. DASS_1_S, DASS_2_A
e.g. DASS_3_D, DASS_4_A, etc.
D = Depression subscale, 3 refers to question 3, etc. Use underscores, as variable names cannot contain spaces.
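
The naming rules above can be checked mechanically before the data file is built. The sketch below is only an illustration in Python (not SPSS syntax): `check_name` is a hypothetical helper, the 8-character limit is the one these notes give for PROCESS, and the "must start with a letter" rule is a simplifying assumption.

```python
import re

def check_name(name, max_len=8):
    """Return a list of problems with a proposed variable name (empty list = fine)."""
    problems = []
    if len(name) > max_len:
        problems.append("too long")
    if " " in name:
        problems.append("contains a space")
    if not re.match(r"[A-Za-z]", name):
        # Simplifying assumption: names should begin with a letter.
        problems.append("does not start with a letter")
    return problems

for candidate in ["DASS_1_S", "DASS 3 D", "DASS_10_D"]:
    print(candidate, check_name(candidate))
```

Note that with this scheme, names for items 10 and above (e.g. DASS_10_D) run to 9 characters, so a shorter prefix may be needed if you want to stay within 8.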

MISSING VALUES CODE

The Missing box defaults to None. If you want to flag particular values as missing data, change this by
> Discrete Missing Values, e.g. type in 97, 98, 99.
(The other option for flagging missing data is Range, but this is only used when coding a large number of missing values, which hopefully you never have.)
Dr J finds it handy to use the codes:
97 = illegible answer
98 = skipped answer or not applicable
99 = did not answer.
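
To see why declaring these codes matters, here is a minimal Python sketch (not SPSS syntax) of what goes wrong if 97/98/99 are treated as real scores. The scores are made up and the function names are hypothetical.

```python
from statistics import mean

# Missing-value codes from the notes above.
MISSING_CODES = {97, 98, 99}

def valid_values(scores, missing=MISSING_CODES):
    """Drop any score that is really a missing-value code."""
    return [s for s in scores if s not in missing]

def mean_excluding_missing(scores, missing=MISSING_CODES):
    """Mean of the valid scores only; None if nothing valid remains."""
    kept = valid_values(scores, missing)
    return mean(kept) if kept else None

raw = [3, 4, 99, 2, 97, 5]            # hypothetical item scores on a 1-5 scale
naive = mean(raw)                     # wrongly treats 99 and 97 as real answers
clean = mean_excluding_missing(raw)   # uses only 3, 4, 2, 5
print(naive, clean)
```

Once the codes are declared in the Missing box, SPSS excludes them from analyses in the same spirit as `mean_excluding_missing` does here.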

RECODING

If you recode into > Same Variables, it overwrites the old variable, whereas if you recode into > Different Variables, it creates a new variable and keeps the original (safer, but either is fine as long as you know what you are doing).
> Transform
> Recode into Different Variables
Note the Input Variable is your original variable. The Output Variable is the new variable you are creating. Give it a meaningful name.
Example recodings:
1. Reverse scoring. A single old value is replaced by a single new value.
2. Collapsing categories, i.e. use the Range options to specify a range of scores that should be collapsed into a single score/code, e.g. create age bands from a continuous age variable, or create a never/ever variable from a behaviour frequency variable.
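
The two kinds of recoding can be sketched in Python to make the logic explicit. This is an illustration only, not SPSS syntax: the 1-5 response scale, the age band cut-offs and the function names are all assumptions made for the example.

```python
def reverse_score(value, low=1, high=5):
    """Reverse-score an item: on a 1-5 scale, 1 becomes 5, 2 becomes 4, etc."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value

def age_band(age):
    """Collapse a continuous age into bands (hypothetical cut-offs)."""
    if age < 18:
        return 1   # under 18
    if age <= 34:
        return 2   # 18-34
    if age <= 54:
        return 3   # 35-54
    return 4       # 55 and over

def ever_never(frequency):
    """Collapse a behaviour frequency into 0 = never, 1 = ever."""
    return 0 if frequency == 0 else 1

original = [1, 4, 5]
recoded = [reverse_score(v) for v in original]  # new list; the original is kept
print(original, recoded)
```

Building `recoded` as a new list mirrors Recode into Different Variables: the original scores stay untouched.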

FILTERING OUT PARTICIPANTS

Filtering out participants:
> Data
> Select Cases, i.e. select the cases to KEEP. You can either delete the unselected cases or just filter them out. If you delete them, SAVE the data file as a NEW DATA SET. A filter removes participants from analyses, but they are only hidden, not deleted. To bring them back:
> Data
> Select Cases
> select All cases.
When filtering, you are choosing which cases to keep, e.g. the condition variable > X keeps the cases scoring above X and filters out those scoring X or below.
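
The "select the cases to keep" logic can be sketched in Python as follows. This is an illustration rather than SPSS syntax; the participants, the `age` variable and the cut-off of 18 are made up.

```python
def select_cases(cases, variable, threshold):
    """Keep cases whose value on `variable` is above `threshold`.

    Returns a new list, so the full data set is left intact (like a filter,
    not a delete).
    """
    return [c for c in cases if c[variable] is not None and c[variable] > threshold]

participants = [
    {"id": 1, "age": 17},
    {"id": 2, "age": 25},
    {"id": 3, "age": 18},
    {"id": 4, "age": 40},
]

kept = select_cases(participants, "age", 18)
print([c["id"] for c in kept])   # cases that stay in the analysis
print(len(participants))         # the original data still has every case
```

A case scoring exactly at the cut-off (id 3 here) is filtered out, because the condition is "greater than", not "greater than or equal to".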

SYNTAX SAVING AND ANNOTATION

Any time you run an analysis, you have the option to:
> OK or > Paste
Paste brings up the syntax. You can also annotate the syntax: start a line with * and SPSS will treat what you type as a note rather than a command. Paste is always recommended. Then hit Play to run the analysis.

Hierarchical Regression

The “hierarchical” option in SPSS does NOT do the correct hierarchical regression. SPSS adds one variable at a time, or blocks of them, and runs a regression analysis, but the output table has multiple slabs reflecting what happens at each block. Each block is interpreted as adding whatever is in that block to whatever was in all the preceding blocks. INSTEAD, use the PROCESS macro to do a hierarchical or sequential model: put a variable, or block of variables, in first and it gives you the results for that model. When you add further variables, the results update to show what has changed.

PROCESS MACRO

Select model 4 for simple mediation (it also handles multiple mediators).
X = predictor
Y = DV or outcome
med1, med2, med3, etc. = mediators
5000 is the default number of bootstrap re-samples the computer takes.
Under OPTIONS:
- select the option to get the total effect (the total effect, c, is the direct plus indirect effects)
- tick effect size if you want it
- usually no centring is required for mediation

process macro 2

process macro 3

process macro 4

Note in this pic that both confidence limits are negative, which means zero is not included in the interval, and therefore the effect is significant.

Process macro-putting it all together

Having gotten all the numbers from the various outputs, you can slot them into the total model.

PROCESS mediation

Model number 4
> Options
Tick show total effect model.
Tick standardised effects (gives the betas).
Do pairwise contrasts to see whether one mediator is stronger than the others when you have multiple mediators.

Heteroscedasticity: tick HC3 (Davidson-MacKinnon). This corrects all the standard errors associated with the coefficients. The standard errors need correcting when homoscedasticity has been breached, and all the options correct for this in some way (unless you tick “none”).
> Decimal places in output: set to 4.
You MUST tick the option for long variable names to accept the risk of error. PROCESS will not run if you do not accept. It is basically warning you that if long variable names share the same first 8 characters, the computer might confuse them.
DO NOT USE PASTE IN PROCESS.

The first regression model shows the “a” path from the IV to the mediator. Note that it shows the mediator as the outcome (the IV has predicted the mediator).
The second output gives the direct effect, with the mediator in the model (it does not give the total effect).
The direct effect is the same as the “unique contribution”.
Next is the total effect model, which shows how the IV relates to the DV in total.
You need to test the significance of all pathways, then the total, direct, and indirect effects.
Total - direct = indirect effect.
c = c path (total effect)
c’ = c prime (direct effect)
cs = completely standardised
The indirect effect is tested via bootstrapping.
BootSE = bootstrapped standard error.
LLCI = lower limit of the confidence interval (ULCI = upper limit). For the indirect effect, we look at whether the interval between the lower and upper limits contains zero, e.g. 0.6674 lower and 2.094 upper does NOT contain zero, therefore it is significant.
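
The relationships above (total - direct = indirect, and the bootstrap test of the indirect effect) can be sketched in plain Python. This is a rough illustration under stated assumptions, not what PROCESS does internally: it uses simulated data, one mediator, ordinary least squares, and a simple percentile bootstrap. All names and numbers other than the interval 0.6674 to 2.094 are made up.

```python
import random

def cov(u, v):
    """Sample covariance (cov(u, u) is the variance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def paths(x, m, y):
    """OLS estimates: a (X->M), b (M->Y given X), c' (direct), c (total)."""
    vx, vm, cxm = cov(x, x), cov(m, m), cov(x, m)
    cxy, cmy = cov(x, y), cov(m, y)
    denom = vx * vm - cxm ** 2
    a = cxm / vx
    b = (cmy * vx - cxy * cxm) / denom
    c_prime = (cxy * vm - cmy * cxm) / denom
    c = cxy / vx
    return a, b, c_prime, c

def bootstrap_indirect(x, m, y, n_boot=5000, seed=1):
    """Percentile 95% interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        try:
            a, b, _, _ = paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        except ZeroDivisionError:
            continue                                 # degenerate resample, skip it
        estimates.append(a * b)
    estimates.sort()
    k = len(estimates)
    return estimates[int(0.025 * k)], estimates[min(k - 1, int(0.975 * k))]

def excludes_zero(lower, upper):
    """True when the interval does not contain zero (i.e. significant)."""
    return not (lower <= 0 <= upper)

# Simulated data (assumption): X affects M, and M and X both affect Y.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(60)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime, c = paths(x, m, y)
llci, ulci = bootstrap_indirect(x, m, y, n_boot=1000)
print("indirect a*b:", a * b, "total - direct:", c - c_prime)
print("bootstrap interval:", llci, ulci, "significant:", excludes_zero(llci, ulci))
```

The first print shows that a*b and c - c’ agree, which is the "total - direct = indirect" rule. Applying `excludes_zero(0.6674, 2.094)` to the interval quoted above returns True.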

running anova in spss

Note: “fixed factor” = IV.

running anova in spss 2

Click Options.
(Post hoc power tells us how much power we have once the data have been collected, as opposed to a priori power, which estimates how much power we will have before the data are collected.)

running anova in spss 3

This is the first output section.
Note that there were more yellow labs. Whilst not mandatory, it is often ideal to have equal group sizes.

running anova in spss 4

Checking homogeneity.
This is just a rough check: each group looks fairly homogeneous and the spread is fairly similar between groups. There may be a couple of outliers among the chocolate labs.
Each group’s scores should be independent of the other groups. Bear in mind that if, e.g., pups from the same litter were in different colour groups, their shared genetics may violate the assumption that group scores are independent.

running anova in spss5

Levene’s test is NOT significant (which is good; it means that the variance across groups is similar).
Levene’s is very sensitive though, so it sometimes reports significance when the spread of scores is actually very similar (which is why the p threshold is set so low).

running anova in spss6

This uses different data, just to illustrate what it might look like if Levene’s was significant and one group had a very different variance.
What to do if Levene’s is significant?
- consider transforming the data to fix non-normality
- consider deleting outliers (but be careful)
- consider deleting entire groups if everyone scores the same value
- maybe you can ignore it? ANOVA is still quite robust in the face of a significant Levene’s IF group sizes are equal AND the variance of the group with the highest variance is no more than 4x that of the group with the lowest variance AND the shapes of the distributions are similar.
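
The first two conditions of that rule of thumb (equal group sizes and a variance ratio of no more than 4) are easy to check by hand or with a short script. The Python sketch below is an illustration with made-up health scores; the function name is hypothetical, and the third condition (similar distribution shapes) still has to be judged by eye from plots.

```python
from statistics import variance

def variance_ratio_check(groups, max_ratio=4.0):
    """Check equal group sizes and the largest/smallest variance ratio."""
    sizes = {len(scores) for scores in groups.values()}
    variances = {name: variance(scores) for name, scores in groups.items()}
    smallest = min(variances.values())
    # A group where everyone scores the same has zero variance.
    ratio = float("inf") if smallest == 0 else max(variances.values()) / smallest
    return {
        "equal_sizes": len(sizes) == 1,
        "variance_ratio": ratio,
        "ratio_ok": ratio <= max_ratio,
    }

# Hypothetical health scores for three colour groups.
health = {
    "chocolate": [60, 62, 58, 61, 59],
    "black": [70, 75, 65, 72, 68],
    "yellow": [71, 69, 74, 66, 70],
}

result = variance_ratio_check(health)
print(result)
```

In this made-up example the group sizes are equal but the largest variance is 5.8 times the smallest, so a significant Levene’s could not simply be ignored.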

running anova in spss 7

We are firstly interested in the F test for colour, i.e. is there a significant difference between the colour groups? Here it is significant, as p < .001. This tells us that at least 2 dog colour groups differ significantly in terms of their health score.
Partial eta squared is an effect size: it says how strong the effect is. It ranges from 0 to 1, and values closer to 1 indicate a very strong effect.
The observed power is close to 1, which means the test is powerful.
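
To show where F and partial eta squared come from, here is a minimal Python sketch of a one-way ANOVA computed from sums of squares. The health scores are made up, the function name is hypothetical, and no p value is computed (that needs the F distribution, which SPSS handles for you).

```python
def one_way_anova(groups):
    """Return sums of squares, F and partial eta squared for a one-way design."""
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    means = {name: sum(scores) / len(scores) for name, scores in groups.items()}

    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(groups[n]) * (means[n] - grand_mean) ** 2 for n in groups)
    # Within-groups (error): how far each score sits from its own group mean.
    ss_within = sum((s - means[n]) ** 2 for n in groups for s in groups[n])

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    partial_eta_sq = ss_between / (ss_between + ss_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "F": f_ratio,
        "partial_eta_sq": partial_eta_sq,
    }

# Hypothetical health scores for three colour groups.
health = {
    "chocolate": [60, 62, 58, 61, 59],
    "black": [70, 75, 65, 72, 68],
    "yellow": [71, 69, 74, 66, 70],
}

anova = one_way_anova(health)
print(anova)
```

Partial eta squared is the effect sum of squares divided by the effect plus error sums of squares, which is why it is bounded between 0 and 1.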

running anova in spss8

Power = the probability of detecting a difference between the 3 groups if a difference actually does exist. In this example, the probability of finding an effect, assuming one exists, is 99.6%, i.e. if we ran the study many times, we would correctly reject the null hypothesis 99.6% of the time.

running anova in spss9

The F statistic only tells us whether colour has an effect on the health of the dog. We have not determined which colours differ from the others, or which is most affected, etc. To determine whether black and yellow labs are significantly different, or whether chocolate and black are different, we need to do contrasts or post hoc comparisons.
Do CONTRASTS if you have a specific hypothesis about which colour is higher or lower than another colour.
Do POST HOC COMPARISONS if you have no specific hypothesis about colour, but are just exploring.

Post hoc comparisons (= after the event):
We could just do separate independent t tests to compare the means of black vs chocolate, black vs yellow and chocolate vs yellow. BUT doing lots of t tests (or any statistical test) increases the chance of a type 1 error (false positive), i.e. finding a significant difference between the means by chance when one does not actually exist. Post hoc tests adjust for this possibility in various ways (usually by adjusting the p value).
In the Univariate screen, > Post Hoc. Put in the variable to test, i.e. colour under “factor”. There are 2 types of test:
1. equal variances assumed (select if Levene’s was not significant)
and
2. equal variances not assumed (select if Levene’s was significant).
In our example, Levene’s was not significant and we selected > Bonferroni as our post hoc test. Bonferroni is one of the most conservative tests.
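
The reasoning behind the correction can be sketched in Python. This is an illustration only: the unadjusted p values are made up, the function names are hypothetical, and the familywise error formula assumes the tests are independent. The Bonferroni adjustment shown multiplies each p value by the number of comparisons and caps the result at 1.

```python
from itertools import combinations

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_adjust(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return {pair: min(1.0, p * k) for pair, p in p_values.items()}

colours = ["chocolate", "black", "yellow"]
pairs = list(combinations(colours, 2))   # the 3 pairwise comparisons

# Hypothetical unadjusted p values from three separate t tests.
raw_p = {
    ("chocolate", "black"): 0.002,
    ("chocolate", "yellow"): 0.001,
    ("black", "yellow"): 0.60,
}

adjusted = bonferroni_adjust(raw_p)
print(pairs)
print(familywise_error(0.05, len(pairs)))
print(adjusted)
```

With three uncorrected tests at .05, the chance of at least one false positive rises to roughly 14%, which is the problem the post hoc adjustment addresses.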

running anova in spss10

Below are the results from running Bonferroni post hoc tests.
i.e. the top line is chocolate compared to black, the next line is chocolate compared to yellow, etc.
Looking at the p values, the difference between black and yellow is NOT significant. Chocolate is significantly different from both black and yellow: chocolate labs have significantly lower health scores than yellow or black labs.
The confidence intervals give a plausible range for the true difference. In this example, we are 95% confident that the true difference between chocolate and black is somewhere between -17.79 and -3.3. If a confidence interval crosses or contains zero, the result is not significant.

running anova in spss11

Below is how to interpret the post hoc bonferroni results.

running anova spss12

If running planned contrasts, go to > Analyze > General Linear Model > Univariate > Contrasts, e.g. if you have a hypothesis that black and yellow labradors will have significantly better health scores than chocolate. The default is "None"; click on the arrow to show the options:
1. Deviation: compares each group mean with the overall mean of all groups, e.g. black labs compared to the mean across chocolate, black and yellow.
2. Simple: compares all means to a reference category, either the last category (3 = yellow) or the first category (1 = chocolate). To select, click on either Last or First next to Reference Category.
3. Difference: compares each category (except the first) with the mean of the previous categories, e.g. black (2) vs chocolate (1), then yellow (3) vs the mean of chocolate and black.
4. Helmert: compares each level of a categorical variable to the mean of the subsequent levels, e.g. chocolate (1) vs the rest (black and yellow), then black (2) vs the rest (which is now just yellow).
5. Repeated: compares adjacent categories in order, starting at the first group, e.g. chocolate vs black, then black vs yellow.
6. Polynomial: tests for trends (linear, quadratic, etc.) across the categories.
In this example, we chose SIMPLE, then FIRST, as chocolate is our first category, then > Change, then > Continue. Below is the readout from this.

running anova spss13

Notice how we don't get a contrast for black vs yellow. If you want this, you need to run the simple contrast again, changing the reference category from first to last (as yellow is last).
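
Why the choice of reference category decides which comparisons you get can be shown with a small Python sketch. It only computes the mean differences that a simple contrast estimates (no significance tests), and the group means and function name are made up for illustration.

```python
def simple_contrasts(means, reference="first"):
    """Difference between each group mean and the reference group mean.

    `means` must list the groups in category order (1 = chocolate, etc.).
    """
    names = list(means)
    ref = names[0] if reference == "first" else names[-1]
    return {
        f"{name} vs {ref}": means[name] - means[ref]
        for name in names
        if name != ref
    }

# Hypothetical mean health scores, in category order.
group_means = {"chocolate": 60.0, "black": 70.0, "yellow": 70.0}

vs_first = simple_contrasts(group_means, reference="first")
vs_last = simple_contrasts(group_means, reference="last")
print(vs_first)   # black vs chocolate, yellow vs chocolate
print(vs_last)    # chocolate vs yellow, black vs yellow
```

Black vs yellow only appears when the reference is the last category, which matches the note above.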

running anova spss14