Statistics Flashcards
What is the mean?
Add all the values together and divide by the number of values.
What is the median?
The middle value when the data are put in order (average the middle 2 values if there is an even number of values)
What is the mode?
The most frequently observed value
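A minimal sketch using Python's built-in `statistics` module, with made-up example values, shows the three averages side by side. It includes how the median handles an even number of values:

```python
from statistics import mean, median, mode

values = [4, 8, 6, 5, 3, 8]  # hypothetical example data

print(mean(values))             # 34 / 6 = 5.666...
print(median(values))           # sorted 3, 4, 5, 6, 8, 8: even count, so (5 + 6) / 2 = 5.5
print(mode(values))             # 8 appears most often
print(median([3, 4, 5, 6, 8]))  # odd count: the middle value, 5
```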
What is the standard deviation?
A measure of how spread out the data are around the mean (roughly the typical distance of a value from the mean)
What is the 95% confidence interval?
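A range of values around a statistic (e.g. the mean) that is likely to contain the true population value. If we repeated the study many times, 95% of the intervals calculated this way would contain the true value. It does NOT mean that 95% of the data values fall within it.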
What is standard error?
The standard deviation of the sampling distribution of a given statistic. For the mean, SE = standard deviation ÷ √n.
What is the range?
The difference between the maximum and minimum values (range = maximum - minimum)
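The spread statistics can be sketched the same way, again with hypothetical data. For the 95% confidence interval this uses the normal critical value (about 1.96). That is a large-sample approximation: SPSS's Explore output uses the t distribution instead, so its interval will be slightly wider for small samples.

```python
import math
from statistics import NormalDist, mean, stdev

values = [4, 8, 6, 5, 3, 8]  # hypothetical example data

m = mean(values)
sd = stdev(values)                   # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(len(values))     # standard error of the mean = SD / sqrt(n)
z = NormalDist().inv_cdf(0.975)      # about 1.96; large-sample approximation to the t value
ci_low, ci_high = m - z * se, m + z * se
value_range = max(values) - min(values)  # range = maximum - minimum

print(f"SD = {sd:.3f}, SE = {se:.3f}")
print(f"95% CI (approx.) = {ci_low:.2f} to {ci_high:.2f}")
print(f"Range = {value_range}")
```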
What are the basic statistics?
The mean, mode, range, median, 95% confidence interval and standard deviation.
How do you find the basic statistics using SPSS?
1) Enter data into SPSS
2) Analyse
3) Descriptive statistics
4) Explore
5) Add the variable into the Dependent List
6) OK
YOU MUST HAVE ALL DATA IN SEPARATE COLUMNS (i.e. do not stack)
What is the p value?
The p value is the probability of getting results at least as extreme as those observed, assuming the null hypothesis is true
What is the null hypothesis?
The null hypothesis is that there is no difference (or no effect/relationship)
Why do we use statistics?
To test our data against the null hypothesis
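What does p < 0.05 mean?
p < 0.05 is described as significant.
It means that if the null hypothesis were true, a result at least as extreme as ours would happen less than 5% of the time, so we reject the null hypothesis. It does NOT mean you are 95% (or 99.95%) sure there is a real difference!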
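The definition of a p value can be made concrete with an exact permutation test. This is a different technique from the t tests used later, but it rests on the same idea: assume the null hypothesis (group labels don't matter), reshuffle the labels in every possible way, and count how often the difference is at least as large as the one observed. The data here are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-tailed permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    at_least_as_extreme = 0
    total = 0
    # Under the null hypothesis the labels do not matter, so try every relabelling
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # 2 of the 20 relabellings -> 0.1
```

The groups do not overlap, yet p = 0.1 here, which is not below 0.05. With only three values per group, a split this extreme would turn up 10% of the time by chance alone.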
Example of hypothesis and null hypothesis?
Hypothesis: Paracetamol will reduce symptoms of a headache
Null hypothesis: Paracetamol will not reduce symptoms of a headache (no difference)
What is normal distribution?
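Normal distribution is a symmetrical, bell-shaped distribution in which most values cluster around the mean (the mean, median and mode are the same).
Checking whether data are normally distributed:
- Tells us whether the data are representative of a “normal population”
- Is important for quantitative data
- Helps determine how we present data
- Helps determine which statistical test we use to analyse data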
What tests are used to check for normal distribution?
To see whether data are normally distributed, use:
- Kolmogorov-Smirnov test
OR
- Shapiro-Wilk test
When do you use a parametric or non-parametric test?
Normality test p value < 0.05 = NOT normally distributed = non-parametric test
Normality test p value > 0.05 = normally distributed = parametric test
How can you test for normality, i.e. with a Kolmogorov-Smirnov or Shapiro-Wilk test?
You can do this at the same time as the basic statistics in SPSS. YOU MUST HAVE ALL DATA IN SEPARATE COLUMNS (i.e. do not stack)
1) Enter data into SPSS
2) Analyse
3) Descriptive Statistics
4) Explore - Add the variable into the Dependent List
5) Click Plots
6) Tick Normality plots with tests
7) Tick Histogram
8) Continue, then OK
What happens if one group’s normality test is significant and the other group’s isn’t?
Treat the data as NOT normally distributed: both groups need to be normally distributed in order to use a parametric test.
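The normality decision rule can be written as a small function. It is a sketch that takes the normality test p value for each group (hypothetical inputs) and returns parametric only when every group passes. Treating p exactly equal to 0.05 as not normal is an assumption, since the cards do not cover that case.

```python
def choose_test_family(normality_p_values, alpha=0.05):
    """Return 'parametric' only if every group's normality test is non-significant."""
    if all(p > alpha for p in normality_p_values):
        return "parametric"
    return "non-parametric"


print(choose_test_family([0.20, 0.35]))  # both groups normal
print(choose_test_family([0.20, 0.01]))  # one group not normal
```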
What is a type 1 error?
Occurs when we believe that there is a genuine effect in our population, when in fact there isn’t (a false positive: rejecting a null hypothesis that is true)
What is a type 2 error?
Occurs when we believe there is no effect in the population, when in reality there is (a false negative: failing to reject a null hypothesis that is false).
What is one-tailed?
A test of a directional hypothesis, e.g. x will increase y, or x is greater than y
What is two-tailed?
A test of a non-directional hypothesis, e.g. x will change y (in either direction), or x is different from y
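The difference between one- and two-tailed p values can be seen with a z statistic, using the standard normal distribution from Python's `statistics` module. The z value of 1.8 is hypothetical, and the one-tailed figure assumes the result went in the predicted direction.

```python
from statistics import NormalDist


def z_test_p_values(z):
    """Return (one-tailed, two-tailed) p values for a z statistic."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    one_tailed = upper_tail       # only valid if the effect is in the predicted direction
    two_tailed = 2 * upper_tail   # extreme results in either direction count
    return one_tailed, two_tailed


one, two = z_test_p_values(1.8)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

Here the same result is significant one-tailed (about 0.036) but not two-tailed (about 0.072), which is why the direction has to be decided before looking at the data.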
What do we do with parametric data?
- Use when data is normally distributed
When comparing 2 groups:
- Independent t test - two different experimental conditions/groups
- Dependent t test - two groups that are dependent on each other
Comparing 3 or more groups:
- One-way ANOVA - you have 3 or more groups
How can you determine if your data is dependent or independent?
Independent = compare participants in different groups, e.g. different people receiving different treatments.
Dependent = compare participants in the same group, e.g. the same people tested twice (such as before and after a treatment)
What to do with 2 groups and data that is normally distributed?
2 groups and data that is normally distributed:
- Dependent on each other = dependent t test
- Independent groups = independent t test
How do you do a dependent t test?
1) Analyse
2) Compare means
3) Paired-Samples T Test
Then add the data from test 1 under Variable 1 and the data from test 2 under Variable 2, then press OK
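What the paired samples t test calculates can be sketched by hand: it works on each person's difference score. This returns only the t statistic and degrees of freedom, because turning them into a p value needs the t distribution, which the Python standard library does not include. The before/after scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired (dependent) t statistic and degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per participant
    n = len(diffs)
    se_diff = stdev(diffs) / math.sqrt(n)            # standard error of the mean difference
    return mean(diffs) / se_diff, n - 1


t, df = paired_t(before=[10, 12, 9, 11, 13], after=[12, 13, 11, 12, 15])
print(f"t({df}) = {t:.2f}")
```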
How do you do an independent t test?
Stack the variables in one column. Then add group numbers (e.g. 1 and 2) into the next column and define the groups
1) Analyse
2) Compare Means
3) Independent-Samples T Test
Then add the column of stacked data into the Test Variable box and the groups column into the Grouping Variable box.
Define groups and press OK
Also check Levene’s test to see whether equal variances can be assumed: if Levene’s p > 0.05, read the “Equal variances assumed” row; if p < 0.05, read the “Equal variances not assumed” row.
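The two rows SPSS shows for an independent t test correspond to two formulas. A sketch of the t statistics only (no p values), with hypothetical data: the pooled version is used when equal variances are assumed, and Welch's version when they are not.

```python
import math
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t with equal variances assumed, Welch's t with equal variances not assumed)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)
    # Equal variances assumed: pool the two variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Equal variances not assumed (Welch): keep each group's own variance
    t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_equal, t_welch


t_equal, t_welch = independent_t([1, 2, 3, 4, 5], [2, 10])
print(f"equal variances assumed: t = {t_equal:.2f}")
print(f"equal variances not assumed: t = {t_welch:.2f}")
```

With equal group sizes the two t values coincide; they drift apart when the groups differ in both size and spread, as in this example.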
What does sig .000 mean?
p < 0.001. SPSS rounds to three decimal places, so report it as p < 0.001 (never p = 0)
What is Levene’s test?
Tests whether the variances of the groups are similar (homogeneity of variance)
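Levene's test is itself a one-way ANOVA on how far each value sits from its own group mean. A sketch of the classic mean-centred statistic W, with hypothetical groups; it returns W only, not the p value.

```python
from statistics import mean


def levene_statistic(*groups):
    """Levene's W: a one-way ANOVA F on absolute deviations from each group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(groups)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    # Between-group and within-group sums of squares of the deviations
    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


print(levene_statistic([1, 2, 3], [11, 12, 13]))  # same spread
print(levene_statistic([1, 2, 3], [0, 10, 20]))   # very different spread
```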
What does ANOVA tell us?
ANOVA (analysis of variance) tells us whether there is a difference somewhere among the group means
But an ANOVA DOES NOT tell us where the difference is, e.g. is group A different from groups B and C, or just from group B?
We use post hoc tests to compare each pair of groups and work out which groups differ from each other
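The core of a one-way ANOVA is the F ratio: variation between the group means divided by variation within the groups. A sketch with hypothetical data that returns F and its degrees of freedom. It gives no p value and no post hoc comparisons, so on its own it cannot say which groups differ.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F ratio and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    k, n_total = len(groups), len(all_values)
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)       # mean square between groups
    ms_within = ss_within / (n_total - k)   # mean square within groups
    return ms_between / ms_within, (k - 1, n_total - k)


f, df = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F{df} = {f:.1f}")
```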
What do we do for 3 or more groups?
If 3+ groups and data is normally distributed:
Enter data into SPSS - Analyse - Compare Means - One-Way ANOVA - Post Hoc: select your post hoc tests (LSD and Tamhane’s T2) - Options - choose Homogeneity of variance test
Check the ANOVA sig (p value): is it < 0.05?
- No = just report the ANOVA, no need for further analysis
- Yes = check the homogeneity of variance test: is that significant?
  - No = use LSD post hoc results
  - Yes = use Tamhane post hoc results
MUST GROUP DATA, E.G. ADD VARIABLE 1 TO THE DEPENDENT LIST BOX AND GROUPS/VARIABLE 2 TO THE FACTOR BOX.
CHECK DESCRIPTIVES TO GET BASIC STATS TOO
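The post hoc decision flowchart can be written as a short function. This is just the rule from the card in code form. The input p values are hypothetical, and treating p exactly equal to 0.05 as non-significant is an assumption.

```python
def anova_next_step(anova_p, homogeneity_p, alpha=0.05):
    """Follow the flowchart: ANOVA first, then the homogeneity of variance test."""
    if anova_p >= alpha:
        return "Report the ANOVA; no further analysis needed"
    if homogeneity_p < alpha:
        return "Use Tamhane post hoc results"  # variances differ between groups
    return "Use LSD post hoc results"          # equal variances can be assumed


print(anova_next_step(anova_p=0.003, homogeneity_p=0.40))
```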
How do you interpret the ANOVA output?
If the ANOVA p < 0.05, check the homogeneity of variance test.
If the homogeneity of variance test is > 0.05, use LSD post hoc; if it is < 0.05, use Tamhane’s.
What is ANCOVA?
Compares several means but adjusts for the effects of one or more other variables (covariates)
What is repeated measures ANOVA?
An ANOVA used when the same participants have been measured in all conditions of the independent variable(s)
What is factorial ANOVA?
An ANOVA with 2 or more independent variables (factors)
Tell me about non-parametric data.
- Use when data is NOT normally distributed
Comparing 2 groups:
- Mann-Whitney test - 2 different experimental conditions/groups, a non-parametric version of an independent t test
- Wilcoxon signed-rank test - 2 groups that are dependent on each other, a non-parametric version of a dependent t test
Comparing 3 or more groups:
- Kruskal-Wallis test - you have 3 or more groups, a non-parametric version of a one-way ANOVA
How do you do a Wilcoxon signed-rank test?
1) Analyse
2) Nonparametric Tests
3) Related Samples
Under the Fields tab, move your two variables into the Test Fields box
Go to Settings, choose Customize tests and select Wilcoxon matched-pair signed-rank (2 samples)
Press Run
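A sketch of what the Wilcoxon signed-rank test computes, with hypothetical scores: drop zero differences, rank the absolute differences (tied values get the average rank), then add up the ranks of the positive and negative differences separately. It returns the rank sums only, not the p value SPSS reports.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W, sum of positive ranks, sum of negative ranks)."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), w_plus, w_minus


print(wilcoxon_signed_rank(before=[10, 12, 9, 11, 13, 8], after=[12, 13, 11, 10, 15, 8]))
```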
How do you do a Mann-Whitney U test?
Stack the variables in one column. Then add group numbers into the next column, e.g. 1 and 2. Then define the groups
1) Analyse
2) Nonparametric Tests
3) Independent Samples
4) Automatically compare distributions across groups
Then, under the Fields tab, add the column of stacked data into the Test Fields box and the groups column into the Groups box
Settings - Customize tests - select Mann-Whitney U (2 samples) - Run
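The Mann-Whitney U statistic can be sketched by comparing every value in one group with every value in the other and counting how often the first group's value is larger (ties count as a half). The data are hypothetical and no p value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U: the smaller of the two groups' pairwise 'win' counts."""
    u_a = 0.0
    for x in group_a:
        for y in group_b:
            if x > y:
                u_a += 1      # group A value beats group B value
            elif x == y:
                u_a += 0.5    # ties are split
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


print(mann_whitney_u([1, 4, 6], [2, 3, 5]))
```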
How do you do a Kruskal-Wallis test?
3+ groups and data is NOT normally distributed
1) Enter data into SPSS
2) Stack the data as you would for a one-way ANOVA
3) Analyse
4) Nonparametric Tests
5) Independent Samples
6) Automatically compare distributions across groups
7) Add BMD/variable 1 into the Test Fields box and groups/variable 2 into the Groups box
Go to Settings - Customize tests - select Kruskal-Wallis - Run
Check the sig (p value): is it < 0.05?
If yes, run Mann-Whitney tests between each pair of groups, report the p values and determine which groups differ significantly
If no, just report the p value; no need for further analysis.
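Kruskal-Wallis ranks all the values together and checks whether the average rank differs between groups. A sketch of the H statistic with hypothetical data. It omits the correction for ties that statistical packages usually apply, and it does not compute a p value.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction) from the group rank sums."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # rank all values together
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # this group's share of the ranks
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```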
How do you run a post hoc analysis?
To run a post hoc analysis (a bit like we did with a one-way ANOVA) we need to run individual Mann-Whitney tests, which is a bit time-consuming
N.B. You can only compare 2 groups at a time; it will not work otherwise
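Every pair of groups needs its own test, so k groups need k(k - 1)/2 Mann-Whitney tests. A quick sketch listing the pairs for four hypothetical groups:

```python
from itertools import combinations

groups = ["Group 1", "Group 2", "Group 3", "Group 4"]  # hypothetical group labels
pairs = list(combinations(groups, 2))  # every pair of groups, each listed once
for first, second in pairs:
    print(f"Mann-Whitney test: {first} vs {second}")
print(len(pairs), "tests in total")  # k groups need k * (k - 1) / 2 tests
```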
What is correlation?
Correlation measures the strength and direction of the relationship between 2 variables.
Positive = as one increases, the other increases
Negative = as one increases, the other decreases
There are 2 types of test:
For parametric data = Pearson’s correlation coefficient
For non-parametric data = Spearman’s correlation coefficient
What is a correlation coefficient?
A correlation coefficient has to lie between -1 and +1
+1 = perfect positive
-1 = perfect negative
0 = no relationship
Correlation grading
0.0 - 0.3 = Poor correlation
0.3 - 0.5 = Low correlation
0.5 - 0.7 = Moderate correlation
0.7 - 0.9 = Strong correlation
0.9 - 1 = Very strong correlation
The grading applies to both positive and negative values (e.g. -0.8 = strong negative correlation)
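The grading bands can be applied with a small function. A value that sits exactly on a boundary (e.g. 0.3) goes into the higher band here. That is an assumption, since the card's ranges share their end points.

```python
def grade_correlation(r):
    """Label a correlation coefficient using the grading bands on the card."""
    if r == 0:
        return "No relationship"
    size = abs(r)  # the bands apply to positive and negative values alike
    if size < 0.3:
        label = "Poor"
    elif size < 0.5:
        label = "Low"
    elif size < 0.7:
        label = "Moderate"
    elif size < 0.9:
        label = "Strong"
    else:
        label = "Very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


print(grade_correlation(0.796))
print(grade_correlation(-0.45))
```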
How do you do a Pearson correlation?
Is your data normally distributed?
Yes = Pearson
Parametric data = normally distributed
Enter the data in columns next to each other
Analyse - Correlate - Bivariate - tick Pearson for parametric data
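Pearson's r can be computed directly from its definition. A sketch with hypothetical data (Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient):

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired values x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # how x and y vary together
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```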
How do you do a Spearman correlation?
Is your data normally distributed?
No = Spearman
Non-parametric = not normally distributed
Enter the data in columns next to each other
Analyse - Correlate - Bivariate - tick Spearman for non-parametric data
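Spearman's coefficient is Pearson's r calculated on the ranks of the data instead of the raw values. A sketch with hypothetical data shows why it suits non-parametric data: a relationship that always goes up but not in a straight line still scores a perfect 1.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    return pearson_r(average_ranks(x), average_ranks(y))


x, y = [1, 2, 3, 4], [1, 4, 9, 16]  # always increasing, but not a straight line
print(pearson_r(x, y), spearman_rho(x, y))
```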
How do you report the correlation?
You should report the Pearson correlation coefficient as r, together with the p value
For example:
r = 0.796 (p < 0.001) indicates a strong positive correlation
You should report the Spearman correlation coefficient as rho (ρ, sometimes written rs), together with the p value
For example:
ρ = 0.796 (p < 0.001) indicates a strong positive correlation
What is regression?
Simple regression = Predict an outcome variable from one predictor variable
Multiple regression = predict an outcome from several predictor variables
Binary logistic regression = predict a categorical outcome with two categories (e.g. yes/no) from one or more predictor variables.
How do you do a simple regression?
Dependent = what you are trying to predict (the outcome)
Independent = what you are using to predict it (the predictor)
Add data into SPSS next to each other - Analyse - Regression - Linear
R = your Pearson’s correlation (in simple regression)
R square = how much of the variation in the outcome is accounted for by the predictor, e.g. 0.948 = 94.8%
Sig < 0.05 (e.g. < 0.001) = the predictor does significantly predict the outcome
Multiple regression = aim for n = 5 - 10 people
It tells us which variables predict the outcome
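The main numbers in a simple linear regression output can be sketched with least squares: the slope and intercept of the best-fit line, and R square as the share of variation in the outcome that the line explains. The data are hypothetical; for one predictor, R square equals Pearson's r squared (here r = 0.8, so R square = 0.64).

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares slope, intercept and R square for one predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    predictions = [intercept + slope * a for a in x]
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predictions))  # variation left unexplained
    ss_total = sum((b - my) ** 2 for b in y)                         # total variation in outcome
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


slope, intercept, r_squared = simple_regression([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"y = {intercept:.2f} + {slope:.2f}x, R square = {r_squared:.2f}")
```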
What is nominal data?
Nominal (categories with no obvious order): gender, smoker or non-smoker, eye colour
What is ordinal data?
Ordinal (categories that can be ranked in order): shoe size, age categories
Why do we use Pearson’s chi-squared test?
To look at the relationship between two categorical variables
Chi-squared tests the null hypothesis that there is no association between the two variables (i.e. they are independent)
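Pearson's chi-squared statistic compares the observed count in each cell with the count expected if the two variables were unrelated. A sketch for a hypothetical 2 x 2 table (e.g. sex by smoking status), without Yates' continuity correction and without a p value:

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = male/female, columns = smoker/non-smoker
chi2, df = chi_squared([[20, 30], [30, 20]])
print(f"chi-squared({df}) = {chi2:.2f}")
```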