Stats Flashcards
Frequency
Number of times each score occurs.
Normal distribution
Most scores cluster around the mean, with few outliers at the extremes.
Recognisable by: bell-shaped curve
Positively skewed distribution
Frequent scores are clustered at the lower end.
Recognisable by: a slide down to the right.
Negatively skewed distribution
Frequent scores are clustered at the higher end.
Recognisable by: a slide down to the left.
Platykurtic distribution
Scores are spread widely, producing a low, flat curve.
Recognisable by: thick “platypus” tail that’s low and flat in the graph.
Leptokurtic distribution
Scores are heavily concentrated around the mean.
Recognisable by: skyscraper appearance - long and pointy.
Mode
Most common score.
If two scores are equally common the distribution is bimodal and no single mode can be reported.
Can use mode for nominal data.
Disadvantages of mode
1) Could be bimodal (or multimodal), giving no single representative mode (e.g. modes at 3/10 and 7/10 sit at opposite ends of the scale).
2) A mode can be changed dramatically if one single case is added.
Median
Central point of scores in ascending data. Middle number in odd number of cases. If even - mean of 2 central numbers.
+ Relatively unaffected by outliers
+ Less affected by skewed distribution
+ Ordinal, interval, and ratio data
- Can’t use on nominal
- Susceptible to sampling fluctuation
- Not mathematically useful (cannot be used in further calculations)
Mean
Add all scores and divide by total number of scores collected.
+ Good for scores grouped around a central value
+ Interval and ratio data
+ Uses every score
+ Can be used algebraically
+ Resistant to sampling variation - accurate
- Affected by extreme outliers
- Affected by skewed distributions
- Not used for nominal and ordinal
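A minimal Python sketch (hypothetical scores, standard-library statistics module) showing how the three measures of central tendency are computed:

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]  # hypothetical scores; 30 is an outlier

mode = statistics.mode(scores)      # most common score
median = statistics.median(scores)  # middle value of the sorted scores
mean = statistics.mean(scores)      # sum of scores / number of scores

print(mode, median, mean)  # the outlier pulls the mean up but barely moves the median
```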
Range
Subtract smallest score from largest.
+ Good when scores are clustered together
+ Useful measure of variability for nominal & ordinal data
Symbols
x = score
x̅ = mean
x - x̅ = deviation (d)
∑ = sum
N = number in a sample
s² = variance of a sample
s = standard deviation
Accuracy of the mean
A hypothetical value - it may not correspond to any real observed value (e.g. 2.5 children). Its accuracy is assessed using:
1) Standard deviation
2) Sum of squares
3) Variance
Total error
Worked out by adding all the deviations. Positive and negative deviations cancel out, which is why squared deviations are used instead.
Deviation
Observed value - mean
Negative deviation = the mean overestimates this participant's score
Positive deviation = the mean underestimates it
Sum of squared errors (SS)
Square all deviations so they become positive.
Add them together to make a sum of squares.
The higher the sum of squares, the more variance in the data.
More variance = less reliable
Standard deviation (σ)
A measure of spread - how much scores deviate from the mean on average.
Scores within one standard deviation of the mean reflect expected variation; scores more than 2 standard deviations from the mean are treated as statistically significant.
Calculated from SS: divide SS by N − 1 (for a sample) to get the variance, then take the square root.
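A minimal Python/NumPy sketch (hypothetical scores) showing the chain from deviations to sum of squares, variance, and standard deviation:

```python
import numpy as np

scores = np.array([8, 9, 10, 11, 12], dtype=float)  # hypothetical scores
mean = scores.mean()

deviations = scores - mean          # observed value - mean (sums to zero)
ss = np.sum(deviations ** 2)        # sum of squared errors (SS)
variance = ss / (len(scores) - 1)   # sample variance, s^2
sd = np.sqrt(variance)              # standard deviation, s

print(ss, variance, sd)             # 10.0, 2.5, ~1.58
```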
Sampling distribution
The frequency distribution of sample means from the same population.
Standard error of the mean (SE)
The accuracy with which a sample mean reflects the population mean. It is the standard deviation of the sampling distribution of the mean (SE = s / √N).
Large value = different from the population
Small value = reflective of population
Confidence interval
If we can assess the accuracy of sample means, we can calculate the boundaries within which most sample means will fall. This is the confidence interval.
If the sample mean represents the data well, its confidence interval will be small.
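A minimal Python sketch (hypothetical sample, SciPy assumed available) computing the standard error and a 95% confidence interval for a sample mean:

```python
import numpy as np
from scipy import stats

sample = np.array([8, 9, 10, 11, 12], dtype=float)  # hypothetical sample

se = stats.sem(sample)  # standard error = s / sqrt(N)
ci_low, ci_high = stats.t.interval(
    0.95,                 # 95% confidence level
    len(sample) - 1,      # degrees of freedom
    loc=sample.mean(),
    scale=se,
)
print(se, ci_low, ci_high)
```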
Descriptive statistics
Shows what is happening in a given sample.
Inferential statistics
Allows us to make inferences about the population based on the sample data we have analysed.
At what probability value can we accept a hypothesis and reject a null hypothesis?
0.05 or less.
Type 1 error
When we believe our experimental manipulation has been successful when the result is actually due to random error. E.g. with a 5% significance level, if the experiment were repeated 100 times we would still expect about 5 statistically significant results caused by random error alone.
Type 2 error
Accepting the difference found was due to random errors when it was actually due to the independent variable.
Effect size
An objective and standardised measure of the magnitude of the observed effect. As it is standardised, we can compare effect sizes across different studies.
Pearson’s correlation coefficient (r)
Measures the strength of a correlation between 2 variables. Also a versatile measure of the strength of an experimental effect. How big are the differences (the effect)?
0 = no effect 1 = perfect effect
Cohen’s guidelines to effect size
r = 0.10 (small effect): explains 1% of total variance.
r = 0.30 (medium effect): 9%
r = 0.50 (large effect): 25%
N.B. Not a linear scale (i.e. .6 is not double .3)
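A minimal SciPy sketch (hypothetical paired scores) computing Pearson's r and the proportion of variance it explains (r²):

```python
from scipy import stats

# hypothetical paired scores for two variables
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 8, 6, 9]

r, p = stats.pearsonr(x, y)
print(r, r ** 2)  # r = effect size; r**2 = proportion of variance explained
```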
Why use effect size?
To show the magnitude of the effect we are observing - how substantial is a [p < .05] significant result?
Is not affected by sample size in the way that p is.
Properties linked to effect size:
1) Sample size on which the sample effect size is based.
2) The probability level at which we will accept an effect as being statistically significant.
3) The power of the test to detect an effect of that size.
Statistical power
The probability that a given test will find an effect, assuming one exists in the population. Can be done before a test to reduce Type II error.
Must be 80% or higher.
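A sketch of a power analysis using statsmodels (assuming it is available); note that statsmodels expresses effect size as Cohen's d rather than r:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the sample size per group needed to detect a medium effect
# (Cohen's d = 0.5) with alpha = .05 and power = .80 in an independent t-test.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)  # roughly 64 participants per group
```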
What assumptions need to be met for parametric test
1) Data measured at interval or ratio level.
2) Homogeneity of variance (Levene’s test)
3) Sphericity assumption (Mauchly's test)
4) Sample should form a normal distribution.
Checks for normal distribution
1) Plot a histogram to see if data is symmetrical.
2) Kolmogorov-Smirnov test or Shapiro-Wilk test.
3) Mean and median should be less than half a standard deviation different.
4) Kurtosis and skew figures should be less than 2x their standard error figure.
Kolmogorov-Smirnov or Shapiro-Wilk
Compares your set of scores with a normally distributed set (with the same mean and standard deviation)
We do not want our data to differ significantly from the normal set: if p < 0.05 the data deviate significantly from normality.
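A minimal SciPy sketch (hypothetical scores) running both normality tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=40)  # hypothetical scores

w, p_shapiro = stats.shapiro(scores)  # Shapiro-Wilk
d, p_ks = stats.kstest(scores, 'norm', args=(scores.mean(), scores.std(ddof=1)))

# p >= .05 for either test means the scores do not differ
# significantly from a normal distribution.
print(p_shapiro, p_ks)
```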
Homogeneity of variance
Individual scores in samples vary from the mean in a similar way.
Tested using Levene’s test.
Levene’s test
Measures homogeneity of variance, i.e. whether the individual scores in the samples vary from the mean in a similar way.
An assumption for a parametric test.
If unequal group sizes, we must run additional tests: Brown-Forsythe F and Welch’s F adjustments.
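A minimal SciPy sketch (hypothetical groups) running Levene's test:

```python
from scipy import stats

# hypothetical scores for two groups
group_a = [12, 14, 15, 13, 16, 14]
group_b = [11, 19, 9, 21, 13, 17]

w, p = stats.levene(group_a, group_b, center='median')

# p < .05 means the variances differ significantly
# (homogeneity of variance is violated).
print(w, p)
```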
T-test
The difference between means as a function of the degree to which those means would be expected to differ by chance alone.
Independent t-test
An experiment on two groups in a between subjects test to see if the difference in means is statistically significant.
Degrees of freedom
The number of independent values that are free to vary in a statistical calculation.
N - 1
Dependent t-test
An experiment on two groups in a within subjects test to see if the difference in means is statistically significant.
Pearson's correlation coefficient (r) can be calculated afterwards to measure effect size.
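A minimal SciPy sketch (hypothetical scores) running both kinds of t-test; the r-from-t conversion shown is a common textbook formula, not part of SciPy:

```python
import numpy as np
from scipy import stats

# hypothetical scores
group_a = [20, 22, 19, 24, 25, 21]  # between-subjects group 1
group_b = [27, 29, 26, 30, 28, 31]  # between-subjects group 2
pre     = [10, 12, 11, 14, 13, 15]  # within-subjects: before
post    = [13, 15, 12, 17, 16, 18]  # within-subjects: after

t_ind, p_ind = stats.ttest_ind(group_a, group_b)  # independent t-test
t_rel, p_rel = stats.ttest_rel(pre, post)         # dependent (paired) t-test

# Effect size from t (a common conversion): r = sqrt(t^2 / (t^2 + df))
df = len(pre) - 1
r = np.sqrt(t_rel ** 2 / (t_rel ** 2 + df))
print(p_ind, p_rel, r)
```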
Effect of sample size
With smaller samples the t-test has to detect a bigger effect to reach significance.
With larger samples even a small effect can reach significance.
ANOVA
Analyses 3 or more levels of the independent variable in a single test.
Explores data from between groups studies.
Done instead of a chain of t-tests to reduce the risk of a Type I error.
F-ratio
Test statistic produced by ANOVA. Tells us if the means of three or more samples are equal or not.
Compares the systematic variance in the data (model, SSm) to the amount of unsystematic variance (residual, SSr).
Orthogonal
A planned ANOVA contrast used when there is a control group.
Non-orthogonal
A planned ANOVA contrast when there is no control group.
Follow-up tests
Done after an ANOVA to see where the difference lies.
Planned comparisons
Made when specific predictions about which group means should differ are formulated before the data are collected.
Post-hoc tests
Done after the data has been collected and inspected. More cautious and generally easier to do.
One-way independent ANOVA
Used when you’re going to test 3 or more experimental groups and different participants are used in each group (between subjects design).
When one independent variable is manipulated.
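A minimal SciPy sketch (hypothetical groups) of a one-way independent ANOVA:

```python
from scipy import stats

# hypothetical scores from three independent groups
group_1 = [4, 5, 6, 5, 4]
group_2 = [6, 7, 8, 7, 6]
group_3 = [9, 8, 10, 9, 11]

f, p = stats.f_oneway(group_1, group_2, group_3)
print(f, p)  # a significant F says the means differ somewhere;
             # follow-up tests show where
```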
One-way repeated measures ANOVA
Used when you’re going to test 3 or more experimental groups and the same participants are used in each group (within subjects design).
When one independent variable is manipulated.
Two-way ANOVA
Analysing 3 or more samples with 2 independent variables.
Three-way ANOVA
When 3 or more samples are tested with 3 independent variables.
MANOVA
An ANOVA (an analysis of three or more samples) with multiple dependent variables. It tests for two or more vectors of means.
Choosing Tukey HSD as post-hoc
Equal sample sizes and confident in homogeneity of variances assumption being met.
Most commonly used - conservative but good power.
Choosing Bonferroni as post-hoc
For guaranteed control over the Type I error rate.
Very conservative so not as popular as Tukey but still favoured amongst researchers.
Choosing Gabriel’s as post-hoc
If sample sizes across groups are slightly different.
Choosing Hochberg’s GT2 as post-hoc
If sample sizes are greatly different.
Choosing Games-Howell as post-hoc
If any doubt whether the population variances are equal.
Recommended to be run with other tests anyway because of uncertainty of variances.
Choosing REGWQ as post-hoc
Useful for 4+ groups being analysed.
Interpreting ANOVA
Degrees of freedom of groups, then degrees of freedom of residuals of the model (participants minus groups). Followed by F value, then significance.
F(5, 114) = 267, p < .001
Sphericity
We assume that the relationship between one pair of conditions is similar to the relationship between another pair of conditions.
Measured by Mauchly’s test.
The effect of violating sphericity is a loss of power (increases chance of Type II error).
Variance
How far scores are spread out from their average (the mean of the squared deviations).
Mauchly's test
Used to measure sphericity.
If it is significant we conclude that some pairs of conditions are more related than others and the conditions of sphericity have not been met.
If sphericity is violated
Corrections can be made using:
1) Greenhouse-Geisser
2) Huynh-Feldt
3) The lower-bound
Two-way mixed ANOVA
Measuring one independent variable with between groups, and one independent variable within groups.
Two overall independent variables.
Main effects
The effect of an independent variable on a dependent variable, ignoring the other independent variables.
Significant main effect of group (generally between subjects)
There are significant differences between groups.
Significant main effect of time (generally within subject)
There are significant differences between repeated measures samples.
A significant interaction effect
The effect of one independent variable depends on the level of the other.
So the change in scores over time (within subjects) differs depending on group membership (independent measures).
Cohen’s kappa
A measure of inter-rater agreement for categorical (qualitative) data, corrected for chance agreement.
Two-way repeated measures ANOVA
Two independent variables measured using the same participants. Repeated measures.
ANCOVA
An ANOVA that additionally investigates (and controls for) the effect of covariates. We assume that our covariates have some effect on the dependent variable.
We assume the covariate has a correlation with the dependent variable in all groups.
Reduces error variance.
Covariates
Variables that we already know will influence the study (e.g. age, memory, etc.).
ANCOVA assumptions
Same parametric assumptions as all the basic tests, plus the additional assumption of homogeneity of regression slopes.
MANCOVA
An ANCOVA (an analysis of three or more samples) with multiple dependent variables affected by covariates. It tests for two or more vectors of means.
Non-parametric tests
Also known as assumption-free tests. We would use these tests if the typical assumptions aren’t met.
Transforms raw scores into ordinal data so assumptions are not needed. Analysis is performed on ranks.
Uses medians instead of means.
Power of non-parametric tests
Reduced, leading to increased chance of Type II error.
Mann-Whitney
Non-parametric.
Equivalent of independent measures t-test (between-subjects).
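A minimal SciPy sketch (hypothetical groups) of a Mann-Whitney test:

```python
from scipy import stats

# hypothetical between-subjects scores
group_a = [3, 5, 4, 6, 2, 5]
group_b = [7, 8, 6, 9, 7, 10]

u, p = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
print(u, p)  # U is the test statistic; compare p with .05
```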
A good way to show non-parametric tests
Using a box-whisker plot.
Box-whisker plot
A shaded box represents the middle 50% of the data (the interquartile range), with lines (whiskers) on either side.
The horizontal bar is the median. Each whisker reaches to the highest and lowest value.
The overall shape shows the limits within which most or all of the data fall.
Reporting Mann-Whitney
E.g.
Men (Mdn = 27) and dogs (Mdn = 24) did not significantly differ in the extent to which they displayed dog-like behaviours, U = 194.5, ns, Z=-.15.
U = the test statistic; Z = how far the result lies from the mean in standard deviations (0 is close)
Wilcoxon signed rank test
Non-parametric equivalent of dependent t-test (within-subjects studies).
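A minimal SciPy sketch (hypothetical before/after scores) of a Wilcoxon signed rank test:

```python
from scipy import stats

# hypothetical within-subjects scores (same participants measured twice)
before = [10, 12, 9, 14, 11, 13]
after  = [12, 15, 10, 17, 14, 16]

t_stat, p = stats.wilcoxon(before, after)
print(t_stat, p)  # T is the test statistic; compare p with .05
```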
Reporting Wilcoxon
E.g.
Men (Mdn = 27) and dogs (Mdn = 24) did not significantly differ in the extent to which they displayed dog-like behaviours, T = 111.50, p
Kruskal-Wallis test
Non-parametric equivalent of one-way independent ANOVA (between-subjects).
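A minimal SciPy sketch (hypothetical groups) of a Kruskal-Wallis test:

```python
from scipy import stats

# hypothetical scores from three independent groups
group_1 = [2, 3, 4, 3, 2]
group_2 = [5, 6, 5, 7, 6]
group_3 = [8, 9, 7, 9, 10]

h, p = stats.kruskal(group_1, group_2, group_3)
print(h, p)  # H is the test statistic with (groups - 1) degrees of freedom
```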
Following up non-parametric tests
None are very commonly used.
We can use Mann-Whitney follow-ups for each pair of IV groups.
Must consider using a Bonferroni correction to reduce Type I error.
Bonferroni correction
Used when running multiple follow-up tests (e.g. pairwise Mann-Whitney tests). Divide the critical value (.05) by the number of tests carried out.
E.g. following up a 3-group ANOVA means 3 pairwise tests, so the criterion becomes p < .0167.
Reporting Kruskal-Wallis
Children’s fear beliefs about clowns were significantly affected by the format of info given (H(3) = 17.06, p < .01).
H = test statistic; (3) = degrees of freedom
Chi-squared (χ2)
Tests for an association (correlation) between nominal variables.
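A minimal SciPy sketch (hypothetical contingency table) of a chi-squared test of association:

```python
from scipy import stats

# hypothetical contingency table of observed frequencies
# rows = condition (info format), columns = outcome (afraid / not afraid)
observed = [[30, 10],
            [18, 22]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)
```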
Friedman’s ANOVA
Non-parametric one-way dependent ANOVA (repeated measures).
Follow-ups with Wilcoxon tests, with the same Bonferroni corrections.
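A minimal SciPy sketch (hypothetical repeated-measures scores) of Friedman's ANOVA:

```python
from scipy import stats

# hypothetical repeated-measures scores: same participants in 3 conditions
cond_1 = [4, 5, 3, 6, 4, 5]
cond_2 = [6, 7, 5, 8, 6, 7]
cond_3 = [8, 9, 7, 9, 8, 10]

chi2, p = stats.friedmanchisquare(cond_1, cond_2, cond_3)
print(chi2, p)  # follow up a significant result with Wilcoxon tests
                # plus a Bonferroni correction
```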
Reporting Friedman’s ANOVA
Children’s fear beliefs about clowns were significantly affected by the format of info given (χ2(2) = 7.59, p < .05).
χ2 = test statistic; (2) = degrees of freedom
χ2 test
Used when nominal data is between-subjects.
Assumptions of χ2 (chi-squared)
1) Must be between-subjects.
2) Expected frequencies must be large (conventionally at least 5 per cell).
Binomial sign test
Nominal data but within-subjects. DV has 2 possible values: yes or no.
Choosing a test - 5 questions
1) What kind of data will I collect?
2) How many IVs will I use?
3) What kind of design will I use?
4) Independent measures or repeated?
5) Is my data parametric or non?
Spearman’s rho
Non-parametric equivalent to Pearson’s r.
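A minimal SciPy sketch (hypothetical paired scores) of Spearman's rho:

```python
from scipy import stats

# hypothetical paired scores (ordinal or non-normal data)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 7, 5, 8]

rho, p = stats.spearmanr(x, y)
print(rho, p)  # the analysis is performed on the ranks of the scores
```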