Stats Flashcards
(95 cards)
Frequency
Number of times each score occurs.
Normal distribution
Most scores cluster around the mean, with few outliers deviating far from it.
Recognisable by: bell-shaped curve
Positively skewed distribution
Frequent scores are clustered at the lower end.
Recognisable by: a slide down to the right.
Negatively skewed distribution
Frequent scores are clustered at the higher end.
Recognisable by: a slide down to the left.
Platykurtic distribution
Scores are spread widely around the mean, giving a flat, low distribution.
Recognisable by: thick “platypus” tail that’s low and flat in the graph.
Leptokurtic distribution
Scores are tightly clustered around the mean.
Recognisable by: skyscraper appearance - long and pointy.
Mode
Most common score.
If two scores are equally common, the distribution is bimodal and no single mode can be reported.
Can use mode for nominal data.
Disadvantages of mode
1) Could be bimodal (or multimodal) and give no true mode (e.g. 3/10 and 7/10 are opposites but bimodal).
2) A mode can be changed dramatically if one single case is added.
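The bimodal case above can be checked directly; a minimal Python sketch (the `modes` helper name is mine, not a standard library function):

```python
from collections import Counter

def modes(scores):
    # Return every score that occurs most often; more than one
    # result means the data is bimodal or multimodal.
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

print(modes([3, 7, 7, 3, 5]))  # [3, 7] -> bimodal, no single mode
```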
Median
Central point of scores in ascending data. Middle number in odd number of cases. If even - mean of 2 central numbers.
+ Relatively unaffected by outliers
+ Less affected by skewed distribution
+ Ordinal, interval, and ratio data
- Can’t use on nominal
- Susceptible to sampling fluctuation
- Not mathematically useful
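A minimal sketch of the median rule above (`median` here is hand-rolled for illustration, though Python's `statistics.median` does the same):

```python
def median(scores):
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    # Odd count: middle value; even count: mean of the two central values.
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

print(median([1, 3, 9]))       # 3
print(median([1, 3, 9, 100]))  # 6.0 -- the outlier barely moves it
```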
Mean
Add all scores and divide by total number of scores collected.
+ Good for scores grouped around the centre
+ Interval and ratio data
+ Uses every score
+ Can be used algebraically
+ Resistant to sampling variation - accurate
- Distorted by extreme outliers
- Affected by skewed distributions
- Not used for nominal and ordinal
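The outlier sensitivity noted above is easy to demonstrate; a minimal sketch:

```python
def mean(scores):
    # Add all scores and divide by how many there are.
    return sum(scores) / len(scores)

print(mean([2, 4, 6]))       # 4.0
print(mean([2, 4, 6, 100]))  # 28.0 -- one extreme outlier drags the mean up
```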
Range
Subtract smallest score from largest.
+ Good for cluster scores
+ Useful measure of variability for ordinal data (nominal data has no order, so no largest or smallest score)
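A one-line sketch of the range calculation (`data_range` is an illustrative name; Python's built-in `range` is unrelated):

```python
def data_range(scores):
    # Largest score minus smallest score.
    return max(scores) - min(scores)

print(data_range([4, 8, 15, 16, 23, 42]))  # 42 - 4 = 38
```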
Symbols
x = score
x̅ = mean
x − x̅ = deviation (d)
∑ = sum
N = number in a sample
s² = variance of a sample
s = standard deviation
Accuracy of the mean
A hypothetical value - it may not correspond to any real score (e.g. 2.5 children). How well it fits the data is assessed by:
1) Standard deviation
2) Sum of squares
3) Variance
Total error
Worked out by adding all the deviations. Deviations above and below the mean cancel, so the total error always sums to zero - which is why deviations are squared instead.
Deviation
Observed value - mean
Negative score = overestimate for this participant
Positive score = underestimate
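The two cards above can be verified directly - positive and negative deviations always cancel, which motivates squaring them. A minimal sketch (`deviations` is an illustrative helper):

```python
def deviations(scores):
    m = sum(scores) / len(scores)
    return [x - m for x in scores]  # observed value minus mean

d = deviations([2, 4, 9])  # mean = 5.0
print(d)       # [-3.0, -1.0, 4.0]
print(sum(d))  # 0.0 -- deviations cancel out
```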
Sum of squared errors (SS)
Square all deviations so they become positive.
Add them together to make a sum of squares.
The higher the sum of squares, the more variance in the data.
More variance = the mean represents the data less well.
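A minimal sketch of the sum of squared errors:

```python
def sum_of_squares(scores):
    m = sum(scores) / len(scores)
    # Square each deviation so negatives become positive, then add them up.
    return sum((x - m) ** 2 for x in scores)

print(sum_of_squares([2, 4, 9]))  # (-3)^2 + (-1)^2 + 4^2 = 26.0
```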
Standard deviation (σ)
A measure of spread - is it statistically significant or expected variance?
Anything within standard deviation value from mean would be expected variance. Anything outside 2 standard deviations is statistically significant.
Calculated by dividing SS by N (or N − 1 for a sample estimate) to get the variance, then taking the square root.
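A minimal sketch, assuming SS is divided by N − 1 for a sample (N for a whole population):

```python
def std_dev(scores, sample=True):
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)
    # Variance = SS / (N - 1) for a sample estimate, SS / N for a population;
    # the standard deviation is its square root.
    return (ss / (n - 1 if sample else n)) ** 0.5

print(std_dev([2, 4, 9]))  # sqrt(26 / 2) = sqrt(13) ~ 3.606
```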
Sampling distribution
The frequency distribution of sample means from the same population.
Standard error of the mean (SE)
The accuracy with which a sample mean reflects the population mean - it is the standard deviation of the sampling distribution of the mean.
Large value = different from the population
Small value = reflective of population
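The standard error can be estimated from a single sample as the sample standard deviation divided by √N; a minimal sketch:

```python
def standard_error(scores):
    n = len(scores)
    m = sum(scores) / n
    s = (sum((x - m) ** 2 for x in scores) / (n - 1)) ** 0.5  # sample SD
    return s / n ** 0.5  # SE = s / sqrt(N)

print(standard_error([2, 4, 9]))  # sqrt(13) / sqrt(3) ~ 2.08
```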
Confidence interval
If we can assess the accuracy of sample means, we can calculate the boundaries within which most sample means will fall. This is the confidence interval.
If the sample mean represents the population well, the confidence interval around it should be narrow.
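A minimal sketch of a 95% confidence interval, assuming a sample large enough that the bounds sit about 1.96 standard errors either side of the mean (smaller samples would use a t value instead):

```python
def confidence_interval_95(scores):
    n = len(scores)
    m = sum(scores) / n
    s = (sum((x - m) ** 2 for x in scores) / (n - 1)) ** 0.5
    se = s / n ** 0.5
    # Large-sample approximation: mean +/- 1.96 standard errors.
    return m - 1.96 * se, m + 1.96 * se

low, high = confidence_interval_95([2, 4, 9])
print(low, high)  # interval centred on the mean, 5.0
```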
Descriptive statistics
Shows what is happening in a given sample.
Inferential statistics
Allows us to make assumptions based on the information we have analysed.
At what probability value can we reject the null hypothesis and accept the experimental hypothesis?
0.05 or less (p ≤ 0.05).
Type 1 error
When we conclude our experimental manipulation has been successful when the result is actually due to random error (a false positive). E.g. if we accept 5% as the significance value and repeated the experiment 100 times, about 5 of those runs would reach statistical significance through random error alone.