DESCRIPTIVE STATISTICS Flashcards
Descriptive statistics
Offer researchers ways of describing and summarising quantitative data.
Raw scores may be meaningless and confusing, so we need to present the material in an understandable and informative way.
-Reader can then see main trends of research.
Measures of central tendency
These describe a data set by identifying one score that represents the general trend of data.
They describe how the data cluster together around a central point.
3 measures of central tendency
Mode
Median
Mean
Mode
The value that occurs most frequently.
advantage of mode
It is simple and not affected by one or two extreme scores (outliers).
Useful when data is in categories.
disadvantage of mode
Can be unreliable as there can be several modes or no mode at all.
Does not particularly represent central tendency.
If one score changes, the mode can change.
Relies on a score occurring more than others.
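The "several modes or no mode" problem can be seen in a minimal sketch. The data sets are invented for illustration, and `statistics.multimode` (Python 3.8+) is used because it returns every value tied for the highest frequency.

```python
from statistics import multimode

# Hypothetical score sets, chosen only to illustrate the three cases.
one_mode = multimode([2, 3, 3, 5, 7])       # 3 occurs most often
two_modes = multimode([2, 2, 3, 5, 5])      # 2 and 5 tie: bimodal
no_clear_mode = multimode([1, 2, 3, 4])     # every score occurs once

print(one_mode)        # [3]
print(two_modes)       # [2, 5]
print(no_clear_mode)   # [1, 2, 3, 4]
```

When every score occurs equally often, the result lists all of them, which is the "no mode at all" case.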
Median
Middle value in a set of data.
With an odd number of scores, the median is the middle score once the scores are put in order.
With an even number of scores, we take the 2 central values and find their average.
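The odd and even cases can be sketched as follows. The function name `median_of` and the scores are made up for illustration.

```python
def median_of(scores):
    """Return the middle value of the scores after sorting them."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the single middle value.
        return ordered[middle]
    # Even number of scores: average the two central values.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_median = median_of([7, 1, 5])       # ordered: 1, 5, 7
even_median = median_of([7, 1, 5, 3])   # ordered: 1, 3, 5, 7

print(odd_median)    # 5
print(even_median)   # 4.0
```

In the even case the result (4.0) is not one of the original scores.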
advantage of median
Not distorted by extreme values. Can give a representative value.
disadvantage of median
It ignores most of the scores, so it is less sensitive than the mean.
May not be an actual value in the data set if there is an even number of values (it has to be calculated).
Mean
‘Arithmetic average’
Calculates measure of central tendency by adding all the values and dividing by the number of values.
advantage of mean
Takes all scores into account, making it a sensitive measure of central tendency.
Misses nothing out, giving a valid measure.
disadvantage of mean
It can be misleading if there are one or two extreme scores in one direction. (Misrepresentative)
The mean is often a decimal, which can seem meaningless (e.g. 2.4 children).
Extreme scores
Can make measures unreliable as they misrepresent the true tendency of a data set by skewing it so it is too high or too low.
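A small sketch of how one extreme score pulls the mean but barely moves the median. The scores are hypothetical.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]        # hypothetical scores
with_outlier = scores + [45]    # the same scores plus one extreme score

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)   # 5.4 5
print(mean_after, median_after)     # 12 5.5
```

The mean more than doubles, while the median shifts by only half a point.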
Measures of dispersion
These measures tell us whether scores in a set of data are similar to each other or if they are SPREAD OUT.
e.g. range, standard deviation
Range
Simplest measure of dispersion.
Difference between lowest and highest numbers.
Advantage of range
Simple to calculate, takes into account extreme values.
Disadvantage of range
Outliers can greatly influence the range value.
Ignores ALL BUT 2 scores, so it is unlikely to provide an adequate measure of dispersion.
Some statistics books define range as…
Highest score minus lowest score PLUS ONE.
This is an inclusive measure of range rather than a difference between 2 scores.
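Both definitions of the range can be sketched in one function. The name `score_range`, its `inclusive` flag, and the scores are assumptions made for this example.

```python
def score_range(scores, inclusive=False):
    """Highest score minus lowest score, plus one if inclusive."""
    spread = max(scores) - min(scores)
    return spread + 1 if inclusive else spread

scores = [3, 7, 8, 12, 15]   # hypothetical scores

simple_range = score_range(scores)                     # 15 - 3
inclusive_range = score_range(scores, inclusive=True)  # 15 - 3 + 1

print(simple_range)      # 12
print(inclusive_range)   # 13
```

Either way, only the lowest and highest scores enter the calculation.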
repeated measures design and range
The researcher may expect the range of scores to be very similar across conditions, as the same participants are used.
Variance (S²)
Variance tells us more than the range.
Rather than looking at only 2 extremes of the data set, variance considers the DIFFERENCE between each data point and the mean (deviation).
These deviations are then squared, added together and total is divided by the number of scores in the data set minus 1.
Formula for variance
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
(always start with brackets!)
symbol meanings
s² = variance
x = term in data set
x̄ = sample mean
Σ = sum of
n= sample size
Steps to calculate variance
- Calculate the mean (x̄)
- Write down the number of scores (n)
- Draw a table with 3 columns and write the scores (values of x) down the first column.
- Work out the difference between each score and the mean (the sign does not matter, because squaring removes it) x - x̄
- Square each of these differences. (x - x̄)²
- Add together the column of squared differences Σ ( x - x̄ ) ²
- Take the sum and divide it by n-1 (number of scores - 1).
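The steps above can be followed in a short sketch. The function name `sample_variance` and the scores are illustrative, and the result is cross-checked against `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

def sample_variance(scores):
    """Follow the table method: deviations, squares, sum, divide by n - 1."""
    n = len(scores)                              # number of scores
    x_bar = sum(scores) / n                      # the mean
    deviations = [x - x_bar for x in scores]     # column 2: x - x̄
    squared = [d ** 2 for d in deviations]       # column 3: (x - x̄)²
    return sum(squared) / (n - 1)                # Σ(x - x̄)² / (n - 1)

scores = [2, 4, 4, 5, 10]   # hypothetical scores with a mean of 5
# deviations: -3, -1, -1, 0, 5; squares: 9, 1, 1, 0, 25; sum: 36
s_squared = sample_variance(scores)

print(s_squared)   # 9.0
```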
Standard Deviation (SD)
( σ )
A measure of how spread out numbers are.
Variance is a squared number, so is not in same units as the mean.
Standard deviation uses square root, returning figure back to same units as the mean.
(Easier to make direct judgements about data set)
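A brief sketch of the units point, assuming hypothetical scores measured in seconds: the variance comes out in squared seconds, and the square root brings it back to seconds.

```python
import math
from statistics import stdev, variance

scores = [2, 4, 4, 5, 10]         # hypothetical scores, in seconds

s_squared = variance(scores)      # in squared seconds
s = math.sqrt(s_squared)          # back in seconds, like the mean

print(s_squared)   # 9
print(s)           # 3.0
```

`statistics.stdev` gives the same figure in a single call.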
Standard deviation as a measure of dispersion
Like the mean, it takes ALL VALUES in a data set into account when calculated.
We must assume we have a normal distribution before it is reasonable to use it (like the mean).
Normally distributed data
Around 2/3 (about 68.27%) of scores should lie within 1 SD of the mean.
When is SD used?
The n - 1 method for calculating standard deviation is used when we want to ESTIMATE the standard deviation of the POPULATION from a sample, not just describe the sample taken.
Large standard deviation
The scores are widely spread.
Small standard deviation
Closer grouping of scores around the mean.
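Large and small standard deviations can be compared with two invented data sets that share the same mean. `statistics.stdev` uses the n - 1 formula.

```python
from statistics import mean, stdev

tight = [9, 10, 10, 10, 11]   # hypothetical scores grouped near the mean
wide = [2, 6, 10, 14, 18]     # hypothetical scores spread far from the mean

print(mean(tight), mean(wide))       # 10 10
print(round(stdev(tight), 2))        # 0.71
print(round(stdev(wide), 2))         # 6.32
```

The means are identical, so only the SD shows the difference in spread.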
In a normal distribution, the percentages of values lying within a given number of standard deviations of the mean are fixed. They are approximately:
-68.27% of all values lie within 1SD either side of the mean.
-95.45% of all values lie within 2SD either side of the mean.
-99.73% of all values lie within 3SD either side of the mean.
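These percentages can be checked numerically. The sketch below uses the standard result that the proportion of a normal distribution within k SDs of the mean is erf(k / √2). The function name `proportion_within` is made up for this example. Figures quoted in textbooks can differ in the last digit because of rounding in printed tables.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

percentages = {k: round(100 * proportion_within(k), 2) for k in (1, 2, 3)}

for k, pct in percentages.items():
    print(f"within {k} SD: {pct}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```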
symbol for standard deviation
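σ (strictly, σ is the population standard deviation; the sample estimate, which divides by n - 1, is written s)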
Formula for standard deviation
$$\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Advantages of standard deviation
-More precise measure than the range: it takes all values into account, so it is less distorted by a single outlier than the range is.
-Shows how much data is clustered around a mean value.
-Gives more accurate idea of how data is distributed
Disadvantages of standard deviation
-May hide extreme values of data sets
-Doesn’t give you full range of data
-Can be hard to calculate
The normal distribution
In real-life situations, variables such as height, weight, shoe size, exam results, IQ scores etc. tend to show an approximately normal distribution when plotted on graphs.
-This information helps us make assumptions about the way populations are distributed.
Some of the most powerful statistical tests that researchers and students use assume that samples are drawn from normally distributed populations.
Characteristics of normal distributions
-Symmetry at the mean value
-The end points of the curve ("tails") approach the x axis but never meet it.
-The curve is bell shaped.
In normal distributed set of scores…
About 68.27% of the scores lie within 1 standard deviation above or below the mean.
Skewed distribution
When the curve is not symmetrical about the mean (or median or mode) point.
A skew can be positive or negative.
A positive skew occurs when
Most scores fall below the mean.
(Tail goes to the right of the x axis)
Most scores are bunched to the left, mode is to the left of the mean because the mean is affected by extreme scores tailing off to the right.
A negative skew occurs when
Most scores fall above the mean.
(Tail goes to the left of the x axis)
Most scores are bunched to the right, the mode is to the right of the mean because the mean is pulled down by extreme scores tailing off to the left.
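The ordering of mode, median and mean under each skew can be illustrated with invented data. The negatively skewed set is simply the mirror image of the positively skewed one.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: bunched low, tail to the right.
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
# Mirror image (20 - x): bunched high, tail to the left.
negative = [20 - x for x in positive]

print(mode(positive), median(positive), mean(positive))   # 2 3.0 5
print(mode(negative), median(negative), mean(negative))   # 18 17.0 15

# Most scores fall below the mean in the positive skew, above it in the negative.
below = sum(1 for x in positive if x < mean(positive))
above = sum(1 for x in negative if x > mean(negative))
print(below, above)   # 7 7
```

In the positive skew the order is mode < median < mean, and in the negative skew it reverses.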
Mode on distribution graphs
Always the highest point (occurs the most)