Describing data Flashcards
1
Q
Central tendency
A
- the middle of the collected data
- mean, median and mode are all measures of central tendency
2
Q
Mean
A
- sum of scores divided by number of scores
- influenced by all available scores
- easily influenced by outliers
- the larger the sample, the closer the sample mean tends to be to the true population mean
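A minimal sketch of the definition and of the outlier problem, using Python's standard `statistics` module. The scores are made up for illustration.

```python
from statistics import mean

scores = [4, 5, 5, 6, 7]        # hypothetical scores
with_outlier = scores + [45]    # the same scores plus one extreme value

mean_plain = mean(scores)           # sum of scores / number of scores = 27 / 5
mean_outlier = mean(with_outlier)   # 72 / 6

print(mean_plain, mean_outlier)
```

One extreme score more than doubles the mean here, even though five of the six scores are unchanged.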
3
Q
Geometric mean
A
- found by log-transforming the individual observations, averaging the logs, then back-transforming the average using the antilog
- for positively skewed data it is closer to the median than the arithmetic mean is, because the log-transformed values have a more symmetrical distribution
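The log, average, antilog steps can be sketched as below. The observations are hypothetical and must all be positive; `statistics.geometric_mean` (Python 3.8+) is shown only as a cross-check.

```python
import math
from statistics import geometric_mean, mean, median

values = [1, 10, 100]   # hypothetical positive, positively skewed observations

logs = [math.log(v) for v in values]   # 1. log-transform each observation
mean_log = sum(logs) / len(logs)       # 2. average the logs
gm_manual = math.exp(mean_log)         # 3. back-transform with the antilog

gm_builtin = geometric_mean(values)    # standard-library cross-check

print(gm_manual, gm_builtin, mean(values), median(values))
```

For these values the geometric mean matches the median (10), while the arithmetic mean is pulled up to 37 by the largest observation.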
4
Q
Weighted mean
A
- used when some observations are more or less valuable than others when reaching a summary measure
- individual values are multiplied by weights (constants) attached to them before averaging
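A small sketch of the calculation, assuming the usual convention of dividing the weighted sum by the sum of the weights. The values and weights are invented for illustration.

```python
values = [60, 80, 90]   # hypothetical observations
weights = [1, 1, 2]     # hypothetical weights: the last observation counts double

# Multiply each value by its weight, then divide by the total weight.
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
plain_mean = sum(values) / len(values)

print(weighted_mean, plain_mean)
```

The weighted mean moves towards 90 because that observation carries the largest weight.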
5
Q
Median
A
- the point value that divides a distribution into two equal sized groups
- half the scores fall below, half above
- aka 50th percentile
- not as influenced by extreme scores as mean but it ignores most of the available information
- it is preferable for ordinal data when treated as values (not as counts)
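A brief sketch with hypothetical scores, showing the middle value for odd and even counts and how little an extreme score changes it.

```python
from statistics import median

odd_count = [3, 1, 7, 5, 9]        # sorted: 1 3 5 7 9, middle value is 5
even_count = [3, 1, 7, 5]          # sorted: 1 3 5 7, average of 3 and 5
with_outlier = [3, 1, 7, 5, 900]   # the 9 replaced by an extreme score

median_odd = median(odd_count)
median_even = median(even_count)
median_outlier = median(with_outlier)

print(median_odd, median_even, median_outlier)
```

Only the rank order of the scores matters, which is why the median stays at 5 when 9 becomes 900.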
6
Q
Mode
A
- the most commonly occurring value in a distribution
- crude measure, mostly used for nominal data (frequencies)
- also useful for ordinal data to understand the most common rating obtained on a Likert scale
- like the median, it ignores most of the available information
- in a bimodal distribution, two values occur equally frequently
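A short sketch using the standard library (`multimode` needs Python 3.8+). The ratings are hypothetical.

```python
from statistics import mode, multimode

likert = [4, 2, 4, 5, 3, 4, 1]    # hypothetical Likert ratings; 4 occurs most often
bimodal = [1, 2, 2, 3, 4, 4, 5]   # 2 and 4 both occur twice

single_mode = mode(likert)        # the most commonly occurring value
both_modes = multimode(bimodal)   # every value tied for most common

print(single_mode, both_modes)
```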
7
Q
Skew
A
- in a normal (symmetric) distribution, mean, median and mode are equal
- positive skew: high-value outliers are present, making the mean higher than the median and the right tail longer than the left
- negative skew: low-value outliers make the mean lower than the median and the left tail longer than the right
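The mean and median relationship can be checked on two made-up samples, one with a high outlier and one with a low outlier. This is only an illustration of the direction of the difference, not a formal skewness statistic.

```python
from statistics import mean, median

positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]        # one high outlier (long right tail)
negative_skew = [1, 17, 18, 18, 18, 19, 19, 20]  # one low outlier (long left tail)

pos_mean, pos_median = mean(positive_skew), median(positive_skew)
neg_mean, neg_median = mean(negative_skew), median(negative_skew)

print(pos_mean, pos_median)   # mean above median
print(neg_mean, neg_median)   # mean below median
```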
8
Q
Range
A
- difference between the highest and lowest scores in a distribution
- easily determined when the data is arranged in a rank order (ascending or descending)
- very distorted by extreme scores
9
Q
Interquartile range
A
- the difference between the 75th and 25th percentile values
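A sketch comparing the range with the interquartile range on hypothetical scores. Percentile conventions differ between textbooks and software; `method="inclusive"` in `statistics.quantiles` is just one choice, so other tools may give slightly different quartiles.

```python
from statistics import quantiles

scores = [2, 4, 4, 5, 6, 7, 8, 9, 40]   # hypothetical scores with one extreme value

data_range = max(scores) - min(scores)  # highest score minus lowest score

# quantiles(..., n=4) returns the 25th, 50th and 75th percentile cut points.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                           # 75th percentile minus 25th percentile

print(data_range, q1, q3, iqr)
```

The extreme score of 40 inflates the range but leaves the interquartile range untouched.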
10
Q
Variance
A
- = sum of squared differences of individual observations from the mean / (number of observations - 1)
- N - 1 is the degrees of freedom, where N is the number of observations
- variance is high when scores are widely scattered
- low variance when scores cluster around mean
- expressed as squared units of the original measure
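The formula can be worked through step by step and compared with `statistics.variance`, which also divides by N - 1. The scores are hypothetical.

```python
from statistics import mean, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

m = mean(scores)                                     # 40 / 8 = 5
squared_diffs = [(x - m) ** 2 for x in scores]       # squared differences from the mean
manual_variance = sum(squared_diffs) / (len(scores) - 1)   # 32 / 7

library_variance = variance(scores)                  # sample variance (N - 1 denominator)

print(manual_variance, library_variance)
```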
11
Q
Standard deviation
A
- square root of variance
- measures dispersion
- describes the variability within the sample: how widely individual data points are spread around the mean
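A quick check, on hypothetical scores, that the standard deviation is the square root of the sample variance.

```python
import math
from statistics import stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

sd_from_variance = math.sqrt(variance(scores))   # square root of the variance
sd_builtin = stdev(scores)                       # sample standard deviation

print(sd_from_variance, sd_builtin)
```

Taking the square root returns the result to the original units of measurement, unlike the variance.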
12
Q
Coefficient of variation
A
- obtained by dividing the standard deviation by the mean and expressing this as a percentage
- measure of relative spread of the data
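A sketch of the calculation; `coefficient_of_variation` is a hypothetical helper name and the heights are invented. The same heights are given in two units to show that the result is a relative measure.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage."""
    return stdev(values) / mean(values) * 100

heights_cm = [150, 160, 170]   # hypothetical heights in centimetres
heights_m = [1.5, 1.6, 1.7]    # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(cv_cm, cv_m)
```

Both come out at about 6.25%, because changing the unit rescales the standard deviation and the mean by the same factor.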
13
Q
Standard error of the mean
A
- standard deviation divided by square root of sample size
- a larger sample gives a smaller SE
- describes the precision and uncertainty with which the sample represents the underlying population
- SE is always smaller than SD (for any sample of more than one observation)
- shows us how precise our estimate of the mean is
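A minimal sketch of the formula; `standard_error` is a hypothetical helper, and the SD of 10 and the sample sizes are made up to show the effect of sample size.

```python
import math

def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)

se_small_sample = standard_error(10, 25)    # 10 / 5
se_large_sample = standard_error(10, 100)   # 10 / 10

print(se_small_sample, se_large_sample)
```

Quadrupling the sample size halves the SE, and in both cases the SE is smaller than the SD of 10.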
14
Q
Box and whisker plot
A
- whiskers denote the range
- black horizontal line is the median
- rectangle runs from the end of the 1st quartile (25th percentile) to the beginning of the 4th quartile (75th percentile), i.e. it spans the interquartile range
15
Q
Stem and leaf plot
A
- the first few digits of each numerical observation (the stem) are listed along a vertical axis, and single digits (the leaves) are then added beside each stem to represent individual values
e.g.
1: 1 2 3 4 5
2: 2 2 4 5
3: 3 6 6 9 9 9
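A small sketch of how such a plot could be built, assuming non-negative whole numbers with the last digit as the leaf and the leaves sorted within each stem. `stem_and_leaf` is a hypothetical helper, and the values are the ones implied by the example above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one text line per stem, e.g. '1: 1 2 3'."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # leading digit(s) and final digit
        leaves[stem].append(leaf)
    return [
        f"{stem}: {' '.join(str(leaf) for leaf in leaves[stem])}"
        for stem in sorted(leaves)
    ]

values = [11, 12, 13, 14, 15, 22, 22, 25, 24, 36, 36, 33, 39, 39, 39]
plot_lines = stem_and_leaf(values)

for line in plot_lines:
    print(line)
```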