Describing data Flashcards

1
Q

Central tendency

A
  • the middle of the collected data

- mean, median and mode are all measures of central tendency

2
Q

Mean

A
  • sum of scores divided by number of scores
  • influenced by all available scores
  • easily influenced by outliers
  • the larger the sample, the closer the sample mean tends to be to the true population mean
3
Q

Geometric mean

A
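  • found by log-transforming the individual observations, averaging the logs, and then back-transforming the average using the antilog
  • for positively skewed data it is usually closer to the median than the arithmetic mean is, because the log-transformed values have a more symmetrical distribution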
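A minimal sketch of the log, average, antilog recipe described above. The function name `geometric_mean` and the scores are made up for illustration, and natural logs are assumed (any base works as long as the antilog uses the same base).

```python
import math

def geometric_mean(values):
    """Log-transform, average the logs, then back-transform with the antilog."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

scores = [1, 10, 100]                          # hypothetical positively skewed data
geo_mean = geometric_mean(scores)              # approximately 10, the same as the median here
arithmetic_mean = sum(scores) / len(scores)    # 37.0, pulled up by the largest value
```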
4
Q

Weighted mean

A
  • used when some observations are more or less valuable than others when reaching a summary measure
  • individual values are multiplied by weights (constants) attached to them before averaging
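A small illustrative sketch, assuming the usual convention of dividing the weighted sum by the sum of the weights. The function name, values and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, then divide by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

values = [70, 80, 90]    # hypothetical scores
weights = [1, 1, 2]      # the last score counts twice as much
wm = weighted_mean(values, weights)   # (70 + 80 + 180) / 4 = 82.5
```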
5
Q

Median

A
  • the point value that divides a distribution into two equal sized groups
  • half the scores fall below it, half above
  • aka 50th percentile
  • not as influenced by extreme scores as mean but it ignores most of the available information
  • it is preferable for ordinal data, where values can be ranked, and for skewed distributions (it is not defined for nominal data, which have no order)
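To illustrate how little an extreme score moves the median compared with the mean, here is a short sketch using Python's standard `statistics` module. The scores are invented for the example.

```python
import statistics

scores = [2, 3, 4, 5, 6]          # hypothetical scores
with_outlier = [2, 3, 4, 5, 60]   # same scores, but the top value is extreme

mean_plain = statistics.mean(scores)            # 4
median_plain = statistics.median(scores)        # 4
mean_outlier = statistics.mean(with_outlier)    # 14.8, dragged up by the outlier
median_outlier = statistics.median(with_outlier)  # still 4
```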
6
Q

Mode

A
  • the most commonly occurring value in a distribution
  • crude measure, mostly used for nominal data (frequencies)
  • also useful for ordinal data to understand the most common rating obtained on a likert scale
  • like the median, it ignores most of the available information
  • a bimodal distribution has two modes, for example two values that occur equally often and more often than any other
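A brief sketch of finding the mode with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The blood groups and ratings are made-up example data.

```python
import statistics

blood_groups = ["A", "B", "B", "O"]    # hypothetical nominal data
ratings = [1, 2, 2, 3, 3, 4]           # hypothetical Likert ratings, bimodal

most_common_group = statistics.mode(blood_groups)   # "B"
rating_modes = statistics.multimode(ratings)        # [2, 3]: two modes
```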
7
Q

Skew

A
  • in a normal (symmetric, unimodal) distribution, the mean, median and mode are equal
  • positive skew: extreme high values (outliers) are present, making the mean higher than the median and the right tail longer than the left
  • negative skew: extreme low values (outliers) make the mean lower than the median and the left tail longer than the right
8
Q

Range

A
  • difference between the highest and lowest scores in a distribution
  • easily determined when the data is arranged in a rank order (ascending or descending)
  • very distorted by extreme scores
9
Q

Interquartile range

A

-refers to the difference between 75th and 25th percentile values
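A short sketch comparing the range with the interquartile range, using `statistics.quantiles` from the standard library (Python 3.8+). The data are hypothetical, and the "inclusive" method is only one of several quartile conventions; others can give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical scores

data_range = max(data) - min(data)   # highest minus lowest

# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                        # 75th percentile minus 25th percentile

print(data_range, q1, q3, iqr)       # 8 3.0 7.0 4.0
```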

10
Q

Variance

A

= (sum of squared differences of individual observations from the mean) / (number of observations - 1)

  • N - 1 is the degrees of freedom
  • variance is high when scores are widely scattered
  • low variance when scores cluster around mean
  • expressed as squared units of the original measure
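As a rough illustration of high versus low variance, the sketch below uses `statistics.variance`, which divides by N - 1 as in the formula above. Both data sets are invented and share the same mean of 10.

```python
import statistics

clustered = [9, 10, 10, 11]   # hypothetical scores close to the mean
scattered = [1, 5, 15, 19]    # hypothetical scores spread widely

var_clustered = statistics.variance(clustered)   # 2 / 3, about 0.67
var_scattered = statistics.variance(scattered)   # 212 / 3, about 70.67
```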
11
Q

Standard deviation

A
  • square root of variance
  • measures dispersion
  • estimates the variability of the sample and tells us the distribution of individual data points around the mean
12
Q

Coefficient of variation

A
  • obtained by dividing the standard deviation by the mean and expressing this as a percentage
  • measure of relative spread of the data
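A minimal sketch of the calculation, assuming the sample standard deviation (N - 1 in the denominator) is the one intended. The scores are hypothetical.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores

mean = statistics.mean(scores)       # 5
sd = statistics.stdev(scores)        # sample SD, about 2.14
cv_percent = sd / mean * 100         # about 42.8 (%)
```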
13
Q

Standard error of the mean

A
  • standard deviation divided by square root of sample size
  • a larger sample gives a smaller SE
  • describes precision and uncertainty of how the sample represents the underlying population
  • SE is smaller than SD whenever the sample has more than one observation
  • shows us how precise our estimate of the mean is
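The sketch below shows how the SE shrinks as the sample grows for a fixed SD. The function name and the numbers are made up for illustration.

```python
import math

def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)

se_small_sample = standard_error(10, 25)    # 10 / 5  = 2.0
se_large_sample = standard_error(10, 100)   # 10 / 10 = 1.0
```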
14
Q

Box and whisker plot

A
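  • whiskers denote the range (in the simplest form of the plot)
  • the horizontal line inside the box is the median
  • the rectangle (box) runs from the end of the 1st quarter of the data to the start of the 4th, i.e. from the 25th to the 75th percentile (the interquartile range)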
15
Q

Stem and leaf plot

A
  • the first few digits of the numerical observations (stems) are plotted along a vertical axis, and single digits (leaves) are then added beside each stem to represent individual values
    e.g.
    1: 1 2 3 4 5
    2: 2 2 4 5
    3: 3 6 6 9 9 9
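A small sketch that builds the plot above from raw data, assuming non-negative whole numbers with the tens as the stem and the units as the leaf. The function name is hypothetical, and the observations are read off the example (11, 12, ... 39).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by their tens digit (stem) and list the units (leaves)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return [
        f"{stem}: {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]

observations = [11, 12, 13, 14, 15, 22, 22, 25, 24, 36, 36, 33, 39, 39, 39]
plot_lines = stem_and_leaf(observations)
print("\n".join(plot_lines))
```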
16
Q

Normal distribution

A
  • about 68% of data will lie within 1 SD of the mean
  • about 95% will lie within 2 SD
  • about 99.7% will lie within 3 SD
  • excess kurtosis (a measure of how peaked or flat the curve is) = 0
  • the tails of the curve approach the x-axis but never touch it
  • SD is denoted by the Greek letter sigma (σ)
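The coverage figures for 1, 2 and 3 SD can be checked with `statistics.NormalDist` from the standard library (Python 3.8+). This is only a sketch; the helper name `share_within` is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))   # 0.6827, 0.9545, 0.9973
```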
17
Q

Standard normal distribution

A

-normal distribution whose mean is 0 and SD is 1 unit

18
Q

Standard normal deviate

A

-expression denoted by z

z = (x - mean) / SD, where x is a value of the random variable
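A worked example of the z formula as a short sketch. The function name and the numbers (a score of 130 on a scale with mean 100 and SD 15) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd

z_high = z_score(130, 100, 15)   # (130 - 100) / 15 =  2.0
z_low = z_score(85, 100, 15)     # (85 - 100) / 15  = -1.0
```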

19
Q

High mean

A

-a higher mean shifts the curve to the right

20
Q

Low mean

A

-shifts curve to left

21
Q

Higher SD

A

-decreases the peakedness of the curve

22
Q

Lesser SD

A

-increases the peakedness of the curve
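The effect of the mean and the SD on the curve can be sketched with `statistics.NormalDist` (Python 3.8+): changing the mean moves the peak along the x-axis without changing its height, while a larger SD lowers and widens it. The parameter values are arbitrary examples.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)      # higher SD, same mean
shifted = NormalDist(mu=5, sigma=1)   # higher mean, same SD

peak_narrow = narrow.pdf(0)     # about 0.399
peak_wide = wide.pdf(0)         # about 0.199: a flatter, less peaked curve
peak_shifted = shifted.pdf(5)   # same height as peak_narrow, but located at x = 5
```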

23
Q

Leptokurtic curve

A
  • sharp peak
  • high kurtosis means there is a high, sharp peak near the mean
  • low-kurtosis (platykurtic) curves tend to have a flat top near the mean rather than a sharp peak
24
Q

How to calculate SD

A
  • first work out the variance
  • subtract the mean from each individual score
  • square each difference
  • add them together
  • then divide by N - 1 (N = number of observations)
  • then take the square root of the variance to get the SD
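The steps above can be sketched in Python as follows. The scores are hypothetical, and the result is compared with `statistics.stdev` from the standard library, which also divides by N - 1.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
n = len(scores)

mean = sum(scores) / n                        # 40 / 8 = 5.0
deviations = [x - mean for x in scores]       # subtract the mean from each score
squared = [d ** 2 for d in deviations]        # square each difference
sum_of_squares = sum(squared)                 # add them together: 32.0
variance = sum_of_squares / (n - 1)           # divide by N - 1: 32 / 7
sd = math.sqrt(variance)                      # square root of the variance, about 2.14

# Cross-check against the library implementation.
assert math.isclose(sd, statistics.stdev(scores))
```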