Describing Data Flashcards

1
Q

sample mean (3)

A
  • sum of all observations in a sample divided by n, the number of observations
  • varies, depending on the composition of the sample
  • symbol: Y bar
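A minimal Python sketch of the sample-mean definition, assuming the observations are held in a plain list. The function name `sample_mean` and both samples are hypothetical. Two samples from the same population give different means, which is why the sample mean varies with the composition of the sample.

```python
def sample_mean(ys):
    """Sum of all observations divided by n, the number of observations."""
    if not ys:
        raise ValueError("need at least one observation")
    return sum(ys) / len(ys)

# Two hypothetical samples of the same kind of measurement
sample_a = [4.1, 5.3, 4.8, 6.0]
sample_b = [5.5, 4.9, 6.2, 5.8]
print(sample_mean(sample_a), sample_mean(sample_b))
```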
2
Q

standard deviation (4)

A
  • a common measure of the spread of a distribution; indicates how far the measurements typically are from the mean
  • for an approximately normal (bell-shaped) distribution, about 68% of the data lies within Y bar +/- s and about 95% lies within Y bar +/- 2s
  • the square root of the variance
  • symbol: s
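A sketch using only the Python standard library: it computes s as the square root of the sample variance and then checks the 68%/95% rule on simulated data. The helper `sample_sd` is hypothetical, and the simulated normal population (mean 50, SD 10) is an assumption chosen for illustration; on strongly skewed data the two fractions can differ noticeably.

```python
import math
import random
import statistics

def sample_sd(ys):
    """Square root of the sample variance (n - 1 in the denominator)."""
    n = len(ys)
    ybar = sum(ys) / n
    variance = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    return math.sqrt(variance)

# Simulated, roughly normal data; the 68%/95% rule is only approximate
random.seed(1)
data = [random.gauss(50, 10) for _ in range(10_000)]

ybar = statistics.mean(data)
s = sample_sd(data)
# Fraction of observations within 1 and 2 standard deviations of the mean
within_1s = sum(abs(y - ybar) <= s for y in data) / len(data)
within_2s = sum(abs(y - ybar) <= 2 * s for y in data) / len(data)

print(f"s = {s:.2f}, within 1 s: {within_1s:.3f}, within 2 s: {within_2s:.3f}")
```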
3
Q

variance (2)

A
  • a measure of spread of data from the mean
  • symbol (sample): s^2
  • symbol (population): σ^2 (sigma squared)
4
Q

deviation (2)

A
  • the difference between a measurement and the mean
  • formula: Yi - Y bar
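A small Python illustration with hypothetical measurements: the deviations Yi - Y bar always sum to zero (apart from floating-point rounding), which is why the variance squares them before adding.

```python
ys = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical measurements
ybar = sum(ys) / len(ys)        # 4.4
deviations = [y - ybar for y in ys]
print(deviations)
print(sum(deviations))          # zero up to floating-point rounding
```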

5
Q

sum of squares

A
  • the sum of the squared deviations, Σ(Yi - Y bar)^2; it is the summation in the numerator of the variance formula, s^2 = Σ(Yi - Y bar)^2 / (n - 1)
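A Python sketch of the sum of squares and the sample variance built from it, using hypothetical measurements; `sum_of_squares` is an illustrative helper name. The result is cross-checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics

def sum_of_squares(ys):
    """Sum of squared deviations from the sample mean: sum of (Yi - Ybar)^2."""
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys)

ys = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical measurements
ss = sum_of_squares(ys)
variance = ss / (len(ys) - 1)   # sample variance s^2
print(ss, variance, statistics.variance(ys))
```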
6
Q

what is the general rule for rounding answers?

A
  • round descriptive statistics to one decimal place more than the measurements themselves
7
Q

coefficient of variation (4)

A
  • standard deviation expressed as a percentage of the mean
  • CV = (s/Y bar) x 100%
  • higher CV means more variability and lower CV means individuals are more similar relative to the mean
  • only applicable if all measurements are greater than or equal to 0
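A Python sketch of CV = (s / Y bar) x 100%. The helper name and the two samples are hypothetical. Both samples have the same standard deviation, but relative to its mean the second is far less variable, which the CV captures. Raising an error for negative values or a zero mean is a design choice, not part of the definition above.

```python
import statistics

def coefficient_of_variation(ys):
    """CV = (s / Ybar) x 100%; only meaningful when all measurements are >= 0."""
    if any(y < 0 for y in ys):
        raise ValueError("CV is only applicable when all measurements are >= 0")
    ybar = statistics.mean(ys)
    if ybar == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.stdev(ys) / ybar * 100

cv_small = coefficient_of_variation([9, 10, 11])     # s = 1, mean = 10
cv_large = coefficient_of_variation([99, 100, 101])  # s = 1, mean = 100
print(f"{cv_small:.1f}% vs {cv_large:.1f}%")
```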
8
Q

how do you calculate mean from a frequency table? (3)

A
  1. calculate the sample size by adding all the frequencies
  2. multiply each value by its frequency, then add these products together
  3. divide the total from step 2 by the sample size from step 1
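The three steps above as a Python sketch, assuming the frequency table is stored as a dict mapping each value to its frequency. The egg-count table is hypothetical.

```python
def mean_from_frequency_table(table):
    """table maps each measurement value to its frequency."""
    n = sum(table.values())                                     # step 1: sample size
    total = sum(value * freq for value, freq in table.items())  # step 2: sum of value x frequency
    return total / n                                            # step 3: divide step 2 by step 1

# Hypothetical table: eggs per nest -> number of nests with that count
eggs = {1: 3, 2: 8, 3: 6, 4: 3}
print(mean_from_frequency_table(eggs))
```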
9
Q

how do you calculate standard deviation from a frequency table?

A
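  • square each value's deviation from the mean and multiply it by that value's frequency; sum these products, divide by n - 1 (n = sum of the frequencies), and take the square root: s = sqrt[Σ fi (Yi - Y bar)^2 / (n - 1)]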
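A Python sketch of the standard deviation from a frequency table, again assuming a dict of value to frequency (the table is hypothetical). Weighting each squared deviation by its frequency gives the same answer as expanding the table into individual observations, which the last line checks with `statistics.stdev`.

```python
import math
import statistics

def sd_from_frequency_table(table):
    """Sample SD from a {value: frequency} table."""
    n = sum(table.values())
    ybar = sum(v * f for v, f in table.items()) / n
    # Each squared deviation counts once per observation with that value
    ss = sum(f * (v - ybar) ** 2 for v, f in table.items())
    return math.sqrt(ss / (n - 1))

eggs = {1: 3, 2: 8, 3: 6, 4: 3}  # hypothetical eggs-per-nest table
expanded = [v for v, f in eggs.items() for _ in range(f)]
print(sd_from_frequency_table(eggs), statistics.stdev(expanded))
```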
10
Q

median (2)

A
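  • middle measurement of a set of observations, after sorting them from smallest to largest
  • median = Y([n+1]/2), the observation at position (n+1)/2, when n is odd; when n is even, the median is the average of the two middle observations, Y(n/2) and Y(n/2 + 1)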
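A Python sketch of the position rule for the median, assuming the data are in a list. Python indexes from 0, so the 1-based position (n+1)/2 becomes index (n+1)//2 - 1.

```python
def median(ys):
    """Median of a list of numbers, following the (n + 1)/2 position rule."""
    ordered = sorted(ys)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    if n % 2 == 1:
        # Odd n: Y([n+1]/2) sits at 1-based position (n + 1)/2
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1)/2 falls between two observations, so average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([7, 1, 5]), median([7, 1, 5, 3]))
```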

11
Q

interquartile range (2)

A
  • difference between the third and first quartiles; spans the middle 50% of the data
  • IQR = third quartile - first quartile
12
Q

first quartile

A
  • middle value of the measurements lying below the median
13
Q

second quartile

A
  • median
14
Q

third quartile

A
  • middle value of the measurements larger than the median
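A Python sketch of the three quartiles and the IQR using the "middle of the lower half / middle of the upper half" definitions on these cards. It assumes that for odd n the median itself is excluded from both halves. The data are hypothetical. Software such as `numpy.percentile` or `statistics.quantiles` uses interpolation rules, so its quartiles can differ slightly from this hand method.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(ys):
    """Q1 = middle of values below the median, Q2 = median, Q3 = middle of values above it."""
    ordered = sorted(ys)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the median itself when forming the upper half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical measurements
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```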
15
Q

box plot (4)

A
  • displays median and interquartile range
  • lower and upper edges of the box are the first and third quartiles, thus the IQR is visualized by the span of the box
  • lines (whiskers) extend from the box to the smallest and largest "non-extreme" values in the data
  • “extreme” values are represented as isolated dots
16
Q

“extreme” values

A
  • values lying farther from the box edge than 1.5x the IQR, i.e. below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR
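A Python sketch of the numbers a box plot draws, flagging "extreme" values with the 1.5 x IQR rule. It assumes the same hand method for quartiles (median excluded from both halves when n is odd), and the helper names and data are hypothetical; plotting libraries may compute quartiles and whiskers slightly differently.

```python
def _median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def box_plot_summary(ys):
    """Quartiles, whisker ends and 'extreme' values using the 1.5 x IQR rule."""
    ordered = sorted(ys)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = _median(ordered[:half])
    # For odd n, the median itself belongs to neither half
    q3 = _median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    non_extreme = [y for y in ordered if low_fence <= y <= high_fence]
    return {
        "q1": q1,
        "median": _median(ordered),
        "q3": q3,
        "whisker_low": non_extreme[0],    # smallest non-extreme value
        "whisker_high": non_extreme[-1],  # largest non-extreme value
        "extreme": [y for y in ordered if y < low_fence or y > high_fence],
    }

summary = box_plot_summary([1, 3, 4, 6, 7, 8, 10, 12, 15, 40])
print(summary)
```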
17
Q

mean vs median (2)

A
  • median is the middle value of the data; it is less sensitive to extreme observations, so it is the more informative descriptor of the typical observation when data are skewed or contain extreme values
  • mean is the centre of gravity (balance point) of the distribution; it is strongly affected by skew and extreme observations, but it has better mathematical properties and is the more reliable measure to estimate
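A short Python demonstration with hypothetical values (in thousands) of how one extreme observation pulls the mean far more than the median.

```python
import statistics

incomes = [31, 34, 35, 38, 40, 42, 45]  # hypothetical values, in thousands
with_outlier = incomes + [900]          # add one extreme observation

for label, ys in [("original", incomes), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {statistics.mean(ys):.1f}, median = {statistics.median(ys):.1f}")
```

The median moves by only 1 (from 38 to 39), while the mean jumps by more than 100.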
18
Q

SD vs IQR (2)

A
  • SD reflects the variation among all of the data points, but is very sensitive to extreme observations
  • IQR is a better indicator of spread of the main part of the distribution when data is strongly skewed to one side or there are extreme observations
19
Q

percentile of a measurement

A
  • the percentage of observations less than or equal to it; the remaining observations exceed it
20
Q

quantile of a measurement

A
  • specifies the fraction of observations less than or equal to it
21
Q

cumulative relative frequency

A
  • at a given measurement, the fraction of observations less than or equal to that measurement
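A Python sketch tying the last three cards together, using hypothetical data. The cumulative relative frequency at a value is the fraction of observations less than or equal to it, which is also that value's quantile; multiplying by 100 gives its percentile. `bisect.bisect_right` counts how many sorted values are less than or equal to the target.

```python
from bisect import bisect_right

def cumulative_relative_frequency(ys, y):
    """Fraction of observations less than or equal to y."""
    ordered = sorted(ys)
    return bisect_right(ordered, y) / len(ordered)

data = [12, 15, 15, 18, 20, 22, 25, 30]      # hypothetical measurements
q = cumulative_relative_frequency(data, 20)  # quantile of 20 (a fraction)
pct = 100 * q                                # percentile of 20
print(q, pct)
```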
22
Q

what are the 2 common descriptions of data?

A
  1. location (or central tendency)
  2. width (spread)

23
Q

what are the measures of location? (3)

A
  • mean
  • mode
  • median
24
Q

population mean

A
  • symbol: μ
  • fixed value that we try to estimate using the sample mean

25
Q

what are the measures of width? (4)

A
  • range
  • variance
  • standard deviation
  • coefficient of variation
26
Q

range (3)

A
  • maximum minus the minimum
  • poor measure of distribution width because small samples tend to give lower estimates of the range
  • biased estimator of the true range of the population (tends to underestimate it)
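A Python simulation sketch of why the sample range is biased low, assuming a hypothetical normal population (mean 100, SD 15) of 10,000 values. A sample can never contain values beyond the population's extremes, so every sample range is at most the population range, and smaller samples fall further short on average.

```python
import random

def sample_range(ys):
    """Range = maximum minus minimum."""
    return max(ys) - min(ys)

random.seed(42)
# Hypothetical population of 10,000 roughly normal measurements
population = [random.gauss(100, 15) for _ in range(10_000)]
true_range = sample_range(population)

avg_by_n = {}
for n in (5, 20, 100):
    # Average range over 1,000 random samples of size n (without replacement)
    ranges = [sample_range(random.sample(population, n)) for _ in range(1_000)]
    avg_by_n[n] = sum(ranges) / len(ranges)
    print(f"n = {n:3d}: mean sample range = {avg_by_n[n]:.1f} (population range = {true_range:.1f})")
```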
27
Q

skew

A
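  • refers to asymmetry in the shape of a distribution; a skewed distribution has a longer tail on one side, and the skew is named for that side (e.g. right-skewed = long tail to the right)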
28
Q

adding/multiplying mean by constant, c

A

adding: Y bar + c
multiplying: Y bar * c

29
Q

adding/multiplying SD by constant, c

A

adding: s
multiplying: |c| * s

30
Q

adding/multiplying variance by constant, c

A

adding: s^2
multiplying: s^2 * c^2

31
Q

adding/multiplying median by constant, c

A

adding: M + c
multiplying: M * c

32
Q

adding/multiplying IQR by constant, c

A

adding: IQR
multiplying: |c| * IQR
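A Python sketch that checks all five rules above at once. It adds a constant c to every measurement, then multiplies every measurement by c, and compares each summary statistic with the original. The data and the choice c = -3 are hypothetical; a negative c shows why the SD and IQR scale by |c|. The IQR helper assumes the same lower-half/upper-half method as the quartile cards.

```python
import statistics

def iqr(ys):
    """IQR via the lower-half/upper-half median method (median excluded for odd n)."""
    ordered = sorted(ys)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(upper) - statistics.median(lower)

def summarize(data):
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "variance": statistics.variance(data),
        "median": statistics.median(data),
        "iqr": iqr(data),
    }

ys = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical measurements
c = -3.0

base = summarize(ys)
shifted = summarize([y + c for y in ys])  # add c to every measurement
scaled = summarize([y * c for y in ys])   # multiply every measurement by c

for name in base:
    print(f"{name}: original {base[name]:.3f}, +c {shifted[name]:.3f}, *c {scaled[name]:.3f}")
```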