Chapter 4: Numerical Descriptive Techniques Flashcards

1
Q

Measures of central location

A

Arithmetic mean (mean, average)
Median
Mode

2
Q

Arithmetic mean

A

Aka mean or average

Sum of all observations / total number of observations

Essentially same calculation for sample and population

(Average function in Excel)

Usually first selection of central location but can be sensitive to extreme outlier values

Only valid for interval data

3
Q

Median

A

The observation that falls in the middle of a list of observations placed in order

If even number of observations then median determined by averaging two middle observations

Same calculation for sample and population

Median function in Excel

Often a better measure of center than the mean when there are a small number of extreme outlier observations

50% of observations are above and 50% below

Useful for interval AND ordinal data
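
To illustrate why the median can be the better measure of center when outliers are present, here is a minimal sketch using Python's standard statistics module. The salary figures are hypothetical, chosen only so that one value is an extreme outlier.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last value is an extreme outlier.
salaries = [42, 45, 47, 50, 52, 55, 400]

avg = mean(salaries)    # pulled upward by the single outlier
mid = median(salaries)  # middle value of the ordered list

print(f"mean = {avg:.2f}, median = {mid}")

# With an even number of observations, the median is the
# average of the two middle observations (45 and 47 here).
even_median = median([42, 45, 47, 50])
print(f"median of an even-sized list = {even_median}")
```

The mean lands far above most of the salaries, while the median stays at a typical value.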

4
Q

Mode

A

The observation (or observations) that occur with the greatest frequency

Sample and population calculated the same way

For larger samples and populations modal class may make more sense than a single mode value

Not great for small samples, potentially not unique

Mode function in Excel
- If there are multiple modes, Excel returns a single value without indicating the alternatives

Can be used for any type of data (interval, ordinal, nominal)
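
As a sketch of how a non-unique mode can be handled outside Excel, Python's statistics module offers multimode, which lists every value tied for the highest frequency. The ratings and colors below are hypothetical examples of ordinal and nominal data.

```python
from statistics import mode, multimode

ratings = [3, 5, 4, 5, 3, 2, 3, 5]        # hypothetical ordinal ratings
colors = ["red", "blue", "red", "green"]  # hypothetical nominal data

# multimode returns every value tied for the greatest frequency,
# so a non-unique mode is not silently hidden.
all_modes = multimode(ratings)

# mode works on nominal data too, since it only counts occurrences.
color_mode = mode(colors)

print(all_modes, color_mode)
```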

5
Q

Using Excel to calculate multiple statistics

A

Data tab > Data Analysis > Descriptive Statistics
Select the input range
Check Summary statistics

6
Q

Measures of variability

A

Range
Variance
Standard deviation
Coefficient of variation

7
Q

Range

A

= largest observation - smallest observation

No information about observations in between

8
Q

Variance

A

Roughly the average squared deviation from the mean

  • calculate the mean
  • find the difference (deviation) of each observation from the mean
  • square each deviation and sum the squares
  • divide that sum by 1 less than the number of observations, n - 1 (for a sample; this corrects for using the sample mean in place of the population mean)
  • the result is the sample variance, s^2 (in squared units of the data)

Excel: use VAR function

Mostly useful for comparing multiple sets of data
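
The steps for the sample variance can be sketched in Python as follows. The five observations are hypothetical, and the result is cross-checked against statistics.variance, which also divides by n - 1.

```python
from statistics import variance

data = [8, 4, 9, 11, 3]  # hypothetical sample

n = len(data)
mean = sum(data) / n                             # step 1: the mean
deviations = [x - mean for x in data]            # step 2: deviation of each observation
sum_of_squares = sum(d ** 2 for d in deviations) # step 3: square and sum
s2 = sum_of_squares / (n - 1)                    # step 4: divide by n - 1

print(s2)              # 11.5
print(variance(data))  # library result for comparison
```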

9
Q

Shortcut method for variance

A

s^2 = (1/(n - 1)) x (sum of the squared observations - (sum of all observations)^2 / number of observations)
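
The shortcut needs only two running totals: the sum of the observations and the sum of their squares. A minimal sketch on a hypothetical sample, compared against the definition that uses deviations from the mean:

```python
data = [8, 4, 9, 11, 3]  # hypothetical sample
n = len(data)

sum_x = sum(data)                   # sum of all observations
sum_x2 = sum(x ** 2 for x in data)  # sum of the squared observations

# Shortcut: s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definition: sum of squared deviations from the mean, divided by n - 1
mean = sum_x / n
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

print(s2_shortcut, s2_definition)  # both 11.5
```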

10
Q

Standard deviation

A

Roughly the typical deviation of an observation from the mean, in the same units as the data

Square root of the variance

Measure of consistency

11
Q

Empirical rule for interpreting standard deviation

A

If histogram of observations is bell shaped (symmetrical and unimodal) then:

  • approx 68% of all observations fall within one standard deviation of the mean
  • approx 95% of all observations fall within two standard deviations of the mean
  • approx 99.7% of all observations fall within three standard deviations of the mean
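
One way to see the empirical rule in action is to check it on simulated bell-shaped data. This sketch draws hypothetical observations from a normal distribution (the mean of 100, standard deviation of 15, and seed are arbitrary assumptions) and counts the share within k standard deviations of the mean.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(100, 15) for _ in range(10_000)]  # simulated, not real data

m = mean(data)
s = stdev(data)

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    inside = sum(1 for x in data if abs(x - m) <= k * s)
    return inside / len(data)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

The printed shares should land close to 68%, 95%, and 99.7%, with small differences because this is a finite sample.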
12
Q

Chebysheff’s theorem

A

The proportion of observations in any sample or population that lie within k standard deviations of the mean is at least:

1 - (1/k^2) for k>1

Provides the lower bound of proportions in an interval

Can be used when the empirical rule does not apply (non bell shaped histograms)
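
A short sketch of the bound, checked against a hypothetical skewed data set where the empirical rule would not apply. The function name chebysheff_lower_bound is made up for this illustration.

```python
from statistics import mean, stdev

def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Hypothetical right-skewed data (not bell shaped).
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 25]
m, s = mean(data), stdev(data)

k = 2
bound = chebysheff_lower_bound(k)  # 0.75
actual = sum(1 for x in data if abs(x - m) <= k * s) / len(data)

print(f"at least {bound:.2f}, observed {actual:.2f}")
```

The observed share is larger than the bound, which is expected: the theorem gives only a minimum, not an estimate.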

13
Q

Coefficient of variation

A

The standard deviation of the observations divided by the mean

Indicates if standard deviation is large or small given the observation set
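
To show why dividing by the mean helps, this sketch compares two hypothetical data sets with the same standard deviation but very different means. The helper name coefficient_of_variation is an assumption for the example.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

small_scale = [8, 10, 12]     # mean 10, standard deviation 2
large_scale = [98, 100, 102]  # mean 100, standard deviation 2

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

# Same standard deviation, but it is large relative to a mean of 10
# and small relative to a mean of 100.
print(f"{cv_small:.2f} vs {cv_large:.2f}")
```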

14
Q

Measures of relative standing

A

Provide information about the position of particular values relative to the entire data set.

Percentile
Quartiles
(Quintiles, deciles)
Interquartile range

15
Q

Percentile

A

The Pth percentile is the value for which P% of the observations are less than that value and (100 - P)% are greater

Use to describe a single set of interval or ordinal data to communicate relative standing

16
Q

Quartiles

A

Describe the 25th, 50th, and 75th percentiles

25th percentile - first/lower quartile, Q1
50th percentile - second quartile, Q2 (median)
75th percentile - third/upper quartile, Q3

Use to describe a single set of interval or ordinal data to communicate relative standing

Excel: use the Descriptive Statistics dialog box
Set Kth Largest to the integer closest to n/4
Set Kth Smallest to the same integer
These approximate the third and first quartiles

Quartiles give some idea of histogram shape (skewed vs symmetric)

17
Q

Location of a percentile

A

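Location of the Pth percentile: L_P = (n + 1) * P/100

n = number of observations

Gives the position of the percentile in the ordered list of observations; a fractional location means the percentile lies that fraction of the way between the two surrounding observations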
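
A minimal sketch of the location formula, assuming that a fractional location is resolved by interpolating between the two neighboring observations. The function names and the ten observations are hypothetical.

```python
def percentile_location(n, p):
    """Position of the Pth percentile in an ordered list of n observations."""
    return (n + 1) * p / 100

def percentile(values, p):
    """Pth percentile, interpolating between the surrounding observations."""
    ordered = sorted(values)
    loc = percentile_location(len(ordered), p)
    lower = int(loc)      # rank of the observation just below the location
    frac = loc - lower    # fraction of the way to the next observation
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + frac * (above - below)

data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # hypothetical observations, n = 10

print(percentile_location(len(data), 25))  # 2.75
print(percentile(data, 25))                # 3.75
print(percentile(data, 50))                # 8.5
print(percentile(data, 75))                # 16.0
```

A location of 2.75 means the 25th percentile sits three quarters of the way from the 2nd observation (0) to the 3rd (5), which gives 3.75.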

18
Q

Interquartile range

A

= Q3 - Q1

Measures the spread of the middle 50% of observations

Large values = observations far apart = high variability

Use to describe a single set of interval or ordinal data to communicate variability
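
The interquartile range can be sketched with statistics.quantiles from the Python standard library. Its default "exclusive" method places quartiles using n + 1 positions, the same convention as the (n + 1) * P/100 location rule. The eleven observations are hypothetical.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 20]  # hypothetical, n = 11

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle 50% of observations

print(q1, q2, q3, iqr)
```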

19
Q

Measures of linear relationship

A

Covariance
Coefficient of correlation
Coefficient of determination

20
Q

Covariance

A

Sample covariance of variables x and y = sum over all observations of (x - mean of x) * (y - mean of y), divided by (n - 1)

Covariance is positive number = variables move in the same direction

Negative number: variables move in opposite directions

Large number: strong relationship
Small number: less strong relationship
- hard to judge from the covariance alone, because its size depends on the units of x and y
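
The covariance calculation can be sketched as below, using five hypothetical (x, y) pairs. The second call rescales y to show why the size of the covariance is hard to interpret: changing units changes the number without changing the relationship.

```python
def sample_covariance(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all pairs, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)  # positive: x and y tend to move together

# Same relationship with y expressed in units 100 times smaller.
y_rescaled = [value * 100 for value in y]
cov_rescaled = sample_covariance(x, y_rescaled)

print(cov_xy, cov_rescaled)  # 1.5 150.0
```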

21
Q

Coefficient of correlation

A

The covariance divided by the product of the standard deviations of the variables

Always lies between -1 and +1

+1 = perfect positive relationship
-1 = perfect negative relationship
0 = no linear relationship

Must always judge in relation to other variables
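
A sketch of the coefficient of correlation computed from the covariance and the two standard deviations, on hypothetical paired data. Unlike the covariance, the result does not change when one variable is rescaled.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_rescaled = correlation(x, [value * 100 for value in y])  # units changed
r_perfect = correlation(x, [2 * value + 1 for value in x]) # exact straight line

print(f"{r:.4f} {r_rescaled:.4f} {r_perfect:.4f}")
```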

22
Q

Coefficient of determination

A

Square of the coefficient of correlation

Measures the proportion of variation in the dependent variable that is explained by variation in the independent variable

1 = 100% of the variation explained
0 = no linear relationship

Excel: trendline, more options, display R-squared value on chart
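
To connect the two descriptions above, this sketch computes the coefficient of determination as the square of r, then again as the share of variation in y explained by a least squares line. The data are hypothetical, and fitting the line with slope = covariance / variance of x is an assumption of this example rather than something stated on the card.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)

# Route 1: square of the coefficient of correlation.
r = cov_xy / sqrt(var_x * var_y)
r_squared = r ** 2

# Route 2: 1 - (unexplained variation / total variation) for the fitted line.
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
sst = sum((b - mean_y) ** 2 for b in y)
explained_share = 1 - sse / sst

print(f"{r_squared:.2f} {explained_share:.2f}")  # both 0.60
```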