Numerical Descriptive Statistics Flashcards

1
Q

4 measures used to describe data

A

Central tendency
Quartiles
Variation
Shape

2
Q

4 measures of central tendency

A

Arithmetic mean
Median
Mode
Geometric mean

3
Q

5 measures of variation

A

Range
Interquartile range
Variance
Standard deviation
Coefficient of variation

4
Q

1 measure of shape

A

Skewness

5
Q

What’s required to make an informed decision

A

Central tendency (location), spread and shape must all be known. Only when all three are present do you have complete information about the data, which is what allows you to make an informed decision.

6
Q

Arithmetic mean

A

The arithmetic mean is the sum of the observations divided by the number of observations.

7
Q

Median and mean: sensitivity to extreme values

A

The median is not sensitive to extreme values and the mean is sensitive to extreme values.
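As a rough illustration of this card, the sketch below adds one extreme value to a small sample and compares how far the mean and the median move. The numbers are hypothetical and not taken from the deck.

```python
from statistics import mean, median

# Hypothetical sample (not from the deck); 200 is the extreme value.
values = [10, 12, 13, 15, 16]
with_outlier = values + [200]

mean_before, mean_after = mean(values), mean(with_outlier)
median_before, median_after = median(values), median(with_outlier)

# The mean is pulled strongly towards the extreme value,
# while the median only shifts to the next middle position.
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```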

8
Q

Sigma

A

Sigma (Σ) is the summation symbol: it is shorthand for adding up the values.

9
Q

Median

A

In an ordered array, the median is the middle number (50% of the values lie above it and 50% below). Its main advantage over the arithmetic mean is that it is not affected by extreme values.

10
Q

Location of the median

A

Median position = (n+1)/2 in the ranked data. This is not the value of the median, only the position of the median in the ranked data. If the number of observations in the data set is odd, the median is the middle ranked value. If the number of values in the data set is even, the median is the mean (average) of the two middle ranked values.
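A minimal sketch of the position rule, assuming 1-based ranking as on the card. The function name `median_by_position` is made up for this illustration.

```python
def median_by_position(data):
    """Median via the (n+1)/2 position in the ranked data (1-based)."""
    ranked = sorted(data)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the position lands exactly on the middle ranked value.
        return ranked[int(position) - 1]
    # Even n: the position falls halfway between two ranked values,
    # so take the mean of the values on either side.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2
```

For example, `median_by_position([7, 1, 3])` uses position 2 of the ranked data, and `median_by_position([1, 2, 3, 4])` averages the 2nd and 3rd ranked values.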

11
Q

Mode

A

A measure of central tendency. The value that occurs most often (the most frequent). Not affected by extreme values. Never use the mode by itself; always use it in conjunction with the median or mean. Unlike the mean and median, there may be no unique (single) mode for a given data set. Can be used for either numerical or categorical (nominal) data.
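The points about non-unique modes and categorical data can be sketched with `statistics.multimode` from the Python standard library (Python 3.8 or later). The sample lists are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets, chosen only to show the three situations.
numeric = [1, 2, 2, 3, 3, 4]
categorical = ["red", "blue", "red", "green"]
all_distinct = [5, 6, 7]

numeric_modes = multimode(numeric)          # two values tie: no single mode
categorical_modes = multimode(categorical)  # the mode works for nominal data
distinct_modes = multimode(all_distinct)    # every value ties, so no useful mode

print(numeric_modes, categorical_modes, distinct_modes)
```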

12
Q

Which measure is best to use

A

As the sample size gets bigger, the influence of extreme values diminishes. The mean is generally used most often, unless extreme values (outliers) exist. The median is often used instead, since it is not sensitive to extreme values. The mode is usually the least used of the three. In the example, which has an obvious outlier ($2,000,000), it makes sense to use the median. Most housing prices are now reported as median housing prices in Australian newspapers because of possible outliers.

13
Q

Quartiles

A

Quartiles split the ranked data into four segments, with an equal number of values per segment. The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger). The third quartile, Q3, is the value for which 75% of the observations are smaller and only 25% are larger.

14
Q

Finding the quartile

A

Similar to the median, we find a quartile by determining the value in the appropriate position in the ranked data, where:
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median)
Third quartile position: Q3 = 3(n+1)/4
Here n is the number of observed values (sample size).

15
Q

Quartile rule 1

A

If the result is an integer, then the quartile is equal to the ranked value at that position. For example, if the sample size is n = 7, the first quartile, Q1, is equal to the (7+1)/4 = 2nd ranked value.

16
Q

Quartile rule 2

A

If the result is a fractional half (1.5, 2.5, 3.5, etc.), then the quartile is equal to the mean of the two adjacent ranked values. For example, if the sample size is n = 9, the first quartile, Q1, is equal to the (9+1)/4 = 2.5 ranked value, halfway between the second and the third ranked values.

17
Q

Quartile rule 3

A

If the result is neither an integer nor a fractional half, round the result to the nearest integer and select that ranked value. For example, if the sample size is n = 10, the first quartile, Q1, is equal to the (10+1)/4 = 2.75 ranked value. Round 2.75 to 3 and use the third ranked value.
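The three quartile rules can be sketched together as follows. This is only an illustration under the deck's conventions (1-based positions k(n+1)/4); the function names are hypothetical, and other textbooks and software use different quartile conventions.

```python
import math

def ranked_value(ranked, position):
    """Value at a 1-based position in ranked data, per quartile rules 1-3."""
    if position.is_integer():
        # Rule 1: integer position, take that ranked value.
        return ranked[int(position) - 1]
    if (position * 2).is_integer():
        # Rule 2: fractional half, take the mean of the two neighbours.
        return (ranked[int(position) - 1] + ranked[int(position)]) / 2
    # Rule 3: otherwise round the position to the nearest integer.
    return ranked[math.floor(position + 0.5) - 1]

def quartiles(data):
    """Return (Q1, Q2, Q3) using positions k(n+1)/4 for k = 1, 2, 3."""
    ranked = sorted(data)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two observations")
    return tuple(ranked_value(ranked, k * (n + 1) / 4) for k in (1, 2, 3))
```

With n = 10, for instance, the positions are 2.75, 5.5 and 8.25, so Q1 is the 3rd ranked value, Q2 is the mean of the 5th and 6th, and Q3 is the 8th.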

18
Q

Measures of variation

A

Measures of variation give information on the spread or variability of the data values

19
Q

Interquartile range

A

Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values). Using the interquartile range avoids outlier problems, because the highest- and lowest-valued observations are left out of the calculation. IQR = 3rd quartile – 1st quartile, i.e. IQR = Q3 - Q1.
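To see the resistance in practice, the sketch below compares the IQR and the range for a hypothetical sample before and after its largest value is replaced by an extreme one. It relies on `statistics.quantiles` (Python 3.8 or later) with its default "exclusive" method, which interpolates at position k(n+1)/4; that agrees with the integer and fractional-half cases but is not identical to rounding to the nearest ranked value.

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range, Q3 - Q1."""
    q1, _, q3 = quantiles(data, n=4)  # default method is "exclusive"
    return q3 - q1

# Hypothetical samples: identical except for the largest value.
base = [11, 12, 13, 16, 16, 17, 18, 21, 22]
extreme = [11, 12, 13, 16, 16, 17, 18, 21, 220]

iqr_base, iqr_extreme = iqr(base), iqr(extreme)
range_base = max(base) - min(base)
range_extreme = max(extreme) - min(extreme)

# The IQR is unchanged by the extreme value; the range is not.
print(iqr_base, iqr_extreme)
print(range_base, range_extreme)
```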

20
Q

Sample variance

A

Measures the average scatter around the mean. It is the sum of the squared deviations of the observations from the mean, divided by n-1, so it is roughly the average squared deviation from the mean. The deviations are squared because some are negative and some are positive and would otherwise cancel out. As a result, the units are also squared.

21
Q

Sample standard deviation

A

Most commonly used measure of variation. Shows variation about the mean. Has the same units as the original data. It can be considered a measure of uncertainty.

22
Q

Advantages of variance and standard deviation

A

Each value in the data set is used in the calculation. Values far from the mean are given extra weight as deviations from the mean are squared.

23
Q

Disadvantages of variance and standard deviation

A

Sensitive to extreme values (outliers). They are measures of absolute variation, not relative variation.

24
Q

Differences between sample and population with regard to standard deviation and variance

A

When calculating the variance and standard deviation for a sample, the sum of squared deviations is divided by n-1; when calculating them for a population, it is divided by N.
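A short sketch of the two divisors, using a hypothetical data set. The manual calculation is cross-checked against `statistics.variance` (divides by n-1) and `statistics.pvariance` (divides by N) from the Python standard library.

```python
from math import sqrt
from statistics import mean, pvariance, variance

# Hypothetical data, treated first as a sample and then as a population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

centre = mean(data)
sum_sq = sum((x - centre) ** 2 for x in data)  # sum of squared deviations

sample_var = sum_sq / (len(data) - 1)  # sample: divide by n - 1
population_var = sum_sq / len(data)    # population: divide by N

sample_sd = sqrt(sample_var)
population_sd = sqrt(population_var)

print(sample_var, variance(data))
print(population_var, pvariance(data))
```

The sample figures are always slightly larger than the population figures for the same numbers, because the same sum is divided by a smaller number.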

25
Q

Coefficient of variation

A

Measures relative variation, i.e. shows variation relative to the mean. Can be used to compare two or more sets of data measured in different units. Always expressed as a percentage (%).
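As a sketch, assuming the usual definition CV = (sample standard deviation / mean) x 100%, the example below shows that rescaling the data (a change of units) leaves the coefficient of variation unchanged. The function name and the data are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation relative to the mean, as a percentage."""
    return stdev(data) / mean(data) * 100

# Hypothetical measurements, and the same measurements in a unit 10x smaller.
original_units = [10, 20, 30]
rescaled_units = [100, 200, 300]

cv_original = coefficient_of_variation(original_units)
cv_rescaled = coefficient_of_variation(rescaled_units)

# The standard deviations differ by a factor of 10, but the CV is the same.
print(f"{cv_original:.1f}% vs {cv_rescaled:.1f}%")
```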

26
Q

The Z score

A

The difference between a given observation and the mean, divided by the standard deviation. A Z score of 2.0 means that a value is 2.0 standard deviations above the mean. A value with a Z score above 3.0 or below -3.0 is generally considered an outlier.
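A minimal sketch of the Z score and the outlier rule of thumb, using the sample standard deviation. The data are hypothetical and the function name `z_scores` is made up for this illustration.

```python
from statistics import mean, stdev

def z_scores(data):
    """Z score of each observation: (value - mean) / standard deviation."""
    centre = mean(data)
    spread = stdev(data)
    if spread == 0:
        raise ValueError("all values are identical, so Z scores are undefined")
    return [(x - centre) / spread for x in data]

# Hypothetical sample: twenty values near 50 plus one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [150]

scores = z_scores(data)
# Flag observations whose Z score is above 3.0 or below -3.0.
outliers = [x for x, z in zip(data, scores) if abs(z) > 3.0]
print(outliers)
```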

27
Q

The shape of a distribution

A

Describes how data are distributed. The shape of a distribution is either symmetric or skewed.

28
Q

Left skewed and right skewed

A

When the data are left (negatively) skewed, the distance between Q1 and Q2 is greater than the distance between Q2 and Q3. The reverse applies for right (positively) skewed data. If the data are symmetric, the two distances are the same.
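The quartile-distance comparison can be written as a small check. This is only a sketch of the rule on the card, with a hypothetical function name; it takes the three quartiles as given rather than computing them.

```python
def shape_from_quartiles(q1, q2, q3):
    """Classify shape by comparing the distances Q2 - Q1 and Q3 - Q2."""
    lower_distance = q2 - q1
    upper_distance = q3 - q2
    if lower_distance > upper_distance:
        return "left-skewed"
    if upper_distance > lower_distance:
        return "right-skewed"
    return "symmetric"

print(shape_from_quartiles(1, 5, 6))
print(shape_from_quartiles(1, 2, 6))
print(shape_from_quartiles(1, 3, 5))
```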

29
Q

What does a box and whisker plot show

A

A box-and-whisker plot shows location, spread and shape.

30
Q

Numerical measures for a population

A

Population summary measures are called parameters. The population mean is the sum of the values in the population divided by the population size, N.

31
Q

Population variance

A

The average of the squared deviations of the values from the population mean.

32
Q

Population standard deviation

A

Shows variation about the mean. Is the square root of the population variance. Has the same units as the original data.

33
Q

Arithmetic mean equation

A

Photo 1

34
Q

Example of mean, median and mode

A

Photo 2

35
Q

Quartile example

A

Photo 3

36
Q

Measures of variation example

A

Photo 4

37
Q

Range example

A

Photo 5

38
Q

Range disadvantages

A

Photo 6

39
Q

Sample variance equation

A

Photo 7

40
Q

Sample standard deviation equation

A

Photo 8

41
Q

Sample standard deviation example

A

Photo 9

42
Q

Sample standard deviation graphed example

A

Photo 10

43
Q

Comparing standard deviations

A

Photo 11

44
Q

Coefficient of variation equation

A

Photo 12

45
Q

Coefficient of variation example

A

Photo 13

46
Q

The 3 shapes of a distribution

A

Photo 14

47
Q

Using Excel for descriptive statistics

A

Photos 15-16

48
Q

Population mean equation

A

Photo 17

49
Q

Empirical rule

A

Photos 18-19

50
Q

Box and whisker plot

A

Photo 20

51
Q

Distribution shape box and whisker plot

A

Photo 21

52
Q

Covariance

A

The sample covariance describes the linear relationship between two numerical variables. Its sign shows the direction of the relationship. Because its size is affected by the units of measurement, it does not by itself show the relative strength of the relationship. No causal effect is implied.

53
Q

Covariance equation

A

Photo 22

54
Q

Correlation

A

Measures the relative strength of the linear relationship between two variables

55
Q

Correlation equation

A

Photo 23

56
Q

Features of correlation coefficient

A

Also called the standardised covariance, i.e. it is invariant to units of measure. Ranges between –1 and 1. The closer to –1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.
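The unit-invariance point can be sketched as follows, assuming the usual definitions: the sample covariance divides the sum of cross-products of deviations by n-1, and the correlation coefficient divides the covariance by the product of the two sample standard deviations. The data and function names are hypothetical.

```python
from statistics import mean, stdev

def sample_covariance(x, y):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance standardised by the two sample standard deviations."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))

# Hypothetical paired data; y_rescaled is y expressed in a unit 100x smaller.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
y_rescaled = [value * 100 for value in y]

cov_original = sample_covariance(x, y)
cov_rescaled = sample_covariance(x, y_rescaled)
r_original = correlation(x, y)
r_rescaled = correlation(x, y_rescaled)

# The covariance changes with the units; the correlation does not.
print(cov_original, cov_rescaled)
print(round(r_original, 3), round(r_rescaled, 3))
```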

57
Q

Scatter Plots of Data with Various Correlation Coefficients

A

Photo 24

58
Q

Pitfalls and ethical issues

A

Data analysis is objective

Data interpretation is subjective

59
Q

Objective

A

Should report the summary measures that best meet the assumptions about the data set

60
Q

Subjective

A

Should be done in a fair, neutral and transparent manner. Should document both good and bad results. Results should be presented in a fair, objective and neutral manner. Should not use inappropriate summary measures to distort facts. Should not fail to report pertinent findings, even if such findings do not support the original argument.

61
Q

IQR Example

A

Photo 25

62
Q

Population variance and standard deviation equations

A

Photo 26

63
Q

5 number summary

A

Numerical data summarised by five values based on the quartiles: Xsmallest, Q1, Median, Q3, Xlargest.
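A minimal sketch of the five-number summary for a hypothetical sample. It assumes `statistics.quantiles` (Python 3.8 or later) with its default "exclusive" method, which places the quartiles at positions k(n+1)/4 in the ranked data and interpolates between neighbours when needed.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (Xsmallest, Q1, Median, Q3, Xlargest)."""
    ranked = sorted(data)
    q1, q2, q3 = quantiles(ranked, n=4)  # default method is "exclusive"
    return ranked[0], q1, q2, q3, ranked[-1]

# Hypothetical sample of n = 11 values, so the quartile positions are 3, 6 and 9.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18, 9, 11]
summary = five_number_summary(sample)
print(summary)
```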