Numerical Descriptive Statistics Flashcards

1
Q

4 measures used to describe data

A

Central tendency
Quartiles
Variation
Shape

2
Q

4 measures of central tendency

A

Arithmetic mean
Median
Mode
Geometric mean

3
Q

5 measures of variation

A
Range 
Interquartile range 
Variance
Standard deviation 
Coefficient of variation
4
Q

1 measure of shape

A

Skewness

5
Q

What’s required to make an informed decision

A

Central tendency (location), variation (spread) and shape must all be known. Only with all three is the information complete, which allows you to make an informed decision.

6
Q

Arithmetic mean

A

The arithmetic mean is the sum of the observations divided by the number of observations.
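A minimal Python sketch of this definition, using a hypothetical sample. The sum in the numerator is what the Σ (sigma) notation denotes, and the divisor is the number of observations n. The function name `arithmetic_mean` is illustrative only.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical sample of n = 5 observations
sample = [2, 4, 6, 8, 10]
print(arithmetic_mean(sample))  # 30 / 5 = 6.0
```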

7
Q

Median, mean and extreme values

A

The median is not sensitive to extreme values and the mean is sensitive to extreme values.

8
Q

Sigma

A

Sigma (Σ) is shorthand for adding up the values: ΣXi = X1 + X2 + ... + Xn

9
Q

Median

A

In an ordered array, the median is the middle number (50% above and 50% below). Its main advantage over the arithmetic mean is that it is not affected by extreme values.

10
Q

Location of the median

A

Median position = (n+1)/2 ranked value. This is not the value of the median, only the position of the median in the ranked data. If the number of observations in the data set is odd, the median is the middle ranked value. If the number of observations is even, the median is the mean (average) of the two middle ranked values.
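The position rule can be sketched in Python as follows, with hypothetical data. Because (n + 1)/2 is a 1-based rank, it is shifted down by one to index a Python list. The helper names `median_position` and `median` are illustrative.

```python
def median_position(n):
    """Position (not value) of the median in the ranked data: (n + 1) / 2."""
    return (n + 1) / 2


def median(values):
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that ranked value
        return ranked[int(median_position(n)) - 1]  # 1-based rank -> 0-based index
    # Even n: average the two middle ranked values
    lower = n // 2
    return (ranked[lower - 1] + ranked[lower]) / 2


print(median_position(7), median([7, 1, 5, 3, 9, 11, 2]))  # 4.0 5
print(median_position(6), median([7, 1, 5, 3, 9, 11]))     # 3.5 6.0
```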

11
Q

Mode

A

A measure of central tendency: the value that occurs most often (the most frequent). Not affected by extreme values. Never use the mode by itself; always use it in conjunction with the median or mean. Unlike the mean and median, there may be no unique (single) mode for a given data set. Can be used for either numerical or categorical (nominal) data.
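Python's standard library provides `statistics.multimode` (Python 3.8+), which returns every most-frequent value. This makes it easy to see when a data set has more than one mode, and it works for categorical data too. The data sets below are hypothetical.

```python
from statistics import multimode

numeric = [1, 3, 3, 5, 7, 7, 9]                # two modes: 3 and 7
categorical = ["red", "blue", "red", "green"]  # one mode: "red"
all_distinct = [4, 8, 15, 16]                  # every value ties, so no single useful mode

print(multimode(numeric))       # [3, 7]
print(multimode(categorical))   # ['red']
print(multimode(all_distinct))  # [4, 8, 15, 16]
```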

12
Q

Which measure is best to use?

A

As the sample size gets bigger, the influence of individual extreme values diminishes. The mean is generally used most often, unless extreme values (outliers) exist. In that case the median is often used, since it is not sensitive to extreme values. The mode is usually the least used of the three. In the example with an obvious outlier ($2,000,000), it makes sense to use the median. Most housing prices are now reported as median housing prices in Australian newspapers because of possible outliers.
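A small illustration with hypothetical house prices (not the data behind this card) shows why one extreme value pulls the mean a long way while leaving the median unchanged:

```python
from statistics import mean, median

# Hypothetical house prices in $, with one obvious outlier
prices = [450_000, 480_000, 500_000, 520_000, 2_000_000]

print(mean(prices))    # 790000: pulled far upward by the outlier
print(median(prices))  # 500000: the middle ranked value, unaffected
```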

13
Q

Quartiles

A

Quartiles split the ranked data into four segments, with an equal number of values per segment. The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger). The third quartile, Q3, is the value for which 75% of the observations are smaller and only 25% are larger.

14
Q

Finding the quartile

A

Similar to the median, we find a quartile by determining the value in the appropriate position in the ranked data, where:
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values (sample size).

15
Q

Quartile rule 1

A

If the result is an integer, then the quartile is equal to that ranked value. For example, if the sample size is n = 7, the first quartile, Q1, is equal to the (7+1)/4 = 2nd ranked value.

16
Q

Quartile rule 2

A

If the result is a fractional half (1.5, 2.5, 3.5, etc.), then the quartile is equal to the mean of the corresponding ranked values. For example, if the sample size is n = 9, the first quartile, Q1, is equal to the (9+1)/4 = 2.5 ranked value, halfway between the second and third ranked values.

17
Q

Quartile rule 3

A

If the result is neither an integer nor a fractional half, round the result to the nearest integer and select that ranked value. For example, if the sample size is n = 10, the first quartile, Q1, is equal to the (10+1)/4 = 2.75 ranked value. Round 2.75 to 3 and use the third ranked value.
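The three quartile rules above can be combined into one function. This is a minimal Python sketch of the (n + 1)-position convention described on these cards; the name `quartile` is illustrative. Other tools (for example spreadsheet or NumPy quartile functions) may use different conventions and can give slightly different values.

```python
import math


def quartile(values, k):
    """k-th quartile (k = 1, 2 or 3) using the (n + 1)-position rules."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two observations")
    pos = k * (n + 1) / 4          # 1-based position in the ranked data
    whole = math.floor(pos)
    frac = pos - whole
    if frac == 0:
        # Rule 1: integer position, so take that ranked value
        return ranked[whole - 1]
    if frac == 0.5:
        # Rule 2: fractional half, so average the two neighbouring ranked values
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Rule 3: otherwise (.25 or .75), round to the nearest integer position
    return ranked[round(pos) - 1]


# Hypothetical ranked data 1..n for the sample sizes used in the three rules
for n in (7, 9, 10):
    data = list(range(1, n + 1))
    print(n, quartile(data, 1), quartile(data, 2), quartile(data, 3))
# 7 2 4 6
# 9 2.5 5 7.5
# 10 3 5.5 8
```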

18
Q

Measures of variation

A

Measures of variation give information on the spread or variability of the data values

19
Q

Interquartile range

A

Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values). Using the interquartile range eliminates outlier problems, because the highest- and lowest-valued observations are excluded from the calculation. IQR = 3rd quartile – 1st quartile, i.e. IQR = Q3 - Q1
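A short Python sketch showing that the IQR is resistant: replacing the largest value with an extreme outlier leaves it unchanged. It uses `statistics.quantiles(..., n=4, method="exclusive")`, which also works from (n + 1)-based positions but interpolates linearly instead of applying the rounding rule, so for some sample sizes it can differ slightly from the hand rules. The data and the helper name `iqr` are hypothetical.

```python
from statistics import quantiles


def iqr(values):
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3]
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # largest value replaced by an outlier

print(iqr(data), iqr(with_outlier))  # 5.0 5.0
```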

20
Q

Sample variance

A

Measures the average scatter around the mean. It is the sum of the squared deviations of the values from the sample mean, divided by n - 1, so its units are the square of the data's units. The deviations are squared because some are negative and some are positive; left unsquared, they always sum to zero.
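A minimal Python walk-through of the sample variance on a hypothetical sample, showing why squaring is needed: the raw deviations from the mean cancel out to zero.

```python
# Hypothetical sample
sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)
xbar = sum(sample) / n                 # 128 / 8 = 16.0

deviations = [x - xbar for x in sample]
print(sum(deviations))                 # 0.0: raw deviations cancel out

squared = [d ** 2 for d in deviations]
s2 = sum(squared) / (n - 1)            # divide by n - 1 for a sample
print(round(s2, 4))                    # 130 / 7 = 18.5714 (squared units)
```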

21
Q

Sample standard deviation

A

Most commonly used measure of variation. Shows variation about the mean. It is the square root of the sample variance, so it has the same units as the original data. It can be considered a measure of uncertainty.

22
Q

Advantages of variance and standard deviation

A

Each value in the data set is used in the calculation. Values far from the mean are given extra weight as deviations from the mean are squared.

23
Q

Disadvantages of variance and standard deviation

A

Sensitive to extreme values (outliers). Measures of absolute variation not relative variation.

24
Q

Differences between sample and population with regard to variance and standard deviation

A

When calculating the variance and standard deviation for a sample, the sum of squared deviations is divided by n - 1; when calculating them for a population, it is divided by N.
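Python's `statistics` module has separate functions for the two cases: `variance`/`stdev` divide by n - 1 (sample), while `pvariance`/`pstdev` divide by N (population). A sketch with hypothetical values treated first as a sample and then as a whole population:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical; mean 5, sum of squared deviations 32

s2 = statistics.variance(values)        # sample: 32 / (8 - 1), about 4.571
s = statistics.stdev(values)            # square root of s2, about 2.138
sigma2 = statistics.pvariance(values)   # population: 32 / 8 = 4
sigma = statistics.pstdev(values)       # square root of sigma2 = 2.0

print(round(s2, 3), round(s, 3))
print(sigma2, sigma)
```

Dividing by n - 1 makes the sample variance slightly larger than the population formula would give for the same numbers.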

25
Coefficient of variation
Measures relative variation, i.e. shows variation relative to the mean. Can be used to compare two or more sets of data measured in different units. Always expressed as a percentage (%).
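A minimal sketch of the coefficient of variation, CV = (S / X̄) × 100%, in Python. The two hypothetical price series have the same standard deviation but different means, so their relative variation differs. The helper name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, as a percentage."""
    return statistics.stdev(values) * 100 / statistics.mean(values)


stock_a = [48, 50, 52]     # hypothetical: mean 50, S = 2
stock_b = [98, 100, 102]   # hypothetical: mean 100, S = 2

print(coefficient_of_variation(stock_a))  # 4.0 (%)
print(coefficient_of_variation(stock_b))  # 2.0 (%)
```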
26
The Z score
The difference between a given observation and the mean, divided by the standard deviation. A Z score of 2.0 means that a value is 2.0 standard deviations above the mean. A Z score above 3.0 or below -3.0 is considered an outlier.
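A Python sketch of Z scores and the |Z| > 3 outlier rule, using the sample mean and sample standard deviation. The data set and the helper names `z_scores` and `outliers` are hypothetical. Note that in very small samples no value can reach |Z| > 3, so the example uses 21 values.

```python
import statistics


def z_scores(values):
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - xbar) / s for x in values]


def outliers(values, cutoff=3.0):
    """Values whose Z score is above cutoff or below -cutoff."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]


data = [9, 11] * 10 + [40]   # hypothetical: twenty ordinary values and one extreme one
print(outliers(data))        # [40]
```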
27
The shape of a distribution
Describes how data are distributed. A distribution's shape is either symmetric or skewed.
28
Left skewed and right skewed
When the data are left (negatively) skewed, the distance between Q1 and Q2 is greater than the distance between Q2 and Q3. The reverse applies for right (positively) skewed data. If the data are symmetric, the two distances are the same.
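This quartile-distance comparison can be sketched in Python. It is only a rough rule of thumb (with real data the two distances are rarely exactly equal), and it assumes `statistics.quantiles(..., method="exclusive")` as the quartile convention. The function name `skew_direction` and the data are hypothetical.

```python
from statistics import quantiles


def skew_direction(values):
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    lower, upper = q2 - q1, q3 - q2   # distances Q1 to Q2 and Q2 to Q3
    if lower > upper:
        return "left (negative) skew"
    if upper > lower:
        return "right (positive) skew"
    return "symmetric"


right = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12, 20]   # long tail to the right
left = [-x for x in right]                    # mirror image: long tail to the left
symmetric = list(range(1, 10))

print(skew_direction(right))      # right (positive) skew
print(skew_direction(left))       # left (negative) skew
print(skew_direction(symmetric))  # symmetric
```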
29
What does a box and whisker plot show
A box-and-whisker plot shows location, spread and shape.
30
Numerical measures for a population
Population summary measures are called parameters. The population mean is the sum of the values in the population divided by the population size, N
31
Population variance
The average of the squared deviations of the values from the population mean
32
Population standard deviation
Shows variation about the mean. It is the square root of the population variance and has the same units as the original data.
33
Arithmetic mean equation
Photo 1: X̄ = ΣXi / n, where n is the number of observations
34
Example of mean, median and mode
Photo 2
35
Quartile example
Photo 3
36
Measures of variation example
Photo 4
37
Range example
Photo 5
38
Range disadvantages
Photo 6
39
Sample variance equation
Photo 7: S² = Σ(Xi - X̄)² / (n - 1)
40
Sample standard deviation equation
Photo 8: S = √[Σ(Xi - X̄)² / (n - 1)]
41
Sample standard deviation example
Photo 9
42
Sample standard deviation graphed example
Photo 10
43
Comparing standard deviations
Photo 11
44
Coefficient of variation equation
Photo 12: CV = (S / X̄) × 100%
45
Coefficient of variation example
Photo 13
46
The 3 shapes of a distribution
Photo 14
47
Using Excel for descriptive statistics
Photos 15-16
48
Population mean equation
Photo 17: μ = ΣXi / N
49
Empirical rule
Photos 18-19
50
Box and whisker plot
Photo 20
51
Distribution shape box and whisker plot
Photo 21
52
Covariance
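The sample covariance measures the linear relationship between two numerical variables. Its sign shows the direction of the relationship (positive: the variables tend to move together; negative: they tend to move in opposite directions), but its magnitude is hard to interpret. No causal effect is implied. It is affected by the units of measurement.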
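A minimal Python sketch of the sample covariance, cov(X, Y) = Σ(Xi - X̄)(Yi - Ȳ) / (n - 1), on hypothetical paired data. Re-expressing hours as minutes multiplies the covariance by 60, which shows why its size depends on the units. The helper name `sample_covariance` is illustrative.

```python
def sample_covariance(x, y):
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


hours = [1, 2, 3, 4, 5]         # hypothetical hours studied
marks = [52, 55, 61, 64, 68]    # hypothetical exam marks
minutes = [h * 60 for h in hours]

print(sample_covariance(hours, marks))    # 10.25 (positive: move together)
print(sample_covariance(minutes, marks))  # 615.0 (same data, different units)
```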
53
Covariance equation
Photo 22: cov(X, Y) = Σ(Xi - X̄)(Yi - Ȳ) / (n - 1)
54
Correlation
Measures the relative strength of the linear relationship between two variables
55
Correlation equation
Photo 23: r = cov(X, Y) / (SX SY)
56
Features of correlation coefficient
Also called standardised covariance, i.e. it is invariant to the units of measurement. Ranges between –1 and 1. The closer to –1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.
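A Python sketch of the correlation coefficient r = cov(X, Y) / (SX SY) on hypothetical paired data. The (n - 1) factors in the covariance and both standard deviations cancel, so r can be written with plain sums of squares. Rescaling hours to minutes leaves r unchanged, illustrating that it is unit-free. The helper name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    # The (n - 1) divisors of cov(X, Y), SX and SY cancel out here
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]         # hypothetical
marks = [52, 55, 61, 64, 68]    # hypothetical
minutes = [h * 60 for h in hours]

print(round(pearson_r(hours, marks), 3))        # 0.994: strong positive
print(round(pearson_r(minutes, marks), 3))      # 0.994: unchanged by units
print(round(pearson_r(hours, marks[::-1]), 3))  # -0.994: strong negative
```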
57
Scatter Plots of Data with Various Correlation Coefficients
Photo 24
58
Pitfalls and ethical issues
Data analysis is objective | Data interpretation is subjective
59
Objective
Should report the summary measures that best meet the assumptions about the data set
60
Subjective
Should be done in a fair, neutral and transparent manner. Should document both good and bad results. Results should be presented in a fair, objective and neutral manner. Should not use inappropriate summary measures to distort facts. Do not fail to report pertinent findings even if such findings do not support the original argument.
61
IQR Example
Photo 25
62
Population variance and standard deviation equations
Photo 26: σ² = Σ(Xi - μ)² / N and σ = √σ²
63
5 number summary
Numerical data summarised by five values based on the quartiles: Xsmallest, Q1, Median, Q3, Xlargest
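A minimal Python sketch of the five-number summary. It assumes `statistics.quantiles(..., method="exclusive")` for the quartiles, which uses (n + 1)-based positions with linear interpolation; for n = 9 this agrees with the fractional-half rule. The data and the name `five_number_summary` are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    ranked = sorted(values)
    q1, med, q3 = quantiles(ranked, n=4, method="exclusive")
    return ranked[0], q1, med, q3, ranked[-1]


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]   # hypothetical, n = 9
print(five_number_summary(data))          # (3, 6.0, 12.0, 16.0, 21)
```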