Numerical Descriptive Statistics Flashcards by Michael Conti

4 measures used to describe data

Central tendency
Quartiles
Variation
Shape

4 measures of central tendency

Arithmetic mean
Median
Mode
Geometric mean

5 measures of variation

Range 
Interquartile range 
Variance
Standard deviation 
Coefficient of variation

1 measure of shape

Skewness

What’s required to make an informed decision

Central tendency (location), spread and shape must all be known. Only with all three do you have complete information and can make an informed decision.

Arithmetic mean

The arithmetic mean is the sum of the observations divided by the number of observations.

Median and mean: extreme values

The median is not sensitive to extreme values, whereas the mean is.

Sigma

Sigma (Σ) is shorthand for summation: add up the values.

Median

In an ordered array, the median is the middle number (50% of the values above and 50% below). Its main advantage over the arithmetic mean is that it is not affected by extreme values.

Location of the median

Median position = (n+1)/2 in the ranked data. This is not the value of the median, only the position of the median in the ranked data. If the number of values in the data set is odd, the median is the middle ranked value. If the number of values in the data set is even, the median is the mean (average) of the two middle ranked values.
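
A minimal sketch of this position rule in Python. The function name `median_by_position` and the sample values are illustrative assumptions, not part of the original cards, and the list is assumed to be non-empty.

```python
def median_by_position(values):
    """Median found via the (n + 1) / 2 ranked position (non-empty input assumed)."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2  # 1-based position, not the median itself

    if position.is_integer():
        # Odd n: the median is the single middle ranked value.
        return ranked[int(position) - 1]

    # Even n: the position ends in .5, so average the two middle ranked values.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2


print(median_by_position([7, 1, 5]))     # n = 3, position 2
print(median_by_position([7, 1, 5, 3]))  # n = 4, position 2.5
```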

Mode

A measure of central tendency. Value that occurs most often (the most frequent). Not affected by extreme values. Never use the mode by itself, always use in conjunction with median or mean. Unlike mean and median, there may be no unique (single) mode for a given data set. Used for either numerical or categorical (nominal) data.

What measure is best to use

As the sample size gets bigger, the influence of extreme values diminishes. The mean is generally used most often, unless extreme values (outliers) exist. The median is often used instead, since it is not sensitive to extreme values. The mode is usually the least used of the three. In the example data set, which has an obvious outlier ($2,000,000), it makes sense to use the median. Most housing prices are now reported as median housing prices in Australian newspapers due to possible outliers.
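
To illustrate the point about outliers, here is a small sketch using Python's standard `statistics` module. The price list is hypothetical; only the $2,000,000 outlier comes from the card.

```python
from statistics import mean, median

# Hypothetical house prices; the last value is the outlier.
prices = [300_000, 350_000, 400_000, 450_000, 2_000_000]

print(mean(prices))    # pulled upward by the outlier
print(median(prices))  # middle ranked value, unaffected by the outlier
```

Without the outlier, the mean and the median of the remaining four prices are both 375,000. With it, the mean jumps to 700,000 while the median only moves to 400,000.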

Quartiles

Quartiles split the ranked data into four segments, with an equal number of values per segment. The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger). The third quartile, Q3, is the value for which 75% of the observations are smaller and only 25% are larger.

Finding the quartile

Similar to the median, we find a quartile by determining the value in the appropriate position in the ranked data, where:
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values (sample size).

Quartile rule 1

If the result is an integer, then the quartile is equal to that ranked value. For example, if the sample size is n = 7, the first quartile, Q1, is equal to the (7+1)/4 = 2nd ranked value.

Quartile rule 2

If the result is a fractional half (1.5, 2.5, 3.5, etc.), then the quartile is equal to the mean of the corresponding ranked values. For example, if the sample size is n = 9, the first quartile, Q1, is equal to the (9+1)/4 = 2.5 ranked value, halfway between the second and the third ranked values.

Quartile rule 3

If the result is neither an integer nor a fractional half, round the result to the nearest integer and select that ranked value. For example, if the sample size is n = 10, the first quartile, Q1, is equal to the (10+1)/4 = 2.75 ranked value. Round 2.75 to 3 and use the third ranked value.
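
Quartile rules 1 to 3 can be sketched together in Python as follows. The function name `quartile` and the test values are assumptions made for illustration; the sketch assumes at least three values so that every position falls inside the data.

```python
import math


def quartile(values, k):
    """k-th quartile (k = 1, 2 or 3) using the three rules on the cards.

    Assumes at least three values so every position falls inside the data.
    """
    ranked = sorted(values)
    position = k * (len(ranked) + 1) / 4  # 1-based ranked position

    if position.is_integer():
        # Rule 1: integer position, take that ranked value.
        return ranked[int(position) - 1]
    if position % 1 == 0.5:
        # Rule 2: fractional half, average the two neighbouring ranked values.
        return (ranked[int(position) - 1] + ranked[int(position)]) / 2
    # Rule 3: otherwise round to the nearest integer position.
    return ranked[math.floor(position + 0.5) - 1]


print(quartile(list(range(10, 80, 10)), 1))   # n = 7,  position 2
print(quartile(list(range(10, 100, 10)), 1))  # n = 9,  position 2.5
print(quartile(list(range(10, 110, 10)), 1))  # n = 10, position 2.75
```

Note that statistical software often interpolates between ranked values instead of rounding, so its quartiles may differ slightly from rule 3.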

Measures of variation

Measures of variation give information on the spread or variability of the data values

Interquartile range

Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values). Using the interquartile range eliminates outlier problems, as high- and low-valued observations are removed from the calculation. IQR = 3rd quartile - 1st quartile = Q3 - Q1.
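
A short sketch of why the IQR is called resistant, using `statistics.quantiles` from the Python standard library. The data are hypothetical, and n = 7 is chosen so that the quartile positions are whole numbers.

```python
from statistics import quantiles


def iqr(values):
    # method="exclusive" uses (n + 1)-based positions, like the cards do;
    # it interpolates between ranks rather than rounding to the nearest rank.
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7]            # hypothetical values, n = 7
with_outlier = [1, 2, 3, 4, 5, 6, 700]  # largest value replaced by an outlier

for d in (data, with_outlier):
    print("range:", max(d) - min(d), "IQR:", iqr(d))
```

The range grows from 6 to 699 when the outlier is introduced, while the IQR stays at 4 in both cases.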

Sample variance

Measures the average squared scatter around the mean, so its units are the square of the original units. Deviations from the mean are squared because some are negative and some are positive, and would otherwise cancel out. The sample variance is the sum of the squared differences between each value and the mean, divided by n-1.

Sample standard deviation

Most commonly used measure of variation. Shows variation about the mean. Has the same units as the original data. It can be considered a measure of uncertainty.

Advantages of variance and standard deviation

Each value in the data set is used in the calculation. Values far from the mean are given extra weight as deviations from the mean are squared.

Disadvantages of variance and standard deviation

Sensitive to extreme values (outliers). They are measures of absolute variation, not relative variation.

Differences between sample and population in regards to standard deviation and variance

When calculating the variance and standard deviation for a sample, the sum of squared deviations is divided by n-1. When calculating them for a population, it is divided by N.
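
The two divisors can be compared in a small Python sketch. The function names, the `sample` flag and the data are assumptions for illustration only.

```python
from math import sqrt


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 or N."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n  # sample: n - 1, population: N
    return squared_deviations / divisor


def standard_deviation(values, sample=True):
    """Square root of the variance, in the original units of the data."""
    return sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

print(variance(data, sample=False), standard_deviation(data, sample=False))
print(variance(data, sample=True), standard_deviation(data, sample=True))
```

For these values the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, which is slightly larger.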

Coefficient of variation

Measures relative variation i.e. shows variation relative to mean. Can be used to compare two or more sets of data measured in different units. Always expressed as percentage (%)
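
A minimal sketch of the coefficient of variation, assuming the usual definition of sample standard deviation divided by the mean, times 100%. The equation card itself is only an image, so this definition and the height data are assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


heights_cm = [150, 160, 170, 180, 190]       # hypothetical data in centimetres
heights_in = [h / 2.54 for h in heights_cm]  # the same data in inches

print(round(coefficient_of_variation(heights_cm), 2))
print(round(coefficient_of_variation(heights_in), 2))
```

Both lines print the same percentage, because changing the unit rescales the standard deviation and the mean by the same factor. This is what makes the measure usable across data sets in different units.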

The Z score

The difference between a given observation and the mean, divided by the standard deviation. A Z score of 2.0 means that a value is 2.0 standard deviations above the mean. A value with a Z score above 3.0 or below -3.0 is considered an outlier.
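
The Z score rule can be sketched in Python as below. The helper names, the use of the sample standard deviation, and the data are assumptions for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """(value - mean) / standard deviation for every value."""
    m, s = mean(values), stdev(values)  # sample standard deviation assumed
    return [(x - m) / s for x in values]


def outliers(values, threshold=3.0):
    """Values whose Z score is above +threshold or below -threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical data: twenty values near 50 and one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [100]

print(outliers(data))
```

In very small samples no value can reach a Z score of 3, however extreme it looks, so this rule is only informative with a reasonable number of observations.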

The shape of a distribution

Describes how data are distributed. A distribution is either symmetric or skewed.

Left skewed and right skewed

When the data are left (negatively) skewed, the distance between Q1 and Q2 is greater than the distance between Q2 and Q3. The reverse applies for right (positively) skewed data. If the data are symmetric, the two distances are the same.
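
The quartile comparison above can be written as a tiny Python function. The function name is hypothetical, and with real data the two distances are rarely exactly equal, so treat the "symmetric" branch as an idealisation.

```python
def shape_from_quartiles(q1, q2, q3):
    """Classify shape by comparing the Q1-to-Q2 and Q2-to-Q3 distances."""
    lower = q2 - q1  # distance between Q1 and the median
    upper = q3 - q2  # distance between the median and Q3
    if lower > upper:
        return "left-skewed"
    if lower < upper:
        return "right-skewed"
    return "symmetric"


print(shape_from_quartiles(1, 5, 6))
print(shape_from_quartiles(1, 2, 6))
print(shape_from_quartiles(1, 3, 5))
```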

What does a box and whisker plot show

A box-and-whisker plot shows location, spread and shape.

Numerical measures for a population

Population summary measures are called parameters. The population mean is the sum of the values in the population divided by the population size, N

Population variance

The average of the squared deviations of the values from the population mean.

Population standard deviation

Shows variation about the mean. Is the square root of the population variance. Has the same units as the original data.

Arithmetic mean equation

Photo 1

Example of mean, median and mode

Photo 2

Quartile example

Photo 3

Measures of variation example

Photo 4

Range example

Photo 5

Range disadvantages

Photo 6

Sample variance equation

Photo 7

Sample standard deviation equation

Photo 8

Sample standard deviation example

Photo 9

Sample standard deviation graphed example

Photo 10

Comparing standard deviations

Photo 11

Coefficient of variation equation

Photo 12

Coefficient of variation example

Photo 13

The 3 shapes of a distribution

Photo 14

Using Excel for descriptive statistics

Photos 15-16

Population mean equation

Photo 17

Empirical rule

Photos 18-19

Box and whisker plot

Photo 20

Distribution shape box and whisker plot

Photo 21

Covariance

The sample covariance measures how two numerical variables vary together linearly. Its sign gives the direction of the relationship. Its size is affected by the units of measurement, so it does not show relative strength. No causal effect is implied.

Covariance equation

Photo 22

Correlation

Measures the relative strength of the linear relationship between two variables

Correlation equation

Photo 23

Features of correlation coefficient

Also called standardised covariance, i.e. invariant to units of measure. Ranges between -1 and 1. The closer to -1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.
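
A sketch contrasting covariance and correlation in Python. It assumes the usual sample formulas with an n-1 divisor (the equation cards are images, so this is an assumption), and the data are hypothetical.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sum of products of paired deviations, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself = variance
    sd_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_rescaled = [100 * v for v in y]  # same data in a unit 100 times smaller

print(sample_covariance(x, y), sample_covariance(x, y_rescaled))
print(round(correlation(x, y), 4), round(correlation(x, y_rescaled), 4))
```

Rescaling y multiplies the covariance by 100 but leaves the correlation coefficient unchanged, which is the sense in which correlation is a standardised covariance.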

Scatter Plots of Data with Various Correlation Coefficients

Photo 24

Pitfalls and ethical issues

Data analysis is objective | Data interpretation is subjective

Objective

Should report the summary measures that best meet the assumptions about the data set

Subjective

Should be done in a fair, neutral and transparent manner. Should document both good and bad results. Results should be presented in a fair, objective and neutral manner. Should not use inappropriate summary measures to distort facts. Do not fail to report pertinent findings, even if such findings do not support the original argument.

IQR Example

Photo 25

Population variance and standard deviation equations

Photo 26

5 number summary

Numerical data summarised by quartiles: X_smallest, Q1, Median, Q3, X_largest
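
A minimal Python sketch of the five-number summary, using `statistics.quantiles` from the standard library. The function name and the data are assumptions; n = 11 is chosen so that the (n+1)-based quartile positions are whole numbers.

```python
from statistics import quantiles


def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest) of a data set."""
    # method="exclusive" uses (n + 1)-based positions and interpolates
    # between ranks when a position is not a whole number.
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (min(values), q1, q2, q3, max(values))


data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 40]  # hypothetical values, n = 11

print(five_number_summary(data))
```

Here the quartile positions are 3, 6 and 9, so Q1, the median and Q3 are the 3rd, 6th and 9th ranked values.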

Numerical Descriptive Statistics Flashcards

(63 cards)