Chapter 3 Flashcards by Mitch Mitchell

What are the measures of location?

Mean
Weighted Mean
Geometric Mean
Median
Mode
Percentiles
Quartiles

Describe the mean

most important measure of location
also called average
provides a measure of central location
most commonly used

Describe the weighted mean

arithmetic average where some data values contribute more than others
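
A minimal sketch of the weighted mean, assuming each data value comes with a non-negative weight. The function name and the cost/units numbers are hypothetical and only illustrate the idea.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: cost per unit, weighted by the units bought at that cost.
costs = [3.00, 3.40, 2.80]
units = [1200, 500, 2750]
print(weighted_mean(costs, units))
```

With equal weights the result reduces to the ordinary mean.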

Describe the Geometric Mean

finding the nth root of the product of n values
often used to analyze growth rates in financial data
in these cases, arithmetic mean will provide misleading results

What format must the number be in to calculate the geometric mean?

can’t be a percent; must be a decimal growth factor
if it is a percent and it is negative, 1 - the number (as a decimal)
if it is a percent and it is positive, 1 + the number (as a decimal)
example: -40% converts to -0.40, so the growth factor is 1 - 0.40 = 0.60
then you can use the formula
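
The conversion and the nth-root step can be sketched as follows. The helper names and the two-period returns are hypothetical; `math.prod` needs Python 3.8 or later.

```python
import math


def growth_factors(percent_returns):
    """Convert percent returns (e.g. -40 for -40%) to growth factors (0.60)."""
    return [1 + r / 100 for r in percent_returns]


def geometric_mean(factors):
    """nth root of the product of n positive growth factors."""
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("need at least one factor, all strictly positive")
    return math.prod(factors) ** (1 / len(factors))


# Hypothetical returns: -40% in period 1, +50% in period 2.
factors = growth_factors([-40, 50])          # roughly [0.6, 1.5]
mean_factor = geometric_mean(factors)
print((mean_factor - 1) * 100)               # mean growth rate per period, in percent
```

In this example the arithmetic mean of the two returns is +5%, yet the geometric mean growth rate is negative, which is why the arithmetic mean misleads for growth rates.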

Describe the median

often preferred when the data contain outliers because extreme values do not affect it
the middle of a sorted list of data values
arranged in ascending order
odd # values = median is the middle number
even # values = median is the mean of 2 central data values
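
The odd/even rule can be sketched as below. The function name and sample lists are illustrative only.

```python
def median(values):
    """Middle value of the sorted data; mean of the two central values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))            # odd count: the middle number
print(median([4, 1, 3, 2]))         # even count: mean of the 2 central values
print(median([1, 2, 3, 4, 1000]))   # the extreme value 1000 does not move the median
```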

Describe the mode

data value that occurs most often (greatest frequency)
2 modes - bimodal
more than 2 modes - multimodal
- in the multimodal case, don’t report the mode b/c listing 3 or more modes is not helpful in describing the location of the data
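
A small sketch that finds every value tied for the greatest frequency, so the length of the result tells you whether the data are unimodal, bimodal, or multimodal. The function name is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency, sorted."""
    if not values:
        raise ValueError("mode of empty data is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))        # one mode
print(modes([1, 1, 2, 2, 3]))     # two modes: bimodal
```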

Describe percentiles

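show how the data are spread over the interval from the smallest value to the largest
3 steps to find the pth percentile
1. arrange the data in ascending order
2. compute an index i = (p/100) x n, where n is the number of observations
3. if i is not an integer - round up; the next integer above i is the position of the pth percentile
if i is an integer, the pth percentile is the avg. of the values in positions i & i+1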
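
The three steps can be sketched as below, taking the index as i = (p/100) * n; treat that index formula as an assumption of this sketch. Other textbooks and libraries use different percentile conventions, so results may differ slightly from software output. The data are hypothetical.

```python
import math


def percentile(values, p):
    """pth percentile (0 < p < 100) using the round-up / average rule."""
    if not values or not 0 < p < 100:
        raise ValueError("need data and 0 < p < 100")
    ordered = sorted(values)             # step 1: ascending order
    i = p * len(ordered) / 100           # step 2: index (assumed formula)
    if not i.is_integer():
        position = math.ceil(i)          # step 3a: round up to the next position
        return ordered[position - 1]     # positions are 1-based
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2   # step 3b: average positions i and i+1


data = list(range(10, 130, 10))   # hypothetical data: 10, 20, ..., 120 (n = 12)
print(percentile(data, 85))       # i = 10.2, so the value in position 11
print(percentile(data, 50))       # i = 6, so the average of positions 6 and 7
```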

What is the formula to find the percentile of x?

# of data points less than x / total # of data points

What are the measures of variability?

range
interquartile range
variance
standard deviation
coefficient of variation

What is another name for measures of variability?

Measures of dispersion

What do measures of variability describe?

spread of data

Describe Range

Largest value - smallest value
seldom used as the only measure
b/c it is based on 2 observations and it is highly influenced by extreme values

Describe Interquartile range

IQR = Q3- Q1

overcomes the dependency on extreme values
difference b/w the 3rd and 1st quartile
it is the range for the middle 50% of the data
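
A sketch comparing the range and the IQR on hypothetical data. It relies on `statistics.quantiles` (Python 3.8+), whose default quartile convention interpolates between values and so can differ from a textbook's hand rule; that choice is an assumption of this example.

```python
import statistics


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, using the default quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


base = list(range(10, 130, 10))     # hypothetical data: 10, 20, ..., 120
stretched = base[:-1] + [1200]      # same data with one extreme value

print(data_range(base), data_range(stretched))                    # range changes a lot
print(interquartile_range(base), interquartile_range(stretched))  # IQR does not change
```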

Describe Variance

uses ALL the data
based on the difference b/w EACH observation (xi) and the mean
each difference is called a deviation about the mean; the variance averages the squared deviations
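
A sketch of the calculation, assuming the data are a sample, so the squared deviations are divided by n - 1 (a population variance would divide by n). The function name and the five values are illustrative.

```python
def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # deviation about the mean for each xi
    return sum(d * d for d in deviations) / (n - 1)


sizes = [46, 54, 42, 46, 32]       # illustrative values; their mean is 44
print(sample_variance(sizes))       # squared deviations 4, 100, 4, 4, 144 sum to 256
```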

Describe Standard deviation

positive square root of variance
easier to interpret than the variance b/c SD is measured in the same units as the data
commonly used measure of risk associated with investing in stock and stock funds

Describe the Coefficient of Variation

used when we want a descriptive statistic that indicates how LARGE the SD is relative to the MEAN
usually expressed as a percent
tells us that the sample SD is x% of the value of the sample mean
useful for comparing the variability of variables that have different SDs and different means

What is the formula for the Coefficient of Variation?

(SD/Mean) x 100
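
The formula can be sketched with the standard library, assuming the sample SD (`statistics.stdev`) is the one intended. The function name and data are illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """(sample SD / mean) * 100, i.e. the SD as a percent of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


sizes = [46, 54, 42, 46, 32]               # illustrative values: mean 44, sample SD 8
print(coefficient_of_variation(sizes))      # SD as a percent of the mean
```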

What does MAE stand for?

Mean absolute error

How is MAE computed?

sum the absolute values of the deviations of the observations about the mean & divide it by # of observations
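
A direct sketch of that calculation; the function name and data are illustrative.

```python
def mean_absolute_error(values):
    """Sum of |xi - mean| over all observations, divided by the number of observations."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


sizes = [46, 54, 42, 46, 32]            # illustrative values; mean is 44
print(mean_absolute_error(sizes))       # absolute deviations 2, 10, 2, 2, 12
```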

What are the measures of distribution shape?

Skewness
z-scores
Chebyshev’s Theorem
Empirical Rule

What kinds of skewness are there? Describe each one.

Skewed Left
- skewness is negative
- mean is usually less than the median
Skewed Right
- skewness is positive
- mean is usually more than the median
Symmetrical
- skewness is zero
- mean and median are equal

Describe z-scores

to find relative location of values w/in a data set
how far a particular value is from the mean
also called the standardized value
using mean and SD we can find the relative location of any observation
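
A sketch of the calculation, using the usual definition z = (x - mean) / SD with the sample SD; the formula is not written on the card, so treat it as this example's assumption. The data are illustrative.

```python
import statistics


def z_scores(values):
    """Number of sample SDs each observation lies above (+) or below (-) the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / sd for x in values]


sizes = [46, 54, 42, 46, 32]     # illustrative values: mean 44, sample SD 8
print(z_scores(sizes))            # e.g. 32 lies 1.5 SDs below the mean
```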

Describe Chebyshev’s Theorem

allows us to make statements about the proportion of data values that must be w/in a specified # of SD of the mean

at least 75% of the data values are w/in 2 SD of the mean
at least 89% of the data values are w/in 3 SD of the mean
at least 94% of the data values are w/in 4 SD of the mean
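
The three percentages follow from the theorem's bound: at least 1 - 1/z^2 of the data lie within z SDs of the mean, for any z > 1. The bound is not written on the card; the sketch below just evaluates it, and the function name is hypothetical.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z SDs of the mean (requires z > 1)."""
    if z <= 1:
        raise ValueError("the theorem only gives a useful bound for z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    print(z, round(chebyshev_lower_bound(z) * 100))   # rounded percent for each z
```

Because the bound holds for any distribution shape, it is much looser than what a bell-shaped data set would show.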

Describe Empirical Rule

- based on the normal probability distribution
- used for symmetrical, bell-shaped distributions
- gives the approx. % of data values w/in a specified # of SD of the mean
1. approx. 68% of the data values will be w/in 1 SD of the mean
2. approx. 95% of the data values will be w/in 2 SD of the mean
3. almost all of the data values will be w/in 3 SD of the mean
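
The percentages can be checked against an exact normal distribution, where the share of values within k SDs of the mean is erf(k / sqrt(2)). This is a sketch under the assumption of perfectly normal data; real bell-shaped data will only match approximately.

```python
import math


def normal_share_within(k):
    """Proportion of a normal distribution within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_share_within(k) * 100, 1))   # percent within k SDs
```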

What can you use to detect outliers?

1. Z-scores
- use the empirical rule
- treat any data value with a z-score of less than -3 or more than +3 as an outlier
2. Limits based on the 1st and 3rd quartiles and the IQR
- compute the lower and upper limits
Lower limit = Q1 - 1.5(IQR)
Upper limit = Q3 + 1.5(IQR)
- if a value is less than the lower limit or greater than the upper limit, treat it as an outlier
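
Both approaches can be sketched as below. The function names and data are hypothetical; the quartiles are passed in as arguments so that any quartile convention can be used, and the z-score version assumes the sample SD.

```python
import statistics


def z_score_outliers(values, cutoff=3):
    """Values whose z-score is below -cutoff or above +cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / sd) > cutoff]


def iqr_limits(q1, q3):
    """Lower limit = Q1 - 1.5(IQR), upper limit = Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values, q1, q3):
    """Values that fall outside the IQR-based limits."""
    lower, upper = iqr_limits(q1, q3)
    return [x for x in values if x < lower or x > upper]


print(z_score_outliers([10] * 20 + [1000]))      # hypothetical data with one extreme value
print(iqr_limits(35, 95))                         # limits for hypothetical Q1 = 35, Q3 = 95
print(iqr_outliers([10, 50, 90, 200], 35, 95))
```

In small samples no z-score can exceed 3 in absolute value, so the IQR limits are often the more practical check there.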

What is the 5 number summary?

- used to summarize the data
1. Smallest value
2. First Quartile Q1
3. Second Quartile Q2 (Median)
4. Third Quartile Q3
5. Largest value
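
A sketch of the five values using `statistics.quantiles` (Python 3.8+). Its default quartile convention interpolates between observations, so Q1 and Q3 can differ slightly from a hand calculation; that convention is this example's assumption, and the data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, Q2/median, Q3, largest)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


data = list(range(10, 130, 10))      # hypothetical data: 10, 20, ..., 120
print(five_number_summary(data))
```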

What is a box plot useful for?

- provides a convenient visual display of several characteristics of a data set
- based on the 5 # summary
- need to compute IQR = Q3 - Q1

What are the advantages of a box plot?

- easy to use
- few calculations
- no need to calculate mean and SD

What are the steps in constructing a box plot?

1. a box is drawn with the ends of the box located at the 1st and 3rd quartiles
- this box contains the middle 50% of the data
2. a vertical line is drawn in the box at the location of the median
3. using the IQR, limits are located at 1.5(IQR) below Q1 and 1.5(IQR) above Q3
- data outside these limits are outliers
4. dashed lines are called whiskers
- drawn from the ends of the box to the smallest and largest values inside the limits from step 3
5. the location of each outlier is shown with *
Note: generally the upper and lower limits are not drawn on the box plot

(30 cards)