Chapter 3 Flashcards

1
Q

What are the measures of location?

A
  1. Mean
  2. Weighted Mean
  3. Geometric Mean
  4. Median
  5. Mode
  6. Percentiles
  7. Quartiles
2
Q

Describe the mean

A
  • most important measure of location
  • also called average
  • provides a measure of central location
  • most commonly used
3
Q

Describe the weighted mean

A
  • arithmetic average where some data values contribute more than others
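
A minimal sketch of the weighted mean, assuming each value comes with a non-negative weight. The function name and the sample data are hypothetical and only illustrate the idea.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: cost per unit on three purchases, weighted by quantity.
costs = [3.00, 3.40, 2.80]
quantities = [1200, 500, 2750]
print(round(weighted_mean(costs, quantities), 2))
```

With equal weights this reduces to the ordinary mean.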
4
Q

Describe the Geometric Mean

A
  • found by taking the nth root of the product of n values
  • often used to analyze growth rates in financial data
  • in these cases, arithmetic mean will provide misleading results
5
Q

What format must the numbers be in to calculate the geometric mean?

A
  • can’t be a percent; convert it to a decimal growth factor first
  • if the percent is negative, subtract its decimal form from 1
  • if the percent is positive, add its decimal form to 1
    example: -40% converts to 0.40, and 1 - 0.40 = 0.60
    then you can use the formula
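
The conversion and the nth-root step can be sketched as follows. The helper names and the three yearly returns are hypothetical. Note that 1 + (percent / 100) covers both cases, since a negative percent ends up subtracted from 1.

```python
import math


def growth_factor(percent_return):
    """Convert a percent return to a growth factor, e.g. -40 -> 0.60."""
    return 1 + percent_return / 100


def geometric_mean(factors):
    """nth root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))


# Hypothetical yearly returns in percent.
returns = [-40.0, 25.0, 10.0]
factors = [growth_factor(r) for r in returns]
mean_factor = geometric_mean(factors)

# Convert the mean growth factor back to a percent growth rate.
mean_growth_percent = (mean_factor - 1) * 100
print(round(mean_growth_percent, 2))
```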
6
Q

Describe the median

A
  • sometimes the preferred measure because it is not affected by outliers
  • the middle of a sorted list of data values
  • arranged in ascending order
  • odd # values = median is the middle number
  • even # values = median is the mean of 2 central data values
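
The odd/even rule above can be sketched as a small function. The name `median` is chosen here for illustration; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))           # odd count
print(median([4, 1, 3, 2]))        # even count
print(median([1, 2, 3, 4, 100]))   # the extreme value 100 does not move the median
```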
7
Q

Describe the mode

A
  • data value that occurs most often (greatest frequency)
  • 2 modes - bimodal
  • more than 2 modes - multimodal
    • don’t report the mode b/c listing 3 or more modes is not helpful in describing the location of data
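
A short sketch that returns every value tied for the greatest frequency, so one result means a single mode, two means bimodal, and three or more means multimodal. The function name is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())  # raises ValueError on empty data
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))       # one mode
print(modes([1, 1, 2, 2, 3]))    # bimodal
```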
8
Q

Describe percentiles

A
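  • show how the data are spread over the interval from the smallest value to the largest
    3 steps
    1. arrange the data in ascending order
    2. compute an index i = (p/100)n, where p is the percentile and n is the number of values
    3. if i is not an integer, round up; the value in that position is the pth percentile
    if i is an integer, the pth percentile is the avg. of the values in positions i & i+1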
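
The three steps can be sketched as below, assuming the index is computed as i = (p / 100) * n and positions are counted from 1. Other textbooks and software use different percentile rules, so results may differ from library functions. The data are hypothetical.

```python
import math


def percentile(values, p):
    """pth percentile using the round-up / average-of-neighbors rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)            # step 1: ascending order
    i = p * len(ordered) / 100          # step 2: index
    if i.is_integer():                  # integer index: average positions i and i+1
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(i) - 1]    # step 3: round up to the next position


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
print(percentile(data, 85))  # i = 10.2, round up to position 11
print(percentile(data, 50))  # i = 6, average of positions 6 and 7
```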
9
Q

What is the formula to find the percentile of x?

A

(# of data points less than x) / (total # of data points)

10
Q

What are the measures of variability?

A
  1. range
  2. interquartile range
  3. variance
  4. standard deviation
  5. coefficient of variation
11
Q

What is another name for measures of variability?

A

Measures of dispersion

12
Q

What do measures of variability describe?

A

spread of data

13
Q

Describe Range

A
  • Largest value - smallest value
  • seldom used as the only measure
    b/c it is based on 2 observations and it is highly influenced by extreme values
14
Q

Describe Interquartile range

A

IQR = Q3 - Q1

  • overcomes the dependency on extreme values
  • difference b/w the 3rd and 1st quartile
  • it is the range for the middle 50% of the data
15
Q

Describe Variance

A
  • utilizes ALL data
  • based on difference b/w EACH observation (xi) and the mean
  • called deviation about the mean
16
Q

Describe Standard deviation

A
  • positive square root of variance
  • easier to interpret than the variance b/c SD is measured in the same units as the data
  • commonly used measure of risk associated with investing in stock and stock funds
17
Q

Describe the Coefficient of Variation

A
  • when interested in descriptive statistic that indicates how LARGE the SD is relative to the MEAN
  • usually expressed as a percent
  • tells us that the sample SD is x% of the value of the sample mean
  • useful for comparing the variability of variables that have different SDs and different means
18
Q

What is the formula for Coefficient of Variation

A

(SD/Mean) x 100
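
A sketch tying variance, standard deviation, and coefficient of variation together. It assumes sample statistics (dividing by n - 1), which the cards do not state explicitly; for population statistics the divisor would be n. The function names and data are hypothetical.

```python
import math


def sample_variance(values):
    """Average squared deviation about the mean, using n - 1 (sample) as divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """(SD / mean) x 100, expressed as a percent."""
    mean = sum(values) / len(values)
    return sample_sd(values) / mean * 100


data = [46, 54, 42, 46, 32]   # mean is 44
print(sample_variance(data))  # 64.0
print(sample_sd(data))        # 8.0
print(round(coefficient_of_variation(data), 1))
```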

19
Q

What does MAE stand for?

A

Mean absolute error

20
Q

How is MAE calculated?

A
  • sum the absolute values of the deviations of the observations about the mean & divide it by # of observations
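
The calculation described on this card can be sketched as follows, taking deviations about the mean as the card states. The function name and data are hypothetical.

```python
def mean_absolute_error(values):
    """Sum of absolute deviations about the mean, divided by the number of observations."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [46, 54, 42, 46, 32]   # mean is 44, absolute deviations sum to 28
print(mean_absolute_error(data))
```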
21
Q

What are the measures of distribution shape?

A
  1. Skewness
  2. z-scores
  3. Chebyshev’s Theorem
  4. Empirical Rule
22
Q

What kinds of skewness are there? Describe each one.

A
  1. Skewed Left
    - skewness is negative
    - mean is usually less than the median
  2. Skewed Right
    - skewness is positive
    - mean is usually more than the median
  3. Symmetrical
    - skewness is zero
    - mean and median are equal
23
Q

Describe z-scores

A
  • to find relative location of values w/in a data set
  • how far a particular value is from the mean
  • also called standardized value
  • using mean and SD we can find the relative location of any observation
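
A minimal sketch of the z-score, z = (x - mean) / SD, assuming the sample standard deviation (divisor n - 1). The function name and data are hypothetical.

```python
import math


def z_scores(values):
    """Number of standard deviations each value lies from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


data = [46, 54, 42, 46, 32]   # mean 44, sample SD 8
print(z_scores(data))         # [0.25, 1.25, -0.25, 0.25, -1.5]
```

A negative z-score marks a value below the mean, a positive one a value above it.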
24
Q

Describe Chebyshev’s Theorem

A
  • allows us to make statements about the proportion of data values that must be w/in a specified # of SD of the mean
  1. at least 75% of the data values are w/in 2 SD of the mean
  2. at least 89% of the data values are w/in 3 SD of the mean
  3. at least 94% of the data values are w/in 4 SD of the mean
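
The three percentages follow from the usual statement of the theorem: at least 1 - 1/z^2 of the values lie within z standard deviations of the mean, for any z greater than 1. The formula is not written on the card, and the function name is hypothetical.

```python
def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("the theorem applies only for z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    print(z, round(chebyshev_bound(z) * 100), "%")
```

The bound holds for any distribution shape, which is why it is weaker than the empirical rule.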
25
Q

Describe Empirical Rule

A
  • based on normal prob. distribution
  • used for symmetrical bell shaped distribution
  • to determine % of data values that must be w/in a specified # of SD from the mean
  1. approx. 68% of the data values will be w/in 1 SD of the mean
  2. approx. 95% of the data values will be w/in 2 SD of the mean
  3. Almost all of the data values will be w/in 3 SD of the mean
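
As a check on these percentages, the share of a normal distribution within z standard deviations of the mean can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper name is hypothetical.

```python
from statistics import NormalDist


def share_within(z):
    """Proportion of a normal distribution lying within z SDs of its mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


for z in (1, 2, 3):
    print(z, round(share_within(z) * 100, 1), "%")
```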
26
Q

What can you use to detect outliers?

A
  1. Z-scores - use empirical rule
    - treat any data value with a z-score less than -3 or greater than +3 as an outlier
  2. based on 1st and 3rd Quartiles and IQR
    a. compute lower and upper limits
    Lower limit = Q1 - 1.5(IQR)
    Upper limit = Q3 + 1.5(IQR)
    - if the value is less than the lower limit or greater than the upper limit, treat it as an outlier
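
The IQR-based method can be sketched as below. To stay independent of any particular quartile rule, the quartiles are passed in rather than computed; the function names and data are hypothetical.

```python
def iqr_limits(q1, q3):
    """Lower and upper limits at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values, q1, q3):
    """Values below the lower limit or above the upper limit."""
    lower, upper = iqr_limits(q1, q3)
    return [x for x in values if x < lower or x > upper]


print(iqr_limits(10, 20))                          # (-5.0, 35.0)
print(iqr_outliers([1, 12, 15, 18, 40], 10, 20))   # [40]
```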
27
Q

What is the 5 number summary?

A
  • used to summarize the data
    1. Smallest value
    2. First Quartile Q1
    3. Second Quartile Q2 (Median)
    4. Third Quartile Q3
    5. Largest Value
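
A sketch of the five values, assuming quartiles are the 25th, 50th, and 75th percentiles found with index i = (p / 100) * n, rounding up when i is not an integer and averaging positions i and i+1 when it is. Software packages often use other quartile rules, so their results may differ. Names and data are hypothetical.

```python
import math


def percentile(values, p):
    """pth percentile with the round-up / average-of-neighbors rule (0 < p < 100)."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i.is_integer():
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(i) - 1]


def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest)."""
    return (
        min(values),
        percentile(values, 25),
        percentile(values, 50),
        percentile(values, 75),
        max(values),
    )


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
print(five_number_summary(data))
```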
28
Q

What is a box plot useful for?

A
  • provides a convenient visual display of several characteristics of a data set
  • based on 5 # summary
  • need to compute IQR = Q3 - Q1
29
Q

What are the advantages of a box plot?

A
  • easy to use
  • few calculations
  • no need to calculate mean and SD
30
Q

What are the steps in constructing a box plot?

A
  1. a box is drawn with the ends of the box located at 1st and 3rd quartiles
    - this box contains 50% of the data
  2. a vertical line is drawn in the box at the location of the median
  3. using IQR, limits are located at 1.5(IQR) below Q1 and 1.5(IQR) above Q3
    - data outside these limits are outliers
  4. Dashed lines are called whiskers
    - drawn from the ends of the box to the smallest and largest values inside the limits from step 3
  
  5. location of each outlier is shown with *

Note: generally upper and lower limits are not drawn on the box
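
The numbers needed to draw the plot can be sketched without any plotting library. This hypothetical helper takes the quartiles as inputs and returns the whisker ends and the outliers to mark with *; it assumes at least one value lies inside the limits.

```python
def box_plot_parts(values, q1, q3):
    """Whisker ends and outliers for a box plot with the box drawn from q1 to q3."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    # Whiskers stop at the most extreme values that are still inside the limits.
    return {
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


parts = box_plot_parts([1, 12, 15, 18, 40], q1=10, q3=20)
print(parts)
```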