QU1 chapter 3 notes Flashcards
What are the measures of central tendency
- Arithmetic mean
- median
- mode
- geometric mean
describe arithmetic mean
- most commonly used measure of central tendency
- affected by extreme values
- do not use when data has extreme values
how do you calculate arithmetic mean
sum all the numerical values, then divide by the total number of observations
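A minimal Python sketch of this calculation. The function name and both data sets are hypothetical, chosen only to show how one extreme value pulls the mean.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)

typical = [10, 12, 14, 16, 18]        # hypothetical data
with_outlier = [10, 12, 14, 16, 98]   # same data with one extreme value

print(arithmetic_mean(typical))       # 14.0
print(arithmetic_mean(with_outlier))  # 30.0
```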
describe median and calculate
- the middle value in an ordered array of data
- not affected by extreme values (outliers)
- if N is odd, then median is the middle number
- if N is even, then median is the average of the two middle numbers
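The odd and even cases can be sketched in Python as follows. The function name and sample values are hypothetical.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if N is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # N is odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # N is even

print(median([7, 1, 5]))       # 5
print(median([7, 1, 5, 3]))    # 4.0
print(median([7, 1, 5, 300]))  # 6.0, the extreme value 300 barely matters
```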
describe mode
the value in a set of data that appears most frequently
- not affected by extreme values
- used for descriptive purposes only (because it is more variable from sample to sample than other measures of central tendency)
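A small Python sketch of finding the mode. Returning every tied value (a data set can have more than one mode) is an assumption of this sketch, and the data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that appears with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 5, 9]))    # [3]
print(modes([2, 2, 3, 3, 900]))  # [2, 3], unaffected by the extreme value 900
```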
Describe geometric mean
helps measure the status of an investment over time
- useful measure of the rate of change of a variable over time
how do you calculate the geometric mean
multiply all n values together, then raise the product to the power 1/n (that is, take the nth root)
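A minimal Python sketch of the nth-root calculation, assuming all values are positive. The second example treats per-period growth factors (1.10 for +10%, 0.90 for -10%) as the inputs, which is an illustrative assumption rather than something stated in these notes.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([2, 8]))   # 4.0

# Hypothetical growth factors: up 10% one period, down 10% the next.
factors = [1.10, 0.90]
print(geometric_mean(factors))  # slightly below 1, i.e. a small average loss per period
```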
What are quartiles
- most widely used measure of noncentral location
- used to describe properties of large sets of numerical data
- whereas the median is the value that splits the ordered array in half (50% of the observations are smaller and 50% are larger), quartiles are descriptive measures that split the ordered data into 4 quarters
how do you compute quartiles
Compute the quartiles of the 3-year annualized returns after removing CI signature Select Canadian Seg I. The ordered array is:
5.34 6.15 6.85 7.11 9.05 10.16 10.79 11.35 13.43 13.43 13.93 17.1
Solution:
Step 1: find the position of Q1
Q1 = (n+1)/4 ordered observation
= (13 + 1)/4 = 3.5th ordered observation
Step 2: the position ends in .5, so Q1 is approximated by the arithmetic mean of the 3rd and 4th ordered observations
Q1 = (6.85 + 7.11)/2 = 6.98
In addition:
Q3 = 3(n+1)/4 ordered observation
= 3(13 + 1)/4 = 10.5th ordered observation
Therefore, using rule 2, Q3 is approximated by the arithmetic mean of the 10th and 11th ordered observations
Q3 = (13.43 + 13.43)/2 = 13.43
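The position rule can be sketched in Python. Only the half-position case (average the two neighbouring observations) appears in the worked solution; taking the observation itself for a whole-number position and rounding any other position to the nearest integer are assumptions based on a common textbook convention. The function name and the 13-value data set are hypothetical.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) of already-ordered data, position k(n+1)/4.

    Assumes at least 2 observations.
    """
    n = len(ordered)
    whole, remainder = divmod(k * (n + 1), 4)   # position = whole + remainder/4
    if remainder == 0:       # whole-number position: take that observation
        return ordered[whole - 1]
    if remainder == 2:       # position ends in .5: average the two neighbours
        return (ordered[whole - 1] + ordered[whole]) / 2
    # assumed convention: otherwise round the position to the nearest integer
    position = whole if remainder == 1 else whole + 1
    return ordered[position - 1]

data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20]  # hypothetical, n = 13
print(quartile(data, 1))  # 6.0  (position 3.5: mean of 5 and 7)
print(quartile(data, 2))  # 11   (position 7)
print(quartile(data, 3))  # 16.5 (position 10.5: mean of 16 and 17)
```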
What are the measures of variation
- range
- variance
- standard deviation
- coefficient of variation
describe range
difference between the largest and the smallest observation
- ignores the way in which data are distributed
what is interquartile range
- measure of variation
- also called mid-spread (spread in the middle 50%)
- not affected by extreme values
How do you calculate interquartile range
third quartile minus first quartile (Q3 - Q1)
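A short Python sketch contrasting the range with the interquartile range. It uses `statistics.quantiles` from the standard library, whose default "exclusive" method interpolates between observations, so for other sample sizes it may differ slightly from hand-calculation rules; with 7 observations the quartile positions are whole numbers and the two agree. The data are hypothetical.

```python
import statistics

def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Q3 - Q1, the spread of the middle 50% of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 100]

print(value_range(data), interquartile_range(data))                  # 6 4.0
print(value_range(with_outlier), interquartile_range(with_outlier))  # 99 4.0
```

The extreme value changes the range from 6 to 99 but leaves the interquartile range at 4.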
what is variance
- important measure of variation
- shows variation about the mean
how do you calculate the sample variance
sum of the squared differences around the arithmetic mean divided by the sample size minus 1
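As a rough Python sketch of that calculation (function name and data are hypothetical):

```python
def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 6, 9]        # mean is 5
print(sample_variance(data))  # 7.0  (squared differences 9 + 1 + 1 + 1 + 16 = 28, over 4)
```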
what is standard deviation
- most important measure of variation
- shows variation about the mean
- has the same units as the original data
- most practical and most commonly used measure of variation
which measure of variation is most important
standard deviation
how do you calculate standard deviation
square root of the sum of the squared differences around the arithmetic mean divided by the sample size minus 1 (i.e. the square root of the sample variance)
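A minimal Python sketch, checked against the standard library's `statistics.stdev`. The function name and data are hypothetical.

```python
import math
import statistics

def sample_std_dev(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diffs / (n - 1))

data = [2, 4, 4, 6, 9]
print(round(sample_std_dev(data), 3))    # 2.646
print(round(statistics.stdev(data), 3))  # 2.646
```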
What is coefficient of variation (CV)
- measures relative variation
- expressed as %
- higher value indicates greater variability relative to the mean
- used to compare two or more sets of data measured in different units
- measures the scatter in the data relative to the mean
what is the calculation for coefficient of variation
CV = (standard deviation / mean) × 100%
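A short Python sketch of comparing variability across different units. Both data sets are hypothetical and were chosen to have the same standard deviation but different means.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]   # hypothetical
weights_kg = [50, 60, 70, 80, 90]        # hypothetical

print(round(coefficient_of_variation(heights_cm), 1))  # 9.3
print(round(coefficient_of_variation(weights_kg), 1))  # 22.6
```

Relative to their means, the weights vary more than the heights even though both have the same standard deviation.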
What is the shape of a distribution
- describes how data is distributed
- measures shape
- can be symmetric or skewed
If the mean and median are equal the shape will be
symmetric (or zero skewed)
if the mean exceeds the median, the shape is
Right Skewed
- the variable is called positive or right skewed
if the median exceeds the mean the shape is
called left-skewed
- also called negative
how does positive skew happen
when the mean is increased by some unusually high values
how does negative skew happen
when the mean is reduced by some extremely low values
How is it that a variable is symmetrical (shape)
when there are no really extreme values (low and high values balance each other)
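The mean-versus-median comparison can be sketched in Python. The function name and data are hypothetical, and with real data the mean and median are rarely exactly equal, so treating only exact equality as symmetric is a simplification.

```python
import statistics

def skew_direction(values):
    """Classify shape by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"   # mean pulled up by unusually high values
    if mean < median:
        return "left-skewed"    # mean pulled down by extremely low values
    return "symmetric"

print(skew_direction([1, 2, 3, 4, 5]))    # symmetric
print(skew_direction([1, 2, 3, 4, 50]))   # right-skewed
print(skew_direction([-40, 2, 3, 4, 5]))  # left-skewed
```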
What is the 5 number summary used for
to determine the shape of a distribution
what does the 5 number summary include
- smallest value
- first quartile (Q1)
- the second quartile (Q2), which is the median
- the third quartile (Q3)
- the largest number
what is used to display data using 5-number summary
box-and-whisker plot
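A minimal Python sketch that collects the five values. It relies on `statistics.quantiles` (default "exclusive" method), which interpolates and so may differ slightly from hand-calculation rules for some sample sizes. The data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([1, 2, 3, 4, 5, 6, 100]))  # (1, 2.0, 4.0, 6.0, 100)
```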
using the 5 number summary to recognize symmetry in data
- the distance from x smallest to the median equals the distance from the median to x largest
- the distance from x smallest to Q1 equals the distance from Q3 to x largest
5-number summary: what does a right-skewed distribution look like
the distance from the median to x largest is greater than the distance from x smallest to the median
also
the distance from Q3 to x largest is greater than the distance from x smallest to Q1
5-number summary: what does a left-skewed distribution look like
the distance from x smallest to the median is greater than the distance from the median to x largest
also
the distance from x smallest to Q1 is greater than the distance from Q3 to x largest
what does the coefficient of correlation measure
measures the strength of the linear relationship between two quantitative variables
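how do you calculate coefficient of correlation
sample covariance of X and Y divided by the product of their sample standard deviations: r = cov(X, Y) / (S_X S_Y)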
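One standard way to compute the sample coefficient of correlation is the sample covariance divided by the product of the two sample standard deviations. The Python sketch below follows that definition; the function name and data are hypothetical, and it assumes both lists have the same length and neither is constant.

```python
import math

def correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

xs = [1, 2, 3, 4, 5]
print(round(correlation(xs, [2, 4, 6, 8, 10]), 3))   # 1.0  (perfect positive linear relationship)
print(round(correlation(xs, [10, 8, 6, 4, 2]), 3))   # -1.0 (perfect negative linear relationship)
```

Because the units cancel in the division, rescaling either variable (for example, metres to centimetres) leaves the result unchanged.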
what are the features of correlation coefficient
- unit free
- ranges between -1 and 1
- the closer to -1, the stronger the negative linear relationship
- the closer to 1, the stronger the positive linear relationship
- the closer to 0 the weaker any positive or negative linear relationship
what is the data analysis objective
should report the summary measures that best meet the assumptions about the data set
what are the pitfalls in numerical descriptive measures
- data analysis is objective
- data interpretation is subjective (should be done in a fair, neutral and clear manner)
What are some ethical considerations for numerical descriptive measures
- should document both good and bad results
- should be presented in a fair, objective and neutral manner
- should not use inappropriate summary measures to distort facts
what is central tendency
the extent to which the values of a numerical variable group around a typical or central value