BA Chapter 4 Flashcards
Arithmetic mean (mean)
Sum all observations and divide by the total number of observations. (μ)
Median
The middle value of a ranked data set. Half the data are above, half are below the median.
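A minimal sketch of the two definitions above, using Python's standard `statistics` module. The data values are hypothetical and chosen so that one large observation pulls the mean above the median.

```python
from statistics import mean, median

data = [2, 4, 4, 5, 7, 9, 25]  # hypothetical observations

avg = mean(data)    # sum of the observations divided by how many there are
mid = median(data)  # middle value once the data are ranked

print("mean:", avg)
print("median:", mid)
```

The mean reacts to the outlier (25) while the median does not, which is why the two are usually reported together.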
Coefficient of kurtosis 峰态系数
- Kurtosis refers to peakedness or flatness of the curve.
- Coefficient of Kurtosis =KURT(data range)
- CK < 3 indicates the data is somewhat flat with a wide degree of dispersion.
- CK > 3 indicates the data is somewhat peaked with less dispersion.
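As a rough illustration of the CK thresholds above, the sketch below computes a moment-based kurtosis (fourth central moment divided by the squared variance), for which a bell-shaped curve scores about 3. The function name and both data sets are hypothetical. This is an assumption about which formula the card has in mind: Excel's KURT applies a sample-size adjustment and, per its documentation, reports kurtosis relative to the normal curve, so its output may not match this function.

```python
def coefficient_of_kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (about 3 for a bell-shaped curve)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # evenly spread values (hypothetical)
peaked = [5, 5, 5, 5, 5, 5, 5, 1, 9]  # most values at the center (hypothetical)

ck_flat = coefficient_of_kurtosis(flat)
ck_peaked = coefficient_of_kurtosis(peaked)

print("flat data CK:", round(ck_flat, 2))
print("peaked data CK:", round(ck_peaked, 2))
```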
Coefficient of skewness 歪斜系数
- Skewness describes lack of symmetry.
- Coefficient of Skewness =SKEW(data range)
- CS is negative for left-skewed data.
- CS is positive for right-skewed data.
- |CS| > 1 suggests high degree of skewness.
- 0.5 ≤ |CS| ≤ 1 suggests moderate skewness.
- |CS| < 0.5 suggests relative symmetry.
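The thresholds above can be sketched as a small classifier. The skewness formula used here is the plain moment-based one (third central moment divided by the cubed standard deviation); Excel's SKEW adds a sample-size adjustment, so treat the numbers as approximate. The function names and data are hypothetical.

```python
def coefficient_of_skewness(data):
    """Moment-based skewness m3 / m2**1.5 (0 for perfectly symmetric data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def describe_skewness(cs):
    """Apply the |CS| cutoffs from the card: 0.5 and 1."""
    size = abs(cs)
    if size < 0.5:
        return "relatively symmetric"
    degree = "high" if size > 1 else "moderate"
    direction = "left" if cs < 0 else "right"
    return f"{degree} {direction} skew"

right_tailed = [1, 1, 2, 2, 3, 10]  # long tail on the right (hypothetical)
symmetric = [1, 2, 3, 4, 5]

print(describe_skewness(coefficient_of_skewness(right_tailed)))
print(describe_skewness(coefficient_of_skewness(symmetric)))
```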
Skewness
- Skewness describes lack of symmetry.
- Coefficient of Skewness =SKEW(data range)
- CS is negative for left-skewed data.
- CS is positive for right-skewed data.
- |CS| > 1 suggests high degree of skewness.
- 0.5 ≤ |CS| ≤ 1 suggests moderate skewness.
- |CS| < 0.5 suggests relative symmetry.
Relationship between variables is measured by:
- Covariance
- Correlation
- Covariance is a measure of the linear association between two variables, X and Y. Depends upon units of measurement, so difficult to interpret.
- Correlation is a measure of the linear association between two variables, X and Y. Does not depend upon units of measurement. Known as the Pearson product moment correlation coefficient, r.
- r represents correlation coefficient 相关系数
- Relationships can also be visualized with a Scatterplot. Scatterplot is the only graph that shows if a relationship exists between two variables.
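To see why covariance is hard to interpret while correlation is not, the sketch below measures the same hypothetical heights in meters and in centimeters. It assumes the population formulas (dividing by n); the function names and data are made up for illustration.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

height_m = [1.5, 1.6, 1.7, 1.8]           # hypothetical heights in meters
weight_kg = [50, 58, 66, 80]              # hypothetical weights
height_cm = [h * 100 for h in height_m]   # same heights, different unit

print("cov (m): ", covariance(height_m, weight_kg))
print("cov (cm):", covariance(height_cm, weight_kg))
print("r (m):   ", correlation(height_m, weight_kg))
print("r (cm):  ", correlation(height_cm, weight_kg))
```

Changing the unit multiplies the covariance by 100 but leaves r unchanged, and r always falls between -1 and 1.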
Quiz: What coefficient measures the linear relationship between two variables?
Correlation coefficient (r).
Dispersion 散布 分散
The degree of variation in the data, i.e., the numerical spread of the data. Several statistical measures characterize dispersion: the range (max minus min), variance, and standard deviation (square root of the variance).
Interquartile range
- The difference between the first and third quartiles, Q3-Q1, is often called the interquartile range, or the midspread.
- NOTE:
The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median (the middle value) of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set.
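The quartile definition in the note can be sketched as "median of each half". This is only one of several conventions: spreadsheet functions such as Excel's QUARTILE interpolate between values and may return slightly different numbers. The function name and data are hypothetical, and the sketch assumes at least two observations.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ranked = sorted(data)
    half = len(ranked) // 2
    lower = ranked[:half]    # values below the median position
    upper = ranked[-half:]   # values above the median position
    return median(lower), median(ranked), median(upper)

data = [1, 3, 4, 6, 7, 9, 12, 15]  # hypothetical observations
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # interquartile range (midspread)

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```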
Empirical rules
If a data set has an approximately bell-shaped relative frequency histogram, then:
- About 68% of the observations lie within one standard deviation of the mean.
- About 95% of the observations lie within two standard deviations of the mean.
- About 99.7% of the observations lie within three standard deviations of the mean.
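As a quick worked example of the rule, the sketch below turns a mean and standard deviation into the three intervals. The function name and the values mean = 100, sd = 15 are hypothetical, and the shares only hold approximately, and only for bell-shaped data.

```python
def empirical_intervals(mean, sd):
    """Map k = 1, 2, 3 to (low, high, approximate share of observations)."""
    shares = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share) for k, share in shares.items()}

intervals = empirical_intervals(mean=100, sd=15)  # hypothetical bell-shaped data
for k, (low, high, share) in intervals.items():
    print(f"within {k} sd: about {share:.1%} of observations between {low} and {high}")
```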
Chebyshev’s Theorem
- The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is stated in terms of approximations. A result that applies to every data set is known as Chebyshev’s Theorem.
- For any numerical data set:
- At least 3/4 of the data lie within two standard deviations of the mean.
- At least 8/9 of the data lie within three standard deviations of the mean.
- At least 1 − 1/k² of the data lie within k standard deviations of the mean, for any k > 1.
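The bound can be checked against data that is far from bell-shaped. The sketch below assumes the population standard deviation (dividing by n); the function names and the data set are hypothetical.

```python
def chebyshev_bound(k):
    """Minimum share of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed share of values within k population standard deviations."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum(abs(x - mean) <= k * sd for x in data) / n

skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]  # one extreme value (hypothetical)

print("bound for k=2:   ", chebyshev_bound(2))
print("observed for k=2:", share_within(skewed, 2))
```

The observed share can be much higher than the bound; Chebyshev's Theorem only guarantees the minimum.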
Coefficient of variation & Return to risk
- The coefficient of variation (CV) provides a relative measure of dispersion.
- The return to risk = 1/CV.
- Return to risk provides a relative measure of risk with respect to the return.
- Easier to compare than the standard deviation.
- The smaller the CV, the less the risk.
- The larger the Return to Risk, the better the return with respect to the risk involved.
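A small sketch comparing two hypothetical funds with the same average return. It assumes the usual definition CV = standard deviation / mean, which the card does not spell out, and uses the population standard deviation; the function names and return figures are invented for illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(returns):
    """CV = standard deviation / mean (assumed definition)."""
    return pstdev(returns) / mean(returns)

def return_to_risk(returns):
    """Return to risk = 1 / CV."""
    return 1 / coefficient_of_variation(returns)

fund_a = [4, 5, 6, 5, 5]    # steady yearly returns in percent (hypothetical)
fund_b = [1, 9, 5, 12, -2]  # volatile yearly returns in percent (hypothetical)

for name, returns in (("A", fund_a), ("B", fund_b)):
    cv = coefficient_of_variation(returns)
    print(f"fund {name}: CV = {cv:.2f}, return to risk = {return_to_risk(returns):.2f}")
```

Both funds average 5%, but fund A has the smaller CV and the larger return to risk, so it offers the better return for the risk taken.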
Range
Highest value (Maximum) minus the lowest value (Minimum).
Standard deviation (σ)
The square root of the Variance.
Variance (σ²)
An overall measure of how far each value is from the mean.
- An average of the squared deviations from the mean.
- Units are squared.
(square v. 使成平方)
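The range, variance, and standard deviation cards can be tied together with one hypothetical data set. The sketch uses Python's `statistics` module; `pvariance` and `pstdev` divide by n (population, σ² and σ), while `variance` divides by n − 1 (sample), which is an addition the cards do not discuss.

```python
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)  # highest value minus lowest value
pop_var = pvariance(data)           # average squared deviation from the mean
pop_sd = pstdev(data)               # square root of the variance
sample_var = variance(data)         # same sum of squares, divided by n - 1

print("range:", data_range)
print("population variance:", pop_var)
print("population standard deviation:", pop_sd)
print("sample variance:", round(sample_var, 3))
```

The variance is in squared units, which is why the standard deviation, back in the original units, is usually easier to interpret.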