CFAI - Formula - Org., Vis., and Desc. Data Flashcards
Data type: NOIR
Categorical
N - nominal. No logical order.
O - ordinal. Has logical order or rank.
Numerical
I - integer. Discrete > limited to a finite number of values.
R - ratio. Continuous > can take on any value within a range.
Expected Value i,j
(Total Row i × Total Column j)/Overall Total
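As an illustration of the expected-value formula above, here is a minimal Python sketch that computes the expected count for every cell of a contingency table. The function name `expected_frequencies` and the 2 × 2 counts are made up for this example.

```python
def expected_frequencies(table):
    """Expected count for each cell (i, j): row total i * column total j / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall_total = sum(row_totals)
    return [[r * c / overall_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (rows and columns are two categorical variables).
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)
print(expected)
```

Row totals here are 30 and 70, column totals are 40 and 60, and the overall total is 100, so the top-left expected count is 30 × 40 / 100 = 12.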
Definition of Arithmetic Mean.
The arithmetic mean is the sum of the values
of the observations divided by the number of observations.
Sample Mean Formula
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where n is the number of observations in the sample.
Definition of Median
The median is the value of the middle item of a set of items that has been sorted into ascending or descending order. In an odd-numbered sample of n items, the median is the value of the item that occupies the (n + 1)/2 position. In an even-numbered
sample, we define the median as the mean of the values of items occupying the n/2 and (n + 2)/2 positions (the two middle items).
Median. Even number of observations.
If a sample has an even number of observations, the median is the mean of the two
values in the middle. For example, if our sample in Exhibit 38 had 12 indexes instead
of 11, the median would be the mean of the values in the sorted array that occupy
the sixth and the seventh positions.
Definition of Mode
The mode is the most frequently occurring value in a
distribution.
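The three definitions above (mean, median, mode) can be sketched in Python as follows. The helper `median_by_position` and the sample values are assumptions made for illustration; the helper applies the position rules quoted in the median card.

```python
import statistics


def median_by_position(values):
    """Median via the position rules: (n + 1)/2 for odd n,
    mean of positions n/2 and (n + 2)/2 for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Positions are 1-based, list indexes are 0-based.
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]
    upper = ordered[(n + 2) // 2 - 1]
    return (lower + upper) / 2


data = [3, 7, 7, 2, 9, 7, 4, 2]  # hypothetical sample, n = 8

mean = sum(data) / len(data)      # sum of observations / number of observations
median = median_by_position(data)  # mean of the 4th and 5th sorted values
mode = statistics.mode(data)       # most frequently occurring value
print(mean, median, mode)
```

With an even-numbered sample of 8, the median is the mean of the values in the fourth and fifth positions of the sorted data.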
Weighted Mean Formula. The weighted mean $\bar{X}_w$ (read “X-bar sub-w”), for a
set of observations X1, X2, …, Xn with corresponding weights of w1, w2, …, wn,
is computed as:
$\bar{X}_w = \sum_{i=1}^{n} w_i X_i$, where the weights sum to 1.
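A short Python sketch of the weighted mean, assuming the usual convention that the weights sum to 1. The function name `weighted_mean` and the two-asset portfolio are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of each observation multiplied by its weight (weights assumed to sum to 1)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical portfolio: 60% in an asset returning 10%, 40% in one returning 5%.
returns = [0.10, 0.05]
weights = [0.60, 0.40]
portfolio_return = weighted_mean(returns, weights)
print(round(portfolio_return, 4))
```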
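Geometric Mean Formula. The geometric mean, $\bar{X}_G$, of a set of observations
X1, X2, …, Xn is:
$\bar{X}_G = \sqrt[n]{X_1 X_2 \cdots X_n}$, with $X_i \ge 0$ for i = 1, 2, …, n.
ln form: $\ln \bar{X}_G = \frac{1}{n} \ln(X_1 X_2 \cdots X_n)$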
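Geometric Mean Formula. The geometric mean, $\bar{X}_G$, of a set of observations
X1, X2, …, Xn is:
Alternative ln form: $\ln \bar{X}_G = \frac{\sum_{i=1}^{n} \ln X_i}{n}$, so that $\bar{X}_G = e^{\ln \bar{X}_G}$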
An equation that summarizes the calculation of the geometric mean return, $R_G$:
$1 + R_G = \sqrt[T]{(1 + R_1)(1 + R_2) \cdots (1 + R_T)}$
Geometric Mean Return Formula. Given a time series of holding period
returns Rt, t = 1, 2, …, T, the geometric mean return over the time period
spanned by the returns R1 through RT is:
$R_G = \left[ \prod_{t=1}^{T} (1 + R_t) \right]^{1/T} - 1$
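The geometric mean return can be sketched in Python in two equivalent ways: by compounding the (1 + Rt) terms and taking the T-th root, or by averaging natural logs and exponentiating. Both function names and the three holding period returns are assumptions for illustration.

```python
import math


def geometric_mean_return(returns):
    """Compound (1 + R_t) over all periods, take the T-th root, subtract 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


def geometric_mean_return_ln(returns):
    """Same quantity via natural logs: exp(mean of ln(1 + R_t)) - 1."""
    mean_log = sum(math.log(1 + r) for r in returns) / len(returns)
    return math.exp(mean_log) - 1


returns = [0.10, -0.05, 0.20]  # hypothetical holding period returns
rg = geometric_mean_return(returns)
rg_ln = geometric_mean_return_ln(returns)
arithmetic = sum(returns) / len(returns)
print(round(rg, 4), round(rg_ln, 4), round(arithmetic, 4))
```

For these returns the geometric mean return comes out below the arithmetic mean, which is the usual pattern whenever the returns are not all equal.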
Harmonic Mean Formula. The harmonic mean of a set of observations X1, X2,
…, Xn is:
$\bar{X}_H = \frac{n}{\sum_{i=1}^{n} (1/X_i)}$, with $X_i > 0$ for i = 1, 2, …, n.
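A minimal Python sketch of the harmonic mean, cross-checked against the standard library. The cost-averaging scenario (equal amounts invested at two hypothetical prices) is an assumption chosen to show a typical use.

```python
import statistics


def harmonic_mean(values):
    """n divided by the sum of reciprocals; observations must be positive."""
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean requires positive observations")
    return len(values) / sum(1 / x for x in values)


# Hypothetical: the same amount is invested at a price of 10 and then at 20.
prices = [10.0, 20.0]
average_cost = harmonic_mean(prices)
library_value = statistics.harmonic_mean(prices)  # standard-library cross-check
print(round(average_cost, 4), round(library_value, 4))
```

Investing 100 at each price buys 10 and 5 shares, so the average cost is 200 / 15, which matches the harmonic mean and is lower than the arithmetic mean of the two prices.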
Deciding Which Central Tendency Measure to Use
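The formula for the position (or location) of a percentile in an array with n entries sorted in ascending order is:
$L_y = (n + 1) \frac{y}{100}$, where y is the percentage point at which we divide the distribution and Ly is the location of the percentile Py.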
Quantiles
When the location, Ly, is a whole number
the location corresponds to an actual
observation. For example, if we are determining the third quartile (Q3) in a
sample of size n = 11, then Ly would be L75 = (11 + 1)(75/100) = 9, and the third
quartile would be P75 = X9, where Xi is defined as the value of the observation
in the ith (i = L75, so 9th) position of the data sorted in ascending order.
Quantiles
When Ly is not a whole number or integer
Ly lies between the two closest integer numbers (one above and one below), and we use linear interpolation between those two places to determine Py. Interpolation means estimating an unknown value on the basis of two known values that surround it (i.e., lie above and below it); the term “linear” refers to a straight-line estimate.
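The two quantile cases above (Ly a whole number, Ly not a whole number) can be sketched in one Python function. The name `percentile` and the sample data are hypothetical, and clamping locations that fall outside positions 1 to n is an assumption of this sketch, not something the cards specify.

```python
def percentile(values, y):
    """Value at the y-th percentile using the location Ly = (n + 1) * y / 100
    with linear interpolation between the two surrounding positions."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * y / 100
    lower_pos = int(location)          # 1-based position just below (or at) Ly
    fraction = location - lower_pos    # how far Ly lies past that position
    # Assumption: clamp locations outside the data to the end observations.
    if lower_pos < 1:
        return ordered[0]
    if lower_pos >= n:
        return ordered[-1]
    lower = ordered[lower_pos - 1]
    upper = ordered[lower_pos]
    return lower + fraction * (upper - lower)


eleven = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]  # n = 11, L75 = 9
ten = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]         # n = 10, L75 = 8.25
q3_whole = percentile(eleven, 75)
q3_interpolated = percentile(ten, 75)
print(q3_whole, q3_interpolated)
```

With n = 11 the third quartile is simply the 9th sorted value. With n = 10 the location 8.25 lies a quarter of the way from the 8th value (16) to the 9th value (18).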
Interquartile range
The interquartile range (IQR) is the difference between the third quartile and the first quartile: IQR = Q3 − Q1
Whisker plot
Upper fence / lower fence
Upper fence = upper bound of the box (Q3) + 1.5 × IQR
Lower fence = lower bound of the box (Q1) − 1.5 × IQR
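A short Python sketch of the interquartile range and the box-plot fences, taking the upper and lower bounds of the box to be Q3 and Q1. It uses `statistics.quantiles` with `method="exclusive"`, which appears to follow the same (n + 1)-based location rule as the quantile cards. The data, including the deliberately extreme last value, are hypothetical.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 60]  # hypothetical sample, n = 11

# Quartiles: n=4 splits the data into four groups and returns three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Observations beyond the fences are candidates for outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```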
Definition of Range. The range is the difference between the maximum and
minimum values in a dataset:
Range = Maximum value − Minimum value
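Mean Absolute Deviation Formula. The mean absolute deviation (MAD) for a
sample is:
$\text{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$, where $\bar{X}$ is the sample mean and n is the number of observations in the sample.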
Here’s how to calculate the mean absolute deviation.
Step 1: Calculate the mean.
Step 2: Calculate how far away each data point is from the mean using positive distances. These are called absolute deviations.
Step 3: Add those deviations together.
Step 4: Divide the sum by the number of data points.
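The four steps above map directly onto a few lines of Python. The function name `mean_absolute_deviation` and the four data points are made up for illustration.

```python
def mean_absolute_deviation(values):
    n = len(values)
    mean = sum(values) / n                              # Step 1: the mean
    absolute_devs = [abs(x - mean) for x in values]     # Step 2: absolute deviations
    total = sum(absolute_devs)                          # Step 3: add them together
    return total / n                                    # Step 4: divide by the number of data points


data = [2, 4, 6, 8]  # hypothetical sample; mean = 5
mad = mean_absolute_deviation(data)
print(mad)
```

Here the absolute deviations are 3, 1, 1, and 3, which sum to 8, so the MAD is 8 / 4 = 2.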
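Sample Variance Formula. The sample variance, $s^2$, is:
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, where $\bar{X}$ is the sample mean and n is the number of observations in the sample.
Sample Standard Deviation Formula. The sample standard deviation, s, is:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$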
Steps to Calculate Sample Standard Deviation and Variance
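One way to lay out the calculation, as a sketch rather than a definitive procedure: find the mean, square each deviation from it, sum the squares, divide by n − 1 for the variance, then take the square root for the standard deviation. The function name and data are hypothetical, and the standard library is used only as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    n = len(values)
    mean = sum(values) / n                               # sample mean
    squared_devs = [(x - mean) ** 2 for x in values]     # squared deviations from the mean
    return sum(squared_devs) / (n - 1)                   # divide by n - 1, not n


data = [2, 4, 6, 8]  # hypothetical sample; mean = 5
variance = sample_variance(data)
std_dev = math.sqrt(variance)

# Cross-check against the standard library's sample statistics.
library_variance = statistics.variance(data)
library_std_dev = statistics.stdev(data)
print(round(variance, 4), round(std_dev, 4))
```

The squared deviations are 9, 1, 1, and 9, so the sample variance is 20 / 3 and the sample standard deviation is its square root.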
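Sample Target Semideviation Formula. The target semideviation, $s_{\text{Target}}$, is:
$s_{\text{Target}} = \sqrt{\sum_{\text{for all } X_i \le B} \frac{(X_i - B)^2}{n - 1}}$, where B is the target and n is the total number of sample observations.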
To calculate a sample target semideviation, we first specify
the target. After identifying observations below the target, we find the sum of the
squared negative deviations from the target, divide that sum by the total number of
observations in the sample minus 1, and, finally, take the square root.
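The procedure described above can be sketched in Python as follows. The function name `target_semideviation`, the five returns, and the 0% target are assumptions for illustration. Note that the divisor is the total number of observations minus 1, not the number of observations below the target.

```python
import math


def target_semideviation(values, target):
    n = len(values)  # total sample size, including observations above the target
    # Keep only observations at or below the target and square their deviations.
    squared_shortfalls = [(x - target) ** 2 for x in values if x <= target]
    return math.sqrt(sum(squared_shortfalls) / (n - 1))


returns = [0.05, -0.02, 0.01, 0.08, -0.04]  # hypothetical returns
semidev = target_semideviation(returns, target=0.0)
print(round(semidev, 4))
```

Only −0.02 and −0.04 fall below the target, so the sum of squared deviations is 0.0004 + 0.0016 = 0.002, which is divided by 5 − 1 = 4 before taking the square root.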
Calculating Target Downside Deviation
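Coefficient of Variation Formula. The coefficient of variation, CV, is the ratio
of the standard deviation of a set of observations to their mean value:
$\text{CV} = \frac{s}{\bar{X}}$, where s is the sample standard deviation and $\bar{X}$ is the sample mean.
Higher CV = higher dispersion relative to the mean
Lower CV = lower dispersion relative to the mean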
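A brief Python sketch of the coefficient of variation, comparing two hypothetical series that have the same standard deviation but very different means. The function name and both series are made up for this example.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


series_a = [2, 4, 6, 8]          # hypothetical; mean = 5
series_b = [102, 104, 106, 108]  # hypothetical; same spread, mean = 105

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(round(cv_a, 4), round(cv_b, 4))
```

Both series have the same standard deviation, but series_a has the higher CV because its dispersion is large relative to its mean.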
Summarizing kurtosis:
Relation between arithmetic mean and geometric mean (MM)
Geometric mean ≈ Arithmetic mean − s²/2
s² = variance
Skew
Skew = 0
mean = median = mode
Skew > 0
mean > median > mode
Skew < 0
mean < median < mode
(tip: alphabetical order / arrows same direction)
Excess kurtosis
alphabetical order
Kurtosis summary CFAI
Kurtosis summary MM
Sign convention MAD
In calculating MAD, we ignore the signs of the deviations around the mean. For
example, if Xi = −11.0 and $\bar{X}$ = 4.5, the absolute value of the difference is |−11.0 − 4.5|
= |−15.5| = 15.5.