Chapter 3 Flashcards
Descriptive Statistics
Summarize or describe relevant characteristics of data
Mean
Arithmetic average: sum of all data values divided by the number of values (Σx/n)
Σx
Sum of all data values
Median
Middle value when the data are arranged in numerical order; with an even count of values, the mean of the two middle values
Mode
Value that occurs with the greatest frequency in a data set
Bimodal
Two modes
Multimodal
More than 2 modes
No mode
No data value is repeated
Midrange
(Maximum data value + minimum data value)/2
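A quick sketch of the four measures of center from the cards above, using Python's standard `statistics` module. The data values are made up for illustration only.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # middle of the sorted data (mean of the two middle values here)
modes = statistics.multimode(data)      # list of every most-frequent value
midrange = (max(data) + min(data)) / 2  # (maximum + minimum) / 2

print(mean, median, modes, midrange)
```

Note that `multimode` returns every value when nothing repeats, whereas the card above calls that case "no mode".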
Rounding rule
Carry one more decimal place than is present in the original data
Nominal level data
Computing a measure of center does not make sense
Examples: ranks, zip codes, and other numbers that aren’t measurements
Mean from a frequency distribution
Sum of (frequency * class midpoint) divided by the sum of frequencies: x̅=Σ(f*x)/Σf
x̅
Mean
f
Frequencies
x
Class midpoint in a frequency distribution; individual data value in the weighted mean and in the formulas for s and σ
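A minimal sketch of x̅=Σ(f*x)/Σf, assuming a small made-up frequency table. The midpoints and frequencies below are hypothetical.

```python
# Hypothetical frequency distribution: class midpoints and their frequencies.
midpoints = [5, 15, 25, 35]   # x: class midpoints
frequencies = [2, 5, 8, 5]    # f: class frequencies

sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))  # Σ(f*x)
sum_f = sum(frequencies)                                     # Σf
mean_from_table = sum_fx / sum_f

print(mean_from_table)
```

The result is only an approximation of the true mean, because every value in a class is treated as if it sat at the class midpoint.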
Weighted Mean
Mean used when some data values carry more weight than others: multiply each value by its weight, add the products, and divide by the sum of the weights
Weighted mean formula
x̅=Σ(w*x)/Σw
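A short sketch of the weighted mean formula above. The scores and weights are invented (think of three graded components worth 20%, 30%, and 50%).

```python
# Hypothetical scores (x) and their weights (w).
scores = [80, 90, 70]
weights = [2, 3, 5]

sum_wx = sum(w * x for w, x in zip(weights, scores))  # Σ(w*x)
sum_w = sum(weights)                                  # Σw
weighted_mean = sum_wx / sum_w

print(weighted_mean)
```

The weights do not have to add up to 1 or 100, since the formula divides by Σw.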
Skewed distribution
The data are not symmetric: the distribution extends farther to one side than the other
Skewed to the left
Negatively skewed
Skewed to the right
Positively skewed
Symmetric Data
Zero skewness: the mean, median, and mode are the same
Range
Largest value - smallest value
Standard Deviation for a sample (s)
Measure of variation of values about the mean.
s=√[(nΣ(x^2)-(Σx)^2)/(n(n-1))]
n=number of sample values
x=individual data values
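A sketch of the shortcut formula for s, read with the whole fraction under the square root, checked against `statistics.stdev` from the standard library. The sample values are hypothetical.

```python
import math
import statistics

data = [1, 3, 5, 7, 9]  # hypothetical sample

n = len(data)                      # number of sample values
sum_x = sum(data)                  # Σx
sum_x2 = sum(x * x for x in data)  # Σ(x^2)

# Shortcut formula: s = sqrt( (n*Σ(x^2) - (Σx)^2) / (n*(n-1)) )
s_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Library version, which also divides by n - 1.
s_library = statistics.stdev(data)

print(s_shortcut, s_library)
```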
Standard Deviation for a population (σ)
Measure of variation of values about the mean.
σ=√[Σ(x-μ)^2/N]
N=population size
μ=population mean
x=individual data values
Variance
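Square of the standard deviation: s^2 for a sample, σ^2 for a population. s^2 tends to target σ^2 rather than systematically over- or underestimating it, which makes s^2 an unbiased estimator of σ^2. Harder to interpret than the SD because its units are the square of the original units.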
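A sketch contrasting the population and sample versions, assuming the same made-up list is treated first as a whole population and then as a sample.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

N = len(data)
mu = sum(data) / N  # population mean

# Population SD: divide the sum of squared deviations by N.
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / N)

pop_variance = statistics.pvariance(data)    # σ^2, divides by N
sample_variance = statistics.variance(data)  # s^2, divides by n - 1

print(sigma, pop_variance, sample_variance)
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population-style calculation on the same numbers.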
Rule of thumb
For many data sets, about 95% of the data values lie within 2 SD of the mean
Estimate Min & max data values
min≈x̅-2s, max≈x̅+2s
Estimate SD
s≈range/4
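A small sketch of the two rule-of-thumb estimates above. All numbers are hypothetical, and both results are rough estimates rather than exact values.

```python
# Estimating SD from the range (hypothetical minimum and maximum).
data_min, data_max = 50, 90
s_estimate = (data_max - data_min) / 4  # s is roughly range / 4

# Estimating minimum and maximum data values from a known mean and SD.
mean, s = 70, 10           # hypothetical summary statistics
estimated_min = mean - 2 * s
estimated_max = mean + 2 * s

print(s_estimate, estimated_min, estimated_max)
```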
Empirical rule for bell shaped
About 68% of data fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD
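A quick check of the three percentages, assuming "bell shaped" means an exactly normal distribution. `NormalDist` is in Python's standard `statistics` module.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Proportion of a normal distribution lying within k SD of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
```

Real data that are only roughly bell shaped will match these percentages only approximately.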
Chebyshev’s Theorem
For any distribution, the proportion of data values lying within K SD of the mean is always at least 1-1/K^2, where K is any number greater than 1
K=number of SD from the mean
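A minimal sketch of the 1-1/K^2 bound. The function name `chebyshev_min_proportion` is made up for this example.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k SD of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound needs k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, chebyshev_min_proportion(k))
```

For K=2 the bound is 75% and for K=3 it is about 89%, both lower than the empirical rule's 95% and 99.7% because Chebyshev makes no assumption about the shape of the distribution.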
s
sample SD
σ
Pop. SD
s^2
Sample variance
σ^2
Pop. variance
Small SD
Values in data set are close together
Large SD
Values in data set have large variation
Z score (standardized value)
The # of SD that a given value x is above or below the mean.
z=(x-x̅)/s or z=(x-μ)/σ
Usual z scores
-2 ≤ z score ≤ 2
Values with z < -2 or z > 2 are unusual (possible outliers)
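A sketch of the z score calculation and the "usual" check. It assumes the usual range includes both endpoints (-2 and 2); the function names and numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_usual(z):
    """Assumption: a z score from -2 to 2, endpoints included, counts as usual."""
    return -2 <= z <= 2

# Hypothetical test scores with mean 70 and SD 10.
z_a = z_score(85, 70, 10)
z_b = z_score(95, 70, 10)

print(z_a, is_usual(z_a))
print(z_b, is_usual(z_b))
```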
Percentiles
Measures of relative position that divide the data set into 100 groups, with about 1% of the values in each. The kth percentile has about k% of the data BELOW it
Percentile of x equation
Percentile of x=100*(# values below x)/(total # values)
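A sketch of the percentile-of-x calculation above. The helper name `percentile_of` and the data are made up; the result is left unrounded.

```python
def percentile_of(x, data):
    """100 * (number of values below x) / (total number of values)."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

# Hypothetical data set of 10 values.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(percentile_of(70, data))
```

Six of the ten values fall below 70, so 70 sits at the 60th percentile of this data set.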
Quartiles
Divide the data into 4 parts: Q1=P25, Q2=P50 (the median), Q3=P75
Interquartile range (IQR)
IQR=Q3-Q1
5 number summary box plot
Minimum, Q1, median, Q3, maximum
Outliers
Data above Q3 or below Q1 by an amount > 1.5*IQR (greater than Q3+1.5*IQR or less than Q1-1.5*IQR)
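A sketch tying together quartiles, IQR, the 5 number summary, and the 1.5 IQR outlier check. It uses `statistics.quantiles` with its default method; textbooks and software follow different quartile conventions, so quartiles for other data sets may differ slightly from a hand calculation. The data are hypothetical.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

five_number_summary = (min(data), q1, q2, q3, max(data))

# Outliers: more than 1.5 * IQR below Q1 or above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number_summary, iqr, outliers)
```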
Estimate range
min≈x̅-2s, max≈x̅+2s, OR range≈4s
Coefficient of variation
CV=(s/x̅)*100% for a sample, CV=(σ/μ)*100% for a population. Describes the SD relative to the mean
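A sketch of the coefficient of variation, useful for comparing spread across data sets with different units or scales. The function name `coefficient_of_variation` and the summary statistics are hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """SD expressed as a percentage of the mean."""
    return 100 * sd / mean

# Hypothetical summaries: heights in inches and weights in pounds.
cv_heights = coefficient_of_variation(3, 60)
cv_weights = coefficient_of_variation(30, 150)

print(cv_heights, cv_weights)
```

Here the weights vary more relative to their mean (20%) than the heights do (5%), even though the two SDs are in different units and cannot be compared directly.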