Ch 3 Concepts+R Flashcards
Sample mean (average) is denoted as…
x bar (x with line above it)
population mean is denoted as
mu
A list of numbers is denoted as…
x_1, x_2, …, x_n
summation notation
Σ_{i=1}^{n} x_i = x_1 + x_2 + … + x_n
Little n vs big N
n = sample size, N = population size
Sample mean (equation)
x bar = (x_1 + x_2 + … + x_n)/n = (Σ_{i=1}^{n} x_i)/n
Population mean (equation)
mu = (x_1 + x_2 + … + x_N)/N = (Σ_{i=1}^{N} x_i)/N
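A minimal sketch of the two mean formulas, written in Python for illustration (in R, mean() does the same job). The numbers are hypothetical, and the sample is hand-picked rather than drawn at random.

```python
# Hypothetical population of N = 6 values (made up for illustration).
population = [2, 4, 6, 8, 10, 12]
N = len(population)
mu = sum(population) / N      # population mean: (x_1 + ... + x_N) / N

# Hypothetical sample of n = 3 values taken from that population.
sample = [4, 8, 12]
n = len(sample)
x_bar = sum(sample) / n       # sample mean: (x_1 + ... + x_n) / n

print(mu, x_bar)
```

Here x bar is only an estimate of mu; the two need not be equal.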
When is the mean a useful summary?
mean exam score, weighted GPA, miles per gallon
Sample means are used to…
estimate population means (the larger the sample, the better the estimate tends to be)
Median is
the middle value of the sorted data (if the number of values is even, the average of the two middle values)
Advantage of median compared to mean
median is less sensitive to outliers (robust); mean uses every value in the data set (more widely applicable)
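A quick illustration of why the median is called robust, sketched in Python with made-up numbers (R has mean() and median() for the same purpose). One extreme value is added and both summaries are recomputed.

```python
import statistics

# Hypothetical values, chosen only for illustration.
values = [30, 35, 40, 45, 50]
mean_before = statistics.mean(values)
median_before = statistics.median(values)

# Add a single extreme value (an outlier).
with_outlier = values + [1000]
mean_after = statistics.mean(with_outlier)      # pulled far upward
median_after = statistics.median(with_outlier)  # barely moves

print(mean_before, median_before)
print(mean_after, median_after)
```

With an even count of six values, the median is the average of the two middle values, 40 and 45.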
mode
value in data set that appears most frequently (applies to both qualitative and quantitative)
Range
difference between largest value and smallest value
Variance
measure of how far the values in a data set are spread from the center (mean), based on squared deviations
population variance is denoted by….
sigma squared
population variance formula
sigma squared = (Σ_{i=1}^{N} (x_i - mu)^2)/N
Sample variance is denoted by….
s^2
sample variance formula
s^2 = (Σ_{i=1}^{n} (x_i - x bar)^2)/(n-1)
The var() command in R finds…
the sample variance (divides by n-1, which makes it an unbiased estimate of the population variance)
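The n-1 versus N divisor can be checked by hand. This is a sketch in Python with a hypothetical data set; Python's statistics.variance divides by n-1, as R's var() does, while statistics.pvariance divides by N.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

x_bar = sum(data) / n                              # mean of the data
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations

sample_var = sum_sq_dev / (n - 1)   # s^2: divide by n - 1
pop_var = sum_sq_dev / n            # sigma^2: divide by N (data treated as the whole population)

# The standard library agrees with the hand formulas.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(pop_var, statistics.pvariance(data))

print(sample_var, pop_var)
```

The sample variance always comes out a little larger than the population formula applied to the same numbers, because the divisor is smaller.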
sample standard deviation (s) is the…
square root of sample variance
population standard deviation (sigma) is the…
square root of population variance
Why do we care about standard deviation when we have variance?
variance is in squared units, so we take its square root to get back to the same units as the data
A z-score tells us…
how many standard deviations a value is away from the population mean
z-score equation
z = (x - mu)/sigma (x is the value, mu is the population mean, sigma is the population sd)
Z-score can be _____ but sd is never _____
negative, negative (sd is always zero or positive)
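A small worked example of the z-score formula, sketched in Python. The data set is hypothetical and is treated as the whole population, so the population sd (statistics.pstdev) is used; z_score is just an illustrative helper name.

```python
import statistics

# Hypothetical data, treated here as the entire population.
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

z_high = z_score(9, mu, sigma)   # value above the mean: positive z
z_low = z_score(2, mu, sigma)    # value below the mean: negative z

print(mu, sigma, z_high, z_low)
```

Values below the mean give negative z-scores, while sigma itself is never negative.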
Quartile 1
a number larger than 25% of all values but smaller than 75% of all values (ls, sl: let’s silly)
Quartile 2
median
Quartile 3
a number smaller than 25% of all values but larger than 75% of all values (sl, ls: sillies)
Is there a difference between quartiles computed by R and quartiles counted by hand?
Yes, R can give slightly different values because it uses a different algorithm (it interpolates between data points)
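To see how the choice of algorithm changes the answer, here is a sketch in Python with hypothetical data. It compares a hand count (median of each half) with the two interpolation methods in Python's statistics.quantiles. R's quantile() likewise has a type argument that selects among several algorithms; which Python method matches which R type is not claimed here.

```python
import statistics

# Hypothetical sorted data with an even number of values.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# Hand count: split into halves and take the median of each half.
half = len(data) // 2
q1_hand = statistics.median(data[:half])   # median of 1, 2, 3, 4
q3_hand = statistics.median(data[half:])   # median of 5, 6, 7, 8

# Two interpolation-based methods from the standard library.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(q1_hand, q3_hand)
print(q_exclusive)
print(q_inclusive)
```

All three approaches agree on the median (4.5) but give different values for Q1 and Q3.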
Percentiles
generalization of quartiles (1st to 99th); Q1, Q2, and Q3 are the 25th, 50th, and 75th percentiles
Outliers and 2 types
values much larger or smaller than the majority of the data; either correct values or errors (e.g., Elon Musk’s salary vs. a typo)
Inter quartile range (IQR)
Q3-Q1
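Formulas that determine outliers
lower fence = Q1 - 1.5 × IQR; upper fence = Q3 + 1.5 × IQR (values below the lower fence or above the upper fence are flagged as outliers)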
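The 1.5 × IQR rule can be sketched as follows in Python. The data are hypothetical, and the quartiles come from statistics.quantiles with method="inclusive", which is one of several possible quartile algorithms; a different algorithm could move the fences slightly.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [10, 12, 13, 15, 16, 18, 19, 21, 95]

# Quartiles (inclusive method is an assumption; other methods exist).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # interquartile range: Q3 - Q1

lower_fence = q1 - 1.5 * iqr       # Q1 - 1.5 * IQR
upper_fence = q3 + 1.5 * iqr       # Q3 + 1.5 * IQR

# Flag values that fall outside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

The rule only flags a value; deciding whether it is a genuine extreme or an error still takes a look at the data.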