Lesson 2.2: Numerically Summarizing Data Flashcards
Central Tendencies and Variability
population mean
- denoted by μ
- population size = N
- μ = Σx / N (sum of all observations divided by N)
sample mean
- denoted by x̄ (x with a bar over it)
- sample size = n
- x̄ = Σx / n (sum of the sample observations divided by n)
- R: mean(DataTable$Column)
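A minimal R sketch of the mean formula, checked against the built-in mean(). The vector `scores` is hypothetical illustration data; if the vector held an entire population, the same computation would give μ (sum divided by N).

```r
# Hypothetical example data (not from the lesson)
scores <- c(72, 85, 90, 66, 85, 78, 94)

# Mean by the formula: sum of the observations divided by their count
n <- length(scores)
manual_mean <- sum(scores) / n

# Built-in equivalent
builtin_mean <- mean(scores)

print(c(manual = manual_mean, builtin = builtin_mean))
```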
median
- middle value of the data when arranged in ascending order
- if the number of observations is odd: the middle value
- if the number of observations is even: the mean of the two middle values
- not greatly affected by outliers
- R: median(DataTable$Column)
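The odd/even rule can be sketched by hand in R and compared with median(). `median_by_rule` is a hypothetical helper (it assumes a non-empty numeric vector with no NA values), and the data are made up; the second vector adds an outlier to show that the median barely moves while the mean jumps.

```r
# Hypothetical helper that applies the odd/even median rule by hand
median_by_rule <- function(x) {
  s <- sort(x)                       # arrange in ascending order
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even n: mean of the two middle values
  }
}

odd_x  <- c(7, 1, 5, 3, 9)           # sorted: 1 3 5 7 9
even_x <- c(7, 1, 5, 3, 9, 100)      # sorted: 1 3 5 7 9 100 (100 is an outlier)

print(c(by_rule = median_by_rule(odd_x), builtin = median(odd_x)))
print(c(by_rule = median_by_rule(even_x), builtin = median(even_x), mean = mean(even_x)))
```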
mode
- most frequent observation
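- R has no built-in statistical mode (base R's mode() returns an object's type, e.g. "numeric"), so define a helper:
  Mode <- function(x) {
    ux <- unique(x)                         # distinct values, in order of first appearance
    ux[which.max(tabulate(match(x, ux)))]   # most frequent value (on a tie, the first one)
  }
- histogram: hist(data, breaks = seq(low, high, interval))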
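The Mode() helper in this card returns only the first value that reaches the highest count. A minimal alternative sketch using table() reports every value tied for the top count; the data and the bin limits passed to hist() are hypothetical. `plot = FALSE` makes hist() return the bin counts instead of drawing.

```r
# Hypothetical example data: 4 appears three times, 2 twice
x <- c(2, 4, 4, 7, 4, 9, 2)

# Frequency of each distinct value
counts <- table(x)

# Every value tied for the highest count (table names are character strings)
modes <- as.numeric(names(counts)[counts == max(counts)])
print(modes)

# Bins of width 2 from 0 to 10; drop plot = FALSE to draw the histogram
h <- hist(x, breaks = seq(0, 10, 2), plot = FALSE)
print(h$counts)
```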
Range (R)
- difference between the largest and smallest values
- R: range(DataTable$Column) returns the lowest and highest values; diff(range(DataTable$Column)) gives the range itself
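A short R sketch on hypothetical data showing that range() returns the two extremes, while the statistical range is their difference.

```r
# Hypothetical example data
x <- c(12, 5, 30, 18, 7)

lo_hi  <- range(x)   # c(min, max)
spread <- diff(lo_hi)  # max - min, the range as a single number

print(lo_hi)
print(spread)
```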
population variance
- denoted by σ²
- sum of the squared deviations about the population mean, (x − μ)², divided by the number of observations in the population (N)
- σ² = Σ(x − μ)² / N
- R (by hand):
  x <- data$Score
  N <- length(x)
  v.population <- sum((x - mean(x))^2) / N   # divide by N, not N - 1
sample variance
- denoted by s²
- sum of the squared deviations about the sample mean, (x − x̄)², divided by the number of observations in the sample minus 1 (n − 1)
- s² = Σ(x − x̄)² / (n − 1)
- R: var(DataTable$Column)
- R's var() gives only the sample variance (divisor n − 1)
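A minimal R sketch contrasting the two divisors on hypothetical data: var() matches the n − 1 version, and rescaling it by (n − 1)/n recovers the population variance.

```r
# Hypothetical example data
x <- c(4, 8, 6, 5, 7)
n <- length(x)

dev_sq <- (x - mean(x))^2           # squared deviations about the mean

pop_var  <- sum(dev_sq) / n         # population variance: divide by N
samp_var <- sum(dev_sq) / (n - 1)   # sample variance: divide by n - 1

# var() uses n - 1; rescale it to get the population version
var_builtin  <- var(x)
pop_from_var <- var_builtin * (n - 1) / n

print(c(population = pop_var, sample = samp_var, var = var_builtin))
```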
standard deviation
population and sample
- square root of the variance
- denoted by σ (population) or s (sample)
Coefficient of Variation (CV)
- measures the scatter in the data relative to the mean
- CV = standard deviation (s) / sample mean (x̄)
- expressed as a percentage: CV = (s / x̄) × 100%
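A minimal R sketch of the sample CV on two hypothetical data sets with the same standard deviation but different means; `cv_percent` is a hypothetical helper and assumes the data have a positive mean. The same absolute scatter is a much smaller relative scatter when the mean is larger.

```r
# Hypothetical data sets with the same spread but different means
a <- c(4, 8, 6, 5, 7)   # mean 6
b <- a + 94             # mean 100, same standard deviation

# Hypothetical helper: sample CV in percent (assumes mean(x) > 0)
cv_percent <- function(x) sd(x) / mean(x) * 100

print(c(a = cv_percent(a), b = cv_percent(b)))
```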
Computing standard deviation
Excel and R
- Excel: =STDEV(data range) (sample standard deviation; =STDEV.P(data range) for a population)
- R: sd(vector) (sample standard deviation)
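A minimal R sketch on hypothetical data: sd() equals the square root of var(), and the population standard deviation has to be computed by hand with divisor N.

```r
# Hypothetical example data
x <- c(4, 8, 6, 5, 7)
n <- length(x)

s       <- sd(x)                             # sample SD (divisor n - 1)
s_check <- sqrt(var(x))                      # same value: square root of the sample variance
sigma   <- sqrt(sum((x - mean(x))^2) / n)    # population SD (divisor N)

print(c(sample = s, population = sigma))
```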
Empirical Rule
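- applies to bell-shaped (approximately normal) distributions
- Approximately 68% of the data lie between μ − 1σ and μ + 1σ
- Approximately 95% of the data lie between μ − 2σ and μ + 2σ
- Approximately 99.7% of the data lie between μ − 3σ and μ + 3σ
- percentages under the curve from left to right, from the tail below μ − 3σ to the tail above μ + 3σ (the nested sums give 68%, 95%, and 99.7%):
  0.15 + (2.35 + (13.5 + (34 + 34) + 13.5) + 2.35) + 0.15 = 100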
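The rule's percentages are rounded. A quick R check with pnorm() (the standard normal cumulative probability) gives the exact share of a normal distribution within k standard deviations of the mean.

```r
k <- 1:3

# Area between mu - k*sigma and mu + k*sigma for a normal distribution
within <- pnorm(k) - pnorm(-k)

print(round(within * 100, 2))   # 68.27 95.45 99.73
```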
Standardized values
z value
- z = (y − μ) / σ, where y is the data value, μ the mean, and σ the standard deviation
- used to compute probabilities and to compare values from two different distributions
- z = 2: the data value is 2 standard deviations above the mean
- z = −1.6: the data value is 1.6 standard deviations below the mean
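A minimal R sketch of the z formula. `z_score` is a hypothetical helper, and the two test means, standard deviations, and scores in the comparison are made-up numbers used only to show how standardizing puts different scales side by side.

```r
# Hypothetical helper implementing z = (y - mu) / sigma
z_score <- function(y, mu, sigma) (y - mu) / sigma

z_score(700, 500, 100)   # 2: two standard deviations above the mean
z_score(340, 500, 100)   # -1.6: 1.6 standard deviations below the mean

# Comparing scores from two different (hypothetical) distributions
z_test1 <- z_score(650, 500, 100)   # test 1: mean 500, sd 100
z_test2 <- z_score(28, 21, 5)       # test 2: mean 21, sd 5
print(c(test1 = z_test1, test2 = z_test2))
z_test1 > z_test2                   # the test 1 score is relatively higher
```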
p-value
- the area under the normal distribution curve to the left of z, i.e. the cumulative probability P(Z ≤ z)
- z = 1.0: 0.15 + 2.35 + 13.5 + 34 + 34 = 84% (exact table value 0.8413)
- total area under the normal distribution curve = 1
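In R, pnorm() returns this left-side area directly, so a z table is not needed. The z values below are taken from the cards; `lower.tail = FALSE` gives the area to the right instead.

```r
left_of_1   <- pnorm(1)                      # area left of z = 1
left_of_m16 <- pnorm(-1.6)                   # area left of z = -1.6
right_of_2  <- pnorm(2, lower.tail = FALSE)  # area right of z = 2, same as 1 - pnorm(2)

print(round(c(left_of_1, left_of_m16, right_of_2), 4))   # 0.8413 0.0548 0.0228
```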
p-value calculation for specific range
example
probability of a student scoring between 450 and 600 on the SAT
mean = 500, sd = 100
z = (600 − 500) / 100 = 1.0 → p-value 0.8413
z = (450 − 500) / 100 = −0.50 → p-value 0.3085
P(450 < score < 600) = 0.8413 − 0.3085 = 0.5328, or 53.28%
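The same SAT example in R: pnorm() accepts the raw score together with the mean and sd, so the standardizing step happens inside the call. The mean and sd are the ones given in the card.

```r
mu    <- 500
sigma <- 100

p_600 <- pnorm(600, mean = mu, sd = sigma)   # same as pnorm(1.0)
p_450 <- pnorm(450, mean = mu, sd = sigma)   # same as pnorm(-0.5)
p_between <- p_600 - p_450                   # area between the two scores

print(round(c(p_600, p_450, p_between), 4))  # 0.8413 0.3085 0.5328
```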
Standard normal curve (μ = 0, σ = 1)
- Symmetric about its mean μ = 0
- Mean = Median = Mode
- Single peak at z = 0
- Inflection points at z = −1 and z = +1
- Area under the curve = 1
- Area to the left of the mean (z = 0) = area to the right = ½
- Follows the Empirical Rule
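A minimal R sketch that checks several of these properties numerically with the standard normal functions dnorm(), pnorm(), and qnorm(). The inflection check uses a hypothetical central-difference helper `d2`: the curvature of the density should be negative just inside z = 1 and positive just outside it.

```r
# Total area under the density is 1
area_total <- integrate(dnorm, -Inf, Inf)$value

# Median: the z with half the area on each side
med       <- qnorm(0.5)   # z = 0
left_half <- pnorm(0)     # area to the left of z = 0

# Symmetry: the density has the same height at -z and z
symmetric <- isTRUE(all.equal(dnorm(-1.3), dnorm(1.3)))

# Inflection near z = 1: numerical second derivative changes sign
d2 <- function(z, h = 1e-4) (dnorm(z + h) - 2 * dnorm(z) + dnorm(z - h)) / h^2
curvature <- c(d2(0.9), d2(1.1))   # negative, then positive

print(list(area = area_total, median = med, left = left_half,
           symmetric = symmetric, curvature = curvature))
```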