week 4 -- SD as a ruler & z-scores Flashcards
Mean (equation)
mean y = sum of y / n
ȳ = Σy / n
descriptive vs. inferential
descriptive statistics summarize our sample
inferential statistics are statements about the population
→ to make inferences about a population, I have to construct a model of that population
Statistic
item of numerical info about the SAMPLE
Parameter
item of numerical info about the MODEL (i.e., the POPULATION)
Estimator
a statistic used to estimate a parameter (e.g., sample mean)
Error
NON-SYSTEMATIC difference between estimator and parameter
Bias
SYSTEMATIC difference between estimator and parameter
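The difference between error and bias can be checked by listing every possible sample from a tiny population. This is only a sketch: the population values and the sample size are made up, and samples are assumed to be drawn with replacement. The sample mean misses the parameter in individual samples (error) but is right on average (no bias); the variance computed with divisor n falls short on average (bias).

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical tiny population, chosen only for illustration.
population = [2, 4, 6, 8]
mu = mean(population)            # parameter: population mean
sigma2 = pvariance(population)   # parameter: population variance

n = 2
# Every possible sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Estimator 1: sample mean. Single samples miss mu (error),
# but the misses cancel out across all samples (no bias).
mean_of_means = mean(mean(s) for s in samples)

# Estimator 2: variance with divisor n. Its average across all
# samples is systematically smaller than sigma2 (bias).
mean_uncorrected = mean(pvariance(s) for s in samples)

print("population mean:", mu, "| average sample mean:", mean_of_means)
print("population variance:", sigma2,
      "| average uncorrected sample variance:", mean_uncorrected)
```

In this setup the average uncorrected variance equals (n - 1) / n times the population variance, which is the shortfall that Bessel's correction removes.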
Standard deviation
a measure that quantifies the spread of a sample (or population)
Allows us to answer: “How remarkable is a single observed value?”
algebraically = square root of the variance
uncorrected: √( Σ(y - ȳ)² / n )
or with Bessel’s correction: s = √( Σ(y - ȳ)² / (n - 1) )
shows how close data points are to the mean of the sample. BUT observations in a sample are always closer to their own mean than to the population mean, SO the uncorrected SD tends to underestimate the spread of the population: it is a biased estimator (OK as a purely descriptive statistic)
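As a quick check of the two formulas, here is a minimal sketch that computes both versions by hand and compares them with Python's standard library. The sample values are made up; `pstdev` divides by n and `stdev` divides by n - 1.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Hypothetical sample, chosen only for illustration.
y = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(y)
y_bar = mean(y)

# Sum of squared deviations from the sample mean.
squared_deviations = sum((v - y_bar) ** 2 for v in y)

sd_uncorrected = sqrt(squared_deviations / n)       # divide by n
sd_corrected = sqrt(squared_deviations / (n - 1))   # Bessel's correction

# The standard library functions should agree with the hand computation.
print(sd_uncorrected, pstdev(y))
print(sd_corrected, stdev(y))
```

The corrected value is always the larger of the two, and the gap shrinks as n grows.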
What is the trick for comparing performances between very different-looking values (e.g., meters run vs. time taken)?
Standard deviation!
(use as a “ruler” to measure distance from the mean)
expressing distance with SD “standardizes” the performances
z-score 1
allows us to compare apples and oranges (eliminates units)
letter z denotes values that have been standardized!!
(with mean & SD)
z = (y - ȳ) / s
z-score = (performance - mean performance) / standard deviation
z-score 2
Comparison shows us which score is more extraordinary
z-scores have NO UNITS
they tell us how far a value is from the mean, measured in standard deviations
2 = 2 SD above the mean
-1.5 = 1.5 SD below the mean
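A small sketch of the apples-and-oranges comparison. The two events, their means and their SDs are invented for illustration; `z_score` is a hypothetical helper name.

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd

# All numbers below are made up for illustration.
# Event A: distance covered in a fixed time (meters).
z_distance = z_score(3000, mean=2600, sd=200)
# Event B: time needed for a fixed course (seconds).
z_time = z_score(141, mean=150, sd=6)

print(z_distance)  # 2.0  -> 2 SD above the mean
print(z_time)      # -1.5 -> 1.5 SD below the mean

# The units are gone, so the two results can be compared directly:
# the score farther from 0 is the more extraordinary one.
more_extraordinary = "distance" if abs(z_distance) > abs(z_time) else "time"
print(more_extraordinary)
```

Note that for a timed event a negative z-score is the good direction, so the sign has to be read in context before calling one performance "better".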
shifting data
plus or minus
Only measures of position change (center, min, max)
Neither shape nor spread changes (range, IQR, SD)
rescaling data
multiply or divide
all measures of position (mean, median) and spread change
shape remains constant
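The effect of shifting and rescaling can be checked on a small made-up data set. This is a sketch only: `summary` is a hypothetical helper, and the shift (+10) and scale factor (×3) are arbitrary.

```python
from statistics import mean, median, pstdev

def summary(data):
    """A few measures of position and spread for a list of numbers."""
    return {
        "min": min(data),
        "max": max(data),
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "sd": pstdev(data),
    }

# Hypothetical data, chosen only for illustration.
y = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [v + 10 for v in y]   # shifting: add a constant
rescaled = [v * 3 for v in y]   # rescaling: multiply by a constant

print("original:", summary(y))
print("shifted: ", summary(shifted))    # position moves by 10, spread unchanged
print("rescaled:", summary(rescaled))   # position and spread both multiplied by 3
```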
standardizing into z-scores shifts data by the mean and rescales them by the standard deviation
Shape stays constant; center changes (mean = 0); spread changes (SD = 1)
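Standardizing can be sketched in a few lines. The data are made up, and the corrected SD (`stdev`) is assumed for s; the same SD has to be used when checking that the z-scores have SD 1.

```python
from statistics import mean, stdev

# Hypothetical data, chosen only for illustration.
y = [2, 4, 4, 4, 5, 5, 7, 9]
y_bar = mean(y)
s = stdev(y)   # assumption: s uses Bessel's correction

# Shift by the mean, then rescale by the standard deviation.
z = [(v - y_bar) / s for v in y]

print(z)
print(mean(z))    # approximately 0 (up to floating-point rounding)
print(stdev(z))   # approximately 1 (up to floating-point rounding)
```

The order of the values and their relative gaps are untouched, which is what "shape stays constant" means here.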
A statistical model is always wrong. Explain.
it is “wrong” in the sense that it doesn’t match reality exactly
Normal model
a way to show how extreme a z-score is
N(μ, σ)
(Normal model with mean μ and standard deviation σ)
μ is read “mu”, σ is read “sigma”
μ = mean of a Normal model
σ = standard deviation of a Normal model
Greek letters
NOT numerical summaries of data – they are part of the model, parameters
Latin letters
summaries of data, statistics
standardized data for model
z = (y - μ) / σ (for parameters)
cf.
z = (y - ȳ) / s (for statistics)
68-95-99.7 rule
In a Normal model:
about 68% of values fall within 1 SD of the mean
about 95% of values fall within 2 SD of the mean
about 99.7% of values fall within 3 SD of the mean
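The three percentages can be reproduced from a Normal model using `statistics.NormalDist` from Python's standard library (3.8+). A minimal sketch using the standard Normal model N(0, 1); `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard Normal model N(0, 1): mean 0, standard deviation 1.
model = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of the model lying within k SD of the mean."""
    return model.cdf(k) - model.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
```

The printed values are slightly more precise than 68, 95 and 99.7, which is why the rule is stated as an approximation.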