Practical Statistics Flashcards
Deviations
The difference between the observed values and the estimate of location
errors, residuals
Variance
The sum of squared deviations from the mean divided by n - 1, where n is the number of data values
mean-squared-error
Standard Deviation
The square root of the variance
Mean absolute deviation
The mean of the absolute values of the deviations from the mean
l1-norm, Manhattan norm
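A minimal sketch of the three variability cards above (variance, standard deviation, mean absolute deviation), using only the Python standard library. The function name deviation_summary and the sample values are illustrative assumptions, not part of the deck.

```python
import math

def deviation_summary(values):
    """Return (variance, standard deviation, mean absolute deviation)."""
    n = len(values)
    mean = sum(values) / n
    # Deviations: observed value minus the estimate of location (here, the mean).
    deviations = [x - mean for x in values]
    # Variance: sum of squared deviations divided by n - 1.
    variance = sum(d * d for d in deviations) / (n - 1)
    # Standard deviation: square root of the variance.
    std_dev = math.sqrt(variance)
    # Mean absolute deviation: mean of the absolute deviations (divides by n).
    mean_abs_dev = sum(abs(d) for d in deviations) / n
    return variance, std_dev, mean_abs_dev

# Illustrative data only.
variance, std_dev, mean_abs_dev = deviation_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(variance, std_dev, mean_abs_dev)
```

Note that the variance divides by n - 1 while the mean absolute deviation divides by n, matching the definitions on the cards.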
Sample statistic
A metric calculated for a sample of data drawn from a larger population
Data Distribution
The frequency distribution of individual values in a data set
Sampling distribution
The frequency distribution of a sample statistic over many samples or resamples
Central limit theorem
The tendency of the sampling distribution to take on a normal shape as sample size rises
Standard error
The variability (standard deviation) of a sample statistic over many samples (not to be confused with standard deviation, which by itself refers to the variability of individual data values)
Bootstrap sample
A sample taken with replacement from an observed data set
powerful tool for assessing the variability of a sample statistic
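One way the bootstrap can be used to estimate a standard error is sketched below: draw many resamples with replacement, compute the statistic on each, and take the standard deviation of those estimates. The function name bootstrap_standard_error, the resample count, the seed, and the sample values are all assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic=statistics.mean,
                             n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by bootstrapping `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Bootstrap sample: n draws WITH replacement from the observed data.
        resample = rng.choices(data, k=n)
        estimates.append(statistic(resample))
    # The spread of the statistic across resamples approximates its standard error.
    return statistics.stdev(estimates)

# Illustrative data only.
sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
se = bootstrap_standard_error(sample)
print(se)
```

The collected estimates also form an approximate sampling distribution of the statistic, so percentiles of that list could serve as rough confidence interval endpoints.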
Resampling
The process of taking repeated samples from observed data; includes both bootstrap and permutation procedures
Confidence level
The percentage of confidence intervals, constructed in the same way from the same population, that are expected to contain the statistic of interest
Interval endpoints
The top and bottom of the confidence interval
Error
The difference between a data point and a predicted or average value
Standardize
Subtract the mean and divide by the standard deviation
z-score
The result of standardizing an individual data point
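A short sketch of standardizing, assuming the sample standard deviation (n - 1 denominator) is the one intended. The function name standardize and the input values are illustrative.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in values]

# Illustrative data only.
z_scores = standardize([10, 12, 14, 16, 18])
print(z_scores)
```

After standardizing, the values have mean 0 and standard deviation 1, which puts them on the same scale as a standard normal distribution without making them normally distributed.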
Standard normal
A normal distribution with mean = 0 and standard deviation = 1
Tail
The long narrow portion of a frequency distribution, where relatively extreme values occur at low frequency
Skew
Where one tail of a distribution is longer than the other
Trial
An event with a discrete outcome (e.g. a coin flip)
Success
The outcome of interest for a trial
“1” (as opposed to “0”)
Binomial
Having two outcomes
yes/no, 0/1, binary
Binomial Trial
A trial with two outcomes
Bernoulli trial
Binomial distribution
The distribution of the number of successes in n trials, each with success probability p. Can be approximated by a normal distribution when n is large and p is not too close to 0 or 1
Bernoulli distribution
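The binomial probabilities can be computed directly from the standard formula, C(n, k) * p^k * (1 - p)^(n - k). The sketch below assumes Python 3.8+ for math.comb; the function name binomial_pmf and the choice of n = 5, p = 0.5 are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of 0..5 successes in 5 fair coin flips (illustrative values).
probabilities = [binomial_pmf(k, 5, 0.5) for k in range(6)]
print(probabilities)
```

The probabilities over all possible success counts sum to 1, and for p = 0.5 they are symmetric around n / 2.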
Lambda
The rate (per unit of time or space) at which events occur
Poisson distribution
The frequency distribution of the number of events in sampled units of time or space
Exponential distribution
The frequency distribution of the time or distance from one event to the next event
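The link between lambda, the Poisson distribution, and the exponential distribution can be illustrated with a small simulation: draw exponential gaps between events at rate lambda, then count how many events land in each unit of time. The function name simulate_event_counts, the rate of 2.0, and the seed are assumptions for illustration.

```python
import random

def simulate_event_counts(lam, n_units, seed=0):
    """Count events per unit of time when gaps between events are exponential."""
    rng = random.Random(seed)
    counts = [0] * n_units
    # Time of the first event: an exponential waiting time with rate lam.
    t = rng.expovariate(lam)
    while t < n_units:
        counts[int(t)] += 1          # the event falls in unit interval int(t)
        t += rng.expovariate(lam)    # exponential gap to the next event
    return counts

counts = simulate_event_counts(lam=2.0, n_units=10000)
mean_count = sum(counts) / len(counts)
print(mean_count)
```

The per-unit counts approximately follow a Poisson distribution, so their mean should land close to lambda.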
Weibull distribution
A generalized version of the exponential distribution in which the event rate is allowed to shift over time
Treatment
Something (drug, price, web headline) to which a subject is exposed
Treatment group
A group of subjects exposed to a specific treatment
Control group
A group of subjects exposed to no (or standard) treatment
Subjects
The items (web visitors, patients, etc.) that are exposed to treatments
Test statistic
The metric used to measure the effect of the treatment
Null hypothesis
The hypothesis that chance is to blame
Alternative hypothesis
Counterpoint to the null (what you hope to prove)
One-way test
Hypothesis test that counts chance results only in one direction (e.g. B is better than A)
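The hypothesis-testing cards above can be tied together with a one-way permutation test, one of the resampling procedures mentioned earlier in the deck. This is a sketch under stated assumptions, not a definitive procedure: the test statistic is the difference in group means, the alternative is "B is better than A", and the function name, group values, permutation count, and seed are all hypothetical.

```python
import random
import statistics

def one_way_permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Return (observed difference, one-way p-value) for mean(B) - mean(A)."""
    rng = random.Random(seed)
    # Test statistic: how much higher the treatment mean is than the control mean.
    observed = statistics.mean(group_b) - statistics.mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are arbitrary, so shuffle them.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_a:]) - statistics.mean(pooled[:n_a])
        # One-way: count chance results in one direction only (B better than A).
        if diff >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations

# Hypothetical control (A) and treatment (B) measurements.
control = [10, 12, 11, 13, 12]
treatment = [14, 15, 13, 16, 15]
observed, p_value = one_way_permutation_test(control, treatment)
print(observed, p_value)
```

A small p-value means that shuffled labels rarely produce a difference as large as the observed one, which counts as evidence against the null hypothesis that chance is to blame.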