Statistics Flashcards
Central limit theorem
When you repeatedly draw samples from a population, whatever the shape of its underlying distribution, the distribution of the sample means approximates a normal distribution, provided the sample size is sufficiently large (a common rule of thumb is n > 30) and the population has finite variance.
The distribution of sample means follows approximately normal distribution for sufficiently large samples.
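A minimal simulation sketch of the idea above, using only the Python standard library. The exponential population, the sample size of 40, and the 5000 repetitions are illustrative assumptions, not part of the original card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 40            # sample size (assumed; above the usual rule of thumb of 30)
n_samples = 5000  # number of repeated samples (assumed)
means = [sample_mean(n) for _ in range(n_samples)]

center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to 1 / sqrt(n)

# Share of sample means within one standard deviation of the centre;
# for a normal distribution this share is about 0.68.
within_one_sd = sum(abs(m - center) <= spread for m in means) / n_samples
print(center, spread, within_one_sd)
```

Even though the population is strongly skewed, the sample means cluster symmetrically around the population mean in roughly the proportions a normal distribution would give.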
The law of large numbers
Given any random process, the difference between the sample mean and the underlying population mean tends to shrink as the sample size increases (the observed probability approaches the theoretical one).
The larger your sample, the closer your sample mean tends to be to the population mean.
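As a rough illustration, the sketch below rolls a fair six-sided die and compares the sample mean with the theoretical mean of 3.5 for growing sample sizes. The die, the seed, and the chosen sample sizes are assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible
population_mean = 3.5  # theoretical mean of a fair six-sided die

errors = {}
for n in (10, 1000, 100000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    # Absolute gap between the sample mean and the population mean
    errors[n] = abs(statistics.fmean(rolls) - population_mean)
    print(n, errors[n])
```

The gap is not guaranteed to shrink at every step, but for the largest sample it is expected to be very small.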
Inferential statistics
Inferring the characteristics of a population given a particular sample (for any sufficiently large sample we can estimate the mean of the underlying population from the sample mean)
Standard deviation
It is a statistic that tells us how much individual values in a data set differ from the mean of that set. It measures the spread, or variability, of a set of numbers.
Standard error of the mean
It measures the precision of the sample mean as an estimate of the population mean. It tells us how much the sample mean (the average of our sample data) is likely to differ from the true population mean (the average of all possible data points if we could measure them all). It is the standard deviation divided by the square root of the sample size.
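The two definitions above can be sketched as follows. The data set is hypothetical, and the sample standard deviation (n - 1 in the denominator) is assumed.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean = statistics.fmean(data)   # 5.0
sd = statistics.stdev(data)     # sample standard deviation, about 2.14
sem = sd / math.sqrt(len(data)) # standard error of the mean, about 0.76

print(mean, sd, sem)
```

The standard deviation describes the spread of the individual values, while the standard error describes how precisely the sample mean estimates the population mean, so it shrinks as the sample grows.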
Z-score
A z-score, also known as a standard score, tells us how far a particular data point is from the mean in terms of standard deviations. It helps us understand how unusual or typical a particular value is within a data set. It is calculated by taking a specific data point, subtracting the sample mean, and then dividing the result by the standard deviation. A z-score around 0 suggests a fairly typical data point.
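A short sketch of the calculation described above. The function name `z_score` and the data set are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def z_score(value, data):
    """Distance of value from the mean of data, in standard deviations."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return (value - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean 5
print(z_score(5, data))  # 0.0, exactly at the mean
print(z_score(9, data))  # about 1.87, well above the mean
```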
Confidence intervals
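For a given sample, the range of values that is likely to contain the population mean. A 95% confidence interval is built so that, if the sampling were repeated many times, about 95% of the resulting intervals would contain the population mean.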
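One way to see what the 95% refers to is to simulate many samples and count how often the interval contains the true mean. The sketch below assumes a normal population with mean 50 and standard deviation 10, samples of size 100, and the normal-approximation interval (mean plus or minus about 1.96 standard errors); for small samples a t-based critical value would usually be used instead.

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)  # fixed seed so the sketch is reproducible
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

def confidence_interval(sample):
    """Normal-approximation 95% confidence interval for the mean."""
    m = statistics.fmean(sample)
    sem = statistics.stdev(sample) / len(sample) ** 0.5
    return m - z * sem, m + z * sem

true_mean = 50.0  # assumed population mean
trials = 2000     # number of repeated samples (assumed)
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, 10.0) for _ in range(100)]
    low, high = confidence_interval(sample)
    hits += low <= true_mean <= high  # True counts as 1

coverage = hits / trials  # expected to be close to 0.95
print(coverage)
```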
QQ-plot
A QQ-plot, or quantile-quantile plot, is a type of plot used to compare the distribution of a data set to a theoretical distribution, most commonly a normal (bell curve) distribution. It’s a helpful visual tool for checking whether your data is normally distributed, which is often an important assumption in statistics.
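The points of a normal QQ-plot can be computed without any plotting library, as in this sketch. The simulated heights and the plotting positions (i + 0.5) / n are assumptions; other conventions for the positions exist.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(3)  # fixed seed so the sketch is reproducible
data = [random.gauss(170, 8) for _ in range(200)]  # hypothetical heights in cm

n = len(data)
observed = sorted(data)  # sample quantiles
fitted = NormalDist(fmean(data), stdev(data))  # normal with the sample's mean and SD
# Theoretical quantiles at evenly spaced probabilities strictly between 0 and 1
theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]

qq_pairs = list(zip(theoretical, observed))
for t, o in qq_pairs[::40]:
    print(round(t, 1), round(o, 1))
```

If the data are close to normal, plotting these pairs gives points that lie near the diagonal line y = x; systematic curvature suggests skew or heavy tails.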
Parametric tests
Tests that assume the data follow a specific distribution, usually the normal distribution. They can be used on normally distributed data.
Non-parametric tests
Tests that do not assume a specific distribution. They can be used on data that are not normally distributed.
Quasi-experiment
Collection of data on two or more naturally occurring variables in the world (e.g. shoe size and breath-hold time), with no random assignment of subjects.
A full experiment
Systematic manipulation of variables (Independent variables) to observe how they influence an outcome measure (Dependent variable)
T-test
When we want to test if two means are different
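As a sketch of what the test computes, the block below calculates the t statistic for two independent groups by hand, using Welch's version, which does not assume equal variances. The group values are hypothetical, and the p-value is left out because it needs the t distribution, which the standard library does not provide; with SciPy installed, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` would return both.

```python
import math
import statistics

group_a = [5.1, 4.9, 5.6, 5.8, 5.2]  # hypothetical scores, condition A
group_b = [6.3, 6.1, 5.9, 6.8, 6.4]  # hypothetical scores, condition B

mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Standard error of the difference between the two means (Welch)
se_diff = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
t_stat = (mean_a - mean_b) / se_diff  # about -4.37

print(t_stat)
```

The further the t statistic is from 0, the less plausible it is that the two group means are equal.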
Regression
When we want to predict a continuous dependent variable from one or more continuous OR categorical independent variables
Correlation test
When we want to test the relation between two continuous variables