Statistical inference Flashcards
Goal of statistics
To be able to make inferences about a population, but you can never truly know about an entire population
Instead, you must use a sample, and ensure that the sample and inferences are actually generalizable to the population
Sampling variation
Refers to the inevitable differences between the sample and the population it was drawn from (and between one sample and the next).
We want to minimize this as much as possible in order to have a more accurate sample
Have to think about getting best possible estimate in face of sampling variation
Sampling bias
refers to systematic forces with regard to sampling
Ex. only using university students, or people who pick up the phone/agree to do a study over the phone
Frequentist statistics
Frequentist statistics asks: how likely is it that this effect of the manipulation would have occurred if there was no effect in the actual population?
I.e., what is the chance this result was due to sampling variation?
Is there another reason this effect could have arisen? Manipulation vs. random chance? What would happen if we repeated this study many times?
Determining the answer to this question is the point of significance testing
What is more common Frequentist vs Bayesian stats
Frequentist statistics is more common: t-tests, p-values, null hypothesis
Sampling distribution
the distribution of a statistic (e.g., the sample mean) across many repeated samples from the same population; its shape is what significance testing relies on
Central limit theorem
states that for a large enough sample size, the mean of the sample will be…
Normally distributed
Equal to the population mean
Have a standard deviation equal to the population standard deviation divided by the square root of the sample size
This is true regardless of the distribution of the actual population
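The three CLT claims above can be checked by simulation. A minimal sketch in R, using a deliberately skewed population (exponential with mean 1 and SD 1); all names and numbers here are illustrative, not from the flashcards:

```r
# Simulate the CLT: draw many samples from a skewed population
# and look at the distribution of their means.
set.seed(42)
n <- 100                                        # sample size
sample_means <- replicate(10000, mean(rexp(n, rate = 1)))

mean(sample_means)        # close to the population mean (1)
sd(sample_means)          # close to pop SD / sqrt(n) = 1 / sqrt(100) = 0.1
hist(sample_means)        # roughly normal, despite the skewed population
```

Note that the original population is strongly right-skewed, yet the histogram of sample means is approximately symmetric and bell-shaped.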
Standard error
the standard deviation of the sampling distribution of the mean
Will get smaller as the sample size gets bigger, therefore it can be used as a measure of the precision of the sample mean
Essentially, the standard deviation is a measure of the spread of the data; the standard error is a measure of how much the sample mean varies from sample to sample
We cannot actually know the true population distribution, but we can use CLT and standard error to make inferences
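The SD-vs-SE distinction above is one line of R each. A sketch with made-up data (the numbers are illustrative only):

```r
# Standard deviation vs. standard error for a single sample.
set.seed(1)
x <- rnorm(400, mean = 50, sd = 10)   # simulated sample of 400 observations

sd(x)                                  # spread of the data (about 10)
se <- sd(x) / sqrt(length(x))          # SE of the mean: sd / sqrt(n)
se                                     # about 10 / sqrt(400) = 0.5
```

Quadrupling the sample size only halves the SE, which is why precision gains get expensive as samples grow.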
Confidence intervals:
Using the CLT and standard error to infer how close the sample mean is to the population mean
Through repeated sampling, we are able to make inferences called confidence intervals
After getting the means of many samples, you can see what percent of those means fall within a given number of SEs on either side of the population mean
A 95% confidence interval means that, across repeated samples, 95% of intervals constructed this way (sample mean ± ~1.96 SE) will contain the true population mean
Bigger sample size equals narrower confidence intervals, because bigger sample size equals smaller SE, therefore the estimate of the mean is more precise
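The mean ± (critical value × SE) recipe can be sketched directly in R, using the t critical value since the population SD is estimated from the sample (data here are simulated for illustration):

```r
# 95% confidence interval for the mean, built from the SE.
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)

se <- sd(x) / sqrt(length(x))
ci <- mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * se
ci                       # interval that would contain the true mean
                         # in ~95% of repeated samples

t.test(x)$conf.int       # t.test() reports the same interval
</test>
```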
Null hypothesis (h0)
assumes that results obtained are the result of random chance/sampling variation, and not an effect of the manipulation
Alternative hypothesis (h1)
assumes the results obtained were due to an effect of the manipulation.
p-value and null hypothesis
P-value refers to the probability that a test-statistic this extreme would be obtained given the null is true/by random chance (conditional probability)
If the p-value (the probability of getting this result by chance under the null) is very small, you can reject the null hypothesis and conclude the result is an effect of the manipulation
I.e., if p is very small, H0 is probably wrong, so we should reject it
The threshold is industry standard/arbitrarily set at 0.05 (if p is smaller than or equal to it, reject the null; if bigger, fail to reject the null)
How to get p-value
favored tool for hypothesis testing
Essentially creates a null population distribution: what would we expect to see if this study was repeated and there is no real effect?
To get the p-value, take the standard error and calculate how many SEs the sample mean is from the population mean
Do this with a z-score
Z-score = (sample mean − population mean) / SE
The probability of observing a sample with our mean randomly is our p-value.
If smaller than 0.05, this can be considered statistically significant.
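The z-score recipe above, worked through in R with made-up numbers (an IQ-style scale: population mean 100, SD 15):

```r
# z-score and two-sided p-value for a sample mean, step by step.
samp_mean <- 103
pop_mean  <- 100
pop_sd    <- 15
n         <- 36

se <- pop_sd / sqrt(n)               # 15 / 6 = 2.5
z  <- (samp_mean - pop_mean) / se    # 3 / 2.5 = 1.2
p  <- 2 * pnorm(-abs(z))             # two-sided tail probability
p                                    # ~0.23, so not significant at 0.05
```

With these numbers the sample mean is only 1.2 SEs above the population mean, so a difference this large arises by sampling variation about 23% of the time and the null is not rejected.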
Type 1 error
False positive, i.e., rejecting the null despite there being no real effect
Alpha is the probability of a type 1 error
True positive is 1-beta
False positive is alpha
Type 2 error
False negative, i.e., failing to reject the null despite there being a real effect; can combat this by increasing power
Beta is the probability of a type 2 error
True negative is 1-alpha
False negative is beta
Alpha
Alpha is the probability of a type 1 error
Beta
Beta is the probability of a type 2 error
Definition of power
Your ability to find true positive results, also known as sensitivity.
Standard/arbitrarily set at 80%.
Property of a stat test, not a test itself
Low power is a waste of resources because of the high chance of false negatives; underpowered studies also have a higher chance of overinflated effect sizes (winner's curse)
Winner's curse
small sample sizes have a higher chance of randomly large effect sizes being flagged as significant
Power is dependent on…
sample size, the statistical test being used, the significance threshold, and the size of the effect you are looking for
While holding the other three factors constant…
One-sided tests have higher power
Larger samples have higher power
Larger effect sizes have higher power
Higher alpha values have higher power
How to calculate sample size needed
We use power to calculate sample size needed, and can reduce alpha and beta with increased sample size
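A sketch of that calculation using base R's `power.t.test()` (the `pwr.t.test` call shown later in these cards works the same way): supply three of n, effect size, alpha, and power, and the missing one is solved for. The effect size of 0.5 here is an illustrative medium effect, not a value from the flashcards.

```r
# Solve for the sample size needed to detect a medium effect (d = 0.5)
# at alpha = 0.05 with 80% power, two-sample two-sided test.
res <- power.t.test(delta = 0.5, sd = 1,          # delta/sd = 0.5 is Cohen's d
                    sig.level = 0.05, power = 0.80,
                    type = "two.sample")
ceiling(res$n)   # participants needed per group (~64)
```

Halving the target effect size to d = 0.25 roughly quadruples the required n, which is why small expected effects demand large samples.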
One-sided test
when you are sure of (or only care about) the direction of the effect: is the group mean > or < a given value
Two sided test
when you test for a difference in either direction (you do not predict which way the effect will go)
When doing a two-sided test, alpha is split between the two tails of the null distribution,
i.e., it dictates which values count as significant.
Values falling in those tails (the critical/rejection regions) lead to rejecting the null.
T-test
Cannot do a z-test because we do not know the true SD of the population, so we must do a t-test using the sample SD as an estimate
But this t-value doesn't follow the normal distribution; it follows the slightly different t-distribution instead
T-distribution changes depending on degrees of freedom, which is N-1
and accounts for the uncertainty that comes with a smaller sample size
The larger the sample size, the closer the t-distribution gets to normal distribution
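That convergence is easy to see by comparing critical values: the t cutoff for a 95% interval shrinks toward the normal's 1.96 as the degrees of freedom grow. A quick sketch:

```r
# Two-sided 95% critical values: t-distribution vs. normal.
qt(0.975, df = 5)      # ~2.57: heavy tails with a tiny sample
qt(0.975, df = 30)     # ~2.04: already much closer
qt(0.975, df = 1000)   # ~1.96: nearly indistinguishable from normal
qnorm(0.975)           # 1.96: the normal benchmark
```

The wider cutoffs at low df are exactly the "accounting for uncertainty" the card above describes: small samples need more extreme results before we reject the null.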
One sample t-test
compares the sample mean to the given mean value that is already known
Two sample t-test or independent groups t-test
Compares means from two separate groups, between subjects and assumes…
Data normally distributed
Observations between and within groups are independent of one another
Variance should be the same in both groups, but only for Student's t-test; we usually use the Welch t-test, which does not assume this.
Paired t-test
compares two means from the same sample, repeated measures/within subjects and assumes…
You are comparing between related groups or matched pairs/measuring the same individual twice
The difference scores should be normally distributed
T-test uses a difference score between variables
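The difference-score point can be verified directly: a paired t-test gives exactly the same result as a one-sample t-test on the differences. A sketch with simulated before/after data (names and numbers are illustrative):

```r
# A paired t-test is a one-sample t-test on the difference scores.
set.seed(2)
before <- rnorm(20, mean = 100, sd = 15)
after  <- before + rnorm(20, mean = 5, sd = 5)   # true improvement of ~5

t.test(after, before, paired = TRUE)$p.value
t.test(after - before, mu = 0)$p.value            # identical p-value
```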
Cohens d
measure of effect size between two means
d=0.2 is a small effect
d=0.5 is a medium effect
d=0.8 is a big effect
But these values are arbitrary and should not be interpreted rigidly
Interpret in the context of the study
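Cohen's d can be computed by hand as the mean difference divided by a pooled SD; `lsr::cohensD` (used later in these cards) gives the same value under equal-variance assumptions. A sketch with simulated groups built to have a roughly medium effect:

```r
# Cohen's d by hand: mean difference over the pooled standard deviation.
set.seed(3)
g1 <- rnorm(50, mean = 10, sd = 2)
g2 <- rnorm(50, mean = 11, sd = 2)   # true d = (11 - 10) / 2 = 0.5

pooled_sd <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
                  (length(g1) + length(g2) - 2))
d <- abs(mean(g1) - mean(g2)) / pooled_sd
d   # interpret against the 0.2 / 0.5 / 0.8 benchmarks, in context
```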
R: Power test
pwr.t.test(n = # of participants, d = Cohen's d/effect size, sig.level = #, power = #, type = 'one.sample' or 'two.sample', alternative = 'two.sided'); leave exactly one of n, d, sig.level, power out and the function solves for it
R: Effect size/cohens d
cohensD(group1, group2, method = 'unequal')
R: QQ plot
qqnorm(dataset$column)
R: Independent two-sided T.test
t.test(group1, group2, alternative = 'two.sided', var.equal = FALSE)
R: Paired t-test
t.test(x = before, y = after, alternative = 'greater', paired = TRUE)
R: get vector length
length(column)