24. Statistical Inference Flashcards by Claire Rosen

statistical inference

using a sample to make statements about the population
needed because we can’t measure effect in entire populations
process of drawing conclusions about effects in a population
using data on a sample drawn from that population, PATTERNS revealed through analysis of sample data -> generalized to population

why do we use random sampling?

each member of the population has an equal chance of being chosen
study sample is representative of population
provides greatest probability that findings in the sample will closely approximate the overall population
findings can be “generalized”

what are the three main explanations for any observed effect?

the effect is due to bias or confounding
the effect is due to chance (this is where statistics comes in)
the effect is real

bias vs confounding

bias is a systematic error in the design or implementation of a study: creates an association which is not true

confounding is an association that is real but potentially misleading: a third factor linked to both the exposure and the outcome accounts for some or all of it

hypothesis testing

involves choosing between 2 propositions:

null hypothesis: no real difference between groups; the observed effect is due to chance
alternative hypothesis: a real difference exists between groups

we are looking to “reject” the null hypothesis (we want to show that the observed effect is greater than what we would expect based on chance alone)

the null hypothesis may be true even when you find an observed effect in the sample: the effect can appear in the sample by chance while there is no effect in the entire population (which the sample is meant to represent)

P value

probability of obtaining the observed results (or results more extreme) if the null hypothesis were true
often loosely described as the probability that the observed effect could be due to chance alone
P value of 0.05 means that, if there were no real effect, there would be only a 5% probability of obtaining a result at least as extreme as the one observed (it does NOT mean there is a 95% probability that the observed result is real)
the lower the P value, the stronger the evidence against the null hypothesis
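
One way to make "probability of obtaining the results if the null hypothesis were true" concrete is a permutation test: if group labels do not matter (the null hypothesis), shuffling them should often produce a difference as large as the one observed. The sketch below is only an illustration; the function name `permutation_p_value` and the two small data sets are made up, not taken from the cards.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Share of shuffles at least as extreme as the observed difference.
    return extreme / n_shuffles

# Hypothetical measurements (illustrative numbers only).
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [4.2, 4.8, 4.5, 5.0, 4.1, 4.4]
p_value = permutation_p_value(treated, control)
print(f"p = {p_value:.4f}")
```

A small p-value here says that label shuffling rarely reproduces a gap this large; it does not give the probability that the effect is real.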

P value and significance

if p-value is less than a certain value (0.05) we conclude that chance alone is unlikely to explain the effect we see
therefore we REJECT the null hypothesis
we can call result “statistically significant”
0.05 is called the alpha level (so P < 0.05)

P > 0.05

does NOT mean that there is no difference between the two groups or that the two are equivalent
it just means that we are unable to rule out that chance explains the effect we observed

confidence intervals

estimated range of values likely to include an unknown population effect
the “level” of confidence is the probability that the interval produced by a statistical method includes the true value of the population effect (usually 95%)
i.e. if the study were repeated many times, X% of the intervals calculated this way would cover the true mean; this accounts for variability from sample to sample
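
The "sample to sample" idea can be checked with a small simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval covers the true mean. This is a sketch under assumptions: the population values (`TRUE_MEAN`, `TRUE_SD`), the sample size, and the normal-approximation interval (mean ± 1.96 × SE) are choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def ci_95(sample):
    """Approximate 95% CI for a mean: mean +/- 1.96 * SE (normal approximation)."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    return m - 1.96 * se, m + 1.96 * se

rng = random.Random(1)
# Assumed population and study design, chosen only for illustration.
TRUE_MEAN, TRUE_SD, N, REPEATS = 100.0, 15.0, 50, 2000

covered = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = ci_95(sample)
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of intervals covered the true mean")
```

The coverage should land close to 95%; any single interval either contains the true mean or does not.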

null value for difference in means? relative risk? odds ratio?

difference in means = 0
RR = 1
OR = 1

confidence intervals around differences between groups?

if the confidence interval does NOT contain the NULL value, then we can say with X% confidence that the observed effect is not due to chance alone
then the result is statistically significant (P < 0.05 for a 95% confidence interval)
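
As an illustration of checking an interval against its null value, the sketch below computes a relative risk from a 2x2 table, builds an approximate 95% CI on the log scale (a common large-sample method, assumed here), and tests whether the interval contains 1. The function name `relative_risk_ci` and the counts are hypothetical.

```python
from math import exp, log, sqrt

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk and approximate 95% CI from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Large-sample standard error of ln(RR).
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    low = exp(log(rr) - z * se_log_rr)
    high = exp(log(rr) + z * se_log_rr)
    return rr, low, high

# Hypothetical counts: 30/100 exposed and 15/100 unexposed have the outcome.
rr, low, high = relative_risk_ci(a=30, b=70, c=15, d=85)
NULL_VALUE = 1.0  # null value for a relative risk
significant = not (low <= NULL_VALUE <= high)
print(f"RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}, significant: {significant}")
```

For a difference in means the same check would use 0 as the null value instead of 1.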

confidence interval vs P-value?

confidence interval is more informative than a p-value

in addition to statistical significance (given by P value), CI also gives you an idea of how large or how small the effect is likely to be

what are the two types of quantitative variables?

continuous (measurement) and discrete

what are the two types of qualitative variables?

nominal and ordinal

what type of variable is age?

continuous (quantitative)

what type of variable is a score of 1-5?

discrete (quantitative) (it is a number, but not continuous)

what type of variable is sex?

nominal (qualitative)

what type of variable is age category?

ordinal (qualitative) (some ordering of things)

what types of variables are categorical?

discrete, nominal, ordinal

what are dichotomous variables?

a type of categorical variable that is binary (eg have outcome or not)

what descriptive statistic tests can be used for continuous variables?

mean/median (for center or average)
variance, range (for spread or distribution)

what inference/hypothesis testing can be used for continuous variables?

Student’s t-test (for difference in mean of 2 groups)
analysis of variance (for difference between more than 2 groups)
confidence interval around mean difference

variance and standard deviation are what?

statistics that tell you how tightly data are clustered around the mean:

variance is the average squared distance between the data and the mean
standard deviation is the square root of the variance
when the data are pretty tightly bunched together, the standard deviation is small
when the data are spread apart, you have a relatively large standard deviation
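
A short worked example of these definitions, using Python's standard `statistics` module. The data values are made up. Note one assumption worth flagging: "average squared distance" divides by N (the population form, `pvariance`), while the sample form (`variance`) divides by N - 1.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

m = mean(data)
pop_var = pvariance(data)    # average squared distance from the mean (divide by N)
pop_sd = pstdev(data)        # square root of pop_var
sample_var = variance(data)  # sample version (divide by N - 1)
sample_sd = stdev(data)      # square root of sample_var

print(m, pop_var, pop_sd)
print(round(sample_var, 3), round(sample_sd, 3))
```

With eight values the two versions differ noticeably; the gap shrinks as N grows.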

standard error

the standard error of the mean tells us how VARIABLE these means are likely to be from one sample to the other (if you were to do repeated sampling)

SE = SD/sqrt(N)
if SE is small, we would expect a similar mean if we were to repeat our study (mean is precisely estimated)
if SE is large, we would expect a different result if we were to repeat our study (mean is not precisely estimated)
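
The repeated-sampling interpretation of SE = SD/sqrt(N) can be sketched with a simulation: compute the SE from one sample with the formula, then actually repeat the study many times and measure how much the sample means vary. The population values and sample size below are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2)
POP_MEAN, POP_SD, N = 50.0, 10.0, 25  # assumed population and sample size

def draw_sample():
    """One simulated study: N observations from the assumed population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]

# SE estimated from a single sample with the formula SE = SD / sqrt(N).
one_sample = draw_sample()
se_from_formula = stdev(one_sample) / sqrt(N)

# Repeat the "study" many times and look at how variable the means are.
sample_means = [mean(draw_sample()) for _ in range(2000)]
spread_of_means = stdev(sample_means)

print(f"SE from one sample: {se_from_formula:.2f}")
print(f"SD of 2000 sample means: {spread_of_means:.2f}")
```

Both numbers should be near 10 / sqrt(25) = 2, which is the sense in which SE describes variability of the mean from sample to sample.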

______ is the basis of calculating the t-test statistic and confidence intervals.

t-test

a ratio between the observed effect (difference in means) and the standard error of the effect (variability in the means from sample to sample)
so we want mean difference to be high and variability (SE) to be small
we compare the observed t-statistic to a critical value to determine statistical significance
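
A minimal sketch of that ratio for two independent groups, assuming the pooled-variance (Student's) form of the test. The function name `pooled_t_statistic` and the data are hypothetical; the critical value 2.228 is the standard two-sided table value for alpha = 0.05 with 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """t = difference in means / standard error of the difference (pooled variance)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se_diff, df

# Hypothetical measurements (illustrative numbers only).
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [4.2, 4.8, 4.5, 5.0, 4.1, 4.4]
t_stat, df = pooled_t_statistic(treated, control)

T_CRITICAL_DF10 = 2.228  # two-sided, alpha = 0.05, df = 10 (from a t table)
significant = abs(t_stat) > T_CRITICAL_DF10
print(f"t = {t_stat:.2f}, df = {df}, significant: {significant}")
```

A larger mean difference or a smaller standard error both push the statistic further from zero.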

what type of descriptive statistics can be used for categorical variables?

frequencies, proportions, %

what type of inference/hypothesis testing can be used for categorical variables?

Chi-square test and p-value
Fisher's exact test and p-value (for small sample sizes)
RR and CIs
OR and CIs
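
As one example from this list, the sketch below computes an odds ratio and an approximate 95% CI from a 2x2 table, using the usual large-sample standard error of ln(OR) (an assumed choice of method). The function name `odds_ratio_ci` and the counts are made up.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of ln(OR).
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = exp(log(odds_ratio) - z * se_log_or)
    high = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# Hypothetical counts (illustrative only).
odds_ratio, or_low, or_high = odds_ratio_ci(a=30, b=70, c=15, d=85)
print(f"OR = {odds_ratio:.2f}, 95% CI {or_low:.2f} to {or_high:.2f}")
```

This approximation is unreliable when any cell count is small, which is the situation where Fisher's exact test is preferred.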

Chi-square test

chi-square statistic and associated P-value
used to test significance of categorical data (2x2 data, 2x3 data, 3x3 data, etc.)
asks: how much do the data we observe differ from what would be expected under the null hypothesis?
similar to the t-statistic: compare the observed chi-square statistic to a critical value to determine statistical significance
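
The "observed versus expected under the null hypothesis" comparison can be sketched directly. Expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected over all cells. The function name and the table are hypothetical; 3.841 is the standard critical value for alpha = 0.05 with 1 degree of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = outcome yes / no.
table = [[30, 70], [15, 85]]
chi2, df = chi_square_statistic(table)

CHI2_CRITICAL_DF1 = 3.841  # alpha = 0.05, df = 1 (from a chi-square table)
print(f"chi-square = {chi2:.2f}, df = {df}, significant: {chi2 > CHI2_CRITICAL_DF1}")
```

The same function works for 2x3 or 3x3 tables, but the critical value must then be looked up for the matching degrees of freedom.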

what is the conceptual formula of the t-test? and what does it do?

mean difference / measure of variability in means
tells you the difference between two means

what is the conceptual formula for analysis of variance (F statistic)? and what does it do?

variance between groups / variance within groups
tells you the difference among many means
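
A minimal sketch of the between/within ratio for a one-way analysis of variance. The function name `f_statistic` and the three small groups are made up so the arithmetic is easy to follow by hand.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F = (variance between groups) / (variance within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Within groups: how far each value sits from its own group mean.
    ss_within = sum((x - gm) ** 2
                    for g, gm in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: group means 5, 7 and 9, same spread inside each group.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value = f_statistic(groups)
print(f"F = {f_value:.1f}")
```

Here the between-group mean square is 12 and the within-group mean square is 1, so F is 12; as with the t-statistic, it would be compared to a critical value from an F table.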

what is the conceptual formula for the chi-square test? and what does it do?

extent to which frequencies are not consistent with the null hypothesis / size of sample
tells you differences in frequencies

power

the ability to detect a difference between study groups when one does exist
depends on:
sample size
actual or true difference between groups (the smaller the true difference, the larger the sample needed to detect it)
level of statistical significance (usually set at 0.05)
power analysis should be performed a priori (if you read a study where results were "not significant" and power for that outcome was not reported, consider the possibility that the study was underpowered)
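
How power depends on sample size, true difference, and significance level can be sketched with a normal approximation for comparing two group means. This is a simplification (a two-sided z-test with equal group sizes and a known common SD), and the function name `approx_power` and the input values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(true_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    z = NormalDist()  # standard normal distribution
    se_diff = sd * sqrt(2 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(true_diff) / se_diff
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Assumed true difference of 5 units with SD 10, at several sample sizes.
for n in (10, 50, 100):
    print(n, round(approx_power(true_diff=5, sd=10, n_per_group=n), 2))
```

Power rises with sample size and with the size of the true difference; when the true difference is zero, the function returns alpha, the false-positive rate.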

how do you maximize the power of a study?

ensure an adequately sized sample of study subjects
choose the most precise and accurate measures of exposure and outcome (reduces variance of the measurements)

statistical significance vs clinical significance

statistics tell you whether a result is statistically significant, but not whether the result is clinically important
a small effect with a "significant" p-value might not be clinically significant (because it would require a large population for the intervention to have an effect)
a large effect with a "non-significant" p-value might be clinically significant (if sample size was small or if supported by other studies/biological plausibility)

(35 cards)