24. Statistical Inference Flashcards

1
Q

statistical inference

A
  • using a sample to make statements about the population
  • needed because we can’t measure effect in entire populations
  • process of drawing conclusions about effects in a population
  • uses data on a sample drawn from that population: PATTERNS revealed through analysis of the sample data are generalized to the population
2
Q

why do we use random sampling?

A
  • each member of the population has an equal chance of being chosen
  • study sample is representative of population
  • provides greatest probability that findings in the sample will closely approximate the overall population
  • findings can be “generalized”
3
Q

what are the three main explanations for any observed effect?

A
  1. the effect is due to bias or confounding
  2. the effect is due to chance (this is where statistics comes in)
  3. the effect is real
4
Q

bias vs confounding

A

bias is a systematic error in the design or implementation of a study: creates an association which is not true

confounding is an association that is true, but potentially misleading

5
Q

hypothesis testing

A

involves choosing between 2 propositions:

  1. null hypothesis: no real difference between groups; the observed effect is due to chance
  2. alternate hypothesis: real difference exists between groups

we are looking to “reject” the null hypothesis (we want to show that the observed effect is greater than what we would expect based on chance alone)

the null hypothesis may be true even when you find an observed effect in the sample: the effect can arise by chance although there is no effect in the entire population (that you are trying to represent)

6
Q

P value

A
  • the probability of obtaining a result at least as extreme as the one observed if chance alone were operating
  • i.e. the probability of obtaining the results if the null hypothesis were true
  • a P value of 0.05 means that there is only a 5% probability of obtaining a result this extreme if the null hypothesis were true; it is NOT a 95% probability that the observed result is real and representative of the entire population
  • the lower the P value, the stronger the evidence found
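
One way to make "probability of the results if the null hypothesis were true" concrete is an exact permutation test. The sketch below is an illustration only, not a method named on the card: the function name `permutation_p_value` and the toy data are assumptions. It reshuffles the group labels in every possible way and counts how often the difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    total = 0
    extreme = 0
    # Under the null hypothesis the group labels are arbitrary,
    # so every relabelling of the pooled data is equally likely.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical toy data: only 2 of the 20 possible relabellings
# give a difference as large as the observed one.
print(permutation_p_value([1, 2, 3], [7, 8, 9]))  # 0.1
```

With groups this small the P value cannot go below 0.1, which is one reason small studies struggle to reach significance.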
7
Q

P value and significance

A
  • if p-value is less than a certain value (0.05) we conclude that chance alone is unlikely to explain the effect we see
  • therefore we REJECT the null hypothesis
  • we can call result “statistically significant”
  • 0.05 is called the alpha level (so P < 0.05)
8
Q

P > 0.05

A
  • does NOT mean that there is no difference between the two groups or that the two are equivalent
  • it just means that we are unable to rule out that chance explains the effect we observed
9
Q

confidence intervals

A
  • estimated range of values likely to include an unknown population effect
  • the “level” of confidence is the probability that the interval produced by a statistical method includes the true value of the population effect (usually 95%)
  • i.e. across repeated samples, X% of the intervals constructed this way will cover the true mean; this accounts for variability from sample to sample
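
As a rough illustration, a confidence interval around a sample mean can be sketched as mean ± critical value × standard error. The function name `mean_confidence_interval` and the data are assumptions, and the sketch uses the normal (z) critical value for simplicity; for small samples a t critical value would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(sample)
    centre = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the mean
    # Two-sided critical value, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * se, centre + z * se


# Hypothetical measurements
low, high = mean_confidence_interval([4, 5, 6, 5, 4, 6, 5, 5])
print(low, high)
```

Raising the level (for example to 0.99) widens the interval, since more of the sampling variability has to be covered.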
10
Q

null value for difference in means? relative risk? odds ratio?

A

difference in means = 0
RR = 1
OR = 1
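
A small sketch of why the null value for ratios is 1: when the outcome is equally common in both groups, risk and odds divide out to exactly 1. The function names and the 2x2 cell labels (a, b, c, d) are assumptions chosen for illustration.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Same outcome frequency in both groups: both measures sit at the null value.
print(relative_risk(10, 90, 10, 90), odds_ratio(10, 90, 10, 90))  # 1.0 1.0

# Outcome twice as common among the exposed (hypothetical counts).
print(relative_risk(20, 80, 10, 90), odds_ratio(20, 80, 10, 90))  # 2.0 2.25
```

A difference in means works the same way with subtraction instead of division, so its null value is 0.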

11
Q

confidence intervals around differences between groups?

A
  • if the confidence interval does NOT contain the NULL value, then we can say with X% confidence that the observed effect is not due to chance alone
  • then the result is statistically significant (P < 0.05 for a 95% interval)
12
Q

confidence interval vs P-value?

A

confidence interval is more informative than a p-value

in addition to statistical significance (given by P value), CI also gives you an idea of how large or how small the effect is likely to be

13
Q

what are the two types of quantitative variables?

A

continuous (measurement) and discrete

14
Q

what are the two types of qualitative variables?

A

nominal and ordinal

15
Q

what type of variable is age?

A

continuous (quantitative)

16
Q

what type of variable is a score of 1-5?

A

discrete (quantitative) (it is a number, but not continuous)

17
Q

what type of variable is sex?

A

nominal (qualitative)

18
Q

what type of variable is age category?

A

ordinal (qualitative) (some ordering of things)

19
Q

what types of variables are categorical?

A

discrete, nominal, ordinal

20
Q

what are dichotomous variables?

A

a type of categorical variable that is binary (eg have outcome or not)

21
Q

what descriptive statistic tests can be used for continuous variables?

A
  • mean/median (for center or average)
  • variance, range (for spread or distribution)

22
Q

what inference/hypothesis testing can be used for continuous variables?

A
  • student’s t-test (for difference in mean of 2 groups)
  • analysis of variance (for difference between more than 2 groups)
  • confidence interval around mean difference
23
Q

variance and standard deviation are what?

A

statistics that tell you how tightly data are clustered around the mean:

  • variance is the average squared distance between the data and the mean
  • standard deviation is the square root of the variance
  • when the data are pretty tightly bunched together, the standard deviation is small
  • when the data are spread apart, you have a relatively large standard deviation
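
The definitions above can be sketched directly. This follows the card's wording (average squared distance, i.e. dividing by N); note that the sample variance used in most tests divides by N - 1 instead. The function names and data are illustrative assumptions.

```python
from math import sqrt


def population_variance(data):
    """Average squared distance between the data and the mean (divides by N)."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)


def population_sd(data):
    """Standard deviation: the square root of the variance."""
    return sqrt(population_variance(data))


tight = [4, 5, 5, 5, 5, 5, 5, 6]   # bunched around the mean of 5
spread = [2, 4, 4, 4, 5, 5, 7, 9]  # same mean, more spread out

print(population_sd(tight))   # 0.5
print(population_sd(spread))  # 2.0
```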
24
Q

standard error

A

the standard error of the mean tells us how VARIABLE these means are likely to be from one sample to the other (if you were to do repeated sampling)

  • SE = SD/sqrt(N)
  • if SE is small, we would expect a similar mean if we were to repeat our study (mean is precisely estimated)
  • if SE is large, we would expect a different result if we were to repeat our study (mean is not precisely estimated)
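
A quick numeric sketch of SE = SD/sqrt(N), with a hypothetical helper name: for a fixed SD, quadrupling the sample size halves the standard error, so the mean is estimated twice as precisely.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / sqrt(n)


print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0
```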
25
Q

______ is the basis of calculating the t-test statistic and confidence intervals.

A

SE

26
Q

t-test

A

a ratio between the observed effect (difference in means) and the standard error of the effect (variability in the means from sample to sample)

  • so we want mean difference to be high and variability (SE) to be small
  • we compare the observed t-statistic to a critical value to determine statistical significance
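
The ratio described above can be sketched as follows. This is an assumption-laden illustration rather than the card's exact formula: it uses the unpooled (Welch-style) standard error of the difference, whereas the classic Student's t-test pools the two variances. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Difference in means divided by the standard error of that difference."""
    diff = mean(group_a) - mean(group_b)
    # Unpooled standard error: each group's sample variance over its size.
    se_diff = sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return diff / se_diff


# Means 3 and 5, standard error of the difference 1.0
print(t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # -2.0
```

The sign only reflects which group is subtracted from which; it is the absolute value that gets compared with the critical value from a t table.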
27
Q

what type of descriptive statistics can be used for categorical variables?

A

frequencies, proportions, %

28
Q

what type of inference/hypothesis testing can be used for categorical variables?

A
  • Chi-square test and p-value
  • Fisher’s exact test and p-value (for small sample sizes)
  • RR and CIs
  • OR and CIs
29
Q

Chi-square test

A
  • chi-square statistic and associated P-value
  • used to test significance of categorical data (2X2 data, 2X3 data, 3X3 data, etc)
  • asks: how much do the data we observe differ from what would be expected under the null hypothesis?
  • similar to t-statistic, compare the observed chi-square statistic to a critical value to determine statistical significance
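
A minimal sketch of the comparison between observed and expected counts, using the usual Pearson form (sum of (observed - expected)² / expected). The function name, the constant name, and the counts are illustrative assumptions; 3.841 is the standard critical value for 1 degree of freedom at alpha = 0.05, which applies to a 2x2 table.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_DF1_ALPHA_05 = 3.841  # critical value for a 2x2 table (df = 1)

# Hypothetical 2x2 data: rows are groups, columns are outcome yes/no.
stat = chi_square_statistic([[20, 80], [10, 90]])
print(round(stat, 2), stat > CRITICAL_DF1_ALPHA_05)  # 3.92 True
```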
30
Q

what is the conceptual formula of the t-test? and what does it do?

A

mean difference/measure of variability in means

tells you difference between two means

31
Q

what is the conceptual formula for analysis of variance (F statistic)? and what does it do?

A

variance between groups/variance within groups

tells you difference among many means
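
The between/within ratio can be sketched for a one-way analysis of variance as below. The function name and data are assumptions for illustration; each "variance" here is a mean square, i.e. a sum of squares divided by its degrees of freedom.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Three hypothetical groups with means 2, 3 and 4
print(f_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # 3.0
```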

32
Q

what is the conceptual formula for the chi-square test? and what does it do?

A

extent to which frequencies are not consistent with the null hypothesis/size of sample

tells you differences in frequencies

33
Q

power

A

the ability to detect a difference between study groups when one does exist

depends on:

  • sample size
  • actual or true difference between groups (the required sample size is usually inversely related to it: the larger the true difference, the smaller the sample needed to detect it)
  • level of statistical significance (usu set at 0.05)

power analysis should be performed a priori (if you read a study where results were “not significant” and power for that outcome was not reported, consider the possibility that the study was underpowered)
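
To see how the three factors interact, here is a rough a priori power sketch for comparing two group means. It is an approximation under stated assumptions: equal group sizes, a common known SD, and a two-sided z-test (a t-test would give slightly lower power in small samples). The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(true_difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)       # SE of the difference in means
    critical = z.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    shift = abs(true_difference) / se     # true effect in SE units
    # Probability that the test statistic lands beyond either critical value.
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


# Difference of half an SD: power grows with the sample size.
for n in (16, 64, 128):
    print(n, round(approximate_power(0.5, 1.0, n), 2))
```

Under these assumptions, about 64 subjects per group gives roughly 80% power for a difference of half a standard deviation; a smaller true difference, a larger SD, or a stricter alpha all lower it.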

34
Q

how do you maximize the power of a study?

A
  • ensure adequately sized sample of study subjects
  • choose the most precise and accurate measures of exposure and outcome (reduces variance of the measurements)

35
Q

statistical significance vs clinical significance

A
  • statistics tell you whether a result is statistically significant, but not whether the result is clinically important
  • small effect with a “significant” p-value might not be clinically significant (because it would require a large population for the intervention to have an effect)
  • a large effect with a “non-significant” p-value might be clinically significant (if sample size was small or if supported by other studies/biological plausibility)