Probability/Statistical Significance Flashcards
What are the two ways studies can screw up?
1. Caused by chance = random error
2. Not caused by chance = bias or systematic error
What deals with random error in studies?
Statistical inference
If a study has a random error, is it likely to happen again if/when the study is repeated?
NO
An error that is inherent to the study method being used and results in a predictable and repeatable error for each observation is labeled a _____ error. What is it due to?
Systematic error due to bias
T/F: If you repeat a study that had a systematic error, it is likely to happen again
TRUE
these errors are not caused by chance and there is no formal method to deal with them.
What tests will estimate the likelihood that a study result was caused by chance?
Tests of statistical inference
**a study result is called “statistically significant” if it is unlikely to be caused by chance
So if a study is statistically significant, is it clinically significant?
Not necessarily
Those terms have two different meanings
*even very small measures of association that are not large enough to matter can be statistically significant
What is a chance occurrence?
Something that happens unpredictably without discernible human intention or with no observable cause: caused by chance or random variation
What is random variation?
There is error in every measurement. If we measure something over and over again, we will get slightly different measurements each time AND a few measurements may be extreme
What is statistical inference?
Tells us: if we measure something only once, how likely it is that our result was caused by chance
What two methods are used for estimating how much random variation there is in our study and whether our result was likely to have been caused by chance?
1. Confidence intervals
2. P-values
_______ estimates how much random variation there is in our measurement
Confidence intervals
-the range of values where the true value of our measurement could be found
_____ are used to estimate whether the measure was likely to have been caused by chance or not
P values
Will small sample sizes have a large 95% Confidence interval or small CI?
What about large sample sizes?
The larger the sample size, the smaller the confidence interval will be = more precise
- small samples have large CIs
- Large samples have small CIs
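Note: a minimal Python sketch of why larger samples give narrower intervals. It assumes the normal (Wald) approximation for a proportion, which these cards do not specify, and the counts are hypothetical.

```python
import math

def proportion_ci_95(cases, n):
    """Return (estimate, lower, upper) for a proportion.

    Uses the normal (Wald) approximation, an assumption made for
    illustration; other interval methods exist.
    """
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)   # standard error shrinks as n grows
    margin = 1.96 * se                # 1.96 = z value for 95% confidence
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Same 8% prevalence measured in two hypothetical sample sizes.
small = proportion_ci_95(8, 100)
large = proportion_ci_95(80, 1000)

for label, (p, lo, hi) in (("n=100", small), ("n=1000", large)):
    print(f"{label}: {p:.1%} (95% CI: {lo:.1%} to {hi:.1%}), width {hi - lo:.3f}")
```

The estimate is the same in both runs; only the width of the interval changes with the sample size.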
How do you interpret this statement?
“prevalence of disease was 8% (95% CI: 4%-12%)”
The estimate of the prevalence from the study was 8%, but we are 95% confident that the true prevalence lies somewhere between 4% and 12%
T/F: If the 95% CI for the odds ratio (OR) does NOT include one, the OR is statistically significant
TRUE
Ex: The odds ratio was 3 (95% CI: 0.5 - 6)
**since this interval includes ONE, the OR could have the value of one = it is NOT statistically significant
How do you interpret 95% confidence intervals (95% CI) for odds ratios (OR)?
- OR greater than one, 95% CI does NOT include one : Positive association; statistically significant
- OR greater than one, 95% CI includes one : NO association, NOT statistically significant
- OR less than one, 95% CI does NOT include one : Negative association, statistically significant
- OR less than one, 95% CI includes one : No association, NOT statistically significant
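Note: the four rules above can be sketched as one small Python function. The function name and the example numbers are hypothetical; the logic only restates the rules on these cards.

```python
def interpret_ratio(estimate, ci_low, ci_high):
    """Classify an odds ratio and its 95% CI using the rules above."""
    # If the interval contains one, the result is not significant,
    # no matter which side of one the point estimate falls on.
    if ci_low <= 1.0 <= ci_high:
        return "no association, not statistically significant"
    if estimate > 1.0:
        return "positive association, statistically significant"
    return "negative association, statistically significant"

print(interpret_ratio(3.0, 0.5, 6.0))   # interval includes one
print(interpret_ratio(3.0, 1.5, 6.0))   # interval entirely above one
print(interpret_ratio(0.4, 0.2, 0.8))   # interval entirely below one
```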
If the 95% CI for the relative risk (RR) does NOT include one, the RR (is / is not) statistically significant
IS
*remember, when the RR = one, there is no association between the two test groups
How do you interpret a RR greater than one, combined with a 95% CI that does NOT include one?
Positive association
Statistically significant
How do you interpret a RR less than one, combined with a 95% CI that includes one?
No association
Not statistically significant
How do you interpret a RR less than one, combined with a 95% CI that does NOT include one?
Negative association
Statistically significant
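Note: a hedged Python sketch of where an RR and its 95% CI can come from. The cards do not say how the interval is calculated; this assumes the common log-based (Katz) method, and the two-by-two counts are hypothetical.

```python
import math

def relative_risk_ci(a, b, c, d):
    """RR and 95% CI from a two-by-two table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    The log-based interval is an assumption made for illustration.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR)
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    low = math.exp(math.log(rr) - 1.96 * se_log)
    high = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, low, high

# Hypothetical study: 30/100 exposed and 10/100 unexposed animals are diseased.
rr, low, high = relative_risk_ci(30, 70, 10, 90)
significant = not (low <= 1.0 <= high)
print(f"RR = {rr:.2f} (95% CI: {low:.2f} to {high:.2f}), significant: {significant}")
```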
T/F: P-value gives you information about the size of the test sample
FALSE
**it also does NOT give you any info about the range that you can expect to find the true value
To be statistically significant, the p-value must be less than _____
0.05
* if the p-value is greater than 0.05 - the association is NOT statistically significant and could have been caused by chance
How do you interpret p-values that are less than 0.05?
We are 95% confident that an association as large as the one in our study was NOT caused by chance
or
We have 95% confidence that an association this large could not have been caused by chance
How do you interpret the following value?
OR or RR or PR = 3.0 (p = 0.02)
Statistically significant. There is an association. We are 95% certain that an OR of 3.0 could NOT have been caused by chance.
T/F: No matter how large the RR or OR; if the p-value is greater than 0.05, we must say there is no association
TRUE
How are p-values calculated?
Using statistical tests - tests for statistical inference:
- Chi-squared test
- Student’s t test
- Correlation
(need to know when/where to use these three tests - do not worry about calculations)
When testing a hypothesis, can you prove something is true, untrue, or both?
Untrue
You cannot prove that something is true
You can’t prove an association is true
But you can prove that either is NOT true –> Hence the use of a Null hypothesis
What is a “Null” hypothesis?
hypothesis that suggests NO association
It is set up to be proven untrue and rejected - rejecting it is how we confirm an association
What is the alternative hypothesis?
The actual research question that we want the answer to (that there is an association)
What values do we use to accept or reject the null hypothesis?
P values OR Confidence intervals (CIs)
If a p-value is less than 0.05, do we accept or reject the null hypothesis?
We will REJECT the null hypothesis
this means that there is an association - the alternative hypothesis is accepted
What must a p-value be to accept the null hypothesis?
Greater than 0.05
Accepting the null hypothesis means that there is no association - the alternative hypothesis is therefore rejected
What is a type I error?
False positive: rejecting the null when it is NOT false (no association exists)
This is set at 0.05 (95% CI)
Simply put - *saying there is an association, when really there is not
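Note: a small Python simulation of what "set at 0.05" means for type I error. Every simulated study compares two groups drawn from the same population, so any "significant" result is a false positive. The z-test with a known standard deviation, the group size, and the number of studies are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean

def two_sided_p(group_a, group_b, sigma=1.0):
    """z-test p-value for a difference in means.

    Assumes equal group sizes and a known SD (sigma).
    """
    n = len(group_a)
    se = sigma * math.sqrt(2.0 / n)
    z = (mean(group_a) - mean(group_b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_studies=2000, n_per_group=30, alpha=0.05, seed=1):
    """Share of studies that reject the null when the null is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Both groups come from the same population: no real association.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sided_p(a, b) < alpha:
            hits += 1
    return hits / n_studies

rate = false_positive_rate()
print(f"Share of null studies declared significant: {rate:.3f}")
```

The share should land close to 0.05, which is the type I error rate accepted by using a 0.05 cutoff.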
What is a type II error?
False negative: not rejecting the null when it is false (an association truthfully exists)
This is set at 0.20
Simply put - *saying there is no association, but there actually is
___________ = the ability of a study to detect an association, if one does exist
Power
Power = 1 - Type II error (1 - 0.20 = 0.80)
***larger sample sizes have more power
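Note: a hedged Python sketch of how power grows with sample size. It assumes a two-sided, two-group z-test approximation and a hypothetical effect size of 0.5 standard deviations; neither comes from these cards.

```python
import math
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test.

    effect_size is the true difference in means divided by the SD.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * math.sqrt(n_per_group / 2.0)
    # Chance of landing beyond either critical value when the effect is real.
    return (1.0 - normal.cdf(z_crit - shift)) + normal.cdf(-z_crit - shift)

for n in (10, 30, 64, 100):
    power = approx_power(0.5, n)
    print(f"n per group = {n:3d}: power = {power:.2f}, type II error = {1 - power:.2f}")
```

Under these assumptions, roughly 64 animals per group are needed to reach the conventional power of about 0.80.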
Categorical data can be broken up into what two discrete categories?
Nominal (named, not ordered) - dichotomous test results (ex: horse vs donkey; or male vs female)
Ordinal (named and ordered, but no constant value between ranks) Ex: neonate vs juvenile vs adult vs geriatric
What is continuous data?
The variable is numeric and can have one of many possible values
Ex: BG, weight, etc
Describing categorical data by _______ _______ will summarize the number of animals in each category, counts or proportions, and the use of two-by-two tables
Frequency distribution
What methods can be used to describe categorical data?
- frequency distribution
- Tables or bar charts
- Statistical test like Chi-squared etc
What methods can be used to describe continuous data?
- frequency distribution and histogram
- Central tendency
- Dispersion
- Statistical tests (95% CI, t-test, correlation, etc)
What information can be obtained from describing continuous data using central tendency?
Describes the center of the distribution and measures central tendency (mean, median, and mode)
- mean = sum of all values/# of data points (very sensitive to extreme values)
- Median = the value which is in the center with half the data points above and half below
- mode = the most frequently occurring value or observation
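Note: a short Python sketch of the three measures using the standard library's statistics module. The weights are hypothetical, with one extreme value included to show how it affects the mean.

```python
from statistics import mean, median, mode

# Hypothetical body weights (kg); the last value is an extreme one.
weights_kg = [4.0, 4.5, 4.5, 5.0, 5.5, 6.0, 20.0]

avg = mean(weights_kg)       # sum of all values / number of data points
middle = median(weights_kg)  # half the data points above, half below
common = mode(weights_kg)    # most frequently occurring value

print(f"mean = {avg:.2f}, median = {middle}, mode = {common}")
```

The single extreme weight pulls the mean well above the median, while the median and mode stay near the bulk of the data.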
What do you expect to see if there is a skewed distribution when analyzing continuous data with central tendency?
The mean and mode will not line up with the median
*a symmetric distribution would have the mean and mode close to the median (the same distance away)
What two values are used in measuring the dispersion of continuous data?
*this method describes how closely the values are gathered around the center of the distribution
Measures:
Range (the difference between minimum and maximum)
Standard deviation (the average distance between each measurement and the mean)
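Note: a minimal Python sketch of both dispersion measures on hypothetical values. It uses statistics.stdev, which computes the sample standard deviation (dividing by n - 1); that choice is an assumption, since the cards do not say which version is meant.

```python
from statistics import stdev

# Hypothetical measurements of a continuous variable.
values = [4.0, 4.5, 4.5, 5.0, 5.5, 6.0]

value_range = max(values) - min(values)  # difference between maximum and minimum
sd = stdev(values)                       # sample standard deviation (n - 1)

print(f"range = {value_range}, standard deviation = {sd:.3f}")
```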
The chi-squared test is used to statistically evaluate what?
Difference in proportions
Used for categorical data
all two-by-two tables
The Student’s t-test is used to statistically evaluate what?
The difference in means
Compares the average of two groups
*used for continuous outcome data and categorical explanatory variable (independent variable)
What is correlation used for?
Statistical test that measures the strength and direction of a linear relationship between two continuous variables
Used for continuous data
What does the choice of statistical test depend on?
The nature of the explanatory and outcome variables
What statistical test is a test of independence between two categorical variables?
Chi squared
Used to answer: Does an association exist between the variables?
Used for two by two tables
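Note: a hedged Python sketch of the chi-squared test on a two-by-two table, to show what the test compares (calculations are not required for these cards). The counts are hypothetical, no continuity correction is applied, and the p-value formula is specific to one degree of freedom.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and p-value for a two-by-two table.

    a, b = first row (e.g. exposed: diseased, healthy)
    c, d = second row (e.g. unexposed: diseased, healthy)
    No continuity correction is applied (an assumption).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Counts expected if the two variables were independent.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = chi_squared_2x2(30, 70, 10, 90)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
```

Here 30% of one group and 10% of the other are diseased; a p-value below 0.05 says a difference in proportions this large is unlikely to be caused by chance.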
What statistical test is used to compare the mean values of a continuous variable between two groups?
Student’s t test
Incorporates both the mean and the variance (dispersion) around the mean
*requires the value to be normally distributed in the population and similar variance in both groups
H0 = the means for the two groups are the same
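Note: a hedged Python sketch of the t statistic, to show how the test combines the means and the variance (calculations are not required for these cards). It assumes the pooled-variance form, which matches the "similar variance in both groups" requirement, and the measurements are hypothetical. The p-value lookup is left out because the standard library has no t distribution.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se
    return t, df

# Hypothetical continuous outcome measured in two groups.
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.4, 4.8, 4.1, 4.5]

t, df = pooled_t_statistic(treated, control)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

For 8 degrees of freedom the two-sided 0.05 critical value is 2.306, so a t statistic beyond that would lead to rejecting H0.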
What statistical test indicates the strength and direction of a linear relationship between two continuous variables?
Correlation coefficient (r)
often used for dose-response relationships
both variables are numerical, usually continuous
What value is considered a strong correlation value? And a weak correlation value?
Range: -1.0 to 1.0 (the sign gives the direction; the strength is judged on the absolute value, 0.0 to 1.0)
STRONG = r is greater than 0.8
WEAK = r is less than 0.8
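Note: a minimal Python sketch of the Pearson correlation coefficient computed from its definition. The dose and response values are hypothetical and chosen to resemble a dose-response relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, scaled by how much each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(dose, response)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction} linear relationship)")
```

The sign of r gives the direction of the relationship and its distance from zero gives the strength.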