Intro to Biostats in Epi Flashcards
- Cite and describe the 3 attributes of study variables (data).
Order/magnitude
Consistency of scale / equal distances
Rational Absolute Zero
- Cite and describe the 3 levels/categories of data measurement
Nominal: dichotomous and non-ranked named categories
Ordinal: ordered, ranked categories
Interval/Ratio: equal-distance numerical scales / units (ratio data also has an absolute zero)
Describe the difference between ratio and interval measurements of data. Give an example of each type of data measurement level
Ratio and interval data are nearly identical; however, interval data does NOT have a value that represents an “absolute zero,” while ratio data does
Ratio data example: “How much money do you earn per hour?” (zero dollars = absolute zero)
Interval data example: “What is the temperature outside each day in December?” (zero degrees is NOT absolute zero bc it does not represent the absence of temperature)
All statistical tests are based on the ____ __ ___________ of the data that is being compared
level of measurement
Compare and contrast the terms “discrete” and “continuous” as they relate to data measurement levels
Discrete refers to the nominal and ordinal levels of data measurement
Continuous refers to the “equal distance in numerical scales” between values in both the interval and ratio levels of data measurement
You ____ go down in the specificity/detail of data measurement level, however you ____ go back up in specificity/detail.
Can
Cannot
Researchers reject or fail to reject the null hypothesis based on ____ _____.
statistical analysis
What data measurement level does a classic pain scale, commonly used in healthcare settings, fall under?
Ordinal
The _______ level of data measurement is NEVER given in ranges; there will always be concrete numerical values
Interval/Ratio
True or False: Nominal data will not be given in categories because nominal data is always given in a dichotomous manner. Explain your answer
False
Nominal data can still be given in an unlimited number of categories, the categories simply cannot have any type of order or magnitude in relation to one another (hair color is a good example of this bc there is no magnitude amongst the categories)
Dichotomously recorded data is an indicator for Nominal data, however that does not mean that all nominal data MUST only have 2 categories
Give the definitions of mean, median, and mode. Explain the effect that an outlier would have on their values.
Mean: the average of the data (outliers affect the mean)
Median: the middle value of the ordered data (outliers have little effect on the median)
Mode: the most frequently occurring value in the data set (outliers DO NOT affect this)
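A minimal Python sketch using made-up (hypothetical) lengths of stay in days. Adding one extreme value pulls the mean up sharply, nudges the median only slightly, and leaves the mode unchanged.

```python
import statistics

# Hypothetical data: length of stay (days) for 7 patients
data = [2, 3, 3, 4, 5, 6, 7]
with_outlier = data + [60]  # same patients plus one extreme stay

for label, values in (("original", data), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    print(f"{label}: mean={mean:.2f}, median={median}, mode={mode}")
```

Here the mean jumps from about 4.29 to 11.25, the median moves from 4 to 4.5, and the mode stays at 3.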
Describe what the IQR of a data set is
IQR = Interquartile Range
the middle 50% of the data values
25% on either side of the median
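A short sketch of the IQR as Q3 minus Q1, using Python's `statistics.quantiles`. The data and the helper name `iqr` are hypothetical. Quartile conventions differ between software packages (here the library default, method="exclusive"), so other programs may report slightly different quartiles for the same data.

```python
import statistics

def iqr(values):
    """Return Q1, median, Q3, and the IQR (Q3 - Q1, the middle 50% of values)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # q2 is the median
    return q1, q2, q3, q3 - q1

# Hypothetical data (11 values, sorted for readability)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
q1, median, q3, spread = iqr(values)
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={spread}")
```

For these values, Q1 = 3, the median = 6, Q3 = 9, and the IQR = 6.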
What 2 calculated values describe the dispersion/spread of a data set?
Variance and Standard Deviation
Define Variance and Standard Deviation
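Variance: the average of the squared differences between each individual measurement value and the group’s mean
Standard Deviation (SD): the square root of the variance, expressed in the same units as the data; in a normal distribution, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 SDs of the mean, respectively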
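A worked sketch of variance and SD on hypothetical measurements. Dividing by n gives the "average of the squared differences" (population variance). Most software divides by n - 1 when the data are a sample, so both versions are shown and cross-checked against Python's `statistics` module.

```python
import math
import statistics

values = [4, 8, 6, 5, 3, 7]  # hypothetical measurements
n = len(values)
mean = sum(values) / n

# Squared difference between each measurement and the group mean
squared_diffs = [(x - mean) ** 2 for x in values]

population_variance = sum(squared_diffs) / n    # average of the squared differences
sample_variance = sum(squared_diffs) / (n - 1)  # n - 1 denominator for a sample
sample_sd = math.sqrt(sample_variance)          # SD = square root of the variance

# Cross-check against the standard library
assert math.isclose(population_variance, statistics.pvariance(values))
assert math.isclose(sample_variance, statistics.variance(values))
assert math.isclose(sample_sd, statistics.stdev(values))

print(f"mean={mean}, population variance={population_variance:.3f}, "
      f"sample variance={sample_variance}, sample SD={sample_sd:.3f}")
```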
State how to determine a positively or negatively skewed data set using graphs OR given values
Positively skewed: When the mean is greater than the median OR a graph with a tail pointing to the right
Negatively skewed: When the mean is less than the median OR a graph with a tail pointing to the left
True or False: A positive skewness value means that the data set is positively skewed. Explain
True
The sign of the skewness value indicates the direction of the skew: a positive value means the tail extends to the right (positive skew), and a negative value means the tail extends to the left (negative skew); graphing the data or comparing its mean and median confirms the direction
Compare and contrast skewness value and kurtosis
Skewness: a measure of the asymmetry of a distribution
A perfectly normal distribution (ideal bell curve) would have a skewness value of 0
Kurtosis: a measure of the extent to which observations cluster around the mean
A perfectly normal distribution (ideal bell curve) would have an excess kurtosis value of 0 (the value most software reports; raw kurtosis for a normal distribution is 3)
Describe what positive and negative kurtosis values indicate
Positive Kurtosis: the data cluster more tightly around the mean than in a normal distribution (a sharper peak with heavier tails)
Negative Kurtosis: the data cluster less tightly around the mean than in a normal distribution (a flatter peak with lighter tails)
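A sketch of the moment-based skewness and excess kurtosis formulas, applied to two hypothetical data sets. The function name `shape_stats` is illustrative. Packages such as SPSS typically report bias-adjusted versions of these statistics, so their exact numbers may differ, especially for small samples.

```python
import statistics

def shape_stats(values):
    """Moment-based skewness (g1) and excess kurtosis (g2) of a data set."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # subtract 3 so a normal curve scores 0
    return skewness, excess_kurtosis

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

for label, values in (("symmetric", symmetric), ("right-tailed", right_tailed)):
    skew, kurt = shape_stats(values)
    print(f"{label}: skewness={skew:.2f}, excess kurtosis={kurt:.2f}, "
          f"mean={statistics.mean(values):.2f}, median={statistics.median(values)}")
```

The symmetric set has a skewness of 0 and a negative excess kurtosis (-1.30, a flat spread). The right-tailed set has a positive skewness, and its mean is greater than its median.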
- List the percentages of a population’s data comprised within 1, 2, and 3 standard deviations (SDs) around the mean of a normally distributed dataset.
1 SD = 68%
2 SDs = 95%
3 SDs = 99.7%
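These percentages can be checked directly. For a normal distribution, the fraction of values within k SDs of the mean equals erf(k / sqrt(2)). The helper name `fraction_within` is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

This prints roughly 68.3%, 95.4%, and 99.7%. The card values are the rounded "empirical rule" figures.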
Statistical tests used for normally distributed data are called _____ tests.
parametric
Most studies are set up to achieve what percentage of power?
80%
The following are all keywords for what type of data?
Pre vs post ; before vs after ; baseline vs end
paired or related data
What is the function of “survival” tests? List the tests that are considered survival tests
Survival tests compare the proportion of events over time, or the time-to-event, between groups
(includes the log-rank, Cox proportional hazards, and Kaplan-Meier tests)
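Of the listed methods, the Kaplan-Meier estimator is the easiest to sketch. At each time an event occurs, survival is multiplied by (1 - events / number still at risk). Censored subjects leave the risk set without counting as events. The function name and follow-up data are hypothetical. Log-rank and Cox models are not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurred.

    times  -- follow-up time for each subject
    events -- 1 if the event happened at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        event_count = 0
        removed = 0
        # Group subjects sharing the same time; censored subjects at time t
        # still count as at risk for events at t (the usual convention)
        while i < len(data) and data[i][0] == t:
            event_count += data[i][1]
            removed += 1
            i += 1
        if event_count > 0:
            survival *= 1 - event_count / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up (months) for 6 patients; 0 = censored
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"month {t}: survival = {s:.3f}")
```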
What is the function of regression tests? List the tests that are considered to be regression tests
Regressions provide a measure of the relationship between variables by allowing prediction of the dependent variable (DV) using the known values of the independent variables
You can also calculate an odds ratio (OR) from logistic regression
(includes logistic regression, multinomial logistic regression, and linear regression tests)
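A minimal sketch of the simplest case: linear regression with one independent variable, fitted by ordinary least squares. The function name and the dose/response values are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares slope and intercept for predicting y (the DV) from x (the IV)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data: dose (mg) as the independent variable, response as the DV
dose = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 5]
slope, intercept = simple_linear_regression(dose, response)
predicted = intercept + slope * 6  # predict the DV for a new dose of 6 mg
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted at 6 mg={predicted:.2f}")
```

In logistic regression, the OR for a predictor is e raised to its fitted coefficient (`math.exp(coefficient)`). Fitting that model requires iterative estimation, which this sketch does not show.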
Compare null and alternative hypotheses
Null hypothesis: a research perspective that states that there will be NO true difference between the groups being compared
Alternative hypothesis: a research perspective that states that there WILL be a true difference between the groups being compared
Define p value and how you decide if it is statistically significant or not
The p value is the probability of observing results at least as extreme as those actually observed if the null hypothesis were true; it is often described as the likelihood of committing a type I error if the null hypothesis is rejected
If a p value is lower than the preselected alpha value (almost always 0.05), then it is considered to be statistically significant
Define type 1 errors and give an alternative name for it.
Type 1 error: rejecting the null hypothesis when it is actually true (there really is no true difference between the groups, but you incorrectly state that there is a difference by rejecting the null hypothesis)
alpha error ; false positive
Define type 2 errors and give an alternative name for it.
Type 2 error: failing to reject (“accepting”) the null hypothesis when it is actually false (there really IS a true difference between the groups, but you incorrectly state that there is not a true difference by failing to reject the null hypothesis)
beta error ; false negative
- Delineate the common elements utilized in determining sample size of a study.
The minimum difference between groups deemed significant (the smaller the difference between the groups that is deemed to be significant, the greater the sample size needed)
Expected variation of measurement (known/estimated from past studies)
Type 1 and Type 2 error rates and confidence interval (usually ranges from 90% to 95%)
Add in anticipated drop-outs or loss to follow up
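The elements above can be combined in a common normal-approximation formula for comparing two means. It assumes equal group sizes and a two-sided test: n per group = 2(z for 1 - alpha/2 + z for 1 - beta)^2 x SD^2 / difference^2, then inflated for drop-outs. This is a sketch under those assumptions, and other designs use different formulas. The function name and inputs (difference 5, SD 10) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sd, alpha=0.05, power=0.80, dropout=0.0):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # type 1 error rate
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type 2 error rate
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    n = n / (1 - dropout)                          # inflate for drop-outs / loss to follow-up
    return math.ceil(n)

print(n_per_group(5, 10))                # minimum meaningful difference 5, SD 10
print(n_per_group(5, 10, dropout=0.10))  # same, allowing for 10% drop-out
print(n_per_group(2.5, 10))              # halving the difference roughly quadruples n
```

With these inputs, the calls return 63, 70, and 252 per group.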
- Describe how sample size affects power and the ability to detect a difference between populations, if a difference truly exists.
The power of a study increases as its sample size increases (the larger the sample size, the greater the ability of the study to detect a difference IF there is in fact one present), although the relationship is not strictly proportional
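A sketch of how power grows with sample size. It uses a normal approximation for a two-sided comparison of two means with equal group sizes, ignoring the negligible opposite tail. The function name, the difference of 5, and the SD of 10 are hypothetical.

```python
from statistics import NormalDist

def approx_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sd * (2 / n_per_group) ** 0.5
    return NormalDist().cdf(difference / standard_error - z_alpha)

for n in (20, 40, 63, 100, 200):
    print(n, round(approx_power(n, difference=5, sd=10), 2))
```

Power climbs from about 0.35 at 20 per group to about 0.80 at 63 and levels off near 1. Larger samples always help, but not in a straight line.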
- Define the differences between parametric and non-parametric statistical tests, listed below, and cite which tests fit under each of these two categories.
Parametric statistical tests are applied to data sets that are “normally distributed” and have “equal variances”
(INTERVAL)
Non-parametric statistical tests do not require that the data be normally distributed
The data can also be transformed (e.g., a log transformation) in an attempt to present the data in a more normally distributed manner; converting to a standardized value (z-score) rescales the data but does not change the shape of its distribution
(NOMINAL AND ORDINAL)
- Define the difference between independent and paired (repeated measures) data measurements and comparisons.
Independent Data is collected from different groups ; no group has the same data measurement conducted more than once
Paired (related) data is a data measurement that is collected from the same group more than once ; the same group has the same data measurement taken more than once
The same data measurement may be conducted more than once; however, if it is being measured in a different group, it is still considered independent data.
- Cite the statistic performed to assess consistency and agreement within and between individual investigators/evaluators. Interpret the values of +1, 0, and -1 for this statistic.
The Kappa statistic measures agreement between evaluators (or within one evaluator over repeated ratings) beyond what would be expected by chance alone
Kappa Interpretation
+1 means the observers PERFECTLY classify everyone the exact same way
0 means the observers agree no more often than would be expected by chance alone (there is no relationship between the observers’ classifications)
-1 means that the observers classify everyone exactly the OPPOSITE of each other
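A sketch of Cohen's kappa for two raters: kappa = (observed agreement - chance-expected agreement) / (1 - chance-expected agreement). The function name and the yes/no ratings are hypothetical. The formula divides by zero if both raters use a single identical category for every subject.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same subjects."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance from each rater's category proportions
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 10 patients by two evaluators
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(round(cohens_kappa(rater_a, rater_b), 2))
```

The raters agree on 8 of 10 patients, but chance alone would produce 50% agreement here, so kappa is 0.6 rather than 0.8. Identical ratings give +1. In this balanced example, exactly opposite ratings give -1.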
When you see the phrase “mean length of time” what level of data measurement should you be thinking?
interval
When Levene’s test yields a p value of _____, what do you need to do? (fill in the blank and answer the question)
less than 0.05
go down a level of data measurement
When you see the phrase “time-to-event” what level of data measurement should you be thinking?
Interval (bc it is a numerical value that measures time ex. days, hours, minutes, etc)
If you are working with interval data and the data is ____ normally distributed, what must you do? (fill in the blank and answer the question)
NOT
you must “step down” a level of data measurement to ordinal
The buzzword “between” points to _____ and the buzzword “within” points to _____.
independent data
related data
Define power
the statistical ability of a test to detect a difference between groups IF there is one present
What is the purpose of Levene’s test?
to test whether the groups being compared have equal variances (homogeneity of variance); it does not test whether the data is normally distributed
If you are trying to calculate the exact range for 1 standard deviation, how exactly would you calculate it if you are provided the mean, median, and SD for the data set?
for 1 SD: add and subtract the SD value from the mean in order to find the upper and lower ranges
(for 2 or 3 SDs, multiply the SD by 2 or 3, then add it to and subtract it from the mean)
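The arithmetic above as a tiny sketch: the range is mean ± k × SD, and the median is not needed. The helper name and the values (mean 120 mmHg, SD 10 mmHg) are hypothetical.

```python
def sd_range(mean, sd, k=1):
    """Lower and upper bounds of mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd

# Hypothetical: mean systolic BP 120 mmHg, SD 10 mmHg
for k in (1, 2, 3):
    low, high = sd_range(120, 10, k)
    print(f"{k} SD: {low} to {high}")
```

This gives 110 to 130 (1 SD), 100 to 140 (2 SDs), and 90 to 150 (3 SDs).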
A p value is determined based on the probability of observing, ___ _ ____ ____, a test statistic value as extreme as or more extreme than the one actually observed if the groups were truly similar.
Due to chance alone
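The "due to chance alone" idea can be illustrated with an exact permutation test. This technique was chosen for illustration and is not one of the tests named in these cards. If the groups were truly similar, any reshuffling of the group labels would be equally likely. The p value is the fraction of reshufflings whose mean difference is at least as extreme as the observed one. The function name and data are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = 0
    total = 0
    # Try every way of relabeling n_a of the pooled values as "group A"
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # small tolerance so floating-point ties count
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical outcomes for two small groups
treated = [12, 14, 15, 16]
control = [8, 9, 10, 11]
print(permutation_p_value(treated, control))
```

Only 2 of the 70 possible relabelings are as extreme as the observed split, so p = 2/70 (about 0.029), which is below an alpha of 0.05.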
Confidence Intervals are calculated based on what 2 factors?
Variance in the sample (V/SD)
Sample Size (N)
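A sketch showing how both factors drive CI width, using the normal-approximation CI for a mean: mean ± z × SD / sqrt(N). Small samples would normally use a t value instead of z. The function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean: mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical mean of 120 with different SDs and sample sizes
for sd, n in ((10, 25), (10, 100), (20, 25)):
    low, high = mean_ci(120, sd, n)
    print(f"SD={sd}, N={n}: 95% CI {low:.2f} to {high:.2f}")
```

Quadrupling N from 25 to 100 halves the width (about 116.08 to 123.92 becomes about 118.04 to 121.96). Doubling the SD doubles it (about 112.16 to 127.84).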
State the number that a CI must “cross” (include) in order to be deemed non-significant, for the following data types
Ratios (OR, RR, HR):
Absolute differences:
Ratios (OR, RR, HR): 1.0
Absolute differences: 0.0
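The rule above as a small sketch: a CI that includes the null value (1.0 for ratios, 0.0 for absolute differences) is not statistically significant. The names `NULL_VALUES` and `ci_is_significant` are hypothetical, and the example CIs are made up.

```python
NULL_VALUES = {"ratio": 1.0, "difference": 0.0}  # ratio: OR, RR, HR; difference: absolute

def ci_is_significant(lower, upper, kind):
    """True if the CI excludes (does not cross) the null value for this kind of measure."""
    null_value = NULL_VALUES[kind]
    return not (lower <= null_value <= upper)

print(ci_is_significant(1.2, 2.5, "ratio"))         # OR 95% CI 1.2 to 2.5 -> True
print(ci_is_significant(0.8, 1.3, "ratio"))         # RR 95% CI 0.8 to 1.3 -> False
print(ci_is_significant(-3.0, -0.5, "difference"))  # mean difference -3.0 to -0.5 -> True
```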
State the 3 crucial pieces of information that need to be included when correctly interpreting a CI value.
Which group is being compared to whom
Direction of the ratio (greater or less than the comparison group) and its magnitude
State whether or not it is statistically significant