Biostats - Week 1 Flashcards
Which kind of graph is negatively skewed?
One where the bulk of the data (the peak of the curve) is on the right and the tail stretches out to the left
layman’s terms for precision v. accuracy. Give ex
Precision is related to the # of participants in your study. More participants = more precise.
Accuracy is related to where you draw your sample from. ex. a sample drawn from registered voters is considered an accurate representation of the population.
simplified way to think about what a chi-square test measures
how many people fall into one group or not (e.g. who got a cold after taking Vitamin A and who did not)
If a confidence interval range does NOT include 0 (e.g. 0.61-1.19cm), what does that tell you about the (two-sided) p-value for testing the null hypothesis?
The p value is the probability of getting results at least this extreme by chance alone if the null hypothesis (true difference = 0) were correct. So if 0 is outside the confidence interval, your result is unlikely to be due to chance alone, and thus p is below the matching alpha (p < 0.05 for a 95% confidence interval)
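A rough sketch of the link between the interval and the p value, assuming the interval on the card is a symmetric, normal-based 95% interval (the card does not state the level). The function name and the second interval are hypothetical and only there for contrast.

```python
from statistics import NormalDist

def p_from_ci(lower, upper, level=0.95):
    """Two-sided p value (null: true value = 0) back-calculated from a CI.

    Assumes the interval is estimate +/- z_crit * SE (normal-based, symmetric).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    estimate = (lower + upper) / 2                    # midpoint of the interval
    std_error = (upper - lower) / (2 * z_crit)        # half-width divided by z_crit
    z = estimate / std_error
    return 2 * (1 - std_normal.cdf(abs(z)))

p_excludes_zero = p_from_ci(0.61, 1.19)   # interval from the card, assumed 95%
p_includes_zero = p_from_ci(-0.10, 0.68)  # hypothetical interval that straddles 0

print(p_excludes_zero < 0.05, p_includes_zero < 0.05)  # True False
```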
Using a lower - or more stringent - value of alpha does what
Makes it LESS likely to make a Type I error (helps prevent Type I errors). Idea is it’s harder to get a statistically significant result. Thus, you can be more confident in your findings IF they are statistically significant (p below alpha)
What can you never conclude from a p value
Can never conclude that there is a CLINICAL significance just because there is a statistical significance.
(4) data types. Which are categorical and which are numerical?
think “data NOIR”
Categorical = nominal and ordinal
-Nominal: UNordered categorical data
-Ordinal: ordered categorical data
Numerical = interval and ratio
- Interval: equal intervals between values, but NO absolute zero
- Ratio: equal intervals WITH an absolute zero, so can compute ratios
Nominal data, def and medical ex
unordered categories of data, i.e. no particular order or way of measuring these things; just different buckets to put stuff in
ex. smoking status, ethnicity, or specialty
What data type is dichotomous data?
Nominal data that only has 2 groups (buckets)
ex. diabetic v. non-diabetic
Ordinal data, def and medical ex
ordered (grouped) categorical data; there is an order, but the intervals between groups may differ, so arithmetic on ordinal data (e.g. averaging) is mathematically questionable. ex. class rank and 5-point rating scale for faculty evals (b/c a rating of 4 isn't twice as good as a 2)
Interval data, def and ex
data is ordered with meaningful intervals between the groups, but NO absolute zero exists
ex. graduation years (has no absolute zero)
Ratio data, def and ex. How can ratio data be further broken down?
interval scale with an absolute zero, so you can compute ratios. Can be discrete (only certain values, usually whole numbers) or continuous (can take on any value)
ex. BP, weight, or age can take on any value (continuous), but we generally reduce them to discrete data b/c we round them off
ex. of discrete would be # of patients seen in a day
Addition rule, def and ex
the probability that A OR B will happen is the sum of the individual probabilities of A and B. Applies to two mutually exclusive events (they can NOT both happen).
ex. probability of surgery clerkship first = 16% and prob of IM first = 16%. Probability of getting IM OR surgery first = 32%
multiplication rule, def and ex
probability of A AND B both occurring is the product of their individual probabilities (must know both). Applies when A and B are independent; otherwise use the probability of B given A.
ex. prob of getting IM clerkship first = 16%. The probability of passing it is 95%.
Probability of getting IM first AND passing it = 0.16 x 0.95 = 15.2%
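The two clerkship examples can be checked with a few lines of arithmetic. This is only a sketch using the numbers from the cards; the variable names are made up for illustration.

```python
# Probabilities taken from the flashcard examples
p_surgery_first = 0.16
p_im_first = 0.16
p_pass_im = 0.95

# Addition rule: only one clerkship can come first, so the two events
# cannot both happen and their probabilities simply add.
p_im_or_surgery_first = p_im_first + p_surgery_first

# Multiplication rule: probability of getting IM first AND passing it.
p_im_first_and_pass = p_im_first * p_pass_im

print(round(p_im_or_surgery_first, 3))  # 0.32
print(round(p_im_first_and_pass, 3))    # 0.152
```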
precision v. accuracy (immunity from…)
precision = immunity from random variation. It’s related to the width of the confidence interval, which shrinks as n grows (in proportion to 1/sqrt of n)
accuracy = immunity from systematic error or bias (bias is something wrong with the way samples are chosen)
for a gaussian distribution, what is between +/-1 SD?
68% of your data lies in the range between +/- 1 SD
what % of the data lies below the +1 SD mark?
84% of the data. (50% below the mean + 34% between mean and +1)
Where does 99% of the data lie on a gaussian curve?
between +/-3 standard deviations (more exactly ~99.7%; exactly 99% lies within about +/-2.58 SD)
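The Gaussian percentages on these cards can be double-checked with Python's standard library. A minimal sketch; `share_within` is a helper name invented here, not something from the course.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def share_within(k):
    """Fraction of a Gaussian population lying within +/- k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1_sd = share_within(1)             # about 0.683
within_3_sd = share_within(3)             # about 0.997
below_plus_1_sd = std_normal.cdf(1)       # about 0.841
z_for_99_pct = std_normal.inv_cdf(0.995)  # about 2.58 (middle 99% of the data)

print(round(within_1_sd, 3), round(below_plus_1_sd, 3),
      round(within_3_sd, 3), round(z_for_99_pct, 2))
```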
z score, def and eqn
EACH data point on a “standard” Gaussian distribution has a z score, meaning that data point (x) is “z” standard deviations above or below the mean
z = (x - mean)/SD
If looking at a Z table and see that z score of 1.10 = 0.8643. What does that mean?
means 86.43% of the data lies BELOW the point where z = 1.1
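A small sketch of the z score equation followed by the "table lookup", using the standard library in place of a printed Z table. The blood pressure numbers are hypothetical and chosen only so that z comes out to 1.1.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """How many SDs the data point x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: systolic BP of 131 in a population with mean 120, SD 10
z = z_score(131, 120, 10)

# Equivalent of reading a Z table: fraction of the data lying below this z
share_below = NormalDist().cdf(z)

print(round(z, 2), round(share_below, 4))  # 1.1 0.8643
```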
Why are z scores symmetric?
because the gaussian curve is symmetric
(2) typical reasons for using z (t) scores
- To figure out how many SDs your sample mean is above or below the population mean
- To figure out how many SDs away from the mean will contain a certain proportion of the data
Why is the z score that divides the top 5% of a normal population from the remaining 95% NOT +2?
picture the gaussian curve. z = +2 has only ~2.3% beyond it (the familiar 95% within about +/-2 SD leaves ~2.5% in EACH tail). So the cutoff for the top 5% is a z score LOWER than +2 (about +1.645)
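To see the one-tail versus two-tail difference numerically, here is a short check with the standard library. It is a sketch for a standard Gaussian only; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

# z value with 95% of the population below it and 5% above it (one tail)
cutoff_top_5_pct = std_normal.inv_cdf(0.95)

# Fraction of the population lying above z = +2
tail_beyond_2 = 1 - std_normal.cdf(2)

print(round(cutoff_top_5_pct, 3))  # 1.645
print(round(tail_beyond_2, 4))     # 0.0228
```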
Why are t scores used more in practice than z scores?
Z scores are based on the ACTUAL standard deviation (and so standard error) of the true population, which we usually don’t know.
But T scores use a standard error of the mean ESTIMATED from the sample
Why does increasing the n# make t and z scores get closer to the same value? Around what n value are t and z about the same?
T scores depend on the degrees of freedom (n-1), so they change with the sample size (n). As n gets higher and higher, the d.f. goes up and the t distribution gets closer to the gaussian (z) distribution.
For n > 100, t and z scores are about the same
mode
the value with the greatest frequency (a measure of central tendency). It is the high point on the graph and is NOT influenced by extreme values (unlike the mean)
When are mean, median, and mode (measures of central tendency) all the same?
normal (gaussian) distribution
On a negatively skewed distribution, where do the mean, median, and mode measurements fall?
First, negatively skewed means the skewed data (tail) is to the left (heading towards negative x axis) and bulk is on R.
Mode = peak, Mean = closest to skewed tail, and Median is in between the two
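A quick way to see the mean < median < mode ordering is to compute all three on a small negatively skewed data set. The exam scores below are hypothetical, picked so that most values sit high with a few low values forming the left tail.

```python
from statistics import mean, median, mode

# Hypothetical exam scores: bulk on the right, tail to the left
scores = [2, 5, 7, 8, 8, 9, 9, 9, 10, 10]

score_mean = mean(scores)      # pulled toward the low tail
score_median = median(scores)  # sits between the mean and the mode
score_mode = mode(scores)      # the peak (most frequent value)

print(score_mean, score_median, score_mode)  # 7.7 8.5 9
```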
endemic v. epidemic
A disease is ENDEMIC when it is constantly present in a population or area. An endemic disease has a usual incidence/prevalence. Ex. Rhinovirus (common cold)
EPIDEMIC means more cases of that disease than expected in a population/location within a time frame. Diseases that start as epidemics may drift into endemicity.
epidemiology
study of the distribution and determinants of disease frequency. Disease does NOT occur randomly; there are causes and/or preventative factors for disease. Epidemiology is the study of those things
Preclinical v. Clinical phase of a disease
Preclinical begins with the onset of the disease and ends once signs/sx of the disease manifest.
Clinical phase begins with signs/sx and ends (ideally) with treatment/resolution
incubation period. What phase is this in?
time from colonization to the point where sx appear. It falls in the preclinical phase
(2) types of epidemiological studies and example
Experimental and Observational:
- Experimental studies are important in testing drugs
- Observational studies are really important for learning about causality. ex. they showed that Reye’s syndrome was linked to ASA given to kids with viral infections for fever
Rate v. Proportion
A rate IS a proportion measured over a specific time period.
Proportion = (# of cases)/(population at risk)
Rate = (# of cases)/(population at risk) IN A TIME
Incidence
[# of people who ACQUIRE the disease] divided by [# of people at risk] IN A TIME
(“associate in your mind the word ‘acquire’ with incidence”)
Synonym for “attack rate”
Incidence
prevalence
(# of people that HAVE the disease)/(# of people at risk) …at a given point in time
What does prevalence not account for?
latent/undiagnosed diseases
Incidence rate v. Prevalence rate
Incidence rate = probability that healthy people will develop a particular disease DURING a specific period of time
Prevalence rate = proportion of people in a population who HAVE the disease AT a given time (point prevalence or period prevalence)
visual depiction of incidence, prevalence, mortality, and cure (slide 45)
prevalence is existing cup of liquid. Incidence is new cup pouring into prevalence.
Coming out at bottom of prevalence cup are mortalities and cures
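The cup picture can be written as simple bookkeeping: existing cases go up with new (incident) cases and down with deaths and cures. This is only an illustrative sketch; the function name and all the yearly numbers are hypothetical.

```python
def update_prevalent_cases(existing, new_cases, deaths, cures):
    """One time step of the cup: inflow is incidence, outflow is deaths + cures."""
    return existing + new_cases - deaths - cures

# Hypothetical numbers: 200 existing cases; each year 50 new cases,
# 10 deaths, and 30 cures among people with the disease
prevalent_cases = 200
history = [prevalent_cases]
for _ in range(3):
    prevalent_cases = update_prevalent_cases(
        prevalent_cases, new_cases=50, deaths=10, cures=30
    )
    history.append(prevalent_cases)

print(history)  # [200, 210, 220, 230]
```

Because more cases pour in (50) than drain out (40) each year, the level in the cup keeps rising even though incidence stays constant.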
mortality rate
(# deaths)/(population)
Population is standardized to 10^n for a specific time interval. e.g. 10^3 = 1,000 or 10^5 = 100,000
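Standardizing to 10^n just means multiplying the raw proportion by that power of ten. A minimal sketch with hypothetical counts (45 deaths in a population of 150,000 over one year); `rate_per` is a name invented for this example.

```python
def rate_per(events, population, per=1_000):
    """Events per `per` people, e.g. deaths per 1,000 for the chosen time interval."""
    return events / population * per

# Hypothetical one-year counts
deaths = 45
population = 150_000

deaths_per_1000 = rate_per(deaths, population, per=10**3)
deaths_per_100000 = rate_per(deaths, population, per=10**5)

print(round(deaths_per_1000, 2), round(deaths_per_100000, 2))  # 0.3 30.0
```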
neonatal v. infant mortality rate
Neonatal: (# deaths