Biostats - Week 1 Flashcards
(107 cards)
Which kind of graph is negatively skewed?
The bulk of the data (the peak of the curve) is on the right, and the tail stretches out to the left
layman’s terms for precision v. accuracy. Give ex
Precision related to # of participants in your study. More participants = more precise.
Accuracy related to where you draw your sample from. Drawing from registered voters is considered to be an accurate measure of the population.
simplified way to think about what a chi-square test measures
how many people fall into one group or not (e.g. who got a cold after taking Vitamin A and who did not)
If a confidence interval range does NOT include 0 (e.g. 0.61-1.19cm), what does that tell you about the (two-sided) p-value for testing the null hypothesis?
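The p value is the probability of getting results at least as extreme as yours if the null hypothesis were true (i.e. by chance alone). So if 0 (the null value) is outside the confidence interval, results like yours would be unlikely under the null and thus the p value is below the matching alpha; for a 95% confidence interval, p < 0.05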
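A minimal sketch of the link between a confidence interval and the two-sided p value, assuming a normal approximation. The function name `ci_and_p`, the estimate of 0.90 and the standard error of 0.148 are hypothetical, picked only so the interval roughly matches the 0.61-1.19cm example on the card.

```python
from statistics import NormalDist

def ci_and_p(estimate, std_error, level=0.95):
    """Normal-approximation CI and two-sided p value for H0: true value = 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    z = estimate / std_error                           # distance from 0 in SEs
    p_value = 2 * (1 - std_normal.cdf(abs(z)))         # two-sided tail area
    return lower, upper, p_value

# Hypothetical inputs: interval excludes 0, so p should fall below 0.05.
lower, upper, p = ci_and_p(estimate=0.90, std_error=0.148)
print(f"95% CI: {lower:.2f} to {upper:.2f}, p = {p:.2g}")

# Hypothetical inputs: interval includes 0, so p should be above 0.05.
lower2, upper2, p2 = ci_and_p(estimate=0.20, std_error=0.148)
print(f"95% CI: {lower2:.2f} to {upper2:.2f}, p = {p2:.2g}")
```

With the same confidence level and test, "CI excludes 0" and "p below alpha" are two ways of stating the same result.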
Using a lower - or more stringent - value of alpha does what
Makes it LESS likely to make a Type I error (helps prevent Type I errors). Idea is it’s harder to get a statistically significant result. Thus, you can be more confident of your findings IF they are statistically significant (p < alpha)
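A rough simulation of the idea, assuming a two-sided z test with a known SD of 1 and data generated with the null hypothesis true, so every "significant" result is a Type I error. The function name, trial count, sample size and seed are all illustrative choices.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(alpha, n_trials=2000, n=30, seed=0):
    """Fraction of null-is-true experiments that a two-sided z test calls significant."""
    rng = random.Random(seed)                       # fixed seed: same data for every alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # stricter alpha -> larger cutoff
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # true mean really is 0
        z = mean(sample) / (1 / n ** 0.5)             # known SD = 1, so SE = 1/sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

r05 = false_positive_rate(0.05)
r01 = false_positive_rate(0.01)
print(f"alpha = 0.05: {r05:.3f} of null experiments flagged")
print(f"alpha = 0.01: {r01:.3f} of null experiments flagged")
```

The flagged fraction should land near alpha in each case, so lowering alpha lowers the Type I error rate.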
What can you never conclude from a p value
Can never conclude that there is a CLINICAL significance just because there is a statistical significance.
(4) data types. Which are categorical and which are numerical?
think “data NOIR”
Categorical = nominal and ordinal
-Nominal: UNordered categorical data
-Ordinal: ordered categorical data
Numerical = interval and ratio
- Interval: equal intervals between values, but NO absolute zero
- Ratio: equal intervals WITH an absolute zero, so can compute ratios
Nominal data, def and medical ex
unordered categories of data, i.e. no particular order or way of measuring these things; just different buckets to put stuff in
ex. smoking status, ethnicity, or specialty
What data type is dichotomous data?
Nominal data that only has 2 groups (buckets)
ex. diabetic v. non-diabetic
Ordinal data, def and medical ex
ordered (grouped) categorical data; so there is an order, but intervals between groups may be different. Means that arithmetic on ordinal data (e.g. taking an average) is mathematically flawed. ex. class rank and 5-point rating scale for faculty evals (b/c a rating of 4 isn't twice as good as a 2)
Interval data, def and ex
data is ordered with meaningful intervals between the groups, but NO absolute zero exists
ex. graduation years (has no absolute zero)
Ratio data, def and ex. How can ratio data be further broken down?
interval scale with an absolute zero, so you can compute ratios. Can be discrete (only takes certain values, usually whole numbers) or continuous (can take on any value)
ex. BP, weight, or age can take on any value (continuous) but we generally reduce it to discrete data b/c we round it off
ex. of discrete would be # of patients seen in a day
Addition rule, def and ex
the probability that A OR B will happen is the sum of the individual probabilities of A and B. This works for two mutually exclusive events, i.e. events that can NOT both happen (if they can both happen, subtract the probability of A AND B).
ex. probability of surgery clerkship first = 16% and prob of IM first = 16%. Probability of getting IM OR surgery first = 32%
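The clerkship example can be checked with a short sketch, assuming the two events are mutually exclusive (only one clerkship can come first). The helper name `prob_either` and its `p_both` parameter are illustrative.

```python
def prob_either(p_a, p_b, p_both=0.0):
    """P(A or B) = P(A) + P(B) - P(A and B); p_both is 0 for mutually exclusive events."""
    return p_a + p_b - p_both

# Surgery first (16%) OR internal medicine first (16%): cannot both happen.
p_surgery_or_im = prob_either(0.16, 0.16)
print(f"{p_surgery_or_im:.0%}")
```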
multiplication rule, def and ex
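probability of A AND B both occurring is the product of the individual probabilities (must know both). Works as-is for independent events; otherwise use the probability of B GIVEN that A happened.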
ex. prob of getting IM clerkship first = 16%. The probability of passing it is 95%.
Probability of getting IM first AND passing it = 0.16 x 0.95 = 15.2%
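The same arithmetic as a sketch, treating the 95% as the chance of passing given that IM comes first. The helper name `prob_both` is illustrative.

```python
def prob_both(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B given A); for independent events P(B given A) = P(B)."""
    return p_a * p_b_given_a

# IM clerkship first (16%) AND passing it (95%).
p_im_first_and_pass = prob_both(0.16, 0.95)
print(f"{p_im_first_and_pass:.1%}")
```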
precision v. accuracy (immunity from…)
precision = immunity from random variation. It’s related to the width of the confidence interval, which shrinks in proportion to 1/sqrt(n)
accuracy = immunity from systematic error or bias (bias is something wrong with the way samples are chosen)
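A small sketch of why more participants means more precision, assuming a normal-approximation confidence interval for a mean. The SD of 10 and the sample sizes are made-up values for illustration.

```python
from statistics import NormalDist

def ci_half_width(sd, n, level=0.95):
    """Half-width of a normal-approximation CI for a mean: z * SD / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_crit * sd / n ** 0.5

# Quadrupling n halves the interval width (hypothetical SD of 10).
for n in (25, 100, 400):
    print(f"n = {n:>3}: +/- {ci_half_width(sd=10, n=n):.2f}")
```

Note that a larger n does nothing for accuracy: a biased sample just gives a narrow interval around the wrong value.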
for a gaussian distribution, what is between +/-1 SD?
68% of your data lies in the range between +/- 1 SD
what % of the data lies below the +1 SD mark?
84% of the data. (50% below the mean + 34% between mean and +1)
Where does 99% of the data lie on a gaussian curve?
between +/-3 standard deviations (more exactly, +/-3 SD holds about 99.7%; exactly 99% lies within about +/-2.58 SD)
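The coverage figures on these cards can be checked against the standard normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def coverage(k):
    """Share of a Gaussian distribution lying within +/- k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {coverage(k):.1%}")

# Everything below the +1 SD mark (left half plus the slice from mean to +1 SD).
print(f"below +1 SD: {std_normal.cdf(1):.1%}")
```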
z score, def and eqn
EACH data point on a “standard” Gaussian distribution has a z score, meaning that data point (x) is “z” standard deviations above or below the mean
z = (x - mean)/SD
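The equation as a one-line function. The blood pressure numbers are hypothetical, chosen only to give a round answer.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x sits above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: systolic BP of 140 in a population with mean 120 and SD 10.
z = z_score(140, mean=120, sd=10)
print(z)  # 2.0, i.e. two SDs above the mean
```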
If looking at a Z table and see that z score of 1.10 = 0.8643. What does that mean?
means 86.43% of the data lies BELOW the point where z = 1.1
Why are z scores symmetric?
because the gaussian curve is symmetric
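A z table is a printout of the standard normal CDF, so any table entry can be looked up in code. A minimal sketch with `statistics.NormalDist`; the helper name `area_below` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def area_below(z):
    """Z-table lookup: share of the data lying below a given z score."""
    return std_normal.cdf(z)

print(f"area below z = +1.10: {area_below(1.10):.4f}")

# Symmetry: the area below -z equals the area above +z.
print(f"area below z = -1.10: {area_below(-1.10):.4f}")
print(f"area above z = +1.10: {1 - area_below(1.10):.4f}")
```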
(2) typical reasons for using z (t) scores
- To figure out how many SDs your sample mean is above or below the population mean
- Figure out how many SD away from the mean will contain a certain proportion of the data
Why is the z score that divides the top 5% of a normal population from the remaining 95% not = +2?
picture gaussian curve. z = +2 has only ~2.3% beyond it. So the cutoff for the top 5% has to be a z score LOWER than +2 (about +1.645)
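Going from a proportion back to a z score is the inverse of a z-table lookup. A sketch using `NormalDist.inv_cdf` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# z score with 95% of the data below it, i.e. the cutoff for the top 5%.
top_5_cutoff = std_normal.inv_cdf(0.95)
print(f"top 5% starts at z = {top_5_cutoff:.3f}")

# Share of the data beyond z = +2, for comparison.
tail_beyond_2 = 1 - std_normal.cdf(2)
print(f"beyond z = +2: {tail_beyond_2:.1%}")
```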
Why are t scores used more in practice than z scores?
Z scores are based on the ACTUAL standard error of the true population, which we don’t know.
But T scores use an ESTIMATED standard error of the mean