Biostats - Week 1 Flashcards

(107 cards)

1
Q

Which kind of graph is negatively skewed?

A

One where the bulk of the data (the peak of the curve) is on the right and the tail extends to the left

2
Q

Layman’s terms for precision v. accuracy. Give an example.

A

Precision relates to the number of participants in your study: more participants = more precise.

Accuracy relates to where you draw your sample from. E.g. drawing from registered voters is considered an accurate way to sample the population.

3
Q

Simplified way to think about what a chi-square test measures

A

How many people fall into one category or the other, and whether those proportions differ between groups (e.g. who got a cold after taking Vitamin A and who did not)
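A minimal Python sketch of what the chi-square statistic computes for a 2×2 table: for each cell it compares the observed count with the count expected if group and outcome were unrelated. The counts and the function name `chi_square_2x2` are hypothetical, chosen only for illustration, and no continuity correction is applied.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]]: rows = groups, columns = outcome (e.g. cold / no cold).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were unrelated (the null hypothesis)
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows = vitamin / placebo, columns = got a cold / no cold
observed = [[10, 40], [20, 30]]
print(round(chi_square_2x2(observed), 2))  # → 4.76
```

Here χ² ≈ 4.76 exceeds 3.84, the critical value for df = 1 at alpha = 0.05, so these made-up proportions would count as significantly different.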

4
Q

If a confidence interval range does NOT include 0 (e.g. 0.61-1.19cm), what does that tell you about the (two-sided) p-value for testing the null hypothesis?

A

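The p-value is the probability of getting results at least as extreme as yours if the null hypothesis were true (informally, how likely it is that your results arose by chance alone rather than meaning something). If 0 (no difference) is outside the confidence interval, your result is unlikely to be due to chance alone, so p < 0.05 (for a 95% CI).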
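Assuming the card's interval is a 95% CI built as estimate ± 1.96 × SE (a normal approximation; the card doesn't say how it was computed), a short Python sketch can back out the SE and an approximate two-sided p-value for the null value of 0:

```python
from statistics import NormalDist

lower, upper = 0.61, 1.19          # the CI from the card (cm), assumed to be a 95% CI
estimate = (lower + upper) / 2     # point estimate sits at the centre: 0.90 cm
se = (upper - lower) / (2 * 1.96)  # half-width = 1.96 * SE, so SE ≈ 0.148
z = (estimate - 0) / se            # how many SEs the null value (0) is from the estimate
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), p_two_sided < 0.05)  # → 6.08 True
```

Because 0 lies about 6 SEs from the estimate, p is far below 0.05, consistent with the rule that a 95% CI excluding 0 implies p < 0.05.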

5
Q

What does using a lower (more stringent) value of alpha do?

A

Makes it LESS likely to make a Type I error (helps prevent Type I errors). The idea is that it’s harder to get a statistically significant result, so you can be more confident of your findings IF they are statistically significant (p < alpha).

6
Q

What can you never conclude from a p-value?

A

You can never conclude that a result is CLINICALLY significant just because it is statistically significant.

7
Q

(4) data types. Which are categorical and which are numerical?

A

Think “data NOIR”
Categorical = nominal and ordinal
  • Nominal: UNordered categorical data
  • Ordinal: ordered categorical data

Numerical = interval and ratio
  • Interval: equal intervals between values, but NO absolute zero
  • Ratio: equal intervals WITH an absolute zero, so ratios can be computed
8
Q

Nominal data, def and medical ex

A

Unordered categories of data, i.e. no particular order or way of ranking them; just different buckets to put things in
ex. smoking status, ethnicity, or specialty

9
Q

What data type is dichotomous data?

A

Nominal data that only has 2 groups (buckets)

ex. diabetic v. non-diabetic

10
Q

Ordinal data, def and medical ex

A
Ordered categorical data: there is an order, but the intervals between categories may not be equal, so arithmetic computations (e.g. averages) on ordinal data are mathematically flawed.
ex. class rank and a 5-point rating scale for faculty evals (b/c a rating of 4 isn't twice as good as a 2)
11
Q

Interval data, def and ex

A

data is ordered with meaningful intervals between the groups, but NO absolute zero exists
ex. graduation years (has no absolute zero)

12
Q

Ratio data, def and ex. How can ratio data be further broken down?

A

Interval scale with an absolute zero, so you can compute ratios. Can be discrete (takes only certain integer values) or continuous (can take on any value)

ex. BP, weight, or age can take on any value (continuous), but we generally reduce them to discrete data b/c we round them off
ex. of discrete would be # of patients seen in a day

13
Q

Addition rule, def and ex

A

The probability that A OR B will happen is the sum of the individual probabilities of A and B. This applies to mutually exclusive events, i.e. events that can NOT both happen (e.g. only one clerkship can be first).
ex. probability of surgery clerkship first = 16% and prob of IM first = 16%. Probability of getting IM OR surgery first = 32%

14
Q

multiplication rule, def and ex

A

The probability of A AND B both occurring is the product of their individual probabilities (assuming A and B are independent; you must know both probabilities).
ex. prob of getting IM clerkship first = 16%. The probability of passing it is 95%.
Probability of getting IM first AND passing it = 0.16 x 0.95 = 15.2%
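The two rules as a short Python sketch using the clerkship numbers from these cards. It assumes, as the multiplication example implicitly does, that passing IM doesn't depend on whether IM comes first.

```python
# Probabilities from the cards (as decimals)
p_surgery_first = 0.16
p_im_first = 0.16
p_pass_im = 0.95  # assumed independent of rotation order

# Addition rule: mutually exclusive events (only one clerkship can be first)
p_im_or_surgery_first = p_im_first + p_surgery_first

# Multiplication rule: both events must occur
p_im_first_and_pass = p_im_first * p_pass_im

print(round(p_im_or_surgery_first, 2), round(p_im_first_and_pass, 3))  # → 0.32 0.152
```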

15
Q

precision v. accuracy (immunity from…)

A

precision = immunity from random variation. It’s reflected in the width of the confidence interval, which shrinks in proportion to 1/(sqrt of n), so quadrupling n halves the width

accuracy = immunity from systematic error or bias (bias is something wrong with the way samples are chosen)
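A sketch of the square-root relationship, assuming the usual approximation that a 95% CI for a mean is the estimate ± 1.96 × SD/√n. The SD of 10 and the helper `ci_half_width` are hypothetical.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Approximate half-width of a 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)

for n in (25, 100, 400):
    print(n, round(ci_half_width(10, n), 2))
# 25 3.92 / 100 1.96 / 400 0.98: each 4x increase in n halves the width
```

Note that more participants only improve precision; they do nothing for accuracy if the sampling itself is biased.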

16
Q

For a Gaussian distribution, what is between +/-1 SD?

A

68% of your data lies in the range between +/- 1 SD

17
Q

what % of the data lies below the +1 SD mark?

A

84% of the data. (50% below the mean + 34% between mean and +1)
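The percentages on these Gaussian cards can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is just a sketch using the standard normal (mean 0, SD 1):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)  # area between -1 and +1 SD
below_plus_1sd = std_normal.cdf(1)                    # area to the LEFT of +1 SD

print(f"within +/-1 SD: {within_1sd:.1%}")    # → 68.3%
print(f"below +1 SD: {below_plus_1sd:.1%}")   # → 84.1%
for k in (2, 3):
    print(f"within +/-{k} SD: {std_normal.cdf(k) - std_normal.cdf(-k):.1%}")  # → 95.4%, 99.7%
```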

18
Q

Where does 99% of the data lie on a gaussian curve?

A

Roughly between +/-3 standard deviations (more precisely, 99.7% lies within +/-3 SD; exactly 99% lies within +/-2.58 SD)

19
Q

z score, def and eqn

A

EACH data point on a “standard” Gaussian distribution has a z score, meaning that data point (x) is “z” standard deviations above or below the mean

z = (x - mean)/SD
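The z score equation as a tiny Python function. The blood-pressure numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: systolic BP of 150 mmHg in a population with mean 120, SD 15
print(z_score(150, 120, 15))  # → 2.0
```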

20
Q

If you look at a z table and see that a z score of 1.10 = 0.8643, what does that mean?

A

It means 86.43% of the data lies BELOW the point where z = 1.10
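A z table lookup is the standard normal cumulative distribution function, so Python's `statistics.NormalDist` (3.8+) gives the same value:

```python
from statistics import NormalDist

# A z table reports the area to the LEFT of z under the standard normal curve
area_below = NormalDist().cdf(1.10)
print(round(area_below, 4))  # → 0.8643, i.e. 86.43% of the data lies below z = 1.10
```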

21
Q

Why are z scores symmetric?

A

Because the Gaussian curve is symmetric about the mean (the area beyond +z equals the area beyond -z)

22
Q

(2) typical reasons for using z (t) scores

A
  1. To figure out how many SDs your sample mean is above or below the population mean
  2. To figure out how many SDs from the mean will contain a certain proportion of the data
23
Q

Why is the z score that divides the top 5% of a normal population from the remaining 95% NOT +2?

A

Picture the Gaussian curve. z = +2 has only ~2.3% beyond it, so the z score that cuts off the top 5% must be LOWER than +2 (z ≈ +1.645)
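Both numbers on this card can be checked with `statistics.NormalDist`: the area beyond +2, and the z that leaves exactly 5% in the upper tail (the inverse CDF at 0.95):

```python
from statistics import NormalDist

std_normal = NormalDist()
print(round(1 - std_normal.cdf(2), 4))     # area beyond z = +2 → 0.0228
print(round(std_normal.inv_cdf(0.95), 3))  # z cutting off the top 5% → 1.645
```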

24
Q

Why are t scores used more in practice than z scores?

A

z scores are based on the ACTUAL standard error of the true population (from the population SD), which we usually don’t know.
t scores instead use an ESTIMATED standard error of the mean, calculated from the sample SD

25
Why does increasing n make t and z scores get closer to the same value? At around what n are t and z about the same?
t scores depend on the degrees of freedom (n - 1), which means they change with the sample size (n). As n gets higher, the d.f. go up and the t distribution approaches the normal (z) distribution. For n > 100, t and z scores are about the same
26
mode
The value (measure of central tendency) with the greatest frequency. It is the high point on the graph and is NOT influenced by extreme values (unlike the mean)
27
When are mean, median, and mode (measures of central tendency) all the same?
normal (gaussian) distribution
28
On a negatively skewed distribution, where do the mean, median, and mode measurements fall?
Negatively skewed means the tail is to the left (heading toward the negative x-axis) and the bulk of the data is on the right. Mode = peak, mean = closest to the skewed tail, and median = in between the two
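A quick Python check with a small, hypothetical negatively skewed data set, using the standard-library `statistics` module:

```python
import statistics

# Hypothetical negatively skewed data: most values high, a long tail toward low values
data = [1, 5, 7, 8, 9, 9, 9]

mean = statistics.mean(data)      # pulled toward the tail by the extreme value 1
median = statistics.median(data)  # middle value
mode = statistics.mode(data)      # most frequent value (the peak)

print(round(mean, 2), median, mode)  # → 6.86 8 9
```

The ordering mean < median < mode is the pattern the card describes for a left (negative) skew.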
29
endemic v. epidemic
A disease is ENDEMIC when it is constantly present in a population or area; an endemic disease has a usual (expected) incidence/prevalence. Ex. rhinovirus (common cold). EPIDEMIC means more cases of that disease than expected in a population/location within a time frame. Diseases that start as epidemics may drift into endemicity.
30
epidemiology
study of the distribution and determinants of disease frequency. Disease does NOT occur randomly; there are causes and/or preventative factors for disease. Epidemiology is the study of those things
31
Preclinical v. Clinical phase of a disease
Preclinical begins with the onset of the disease and ends once signs/sx of the disease manifest. Clinical phase begins with signs/sx and ends (ideally) with treatment/resolution
32
incubation period. What phase is this in?
The time from colonization (infection) to the onset of symptoms. It is in the preclinical phase
33
(2) types of epidemiological studies and example
Experimental and observational. Experimental studies are important in testing drugs. Observational studies are really important for identifying likely causes (risk factors) of disease, ex. they revealed that Reye's syndrome was linked to kids with viral infections taking ASA (aspirin) for fever
34
Rate v. Proportion
A rate IS a proportion over a specific time period. Proportion = (# of cases)/(population at risk); Rate = (# of cases)/(population at risk) IN A TIME period
35
Incidence
[# of people who ACQUIRE the disease] divided by [# of people at risk] IN A TIME ("associate in your mind the word 'acquire' with incidence")
36
Synonym for "attack rate"
Incidence
37
prevalence
(# of people who HAVE the disease)/(total # of people in the population) ...at a given point in time
38
What does prevalence not account for?
latent/undiagnosed diseases
39
Incidence rate v. Prevalence rate
Incidence rate = probability that healthy people will develop a particular disease DURING a specific period of time; Prevalence rate = proportion of people in a population who HAVE the disease AT a given time (point prevalence or period prevalence)
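A small Python sketch contrasting the two measures in a hypothetical town (all numbers are made up). Incidence counts only NEW cases among people at risk over a period; prevalence counts everyone who HAS the disease at one point in time.

```python
# Hypothetical town, followed for one year
population = 10_000
existing_cases_jan1 = 200                       # already HAVE the disease on Jan 1
at_risk = population - existing_cases_jan1      # only people without the disease can acquire it
new_cases_during_year = 49                      # ACQUIRED the disease during the year

point_prevalence_jan1 = existing_cases_jan1 / population
incidence = new_cases_during_year / at_risk     # per year

print(f"prevalence = {point_prevalence_jan1:.1%}, "
      f"incidence = {incidence * 1000:.0f} per 1,000 per year")
# → prevalence = 2.0%, incidence = 5 per 1,000 per year
```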
40
visual depiction of incidence, prevalence, mortality, and cure (slide 45)
Prevalence is the existing cup of liquid. Incidence is new liquid pouring into the prevalence cup. Draining out the bottom of the prevalence cup are deaths and cures
41
mortality rate
(# deaths)/(population) for a specific time interval, with the population standardized to 10^n (e.g. per 10^3 = 1,000 or per 10^5 = 100,000)
42
neonatal v. infant mortality rate
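Neonatal: (# deaths of infants < 28 days old)/(# live births), per 1,000 live births in a year. Infant: (# deaths of infants < 1 year old)/(# live births), per 1,000 live births in a year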
43
crude mortality rate v. cause-specific death rate
Crude mortality rate is simply the # of deaths/population (10^n) in a specific time period v. cause-specific death rate, which is (# of deaths due to a certain cause)/population (10^n) in a specific time period
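A hypothetical Python helper, `rate_per`, for expressing crude and cause-specific death rates per 10^n people. The population and death counts are invented for illustration.

```python
def rate_per(events, population, n=5):
    """Express events/population per 10**n people (n=5 means per 100,000)."""
    return events / population * 10 ** n

# Hypothetical year: 2,500,000 people, 20,000 deaths, 5,000 of them from heart disease
crude = rate_per(20_000, 2_500_000)
cause_specific = rate_per(5_000, 2_500_000)

print(round(crude, 1), round(cause_specific, 1))  # → 800.0 200.0 (per 100,000 per year)
```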
44
death to case ratio
(# of deaths attributed to a disease) / (# of new cases identified) during a specified period. Ex. a total of 300 cases of a disease, with 50 new cases, 20 of whom have died: death-to-case ratio = 20:50
45
case fatality rate
(# of cause-specific deaths among the incident cases) / (# of incident cases). You can ONLY calculate this proportion of fatal cases once the epidemic ends. Ex. an epidemic of a disease ends with 500 total cases, 250 of whom died: case fatality rate = 250/500 = 50%
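Both calculations from these cards in a short Python sketch; `case_fatality_rate` is a hypothetical helper name.

```python
def case_fatality_rate(deaths_among_cases, incident_cases):
    """Proportion of incident cases that die of the disease (compute once the epidemic is over)."""
    return deaths_among_cases / incident_cases

# Death-to-case ratio: deaths attributed to the disease : new cases in the same period
deaths, new_cases = 20, 50
print(f"death-to-case ratio = {deaths}:{new_cases}")

# Case fatality rate from the epidemic example
print(f"case fatality rate = {case_fatality_rate(250, 500):.0%}")  # → 50%
```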
46
crude birth rate v. crude fertility rate
Crude birth rate = (# live births)/(population, 10^n); crude fertility rate = (# live births)/(women aged 15-44 yrs)
47
relationship of variance and standard deviation
Variance = standard deviation squared; standard deviation = square root of the variance
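A minimal Python sketch with hypothetical measurements. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1).

```python
import math
import statistics

sample = [4, 8, 6, 5, 7]            # hypothetical measurements, mean = 6
var = statistics.variance(sample)   # sample variance: sum of squared deviations / (n - 1)
sd = statistics.stdev(sample)       # sample standard deviation

print(var, round(sd, 3), math.isclose(sd ** 2, var))  # → 2.5 1.581 True
```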
48
how is variance (and standard deviation) related to the accuracy or reliability of the data? For the population?
LESS variance (i.e. a lower SD) means MORE precise/reliable data: less variation means your data are more tightly clustered around the mean. Overall idea = results for the sample more closely represent the true result in the population
49
concept of standard deviation
in a normal distribution, the proportion of data elements is CONSTANT for a given number of standard deviations above or below the mean
50
percentile (e.g. what does the 90th percentile represent?)
x percentile is the value below which x% of the data lie. e.g. 90% of the data lies below the 90th percentile
51
what percentile is at the +1SD and +2SD marks of a normal distribution? How is +2SD percentile different from +/-2 SDs?
+1 SD = 84th percentile; +2 SD = 98th percentile. Even though 95% of the data LIES between +/-2 SDs, this does NOT mean that +2 SDs is the 95th percentile! It's the ~98th (97.7th) because you also add the small (~2.3%) tail of data below -2 SD
52
(4) examples of non-gaussian data distribution
Skewed (positive or negative); J-shaped (high frequency at the R-most end); bimodal: two peaks of highest frequency; U-shaped: high frequency at both extremes
53
Most important and practical way to increase the power of a statistical test
increase the sample size
54
Probability v. Odds - getting heads on a coin toss
Probability of getting heads = 50%. The ODDS of getting heads = 50%/50% = 1. This is because the odds of an event happening = the probability it does happen / the probability it doesn't
55
T/F - a cross-sectional study can be retrospective or prospective
False! Cross-sectional study collects data ONLY at one point in time; it is not retrospective or prospective
56
The only type of study that can determine the absolute risk of contracting a disease
cohort study
57
another name for case-control studies
retrospective studies. Case-control studies, by definition, look backward in time
58
which type of study is the most powerful way to establish cause-and-effect relationships?
Controlled clinical trials are the only way to establish causation between exposure and illness. Cohort and case-control studies are only able to establish a statistical association, not actual causation
59
Which type of study is the best method for evaluating rare illnesses?
Case-control studies (b/c they identify the cases at the start of the study)
60
Which type of study is often very large, expensive, and spanning many years?
cohort studies
61
selection bias
Occurs when there is a systematic difference between either: (1) those participating in the study and those who do not, or (2) those in the tx arm of the study and those in the control group. Ex. if a study is conducted at a hospital where pts with that disease are more likely to be referred, then that sample of pts probably doesn't accurately represent the population
62
(4) different ways to describe a Type I error
False positive; alpha error; incorrectly rejecting a TRUE null hypothesis; we think there's an effect, when really there is NOT
63
(4) different ways to describe a Type II error
False negative; beta error; incorrectly accepting a FALSE null hypothesis; we think there is not an effect, when really there IS
64
relationship of correlation, causation, and association
Correlation is a measure of the variables' statistical ASSOCIATION, not of their causal relationship. Correlation does not equal causation.
65
(2) different ways to define the null hypothesis
1. no difference between the two groups | 2. any observed differences are due to chance
66
State alternative hypothesis for two-tailed v. one-tailed
Two-tailed: There is a difference between the two groups One-tailed: the mean of the trial group is greater than the mean of the control group
67
level of significance
the probability level at which it's decided that the null hypothesis is INCORRECT is the significance level (alpha)
68
(2) rules of Central Limit Theorem
If you plot the frequency distribution of the MEANS of an infinite # of random samples, then: 1. it will be (approximately) a normal distribution, and 2. the mean of that distribution (mu x-bar, the mean of the sample means) will be the same as the population mean (mu)
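A small simulation sketch of the Central Limit Theorem, assuming a uniform (clearly non-normal) population and a finite number of samples standing in for "infinite". The seed, sample size, and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical non-normal population: uniform on [0, 1), mean 0.5, SD = 1/sqrt(12) ≈ 0.289
sample_size, n_samples = 30, 2000

# Draw many random samples and keep only each sample's mean
sample_means = [
    sum(random.random() for _ in range(sample_size)) / sample_size
    for _ in range(n_samples)
]

print(round(statistics.mean(sample_means), 3))   # close to the population mean (0.5)
print(round(statistics.stdev(sample_means), 3))  # close to 0.289 / sqrt(30) ≈ 0.053
```

A histogram of `sample_means` would look bell-shaped even though the population itself is flat.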
69
critical values, def and how to calculate
The +/- limits of the area of acceptance (accept the null). Outside the critical values = area of rejection (reject the null). Find critical values in a t table: use the degrees of freedom (n - 1), then read the value under your alpha (e.g. .05 for two-tailed). Ex. +/-2.262 for df = 9
70
estimated standard error of the mean, def and eqn
Measures how much the sample mean is expected to deviate from the population mean. Standard error = SD(x-bar) = SD/(sqrt of n)
71
what does a t-score represent in hypothesis testing? T-score eqn?
The number of estimated standard errors that the sample mean lies above or below the hypothesized population mean. t_calc = [sample mean - hypothesized population mean] / estimated standard error of the mean
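Putting the standard error, t score, and critical value cards together, a Python sketch of a one-sample t test on hypothetical data (10 measurements, so df = 9, matching the +/-2.262 critical value on the critical values card):

```python
import math
import statistics

# Hypothetical sample of 10 measurements; H0: population mean = 100
sample = [104, 98, 110, 102, 107, 99, 105, 108, 101, 106]
mu_0 = 100
n = len(sample)

se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error of the mean
t_calc = (statistics.mean(sample) - mu_0) / se
t_crit = 2.262                                 # two-tailed, alpha = .05, df = n - 1 = 9 (from a t table)

print(round(t_calc, 2), abs(t_calc) > t_crit)  # → 3.21 True (reject the null)
```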
72
t-test v. ANOVA
t-test compares the means between 2 groups ANOVA: compares means of 3 or more different populations
73
What data type does chi square test use?
Nominal data, since chi square is a test of proportions between groups (categorical)
74
(3) ex of nonparametric tests
Spearman's rank CORRELATION test; Wilcoxon rank SUM test; Mann-Whitney test
75
power analysis, customary power, and level of significance
Aims to prevent a Type II error by ensuring an adequate study size. Involves fixing the customary power at 80% and the level of significance at 5%
76
What does a non-inferiority design study? 1- or 2-tailed?
only wants to study (ensure) that the intervention is not worse than current standard of care. Is a 1-tailed analysis that doesn't need as many patients
77
What kind of time-frame do case-control studies look at?
case-control studies are ALWAYS retrospective
78
(3) advantages of case-control studies
1. Easy and inexpensive; 2. can study multiple risk factors; 3. since you identify patient cases at the beginning of the study, it's the best way to study rare diseases
79
(2) disadvantages of case-control studies
1. Highly prone to bias & confounding (especially recall and selection biases); 2. hard to identify a truly matched population (e.g. similar in severity of illness, age-matched, etc.)
80
Name (2) examples that can NOT be studied via randomized trials? What must we rely on instead?
Surgery and pregnancy. ex. Can't randomize people to get surgery or not. For these, rely on observational studies
81
definition of bias
systematic error in the study design that produces results "systematically" different from the truth
82
(3) types of bias
1. Selection (sampling) bias: the selection of pts doesn't represent the population it's supposed to represent (e.g. too old or too well-educated); 2. Recall bias: exists ANY time historical self-reported info is collected from respondents; 3. Measurement bias: something is wrong with the way it's being measured (instrument or observer)
83
RRR v. ARR
RRR = Relative Risk Reduction = (% risk in control group - % risk in treatment group) / % risk in control group = 1 - RR. Ex. RR = 12%/20% = 0.60, so RRR = 1 - 0.60 = 0.40 (40%). ARR = Absolute Risk Reduction = % risk in control group - % risk in treatment group. Since you just subtract, the ARR is always LESS than the RRR. Ex. 20% - 12% = ARR of 8%
84
Calculate NNT
NNT = 100/ARR (if ARR is in %) or 1/ARR (if ARR is a decimal), rounded up to the next whole patient. ARR = % risk in control group - % risk in treatment group
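A Python sketch of the RR, RRR, ARR, and NNT arithmetic using the 20% (control) and 12% (treatment) risks from the RRR v. ARR card. Rounding NNT up to a whole patient is the usual convention.

```python
import math

control_risk = 0.20    # 20% event risk in the control group
treatment_risk = 0.12  # 12% event risk in the treatment group

rr = treatment_risk / control_risk    # relative risk = 0.60
rrr = 1 - rr                          # relative risk reduction = 0.40
arr = control_risk - treatment_risk   # absolute risk reduction = 0.08
nnt = math.ceil(1 / arr)              # 1 / 0.08 = 12.5, rounded UP to 13 patients

print(round(rr, 2), round(rrr, 2), round(arr, 2), nnt)  # → 0.6 0.4 0.08 13
```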
85
observational v. experimental study and subcategories of each
These are the 2 major types of clinical studies. Experimental mostly refers to randomized controlled trials; observational (just watching) includes cohort, case-control, and cross-sectional studies and case reports
86
How to interpret the OR or RR
RR (or OR) = 1: no difference; RR > 1: increased risk; RR < 1: decreased risk (protective)
87
calculating odds
Odds = (probability of the event) / (probability of the NOT event) or (probability of the event) / (1 - probability of the event)
88
difference in odds v. risk ratio
The denominator. Odds denominator = probability of the NOT event (or 1 - probability of the event), whereas risk denominator = everyone in the group (those with the event + those without it)
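A Python sketch of odds v. risk. The helper name `odds_from_probability` and the 2×2 counts are hypothetical.

```python
def odds_from_probability(p):
    """Odds = P(event) / P(not event) = p / (1 - p)."""
    return p / (1 - p)

print(odds_from_probability(0.5))  # coin toss, heads → 1.0

# Hypothetical 2x2 table: exposed 30 sick / 70 well; unexposed 10 sick / 90 well
a, b, c, d = 30, 70, 10, 90
risk_exposed, risk_unexposed = a / (a + b), c / (c + d)  # denominator = everyone in the row
odds_exposed, odds_unexposed = a / b, c / d              # denominator = those WITHOUT the event

print(round(risk_exposed / risk_unexposed, 2))  # relative risk → 3.0
print(round(odds_exposed / odds_unexposed, 2))  # odds ratio → 3.86
```

Because the outcome here is fairly common (30% in the exposed group), the odds ratio overstates the relative risk; the two converge when the outcome is rare.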
89
Structure of the 7-character ICD-10 code
_ _ _ . _ _ _ _ Begins and ends with letters. First letter = the disease group (e.g. M = musculoskeletal system, N = GU system). The first 3 characters together represent the category. The next 3 = etiology, anatomic site, and severity, respectively, and the last one is an extension.
90
What should patient-physician email not be used for, according to the AMA?
To establish a patient-physician relationship
91
Which EHR received the highest ranking from users for its disease management features (in the survey of family physicians using EHRs)?
Praxis
92
According to the O'Donnell article, among both physicians who use and do not use the copy and paste function (everyone), what was considered most problematic with copy and paste EHR function?
notes contain more inconsistent and more outdated information
93
According to the O'Donnell article, what do the most physicians believe is best solution for problems w/ copy and paste function in EHR?
provide education for physicians regarding the copy/paste function use
94
According to the Bryant article, what was observed about "alert fatigue"?
the number of alerts received did not correlate with physicians' override rate
95
According to the Bryant article, what was the rate of drug-drug alert overrides?
greater than 95% !!
96
meta-analysis
summary study of previous trials to give us an overall result
97
In screening for disease that has LOW prevalence, what will most positive tests be?
False positives
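To see why, here is a Python sketch with hypothetical test characteristics (1% prevalence, 90% sensitivity, 95% specificity; none of these numbers come from the course):

```python
# Hypothetical screening test applied to 10,000 people
population = 10_000
prevalence, sensitivity, specificity = 0.01, 0.90, 0.95

diseased = population * prevalence                 # 100 people
healthy = population - diseased                    # 9,900 people
true_positives = diseased * sensitivity            # 90
false_positives = healthy * (1 - specificity)      # 495
ppv = true_positives / (true_positives + false_positives)  # positive predictive value

print(round(true_positives), round(false_positives), f"PPV = {ppv:.0%}")  # → 90 495 PPV = 15%
```

Even with a fairly specific test, false positives outnumber true positives, so only about 15% of positive results are real cases.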
98
What is the 'implied promise' of screening tests?
That the screening test IS beneficial and will do more good than harm
99
When is the screening period?
Period between possible detection and occurrence of symptoms
100
USPSTF rating for screening mammography before age 50
C, should be an individual decision
101
Age group that has a B (v. C) rating for PSA screening in men
B: men 55-69 yo; under 40 = C; 40-54 yo at average risk = C
102
USPSTF recommendation for testicular cancer screening in ASYMPTOMATIC patients
D (recommend against)
103
USPSTF guidelines for mammography screening in 40-49, 50-74, and >75 YO women
40-49 = C; 50-74 = B; >75 = I (insufficient evidence)
105
USPSTF rating for lung cancer screening
B
106
Name (4) D rating cancer screenings
Ovarian cancer; pancreatic cancer; prostate cancer; testicular cancer
107
Name (3) I-rated cancer screens
Bladder cancer; oral cancer; skin cancer prevention