Biostats - Week 1 Flashcards

1
Q

Which kind of graph is negatively skewed?

A

The bulk of the data (the peak of the curve) is on the right, and the tail extends to the left

2
Q

layman’s terms for precision v. accuracy. Give ex

A

Precision relates to the # of participants in your study. More participants = more precise.

Accuracy relates to where you draw your sample from (whether it represents the population). e.g. drawing from registered voters is considered an accurate way to sample the population.

3
Q

simplified way to think about what a chi-square test measures

A

how many people fall into one group or another (e.g. who got a cold after taking Vitamin A and who did not)

4
Q

If a confidence interval range does NOT include 0 (e.g. 0.61-1.19cm), what does that tell you about the (two-sided) p-value for testing the null hypothesis?

A

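The p value is the probability of getting results at least as extreme as yours if the null hypothesis (true difference = 0) were correct; it is NOT simply "the chance the results happened by chance." If 0 lies outside the (95%) confidence interval, a result this far from 0 would be unlikely under the null, and thus the two-sided p < 0.05 (statistically significant).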

5
Q

What does using a lower (more stringent) value of alpha do?

A

Makes a Type I error LESS likely (helps prevent Type I errors). The idea is that it's harder to get a statistically significant result, so you can be more confident in your findings IF they are statistically significant (p < alpha).

6
Q

What can you never conclude from a p value

A

You can never conclude that there is CLINICAL significance just because there is statistical significance.

7
Q

(4) data types. Which are categorical and which are numerical?

A

think “data NOIR”
Categorical = nominal and ordinal
  • Nominal: UNordered categorical data
  • Ordinal: ordered categorical data

Numerical = interval and ratio
  • Interval: equal intervals between values, but NO absolute zero
  • Ratio: equal intervals WITH an absolute zero, so you can compute ratios
8
Q

Nominal data, def and medical ex

A

unordered categories of data, i.e. no particular order or way of measuring these things; just different buckets to put stuff in
ex. smoking status, ethnicity, or specialty

9
Q

What data type is dichotomous data?

A

Nominal data that only has 2 groups (buckets)

ex. diabetic v. non-diabetic

10
Q

Ordinal data, def and medical ex

A
ordered (grouped) categorical data; there is an order, but the intervals between groups may differ. This means arithmetic on ordinal data (e.g. averaging the ratings) is mathematically flawed.
ex. class rank and a 5-point rating scale for faculty evals (b/c a rating of 4 isn't twice as good as a 2)
11
Q

Interval data, def and ex

A

data is ordered with meaningful intervals between the groups, but NO absolute zero exists
ex. graduation years (has no absolute zero)

12
Q

Ratio data, def and ex. How can ratio data be further broken down?

A

interval scale with an absolute zero, so you can compute ratios. Can be discrete (only takes certain integer values) or continuous (can take on any value)

ex. BP, weight, or age can take on any value (continuous), but we generally reduce them to discrete data b/c we round them off
ex. of discrete: # of patients seen in a day

13
Q

Addition rule, def and ex

A

the probability that A OR B will happen is the sum of the individual probabilities of A and B. This simple sum applies to mutually exclusive events (events that can NOT both happen), not to independent events.
ex. probability of getting surgery clerkship first = 16% and prob of IM first = 16%. Probability of getting IM OR surgery first = 16% + 16% = 32%
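A minimal Python sketch of the addition rule using the clerkship numbers from the card. The function name `prob_either` is hypothetical; the optional `p_both` term is the general form of the rule (P(A or B) = P(A) + P(B) - P(A and B)) and is 0 for mutually exclusive events like these.

```python
def prob_either(p_a: float, p_b: float, p_both: float = 0.0) -> float:
    """P(A or B) = P(A) + P(B) - P(A and B); p_both is 0 if A and B can't both happen."""
    return p_a + p_b - p_both

# Only one clerkship can come first, so these two events are mutually exclusive.
p_surgery_first = 0.16
p_im_first = 0.16

p_either_first = prob_either(p_surgery_first, p_im_first)
print(f"P(surgery or IM first) = {p_either_first:.0%}")  # P(surgery or IM first) = 32%
```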

14
Q

multiplication rule, def and ex

A

probability of A AND B both occurring: multiply the individual probabilities (for independent events; otherwise multiply P(A) by the probability of B given that A occurred).
ex. prob of getting IM clerkship first = 16%. The probability of passing it is 95%.
Probability of getting IM first AND passing it = 0.16 x 0.95 = 0.152 = 15.2%
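The same calculation as a short Python sketch (variable names are illustrative only). Treating the 95% pass rate as the probability of passing *given* IM came first makes the product valid whether or not the two events are independent.

```python
# Multiplication rule: P(A and B) = P(A) * P(B given A).
# For independent events, P(B given A) is just P(B).
p_im_first = 0.16              # P(A): IM clerkship first
p_pass_given_im_first = 0.95   # P(B given A): passing it

p_im_first_and_pass = p_im_first * p_pass_given_im_first
print(f"P(IM first AND pass) = {p_im_first_and_pass:.1%}")  # P(IM first AND pass) = 15.2%
```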

15
Q

precision v. accuracy (immunity from…)

A

precision = immunity from random variation. It’s reflected in the width of the confidence interval, which narrows in proportion to the square root of n (width proportional to 1/sqrt(n))

accuracy = immunity from systematic error or bias (bias is something wrong with the way samples are chosen)

16
Q

for a gaussian distribution, what is between +/-1 SD?

A

68% of your data lies in the range between +/- 1 SD

17
Q

what % of the data lies below the +1 SD mark?

A

84% of the data (50% below the mean + 34% between the mean and +1 SD)

18
Q

Where does 99% of the data lie on a gaussian curve?

A

between +/-3 standard deviations (more precisely, ~99.7% of the data lies within +/-3 SD; exactly 99% lies within about +/-2.58 SD)
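The 68/95/99.7 pattern can be checked with Python's standard-library `statistics.NormalDist` (the helper name `share_within` is hypothetical): the share within +/-k SD is the cumulative area up to +k minus the area up to -k.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard gaussian curve

def share_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {share_within(k):.1%}")
# within +/-1 SD: 68.3%
# within +/-2 SD: 95.4%
# within +/-3 SD: 99.7%
```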

19
Q

z score, def and eqn

A

EACH data point on a “standard” Gaussian distribution has a z score, meaning that data point (x) is “z” standard deviations above or below the mean

z = (x - mean)/SD
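The z equation as a tiny Python function; the BP numbers below are hypothetical, chosen only to show a positive and a negative z.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: systolic BPs in a group with mean 120 and SD 10.
print(z_score(140, mean=120, sd=10))  # 2.0  (2 SDs above the mean)
print(z_score(105, mean=120, sd=10))  # -1.5 (1.5 SDs below the mean)
```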

20
Q

If looking at a Z table and see that z score of 1.10 = 0.8707. What does that mean?

A

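means 87.07% of the data lies BELOW the point where z = 1.1 (a z table lists the cumulative area to the left of z). Note: a standard normal table actually gives 0.8643 for z = 1.10; 0.8707 corresponds to z ≈ 1.13.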
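A z-table lookup is just the cumulative distribution function. This sketch uses Python's `statistics.NormalDist` to reproduce table entries, and the last line shows the symmetry from the next card (area below -z = 1 - area below +z).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# A z table entry = proportion of the data lying BELOW z.
for z in (1.10, 1.13):
    print(f"z = {z:.2f}: {std_normal.cdf(z):.4f} lies below")
# z = 1.10: 0.8643 lies below
# z = 1.13: 0.8708 lies below

# Symmetry: area below -1.10 equals area above +1.10.
print(f"{std_normal.cdf(-1.10):.4f}")  # 0.1357
```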

21
Q

Why are z scores symmetric?

A

because the gaussian curve is symmetric

22
Q

(2) typical reasons for using z (t) scores

A
  1. To figure out how many SDs your sample mean is above or below the population mean
  2. To figure out how many SDs away from the mean will contain a certain proportion of the data
23
Q

Why is the z score that divides the top 5% of a normal population from the remaining 95% NOT +2?

A

Picture the gaussian curve. z = +2 has only ~2.3% of the data beyond it, so the cutoff for the top 5% must be a z score LOWER than +2 (z ≈ +1.645).
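Both numbers on this card can be sketched with Python's `statistics.NormalDist`: `cdf` gives the area below a z score, and `inv_cdf` runs the table backwards to find the z with a given area below it.

```python
from statistics import NormalDist

std_normal = NormalDist()

tail_beyond_2 = 1 - std_normal.cdf(2)    # share of data above z = +2
cutoff_top_5 = std_normal.inv_cdf(0.95)  # z with 95% below it and 5% above

print(f"beyond z = +2: {tail_beyond_2:.1%}")     # beyond z = +2: 2.3%
print(f"top-5% cutoff: z = {cutoff_top_5:.3f}")  # top-5% cutoff: z = 1.645
```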

24
Q

Why are t scores used more in practice than z scores?

A

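Z scores are based on the ACTUAL standard error of the true population (which requires the population SD), and we usually don’t know it.
T scores use an ESTIMATED standard error of the mean, calculated from the sample SD, so they can actually be used in practice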

25
Q

Why does increasing the n# make t and z scores get closer to the same value? Around what n value are t and z about the same?

A

T scores depend on the degrees of freedom (n - 1), so the t distribution changes with the sample size (n). As n gets higher and higher, the d.f. go up and the t distribution gets closer and closer to the normal (z) distribution.
n > 100: t and z scores are about the same

26
Q

mode

A

the value (measure of central tendency) with the greatest frequency. It is the high point (peak) of the graph and is NOT influenced by extreme values (unlike the mean)

27
Q

When are mean, median, and mode (measures of central tendency) all the same?

A

normal (gaussian) distribution

28
Q

On a negatively skewed distribution, where do the mean, median, and mode measurements fall?

A

First, negatively skewed means the skewed data (tail) is to the left (heading towards negative x axis) and bulk is on R.
Mode = peak, Mean = closest to skewed tail, and Median is in between the two
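A quick check of the ordering with Python's `statistics` module. The exam scores are made up for illustration: most are high (bulk on the right) and a few very low scores form the left tail, which drags the mean furthest toward the tail.

```python
from statistics import mean, median, mode

# Hypothetical, negatively skewed exam scores.
scores = [95, 95, 95, 90, 90, 85, 80, 60, 40]

print("mode:  ", mode(scores))            # mode:   95   (the peak)
print("median:", median(scores))          # median: 90   (in between)
print("mean:  ", round(mean(scores), 1))  # mean:   81.1 (closest to the tail)
```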

29
Q

endemic v. epidemic

A

A disease is ENDEMIC when it is constantly present in a population or area. An endemic disease has a usual (expected) incidence/prevalence. Ex. rhinovirus (common cold)

EPIDEMIC means more cases of that disease than expected in a population/location within a time frame. Diseases that start as epidemics may drift into endemicity.

30
Q

epidemiology

A

study of the distribution and determinants of disease frequency. Disease does NOT occur randomly; there are causes and/or preventative factors for disease. Epidemiology is the study of those things

31
Q

Preclinical v. Clinical phase of a disease

A

Preclinical begins with the onset of the disease and ends once signs/sx of the disease manifest.
Clinical phase begins with signs/sx and ends (ideally) with treatment/resolution

32
Q

incubation period. What phase is this in?

A

time from colonization (infection) to the onset of sx. It falls in the preclinical phase

33
Q

(2) types of epidemiological studies and example

A

Experimental and Observational:

  • Experimental: important in testing drugs
  • Observational: really important for identifying likely causes. ex. revealed that Reye’s syndrome was linked to kids with viral infections taking ASA (aspirin) for fever
34
Q

Rate v. Proportion

A

Rate IS proportion per a specific time period.
Proportion = (# of cases)/(population at risk)
Rate = (# of cases)/(population at risk) IN A TIME

35
Q

Incidence

A

[# of people who ACQUIRE the disease] divided by [# of people at risk] IN A TIME
(“associate in your mind the word ‘acquire’ with incidence”)

36
Q

Synonym for “attack rate”

A

Incidence

37
Q

prevalence

A

(# of people that HAVE the disease)/(# of people at risk) …at a given point in time

38
Q

What does prevalence not account for?

A

latent/undiagnosed diseases

39
Q

Incidence rate v. Prevalence rate

A

Incidence rate = probability that healthy people will develop a particular disease DURING a specific period of time

Prevalence rate = proportion of people in a population who HAVE the disease AT a given time (point prevalence or period prevalence)
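A worked sketch with entirely hypothetical numbers (a town of 10,000, not from the lecture). Assumption: prevalence is divided by the whole population, while incidence counts only people who were disease-free at the start as "at risk."

```python
population = 10_000
existing_cases = 200         # people who HAVE the disease on Jan 1
new_cases_during_year = 98   # people who ACQUIRE it during the year

# Point prevalence: share who have the disease at one point in time.
point_prevalence = existing_cases / population

# Incidence: new cases over a time period among those at risk (disease-free on Jan 1).
at_risk = population - existing_cases
incidence_per_year = new_cases_during_year / at_risk

print(f"point prevalence: {point_prevalence:.1%}")                       # point prevalence: 2.0%
print(f"incidence: {incidence_per_year * 1000:.0f} per 1,000 per year")  # incidence: 10 per 1,000 per year
```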

40
Q

visual depiction of incidence, prevalence, mortality, and cure (slide 45)

A

prevalence is existing cup of liquid. Incidence is new cup pouring into prevalence.
Coming out at bottom of prevalence cup are mortalities and cures

41
Q

mortality rate

A

(# deaths)/(population)

Population is standardized to 10^n for a specific time interval, e.g. per 10^3 = 1,000 or per 10^5 = 100,000
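Standardizing to 10^n is just a scale factor. In this sketch the helper `rate_per` and the death/population counts are hypothetical.

```python
def rate_per(events: int, population: int, n: int = 5) -> float:
    """Events per 10**n people (n=3 -> per 1,000; n=5 -> per 100,000)."""
    return events / population * 10**n

# Hypothetical: 850 deaths in one year in a population of 1,200,000.
print(f"{rate_per(850, 1_200_000, n=5):.1f} deaths per 100,000 per year")
# 70.8 deaths per 100,000 per year
```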

42
Q

neonatal v. infant mortality rate

A

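Neonatal: (# deaths of infants < 28 days old)/(# live births), usually per 1,000 live births
Infant: (# deaths of infants < 1 year old)/(# live births), usually per 1,000 live births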

43
Q

crude mortality rate v. cause-specific death rate

A

Crude mortality rate is simply the # of deaths/population (10^n) in a specific time period
v. cause-specific death rate, which is (# of deaths due to a certain cause)/population (10^n) in a specific time period

44
Q

death to case ratio

A

(# of deaths attributed to a disease) / (# new cases identified)
ex. total of 300 cases of disease with 50 new cases, 20 of whom have died. Death to Case ratio = 20:50

45
Q

case fatality rate

A

(# cause specific deaths among the incident cases) / (# of incident cases). Can ONLY calculate the proportion of fatal cases once the epidemic ends.
ex. epidemic of a disease ends with 500 total cases, 250 of whom died
Case fatality rate = 250/500 = 50%
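The numbers from this card and the death-to-case card, side by side in Python (variable names are illustrative). The two measures differ mainly in which cases go in the denominator and when you can compute them.

```python
# Death-to-case ratio: deaths attributed to the disease / new cases identified.
deaths_among_new_cases = 20
new_cases = 50
death_to_case = deaths_among_new_cases / new_cases

# Case fatality rate: deaths among incident cases / incident cases, once the epidemic ends.
epidemic_deaths = 250
epidemic_cases = 500
case_fatality_rate = epidemic_deaths / epidemic_cases

print(f"death-to-case ratio: {deaths_among_new_cases}:{new_cases} = {death_to_case:.1f}")  # 20:50 = 0.4
print(f"case fatality rate: {case_fatality_rate:.0%}")  # case fatality rate: 50%
```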

46
Q

crude birth rate v. crude fertility rate

A

crude birth = (# live births)/(population, 10^n)

crude fertility = (# live births)/(women aged 15-44 yrs)

47
Q

relationship of variance and standard deviation

A

variance = standard deviation squared

standard deviation = square root of the variance
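A minimal check of the relationship with Python's `statistics` module on made-up data. `pvariance`/`pstdev` are the population versions (divide by N); `variance`/`stdev` would be the sample versions (divide by N - 1), and the square/square-root relationship holds for both pairs.

```python
import math
from statistics import pstdev, pvariance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values, mean = 5

var = pvariance(data)  # population variance: average squared deviation from the mean
sd = pstdev(data)      # population standard deviation

print(f"variance = {var:.1f}, SD = {sd:.1f}")  # variance = 4.0, SD = 2.0
print(math.isclose(sd ** 2, var))              # True: variance = SD squared
print(math.isclose(math.sqrt(var), sd))        # True: SD = square root of variance
```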

48
Q

how is variance (and standard deviation) related to the accuracy or reliability of the data? For the population?

A

LESS variance (which means lower SD) means MORE reliable (precise) data, because less variation means your data are more tightly clustered around the mean. Overall idea = results for the sample more closely represent the true result in the population (assuming no bias; low variance alone does not rule out systematic error)

49
Q

concept of standard deviation

A

in a normal distribution, the proportion of data elements is CONSTANT for a given number of standard deviations above or below the mean

50
Q

percentile (e.g. what does the 90th percentile represent?)

A

x percentile is the value below which x% of the data lie. e.g. 90% of the data lies below the 90th percentile

51
Q

what percentile is at the +1SD and +2SD marks of a normal distribution? How is +2SD percentile different from +/-2 SDs?

A

+1 SD = 84th percentile
+2 SD = 98th percentile
Even though ~95% of the data LIES between +/-2 SDs, this does not mean that +2 SD is the 95th percentile! It’s the ~98th (97.7%) b/c you also count the ~2.3% tail of data below -2 SD
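A sketch of the percentile-versus-range distinction using Python's `statistics.NormalDist`: the percentile at +k SD is the whole area below that point, whereas "within +/-2 SD" excludes both tails.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

for k in (1, 2):
    below = std_normal.cdf(k)  # share of data below +k SD = that point's percentile
    print(f"+{k} SD: {below:.1%} below -> about the {round(below * 100)}th percentile")
# +1 SD: 84.1% below -> about the 84th percentile
# +2 SD: 97.7% below -> about the 98th percentile

# Contrast: the share lying BETWEEN -2 SD and +2 SD.
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"between -2 and +2 SD: {within_2:.1%}")  # between -2 and +2 SD: 95.4%
```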

52
Q

(4) examples of non-gaussian data distribution

A

skewed (positive or negative)
J-shaped (high frequency at R-most)
bimodal: two peaks of highest frequency
U-shaped: high frequency at both extremes

53
Q

Most important and practical way to increase the power of a statistical test

A

increase the sample size

54
Q

Probability v. Odds - getting heads on a coin toss

A

Probability of getting heads = 50%

The ODDS of getting heads = 50%/50% = 1
This is because Odds of an event happening = the probability it does happen / probability it doesn’t
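Converting between probability and odds in Python; the helper names are hypothetical. The 0.2 case shows how odds and probability diverge once the event isn't 50/50.

```python
def odds_from_prob(p: float) -> float:
    """Odds = P(event) / P(not event) = p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds(odds: float) -> float:
    """Back to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)

print(odds_from_prob(0.5))  # 1.0  (coin toss: 50% / 50%)
print(odds_from_prob(0.2))  # 0.25 (20% / 80%, i.e. 1:4)
print(prob_from_odds(1.0))  # 0.5
```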

55
Q

T/F - a cross-sectional study can be retrospective or prospective

A

False! Cross-sectional study collects data ONLY at one point in time; it is not retrospective or prospective

56
Q

The only type of study that can determine the absolute risk of contracting a disease

A

cohort study

57
Q

another name for case-control studies

A

retrospective studies. Case-control studies, by definition, look backward in time

58
Q

which type of study is the most powerful way to establish cause-and-effect relationships?

A

Controlled clinical trials are the only way to establish causation between exposure and illness. Cohort and case-control studies are only able to establish a statistical association, not actual causation

59
Q

Which type of study is the best method for evaluating rare illnesses?

A

case-control studies (b/c they identify the cases at the start of the study)

60
Q

Which type of study is often very large, expensive, and spanning many years?

A

cohort studies

61
Q

selection bias

A

occurs when there is a systematic difference between either:

  • those participating in the study and those who do not, or
  • those in the tx arm of the study and those in the control group
    ex. if a study is conducted at a hospital where pts with that disease are more likely to be referred, then that sample of pts probably doesn’t accurately represent the population
62
Q

(4) different ways to describe a Type I error

A

False positive
alpha error
incorrectly reject a TRUE null hypothesis
we think there’s an effect, when really there is NOT

63
Q

(4) different ways to describe a Type II error

A

False negative
beta error
incorrectly accept (fail to reject) a FALSE null hypothesis
we think there is not an effect, when really there IS

64
Q

relationship of correlation, causation, and association

A

Correlation is a measure of the variables’ statistical ASSOCIATION, not of their causal relationship. Correlation does not equal causation.

65
Q

(2) different ways to define the null hypothesis

A
  1. no difference between the two groups
  2. any observed differences are due to chance

66
Q

State alternative hypothesis for two-tailed v. one-tailed

A

Two-tailed: There is a difference between the two groups

One-tailed: the mean of the trial group is greater than the mean of the control group

67
Q

level of significance

A

the significance level (alpha) is the probability cutoff at which it’s decided that the null hypothesis is INCORRECT: if p < alpha, reject the null

68
Q

(2) rules of Central Limit Theorem

A

If you plot the frequency distribution of the MEANS of an infinite # of random samples, then:

  1. it will be a normal distribution, and
  2. the mean of that distribution of sample means (mu x-bar) will be the same as the population mean (mu)
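A small simulation sketch of the Central Limit Theorem using only Python's standard library. The exponential population, sample size of 30, and 2,000 repetitions are arbitrary assumptions standing in for "infinite" samples. The output illustrates rule 2 (mean of sample means ≈ mu, even though the population is skewed), and the SD of the sample means comes out close to SD/sqrt(n), the standard error. Plotting `sample_means` as a histogram would show the roughly bell-shaped curve of rule 1.

```python
import random
from statistics import fmean, pstdev, stdev

random.seed(1)  # fixed seed so reruns give the same numbers

# Hypothetical, clearly non-normal population: exponential values with mean ~10.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
mu = fmean(population)

# Draw 2,000 random samples of size 30 and keep each sample's mean.
sample_means = [fmean(random.sample(population, 30)) for _ in range(2_000)]

print(f"population mean (mu):            {mu:.2f}")
print(f"mean of sample means (mu x-bar): {fmean(sample_means):.2f}")
print(f"SD of sample means:              {stdev(sample_means):.2f}")
print(f"population SD / sqrt(30):        {pstdev(population) / 30 ** 0.5:.2f}")
```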
69
Q

critical values, def and how to calculate

A

the +/- limits of the area of acceptance range (accept the null). Outside the critical value range = area of rejection (reject the null).

  • Must find critical values by looking them up in a t table, based on the degrees of freedom (n - 1) and the column for your alpha (.05 for two-tailed).
    ex. +/-2.262 for df = 9
70
Q

estimated standard error of the mean, def and eqn

A

measures how much the sample mean is expected to deviate from the population mean

standard error = SD (x-bar) = SD/(sqrt of n)
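The formula as a one-line Python function (the SD of 12 mmHg is a hypothetical value). It also shows the precision card's point: quadrupling n only halves the standard error, because n sits under a square root.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Estimated standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

print(standard_error(12, 16))  # 3.0
print(standard_error(12, 64))  # 1.5  (4x the sample size -> half the SE)
```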

71
Q

what does a t-score represent in hypothesis testing? T-score eqn?

A

the number of Estimated Standard Errors that the sample mean lies above or below the hypothesized population mean
t calc = [sample mean - hypothesized population mean] / est standard error of the mean
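A minimal one-sample sketch of t calc in Python. The 10 measurements and hypothesized mean of 100 are made up; the critical value 2.262 (two-tailed, alpha = .05, df = 9) is the one from the critical-values card, hard-coded here instead of looked up.

```python
import math
from statistics import mean, stdev

def t_calc(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / estimated standard error of the mean."""
    est_se = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / est_se

# Hypothetical sample of n = 10 tested against a hypothesized population mean of 100.
sample = [104, 98, 110, 102, 107, 99, 105, 103, 108, 101]
t = t_calc(sample, 100)
critical = 2.262  # two-tailed critical value for alpha = .05, df = 9 (t table)

print(f"t = {t:.2f} with df = {len(sample) - 1}")  # t = 3.01 with df = 9
print("reject the null" if abs(t) > critical else "fail to reject the null")  # reject the null
```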

72
Q

t-test v. ANOVA

A

t-test compares the means between 2 groups

ANOVA: compares means of 3 or more different populations

73
Q

What data type does chi square test use?

A

Nominal data, since chi square is a test of proportions between groups (categorical)
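Since chi-square compares observed vs expected counts in categories, a 2x2 version can be sketched by hand in Python. Assumptions: Pearson's statistic without continuity correction, a hypothetical function name, and made-up counts (colds among 100 people on a vitamin vs 100 on placebo). 3.841 is the chi-square critical value for df = 1 at alpha = .05.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
                 outcome   no outcome
        group 1     a          b
        group 2     c          d
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if group and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical: 20/100 colds on the vitamin vs 35/100 on placebo.
chi2 = chi_square_2x2(20, 80, 35, 65)
print(round(chi2, 2))  # 5.64
print("significant at .05" if chi2 > 3.841 else "not significant at .05")  # significant at .05
```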

74
Q

(3) ex of nonparametric tests

A

Spearman’s rank CORRELATION test
Wilcoxon rank SUM test
Mann-Whitney test

75
Q

power analysis, customary power, and level of significance

A

aims to prevent type 2 error by ensuring adequate study size, involves:

  • fixing customary power to 80% and
  • fixing level of significance to 5%
76
Q

What does a non-inferiority design study? 1- or 2-tailed?

A

only wants to study (ensure) that the intervention is not worse than current standard of care.
Is a 1-tailed analysis that doesn’t need as many patients

77
Q

What kind of time-frame do case-control studies look at?

A

case-control studies are ALWAYS retrospective

78
Q

(3) advantages of case-control studies

A
  1. easy and inexpensive
  2. can study multiple risk factors
  3. since you identify patient cases at the beginning of the study, it’s the best way to study rare diseases
79
Q

(2) disadvantages of case-control studies

A
  1. highly prone to bias & confounding (especially recall and selection biases)
  2. hard to identify a truly matched population (e.g. similar in severity of illness, age-matched, etc)
80
Q

Name (2) examples that can NOT be studied via randomized trials? What must we rely on instead?

A

Surgery and pregnancy. ex. Can’t randomize people to get surgery or not.
For these, rely on observational studies

81
Q

definition of bias

A

systematic error in the study design that produces results “systematically” different from the truth

82
Q

(3) types of bias

A
  1. Selection (sampling) Bias: selection of pts doesn’t represent the population it’s supposed to represent (e.g. too old or too well-educated)
  2. Recall Bias: exists ANY time historical self-report info is collected from the respondents
  3. Measurement Bias: just means something wrong w/ way it’s being measured (instrument or observer)
83
Q

RRR v. ARR

A

RRR = Relative Risk Reduction = (risk in control group - risk in treatment group) / risk in control group, i.e. 1 - RR (where RR = risk in treatment group / risk in control group). ex. 12%/20% = RR of 0.60, so RRR = 1 - 0.60 = 0.40 (40%)

ARR = Absolute Risk Reduction = % risk in control group - % risk in treatment group. Since you just subtract (and don't divide by the control risk), the ARR is always LESS than the RRR. Ex. 20% - 12% = ARR of 8%

84
Q

Calculate NNT

A

NNT = 100/ARR (if ARR is %) or 1/ARR (if ARR is in decimals)

ARR = % risk in control group - % risk in treatment group
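The RR, RRR, ARR, and NNT from the two cards above, computed together in Python using the 20% control vs 12% treatment example. NNT is rounded UP to a whole patient, which is the usual convention.

```python
import math

control_risk = 0.20    # event rate in the control group
treatment_risk = 0.12  # event rate in the treatment group

rr = treatment_risk / control_risk   # relative risk, ~0.60
arr = control_risk - treatment_risk  # absolute risk reduction, ~0.08
rrr = arr / control_risk             # relative risk reduction = 1 - RR, ~0.40
nnt = math.ceil(1 / arr)             # 1 / 0.08 = 12.5, rounded up to 13

print(f"RR = {rr:.2f}, RRR = {rrr:.0%}, ARR = {arr:.0%}, NNT = {nnt}")
# RR = 0.60, RRR = 40%, ARR = 8%, NNT = 13
```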

85
Q

observational v. experimental study and subcategories of each

A

these are the 2 major types of clinical studies.
Experimental mostly refers to randomized controlled trials
Observational (just watching) includes cohorts, case-control, cross-sectional, and case reports

86
Q

How to interpret the OR or RR

A

RR (or OR) = 1: no difference
RR > 1: increased risk
RR < 1: decreased risk (protective)

87
Q

calculating odds

A

Odds = (probability of the event) / (probability of the NOT event)
or
(probability of the event) / (1 - probability of the event)

88
Q

difference in odds v. risk ratio

A

the denominator
Odds denominator = probability of the NOT event (or 1 - probability of the event), whereas
Risk denominator = total # in the group (those with the event + those without)

89
Q

Structure of the 7-character ICD-10 code

A

_ _ _ . _ _ _ _
Begins and ends with letters. First letter is the disease group (e.g. M = musculo sys, N = GU system). First 3 characters total represent category.
Next 3 = etiology, anatomic site, and severity respectively and the last one is an extension.

90
Q

What should patient-physician email not be used for, according to the AMA?

A

To establish a patient-physician relationship

91
Q

Which EHR received the highest ranking from users for its disease management features (in the survey of family physicians using EHRs)?

A

Praxis

92
Q

According to the O’Donnell article, among both physicians who use and do not use the copy and paste function (everyone), what was considered most problematic with copy and paste EHR function?

A

notes contain more inconsistent and more outdated information

93
Q

According to the O’Donnell article, what do the most physicians believe is best solution for problems w/ copy and paste function in EHR?

A

provide education for physicians regarding the copy/paste function use

94
Q

According to the Bryant article, what was observed about “alert fatigue”?

A

the number of alerts received did not correlate with physicians’ override rate

95
Q

According to the Bryant article, what was the rate of drug-drug alert overrides?

A

greater than 95% !!

96
Q

meta-analysis

A

summary study of previous trials to give us an overall result

97
Q

In screening for disease that has LOW prevalence, what will most positive tests be?

A

False positives
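Why low prevalence produces mostly false positives, sketched with Bayes' rule in Python. The 90% sensitivity and 95% specificity are hypothetical test characteristics; only prevalence changes between the two runs.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive tests that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 90% sensitive, 95% specific.
for prevalence in (0.30, 0.01):
    ppv = positive_predictive_value(prevalence, 0.90, 0.95)
    print(f"prevalence {prevalence:.0%}: {ppv:.0%} of positives are true positives")
# prevalence 30%: 89% of positives are true positives
# prevalence 1%: 15% of positives are true positives
```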

98
Q

What is the ‘implied promise’ of screening tests?

A

That the screening test IS beneficial and will do more good than harm

99
Q

When is the screening period?

A

Period between possible detection and occurrence of symptoms

100
Q

USPSTF rating for screening mammography before age 50

A

C, should be an individual decision

101
Q

Age group that has a B (v. C) rating for PSA screening in men

A

B: men 55-69 yo
Under 40 men = C
40-54 YO at average risk = C

102
Q

USPSTF recommendation for testicular cancer screening in ASYMPTOMATIC patients

A

D (recommend against)

103
Q

USPSTF guidelines for mammography screening in 40-49, 50-74, and >75 YO women

A
40-49 = C
50-74 = B
>75 = I (Insufficient evidence)
105
Q

USPSTF rating for lung cancer screening

A

B

106
Q

Name (4) D rating cancer screenings

A

Ovarian cancer
Pancreatic cancer
Prostate cancer
Testicular cancer

107
Q

Name (3) I-rated cancer screens

A

Bladder cancer
Oral cancer
Skin cancer prevention