Biostatistics Flashcards

1
Q

Descriptive statistics

A

the collection, organization, and summarization of data to describe its main features, without generalizing beyond the data observed

2
Q

Inferential statistics

A

drawing inferences about a body of data when only a part of the data is observed

3
Q

population

A

the entire collection of individuals or values we want to draw conclusions about; defined by our sphere of interest

4
Q

sample

A

subgroup or subset of the population

5
Q

parameter

A

a characteristic or measure obtained from a population

6
Q

statistic

A

a characteristic or measure obtained from a sample

7
Q

We compute _____ and use them to estimate _____.

A

We compute statistics and use them to estimate parameters.

8
Q

nominal scale

A

The lowest measurement scale.

Used for naming or labeling, not ordering.

Though numbers can be used, the relationships between the numbers are not meaningful.

Ex: Categorical and Dichotomous variables (Marital status, DL #, SSN)

9
Q

ordinal scale

A

observations are ranked; level of differences between ranks is unknown

Ex: Low, Medium, High; Likert-type scale

10
Q

interval scale

A

observations are ranked; level of differences between ranks is equal; scale is relative

No true zero point, so ratios are meaningless.

Ex: Temperature (F/C) or pH scales (0 does not equal absence of heat/acidity)

11
Q

ratio scale

A

observations are ranked; level of differences between ranks is equal;

a true zero point exists, so ratios are meaningful

Ex: height, length, Kelvin Temperature scale (defines 0K as absolute zero)

12
Q

Measures of disease frequency

A

count, ratio, proportion, rate

13
Q

count

A

number of cases of a disease or other health condition

Ex: dorm students with COVID-19

14
Q

proportion

A

measure that states a count relative to the size of the group;

numerator/denominator

Ex: dorm students with COVID-19 / all dorm students (the numerator is a subset of the denominator)

15
Q

ratio

A

one number divided by another number

numerator does not have to be a subset of the denominator

Ex: dorm students with COVID-19/dorm students with flu

16
Q

rate

A

similar to ratios and proportions, but includes a time component

Ex: % of dorm students with COVID-19 in 2020

17
Q

Descriptive Study Examples

A
  • case studies/reports
  • cross-sectional studies
  • ecological studies
18
Q

Analytical Study Examples

A
  • Case-control Studies
  • cohort studies
  • randomized controlled trials
19
Q

Cohort Study

A

Begin with a group of people who are disease free at baseline

Classify on exposure, follow over time, and identify incident cases

Measure of association: relative risk

Good for common diseases

20
Q

Case-Control Study

A

Compare diseased (cases) to disease free (controls)

Classify on disease status; collect exposure data retrospectively

Measure of association: odds ratio

Good for rare diseases

21
Q

RR or OR = 1

A

no association between exposure and outcome

22
Q

RR or OR > 1

A

exposure increases risk of the outcome

Positive (direct) association

23
Q

RR or OR < 1

A

exposure decreases risk of the outcome

Negative (inverse) association

24
Q

RR range

A

0 to infinity (a ratio of two risks cannot be negative; 1 means no association)

25
Q

When interpreting OR, begin with the _____

A

outcome

26
Q

When interpreting RR, begin with the _____

A

exposure

27
Q

Attributable risk

A

tells us how much of the disease that occurs can be attributed to a certain exposure

calculate among exposed individuals or an entire population

28
Q

background risk

A

the risk of disease among non-exposed people, which is not zero

Ex: some people who get lung cancer do not smoke

29
Q

Attributable risk formula

A

(incidence in exposed) - (incidence in unexposed)
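
As a minimal sketch of the formula above, the function below computes the risk difference from counts. The function name and the cohort numbers are hypothetical and used only for illustration.

```python
def attributable_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed minus incidence in the unexposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed  # background risk
    return incidence_exposed - incidence_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
ar = attributable_risk(30, 100, 10, 100)
print(round(ar, 2))  # 0.2
```

Here 20 cases per 100 exposed people would be attributed to the exposure, with the remaining 10 per 100 reflecting background risk.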

30
Q

simple random sample

A

enumerate all members of the population N

select n individuals at random (each has the same probability of being selected)

31
Q

systematic sampling

A
  1. start with sampling frame
  2. determine sampling interval (N/n)
  3. select the first person at random from the first (N/n) individuals, then every (N/n)th individual thereafter.
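
The three steps above can be sketched in a few lines. This is an illustration only: the function name, the frame of 100 IDs, and the use of integer division for the interval are assumptions, not part of the card.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick every k-th member of the sampling frame after a random start."""
    k = len(frame) // n        # sampling interval N/n (integer part)
    start = rng.randrange(k)   # random start within the first k members
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))    # hypothetical sampling frame of N = 100 IDs
sample = systematic_sample(frame, 10, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` instance makes the draw reproducible; in practice the start would be left random.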
32
Q

Stratified sampling

A

organize population into mutually exclusive strata, select individuals at random within each stratum

33
Q

binomial distribution

A
  • models # of events out of n observations
  • 2 possible outcomes: success or failure
  • replications of process are independent
  • P(success) is constant for each replication
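
Under the four conditions above, the probability of exactly k successes in n replications follows the standard binomial formula. The sketch below evaluates it; the function name and the numbers (n = 10, p = 0.1) are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 10 patients, each with a 10% chance of the event.
prob_two_events = binomial_pmf(2, 10, 0.1)
total = sum(binomial_pmf(k, 10, 0.1) for k in range(11))
print(round(prob_two_events, 4), round(total, 4))
```

The probabilities over k = 0..n sum to 1, which is a quick sanity check on any hand calculation.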
34
Q

normal distribution

A
μ = mean
σ = standard deviation

mean = median = mode and are located at the center of the distribution (not skewed)

area under the curve over a range of values = probability of an observation falling in that range

35
Q

2 statistical inference methods:

A
  1. Estimation
  2. Hypothesis Testing

36
Q

Estimation

A

sample statistics are used to generate estimates of the population parameter

37
Q

Hypothesis Testing

A

Sample statistics are analyzed to either support or reject the hypothesis about the parameter.

38
Q

Are statistics from different samples in the same population the same?

A

No, the sample mean of the second sample is likely to be different from the first sample mean.

39
Q

sampling distribution

A

the distribution of a statistic (e.g., the sample mean) across many repeated samples of the same size from the same population

40
Q

point estimate

A

the “best” single estimate of that parameter

41
Q

confidence interval

A

range of plausible values for the population parameter; carries a level of confidence

42
Q

confidence level

A

reflects how often intervals built this way contain the true, unknown parameter;

commonly 90%, 95%, or 99%

Ex: if we repeatedly draw samples from the same population and build a 95% Confidence Interval from each, about 95% of those intervals will cover the true parameter.

43
Q

As Confidence Level _____, Confidence Interval _____.

A

As Confidence Level increases, Confidence Interval widens.

44
Q

standard error

A

reflects the variability of the sampling distribution of the sample statistic

45
Q

estimated standard error formula

A

s / √n

s = sample std. dev.
n = sample size
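
A small sketch of this formula using only the Python standard library. The sample values are made up, and the second call simply repeats the same values four times to mimic a larger sample with a similar spread.

```python
from math import sqrt
from statistics import stdev

def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

sample = [4, 8, 6, 5, 7]                  # hypothetical measurements, n = 5
se = standard_error(sample)
se_larger = standard_error(sample * 4)    # same values repeated, n = 20
print(round(se, 4), round(se_larger, 4))
```

The larger sample yields a smaller standard error even though the spread of the values is similar.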
46
Q

As sample size _____ , standard error _____ .

A

As sample size increases, standard error decreases.

Small samples have large standard errors

47
Q

population standard deviation can be _____ by sample standard deviation.

A

replaced

48
Q

The midpoint of the Confidence Interval is _____.

A

the sample mean (the point estimate)

49
Q

margin of error formula

A

Z * s / √n

s = sample std. dev.
n = sample size
50
Q

Z reflects the critical value for _____.

A

confidence level

51
Q

Confidence interval formula

A

Sample mean ± Z * s / √n
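
The formula can be sketched with the standard library, taking Z from the standard normal distribution. The data are hypothetical; with a sample this small a t critical value would normally replace Z, so treat this purely as an illustration of the Z-based formula on the card.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(values, confidence=0.95):
    """Sample mean +/- Z * s / sqrt(n), with Z taken from the standard normal."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * stdev(values) / sqrt(len(values))      # margin of error
    center = mean(values)                               # midpoint of the interval
    return center - margin, center + margin

sample = [4, 8, 6, 5, 7]  # hypothetical measurements
low95, high95 = z_confidence_interval(sample, 0.95)
low99, high99 = z_confidence_interval(sample, 0.99)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99% interval comes out wider than the 95% interval, and both are centered on the sample mean.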

52
Q

null hypothesis (H0)

A

assumes nothing is going on (no effect, no difference); usually stated as an equality

53
Q

alternative hypothesis (HA)

A

the “research hypothesis”

reflects the researcher’s belief

54
Q

Hypothesis Testing: 2 Possible Conclusions

A
  1. Reject the null hypothesis
  2. Fail to reject the null hypothesis

55
Q

Hypothesis Testing: 2 Possible Hypotheses

A
  1. Null hypothesis
  2. Alternative hypothesis

56
Q

Hypothesis Testing Procedures

A
  1. Set up a null and research hypothesis
  2. Determine significance level - acceptable rate at which a Type I error can occur.
  3. Select test
  4. Compute test statistic
  5. Compute p-value
  6. Compare p-value to alpha
  7. Draw conclusion + summarize significance
57
Q

3 Choices for Hypothesis Statements

A
  1. Non-Directional (key word = difference); not equal
  2. Directional (key word = greater, more, positive direction); greater than
  3. Directional (key word = less, smaller, negative direction); less than
58
Q

Non-Directional hypothesis testing (Two-Tailed)

A

H0 : μ = x

HA : μ ≠ x

59
Q

Directional hypothesis testing (Right-Tailed)

A

H0 : μ = x

HA : μ > x

60
Q

Directional hypothesis testing (Left-Tailed)

A

H0 : μ = x

HA : μ < x

61
Q

Hypothesis Testing - Decision Making

A

If the test statistic is more extreme than the critical value (it falls in the rejection region), reject the null

62
Q

p-value

A

the probability of observing the obtained data (or more extreme values), assuming the null hypothesis is true

used to measure the significance of the test (is there enough evidence to reject H0?)
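
A minimal sketch of this idea for a two-tailed Z test: the p-value is the area in both tails beyond the observed statistic. The test statistic of 2.1 and alpha of 0.05 are hypothetical inputs.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a Z statistic at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05                     # significance level chosen in advance
p = two_tailed_p_value(2.1)      # hypothetical observed test statistic
decision = "reject H0" if p < alpha else "fail to reject H0"
print(round(p, 4), decision)
```

Because the p-value here is below alpha, the sketch rejects the null hypothesis; a statistic closer to zero would lead to "fail to reject".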

63
Q

_____ the null hypothesis if the p-value is _____ than the alpha level

A

Reject; lower

64
Q

Type I Error

A

(Alpha)

Reject a true null hypothesis

Often treated as the more serious type of error, which is why alpha is fixed before testing

65
Q

Type II Error

A

(Beta)

Fail to reject a false null hypothesis

66
Q

alpha

A

probability of making a Type I error

error rate

67
Q

beta

A

probability of making a Type II error

error rate

68
Q

power

A

1-beta

rate at which a test correctly rejects a false null hypothesis

69
Q

power is dependent on _____

A

effect size (along with sample size, alpha, and variability);

a larger effect size is detected more readily than a small one

70
Q

Small effect sizes may require _____ sample sizes

A

larger

71
Q

Chi Square test of independence

A

determines whether 2 categorical variables are independent or share an association

72
Q

Chi Square Test Statistic formula

A

X^2 = sum over all cells of (observed - expected)^2 / expected

73
Q

Expected value formula (Chi Square test for independence)

A

(row total * column total) / grand total

74
Q

Chi Square test of independence - Degrees of freedom formula

A

Df = (# of rows - 1) * (# of columns - 1)
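
The test statistic, expected value, and degrees of freedom formulas fit together as in the sketch below. The function name and the 2x2 table of counts are hypothetical; turning the statistic into a p-value would need a chi-square distribution table or a statistics library, which is left out here.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = exposed / unexposed, columns = disease / no disease.
table = [[20, 30], [30, 20]]
chi2, df = chi_square_independence(table)
print(chi2, df)  # 4.0 1
```

Every expected count in this table is 25, so each cell contributes (5^2)/25 = 1 to the statistic.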

75
Q

2 Independent Sample T Test

A

tests for a difference between the means of 2 independent (unrelated) populations on a continuous outcome

population variance is unknown

76
Q

ANOVA F-Test

A

determines whether or not the means of more than 2 populations are statistically different

77
Q

Hypothesis Testing is only for _____.

A

population parameters

78
Q

correlation

A

measures the strength and direction of the linear relationship between 2 continuous variables;

closely related to simple linear regression (testing r = 0 is equivalent to testing slope = 0)

79
Q

regression

A

estimates the value of one continuous variable corresponding to a given value of another variable

80
Q

Correlation Coefficient

A

r;

measures the strength of the linear relationship between x & y

81
Q

correlation coefficient range

A

-1 to +1

82
Q

Correlation coefficient sign

A

indicates nature of relationships

positive=direct; negative=inverse

83
Q

r^2

A

proportion of variation in Y explained by the predictor variable(s)

ranges from 0 (explains none of the variation) to 1 (explains all of the variation)

Higher values indicate a better fit

84
Q

Simple linear regression formula

A

Y = β0 + β1X + error

Y = dependent/outcome variable
X = independent/predictor variable
β0 = intercept
β1 = slope
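
As an illustration of how the intercept and slope are estimated, the sketch below fits a line by ordinary least squares and uses it for a prediction. The BMI and systolic blood pressure values are invented and chosen to lie exactly on a line so the arithmetic is easy to follow.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (intercept b0, slope b1) for Y = b0 + b1*X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

bmi = [18, 22, 26, 30]        # hypothetical predictor values (X)
sbp = [110, 118, 126, 134]    # hypothetical outcome values (Y)
b0, b1 = fit_line(bmi, sbp)
predicted_sbp = b0 + b1 * 20  # expected SBP at BMI = 20 under this fit
print(b0, b1, predicted_sbp)
```

With these numbers the slope is 2, meaning each 1-unit increase in BMI goes with a 2-unit increase in expected SBP.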
85
Q

linear regression example

A

What is the expected Systolic BP for a male with BMI=20?

Y = SBP; X = BMI

86
Q

scatterplot

A

helps to visualize relationships in bivariate data

87
Q

r = 0.4. What percent of the variation is explained?

A

r^2 = 0.4^2 = 0.16, i.e., 16%
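
For reference, here is a sketch of how r itself is computed from paired data before squaring it. The data points are hypothetical and the function is the textbook Pearson formula, written out rather than taken from a library.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired values."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]          # hypothetical paired observations
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
r_squared = r ** 2
print(round(r, 2), round(r_squared, 2))  # 0.8 0.64
```

In this example r = 0.8, so about 64% of the variation in Y is explained by X.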

88
Q

bar plot

A

for categorical data

89
Q

histograms

A

for continuous and ordinal data

90
Q

box (and whisker) plots

A

for continuous data possibly with outliers or skewed data

91
Q

categorical variable

A

fixed # of outcomes (nominal scale)

2 possible outcomes = Dichotomous variable

92
Q

ordinal variable

A

fixed number of outcomes with an inherent order

ordinal scale

93
Q

continuous variable

A

outcome (interval or ratio) may be any numerical value between a defined minimum and maximum

E.g. GPA is any # between 0.0 and 4.0

94
Q

Summarizing categorical or ordinal variables

A
  1. use frequencies (counts of categories)
  2. Use relative frequencies (percentages of categories)
  3. present in table format
  4. graph in a bar chart
95
Q

Summarizing continuous variables

A
  1. Central tendency: sample mean (X bar), median (2nd Quartile), mode
  2. Variability: sample std dev, variance, range, or Interquartile range (3rd - 1st quartile)
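
These summaries are all available in Python's standard `statistics` module; the sketch below applies them to a made-up data set. Note that quartile conventions differ between textbooks and software, so the default method used by `quantiles` is one choice among several.

```python
from statistics import mean, median, mode, quantiles, stdev, variance

data = [2, 4, 4, 5, 7, 9, 11]        # hypothetical continuous measurements

# Central tendency
center = {"mean": mean(data), "median": median(data), "mode": mode(data)}

# Variability
q1, q2, q3 = quantiles(data, n=4)    # quartiles (default "exclusive" method)
spread = {
    "std dev": stdev(data),          # original units
    "variance": variance(data),      # squared units
    "range": max(data) - min(data),
    "IQR": q3 - q1,                  # 3rd - 1st quartile
}
print(center)
print(spread)
```

The 2nd quartile returned by `quantiles` matches the median, as the card states.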
96
Q

sample standard deviation

A

(s) spread from mean in original units

97
Q

variance

A

(s^2) spread from mean in squared units

98
Q

Interquartile Range

A

3rd - 1st Quartiles

99
Q

Variability

A

how spread out are values in the population?

100
Q

Histograms

A

graphical representation of the distribution of (continuous or ordinal) data

shape reflects distribution type, which determines which numerical summary to use

101
Q

Normal distribution shape

A

more observations in the middle

mean = median = mode

symmetric about the mean; area to the left/right = 0.5

102
Q

Positive skew

A

more observations on the left, tail to the right

mean > median

103
Q

Negative skew

A

more observations to the right, tail to the left

mean < median

104
Q

Graphing skewed data

A

use box (and whisker) plot

shows sample minimum (left whisker) + maximum (right whisker)

1st Quartile (left edge of box);
2nd Quartile (middle of box = median);
3rd Quartile (right edge of box)

105
Q

Percentile

A

the kth percentile is the value below which k% of all values fall:

Scoring in the 90th percentile = scoring better than 90% of people who took the exam

106
Q

Normal Distribution 68/95/99.7 Rule

A
  • 68% of population within 1 standard deviation of mean
  • 95% of population within 2 standard deviations of mean
  • 99.7% of population within 3 standard deviations of mean
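
The coverage figures in this rule can be checked against the standard normal distribution. The sketch below is only a numerical check using the standard library; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: round(share_within(k), 4) for k in (1, 2, 3)}
print(shares)
```

The computed shares are roughly 0.68, 0.95, and 0.997, so the rule's percentages are rounded values.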

107
Q

Z score formula

A

Z = (X - mean)/Std dev

transform any normal value into a standard value

108
Q

Two Sample Z Test

A
  • tests whether there is a difference in population means between two groups
  • population variance is known

109
Q

Chi Square Goodness of Fit

A

Does the sample come from a hypothesized distribution?

for continuous data: divide data into intervals, then apply test

110
Q

For continuous independent and dependent variables use _____ (measure of association).

A

correlation

111
Q

For dichotomous independent and dependent variables use _____ (measure of association).

A

relative risk -or- odds ratio

112
Q

relative risk (RR)

A

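risk of getting the disease with the risk factor compared to the risk of getting the disease without the risk factor

(a/(a+b))/(c/(c+d))

2x2 table: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease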

113
Q

odds ratio (OR)

A

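ratio of the odds of having the disease with the risk factor compared to the odds of having the disease without the risk factor

(a/b)/(c/d) -or- ad/bc (equivalently, the exposure odds ratio (a/c)/(b/d))

2x2 table: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease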
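
Both measures can be computed from the same 2x2 table. In the sketch below the cell layout (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease) is the usual textbook convention, and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the disease.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(round(rr, 2), round(odds, 2))  # 3.0 3.86
```

Both values are above 1, indicating a positive association; the odds ratio is further from 1 than the relative risk because the disease is not rare in this made-up table.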

114
Q

If the value 1 is included within confidence interval, then the OR or RR is _____. Otherwise it is _____.

A

not significant; significant

115
Q

Simple linear regression

A

Models the relationship between independent (X) and dependent (Y) variables;

Dependent (Y) variable must be continuous

116
Q

When X increases by _____ unit, Y changes by _____.

A

1 unit; B1 (slope)

117
Q

If B1 > 0 then X and Y are _____ proportional and variables have _____ association

A

directly; positive

118
Q

If B1 < 0 then X and Y are _____ proportional and variables have _____ association

A

inversely; negative

119
Q

If B1 = 0 then X and Y are _____ and variables are _____.

A

not related; not related

120
Q

logistic regression

A

used when dependent (Y) variable is dichotomous

Ex: Someone has the disease or not

121
Q

e^B1 = ____

A

odds ratio when X increases by 1 unit
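
To see why exponentiating the slope gives an odds ratio, the sketch below compares the odds at X = 3 and X = 2 under a logistic model. The coefficient values are hypothetical and are not estimated from any data.

```python
from math import exp

def logistic_probability(b0, b1, x):
    """P(Y = 1) under a logistic model with intercept b0 and slope b1."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1 - p)

b0, b1 = -2.0, 0.5   # hypothetical coefficients
ratio = odds(logistic_probability(b0, b1, 3)) / odds(logistic_probability(b0, b1, 2))
print(round(ratio, 4), round(exp(b1), 4))
```

The two printed values agree: a 1-unit increase in X multiplies the odds by e^B1, whatever the starting value of X.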

122
Q

multiple regression

A

models the relationship between dependent (Y) and independent (X) variables while also considering other variables that may affect the relationship (e.g. confounders)

more than 1 independent (X) variable

123
Q

survival analysis

A

collection of statistical procedures used when the outcome is the time until an event

From the time we start to observe, when does the event occur?

goal: analyze survival experience of a population of interest

124
Q

Survival analysis - time

A

measure of time from the beginning of follow-up until the event for an individual

e.g. days, weeks, months, years

125
Q

Survival analysis - event

A

occurrence of interest

e.g. death, disease incidence, relapse, recovery

126
Q

survival analysis - censoring + 3 reasons

A

exact survival time is unknown

three reasons

  1. study ends before an individual experiences event
  2. individual is lost to follow-up during the study
  3. individual is withdrawn from the study (e.g. death before event of interest occurs).
127
Q

3 types of censoring

A
  1. right censored
  2. left censored
  3. interval censored
128
Q

right censored data

A

we know when survival time starts, but not when or if event occurs

129
Q

left censored data

A

start of survival period is unknown

E.g. survival time of an HIV patient begins at infection, but the patient may not enter the study until testing positive

130
Q

interval censored data

A

the event is known to occur within an interval, but its exact time is unknown

occurs in studies where subjects are not monitored continuously

131
Q

survival function/curve

A

gives the probability of surviving (not yet experiencing the event) past a given time; in theory, continuous and smooth

Common application is to compare survival functions of two groups

132
Q

Kaplan Meier estimator

A

method used to practically visualize survival curves for a study

estimated as a step function

each step down = at least 1 event occurred at that time

does not usually decrease to 0, because not everyone will experience the event during the study
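
A bare-bones sketch of the estimator, assuming right-censored data only: at each event time, the running survival probability is multiplied by (1 - events / number still at risk). The follow-up times and event flags are invented, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival)] at each distinct event time.

    events[i] is True if the event was observed at times[i],
    False if the subject was right censored at that time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, observed in zip(times, events) if observed})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        survival *= 1 - n_events / at_risk   # one step down in the curve
        curve.append((t, survival))
    return curve

times = [2, 3, 3, 5, 8]                      # hypothetical follow-up times
events = [True, True, False, True, False]    # False = censored
curve = kaplan_meier(times, events)
print([(t, round(s, 2)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

The curve steps down only at event times and stops at 0.3 rather than 0, because the last subject was censored without experiencing the event.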

133
Q

log rank test

A

if test rejects, the survival curves are significantly different;

works for 2+ groups

does not tell you which is better (compare the curves visually or compare median survival times)

134
Q

reliability

A
  • Consistency of measures
  • Are similar results produced under similar conditions?
  • Internal consistency is often assessed with Cronbach’s alpha
  • high reliability does not mean high validity (accuracy)
135
Q

Cronbach’s alpha

A

an indicator of internal consistency

ranges from 0 to 1

higher values = higher internal consistency

136
Q

Validity

A
  • Accuracy of a measure
  • Does the result actually reflect the true measure?
  • Often difficult to know if a measure is valid
137
Q

confounding

A

extraneous variable that distorts the true effect of the independent variable (exposure) on the dependent variable (outcome)

138
Q

Ways to control confounding

A
  1. Stratification (single confounder)
  2. Regression (multiple confounders)

139
Q

Stratification

A

conduct separate analysis for each level of a confounding variable

140
Q

Effect Modification

A

the effect of an independent variable (X) on the dependent variable (Y) differs depending on the level of the third variable

141
Q

Poisson distribution

A

models # of events when the number of observations is (in theory) infinite, so counting individual trials as in the binomial is not practical

use when the event is rare or when modeling # of events over space or time

142
Q

Increasing sample size _____ variability of the estimate.

A

decreases