Biostatistics Flashcards

(142 cards)

1
Q

Descriptive statistics

A

the collection, organization, summarization, and analysis of data

2
Q

Inferential statistics

A

drawing inferences about a body of data when only a part of the data is observed

3
Q

population

A

the entire group of individuals or items defined by a sphere of interest

4
Q

sample

A

subgroup or subset of the population

5
Q

parameter

A

a characteristic or measure obtained from a population

6
Q

statistic

A

a characteristic or measure obtained from a sample

7
Q

We compute _____ and use them to estimate _____.

A

We compute statistics and use them to estimate parameters.

8
Q

nominal scale

A

The lowest measurement scale.

Used for naming or labeling, not ordering.

Though numbers can be used, the relationships between the numbers are not meaningful.

Ex: Categorical and Dichotomous variables (Marital status, DL #, SSN)

9
Q

ordinal scale

A

observations are ranked; level of differences between ranks is unknown

Ex: Low, Medium, High; Likert-type scale

10
Q

interval scale

A

observations are ranked; level of differences between ranks is equal; scale is relative

No true zero point, so ratios are meaningless.

Ex: Temperature (F/C) or pH scales (0 does not equal absence of heat/acidity)

11
Q

ratio scale

A

observations are ranked; level of differences between ranks is equal;

a true zero point exists

Ex: height, length, Kelvin Temperature scale (defines 0K as absolute zero)

12
Q

Measures of disease frequency

A

count, ratio, proportion, rate

13
Q

count

A

number of cases of a disease or other health condition

Ex: dorm students with COVID-19

14
Q

proportion

A

measure that states a count relative to the size of the group;

numerator/denominator

Ex: dorm students with COVID-19/all students

15
Q

ratio

A

one number divided by another number

numerator does not have to be a subset of the denominator

Ex: dorm students with COVID-19/dorm students with flu

16
Q

rate

A

similar to ratios and proportions, but includes a time component

Ex: % of dorm students with COVID-19 in 2020

17
Q

Descriptive Study Examples

A
  • case studies/reports
  • cross-sectional studies
  • ecological studies
18
Q

Analytical Study Examples

A
  • Case-control Studies
  • cohort studies
  • randomized controlled trials
19
Q

Cohort Study

A

begin with a group of people who are disease free at baseline

Follow over time and classify on exposure; identify incident cases

Measure of association (MOA): Relative risk

Good for common diseases

20
Q

Case-Control Study

A

Compare Diseased (cases) to Disease free (controls)

Classify on disease status; collect exposure data retrospectively

MOA: Odds ratio

Good for rare diseases

21
Q

RR or OR = 1

A

no association between exposure and outcome

22
Q

RR or OR > 1

A

exposure increases risk of the outcome

Positive (direct) association

23
Q

RR or OR < 1

A

exposure decreases risk of the outcome

Negative (inverse) association

24
Q

RR range

A

0 to infinity (a ratio of two risks cannot be negative; 1 means no association)

25
When interpreting OR, begin with the _____
outcome
26
When interpreting RR, begin with the _____
exposure
27
Attributable risk
tells us how much of the disease that occurs can be attributed to a certain exposure; calculated among exposed individuals or for an entire population
28
background risk
the risk among non-exposed people is not zero. Ex: some people who get lung cancer do not smoke
29
Attributable risk formula
(incidence in exposed) - (incidence in unexposed)
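The subtraction above can be sketched in a few lines of Python. The function name and the incidence figures are invented for illustration and do not come from the cards.

```python
def attributable_risk(incidence_exposed: float, incidence_unexposed: float) -> float:
    """Excess incidence among the exposed beyond the background risk."""
    return incidence_exposed - incidence_unexposed

# Hypothetical figures: 30 cases per 1,000 exposed, 10 per 1,000 unexposed.
ar = attributable_risk(30 / 1000, 10 / 1000)
print(f"attributable risk = {ar * 1000:.0f} per 1,000 exposed")
```

The unexposed incidence plays the role of the background risk from the previous card.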
30
simple random sample
enumerate all N members of the population; select n individuals at random (each has the same probability of being selected)
31
systematic sampling
1. start with a sampling frame
2. determine the sampling interval (N/n)
3. select the first person at random from the first N/n individuals, then every (N/n)th person thereafter
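A minimal Python sketch of these three steps, assuming the sampling frame is a list and that N/n is rounded down to a whole number. The function name and the frame of 100 IDs are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start in the first interval, then take every k-th member."""
    k = len(frame) // n          # sampling interval N/n (integer part, an assumption)
    start = rng.randrange(k)     # random position among the first k members
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))      # hypothetical sampling frame of N = 100 IDs
sample = systematic_sample(frame, 10, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` only makes the draw repeatable; omit it to use the module-level generator.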
32
Stratified sampling
organize population into mutually exclusive strata, select individuals at random within each stratum
33
binomial distribution
  • models # of events out of n observations
  • 2 possible outcomes: success or failure
  • replications of the process are independent
  • P(success) is constant for each replication
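Under these conditions, the probability of exactly k successes in n replications is C(n, k) * p^k * (1 - p)^(n - k). That formula is standard but is not stated on the card, and the values n = 10 and p = 0.2 below are made up for the sketch.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with constant P(success) = p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 observations, P(success) = 0.2 on each.
probs = [binomial_pmf(k, 10, 0.2) for k in range(11)]
print(f"P(0 events) = {probs[0]:.4f}, P(2 events) = {probs[2]:.4f}")
```

The probabilities over k = 0..n sum to 1, which is a handy sanity check.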
34
normal distribution
m = mean; s = standard deviation. Mean = median = mode, all located at the center of the distribution (not skewed). Area under the curve = probability of observation.
35
2 statistical inference methods:
1. Estimation | 2. Hypothesis Testing
36
Estimation
sample statistics are used to generate estimates of the population parameter
37
Hypothesis Testing
Sample statistics are analyzed to either support or reject the hypothesis about the parameter.
38
Are statistics from different samples in the same population the same?
No, the sample mean of the second sample is likely to be different from the first sample mean.
39
sampling distribution
the distribution of a statistic (e.g. the sample mean) across many samples drawn from the same population
40
point estimate
the "best" single-value estimate of a population parameter
41
confidence interval
range of plausible values for the population parameter; carries a level of confidence
42
confidence level
reflects the likelihood that the confidence interval contains the true, unknown parameter; commonly 90%, 95%, or 99%. If we repeatedly generate 95% confidence intervals from samples of the same population, about 95% of those intervals will cover the true parameter.
43
As Confidence Level _____, Confidence Interval _____.
As Confidence Level increases, Confidence Interval widens.
44
standard error
reflects the variability of the sampling distribution of the sample statistic
45
estimated standard error formula
s / √n, where s = sample std. dev. and n = sample size
46
As sample size _____ , standard error _____ .
As sample size increases, standard error decreases. Small samples have large standard errors.
47
population standard deviation can be _____ by sample standard deviation.
replaced
48
The midpoint of the Confidence Interval is _____.
the sample mean (the point estimate)
49
margin of error formula
Z * s / √n, where s = sample std. dev. and n = sample size
50
Z reflects the critical value for _____.
confidence level
51
Confidence interval formula
Sample mean ± Z * s / √n
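The formula can be tried out with Python's standard library. This is only a sketch: the blood-pressure readings are invented, and it uses the Z critical value as on the card, although for a sample this small a t critical value would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(data, level=0.95):
    """Sample mean +/- Z * s / sqrt(n), with Z taken from the standard normal."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value for the level
    margin = z * stdev(data) / sqrt(len(data))      # margin of error
    centre = mean(data)                             # point estimate (midpoint)
    return centre - margin, centre + margin

readings = [120, 118, 130, 125, 122, 128, 119, 124]  # hypothetical sample
low95, high95 = confidence_interval(readings, 0.95)
low99, high99 = confidence_interval(readings, 0.99)
print(f"95% CI: ({low95:.1f}, {high95:.1f})")
print(f"99% CI: ({low99:.1f}, {high99:.1f})")
```

The 99% interval comes out wider than the 95% one, matching the card on confidence level and interval width.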
52
null hypothesis (H0)
assumes nothing is going on, usually carries equality
53
alternative hypothesis (HA)
the "research hypothesis"; reflects the researcher's belief
54
Hypothesis Testing: 2 Possible Conclusions
1. Reject the null hypothesis | 2. Fail to reject the null hypothesis
55
Hypothesis Testing: 2 Possible Hypotheses
1. null hypothesis | 2. Alternative hypothesis
56
Hypothesis Testing Procedures
1. Set up a null and a research hypothesis
2. Determine the significance level (alpha): the acceptable rate at which a Type I error can occur
3. Select the test
4. Compute the test statistic
5. Compute the p-value
6. Compare the p-value to alpha
7. Draw a conclusion and summarize significance
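To make the procedure concrete, here is a rough Python sketch that picks a two-tailed one-sample Z test for step 3. That choice, the function name, and all numbers (hypothesized mean 100, known standard deviation 15, sample of 100 with mean 103) are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed test of H0: mu = mu0 against HA: mu != mu0, sigma assumed known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))      # step 4: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # step 5: two-tailed p-value
    reject = p_value < alpha                         # step 6: compare to alpha
    return z, p_value, reject

z, p_value, reject = one_sample_z_test(sample_mean=103, mu0=100, sigma=15, n=100)
verdict = "reject H0" if reject else "fail to reject H0"   # step 7: conclusion
print(f"z = {z:.2f}, p = {p_value:.4f}, {verdict}")
```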
57
3 Choices for Hypothesis Statements
1. Non-Directional (key word = difference); not equal
2. Directional (key word = greater, more, positive direction); greater than
3. Directional (key word = less, smaller, negative direction); less than
58
Non-Directional hypothesis testing (Two-Tailed)
H0 : μ = x | HA : μ ≠ x
59
Directional hypothesis testing (Right-Tailed)
H0 : μ = x | HA : μ > x
60
Directional hypothesis testing (Left-Tailed)
H0 : μ = x | HA : μ < x
61
Hypothesis Testing - Decision Making
If the test statistic is more extreme than the critical value (e.g. greater than it in a right-tailed test), reject the null hypothesis
62
p-value
the probability of observing the obtained data (or more extreme values) given that the null hypothesis is true; used to measure the significance of the test (is there enough evidence to reject H0?)
63
_____ the null hypothesis if the p-value is _____ than the alpha level
Reject; lower
64
Type I Error
(Alpha) Rejecting a true null hypothesis; most dangerous type of error
65
Type II Error
(Beta) Fail to reject a false null hypothesis
66
alpha
probability of making a Type I error (the Type I error rate)
67
beta
probability of making a Type II error (the Type II error rate)
68
power
1 - beta; the rate at which a test correctly rejects a false null hypothesis
69
power is dependent on _____
effect size; a larger effect size can be detected more readily than a smaller one
70
Small effect sizes may require _____ sample sizes
larger
71
Chi Square test of independence
determines whether 2 categorical variables are independent or share an association
72
Chi Square Test Statistic formula
X^2 = sum over all cells of (observed - expected)^2 / expected
73
Expected value formula (Chi Square test for independence)
(column total * row total) / grand total
74
Chi Square test of independence - Degrees of freedom formula
Df = (# of rows - 1) * (# of columns - 1)
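The three chi-square formulas on the preceding cards (expected value, test statistic, degrees of freedom) fit together as in the sketch below. The 2 x 2 table of counts is invented, and the function name is hypothetical.

```python
def chi_square_independence(table):
    """Return (test statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows = exposed / unexposed, columns = disease / no disease.
observed = [[20, 30],
            [30, 20]]
statistic, df = chi_square_independence(observed)
print(f"X^2 = {statistic:.2f}, df = {df}")
```

For df = 1 the critical value at alpha = 0.05 is about 3.84, so a statistic above that would lead to rejecting independence.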
75
2 Independent Sample T Test
tests the difference between the means of 2 unrelated populations for a continuous outcome; population variance is unknown
76
ANOVA F-Test
determines whether or not the means of more than 2 populations are statistically different
77
Hypothesis Testing is only for _____.
population parameters
78
correlation
measures the strength of the linear relationship between 2 continuous variables; closely related to simple linear regression (r and the slope always share the same sign)
79
regression
estimates the value of one continuous variable corresponding to a given value of another variable
80
Correlation Coefficient
r; measures the strength of the linear relationship between x & y
81
correlation coefficient range
-1 to +1
82
Correlation coefficient sign
indicates the nature of the relationship: positive = direct; negative = inverse
83
r^2
proportion of variation in the outcome attributed to the predictor variables; ranges from 0 (explains little variation) to 1 (explains a lot of variation); higher is better
84
Simple linear regression formula
Y = β0 + β1X + error, where Y = dependent/outcome variable, X = independent/predictor variable, β0 = intercept, β1 = slope
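One common way to estimate β0 and β1 is ordinary least squares; the cards do not name an estimation method, so treat that as an assumption. The BMI and systolic BP values below are fabricated so that the fitted line is easy to check by hand.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: change in Y per 1-unit increase in X
    b0 = y_bar - b1 * x_bar      # intercept: expected Y when X = 0
    return b0, b1

bmi = [18, 22, 26, 30]           # hypothetical X values
sbp = [110, 118, 126, 134]       # hypothetical Y values
b0, b1 = fit_line(bmi, sbp)
predicted = b0 + b1 * 20         # expected SBP at BMI = 20
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, predicted SBP = {predicted:.1f}")
```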
85
linear regression example
What is the expected Systolic BP for a male with BMI=20? Y = SBP; X = BMI
86
scatterplot
helps to visualize relationships in bivariate data
87
r = 0.4. What is the percent variation?
r^2 = 0.4^2 = 0.16; 0.16 x 100 = 16%
88
bar plot
for categorical data
89
histograms
for continuous and ordinal data
90
box (and whisker) plots
for continuous data, especially skewed data or data with outliers
91
categorical variable
fixed # of outcomes (nominal scale); 2 possible outcomes = dichotomous variable
92
ordinal variable
fixed number of outcomes with an inherent order (ordinal scale)
93
continuous variable
outcome (interval or ratio) may be any numerical value between a defined minimum and maximum. E.g. GPA is any # between 0.0 and 4.0
94
Summarizing categorical or ordinal variables
1. Use frequencies (counts of categories)
2. Use relative frequencies (percentages of categories)
3. Present in table format
4. Graph in a bar chart
95
Summarizing continuous variables
1. Central tendency: sample mean (X bar), median (2nd quartile), mode
2. Variability: sample std. dev., variance, range, or interquartile range (3rd - 1st quartile)
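These summaries are all available in Python's `statistics` module. The sketch below uses a made-up data set; note that quartiles can be computed by several conventions, and `statistics.quantiles` defaults to its "exclusive" method.

```python
from statistics import mean, median, mode, quantiles, stdev, variance

data = [2, 4, 4, 5, 7, 9, 11]          # hypothetical continuous measurements

# Central tendency
print("mean:", mean(data), "median:", median(data), "mode:", mode(data))

# Variability
q1, q2, q3 = quantiles(data, n=4)      # 1st, 2nd (median), 3rd quartiles
iqr = q3 - q1
print("std dev:", round(stdev(data), 3), "variance:", variance(data))
print("range:", max(data) - min(data), "IQR:", iqr)
```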
96
sample standard deviation
(s) spread from mean in original units
97
variance
(s^2) spread from mean in squared units
98
Interquartile Range
3rd - 1st Quartiles
99
Variability
how spread out are values in the population?
100
Histograms
graphical representation of the distribution of (continuous or ordinal) data; the shape reflects the distribution type, which determines which numerical summary to use
101
Normal distribution shape
more observations in the middle; mean = median = mode; symmetric about the mean; area to the left/right = 0.5
102
Positive skew
more observations to the left, tail to the right | mean > median
103
Negative skew
more observations to the right, tail to the left | mean < median
104
Graphing skewed data
use a box (and whisker) plot; shows the sample minimum (left whisker) and maximum (right whisker), 1st quartile (left edge of box), 2nd quartile (middle of box = median), and 3rd quartile (right edge of box)
105
Percentile
the kth percentile is the value below which k% of all values fall. Ex: scoring in the 90th percentile = scoring better than 90% of people who took the exam
106
Normal Distribution 68/95/99.7 Rule
about 68% of the population is within 1 standard deviation of the mean; about 95% is within 2 standard deviations; about 99.7% is within 3 standard deviations
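The percentages in this rule can be checked numerically from the standard normal distribution. A small sketch using only the standard library (the function name is hypothetical):

```python
from statistics import NormalDist

def share_within(k: float) -> float:
    """Proportion of a normal population within k standard deviations of the mean."""
    standard_normal = NormalDist()     # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```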
107
Z score formula
Z = (X - mean) / std. dev.; transforms any normal value into a standard value
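As a worked example of the transformation, the sketch below standardizes one value and looks up the share of a normal population falling below it. The score of 130 and the population mean 100 and standard deviation 15 are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z = z_score(130, 100, 15)              # hypothetical value and population
share_below = NormalDist().cdf(z)      # proportion of the population below x
print(f"z = {z:.1f}, share below = {share_below:.3f}")
```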
108
Two Sample Z Test
used to test whether population means differ between two groups; population variance is known
109
Chi Square Goodness of Fit
Does the sample come from a hypothesized distribution? For continuous data: divide the data into intervals, then apply the test
110
For continuous independent and dependent variables use _____ (measure of association).
correlation
111
For dichotomous independent and dependent variables use _____ (measure of association).
relative risk -or- odds ratio
112
relative risk (RR)
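risk of getting the disease with the risk factor compared to the risk of getting the disease without the risk factor; RR = (a/(a+b))/(c/(c+d)), where a and b are exposed people with and without the disease, and c and d are unexposed people with and without the disease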
113
odds ratio (OR)
ratio of the odds of having the disease with the risk factor to the odds of having the disease without the risk factor; OR = (a/b)/(c/d), equivalently (a/c)/(b/d), which both simplify to ad/bc
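Both measures can be computed from a 2 x 2 table. The sketch assumes the usual cell layout implied by the RR formula (a = exposed with disease, b = exposed without, c = unexposed with disease, d = unexposed without); the counts are invented.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of disease in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop disease.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

Here the OR is further from 1 than the RR because the outcome is fairly common in this made-up table; the two get closer as the disease becomes rarer.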
114
If the value 1 is included within confidence interval, then the OR or RR is _____. Otherwise it is _____.
not significant; significant
115
Simple linear regression
Models the relationship between independent (X) and dependent (Y) variables; Dependent (Y) variable must be continuous
116
When X increases by _____ unit, Y changes by _____.
1 unit; B1 (slope)
117
If B1 > 0 then X and Y are _____ proportional and variables have _____ association
directly; positive
118
If B1 < 0 then X and Y are _____ proportional and variables have _____ association
inversely; negative
119
If B1 = 0 then X and Y are _____ and variables are _____.
not related; not related
120
logistic regression
used when the dependent (Y) variable is dichotomous. Ex: someone has the disease or not
121
e^B1 = ____
odds ratio when X increases by 1 unit
122
multiple regression
models the relationship between dependent (Y) and independent (X) variables while also considering other variables that may affect the relationship (e.g. confounders); more than 1 independent (X) variable
123
survival analysis
collection of statistical procedures used when the outcome is time until an event: from the time we start to observe, when does the event occur? Goal: analyze the survival experience of a population of interest
124
Survival analysis - time
measure of time from the beginning of follow-up until the event for an individual, e.g. days, weeks, months, years
125
Survival analysis - event
occurrence of interest, e.g. death, disease incidence, relapse, recovery
126
survival analysis - censoring + 3 reasons
exact survival time is unknown; three reasons:
1. study ends before an individual experiences the event
2. individual is lost to follow-up during the study
3. individual is withdrawn from the study (e.g. death before the event of interest occurs)
127
3 types of censoring
1. right censored 2. left censored 3. interval censored
128
right censored data
we know when survival time starts, but not when or if event occurs
129
left censored data
start of the survival period is unknown. E.g. the survival time of an HIV patient begins at infection, but the patient may not enter the study until testing positive
130
interval censored data
the exact time of the event is unknown but falls within a known interval; occurs in studies where subjects are not monitored continuously
131
survival function/curve
in theory, continuous and smooth; a common application is to compare the survival functions of two groups
132
Kaplan Meier estimator
method used to visualize survival curves for a study in practice; estimated as a step function (1 step down = 1 event occurred); does not usually decrease to 0, since not everyone will experience the event during the study
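The step-function behaviour can be seen in a bare-bones version of the estimator. This sketch uses the standard product-limit form, multiplying by (1 - events / number at risk) at each event time, which the card does not spell out; the follow-up times and censoring flags are invented.

```python
def kaplan_meier(times, events):
    """Return [(event time, estimated survival probability)] as a step function.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if the subject was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for current in event_times:
        at_risk = sum(1 for t in times if t >= current)
        n_events = sum(1 for t, e in zip(times, events) if t == current and e)
        survival *= 1 - n_events / at_risk     # step down at each event time
        curve.append((current, survival))
    return curve

# Hypothetical study: 5 subjects, two of them censored (event flag 0).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

Because the longest-followed subject is censored, the estimated curve stops above 0, as the card describes.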
133
log rank test
if the test rejects, the survival curves are significantly different; works for 2+ groups; does not tell you which is better (visually compare or compare means)
134
reliability
  • Consistency of measures
  • Are similar results produced under similar conditions?
  • Uses Cronbach's alpha
  • High reliability does not mean high validity (accuracy)
135
Cronbach's alpha
an indicator of internal consistency; ranges from 0 to 1; higher values = higher internal consistency
136
Validity
  • Accuracy of a measure
  • Does the result actually reflect the true measure?
  • Often difficult to know if a measure is valid
137
confounding
extraneous variable that distorts the true effect of the independent variable (exposure) on the dependent variable (outcome)
138
Ways to control confounding
1. Stratification (single confounder) | 2. Regression (multiple confounders)
139
Stratification
conduct separate analysis for each level of a confounding variable
140
Effect Modification
the effect of an independent variable (X) on the dependent variable (Y) differs depending on the level of the third variable
141
Poisson distribution
models # of events out of an infinite number of observations (in theory, not in practice); use when the event is rare or when modeling # of events over space or time
142
Increasing sample size _____ variability of the estimate.
decreases