
1
Q

Biostatisticians prefer type ______ errors to type _____ errors.

A

Biostatisticians prefer type II errors to type I errors.

2
Q

Failing to reject the _______ hypothesis (a Type II error if that hypothesis is actually false) is a more conservative conclusion than rejecting it, because it errs on the side of caution and can usually be rectified with further studies.

A

Null

3
Q

What is the symbol for sample standard deviation?

A

s

4
Q

What is the symbol for population standard deviation?

A

σ

5
Q

The higher the value of n (sample size), and thus the higher the degrees of freedom, the more closely a ___ distribution resembles a ___ distribution.

A

The higher the value of n (sample size), and thus the higher the degrees of freedom, the more closely a t distribution resembles a z distribution.

6
Q

At around an n of ___, the t distribution is close enough to the z distribution that the choice between them makes little practical difference.

A

30

7
Q

What is the formula for degrees of freedom for a one-sample t-test?

A

n - 1 (where n = sample size)

8
Q

In hypothesis testing, rejecting a true null hypothesis is called ______ error.

A

In hypothesis testing, rejecting a true null hypothesis is called type I error.

9
Q

In hypothesis testing, failing to reject a false null hypothesis is called _____ error.

A

In hypothesis testing, failing to reject a false null hypothesis is called type II error.

10
Q

_________ is the measure of new cases of a disease.

A

Incidence

11
Q

__________ is the proportion of a population at risk that develops the disease (new cases) over a specified time period.

A

Cumulative incidence

12
Q

__________ is the number of new cases of the disease divided by the person-time of observation, where person-time is the total time each person is followed while at risk, from the start of follow-up until disease onset or the end of follow-up.

A

Incidence rate

13
Q

_________ is the number of existing cases of a disease during a given time period.

A

Prevalence

14
Q

___________ is the proportion of the population that is diseased at a single point in time, such as a specific calendar date.

A

Point prevalence

15
Q

_________ is the proportion of the population that is diseased during a specific duration of time, such as during a specific year.

A

Period prevalence
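A minimal sketch of the incidence and prevalence measures above, using invented counts. The function names are hypothetical. Point and period prevalence share the same arithmetic; they differ only in whether cases are counted on a single date or at any time during a period.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of an initially disease-free population that develops the disease during the period."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases per unit of person-time at risk (here, per person-year)."""
    return new_cases / person_time


def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the disease (point or period, depending on the counts)."""
    return existing_cases / population


# Hypothetical cohort: 800 disease-free people followed for one year, 20 new cases.
ci = cumulative_incidence(20, 800)

# Hypothetical person-time: years each of five participants was followed while at risk.
years_at_risk = [5.0, 5.0, 2.5, 4.0, 1.5]
ir = incidence_rate(2, sum(years_at_risk))  # 2 cases over 18 person-years

# Hypothetical town of 1,000: 30 cases on 1 January (point), 50 cases at any time that year (period).
point_prev = prevalence(30, 1000)
period_prev = prevalence(50, 1000)

print(f"cumulative incidence = {ci:.3f}")
print(f"incidence rate = {ir:.3f} per person-year")
print(f"point prevalence = {point_prev:.3f}, period prevalence = {period_prev:.3f}")
```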

16
Q

_______ is a situation in a community in which there is a consistently elevated rate of a certain disease.

A

Endemic

17
Q

______ is an increase in the number of cases of disease in a community above what is expected.

A

Epidemic

18
Q

___ is a worldwide epidemic.

A

Pandemic

19
Q

______ are calculated by dividing one number by another; the numerator does not need to be a subset of the denominator because they are two distinct quantities. Ex: men:women

A

Ratio

20
Q

______ are calculated by dividing one number by another, where the numerator is a subset of the denominator. Ex: number of men/total population

A

Proportion

21
Q

_____ are calculated by dividing one number by another and additionally have a time component as part of the denominator. Ex: the number of people per 1,000 population who developed influenza in 2017.

A

Rate

22
Q

What are the two overall types of epidemiologic study designs?

A

Descriptive and analytic

23
Q

______ studies are generally observational, whereas ______ studies can be both interventional (experimental) and observational.

A

Descriptive; analytic

24
Q

This type of descriptive study is used to alert people of a new illness or association with an illness. They are usually reports of only people with the condition of interest.

A

Case studies and case reports

25
This type of descriptive study includes people who are representative of a given population. They are not selected on the basis of illness or exposure, and the study can be used to determine initial associations and to identify the prevalence of either exposure or illness in a group.
Cross-sectional studies
26
This type of descriptive study is used to describe populations. The data are analyzed not at the individual level but at the aggregate level.
Ecological studies
27
In _______, group-level data are used to report on individuals. This type of mistake occurs in ecological studies.
Ecological fallacy
28
In this type of analytic study, people are selected based on whether they have or do not have a disease and then the researchers proceed to look back over time to see if people had different rates of exposure.
Case-control
29
This type of study is good for rare diseases with long latency periods.
Case-control
30
This analytic study type selects people on the basis of exposure and determines if people develop disease at different rates.
Cohort studies
31
This type of study is good for rare exposures.
Cohort studies
32
When a study follows individuals into the future.
Prospective
33
When a study looks back in time
Retrospective
34
This type of analytic study allows researchers to calculate incidence.
Cohort studies
35
People who have disease at the time point when the study begins are excluded from cohort studies. These cases are known as _______ cases.
prevalent
36
This type of analytic study tests an intervention that is given by the researcher to two or more groups.
Randomized controlled trial (RCT)
37
________ is when people are randomly assigned to groups in a randomized controlled trial.
Randomization
38
In _______ blinded studies, participants don't know what group they are in and in ______ blinded studies, neither participants nor researchers know which group participants are in.
single; double
39
These analytic studies pool the results of multiple independent studies with established criteria to identify the evidence for associations.
Meta-analyses and systematic reviews
40
What is the hierarchy of study designs, from least to most evidence?
expert opinion -> case series and case reports -> cross-sectional -> case-control -> cohort -> RCT -> systematic reviews and meta-analyses
41
The three types of descriptive observational study designs are?
case reports, case series, and cross-sectional
42
The two types of experimental study designs are?
randomized controlled trials and non-randomized trials
43
The three types of analytical observational studies are?
Cohort, case-control, and cross-sectional
44
Draw a 2x2 table for epidemiological calculations
Outcome and no outcome go across the top as column headers; exposure and no exposure go down the left as row headers.
Columns: Outcome | No outcome | Total
Exposed: A | B | A+B
Unexposed: C | D | C+D
Total: A+C | B+D | A+B+C+D
45
_______ is the measure of the magnitude of an association between an exposure and a disease that is used in cohort studies. It is the ratio of the risk (incidence) of disease in the exposed to the risk in the nonexposed.
Relative risk (RR)
46
The formula for relative risk is?
RR = (risk [incidence] of outcome in the exposed) / (risk [incidence] of outcome in the nonexposed). In a 2x2 table, this is A/(A + B) divided by C/(C + D).
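The RR formula above translates directly into code. A minimal sketch with hypothetical cohort counts, using the 2x2 layout with exposure in the rows (A, B exposed; C, D unexposed) and outcome in the columns:

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [A / (A + B)] / [C / (C + D)].

    A = exposed with outcome, B = exposed without outcome,
    C = unexposed with outcome, D = unexposed without outcome.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed people develop the outcome.
rr = relative_risk(30, 70, 10, 90)
print(f"RR = {rr:.2f}")  # RR = 3.00
```

An RR of 3.00 here would mean the exposed group has three times the risk of the unexposed group.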
47
An RR or OR of _ means there is no association between the exposure and the outcome. The risk or odds in the exposed equals the risk or odds in the nonexposed.
1
48
An RR or OR of _ means the exposure increases the risk of the outcome. The risk or odds in the exposed is greater than the risk or odds in the nonexposed.
>1
49
An RR or OR of _ means the exposure decreases the risk of the outcome. The risk or odds in the exposed is less than the risk or odds in the nonexposed. This indicates that the exposure is a protective factor.
<1
50
____ is a systematic error as compared to an error attributable to chance alone. It can cause an error in the estimation of an association between an exposure and an outcome.
Bias
51
This type of bias results from procedures used to select participants into a study. This bias results in a different outcome from what would have been obtained from the entire population targeted for the study.
Selection bias
52
Selection bias most likely occurs in _______ or _____ studies because the exposure and outcome have occurred at the time of study selection. It can also occur in prospective cohort studies and experimental studies from differential loss to follow-up because this affects which subjects are "selected" for analysis.
case-control or retrospective cohort
53
This type of bias arises from systematic differences in the way information on exposure or disease is obtained from the study groups.
Observation bias/information bias
54
This type of bias involves inaccurate reporting of past events
recall bias
55
This type of bias may include the effects of the interviewer's body language, voice, or demeanor on the response; it's the most difficult type of bias to account for.
Interviewer bias
56
This type of observational bias happens when participants are classified into the wrong group. It distorts the link between exposure and outcome and can result from participants being incorrectly classified as exposed or unexposed, or as having or not having the outcome.
Misclassification error
57
Differential misclassification is misclassification that occurs at ______ rates between groups.
different
58
Nondifferential misclassification is misclassification that occurs at _____ rates across groups.
Equal
59
___________ occurs when a researcher is evaluating the relationship between an exposure and an outcome, but a third variable, which is associated with both the exposure and the outcome, distorts the finding.
Confounding
60
Methods to prevent confounding and to manage confounding during the analysis stage include: _____, _______, ______, ______, ______, and conducting __________.
randomization, restriction, matching, standardization, stratification, and conducting multivariable analysis
61
___________ occurs when the magnitude of the association between an exposure and outcome varies by the presence or level of a third variable. Ex: smokers who are exposed to asbestos have a much higher rate of lung cancer than smokers with no asbestos exposure.
Effect modification
62
The ______ is the odds of an outcome in the exposed divided by the odds of outcome in the non-exposed.
odds ratio
63
The simple formula for the odds of an outcome is?
p/(1 - p), where p is the probability of the outcome | the number with the disease in a group divided by the number without the disease of interest in that group
64
Case-control studies and cross-sectional studies use which measure of association?
Odds Ratio
65
Formula for odds ratio?
OR = (Odds of outcome in the exposed)/(Odds of outcome in the non-exposed) In a 2x2 table, this works out to (A/B)/(C/D) or AD/BC
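A minimal sketch of odds and the odds ratio, with hypothetical counts in the 2x2 layout (A and B in the exposed row, C and D in the unexposed row). Computing the OR from the two odds and computing it as AD/BC give the same value.

```python
def odds(p: float) -> float:
    """Odds = p / (1 - p), where p is the probability (or proportion) with the outcome."""
    return p / (1 - p)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (A / B) / (C / D), which simplifies to AD / BC."""
    return (a * d) / (b * c)


# Hypothetical counts: exposed 30 with / 70 without the outcome; unexposed 10 with / 90 without.
a, b, c, d = 30, 70, 10, 90
or_cross = odds_ratio(a, b, c, d)
or_from_odds = odds(a / (a + b)) / odds(c / (c + d))
print(f"OR = {or_cross:.2f}")  # OR = 3.86
```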
66
Although both a confounder and a modifier are third variables that impact the association between the exposure and the outcome, a ______ distorts the true association.
confounder
67
When one conducts an analysis, one controls for the impact of a confounder but evaluates for the impact of a(n) ________.
Effect modifier
68
T/F: A variable can be a confounder, effect modifier, or both at the same time.
True
69
Differentiating between whether a variable is a confounder, effect modifier, or both is done through ________
stratification
70
_________ divides the data according to the levels of the variable and allows the calculation of a measure of association for each stratum.
stratification
71
If the variable is only a confounder, stratum-specific estimates for the measure of association, either OR or RR, will be _________. The crude, or overall estimate will be outside the range of the stratum-specific estimates.
Close to one another
72
If the variable is only an effect modifier, the stratum-specific estimates for the OR and the RR will be _________.
significantly different from one another
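The stratification logic above can be sketched with two hypothetical strata of a third variable. The counts are invented so that the stratum-specific ORs agree while the crude OR, computed after collapsing the strata, falls outside their range. This is the pattern the cards associate with a variable that is only a confounder.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = AD / BC for a 2x2 table (A, B exposed; C, D unexposed)."""
    return (a * d) / (b * c)


# Hypothetical 2x2 tables (A, B, C, D) for each level of the third variable.
strata = {
    "level 1": (40, 60, 20, 30),
    "level 2": (5, 45, 15, 135),
}

# Stratum-specific estimates: one OR per level of the third variable.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Crude estimate: add matching cells across strata, ignoring the third variable.
crude_cells = tuple(sum(values) for values in zip(*strata.values()))
crude_or = odds_ratio(*crude_cells)

for name, value in stratum_ors.items():
    print(f"{name}: OR = {value:.2f}")
print(f"crude: OR = {crude_or:.2f}")
```

If the stratum-specific ORs instead differed markedly from each other, the same code would point toward effect modification.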
73
Koch's postulates were one of the early criteria for identifying causation. They are generally more effective for determining the causative agent for ________.
infectious diseases
74
Name at least three of Hill's Nine Criteria of Causality.
1. analogy 2. coherence 3. experiment (reversibility) 4. specificity 5. plausibility 6. strength of the association 7. consistency 8. biological gradient 9. temporality
75
________ is the earliest stage of prevention and it is concerned with preventing risk factors of disease by targeting lifestyles, behaviors, and exposure patterns at the aggregate level instead of the individual level in order to decrease the risk of a disease.
Primordial prevention
76
________ is the level of prevention concerned with preventing disease. It takes place before the biological onset of the disease. Ex: having individuals use condoms so they don't get STIs.
Primary prevention
77
_______ is the level of prevention that is addressed by screening programs and occurs in the preclinical phase, after the disease is present but before symptoms appear. The focus is on early detection so that treatment can be provided before the disease progresses. Ex: annual screening for breast cancer
Secondary prevention
78
_____ is the level of prevention that is focused on rehabilitation and support. The disease has already occurred, but the goal is to improve quality of life and reduce symptoms. Ex: patient education and support after diabetes diagnosis
Tertiary prevention
79
__________ is a technique used to identify individuals who have a disease before symptoms occur, with the goal of providing treatment before they experience any illness.
Screening
80
_________ is concerned with how likely the target audience is to participate in a recommended screening program.
Feasibility
81
With regard to screening tests, _________ refers to repeatability.
reliability
82
With regard to screening tests, ___________ is the ability of a test to accurately identify diseased and non-diseased individuals.
validity
83
Validity of screening tests is measured by the ______ and __________ of the test.
sensitivity and specificity
84
Individuals who have a disease and who test positive for the disease on a screening test are known as __________.
true positives
85
Individuals who have the disease, but who do not test positive on a screening test are called __________.
False negatives
86
___________ refers to the ability of a test to correctly identify people without a disease. It is a proportion, usually expressed as a percentage, and is concerned only with people WITHOUT the disease.
specificity
87
How is specificity calculated?
# of individuals who truly do not have the disease and who test negative / (# of individuals who truly do not have the disease and who test negative + # of individuals without the disease who test positive), i.e., true negatives / (true negatives + false positives)
88
Individuals who truly do not have the disease and test negative are ____________.
True negatives
89
Individuals who do not have the disease but who tested positive are called ___________.
False positives
90
How is positive predictive value (PPV) calculated?
the number of people who test positive who actually have the disease/the number of positive tests.
91
How is negative predictive value (NPV) calculated?
the number of people who test negative for a disease and who do not have the disease divided by the total number of people who test negative.
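The screening measures above can all be computed from the four cells of a test-versus-truth table. A minimal sketch with hypothetical counts. Sensitivity, TP / (TP + FN), is included alongside the specificity, PPV, and NPV formulas given in the cards.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Validity and predictive values of a screening test.

    tp: diseased, test positive      fn: diseased, test negative
    fp: not diseased, test positive  tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased people correctly identified
        "specificity": tn / (tn + fp),  # non-diseased people correctly identified
        "ppv": tp / (tp + fp),          # positives who truly have the disease
        "npv": tn / (tn + fn),          # negatives who truly do not have the disease
    }


# Hypothetical screening of 1,000 people, 100 of whom truly have the disease.
metrics = screening_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```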
92
Sensitivity and specificity are ________ related.
inversely
93
This type of bias involves overestimation of survival duration attributable to earlier detection by screening than by clinical presentation. Screening tests allow early detection and diagnosis of diseases. Individuals whose disease was identified through screening may appear to survive longer, even when the time from disease onset until death is similar to individuals whose disease was diagnosed later.
Lead time bias
94
This type of bias is due to screening being more likely to detect cases that are progressing slowly compared with those with rapid progression of the disease, who manifest clinically. The slow-progressing cases are usually milder and more likely to survive, leading to an overestimation of survival as a result of screening.
length bias
95
________ refers to whether the calculated estimate is likely to be observed assuming the null hypothesis of the test is true. P-values are measures of statistical significance; lower values indicate stronger evidence against the null hypothesis.
statistical significance
96
_____________ is a subjective assessment of whether the effect estimated in a test is important or meaningful.
practical significance.
97
___________ are factors in a social environment that contribute to or detract from the health of individuals and communities. They include things like socioeconomic status, transportation, housing, access to services, discrimination by social grouping, and social or environmental stressors.
social determinants of health
98
In this type of surveillance, a research team goes into the community looking for cases of disease. Very accurate but expensive.
Active surveillance
99
This type of surveillance relies on existing reporting systems
passive surveillance
100
This type of surveillance refers to web crawling to identify reports of disease. Ex: the Global Public Health Intelligence Network, which is part of the World Health Organization's Global Outbreak Alert and Response Network.
digital surveillance
101
This type of surveillance monitors a special community to look for changes in distribution of disease. Ex: the National Institute for Occupational Safety and Health's ______ Event Notification System for Occupational Risks.
sentinel surveillance
102
How is the prevalence of a disease impacted by the introduction of a new medication that improves survival?
It goes up because more individuals will live/live longer.
103
Do you report an odds ratio or a relative risk for case-control studies?
Odds ratio because you do not have data on incidence
104
(descriptive or inferential) statistics involves taking raw data and summarizing it or depicting it through various figures. It is often performed on a small subset of a population, known as a sample.
Descriptive
105
(descriptive or inferential) statistics allows researchers to draw conclusions about the population based on information collected from the sample.
inferential
106
_____ or _____ variables are data collected on the category into which the participant falls. The groups have no inherent order (e.g., race).
Nominal or categorical variables
107
________ variables are a specific type of variable that has only two possible values (e.g., exposure status as exposed or unexposed).
Dichotomous variables
108
________ variables are similar to nominal variables, but the categories have an inherent order (e.g., socioeconomic status or level of education).
Ordinal
109
________ variables can theoretically take any value between a minimum and maximum value.
Continuous
110
________ variables are a type of continuous variable that has a distinct order and clearly defined intervals. They lack a true zero (a zero value that corresponds to an absence of the variable), so ratios of their values are not meaningful. Ex: pH
interval variables
111
_______ variables are similar to interval variables, but they have a true zero, and values of the variable act as true ratios of one another (e.g., temperature in kelvins).
Ratio variables
112
_______ tables are used for nominal and ordinal variables. They reveal important information on the frequency, relative frequency, cumulative frequency, and cumulative relative frequency of the categories of the variables.
Frequency tables
113
Cumulative frequencies and cumulative relative frequencies are only relevant for _______ variables.
ordinal
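A frequency table for an ordinal variable can be built with Python's standard library. A minimal sketch with invented responses. The category order is supplied by hand, because the cumulative columns only make sense once the categories are ordered.

```python
from collections import Counter

# Hypothetical ordinal variable: highest level of education, categories listed in their natural order.
levels = ["less than high school", "high school", "some college", "bachelor's or higher"]
responses = [
    "high school", "some college", "high school", "bachelor's or higher",
    "less than high school", "high school", "some college", "high school",
]

counts = Counter(responses)
n = len(responses)

table = []  # rows of (category, frequency, relative freq., cumulative freq., cumulative relative freq.)
cumulative = 0
for level in levels:
    frequency = counts[level]
    cumulative += frequency
    table.append((level, frequency, frequency / n, cumulative, cumulative / n))

for category, freq, rel, cum, cum_rel in table:
    print(f"{category:<22}{freq:>3}{rel:>7.3f}{cum:>4}{cum_rel:>7.3f}")
```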
114
Continuous variables utilize measures of _______ ______ to give single values that describe the entire data set. Commonly used ones are mean, median, and mode.
central tendency
115
_____ is the arithmetic average of the data set calculated as the sum of total values divided by the number of data values.
Mean
116
The______ is the middle value of the data set when the data points are ordered by numerical value. If the data set has an even number of values, it is the arithmetic average of the two middle values of the data set.
median
117
The ______ is the most commonly occurring value in the data set.
mode
118
__________ __ _________ provide information on the spread of the data set by capturing how far individual values deviate from the center of the data.
Measures of variability
119
____and ______ are the most common measures used to inform on variability.
Variance (s^2) and standard deviation (s)
120
The first and third _____ are the 25th and 75th percentile values of the data set.
quartiles
121
The spectrum of values between the 25th percentile and the 75th percentile is termed the _______ _______.
Interquartile range.
122
The _______ is defined as the spectrum of values between the lowest and highest value.
range
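Python's standard `statistics` module covers the central-tendency and variability measures above. A minimal sketch on invented data. Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator, and that textbooks and software use slightly different quartile conventions, so Q1 and Q3 may differ a little from a hand calculation.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 15]  # hypothetical values, already sorted for readability

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # average of the two middle values (5 and 7) here
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance s^2 (n - 1 denominator)
sd = statistics.stdev(data)           # sample standard deviation s
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
value_range = max(data) - min(data)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, sd={sd:.2f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, range={value_range}")
```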
123
__________ is a numerical value applied to the likelihood for the occurrence of an event.
Probability
124
What is the formula for calculating the probability of a characteristic?
P(Characteristic) = (# of individuals with characteristic)/(# of total individuals in sample/population)
125
What is the formula for calculating conditional probability, or the probability of a characteristic given another characteristic?
P(A|B) = (# of individuals with Characteristic A and B)/(# of total individuals in sample or population with characteristic B)
126
__________ is used to define a circumstance when the probability of one event does not have an impact on the probability of another event.
Independence
127
To test for independence of two events, one needs to assess the conditional probabilities. How is this done?
If A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B)
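The probability, conditional probability, and independence formulas above can be checked on a table of counts. A minimal sketch with hypothetical counts, chosen so that exposure and disease happen to be independent. The `count` helper is a hypothetical name used only for illustration.

```python
from math import isclose

# Hypothetical counts of individuals by exposure and disease status.
counts = {
    ("exposed", "disease"): 20,
    ("exposed", "no disease"): 80,
    ("unexposed", "disease"): 30,
    ("unexposed", "no disease"): 120,
}
total = sum(counts.values())


def count(exposure=None, outcome=None) -> int:
    """Number of individuals matching the given exposure and/or outcome (None means any)."""
    return sum(
        n
        for (e, o), n in counts.items()
        if (exposure is None or e == exposure) and (outcome is None or o == outcome)
    )


# P(characteristic) = # with characteristic / total
p_disease = count(outcome="disease") / total
p_exposed = count(exposure="exposed") / total

# P(A | B) = # with A and B / # with B
p_disease_given_exposed = count("exposed", "disease") / count(exposure="exposed")
p_exposed_given_disease = count("exposed", "disease") / count(outcome="disease")

independent = isclose(p_disease_given_exposed, p_disease) and isclose(p_exposed_given_disease, p_exposed)
print(f"P(disease) = {p_disease:.2f}, P(disease | exposed) = {p_disease_given_exposed:.2f}")
print("independent" if independent else "not independent")
```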
128
______ distribution is a model of the distribution of a dichotomous outcome variable, such as testing positive or negative for a disease.
Binomial
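For a dichotomous outcome, the binomial model gives the probability of exactly k positives among n trials. A minimal sketch, assuming n independent tests that each come back positive with the same probability p. The numbers are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k positives in n independent trials, each positive with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical: 10 people tested, each with a 0.2 probability of testing positive.
n, p = 10, 0.2
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(f"P(exactly 2 positives) = {probabilities[2]:.4f}")
```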
129
_______/_________ distribution is a model of the distribution of a continuous outcome variable, such as blood pressure.
Normal/Gaussian
130
Tests that depend on assumptions about the underlying distribution are known as ________ tests.
parametric
131
When data are symmetric, with the mean, median, and mode equal, the distribution is considered ____ and not ______.
normal and not skewed
132
The standard normal distribution has a mean of ____ and a standard deviation of ______.
0.0 and 1.0
133
Probabilities for the likelihood that a certain value or a more extreme value would be observed are tabulated through a _______ table, which is often used for inferential statistics analyses.
z-scores table
134
The Poisson distribution is useful for what type of data?
Count data
135
The ____ _____ ______ states that, with repeated sampling, the distribution of the sample means approaches a normal distribution as the sample size increases.
central limit theorem
136
The ______ is notated as H sub 0 and is a precise statement.
null hypothesis
137
The _____/_____ hypothesis is a more ambiguous statement that rivals the null hypothesis.
alternate/research
138
Hypothesis testing is based on the concept of finding enough evidence to reject the ______ ______ or, alternatively, failing to find enough evidence to reject the ___ _________.
Null hypothesis and null hypothesis
139
The results of biostatistical analysis culminate in the researcher "rejecting" or "failing to reject" the ________ in favor of the _________ hypothesis
null; alternate
140
_________ are probability values that measure the likelihood of obtaining the observed statistic or more extreme values when the null hypothesis is true.
P-values
141
Researchers must determine what their benchmark for rejection needs to be prior to starting their study; this is called the ________ or ________ value.
level of significance or alpha
142
T/F? In a one-sided test, the sample deviates from the null hypothesis conditions in one specific direction, such as being significantly greater than or significantly less than the null's mean. The two-sided test does not specify directionality and is therefore more conservative in its approach.
True
143
If the p-value is _____ than alpha, the researcher fails to reject the null hypothesis.
greater
144
If the p-value is _____ than alpha, the researcher rejects the null hypothesis.
less
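A printed z table can be replaced by the standard normal distribution in Python's `statistics` module (`NormalDist()` defaults to mean 0 and standard deviation 1). A minimal sketch of a two-sided p-value for hypothetical z statistics, compared with an alpha chosen in advance:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_p_value(z: float) -> float:
    """P(a value at least as extreme as z in either direction) under the standard normal."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


alpha = 0.05  # level of significance, set before the study
for z in (1.5, 2.1):  # hypothetical test statistics
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z}: p = {p:.4f} -> {decision}")
```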
145
___ distributions are useful when the population mean (μ) and standard deviation (σ) are known. Because this information is usually unavailable, ___ distributions prove useful instead.
z; t
146
For the one-sample t-test, the degrees of freedom are _______.
n - 1 (number of observations - 1)
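The one-sample t statistic and its n - 1 degrees of freedom can be sketched with the standard library. The sample values and the hypothesized mean are invented. Turning t into a p-value would still require a t table or a statistics package, because Python's standard library has no t distribution.

```python
import math
import statistics


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return the t statistic for H0: mean = mu0, and its degrees of freedom (n - 1)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    t = (statistics.mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical systolic blood pressures (mmHg) tested against a null mean of 120.
sample = [118, 122, 125, 130, 121, 127]
t, df = one_sample_t(sample, mu0=120)
print(f"t = {t:.2f}, df = {df}")
```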
147
_________ is an important concept that is related to type II errors. If β is the probability of falsely failing to reject the null hypothesis, then 1 -β is the probability of correctly rejecting the null hypothesis. This is the ___________ of a study to correctly identify a deviation from the null hypothesis conditions.
Power; power
148
What are the three things you can do to maximize the power of a study?
1) Increase alpha: this shifts the cutoff for rejecting the null to the left and increases power, but it also increases the likelihood of a type I error.
2) Increase the effect size: this concerns the size of the effect of the conditions that cause a deviation from the conditions of the null.
3) Increase the sample size: this is the method researchers typically use to obtain the power necessary for their study. The sample size needed for a specific power therefore depends on the assigned alpha, the desired power (1 - β), and the estimated effect size.
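The link between alpha, power, effect size, and sample size can be made concrete with a common normal-approximation formula for comparing two means: n per group = 2(z_(1-alpha/2) + z_(1-beta))^2 / d^2, where d is the standardized effect size. The cards do not give a formula, so this one is an assumption brought in for illustration, and `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference: (difference in means) / (common SD).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # z corresponding to power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)


print(n_per_group(0.5))              # medium effect, alpha 0.05, power 0.80
print(n_per_group(0.5, power=0.90))  # higher power needs more people
print(n_per_group(0.8))              # larger effect needs fewer people
print(n_per_group(0.5, alpha=0.10))  # larger alpha needs fewer people
```

Each call changes one input, mirroring the three levers listed on the card.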
149
________ use data from a sample to set upper and lower bounds on the estimate of a parameter. They are typically displayed in one of two formats: 1) lower limit, upper limit; or 2) point estimate ± margin of error.
Confidence intervals
150
The z-score, standard error, and point estimate of the parameter all combined together form the __________.
confidence interval
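The combination described above (point estimate ± z × standard error) can be sketched as follows. The point estimate and standard error are hypothetical. Using z rather than t assumes a reasonably large sample.

```python
from statistics import NormalDist


def confidence_interval(point_estimate: float, standard_error: float,
                        confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower limit, upper limit) = point estimate -/+ z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin_of_error = z * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Hypothetical sample mean of 120 with a standard error of 2.
lower, upper = confidence_interval(120, 2)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```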
151
T or F? A 95% confidence interval for a parameter means the following: P (Lower limit of confidence interval < population parameter < upper limit of confidence interval) = 0.95
True
152
If 1.0 falls _______ the confidence interval for an RR or OR, there is statistical significance for an increase or decrease in risk or odds.
outside
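To check whether 1.0 falls outside the interval for an odds ratio, one common approach is the log-based (Woolf) interval: ln(OR) ± z × sqrt(1/A + 1/B + 1/C + 1/D), exponentiated back to the OR scale. The cards do not name a method, so this choice is an assumption. A minimal sketch with hypothetical 2x2 counts:

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (OR, lower, upper) using the log (Woolf) method; all four cells must be non-zero."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


for cells in [(30, 70, 10, 90), (20, 80, 18, 82)]:  # hypothetical 2x2 tables (A, B, C, D)
    or_, lower, upper = odds_ratio_ci(*cells)
    significant = not (lower <= 1.0 <= upper)  # significant only if 1.0 lies outside the CI
    print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```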