Lesson 1-10 Flashcards
Defining who or what is going to be studied means defining the
population
is a smaller set or a subset of the population
sample
occurs when certain members of the population are chosen so that the sample systematically misrepresents the population
biased sample
must be created where respondents are
listed and assigned a unique number.
sampling frame
Each subject in the population has the same chance of being selected
Simple random sampling
The sampling frame is divided into subgroups or strata and simple random samples are
conducted within the strata.
Stratified random sampling
The sampling frame is ordered, and a number s is selected so that every sth subject is
selected to be in the sample.
Systematic random sampling
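The three sampling schemes above can be sketched in a few lines of Python. This is only an illustration: the function names, the sampling frame of IDs 1 to 20, and the random starting point used for systematic sampling are assumptions, not part of the lesson.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Every subject in the frame has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample within each stratum.

    `strata` maps a stratum label to the list of subject IDs in it.
    """
    rng = random.Random(seed)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

def systematic_sample(frame, s, seed=None):
    """Pick a random start among the first s subjects, then take every s-th one."""
    rng = random.Random(seed)
    start = rng.randrange(s)
    return frame[start::s]

frame = list(range(1, 21))  # hypothetical sampling frame: unique IDs 1..20
print(simple_random_sample(frame, 5, seed=1))
print(stratified_random_sample({"urban": frame[:10], "rural": frame[10:]}, 2, seed=1))
print(systematic_sample(frame, 4, seed=1))
```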
is how information on the subjects will be collected.
Study Designs
Subjects are identified and followed for a specific period of time.
Prospective study
a type of medical research used to investigate the causes of disease and to establish links between risk factors and health outcomes.
Cohort study
An outcome is identified, after the data have already been collected.
Retrospective study
Study where previously collected
data are reviewed to determine whether any characteristics impacted the outcome.
Retrospective study
Study where existing data are then obtained to determine what factors were
related to subjects becoming either a case or a control.
Case control study
those having the outcome
Case subjects
those not having the
outcome
control subjects
Data are collected at a particular time point and represent a cross-section of time.
Cross-sectional study
Variables whose measurements represent a limited set of possible values.
discrete variables
The values of a discrete variable can be expressed as?
Numbers, characters, words
These are variables with different levels or categories whose order matters. Examples
include pain scores, stages of cancer, and educational attainment
Ordinal
These are categorical variables with different levels or categories whose order does
not matter. Examples are tooth color, marital status, and political affiliation.
Nominal
These are variables that can have only two levels.
Dichotomous
True or false: Sex is an example of Dichotomous variable
True
Variables whose measurements represent an unlimited set of possible values.
Continuous
These variables can take on only non-negative, whole number values.
Count
True or false: Continuous variables can have only numeric values.
True
The total number of subjects with a particular category or level
Counts
is simply the count for a category divided by the total number of subjects.
Proportions
is the proportion times 100
Percentages
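A small sketch of how counts, proportions, and percentages relate for one categorical variable. The function name and the marital-status data are made up for illustration.

```python
from collections import Counter

def summarize_categories(values):
    """Return the count, proportion, and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {
        level: {
            "count": c,                      # subjects in this category
            "proportion": c / total,         # count divided by total subjects
            "percentage": 100 * c / total,   # proportion times 100
        }
        for level, c in counts.items()
    }

marital_status = ["single", "married", "married", "divorced", "married"]  # made-up data
summary = summarize_categories(marital_status)
for level, stats in summary.items():
    print(level, stats)
```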
It provides a description of the average response
measure of center
It provides a description of how varied the responses are
measure of spread
This is commonly used to describe the center of the responses.
Mean
True or false:
when extremely large or small values are present, the mean is a better measure of the center.
False, median is a better measure
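A quick illustration of why the median is preferred when extreme values are present. The numbers are invented; only the standard-library `statistics` module is used.

```python
import statistics

values = [30, 32, 35, 38, 40]     # made-up responses
with_outlier = values + [400]     # same responses plus one extreme value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is pulled far toward the extreme value; the median barely moves.
print(mean_before, median_before)
print(mean_after, median_after)
```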
These are numerical summaries that describe the population.
Parameters
are the numerical
summaries that an investigator wants but cannot obtain directly because collecting data on the
entire population is not feasible.
Parameters
These are numerical summaries that describe the sample.
Statistics
What are the basic sciences of public health?
Epidemiology and biostatistics
is about the understanding of disease development and the methods used to uncover the etiology, progression, and treatment of the disease.
Epidemiology
is collected to investigate a question
Information (data)
variable consists of a summary of the possible values the variable can have and the number of subjects with each of
these values.
distribution
distribution that uses counts to describe the number of subjects with a particular
value
frequency distribution
distribution that uses proportions to describe the
number of the subjects with a particular value
probability distribution
Two types of graphs are used to summarize categorical variables
pie charts and bar graphs.
can be presented using frequencies or proportions
Pie charts
describes how the pieces relate to the whole
Pie charts
They demonstrate how the categories within a variable relate to each other
Pie charts
are used to describe the
distributions of categorical variables.
Bar graphs
are used when the data have a variable with two options.
Binomial distributions
Binomial distributions describe what type of variables?
dichotomous
best describe the distribution of a continuous variable
Histograms
is a graphical representation of a variable in which the observed values are categorized, a bar is drawn for each category, and the number of participants in each category is represented by the height of the bar.
Histograms
It provides a quick picture of the distribution of a variable and it can be presented with counts or
proportions of participants.
Histograms
They provide information about how spread out the
responses are, which responses are common, which responses are in the center, and the overall
shape of the distribution.
Histograms
can be folded in half so that each half is close to a mirror image of the other
Symmetric distributions
This distribution has one mode or one most common value
unimodal
A distribution with two peaks can be
bimodal
When the histogram is bell-shaped, unimodal, and symmetric, with the mean, median, and
most common value at the center at the peak, the data come from a _____
normal distribution.
can be used to determine if observations are common or
extreme
empirical rule
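The empirical rule is usually stated as roughly 68%, 95%, and 99.7% of observations falling within 1, 2, and 3 standard deviations of the mean of a normal distribution. The sketch below checks those figures with the standard library; the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

An observation more than 2 or 3 standard deviations from the mean would therefore be considered extreme rather than common.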
A distribution is ___ skewed when the distribution has a tail that extends longer to
the left, that is, there is a set of observations with lower values than those of the majority of the
observed responses.
left
A distribution is ___ skewed when the distribution has a tail that extends
longer to the right, that is, there is a set of observations with higher values than those of the
majority of the observed responses.
right
is a discrete probability distribution whose possible values are whole numbers from 0 to infinity.
Poisson distribution
are percentages of all the observations that are less than the value of interest.
Percentiles
It is used to determine whether a particular value is common or rare.
Percentiles
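Following the definition above (the percentage of all observations that are less than the value of interest), a percentile can be computed directly. The function name and the scores are hypothetical; other percentile conventions exist and may give slightly different answers.

```python
def percentile_of(value, observations):
    """Percentage of observations strictly less than `value`."""
    below = sum(1 for x in observations if x < value)
    return 100 * below / len(observations)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up test scores
print(percentile_of(90, scores))   # share of scores below 90
print(percentile_of(55, scores))   # nothing lies below the minimum
```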
Measurement ___ occurs when multiple measurements are taken on the subject.
Variability
If there is little measurement variability, the measurement has?
reliability
The idea that samples may be different
sampling variability
The value of the statistics and the number of times the statistics occur from all the possible samples is known as the?
distribution of samples or the sampling distribution
It provides a description of all possible statistics obtained from samples
sampling distribution
is the characterization of all sample means
central limit theorem
According to this theorem, the distribution of the means obtained from all possible samples will result in a normally shaped distribution, in which the center of the distribution is the true parameter and one standard
deviation of the sampling distribution is the standard error of the mean.
central limit theorem
This theorem holds true for large sample sizes.
central limit theorem
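A simulation can make the central limit theorem concrete. The sketch below is an approximation, not a proof: it draws many samples (not all possible samples) from a made-up, strongly skewed population and compares the spread of the sample means with the standard error. All names and settings are assumptions.

```python
import random
import statistics

def sample_means(population, n, n_samples, seed=0):
    """Means of repeated random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]

# Hypothetical right-skewed population (exponential), clearly not normal.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

n = 50
means = sample_means(population, n=n, n_samples=2_000)
standard_error = statistics.pstdev(population) / n ** 0.5

# The center of the sample means should sit near the population mean, and
# their standard deviation should sit near the standard error of the mean.
print(statistics.fmean(population), statistics.fmean(means))
print(standard_error, statistics.stdev(means))
```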
is a basic and commonly used type of predictive analysis.
Linear regression
It may be called an outcome variable, criterion variable, endogenous variable, or regressand.
dependent variable
It can be called exogenous variables, predictor variables, or regressors
independent variables
is the portion of the total variation in the dependent variable that
is explained by variation in the independent variable
Coefficient of Determination
is often useful to attempt to represent data with the equation of a straight line in
order to predict values that may not be displayed on the plot.
line of best fit
determined by the correlation between the two variables on a scatter plot
line of best fit
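As a rough sketch, the line of best fit and the coefficient of determination can be computed by ordinary least squares. Least squares is the usual method but is an assumption here, since the cards do not name one; the function names and data points are invented.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Portion of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

xs = [1, 2, 3, 4, 5]   # independent variable (made-up)
ys = [2, 4, 5, 4, 5]   # dependent variable (made-up)
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
print(intercept + slope * 6)   # predicted value for an x not shown on the plot
```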
is a statistical technique that can show whether and how strongly pairs of variables
are related.
Correlation
If the correlation is greater than 0, then the variables are
positively correlated.
If the correlation is less than 0, then the variables are said to be
negatively correlated
If the correlation is exactly 0, such as for birthweight and birthday, then the variables are said to be
uncorrelated
exists when high scores in one variable are associated with high scores in the second variable or low scores in one variable are associated with low scores in the other
POSITIVE CORRELATION
exists when high scores in one variable are associated with low scores in the second or vice versa.
NEGATIVE CORRELATION
exists when the points on the scatter diagram are spread in a random manner
ZERO CORRELATION
all points lie on a straight line
PERFECT CORRELATION
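The four cases above can be reproduced with the Pearson correlation coefficient. The cards do not say which coefficient is meant, so Pearson's r is an assumption, as are the function name and the tiny data sets.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))    # perfect positive: points on a rising line
print(pearson_r([1, 2, 3], [6, 4, 2]))    # perfect negative: points on a falling line
print(pearson_r([-1, 0, 1], [1, 0, 1]))   # zero correlation: no linear trend
```

The last data set has a clear pattern (a V shape) yet a correlation of zero, which is a reminder that correlation measures only linear association.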
True or false:
A key thing to remember when working with correlations is never to assume a correlation
means that a change in one variable causes a change in another
True
It seeks to find the relationship between two variables.
Correlation
is commonly used for testing relationships between categorical variables.
Chi Square statistic
The _______ of the Chi-Square test is that no relationship exists between the categorical variables in the population; they are independent.
null hypothesis
The Chi-Square statistic is most commonly used to evaluate _________ when using a crosstabulation (also known as a bivariate table).
Tests of Independence
________ presents the
distributions of two categorical variables simultaneously, with the intersections of the categories of the variables appearing in the cells of the table.
Crosstabulation
The ___________ assesses whether
an association exists between the two variables by comparing the observed pattern of responses
in the cells to the pattern that would be expected if the variables were truly independent of each
other
Test of Independence
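A minimal sketch of the comparison described above: expected cell counts under independence are (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over the cells. The function name and the 2×2 crosstabulation are invented for illustration.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a crosstabulation."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows = in-state / out-of-state, columns = graduated / did not.
observed = [[30, 10],
            [20, 40]]
statistic, df = chi_square_independence(observed)
print(statistic, df)
```

The statistic is then compared with a chi-square critical value for the given degrees of freedom; a large value suggests the variables are not independent.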
Is student status (in-state versus out-of-state) associated with one’s eventual graduation
outcome (graduating versus not graduating)?
Answer: Chi-Square test of _____ _ ________
Independence
To test a theory that people have no preference among four different outdoor activities,
you ask 100 people to select among jogging, bicycling, hiking, or swimming.
Answer: Chi-Square test of _____ _ ________
Goodness of fit
A biostatistician would like to determine if the mix of blood types kept in storage for
transfusions should be different in Hawaii from that on the mainland. She collected a sample of
blood types of 10,000 people in Hawaii and that of 100,000 people on the mainland. She
wishes to see if the breakdown of blood types (A, B, AB and O) is the same for both
populations.
Answer: Chi-Square test of _____ _ ________
Homogeneity
A researcher wants to determine if scoring high or low on an artistic ability test depends
on being right or left-handed.
Answer: Chi-Square test of _____ _ ________
Independence
A national organization wants to compare the distribution of level of highest education
completed (high school, college, masters, doctoral) for Republicans versus Democrats.
Answer: Chi-Square test of _____ _ ________
Homogeneity
A preservation society has the percentages of five main types of fish in the river from 10
years ago. After noticing an imbalance recently, they add some fish from hatcheries to the
river. How can they determine if they restored the ecosystem from a new sample of fish?
Answer: Chi-Square test of _____ _ ________
Goodness of fit
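For a goodness-of-fit question such as the outdoor-activity one, the observed counts are compared with the counts expected under the theory. With 100 people and no preference among four activities, 25 are expected per activity. The observed counts below are invented and the function name is an assumption.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Chi-square statistic and degrees of freedom for observed vs. expected counts."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df

# Hypothetical answers: jogging, bicycling, hiking, swimming.
observed = [30, 20, 28, 22]
expected = [25, 25, 25, 25]   # "no preference" means equal expected counts
gof_statistic, gof_df = chi_square_goodness_of_fit(observed, expected)
print(gof_statistic, gof_df)
```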
is a way to find out if survey or experiment results are significant. In other
words, it helps you figure out whether you need to reject the null hypothesis or accept the alternate
hypothesis
ANOVA test
is used to compare the means of two or more independent (unrelated) groups using
the F-distribution
one way ANOVA
The null hypothesis for the one way ANOVA test is that the ______
group means are all equal
True or false: one way ANOVA will tell you that at least two groups were different from each other and which groups were different.
False, it won’t tell you which groups were different
If the computed F value is greater than the tabulated F value, then the null hypothesis is
rejected
If the computed F value is less than the tabulated F value, then the null hypothesis is
not rejected
is used when the research question involves the comparisons of means from more than two independent groups.
ANOVA
It provides a statistical
test for determining whether there is enough evidence to reject the null hypothesis that all the
means are equal.
ANOVA
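The computed F value is the ratio of the between-group mean square to the within-group mean square. A minimal sketch, with an assumed function name and made-up measurements for three groups:

```python
def one_way_anova_f(groups):
    """F statistic with its between-group and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical measurements
f_value, df_between, df_within = one_way_anova_f(groups)
print(f_value, df_between, df_within)
```

The computed F is then compared with the tabulated F for (df_between, df_within) at the chosen significance level.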
It is the probability of the occurrence of a disease or other health outcome of interest during a specified period, usually one year
Risk
is calculated by dividing the number who got the disease during the defined period by the total population of interest during that period.
Risk
is the calculated ratio of incidence rates of a health condition or outcome in two groups of people, those exposed to a factor of interest and those not exposed.
Relative risk
used to determine if exposure to a specific risk factor is associated with an increase, decrease, or no
change in the disease or outcome rate when compared to those without the exposure.
Relative risk
is a statistical measure of the strength of the association between a risk factor and an
outcome.
Relative risk
fundamental comparison of rates using a ratio in epidemiology is known as the
rate ratio
When the rates being compared are incidence rates, epidemiologists call those comparisons ____
risk ratios
The risk ratio is also referred to as
relative risk (RR)
is a measure of association that provides the strength of association between exposure and outcome in a population
relative risk
True or false: Relative risk is not a flexible tool.
False
When the relative risk is above 1, the interpretation is that those in the exposed group are __________ the outcome than those in the nonexposed group
more likely to have
The larger the number, the _______ the relationship between being exposed and having the outcome.
stronger
Relative Risk = 1
Null value; No relationship exists
Relative Risk > 1
Positive association; more likely to have the outcome
Relative Risk < 1
Negative association; less likely to have the outcome
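Risk and relative risk, with the interpretation rules above, can be sketched as follows. The function names and the cohort counts are hypothetical.

```python
import math

def risk(cases, population):
    """Number who got the disease divided by the total population of interest."""
    return cases / population

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the nonexposed group."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)

def interpret(rr):
    """Translate a relative risk into the wording used on the cards."""
    if math.isclose(rr, 1.0):
        return "null value: no relationship"
    return "positive association" if rr > 1 else "negative association"

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 nonexposed develop the outcome.
rr = relative_risk(30, 100, 10, 100)
print(rr, interpret(rr))
```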
is a measure of association that provides strength of association between exposure and outcome in a population.
RELATIVE RISK
is a measure of association that provides the strength and direction of the association between exposure and outcome in a population.
odds ratio
odds ratio greater than 1 indicates a ______ between exposure and outcome
positive association
odds ratio less than 1 indicates a _____ between exposure and outcome.
negative association
The ___ odds ratio compares the exposure odds in those with the outcome to the exposure odds in those without the outcome.
Exposure
The ___ odds ratio compares the outcome odds in those with exposure to the outcome odds in those without exposure.
Outcome
first way that the odds ratio can be calculated
Exposure Odds Ratio
Formula for exposure OR
(𝑎/𝑐) / (𝑏/𝑑)
Formula for Outcome OR
𝑎𝑑/𝑏𝑐
second way that the odds ratio can be calculated
Outcome Odds Ratio
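Both ways of calculating the odds ratio reduce to the same cross-product, ad/bc. The sketch below assumes the usual 2×2 layout, which the cards do not spell out: a = exposed with the outcome, b = exposed without the outcome, c = unexposed with the outcome, d = unexposed without the outcome. The counts are made up.

```python
def exposure_odds_ratio(a, b, c, d):
    """Exposure odds among those with the outcome (a/c) over those without (b/d)."""
    return (a / c) / (b / d)

def outcome_odds_ratio(a, b, c, d):
    """Outcome odds among the exposed (a/b) over the unexposed (c/d)."""
    return (a / b) / (c / d)

def cross_product_ratio(a, b, c, d):
    """Simplified form shared by both definitions: ad / bc."""
    return (a * d) / (b * c)

a, b, c, d = 20, 80, 10, 90   # hypothetical 2x2 table counts
print(exposure_odds_ratio(a, b, c, d))
print(outcome_odds_ratio(a, b, c, d))
print(cross_product_ratio(a, b, c, d))
```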
measure of association that provides strength and direction of the association between existing exposure and outcome in the population.
Prevalence Ratio
a measure of association between exposure and outcome, provides strength and direction using two incidence densities
Incidence Density Ratio
a measure of association that provides the strength and direction of the association between exposure and outcome in a population.
Odds ratio
is another tool used for testing a population mean when the variance is unknown and/or the sample size is small (n < 30).
T-test
is used to test the hypothesis involving the mean of a study.
T-test
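For a single sample, the t statistic is the difference between the sample mean and the hypothesized mean, divided by the estimated standard error. A minimal sketch with an assumed function name and made-up data; it computes only the statistic and degrees of freedom, not a p-value.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for testing a mean against mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
    t_value = (mean - mu0) / (sd / math.sqrt(n))
    return t_value, n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical small sample (n < 30)
t_value, t_df = one_sample_t(sample, mu0=4)
print(t_value, t_df)
```

The statistic is then compared with a tabulated t value for the given degrees of freedom.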