Research Design and Statistics Flashcards
[internal validity]
A valid causal inference requires satisfaction
of three criteria: (a) statistical association, (b) temporal precedence, and (c) ________
nonspuriousness
Spurious causes are ________.
threats to internal validity
External validity is the extent to which the causal association can be ___________
to or across variations in study instances
generalized
An example of a threat to ___________: researchers sometimes use inadequate labels to describe study instances (e.g., labeling a treatment “progressive relaxation” when the treatment has many additional therapeutic components).
Construct Validity
Randomized experiments are considered the gold standard for assessing ________.
causality
In _________ trials, an intervention’s effects are examined under real-world conditions.
Such trials often take place outside of academic settings (e.g., community
mental health centers).
effectiveness
In ______ trials, an intervention’s effects are examined under ideal circumstances,
particularly with respect to treatment implementation.
efficacy
In _________ analyses, researchers
analyze outcome data from participants as a function of their original group assignment,
regardless of their level of exposure to treatment. The analysis is intended
to provide a conservative (and real-world) estimate of the treatment effect because
it is based on cases exposed to varying levels of treatment.
intent-to-treat
Missing data is a core problem in intent-to-treat analyses.
single-case experiments are often designed to
increase _________
internal validity
The _______ design is a single-case design that alternates the baseline (A) phase
(intervention absent) with an intervention (B) phase (intervention present). The
outcome of interest is assessed on multiple occasions within each phase.
ABAB
In _________ designs, replication of an effect is sought over multiple baselines,
which can reflect different behaviors, settings, and/or children (just to name
a few).
multiple baseline
Although inferential statistical procedures can be used to analyze data from single case
experiments, it is more common for clinicians to rely on ____________ of the data. Visual inspection is often supplemented with descriptive statistics.
visual inspection
Clinicians can examine _______ by comparing the average frequency of the outcome across different phases of the experiment (e.g., during
the A vs. B phases).
mean changes
Clinicians can also examine ________, in which they compare the last data point in an immediately prior phase to the first data point in an immediately subsequent phase. If the latency of response is hypothesized to be immediate (e.g., the behavior will reduce dramatically as soon as the intervention is implemented), one might predict dramatic level changes between adjacent (baseline-intervention) phases.
level shifts
Clinicians can also examine _______ (or functional
form) changes by examining the rate of behavior change in different phases. For
example, the behavior might increase in a fairly linear (i.e., constant) manner during
the initial A phase and become fairly stable during the initial B phase
slope
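The three descriptive summaries above (mean change, level shift, slope) can be sketched roughly as follows. The data and the `slope` helper are hypothetical and only illustrate the idea; this is not a prescribed analysis for single-case designs.

```python
from statistics import mean

def slope(values):
    """Least-squares slope of values against session number (0, 1, 2, ...)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical behavior counts per session (illustrative data only).
phase_a = [8, 9, 8, 10, 9]   # baseline (A): intervention absent
phase_b = [5, 4, 4, 3, 3]    # intervention (B): intervention present

mean_change = mean(phase_b) - mean(phase_a)   # difference in phase averages
level_shift = phase_b[0] - phase_a[-1]        # first B point minus last A point
slope_a = slope(phase_a)                      # trend within the A phase
slope_b = slope(phase_b)                      # trend within the B phase
```

A large negative `level_shift` together with a negative `mean_change` would be consistent with an immediate reduction in the behavior when the intervention begins.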
Quasi-experimental studies are experiments that lack _____ of units
to conditions.
random assignment
_____________ (also called passive observational studies) are conducted
when the researcher is not actively manipulating anything (like exposure to an
intervention).
Correlational studies
________ designs compare a group of participants who possess a certain characteristic
(e.g., diagnosis of attention deficit hyperactivity disorder [ADHD]) with
a group of participants who do not possess the characteristic.
Case–control
In __________, an intact group (i.e., cohort) is followed over time to examine the emergence of, and/or change in, some outcome of interest. These designs are classified as longitudinal (also known as prospective) because individuals are assessed on at least two occasions.
cohort designs
If the multiple cohorts also differ in their age or some other salient developmental marker at the study’s inception, the study is called a __________
design. Such designs allow for the study of a longer developmental period over fewer
years of data collection because several developmental cohorts (e.g., toddlers, preschoolers,
and school-aged children) are embedded in the study.
cross-sequential
______ is a threat to validity when naturally occurring changes are mistaken
for an intervention effect—when symptoms remit because of the passage of time
rather than the effects of an intervention.
Maturation
____ is a threat to validity when some event (or constellation of events) occurs
during the study and impacts the results in a manner mistaken for an intervention
(e.g., a patient starts exercising, and the resulting improvement in depression is mistaken for a treatment effect).
History
____—also known as regression to the mean—occurs when
extreme scores tend to revert back to the mean on a subsequent evaluation.
Statistical regression is more plausible in single-group studies in which extreme
performers (e.g., severely depressed individuals) comprise the study sample.
Statistical regression
______ is a threat to validity when the pattern of participant drop-out impacts
the results in a way interpreted as an intervention effect.
Attrition
____ is a threat to validity when exposing individuals to the pretest changes
them in ways that might be mistaken for an intervention effect.
Testing
______ is a threat to validity when the measurement tool changes and
impacts the results in a manner mistaken for an intervention effect.
Instrumentation
____ occurs in multiple-group studies when systematic differences among
intervention groups can be mistaken for an intervention effect.
Selection
As such, the reliability of a measure is viewed as
the ratio of true score variance to ______. (True score variance corresponds to
consistency or dependability, concepts that are often invoked in discussions
of reliability.)
total variance.
Kuder–Richardson Formula 20—often
abbreviated KR-20—which can be used when the items are
dichotomous
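A minimal sketch of KR-20 for dichotomous (0/1) items, using the standard formula KR-20 = (k / (k − 1)) × (1 − Σpq / total-score variance). The response data are hypothetical, and the use of the population variance for total scores is an assumption made here; some texts use the sample variance instead.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    k = len(responses[0])   # number of items
    n = len(responses)      # number of respondents
    # Sum of p*q across items, where p is the proportion scoring 1 on the item.
    sum_pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n
        sum_pq += p * (1 - p)
    # Variance of total scores (population formula, by assumption).
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical responses from five people to four dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(data)
```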
______—also known as internal structure—is the extent to which the
structure of the measure is consistent with the theorized factor structure of the construct.
Structural validity
____ matrix can be used to evaluate convergent and discriminant validity.
multitrait–multimethod
In ____, three
primary decisions involve (a) choosing a method of factor extraction, (b) choosing
a method of factor rotation, and (c) deciding on the number of factors to retain.
EFA
In _______ rotations, the factors are
assumed to be uncorrelated. In oblique rotations, the factors are assumed to be
correlated—often the case in psychology.
orthogonal
Several _______ can help with this
last decision—including the chi-square test, root mean square error of approximation
(RMSEA), and standardized root mean square residual (SRMR). All fit indices
quantify (albeit in slightly different ways) how well the model-implied covariance
matrix reproduces the estimated population covariance matrix of the analysis
variables.
fit indices [CFA]
One of the advantages of conducting
statistical analyses on _______ (as in the case of structural equation
modeling, described below) is the gain in statistical power that results when
measurement error is removed from the constructs of interest.
latent variables
The ___ is used when (a) data are ordinal, or (b) when data are interval or ratio, but the
distribution is highly skewed (the median is less affected by skewness)
median
In _______, the mean and median are identical.
In symmetrical
distributions with a single mode, the mean, median, and mode are identical
symmetrical distributions
The interquartile range captures the middle 50% of the distribution and is
computed by subtracting the 25th percentile (first quartile) from the _____th percentile
(third quartile)
75
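As a rough illustration of the interquartile range, the sketch below uses Python's `statistics.quantiles` with its default ("exclusive") method. The scores are hypothetical, and other percentile conventions can give slightly different quartiles.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical scores

# n=4 returns the three cut points: 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(scores, n=4)

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1
```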
The standard deviation (SD) captures the average distance of scores from the
mean
z = (x − M) / _______ .
SD
Properties of the Normal z Distribution When a z-score conversion is used,
the resulting z distribution has the following properties: (a) the mean is ___; (b) the
standard deviation is 1; and (c) each z score represents the position of the score in
relation to the mean, in standard deviation units. In other words, a z score of −1.27
denotes a score exactly 1.27 standard deviations below the mean.
0
Percentiles can be computed by converting a T score to a z score: z = (T − 50) / ______.
10
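The z-score and T-score conversions above can be sketched as follows. The function names are made up for illustration, and the percentile step assumes the scores are normally distributed.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Position of x relative to the mean, in SD units."""
    return (x - mean) / sd

def t_to_z(t):
    """T scores have a mean of 50 and an SD of 10."""
    return (t - 50) / 10

def z_to_percentile(z):
    """Percentile rank of z under a standard normal distribution."""
    return 100 * NormalDist().cdf(z)

z = t_to_z(60)                   # one SD above the mean
percentile = z_to_percentile(z)  # roughly the 84th percentile
```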
__________: (a) make
more distributional assumptions (e.g., that the distribution is normal), (b) assume
data are measured on an interval or ratio scale, (c) are conducted on actual data
(as opposed to on ranks derived from data), and (d) allow researchers to test more
specific hypotheses about the populations from which they are drawn.
parametric statistics
the null hypothesis specifies
that the two ________ are equal (μt = μc).
population means
When the sample data
would occur relatively infrequently assuming the null hypothesis (e.g., the data
would occur less than 5% of the time if the null hypothesis were true), the results are declared statistically significant.
p < .05 (sig level; alpha)
results are not declared statistically significant even though the null hypothesis
is false
Type II error (Beta)
Results are declared statistically significant even though the null hypothesis is true
Type I error (alpha)
_______ is the probability of correctly rejecting a false null hypothesis (i.e., finding an effect when one exists in the population)
Statistical power (1 − Beta)
What helps increase power?
Increase N; Increase alpha; Directional hypotheses; Large effects; More reliable measures
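To see how these factors move power, here is a sketch for the simplest case: a single-sample z test with standardized effect size d. The function `z_test_power` is hypothetical, not a library call, and the one-tailed branch assumes the effect lies in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(d, n, alpha=0.05, two_tailed=True):
    """Approximate power of a single-sample z test for standardized effect d."""
    std_normal = NormalDist()
    shift = d * sqrt(n)  # expected z statistic when the effect is real
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - shift)   # reject in the upper tail
        lower = std_normal.cdf(-z_crit - shift)      # reject in the lower tail
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - shift)

base = z_test_power(d=0.5, n=32)                      # roughly .80
more_n = z_test_power(d=0.5, n=64)                    # larger sample
more_alpha = z_test_power(d=0.5, n=32, alpha=0.10)    # larger alpha
one_tailed = z_test_power(d=0.5, n=32, two_tailed=False)  # directional test
bigger_effect = z_test_power(d=0.8, n=32)             # larger effect
```

Each of the four variations yields higher power than `base`, which matches the list on the card. More reliable measures raise power indirectly by reducing error variance and so increasing the observed standardized effect.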
When the 95% confidence interval does not contain zero the results are:
significant
An example of _________: (a) no longer meeting diagnostic criteria and (b) scoring two standard errors below a pretest score on one of the primary study outcomes.
Clinical significance (vs statistical)
two-tailed or nondirectional means:
(leaving open the possibility that the sample mean will be larger or smaller than
the population mean of 100).
A Single-sample Z test is used when
The population SD is known (e.g., intelligence tests).
A single-sample T test is used when
The population SD is unknown.
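The difference between the two single-sample tests can be sketched as below. The scores are hypothetical, and the population values (mean 100, SD 15) follow the intelligence-test example. Only the z test's p value is computed, since the t distribution is not in the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

scores = [108, 112, 95, 120, 104, 110, 99, 116]  # hypothetical sample
mu = 100      # population mean under the null hypothesis
sigma = 15    # known population SD (the z-test case)

n = len(scores)
m = mean(scores)

# Single-sample z test: the population SD is known.
z_stat = (m - mu) / (sigma / sqrt(n))
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Single-sample t test: the SD is estimated from the sample (n - 1 denominator).
t_stat = (m - mu) / (stdev(scores) / sqrt(n))
df = n - 1  # degrees of freedom used to look up the critical t value
```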
A relevant effect size is C______, which is a standardized mean difference and is
computed as the ratio of the difference between the two sample means to the
pooled standard deviation.
Cohen’s d (.2 small, .5 med, .8 large).
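A minimal sketch of Cohen's d as described above: the difference between two sample means divided by the pooled standard deviation. The group scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

treatment = [10, 12, 14, 16, 18]  # hypothetical outcome scores
control = [8, 10, 12, 14, 16]
d = cohens_d(treatment, control)
```

With these numbers d falls between the "medium" (.5) and "large" (.8) benchmarks.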
Omega squared (ω²) reflects the proportion of variance
in the outcome that is explained by the factors. Interpretive guidelines for omega
squared are small = 0.01, medium = 0.06, and large = 0.15.
Effect size used in ANOVA
In models for which
there are three or more levels of a factor (e.g., low, medium, and high levels of
stress), the test of the factor’s main effect is an _______ statistical test.
omnibus
omnibus tests are typically followed
by a series of additional tests (e.g., comparing each pair of means). These
more focused contrasts are often referred to as _____.
post hoc tests
Conducting multiple
statistical tests raises the familywise type I error rate associated with the full set of
analyses, so we use ______.
Corrections (e.g., Bonferroni, Tukey)
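The inflation of the familywise error rate, and the Bonferroni fix, can be illustrated with a little arithmetic. This sketch assumes the tests are independent, which is a simplification for pairwise comparisons; Tukey's procedure is not shown.

```python
alpha = 0.05
num_tests = 3  # e.g., all pairwise comparisons among three group means

# Familywise type I error rate if every test uses alpha (independent tests assumed).
familywise_uncorrected = 1 - (1 - alpha) ** num_tests

# Bonferroni: run each comparison at alpha divided by the number of tests.
alpha_per_test = alpha / num_tests
familywise_corrected = 1 - (1 - alpha_per_test) ** num_tests
```

Under these assumptions the uncorrected familywise rate is about .14, while the corrected rate stays at or below .05.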
A ____________ can be used
to contrast two or more treatment groups (e.g., CBT, interpersonal therapy [IPT],
and control).
one-way between-subjects ANOVA
A ___________ can be used to
examine a single cohort’s symptom levels over two or more assessments (e.g., pretest,
posttest, and follow-up measures for individuals exposed to a single intervention).
one-way within-subjects ANOVA
What test design is this an example of:
one might be interested in examining whether ADHD diagnosis (present or absent)
and testing environment (quiet or noisy room) have an impact on performance.
two-way between subjects ANOVA
What test design is this an example of:
exposing a group of children with ADHD to two levels of a psychostimulant drug dose
factor (e.g., 5 and 10 mg) crossed with two levels of a testing-environment
factor (e.g., quiet and noisy rooms). In other words, all children would be observed
under all four study conditions (e.g., 10 mg, quiet room) and performance would
be the dependent variable
two-way WITHIN subjects ANOVA
_________ are used in two primary manners: (a) to increase statistical power in randomized experiments (when the covariate is uncorrelated with intervention conditions, but correlated with the dependent variable); and (b) to control for possible confounding influences (i.e., controlling for variables associated with both intervention conditions and the dependent variable) in nonrandomized designs.
ANCOVAs
the multivariate ANOVA model (i.e.,
MANOVA model) allows for _______ to be analyzed in a single model.
multiple dependent variables
In MANOVA, the actual analysis is performed on an optimized _________ (one that maximizes between-group
differences while minimizing within-group differences). A number of test
statistics are generated by MANOVA (Pillai’s trace, Wilks’ lambda, Hotelling’s
trace, and Roy’s largest root).
linear combination of the multiple dependent variables
Effect size is estimated by computing the square of the correlation (i.e., the coefficient
of determination), which is the proportion of shared variance between the
two variables. Common interpretative guidelines for r² are as follows: small = .01,
medium = .09, and large = .25.
Correlation effect size (r²)
Spearman’s
rank correlation coefficient and Kendall’s tau coefficient are both nonparametric
tests that are used when responses on the two variables are ________.
rank ordered
Meaning of the unstandardized linear regression coefficient (b):
For every one-unit increase in X, Y changes by b units
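The interpretation of b can be checked with a small least-squares sketch. The data and the helper `fit_line` are hypothetical and for illustration only.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

x = [1, 2, 3, 4, 5]   # hypothetical predictor scores
y = [2, 4, 5, 4, 5]   # hypothetical outcome scores
a, b = fit_line(x, y)

# Predicted Y at X = 3 and X = 4 differ by exactly b.
change_per_unit = (a + b * 4) - (a + b * 3)
```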
(Hierarchical regression analysis is sometimes confused with stepwise regression analysis,
which is an _______ approach to predictor entry used more often in exploratory
analyses.)
atheoretical
Models that
do not include interaction effects are referred to as _______.
additive effects models
The Mann–Whitney Test (also called the Mann–Whitney–Wilcoxon Test) is a nonparametric
alternative to the _________
independent samples t-test
The Kruskal–
Wallis Test is a nonparametric alternative to the ______.
between-group ANOVA
The Wilcoxon Signed Ranks Test is a nonparametric
alternative to the __________.
paired samples t-test.
Testing for moderation is the same as testing for a _________
between a predictor and a moderator. For example, a researcher might test
whether an intervention effect is moderated by participant sex to see whether the
effects of the intervention on the outcome are stronger for men or women
statistical interaction
A _______ is the mechanism through which a distal predictor operates in influencing an
outcome.
mediator
________ centers on understanding participants’ lived experiences and
emphasizes subjective experience (e.g., understanding personal knowledge, motivations,
and perspectives).
Phenomenology
The ultimate goal of ______ is to develop a theory (“grounded” in data)
about a concept of interest. This approach is used when current theory is lacking,
nonexistent, or incomplete.
grounded theory
qualitative research data are often analyzed
using __________, a process of identifying and analyzing patterns or themes
within data. This can occur either deductively (i.e., starting from a particular theory
or hypothesis) or inductively (i.e., as in grounded theory above)
thematic analysis
1) triangulation (the use
of multiple, varied sources of data, methods, and researchers in order to corroborate
results), 2) audits (the use of an external consultant to complete an independent
analysis), and 3) member checking (having participants in the study review and provide
feedback on the credibility of findings).
Checking reliability and validity in qualitative research
The first step in ________ is to identify and engage stakeholders (e.g., administrators,
staff, clients, etc.). Next, a needs assessment might be conducted to assess
the relative priority of the needs, or “problems,” of a specific population in order
to determine where resources should be allocated
Program Evaluation
Formative evaluations provide information to make needed changes early on, whereas _____ determine a program’s success once delivered.
summative evaluations
CBA examines the balance of resources/costs
spent on a program compared to the benefits to answer the question “have
resources been well spent on this program?” This results in a benefit/cost ratio, the
worth of a program’s outcomes divided by the program’s costs, which can then be
compared to alternative programs. CBA is controversial in part because it assigns
monetary values to the benefits arising from a program.
Cost-benefit Analysis
_______ sampling is when researchers collect data from individuals with specific
characteristics.
Purposive
_______ sampling, a type of purposive sampling, involves participants
inviting others to participate in the study.
Snowball
______ sampling uses
incentives to overcome possible biases that result from snowball and other chain-referral
sampling methods.
Respondent-driven
In other words, unlike _______ variables, which provide a causal explanation for the relationship between variables, moderator variables affect the strength of the relationship.
mediator
_________ is useful for studying behaviors that occur infrequently, have a long duration, or leave a permanent record or other product (e.g., a completed worksheet or test).
Event sampling
_______ is an alternative to behavioral sampling and is used when a goal of the study is to observe a behavior in a number of settings. It helps increase the generalizability of a study’s findings.
Situational sampling
• the independent variable (experimental variance);
• systematic error (error due to extraneous variables); and
• random error (error due to random fluctuations in subjects, experimental conditions, methods of measurement, etc.).
three factors that can cause variability in the study’s dependent variable.
when a researcher includes an extraneous variable as an independent variable in a study, the extraneous variable is also known as a ________ variable
moderator
________ is a particularly useful method in quasi-experimental research in which subjects cannot be randomly assigned to treatment groups.
Statistical control (such as using covariates)
A study has internal validity when it allows an investigator to determine if there is a ____ relationship between independent and dependent variables.
causal
Fatigue, boredom, hunger, and physical and cognitive development are
potential ______ effects that can limit a study’s internal validity
maturational
The best way to control maturation is to include more than one group in the study and randomly assign subjects to groups.
History is controlled by including more than one group in the study and _________ subjects to groups.
randomly assigning
The threat of _______ can be controlled by administering the DV measure only once as a posttest, by designing the measure in a way that minimizes memory and practice effects, or by including at least two groups in the study with all groups completing the pre- and posttests so that any difference between groups on the posttest cannot be attributed to pretesting.
Testing
_______ is controlled by including more than one group in the study and ensuring that all groups are subject to the same instrumentation effects, by using the same measuring devices and procedures with all subjects, and by making sure that measuring devices and procedures do not change during the course of the study.
Instrumentation
The _________ threat is avoided by not including only extreme scorers in the study or by including more than one group and ensuring that all groups consist of subjects who are similarly extreme.
statistical regression
______ is difficult to control, but pretesting can help determine if dropouts and non-dropouts differ with regard to their initial status on the DV.
Attrition
Population validity is generalization to other people and ecological validity is generalization to other ______.
settings
When a study’s results have been contaminated by _______, they cannot be generalized to people who have not been pretested. Pretest sensitization is controlled by not administering a pretest or by using the Solomon four-group design, which allows an investigator to measure the impact of pretesting on both the external and internal validity of a research study.
pretest sensitization
An interaction between _________ is often a problem when subjects are volunteers because volunteers tend to be more motivated than non-volunteers and, consequently, might be more responsive to the IV. In this situation, the study’s results
apply to volunteers but can’t be generalized to other people. The best way to eliminate this threat is to ensure that the sample is representative of the population of interest.
selection and treatment
Research participants may respond to an independent variable in a particular way simply because they know their behavior is being observed, and this is known as _____
reactivity
The behavior of subjects can also be altered by __________, which are cues in the experimental setting that inform subjects of the purpose of the study or suggest what behaviors are expected of them.
demand characteristics
______ can be controlled by using deception, unobtrusive (nonreactive) measures, or a single- or double-blind technique. When using a single-blind technique, subjects do not know which treatment group they have been assigned to; in a double-blind study, neither the subjects nor the experimenter know which group subjects have been assigned to.
Reactivity
When a study involves exposing each subject to two or more levels of an independent variable (i.e., when the study utilizes a within-subjects design), the effects of one level of the independent variable can be affected by previous exposure to another level.
(Order Effects, Carryover Effects)
The _______ designs are considered inappropriate when withdrawal of a treatment during the course of a research study would be unethical (e.g., when the treatment has successfully eliminated a self-injurious behavior)
reversal (ABAB)
Because of its insensitivity to “outliers,” the _____ is a useful measure of central tendency when a distribution contains one or a few extreme scores.
median
One advantage of the _____ is that, of the three measures of central tendency, it is least susceptible to sampling fluctuations.
mean
Regardless of the shape of the distribution of individual scores in the population, as the sample size increases, the sampling distribution of the mean approaches a normal distribution.
• The mean of the sampling distribution of the mean is equal to the population mean.
• The standard deviation of the sampling distribution of the mean is equal to the population standard deviation divided by the square root of the sample size: σM = σ / √N
Central Limit theorem
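The three statements above can be checked with a small simulation. This is only a sketch: the exponential population, the sample size, and the number of samples are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)       # fixed seed so the sketch is repeatable
n = 50               # size of each sample
num_samples = 4000   # number of samples drawn

# Exponential population with rate 1: strongly skewed, mean 1, SD 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)   # should be close to the population mean (1)
sd_of_means = stdev(sample_means)    # should be close to sigma / sqrt(n)
expected_se = 1 / sqrt(n)
```

Even though the population is skewed, a histogram of `sample_means` would look approximately normal at this sample size.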
The ________ is the foundation of inferential statistics. It is the sampling distribution that enables a researcher to make inferences about the relationship between variables in the population based on obtained sample data.
sampling distribution
A _____ error is more likely when alpha is low, the sample size is small, and the independent variable is not administered in sufficient intensity.
Type II
One-tailed tests (when appropriate) and parametric rather than nonparametric tests help increase _____.
Power
The most effective way to maximize the robustness of a parametric test is to have an equal number of _______.
subjects in each group
Numerator and denominator of F statistic are:
Mean Square Between / Mean Square Within
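A minimal sketch of the F ratio for a one-way between-subjects ANOVA, computed directly from its definition. The three groups of scores are hypothetical.

```python
from statistics import mean

def f_statistic(groups):
    """F = mean square between / mean square within for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within

f_value = f_statistic([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
```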
ANCOVA and randomized block ANOVA serve to decrease ___________ variability in order to create a stronger test.
within group
Put another way, the ___________ indicates the proportion of variability in Y that is
explained by, or accounted for by the variability in X. For example, if the correlation
coefficient for sales success and product knowledge is .60, then 36% (.60 squared= .36) of variability in sales success is accounted for by product knowledge.
squared correlation coefficient
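The squaring step can be sketched as follows. The paired scores and the `pearson_r` helper are hypothetical; the last line simply repeats the .60 example from the card.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

knowledge = [1, 2, 3, 4, 5]   # hypothetical product-knowledge scores
sales = [2, 4, 5, 4, 5]       # hypothetical sales-success scores

r = pearson_r(knowledge, sales)
r_squared = r ** 2            # proportion of variability in sales accounted for

card_example = 0.60 ** 2      # the card's example: about .36, or 36%
```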
If the subscript contains two different letters or numbers (e.g., “xy”), it represents the correlation between two different variables. When the subscript contains the same letters or numbers (e.g., “xx”), it is a _________.
reliability coefficient
Canonical correlation is an extension of multiple regression that is used when two or more continuous predictors are to be used to predict status on _______.
two or more continuous criteria
________ analysis is also known as discriminant analysis and is the appropriate technique when two or more continuous predictors will be used to predict or estimate a person’s status on a single discrete (nominal) criterion.
Discriminant function; examines “hit rate” or number of correct classifications