Research Design & Statistics Flashcards
aim of science
discover systematic explanations for and/or rules governing natural phenomena
research
conduct systematic investigations and inquiries into the phenomenon (or phenomena) in question
research design
plan that specifies the research strategy — how subjects will be selected, how variables will be defined and measured, the conditions under which the research will be conducted, etc.
basic sequence of a scientific inquiry
1) hypothesis (or proposition) regarding the relationship between 2+ variables, is formulated
2) hypothesis is operationally defined (specify what exactly we should observe if the hypothesis is true)
3) collect and analyze data to test the hypothesis
variable
simply anything that varies;
not consistent or having a fixed pattern; liable to change
constant
something that does not vary;
factors that do not change during the experiment
independent variable (IV)
input variable — the event or treatment manipulated by the researcher
other names for IV
the treatment variable or experimental variable
dependent variable (DV)
is the outcome variable;
what is hypothesized to change as a result of manipulations of the independent variable;
measured to determine if they change as the result of the experimental manipulations
correlational research
investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them;
variables are measured not manipulated;
finding an association, not causation;
can be used to predict status on another variable
predictor variable
variable that is suspected to predict or correlate with an outcome variable
criterion variable
the outcome, result, or effect that researchers try to predict or explain in a study
levels
when applied to a variable, refers to the values it could take
factorial design
experimental design used to study the effects of multiple factors (independent variables) simultaneously;
each level of one independent variable is combined with each level of the others to produce all possible combinations
internal validity
possible to determine whether a causal relationship exists between the IV and DV;
reasonably sure that the IV, rather than an extraneous (irrelevant) variable, is causally responsible for any observed change in the DV
one-group, pretest/post-test design
the dependent variable is measured once before the treatment is implemented and once after it is implemented;
subjects in one group are measured before and after they receive a treatment;
poor internal validity
extraneous variable
any variable not being investigated that has the potential to affect the outcome of a research study;
any factor not considered an independent variable that can affect the dependent variables or controlled conditions
confounded
experiment that is contaminated by an extraneous variable
equivalence
ensure that all the groups involved in a study are equivalent in every respect, except for their status on the IV
Threats to Internal Validity
history, maturation, testing, instrumentation, statistical regression, selection, differential mortality, experimenter bias
TISSDEMH
history
any external event, besides the experimental treatment, that affects scores or status on the dependent variable
maturation
any internal (biological or psychological) change that occurs in the subjects while the experiment is in progress and exerts a systematic effect on the DV;
fatigue, boredom, hunger, physical or intellectual development
testing
testing is always a threat to internal validity in the one-group pretest/post-test design;
when the pretest and post-test are similar, subjects may show improvement on the post-test simply from their experience with the pretest
instrumentation
when the nature of the measuring instrument has changed;
raters’ assessment abilities have improved over time;
one way to control for this threat is to use highly reliable (dependable and consistent) measuring instruments
statistical regression
tendency of extreme (very high or very low) scores to fall closer to the mean (average) upon re-testing;
e.g., a 6-year-old who scores 180 on an IQ test will likely obtain a lower score when retested 3 years later;
can threaten internal validity whenever extreme scorers are used as research subjects (e.g., severely depressed individuals)
selection
pre-existing subject factors that account for scores on a DV
motivation, intelligence, self-esteem, etc.
differential mortality
when people who drop-out of one of the groups differ in systematic ways from people who remain in the study;
when a study involves 2+ groups
experimenter bias
behavior of subjects changes as a result of experimenter expectancies, rather than as a result of the independent variable;
ex: researcher may unconsciously communicate expectations to the subjects; researcher, consciously or unconsciously, makes errors in the direction of the research hypothesis when scoring or reporting the results
Rosenthal and Jacobson (1968) “Pygmalion in the Classroom”
teacher’s preconceived notions of a student’s ability resulted in the student’s grades and even IQ scores moving in the expected direction, even though the students themselves hadn’t changed
experimenter expectancy
AKA Rosenthal effect and the Pygmalion effect;
how the perceived expectations of an observer can influence the people being observed
how to overcome experimenter bias effects
using the “double-blind” technique, in which neither the subjects nor the experimenter know which group (experimental or control) subjects have been assigned to
double-blind technique
neither the subjects nor the experimenter know which group (experimental or control) subjects have been assigned to
random assignment (or randomization)
for all subjects in the experiment, the probability of being assigned to a particular group is the same;
considered the most “powerful” method for controlling extraneous variables;
all extraneous characteristics (including ones the researcher has not measured or even thought of) should be distributed to the groups equally
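A minimal Python sketch of random assignment (illustrative only; the subject labels are hypothetical): shuffle an already-selected pool, then split it.

```python
import random

# Pool of subjects who have ALREADY been selected for the study
subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical pool of 20

random.shuffle(subjects)           # every ordering is equally likely, so each
treatment_group = subjects[:10]    # subject has the same probability of being
control_group = subjects[10:]      # assigned to either group

print(treatment_group, control_group)
```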
Random Assignment vs. Random Selection
random selection: method of selecting subjects into a research study; all members of the population under study have an equal chance of being selected to participate in the research
random assignment: something that takes place after the subjects have been selected; the probability of subjects who have already been selected being assigned to each group is the same
matching
identifying subjects (through a pretest) who are similar in terms of their status on the extraneous variable, then grouping similar subjects and randomly assigning members of the matched group to the treatment groups
when is matching useful
when the sample size is small;
random assignment cannot be counted on to ensure equivalency among the groups in terms of the extraneous variable
blocking
studying the effects of an extraneous variable (a pre-existing subject characteristic) to determine if and to what degree it is accounting for scores on the DV;
making the extraneous variable another IV
matching vs. blocking
matching: ensure equivalency in terms of the extraneous variable; doesn’t add an IV
blocking: determine the effects of the extraneous variable; add a new IV and, therefore, add additional experimental groups
Holding the Extraneous Variable Constant
including only subjects who are homogenous in terms of their status on the extraneous variable;
completely eliminates the effects of an extraneous variable;
con: cannot be generalized to populations that are not sampled
analysis of covariance (ANCOVA)
statistical strategy for increasing internal validity;
after the data are obtained, DV scores are adjusted so that subjects are equalized in terms of their status on one or more extraneous variables
external validity
the generalizability of the results of a research study to other settings, times, or people
interaction
some variable has one effect under one set of circumstances, but a different effect under another set of circumstances;
term implies that a given effect is not generalizable; that is, it doesn’t work the same way under all circumstances
interaction between selection and treatment
effects of a given treatment would not generalize to other members of the population of interest (or target population)
Interaction Between History and Treatment
effects of a treatment do not generalize beyond the setting and/or time period in which the experiment was done
Interaction Between Testing and Treatment
results of research in which pretests are used might not generalize to cases in which pretests are not used
pretest sensitization
effect in which the administration of a pretest affects the subsequent responses of a participant to experimental treatments
demand characteristics
cues in the research setting that allow subjects to guess the research hypothesis
The Hawthorne effect
tendency of subjects to behave differently due to the mere fact they are participating in research
Order Effects (AKA Carryover Effects and Multiple Treatment Interference)
when participants’ responses in the various conditions are affected by the order of conditions to which they were exposed;
effect of being tested in one condition on participants’ behavior in later conditions
repeated measures design
studies in which the same subjects are exposed to more than one treatment
random selection
all members of the population under study have an equal chance of being selected to participate in the research
stratified random sampling
taking a random sample from each of several subgroups of the total target population;
purpose is to ensure proportionate representation of the defined population subgroups
cluster sampling
unit of sampling is a naturally occurring group of individuals, rather than the individual
multistage cluster sampling
taking of samples in stages using smaller and smaller sampling units at each stage
naturalistic observation
behavior is observed and recorded in its natural setting or in a setting as similar to the natural one as possible;
lacks internal validity
analogue research
results of lab research studies are used to draw conclusions about a real-world phenomenon;
researchers draw analogies about real-world phenomena based on studies involving contrived, laboratory situations;
high internal validity, low external validity
single-blind study
subjects are not informed of the purpose of the study and do not know which treatment they have been assigned to
counterbalancing
different subjects or groups of subjects receive the treatments in a different order;
to control for order effects
Latin square design
ordering the administration of treatments so that each appears once and only once in every position
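A sketch of one simple construction, a cyclic Latin square for four hypothetical treatments (note: fully balancing first-order carryover effects requires a balanced Latin square):

```python
# 4x4 cyclic Latin square: each treatment appears once and only once
# in every row (group) and every column (position in the order)
treatments = ["A", "B", "C", "D"]
square = [treatments[i:] + treatments[:i] for i in range(len(treatments))]

for row in square:
    print(" ".join(row))
# A B C D
# B C D A
# C D A B
# D A B C
```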
true experiment
investigator randomly assigns subjects to different groups, which receive different levels of a manipulated variable;
greatest internal validity
quasi-experimental designs
used when random assignment of subjects to groups is not possible;
involves the use of intact groups, rather than groups that are constructed on the basis of random assignment
developmental studies
assessing variables as a function of time (e.g., physical and psychological development)
3 types of developmental designs
longitudinal, cross-sectional, and cross-sequential
longitudinal study
same people are studied over a long period of time
cons of longitudinal studies
high cost (time and money); high subject dropout rates; and, in studies that involve assessing performance on a task, practice effects
why longitudinal designs tend to underestimate true age-related change
1) subjects who drop out of longitudinal designs tend to be those who are less able on the task studied, so the remaining subjects are relatively high in ability and the data show a misleadingly low level of age-related decline;
2) practice effects can facilitate performance on the dependent variable
cross-sectional design
different groups of subjects, divided by age, are assessed at the same time;
tend to overestimate true age-related declines in performance
cohort effects (AKA intergenerational effects)
observed differences between different age groups may have to do with experience rather than age
cross-sequential design
representative samples of different age groups are assessed on two or more occasions
time-series design
taking multiple measurements over time (usually multiple pretest and post-test measures) to assess the effects of an IV;
the series of measurements on the DV is interrupted by the administration of a treatment
advantage of multiple measurements
allow one to rule out many threats to internal validity, such as maturation, regression, and testing;
biggest threat is history
two-group time-series design
take the same measurements from a comparison “control” group that is comparable to the one studied
Single-Subject Designs
number of subjects is one;
well-suited to research on behavior modification since the researcher is able to analyze the behavior before and during treatment - DV is measured several times during both phases
types of single subject designs
“AB” design, “reversal” design, and “multiple baseline” design
AB design
involves a single baseline phase and a single treatment phase;
con: easy for any observed change in behavior in the treatment phase to be due to a historical event or other extraneous factor
Reversal (or Withdrawal) Design
treatment is withdrawn and data are collected to determine if the behavior returns to its original level upon this withdrawal;
ABAB design, in which the treatment is re-applied after the second baseline phase
Multiple-baseline designs
when cannot use reversal design;
applying the treatment sequentially (across different baselines);
treatment may be applied sequentially across different behaviors of the same subject (multiple baseline across behaviors), to the same subject in different settings (multiple baseline across settings), or to the same behavior of different subjects (multiple baseline across subjects)
qualitative or descriptive research
type of research in which the investigator doesn’t start with a theory; theory is developed from the data rather than derived a priori (beforehand)
qualitative methods of research
participant observation, nonparticipant observation, interviews, surveys, case studies
surveys
used in areas such as attitude measurement, consumer preferences, and worker satisfaction studies;
3 basic techniques - personal interviews, telephone surveys, and mail surveys
case study
detailed examination of a single case (single individual, group, or phenomenon);
based on the assumption that the case under study can be viewed as an example of a more general class;
from an experimental POV, case studies don’t allow one to conclude the nature of relationships between variables (lack internal validity), and their results may not be generalizable to other cases (may lack external validity)
protocol analysis
loosely applies to research involving the collection and analysis of verbatim reports;
subject is asked to think aloud as he or she is performing a task while the researcher records everything the subject says (this record is referred to as a protocol);
researcher analyzes the data in an attempt to identify cognitive processes involved in performing the task;
analysis is based on the researcher’s interpretation of the verbal protocol
statistics
methods of measuring variables and organizing and analyzing data
descriptive statistics
describe a set of data collected from a sample
inferential methods
used to make inferences about an entire population on the basis of sample data
nominal
divides a variable into unordered categories into which the data may fall;
qualitative data that groups variables into categories that do not overlap;
categories are not ordered;
“sex,” “diagnostic category,” “hair color”
ordinal
variables have natural, ordered categories and the distances between the categories are not known;
Category 1 has less (or more) of the given attribute than Category 2;
ranks, satisfaction ratings, education level
interval
numbers are scaled at equal distances, but the scale itself has no absolute zero point;
measured along a numerical scale that has equal distances (intervals) between adjacent values;
can add and subtract but can’t multiply or divide;
IQ, temperature
ratio
identical to interval scales, except they have an absolute zero point;
multiplication and division require a ratio scale;
money, distance, time
3 types of descriptive statistics
frequency distributions, measures of central tendency, measures of variability
frequency distribution
provides a summary of a set of data;
indicates the number (frequency) of cases that fall at a given category or score or within a given score range;
can be summarized in a table or displayed graphically as a frequency polygon or a histogram (bar graph)
cumulative frequency (cf)
total number of observations that fall at or below the given category or score
histogram
scores are plotted on the x-axis (or abscissa), and frequency of occurrence of each score is plotted on the y-axis (or ordinate)
normal distribution
data are symmetrically distributed with no skew;
most values cluster around a central region, with values tapering off as they go further away from the center
negatively skewed
larger proportion of the scores falls toward the high end of the scale and relatively few scores fall toward the low end of the range of scores;
has a long tail on the left (the negative end of the distribution) and “lump” of scores on the right;
negatively skewed = easy test
positively skewed
larger number of scores at the low end of the scale (to the left side of the range of scores) and a long tail to the right (the positive end);
positively skewed = difficult test
mean
arithmetic average;
most useful measure of central tendency;
very sensitive to extreme values
median (Md)
middle value of the data when ordered from the lowest to the highest;
more useful measure of central tendency when a distribution is skewed
mode
most frequent value in a collection of numbers
multimodal
distribution with multiple modes
bimodal distribution
distribution with two modes
Relationship Between the Mean, the Median, and the Mode
normal distribution: 3 measures are equal;
positively skewed distribution: mean > median > mode;
negatively skewed distribution: mode > median > mean
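A quick check of the skew rule using Python’s statistics module on made-up, positively skewed scores:

```python
import statistics

# A few extreme high scores pull the mean upward (positive skew)
scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 15]

print(statistics.mode(scores))    # 3   (mode)
print(statistics.median(scores))  # 4.0 (median)
print(statistics.mean(scores))    # 5.4 (mean)
# mean > median > mode, as expected for a positively skewed distribution
```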
variability
dispersion;
how spread-out scores are
3 most commonly used measures of variation
range, variance, and standard deviation
range
difference between the highest and lowest scores in a set;
general description of a distribution’s variability
cons of range
affected by outliers;
no info about the distribution of the scores across the range (how bunched up or variable scores are around the mean)
variance
average of the squared differences of each observation from the mean;
measure of how the scores disperse around the mean;
equal to the square of the standard deviation
standard deviation
the square root of the variance;
the expected deviation from the mean of a score chosen at random;
can be used to calculate the percentage of scores that will fall within a given range, as well as the percentage that will fall above or below a given cutoff score
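An illustrative numpy sketch computing all three measures of variability for a small hypothetical sample:

```python
import numpy as np

scores = np.array([4, 6, 7, 8, 10])   # hypothetical scores; mean = 7

score_range = scores.max() - scores.min()   # 6
variance = scores.var(ddof=0)               # mean squared deviation = 4.0
sd = np.sqrt(variance)                      # square root of variance = 2.0

print(score_range, variance, sd)
```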
transformed score
allows an individual raw score to be compared to the scores in the rest of the distribution
4 types of transformed scores
z-scores, T-scores, stanines, and percentile ranks
z-scores
AKA standard scores;
raw scores stated in standard deviation terms;
measure of how many standard deviations a given raw score is from the mean
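A worked example with hypothetical numbers, converting a raw score to a z-score and then to a T-score (defined below):

```python
# z = (raw score - mean) / standard deviation
raw, mean, sd = 115, 100, 15   # hypothetical IQ-style scale

z = (raw - mean) / sd          # (115 - 100) / 15 = 1.0 SD above the mean
t = 50 + 10 * z                # T-score: mean of 50, 10 points per SD

print(z, t)   # 1.0 60.0
```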
linear transformation
transformation of scores in which the distribution’s shape does not change;
conversion of raw scores to z-scores (and vice-versa) is a linear transformation
T-score
based on 10-point intervals with T = 50 being the distribution’s mean and every 10 points above or below 50 equivalent to a standard deviation away from the mean
stanine scores
divide the distribution into 9 equal intervals, with stanine 1 being the lowest ninth of the distribution and stanine 9 being the highest ninth;
mean = 5, SD of about 2
percentile rank
the percentage of individuals in the standardized group scoring below the individual’s attained raw score;
have a flat (or rectangular) distribution - within a given range of percentile ranks, there will always be the same number of scores
difference between percentage score and percentile rank
percentage score is referenced to items on the test;
percentile rank is referenced to other scores in the distribution
nonlinear transformation
transformation that results in a change of the distribution’s shape
memorize the following points about the normal curve and standard deviations
1) In a normal distribution, about 68% of all scores fall between -1.0z and +1.0z.
2) In a normal distribution, about 95% of all scores fall between z-scores of -2.0 and +2.0.
3) In a normal distribution, the z-score of +1.0 is equivalent to a percentile rank (PR) of 84 and is therefore the cutoff point for the top 16%. Conversely, the z-score -1.0 is equivalent to a PR of 16 and is therefore the cutoff point for the bottom 16%.
4) In a normal distribution, the z-score of +2.0 is approximately equivalent to the 98th percentile and is therefore the cutoff point for about the top 2%. Conversely, the z-score -2.0 is approximately equivalent to a PR of 2 and is therefore the cutoff point for the bottom 2%.
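These cutoffs can be verified with the normal CDF; a sketch assuming scipy is available:

```python
from scipy.stats import norm

# Proportion of scores between -1z/+1z and between -2z/+2z
print(norm.cdf(1) - norm.cdf(-1))   # ~0.68 -> about 68%
print(norm.cdf(2) - norm.cdf(-2))   # ~0.95 -> about 95%

# Percentile ranks at z = +1.0 and z = +2.0
print(norm.cdf(1) * 100)   # ~84 -> cutoff for the top 16%
print(norm.cdf(2) * 100)   # ~98 -> cutoff for about the top 2%
```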
why in a normal distribution, there is a far greater range of percentile ranks contained in the middle of the distribution than at either extreme
a change in an individual’s raw score in the middle of the distribution results in a much greater change in his or her percentile rank than the same raw score change at the distribution’s extreme
sampling error
the inaccuracy of a sample value;
the difference between a sample value (a statistic) and the corresponding population value (a parameter)
sample mean
an estimate of a population mean
error of the mean
difference between a sample mean and the population mean
standard error of the mean
refers to the expected error of a given sample mean;
indicates the extent to which a sample mean can be expected to deviate from its corresponding population mean
standard error of the mean formula
SE_mean = SD / √N
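A worked example with hypothetical values:

```python
import math

sd, n = 15, 25                   # hypothetical sample SD and sample size
se_mean = sd / math.sqrt(n)      # 15 / 5 = 3.0

# A sample mean is expected to deviate about 3 points from the
# population mean; a larger N would shrink this expected error.
print(se_mean)
```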
why as the sample size increases, the standard error of the mean becomes smaller
the larger the sample size, the more we approximate the size of the population, and thus, the closer the sample statistic will be to the population parameter
purpose of statistical hypothesis testing
quantitatively test a research hypothesis
null hypothesis
hypothesis of no difference;
the IV does not have an effect on the DV;
implies that the sample means are drawn from the same population
alternative hypothesis
that the IV does have an effect on the DV, or that the means of the populations of interest on the DV are not equal;
sample means are sufficiently different to conclude that they come from different populations
two-tailed hypothesis
a mean (or means) is different from another mean (or other means), but we do not know in which direction;
psychotherapy will change the scores
one-tailed hypothesis
a mean (or means) is either greater than or less than another mean (or other means);
psychotherapy will improve scores
power
probability of rejecting the null hypothesis when, in fact, it is false;
probability of making a correct decision (reject the null hypothesis) when the null hypothesis is false;
probability of declaring that there is a difference when one really exists;
probability of NOT making a Type II error
Type I error
when the null hypothesis is rejected when it is true;
concludes that a difference exists when it really doesn’t;
“thinking you have something when you really don’t”
Type II error
failure to reject the null hypothesis when it is false;
stating that we don’t have a sufficient difference to reject the null hypothesis, when in fact a real difference does exist;
“thinking you don’t have something when you really do”
alpha level
level of significance;
probability of making a Type I error is equivalent to the alpha level - researchers determine in advance - .01 or .05 level;
if the results indicate a 5% or smaller probability of obtaining the observed data when the null hypothesis is true, the researcher will reject the null - conclude that it is false
retention region
area of a graph where you would retain (fail to reject) the null hypothesis if your test results fall into that area;
if results fall into that area then they are NOT statistically significant
rejection region
AKA critical region;
area of a graph where you would reject the null hypothesis if your test results fall into that area;
if results fall into that area then they are statistically significant
significance level
the probability threshold (alpha) at which we reject the null hypothesis;
when the results of a statistical test are significant at the predetermined alpha level, the null hypothesis is rejected
beta (β)
probability of making a Type II error
factors that affect power
1) Sample Size: The larger the sample size, the greater the power.
2) Alpha: As the pre-set alpha level increases, power increases - higher the alpha, easier to reject null hypothesis.
3) Directional and Non-Directional Statistical Tests: One-tailed (directional) tests are more powerful than two-tailed (nondirectional) tests.
4) Magnitude of the Population Difference: The greater the difference between the population means under study, the more likely the researcher will be able to detect these differences (the more power).
parametric tests
used for interval and ratio data;
t-test and ANOVA
Parametric tests are based on the following assumptions
1) Normal Distribution: assumes the DV values are normally distributed in the population.
2) Homogeneity of Variance: variance of all groups is equal. This is referred to as the homogeneity of variance assumption.
3) Independence of Observations: scores within the same sample or group should not be correlated with each other.
nonparametric tests
test hypotheses based on DVs that are measured on an ordinal or nominal scale;
chi-square and Mann-Whitney U
unbiased sample
sample that is representative of the population
critical value
the value of the test statistic which defines the upper and lower bounds of a confidence interval, or which defines the threshold of statistical significance in a statistical test
factors that impact chosen critical value
1) the pre-set alpha level (e.g., .01 or .05)
2) the degrees of freedom for the statistical test
types of parametric tests
t-test, one-way ANOVA, factorial ANOVA, and MANOVA
t-test
used to test hypotheses about two different means
3 types of t-tests
the one sample t-test, the t-test for independent samples, and the t-test for correlated samples
t ratio
used for t-tests;
when statistically significant, this indicates that the two means are significantly different and the null hypothesis is therefore rejected
One Sample t-Test
designed to compare the mean of a single sample to a known population mean;
df = N-1
t-Test for Independent Samples
used to compare two means derived from independent (unrelated) samples;
df = N-2
t-Test for Correlated Samples
performed when the samples consist of matched pairs of similar units, or when there are cases of repeated measures;
matched samples or pretest-posttest;
df = N-1, where N = number of pairs of scores
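A scipy sketch running all three t-tests on hypothetical data:

```python
from scipy import stats

pre = [10, 12, 9, 14, 11, 13]        # hypothetical pretest scores
post = [13, 14, 11, 17, 12, 16]      # same subjects after treatment
other_group = [8, 9, 11, 10, 7, 9]   # an unrelated comparison group

# One-sample: compare a sample mean to a known population mean (df = N-1)
print(stats.ttest_1samp(pre, popmean=10))

# Independent samples: two unrelated groups (df = N-2)
print(stats.ttest_ind(pre, other_group))

# Correlated samples: matched pairs / pretest-posttest (df = pairs - 1)
print(stats.ttest_rel(pre, post))
```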
One-Way Analysis of Variance (ANOVA)
study with one independent variable where the means of more than two groups are compared;
What is the probability that these means are from the same population?
F ratio
if the value of F is statistically significant, then the means are significantly different and the null hypothesis is rejected;
represents a comparison between two estimates of variance
between-group variance (or treatment variance)
the degree to which the groups as a whole differ from one another
within-group variance (or error variance)
the degree to which subjects within experimental groups differ from each other
ANOVA statistic is a fraction
Variance Between Groups/Variance Within Groups;
if the top term is large and the bottom term is small, we have a large ANOVA statistic, and that’s good - indicates that our treatment had an effect - the difference between the group mean scores is too large to be accounted for by “error” or by the individual differences that will be found even within the same population
sum of squares
measure of the variability of a set of data
df between (dfb)
k-1, where k equals the number of groups
df within (dfw)
N-k, where N equals the total number of subjects
mean square
statistical measure used to estimate between- and within-group variance
Mean Square Between (MSB)
Sum of Squares between divided by the df between
Mean Square Within (MSW)
Sum of Squares within divided by the df within
f-ratio formula
F = MSB / MSW;
mean square between / mean square within
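A numpy sketch computing F directly from the definitions above, checked against scipy’s one-way ANOVA (hypothetical data):

```python
import numpy as np
from scipy import stats

groups = [np.array([3, 4, 5]), np.array([6, 7, 8]), np.array([9, 10, 11])]
k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total number of subjects
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

msb = ss_between / (k - 1)           # Mean Square Between = SSB / dfb
msw = ss_within / (N - k)            # Mean Square Within  = SSW / dfw
F = msb / msw

print(F)                                   # 27.0
print(stats.f_oneway(*groups).statistic)   # same value
```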
post-hoc tests
indicate exactly which means significantly differ from each other in ANOVA;
involves making pairwise or complex comparisons between means
pairwise comparison
comparison between two means
con of multiple comparisons
the more comparisons that are done, the higher the probability that at least one Type I error (incorrect rejection of the null) will be made
you should know the following about the specific post-hoc tests
1) Of all the post-hoc tests, the Scheffe is the most conservative - provides the greatest protection against the inflation in the Type I error rate that occurs when multiple comparisons are made (but that in turn increases the risk of Type II)
2) If conducting pairwise comparisons, the Tukey is the appropriate post-hoc test
factorial ANOVA
used when a study has 2 or more independent variables;
permits the assessment of both main effects and interaction effects
MANOVA
used when a study has two or more dependent variables
main effect
the effect of one independent variable by itself without considering the other independent variable;
can be seen by examining the difference between the marginal means
interaction effect
the effects of an independent variable at the different levels of the other independent variables;
the effect of one IV depends on which level of the other IVs you are at
cell mean
the mean of the scores of subjects within a single cell (one combination of IV levels) in a factorial design
complex comparisons
comparisons involving combined means
factorial ANOVA for repeated measures
all levels of all independent variables are applied to a single group of subjects
mixed ANOVA (split-plot ANOVA)
has at least one between-subjects independent variable and at least one repeated measures (within-subjects) variable
chi-square test
used when frequencies, or the number of subjects within each category (as opposed to the mean scores on a measure), are given;
assess whether these observed frequencies differ from the expected frequencies;
compares observed frequencies of observations within nominal categories to frequencies that would be expected under the null hypothesis
chi-square (χ²)
statistic that indicates whether the obtained frequencies in a set of categories differ significantly from what is expected under the null hypothesis
single-sample chi-square test
collecting categorical data from only one sample of individuals;
df equals C-1, where C represents the number of categories
multiple-sample chi-square test
adding another variable in addition to the one that gives rise to the classification categories;
df equals (C-1)(R-1), where C represents the number of categories; R = the number of rows (# of levels of the second variable)
Cautions in Using Chi-square
1) All observations must be independent of each other: No “before and after” studies
2) Each observation can be classifiable into only one category or cell: Mutually exclusive
3) Percentages of observations within categories cannot be compared
Calculating Expected Frequencies in single-sample Chi-Square
dividing the total number of subjects by the number of cells
Calculating Expected Frequencies in multiple-sample Chi-Square
fe = (column total × row total) / total N
where fe = the expected frequency for any cell;
column total = the sum of observations within a column containing that cell;
row total = the sum of observations within a row containing that cell;
total N = total number of subjects
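A sketch for a 2x2 multiple-sample chi-square with hypothetical frequencies; scipy computes the expected frequencies with the same formula:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies: rows = levels of one variable, columns = categories
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)

print(expected)   # fe = (column total x row total) / total N for each cell
print(dof)        # (C-1)(R-1) = 1
print(chi2, p)
```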
Mann-Whitney U Test
compare two independent groups on a dependent variable measured with rank-ordered data
Wilcoxon Matched-Pairs Test
compare two correlated groups on a dependent variable measured with rank-ordered data
Kruskal-Wallis Test
compare two or more independent groups on a dependent variable with rank-ordered data
parametric counterpart of the Wilcoxon Matched-Pairs Test
t-test for correlated samples
parametric counterpart of the Kruskal-Wallis Test
one-way ANOVA
parametric counterpart of the Mann-Whitney U Test
t-test for independent samples
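A scipy sketch of the three rank-based tests on hypothetical data:

```python
from scipy import stats

g1 = [3, 5, 6, 9, 12]    # hypothetical rank-orderable scores
g2 = [4, 7, 8, 10, 15]
g3 = [1, 2, 6, 8, 11]

print(stats.mannwhitneyu(g1, g2))   # two independent groups
print(stats.wilcoxon(g1, g2))       # two correlated (paired) groups
print(stats.kruskal(g1, g2, g3))    # two or more independent groups
```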
correlation
a relationship between two or more variables;
measure their “co-relation”, or the degree to which they co-vary
correlation coefficient
measures the correlation between two variables;
tells its magnitude and its direction;
ranges between -1.00 and +1.00
positive coefficient
indicates that the two variables move in the same direction
negative correlation
indicates that as one variable goes up, the other goes down
scattergram
graph depicting the relationship between two variables;
an individual point represents the scores obtained by one individual on two measures
Pearson r
calculating the relationship between two variables that are measured on an interval or ratio scale
Factors Affecting the Pearson r
- Linearity: not the appropriate correlation coefficient to assess nonlinear relationships;
- Homoscedasticity: heteroscedasticity will lower the Pearson r correlation coefficient;
- Range of Scores: the wider the range of sampled behavior, the more accurate the estimation of correlation
scedasticity
refers to the way points are dispersed in a scattergram
homoscedasticity
dispersion of scores is equal throughout the scattergram
heteroscedasticity
more dispersion at some parts of the scattergram than at others;
the magnitude of the relationship between two variables depends on what level of the “X” or “Y” variable you are considering
Interpretation of the Pearson r
square it - the square of the correlation coefficient indicates the percentage of variability in one measure that is accounted for by variability in the other measure
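A sketch computing Pearson r and its square on hypothetical interval-scaled data (assumes scipy):

```python
from scipy.stats import pearsonr

study_hours = [1, 2, 3, 4, 5, 6]         # hypothetical predictor
exam_scores = [55, 60, 58, 70, 72, 80]   # hypothetical criterion

r, p = pearsonr(study_hours, exam_scores)
print(r)       # magnitude and direction of the linear relationship
print(r ** 2)  # proportion of variability in one measure accounted for
               # by variability in the other
```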
point-biserial correlation
relates one continuous variable (interval or ratio scaled variable) and one dichotomous variable (one that can take only two values, e.g., gender)
biserial coefficient
two continuous variables are correlated, with one artificially made dichotomous (e.g., income split into high vs. low)
phi coefficient
used when both variables are truly dichotomous
tetrachoric coefficient
used when both variables have been artificially dichotomized
contingency correlation coefficient
correlation between two nominally scaled variables (two unordered variables, with each having more than two categories)
Spearman’s Rho
used to correlate two variables that have been ordinally ranked
eta
measures a nonlinear relationship
regression equation
when two variables are correlated, it is possible to construct an equation that could be used to estimate the value of a “criterion” (outcome) variable based on scores on a “predictor” (input) variable
continuous variable
variable that can assume an infinite number of real values within a given interval
regression line
a straight line that describes how a response variable Y changes as an explanatory variable X changes
error score
the difference between the predicted and the actual criterion scores;
assumed to be normally distributed with a mean of 0
least squares criterion
constructing the regression line involves identifying the line that results in the least amount of error in predicting Y scores from X scores;
regression line is drawn at the location where the sum of squared distances of dots from the line is the lowest
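A numpy sketch of a least-squares regression line fit to hypothetical data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])              # predictor (X)
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])    # criterion (Y)

# np.polyfit chooses the slope and intercept that minimize the sum of
# squared errors (the least squares criterion)
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept              # predicted criterion scores

errors = y - y_hat                         # error scores
print(slope, intercept)
print(errors.mean())                       # essentially zero
```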
multiple regression
when two or more predictor variables are used to predict scores on one criterion
multiple correlation coefficient, or multiple R
statistic used to evaluate a multiple regression equation;
the predictive power of the multiple regression equation;
the higher the value of multiple R, the stronger the relationship between the combination of predictor variables and the criterion variable
understand the following points about multiple correlation and multiple regression
1) multiple R is highest when predictor variables each have high correlations with the criterion but low correlations with each other;
2) multiple R is never lower than the highest simple correlation between an individual predictor and the criterion;
3) multiple R can never be negative;
4) multiple R can be squared to facilitate its interpretation
multicollinearity
significant predictor overlap
coefficient of multiple determination
multiple R squared;
indicates the proportion of variance in the criterion variable accounted for by the combination of predictor variables
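A sketch of multiple regression with two hypothetical predictors using numpy’s least-squares solver; multiple R and R squared follow from the definitions above:

```python
import numpy as np

# Columns: intercept, predictor 1, predictor 2
X = np.array([[1, 2.0, 3.0],
              [1, 4.0, 1.0],
              [1, 6.0, 5.0],
              [1, 8.0, 2.0],
              [1, 10.0, 7.0]])
y = np.array([5.0, 8.0, 13.0, 14.0, 21.0])    # criterion scores

coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares weights
y_hat = X @ coef                              # predicted criterion scores

# Multiple R = correlation between predicted and actual criterion scores
R = np.corrcoef(y_hat, y)[0, 1]
print(R, R ** 2)   # R squared = proportion of criterion variance explained
```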
Stepwise Multiple Regression
come up with the smallest set of predictors that maximizes predictive power;
useful technique if you have a relatively large number of potential predictors, but you want to use a smaller subset of these predictors in the final equation
reasons why you might want to cut back on the number of predictors used
1) the fewer the predictors, the less costly it is (time, money) to collect the data
2) due to multicollinearity, at some point, adding predictors results in little or no increase in predictive power
forward stepwise regression
start out with one predictor, and add predictors to the equation one at a time;
with each addition, conduct an analysis to determine if the predictive power of the multiple regression equation is substantially increased;
the more commonly used type
backward stepwise regression
start out with all of the potential predictors, and remove predictors one at a time
canonical correlation coefficient
used to calculate the relationship between two or more predictors and two or more criterion variables
discriminant function analysis
used when the goal is to classify individuals into groups on the basis of their scores on multiple predictors;
scores on two or more variables are combined to determine whether they can be used to predict which criterion group a person will belong to
difference between discriminant function analysis and multiple regression
1) discriminant: the DV is discrete (categorical); used to predict criterion group membership;
2) multiple regression: the DV is continuous; multiple predictors are used to estimate a person’s criterion score
differential validity
each predictor has a different correlation with each criterion variable;
the computed validity coefficients are significantly different for different groups of examinees
logistic regression
process of modeling the probability of a discrete outcome given an input variable;
makes fewer distributional assumptions than linear regression (e.g., normality and homoscedasticity are not required);
input variables may be nominal (categorical) or continuous
multiple cutoff
procedure involving setting a minimum cutoff score on a series of predictors;
if the cutoff score is not achieved on even one of the predictors, the person is not selected (college, job)
partial correlation
used to assess the relationship between two variables with the effects of another variable “partialled out” (statistically removed)
zero-order correlation
correlation between two variables is determined without regard to any other variables
suppressor variable
suppresses the relationship between a predictor and a criterion.
structural equation modeling
general term for a set of techniques that involve calculating the pairwise correlations between multiple variables;
used for the purpose of causal modeling - testing a hypothesis that posits a causal relationship among multiple (3 or more) variables;
includes path analysis and LISREL
path analysis
used to verify simpler causal models that propose only one-way causal flows between variables
LISREL
used when a model includes one-way and/or two-way causal relationships
latent variable
one that you infer is being measured, on the basis of statistical analysis
trend analysis
way of measuring the trend of change (linear, quadratic, cubic, quartic) in a DV in a repeated measures design;
indicate which (if any) trends tested for are significant;
both variables are quantitative (interval or ratio)
break point
a point where scores for all subjects change direction in a predictable way (stop increasing and start decreasing, or stop decreasing and start increasing)
population
the whole set of cases the researcher is interested in
population distribution
one that includes every single score in the population
sample distribution
set of scores obtained from a sample;
less score variability than the population distribution
sampling distribution
probability distribution of a statistic (mean, median, mode) that is obtained through repeated sampling of a specific population;
less variability than the population distribution
sampling with replacement
selected subjects are put back into the population before another subject is sampled
central limit theorem
under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution
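An illustrative simulation: sample means drawn from a heavily skewed population still pile up in a roughly normal, much narrower distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)   # heavily skewed

# Sampling distribution of the mean: 10,000 samples of n = 50 each
sample_means = rng.choice(population, size=(10_000, 50)).mean(axis=1)

print(population.mean(), sample_means.mean())   # centers agree (~1.0)
print(population.std(), sample_means.std())     # SD of means ~ sigma / sqrt(50)
```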
robust
rate of false rejections of the null hypothesis (Type I errors) is not substantially increased by violations of the test’s assumptions
autocorrelation
the degree of correlation of a variable’s values over time
Bayes’ Theorem
the probability of an event, based on prior knowledge of conditions that might be related to the event
meta-analysis
method of analyzing a group of independent studies with a common conceptual basis (e.g., integrating studies of the effectiveness of psychotherapy)
pros and cons of meta-analysis
Pro: allows for the consideration of the size of effects;
Cons: subject to the biases of the person doing the analysis, concentrating only on main effects and ignoring interactions results in a loss of information
moderator
a qualitative (e.g., race, sex, class) or quantitative (e.g., level of reward) variable that affects the direction and/or strength of the relation between an independent or predictor variable and a dependent or criterion variable
probands
individuals who are first brought to the attention of the researcher - i.e., individuals manifesting the characteristic of interest or disease
eigenvalue
a statistic that indicates the degree to which a particular factor is accounting for variability in the variables studied;
indicates its strength or explanatory power
resampling procedures
creation of new samples based on one observed sample;
compute a test statistic for each sample or rearrangement with the resulting set constituting the sampling distribution (often called a reference distribution) of that statistic
permutation test
begin with the original data, then systematically or randomly reorder (shuffle) the data, and calculate the appropriate test statistic on each reordering
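A minimal permutation-test sketch for a difference between two group means (hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([12, 15, 14, 18, 17])   # hypothetical treatment scores
b = np.array([10, 11, 13, 12, 14])   # hypothetical control scores

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

# Shuffle the group labels many times, recomputing the statistic each time;
# the resulting set is the reference distribution
diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    diffs.append(pooled[:len(a)].mean() - pooled[len(a):].mean())

# p-value: proportion of reshuffled statistics at least as extreme
p = np.mean(np.abs(diffs) >= abs(observed))
print(observed, p)
```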
cross-validation
uses one part of the available observations to fit the model and another part to test it when computing prediction error
pooled variance
weighted average of the variances of the groups;
“weighted” based on the # of subjects in each group;
assumes that the population variances are approximately the same, even though the sample variances differ
Solomon four-group design
a true experimental design used to evaluate the effects of pretesting, since some groups are pretested and others are not