Exam 3 Flashcards
when to use an F distribution
when comparing more than 2 samples (e.g., testing whether 3 or more group means differ)
when is ANOVA used?
with 1 or more nominal independent variables (each with 2+ levels) and an interval dependent variable. Analyzes whether 2+ groups differ from each other in one or more characteristics
why not use multiple t-tests instead of an ANOVA?
the probability of making a Type I error (rejecting the null when the null is true) increases with each additional test
F statistic
a value you get when you run an ANOVA test or a regression analysis to find out if the means of two or more populations are significantly different
F distribution
distribution of all the possible F statistics
F =
variance between-groups / variance within-groups
s^2between / s^2within
variance between-groups
estimate of the population variance based on differences among group means
variance within-groups
estimate of the population variance based on differences within sample distributions
another way to think about F ratios
each score in a sample is a combination of treatment effects and individual variability or error
if between-groups variance is 8, and within-groups variance is 2, what would F be?
4
between-groups variance equation
s^2 = SSbetween / dfbetween
SSbetween / dfbetween
between-groups variance
dfbetween=
number of groups - 1 (e.g., with 2 groups, df = 2 - 1 = 1)
within-groups variance equation
SSwithin / dfwithin
SSwithin / dfwithin
within-groups variance
dfwithin=
dfgroup1 + dfgroup2 + … + dfgrouplast (each group contributes n - 1)
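A minimal worked sketch of the F computation above in Python, using three small invented groups; the hand computation is checked against SciPy's one-way ANOVA:

```python
# Hand-computing F for a one-way between-groups ANOVA.
# The three groups below are invented for illustration.
import numpy as np
from scipy import stats

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([1.0, 2.0, 3.0])]
grand_mean = np.mean(np.concatenate(groups))

# SSbetween: squared deviations of each group mean from the grand mean,
# weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SSwithin: squared deviations of each score from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1                  # number of groups - 1
df_within = sum(len(g) - 1 for g in groups)   # n - 1 summed across groups

F = (ss_between / df_between) / (ss_within / df_within)
print(F)                                      # 27.0
print(stats.f_oneway(*groups).statistic)      # SciPy agrees: 27.0
```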
one-way ANOVA
1 nominal variable with 2+ levels and a scale DV
within-groups ANOVA
more than 2 samples with the same participants in each sample. Also called repeated-measures ANOVA
between-groups ANOVA
more than 2 samples with different participants in each sample
homoscedasticity
assumption of ANOVA. Samples come from populations with the same variance
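One common way to check this assumption (a sketch, not the only option) is Levene's test in SciPy; a small p suggests the equal-variance assumption is violated:

```python
# Levene's test for equal variances across groups (invented data).
from scipy import stats

g1, g2, g3 = [4, 5, 6], [7, 8, 9], [1, 2, 3]
stat, p = stats.levene(g1, g2, g3)
print(stat, p)   # a small p would cast doubt on homoscedasticity
```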
effect size for ANOVA
r^2
formula for calculating effect size for ANOVA
r^2 = SSbetween / SStotal
small effect size for ANOVA
r^2 = .01
medium effect size for ANOVA
r^2 = .09
large effect size for ANOVA
r^2 = .25
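Continuing the invented ANOVA sketch above (SSbetween = 54, SSwithin = 6), r^2 works out to a large effect:

```python
# Effect size for ANOVA: r^2 = SSbetween / SStotal.
ss_between, ss_within = 54.0, 6.0
ss_total = ss_between + ss_within    # SStotal = SSbetween + SSwithin
print(ss_between / ss_total)         # 0.9 -> large by the conventions above
```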
post-hoc tests determine…
which groups are different
when you have three groups, and F is significant, how do you know where the difference(s) are?
post-hoc tests
type of post-hoc tests
Tukey HSD, Bonferroni
Tukey HSD test
widely used post hoc test that uses means and standard error
Bonferroni test
post-hoc test that uses a stricter critical value for every comparison of means. We use a smaller critical region to make it more difficult to reject the null: determine the number of comparisons we plan to make, then divide the p level by that number of comparisons
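A minimal sketch of Bonferroni-corrected pairwise comparisons on the invented groups from the ANOVA sketch above; the Tukey version is also available in common libraries (e.g., statsmodels' pairwise_tukeyhsd):

```python
# Bonferroni: divide the p level by the number of planned comparisons.
from itertools import combinations
from scipy import stats

groups = {"g1": [4, 5, 6], "g2": [7, 8, 9], "g3": [1, 2, 3]}
alpha = 0.05
n_comparisons = 3                          # 3 groups -> 3 pairwise tests
bonferroni_alpha = alpha / n_comparisons   # stricter cutoff: ~.0167

for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b)
    print(name_a, name_b, round(p, 4), p < bonferroni_alpha)
```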
one-way within-groups ANOVA
same participants do something multiple times. Used when we have one IV with at least 3 levels, a scale DV, and the same participants in each group
benefits of within-groups ANOVA
we reduce error due to differences between the groups: because the same participants are in every group, the groups are identical on participant characteristics, so we can reduce within-groups variability caused by differences between the people in our study
matched groups
use different people who are similar on all of the variables that we want to control. We can analyze our data as if the same people are in each group, giving us the added benefits of a within-groups design
two-way ANOVAs
used to evaluate effects of more than one IV on a DV. Used to determine individual and combined effects of the IVs
interaction
occurs when 2 IVs have an effect in combination that we do not see when looking at each IV individually
when to use Two-Way ANOVAs
to evaluate effects of 2 IVs, it is more efficient to do a single study than two studies with 1 IV each. Can explore interactions between variables
cell
box depicting a unique combination of levels of IVs in a factorial design
main effect
when one IV influences the DV
interaction effect
when the effect of one IV on the DV changes as a result of the level of a second IV
two types of interactions in ANOVA
quantitative, qualitative
correlation
co-variation or co-relation between two variables. These variables change together. usually scale (interval or ratio) variables
correlation coefficient
a statistic that quantifies a relation between two variables. Can be either + or -. Falls between -1.00 and 1.00. The value of the number (not the sign) indicates the strength of the relation
positive correlation
association between variables in which high scores on one variable tend to go with high scores on the other variable. A direct relation between the variables
negative correlation
association between variables in which high scores on one variable tend to go with low scores on the other variable. An inverse relation between the variables
Pearson Correlation Coefficient
a statistic that quantifies a linear relation between two scale variables. Symbolized by the italic r when based on sample data, and by the Greek letter ρ ("rho") when it is a population parameter
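A minimal sketch with SciPy; the variable names and data are invented:

```python
# Pearson r quantifies the linear relation between two scale variables.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5]
exam_score = [55, 60, 70, 75, 85]
r, p = stats.pearsonr(hours_studied, exam_score)
print(r)   # close to +1: a strong positive (direct) relation
```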
psychometrics
used in the development of tests and measures
psychometricians
use correlation to examine reliability and validity
reliability
a consistent measure. One particular type is test-retest reliability. Ex: a measure of how fast a pitcher can throw a baseball should give consistent readings across measurements
validity
measures what it was designed or intended to measure. Correlation is used to calculate validity, and can be used to establish validity (much more difficult than establishing reliability)
partial correlation
a technique that quantifies the degree of association between two variables after statistically removing the association of a third variable. Allows us to quantify the relation between two variables, controlling for the correlation of each of these variables with a third related variable
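A sketch of the standard first-order partial-correlation formula; the three correlations plugged in below are invented:

```python
# Partial correlation between x and y, controlling for z:
# r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
import math

def partial_r(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# x and y look related (r = .50), but both correlate with z (.60 and .70):
print(partial_r(0.50, 0.60, 0.70))   # ~0.14 once z is removed
```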
regression _____, correlation _____
predicts, describes
simple linear regression
statistical tool that lets us predict an individual’s score on the DV based on the score on one IV
linear regression
intercept: predicted value of Y when X = 0
slope: the amount that Y is predicted to increase for an increase of 1 in X
regression with z scores
calculate the z score for X, multiply the z score by the correlation coefficient to get the predicted z score for Y, then convert that z score to a raw score
determining the regression equation: calculating the intercept
1) find the z score for an X of 0
2) use the z score to calculate the predicted z score on Y
3) convert the predicted z score to its raw score (the intercept)
determining the regression equation: calculating the slope
1) find the z score for an X of 1
2) use the z score to calculate the predicted z score on Y
3) convert the predicted z score to its raw score
4) the slope is the difference between this predicted score and the intercept
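A worked sketch of the steps above, using invented summary statistics (means, SDs, and r):

```python
# Building the regression equation from z scores.
mean_x, sd_x = 5.0, 2.0
mean_y, sd_y = 50.0, 10.0
r = 0.8

def predict_y(x):
    z_x = (x - mean_x) / sd_x        # step 1: z score for X
    z_y_hat = r * z_x                # step 2: predicted z score on Y
    return mean_y + z_y_hat * sd_y   # step 3: convert back to a raw score

intercept = predict_y(0)             # predicted Y when X = 0
slope = predict_y(1) - intercept     # step 4: rise in Y per 1-unit rise in X
print(intercept, slope)              # 30.0, 4.0 -> Y-hat = 30 + 4(X)
```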
standard error of the estimate
indicates the typical distance between the regression line and the actual data points
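A minimal sketch with invented observed and predicted scores; note that conventions differ on the denominator (the inferential version divides SSerror by N - 2):

```python
# Standard error of the estimate: typical distance between observed Y
# values and the regression line's predictions.
import math

y_obs = [31, 35, 37, 45]   # invented observed scores
y_hat = [30, 34, 40, 44]   # invented predictions from a regression line

ss_error = sum((o - p) ** 2 for o, p in zip(y_obs, y_hat))
print(math.sqrt(ss_error / (len(y_obs) - 2)))   # ~2.45
```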
multiple regression
statistical technique that includes 2+ predictor variables in a prediction equation
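A minimal sketch with two invented predictor variables, solved by ordinary least squares via NumPy:

```python
# Multiple regression: predict y from two predictor variables at once.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])                  # columns: predictor 1, predictor 2
y = np.array([10.0, 9.0, 20.0, 19.0, 26.0])

X_design = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(coefs)                                # [intercept, b1, b2]
```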
stepwise regression
a type of multiple regression in which computer software determines the order in which IVs are included in the equation. The default in many computer software programs
strength of using stepwise regression
relies on data, rather than theory. Especially good when a researcher is not certain of what to expect in a study
structural equation modeling (SEM)
a statistical technique that quantifies how well sample data “fit” a theoretical model that hypothesizes a set of relations among multiple variables. Encourages researchers to think of variables as a series of connections
chi square test is a ____ test
nonparametric
when to use nonparametric tests
- DV is nominal
- either the DV or IV is ordinal
- when sample size is small
- when underlying pop isn’t normal
limitations of nonparametric tests
- can’t easily use confidence intervals or effect sizes
- have less statistical power than parametric tests
- nominal and ordinal data provide less info
- more likely to commit Type II error
chi-square test for goodness-of-fit
nonparametric test used when we have 1 nominal variable. Determines whether the observed frequencies across categories differ from the hypothesized relative frequencies within those same categories
chi-square test for independence
nonparametric test when we have 2 nominal variables. Determines whether the first variable is related to the second variable or not
Cramer’s V (phi)
the effect size for chi-square test for independence
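A minimal sketch of both chi-square tests with SciPy, plus Cramér's V; all frequencies are invented:

```python
import numpy as np
from scipy import stats

# Goodness of fit (1 nominal variable): do observed counts depart from
# the expected frequencies (equal by default)?
observed = [18, 30, 12]
print(stats.chisquare(observed))

# Independence (2 nominal variables): is row membership related to
# column membership in a contingency table?
table = np.array([[20, 30],
                  [35, 15]])
chi2, p, df, expected = stats.chi2_contingency(table)

# Cramer's V: sqrt(chi2 / (N * (k - 1))), where k = min(rows, columns)
n, k = table.sum(), min(table.shape)
print(chi2, p, np.sqrt(chi2 / (n * (k - 1))))
```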
when looking at an ANOVA source table, what value is of interest to researchers?
between-groups F
if lines are separate but parallel when you draw lines to connect bars in a bar graph of a factorial ANOVA
there is a main effect but no interaction
if lines intersect when you draw lines to connect bars in a bar graph of a factorial ANOVA
there is an interaction
MANOVA (multivariate analysis of variance)
ANOVA in which there is more than one dependent variable
ANCOVA (analysis of covariance)
ANOVA that statistically subtracts the effect of a possible confounding variable
MANCOVA (multivariate analysis of covariance)
an ANOVA with multiple dependent variables and the inclusion of a covariate
quantitative interaction
treatment effect varies in magnitude, but is always the same direction
qualitative interaction
treatment effect changes direction
marginal mean
the mean of a row or a column in a table that shows the cells of a study with a two-way ANOVA design
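A tiny sketch for a hypothetical 2 × 2 design (assuming equal cell sizes, so marginal means are simple averages of the cell means):

```python
# Marginal means: average across the cells of each row or column.
import numpy as np

cell_means = np.array([[4.0, 6.0],    # rows: levels of IV 1
                       [8.0, 2.0]])   # columns: levels of IV 2
print(cell_means.mean(axis=1))        # row marginal means: [5. 5.]
print(cell_means.mean(axis=0))        # column marginal means: [6. 4.]
```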
standardized regression coefficient
a standardized version of the slope in a regression equation. The predicted change in the dependent variable in terms of standard deviations for an increase of 1 standard deviation in the independent variable
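A one-line conversion, reusing the invented slope and SDs from the z-score regression sketch above; in simple regression the result equals r:

```python
# Standardized (beta) coefficient: beta = b * (SD of X / SD of Y).
b, sd_x, sd_y = 4.0, 2.0, 10.0
print(b * (sd_x / sd_y))   # 0.8, which matches r in the earlier sketch
```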
orthogonal variable
is an independent variable that makes a separate and distinct contribution in the prediction of a dependent variable, as compared with the contributions of another variable
hierarchical multiple regression
a type of multiple regression in which the researcher adds independent variables into the equation in an order determined by theory
factorial ANOVA
catch-all phrase for two-way, three-way, and higher-order ANOVAs
factor
a term used to describe an independent variable in a study with more than one independent variable
when the p value is small (< .05)
strong evidence against the null
when the p value is large (> .05)
weak evidence against the null
z score
measure of how many standard deviations below or above the population mean a raw score is. Used in calculating regression lines
standardized regression equation
zY hat = (rxy)(zx)
regression to the mean
the tendency of scores that are particularly high or low to drift toward the mean over time
The predicted z-score on the dependent variable will always be ____ to its mean than the z-score for the independent variable
closer (why? regression to the mean)
proportionate reduction in error
also called the coefficient of determination. Quantifies how much more accurate our predictions are when we use the regression line vs. the mean
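A minimal sketch reusing the invented observed/predicted scores from the standard-error example above:

```python
# Proportionate reduction in error: r^2 = (SStotal - SSerror) / SStotal.
y_obs = [31, 35, 37, 45]
y_hat = [30, 34, 40, 44]

mean_y = sum(y_obs) / len(y_obs)
ss_total = sum((o - mean_y) ** 2 for o in y_obs)            # error using the mean
ss_error = sum((o - p) ** 2 for o, p in zip(y_obs, y_hat))  # error using the line
print((ss_total - ss_error) / ss_total)                     # ~0.88
```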
adjusted standardized residuals
the difference between the observed frequency and the expected frequency for a cell in a chi-square research design, divided by the standard error; also called adjusted residual