Chapter 6 Flashcards
Bootstrap
a technique in which the sampling distribution of a statistic is estimated by taking repeated samples (with replacement) from the data set (in effect, treating the data as a population from which smaller samples are taken). The statistic of interest (e.g., the mean or b coefficient) is calculated for each sample, and the resulting collection of estimates approximates the sampling distribution of the statistic. The standard error of the statistic is estimated as the standard deviation of this bootstrap sampling distribution. From this, confidence intervals and significance tests can be computed.
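A minimal sketch of the procedure in Python (the data, seed, and number of resamples below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=50)  # any sample will do; chosen non-normal on purpose

n_boot = 10_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)  # sample with replacement
    boot_means[i] = resample.mean()  # the statistic of interest (here, the mean)

se = boot_means.std(ddof=1)  # bootstrap estimate of the standard error
ci = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile confidence interval
print(f"bootstrap SE = {se:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```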
Contaminated normal distribution
see: mixed normal distribution
Hartley’s F max
also known as the variance ratio, this is the ratio of the variance of the group with the biggest variance to the variance of the group with the smallest variance. This ratio is compared to critical values in a table published by Hartley as a test of homogeneity of variance. Some general rules are that with sample sizes (n) of 10 per group, an F max less than 10 is more or less always going to be non-significant; with 15–20 per group the ratio needs to be less than about 5; and with samples of 30–60 the ratio should be below about 2 or 3.
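A quick illustration of the ratio itself in Python (the groups are made up; the critical values still have to come from Hartley's table):

```python
import numpy as np

rng = np.random.default_rng(1)
# three made-up groups of 15 scores with increasingly large spreads
groups = [rng.normal(0, sd, size=15) for sd in (1.0, 1.5, 3.0)]

variances = [g.var(ddof=1) for g in groups]
f_max = max(variances) / min(variances)  # Hartley's variance ratio
print(f"F_max = {f_max:.2f}")  # with n = 15 per group, below about 5 is fine by the rule of thumb
```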
Heterogeneity of variance
This term means that the variance of one variable is different across levels of another variable (the opposite of homogeneity of variance).
Heteroscedasticity
This occurs when the residuals at each level of the predictor variable(s) have unequal variances. Put another way, at each point along any predictor variable, the spread of residuals is different.
Homogeneity of variance
the assumption that the variance of one variable is stable (i.e., relatively similar) at all levels of another variable.
Homoscedasticity
an assumption in regression analysis that the residuals at each level of the predictor variable(s) have similar variances. Put another way, at each point along any predictor variable, the spread of residuals should be fairly constant.
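A common way to eyeball this assumption is to plot residuals against fitted values and look for an even band rather than a funnel. A minimal sketch (the model and data are invented, and heteroscedasticity is built in deliberately):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=200)
y = 2 * x + rng.normal(0, 1 + 0.5 * x)  # error spread grows with x: heteroscedastic by design

b1, b0 = np.polyfit(x, y, 1)  # simple least-squares line
fitted = b0 + b1 * x
residuals = y - fitted

plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("A funnel shape suggests heteroscedasticity; an even band suggests homoscedasticity")
plt.show()
```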
Independence
the assumption that one data point does not influence another. When data come from people, it basically means that the behavior of one person does not influence the behavior of another.
Kolmogorov–Smirnov test
a test of whether a distribution of scores is significantly different from a normal distribution . A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.
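In Python the test is available as scipy.stats.kstest; a sketch (note that estimating the mean and SD from the same data makes the p-value approximate, which is what the Lilliefors correction addresses):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
scores = rng.normal(100, 15, size=200)  # made-up scores

# compare the sample to a normal distribution with parameters estimated from the data
result = stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))
print(f"D = {result.statistic:.3f}, p = {result.pvalue:.3f}")  # significant p = deviation from normality
```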
Levene’s test
tests the hypothesis that the variances in different groups are equal (i.e., the difference between the variances is zero). It basically does a one-way ANOVA on the deviations (i.e., the absolute value of the difference between each score and the mean of its group). A significant result indicates that the variances are significantly different – therefore, the assumption of homogeneity of variances has been violated. When sample sizes are large, small differences in group variances can produce a significant Levene’s test. I do not recommend using this test – instead interpret statistics that have been adjusted for the degree of heterogeneity in variances.
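A sketch using scipy.stats.levene (groups are made up; center="mean" matches the ANOVA-on-deviations description above, whereas SciPy's default, center="median", is the more robust Brown–Forsythe variant):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(10, 2, size=30)
g2 = rng.normal(10, 2, size=30)
g3 = rng.normal(10, 5, size=30)  # deliberately more variable

stat, p = stats.levene(g1, g2, g3, center="mean")
print(f"W = {stat:.2f}, p = {p:.4f}")  # significant p = the variances differ
```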
M-estimator
a robust measure of location. One example is the median. In some cases it is a measure of location computed after outliers have been removed; unlike a trimmed mean , the amount of trimming used to remove outliers is determined empirically.
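As one concrete example, here is a toy Huber M-estimator of location in Python (the tuning constant 1.345 and the MAD-based scale are standard choices, but this simple iteration is an illustrative sketch, not a production routine):

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-6, max_iter=100):
    """Iteratively reweighted estimate of location using Huber weights."""
    mu = np.median(x)  # robust starting point
    scale = np.median(np.abs(x - mu)) / 0.6745  # MAD scaled to agree with the normal SD
    for _ in range(max_iter):
        z = (x - mu) / scale
        w = np.minimum(1.0, c / np.maximum(np.abs(z), 1e-12))  # downweight extreme scores
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

data = np.array([2.1, 2.5, 2.3, 2.8, 2.4, 9.9])  # one obvious outlier
print(huber_location(data))  # stays near the bulk of the scores, unlike the ordinary mean
```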
Mixed normal distribution
a normal-looking distribution that is contaminated by a small proportion of scores from a different distribution. These distributions are not normal and have too many scores in the tails (i.e., at the extremes). The effect of these heavy tails is to inflate the estimate of the population variance. This, in turn, makes significance tests lack power.
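A small simulation makes the variance inflation concrete (the 10% contamination rate and the SDs are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
contaminated = rng.random(n) < 0.10  # 10% of scores come from a much wider distribution
scores = np.where(contaminated, rng.normal(0, 10, n), rng.normal(0, 1, n))

# expected variance of the mixture: 0.9*1 + 0.1*100 = 10.9, roughly eleven times the 'clean' variance
print(scores.var(ddof=1))
```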
Outlier
literally, a person or thing situated away or detached from the main body or system; in statistics, a score very different from the rest of the data. Outliers can bias estimates such as the mean and inflate estimates of variability (e.g., the variance and standard error).
P-P plot
short for ‘probability–probability plot’. A graph plotting the cumulative probability of a variable against the cumulative probability of a particular distribution (often a normal distribution). Like a Q-Q plot , if values fall on the diagonal of the plot then the variable shares the same distribution as the one specified. Deviations from the diagonal show deviations from the distribution of interest.
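A hand-rolled sketch of the construction against a normal reference (the plotting positions and fitted parameters are conventional choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(9)
x = np.sort(rng.normal(50, 10, size=100))  # made-up scores, sorted

emp = (np.arange(1, x.size + 1) - 0.5) / x.size  # empirical cumulative probabilities
theo = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))  # cumulative probabilities under a fitted normal

plt.plot(theo, emp, "o", ms=3)
plt.plot([0, 1], [0, 1], "k--")  # the diagonal: points on it share the specified distribution
plt.xlabel("Theoretical cumulative probability")
plt.ylabel("Empirical cumulative probability")
plt.show()
```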
Parametric test
a test that requires data from one of the large catalog of distributions that statisticians have described. Normally this term is used for tests based on the normal distribution, which require four basic assumptions to be met for the test to be accurate: a normally distributed sampling distribution (see normal distribution), homogeneity of variance, interval or ratio data, and independence.