Statistics Flashcards
nominal data
involves tallying people to see which non-ordered category each person falls into
e.g. sex, voting preference, ethnicity
ordinal data
involves tallying people to see which ordered category each person falls into
group means cannot be calculated from ordinal data
interval data
involves obtaining numerical scores for each person, where score values have equal intervals
either no zero score (e.g. IQ scores, t-scores) or zero is not absolute (e.g. temperature)
group mean can be calculated from interval data
ratio data
involves obtaining numerical scores for each person, where scores have equal intervals and an absolute zero
e.g. savings in bank, scores on EPPP, number of children, weight
comparisons can be made across score values (e.g. $10 is twice as much as $5)
measures of central tendency
mean, median, mode
best measure of central tendency typically the mean
when data are skewed or some very extreme scores are present, median is preferable
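A quick sketch with Python's standard `statistics` module of how one extreme score pulls the mean but leaves the median alone (the score values are made up for illustration):

```python
import statistics

scores = [4, 5, 5, 5, 6, 7, 50]  # one extreme score skews the set

print(statistics.mean(scores))    # pulled upward by the outlier
print(statistics.median(scores))  # resistant to the outlier: 5
print(statistics.mode(scores))    # most frequent value: 5
```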
standard deviation
measure of average deviation (or spread) from the mean in a given set of scores
square root of the variance
variance
standard deviation squared
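The SD/variance relationship on the two cards above can be checked directly with the `statistics` module (example score set chosen so the numbers come out round):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.pstdev(scores)      # population standard deviation
var = statistics.pvariance(scores)  # population variance

print(sd)   # 2.0
print(var)  # 4.0 — the SD squared
```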
range
crudest measure of variability
difference between highest and lowest value obtained
positive skew
higher proportion of scores in the lower range of values
mode has lowest value, mean has highest value
(bump on left)
negative skew
higher proportion of scores in the higher range of values
mean has lowest value, mode has highest value
(bump on right)
kurtosis
how peaked a distribution is
leptokurtic distribution - very sharp peak
platykurtic - flattened
norm-referenced score
provides information on how the person scored relative to the group
e.g. percentile rank
criterion-referenced or domain-referenced score
provides information on how the person scored relative to a predetermined standard or content domain, not relative to other people
e.g. percentage correct
standard scores
based on the standard deviation of the sample
e.g. z-scores, t-scores, IQ scores, SAT scores, EPPP scores
z-scores
mean of zero, SD of one
shape of z-score distribution always identical to shape of the raw score distribution
useful because correspond directly to percentile ranks (ONLY IF distribution is normal) and easy to calculate from raw score data
transforming raw scores into z-scores does not normalize distribution
z-score formula
z=(score-mean)/(SD)
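The formula above, applied to a small made-up score set; transforming every raw score this way always yields a distribution with mean 0 and SD 1:

```python
import statistics

raw_scores = [80, 90, 100, 110, 120]
mean = statistics.mean(raw_scores)
sd = statistics.pstdev(raw_scores)   # population SD

# z = (score - mean) / SD
z_scores = [(x - mean) / sd for x in raw_scores]

print(statistics.mean(z_scores))   # 0
print(statistics.pstdev(z_scores)) # 1
```

Note the shape is unchanged: the transformation only re-centers and re-scales the scores.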
standard error of the mean
if researcher were to take many, many samples of equal size and plot the mean IQ scores of these samples, researcher would get normal distribution of means
any spread or deviation in these means is error
average amount of deviation = standard error of the mean
standard error of the mean formula
SD(population) / SQRT (N)
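The formula in plain Python, using IQ as the example (population SD of 15 is the conventional IQ scaling; n = 25 is an arbitrary sample size):

```python
import math

population_sd = 15   # IQ scores
n = 25               # sample size

sem = population_sd / math.sqrt(n)
print(sem)  # 3.0
```

Note that a larger n shrinks the SEM: sample means from bigger samples cluster more tightly around the population mean.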
central limit theorem
assuming an infinite number of equal sized samples are drawn from the population, and the means of these samples are plotted, a normal distribution of the means will result
tells researcher how likely it is that particular mean will be obtained just by chance - can calculate whether the obtained mean is most likely due to treatment or experimental effects or to chance (sampling error, random error)
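The theorem can be sketched as a simulation (population, sample size, and number of samples are all arbitrary choices here): draw many equal-sized samples from a decidedly non-normal population, and the sample means still pile up around the population mean with spread approximately SD(population)/sqrt(n):

```python
import random
import statistics

random.seed(0)

# Population: uniform on [0, 100] — flat, not normal
# Population mean = 50, population SD = 100/sqrt(12) ≈ 28.87
n = 30
sample_means = [
    statistics.mean(random.uniform(0, 100) for _ in range(n))
    for _ in range(5000)
]

print(statistics.mean(sample_means))   # close to 50
print(statistics.stdev(sample_means))  # close to 28.87/sqrt(30) ≈ 5.27
```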
rejection region
aka rejection of unlikely values
size of rejection region corresponds to alpha level e.g. when alpha is .05, rejection region is 5% of curve
when obtained values fall in rejection region, null hypothesis rejected, researcher concludes treatment did have an effect
Type I error
mistakenly rejecting null (differences found when they don’t exist)
corresponds to alpha
Type II error
mistakenly accepting null (differences not found, but they do exist)
corresponds to beta
power
defined as ability to correctly reject the null
increased when sample size is large, magnitude of intervention is large, random error is small, statistical test is parametric, test is one-tailed
power = 1-beta
as alpha increases, so does power
non-parametric tests
e.g. Chi-square, Mann-Whitney, Wilcoxon
if DV is nominal or ordinal
parametric tests
e.g. t-test, ANOVA
if DV is interval or ratio
assumptions of parametric tests
homoscedasticity - there should be similar variability or SD in the different groups
data are normally distributed