Ch. 3 - Statistics Flashcards
when two numbers are in the middle of a distribution… how do you find the median?
average the middle two
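A minimal sketch of the rule above, using made-up scores. The helper name `median_of` is only for illustration:

```python
def median_of(scores):
    """Return the median; with an even count, average the middle two."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle score exists.
        return ordered[mid]
    # Even count: two scores share the middle, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([2, 4, 6, 8]))  # middle two are 4 and 6
print(median_of([1, 3, 5]))     # single middle score
```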
measurement
the act of assigning numbers or symbols to characteristics of things (people, events) according to rules
discrete scale
there are set categories (like Y/N)
continuous scale
values fall along a continuum and can theoretically be divided into ever finer units
measurement always involves ___
error
error
the collective influence of all of the factors relating to a measurement or test score beyond those the examiner meant to measure
examples of error
distractions (mood, hunger, environment), the selection of test items on that exam, inaccuracy of the measurement tool (crappy ruler)
in assessment, we measure characteristics in ___
quantifiable terms (though the definition of quantifiable is up for debate). there are 4 scales of measurement to help us define quantifiable
nominal scale
numbers are arbitrarily assigned to represent categories. can’t do arithmetic on them (only counts and the mode make sense). ex 1= yes 2=no
ordinal scale
magnitude or rank order is implied. but nothing is implied about how much greater one ranking is than another. has no absolute zero, limited stats. rank: chocolate, pizza, steak, onion rings
interval scale
establishes equal distances between measurements, but no absolute zero reference point. can average scores meaningfully. ex: IQ scores (could be ordinal bc equal score differences may not reflect equal differences in actual intelligence)
ratio scale
has equal intervals AND a meaningful zero point. all math can be performed (weight, hand strength, time to finish a task)
measures of central tendency
tell you something about the “center” of a series of scores. mean, median, and mode. give dif info based on skewed vs normal curves
which of the measures of central tendency are used for interval or ratio data that is believed to be normally distributed?
mean (the mean isn’t meaningful for nominal or ordinal data)
mode is useful…
with qualitative data (which words used most often in interviews). is a nominal statistic (can’t be used in further calculations)
median is useful…
when there are few scores at the high and low end. can be used for ordinal, interval, and ratio data
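To see how the three measures of central tendency can disagree when scores are skewed, here is a small sketch using Python's standard `statistics` module. The scores are made up; one extreme high score produces a positive skew:

```python
import statistics

# Made-up scores: one very high value pulls the mean upward.
scores = [50, 52, 52, 55, 57, 58, 98]

mean = statistics.mean(scores)      # sensitive to the extreme score
median = statistics.median(scores)  # middle score, barely affected
mode = statistics.mode(scores)      # most frequent score

print("mean:", round(mean, 2))
print("median:", median)
print("mode:", mode)
```

Here the mean lands above the median, which is the usual pattern for a positive skew and one reason the median is preferred when a few extreme scores are present.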
normal distribution (AKA Gaussian)
bell-shaped, smooth, mathematically defined curve that is highest at center and tapers to approach the X-axis asymptotically. perfectly symmetrical with no skewness. most traits thought to approximate the normal curve in a pop. mean, median, and mode are the same.
negative skew
tail is going in the negative area. few scores fall at low end (easy test)
positive skew
tail is going in the positive area. few scores fall at high end (difficult test)
a distribution with less variability has…
a steeper curve. with more variability, the scores are more spread out (flatter curve)
raw scores are often
meaningless. must take the raw scores and do something with them to make meaning
(simple) frequency distribution
orders a set of scores from high to low and lists the corresponding frequency
grouped frequency distribution
(AKA class intervals) - tells you how many people scored within a group of scores (class)
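A rough sketch of both kinds of frequency distribution, using made-up scores. The class width of 10 and the helper `class_interval` are assumptions chosen for illustration:

```python
from collections import Counter

scores = [95, 88, 88, 76, 95, 67, 88, 72, 76, 59]  # made-up scores

# Simple frequency distribution: each score with its count, high to low.
simple = sorted(Counter(scores).items(), reverse=True)

def class_interval(score, width=10):
    """Return the (low, high) class interval that contains the score."""
    low = (score // width) * width
    return (low, low + width - 1)

# Grouped frequency distribution: count of scores per class interval.
grouped = sorted(Counter(class_interval(s) for s in scores).items(), reverse=True)

for score, freq in simple:
    print(score, freq)
for (low, high), freq in grouped:
    print(f"{low}-{high}", freq)
```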
kinds of graphs used to illustrate frequency distributions
histogram (bars touch, continuous data), bar graph (bars do not touch, discrete data), frequency polygon (i.e. line graph)
the mean is a ___-level statistic
interval. most stable and useful measure of central tendency
variance
the average squared deviation of scores from the mean. a way to capture the degree to which data are spread out around the mean.
measures of variability include:
range, interquartile range (Q3-Q1), semi-interquartile range ((Q3-Q1)/2), average deviation, standard deviation, variance
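The measures listed above can be sketched with Python's standard `statistics` module. The scores are made up, and quartile conventions differ between textbooks and software, so the Q1 and Q3 values from `statistics.quantiles` may not match a hand calculation exactly:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores

score_range = max(scores) - min(scores)

# Quartiles (default method of statistics.quantiles; other conventions exist).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1          # interquartile range
semi_iqr = iqr / 2     # semi-interquartile range

mean = statistics.mean(scores)
# Average deviation: mean absolute distance of scores from the mean.
average_deviation = sum(abs(x - mean) for x in scores) / len(scores)

variance = statistics.pvariance(scores)  # divides by n
sd = statistics.pstdev(scores)           # square root of the variance

print(score_range, iqr, semi_iqr, average_deviation, variance, sd)
```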
range is susceptible to
outliers
standard deviation
the average amount of deviation from the mean within a group of scores. AKA the square root of the variance. tells you how spread out the scores are, i.e. how much they typically differ from the mean. small s for sample SD and lowercase sigma (σ) for population SD
standard deviation is equal to
the square root of the variance
a “normal” distribution has the greatest frequency of scores occurring
near the mean
the greater the standard deviation…
the greater the spread of scores
n-1 vs n
n-1 for sample, n for pop. we use n
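A small sketch of what the choice of divisor does, computed by hand on made-up scores so the n versus n-1 step is visible:

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean (shared by both formulas).
sum_sq = sum((x - mean) ** 2 for x in scores)

population_sd = math.sqrt(sum_sq / n)    # divide by n
sample_sd = math.sqrt(sum_sq / (n - 1))  # divide by n-1

print(population_sd, sample_sd)
```

Dividing by n-1 always gives a slightly larger SD; the gap shrinks as n grows.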
normal curve - what % of scores fall within +- 1, 2, 3 SDs of the mean
68% +-1
95% +-2
99.7% +-3
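These percentages can be checked against the standard normal curve with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is just for illustration:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within +-k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.4f}")
```

The flashcard figures are rounded; the share within 3 SDs is closer to 99.7% than 99%.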
kurtosis
steepness of a distribution
platykurtic
relatively flat distribution
leptokurtic
relatively peaked distribution
mesokurtic
between leptokurtic and platykurtic
standard score
raw score that has been converted from one scale to another, with the latter scale having some set mean and SD. these have universal meaning
examples of standard scores
z-score, T-score, stanines
z-scores
mean of 0 and SD of 1; show how many standard deviation units the score is above or below the mean, z = (x - Xbar) / s
T-scores
a z-score transformation, mean = 50; SD = 10; ranges from 5 SD below to 5 SD above the mean (0 to 100), so cannot be negative! makes more intuitive sense than z-scores. T = 50 + 10z
stanines
“standard nine”, standard scores mean = 5, SD = ~2. Values from 1 to 9
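A sketch of the three standard scores above as linear conversions from a raw score. The T-score line follows from mean = 50 and SD = 10. The stanine line (round 2z + 5, then clamp to 1 to 9) is a common approximation and an assumption here; published tests usually assign stanines from percentage bands instead:

```python
def z_score(x, mean, sd):
    """How many SD units the raw score x lies above or below the mean."""
    return (x - mean) / sd

def t_score(z):
    """Rescale a z-score to mean 50, SD 10."""
    return 50 + 10 * z

def stanine(z):
    """Approximate stanine (mean 5, SD about 2), clamped to 1..9. Assumption."""
    return max(1, min(9, round(2 * z + 5)))

z = z_score(115, mean=100, sd=15)  # made-up raw score and norms
print(z, t_score(z), stanine(z))
```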
when do standard scores retain a direct numerical relationship to the original raw score?
when the standard score is obtained by a linear (vs nonlinear) transformation.
when an original distribution goes through a nonlinear transformation, it is said to be
normalized
normalizing involves
“stretching” a skewed distribution to fit the normal curve. technical worries here. better to fine tune an assessment so that the scores are normally distributed
correlation
an expression of the degree (strength) and direction of correspondence between two things. little r (Pearson r). perfect positive and perfect negative. “high” values of both are impressive (-.9 and +.9)
correlation has a range of values from
-1 to 1
positive correlation means
both variables increase, or decrease, together
negative correlation means
as one variable increases the other decreases
correlation does not mean ___, but it does imply ___
causation; prediction
Pearson r should be used
when the relationship between variables is linear and the variables/data are continuous
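A minimal hand-rolled sketch of Pearson r for paired, continuous scores (made-up data). The function name `pearson_r` is only for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative
```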
a scatterplot shows
the direction and magnitude of the relationship (if any) between two variables (AKA scatter diagram). helps you spot outliers and reveal presence of curvilinearity (can’t use r if curvilinearity)
use Spearman Rho
vs. Pearson r when you have a small sample size (<30) and especially when you have ordinal or rank order data
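One way to sketch Spearman rho is to convert each variable to ranks (ties get the average rank) and then compute an ordinary Pearson r on the ranks. The helper names below are hypothetical, and the data are made up:

```python
import math

def ranks(values):
    """Rank from 1 (lowest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    co_deviation = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return co_deviation / math.sqrt(ss_x * ss_y)

# A curved but strictly increasing relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```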
meta-analysis
family of techniques used to statistically combine information across studies to produce single estimates of the data under study. can give more weight to studies that have larger #s of subjects
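As a very simplified illustration of weighting by sample size, the sketch below averages correlations from three hypothetical studies, weighting each by its number of subjects. Real meta-analyses typically use more refined weighting, so treat this only as the basic idea:

```python
# Hypothetical studies: (correlation found, number of subjects).
studies = [(0.30, 50), (0.50, 200), (0.10, 30)]

def weighted_mean_r(studies):
    """Average the correlations, weighting each study by its sample size."""
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n

unweighted = sum(r for r, _ in studies) / len(studies)
print("unweighted:", round(unweighted, 3))
print("weighted:", round(weighted_mean_r(studies), 3))
```

The weighted estimate is pulled toward the result of the largest study.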
0 correlation means
no relationship; not correlated
once you have an r, you need to…
figure out if the r is significant (depends on sample size, N). Significance at the .05 level is what you’ll usually see reported: if there were really no relationship, a correlation this large would turn up by chance 5% of the time or less.
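One common way to test an r is to convert it to a t statistic with N - 2 degrees of freedom and compare it with a critical value from a t table. The sketch below only computes the t statistic, to show how the same r becomes more convincing as N grows. The function name and the numbers are made up for illustration:

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Same correlation, two different sample sizes.
print(round(t_for_r(0.30, 10), 3))
print(round(t_for_r(0.30, 100), 3))
```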
regression
the analysis of relationships among variables for the purpose of understanding how one variable predicts another
simple regression
one independent variable (X) and one dependent variable (Y) - the outcome variable. results in a regression line, or line of best fit, that comes closest to the greatest # of points in a scatterplot
regression formula
y = a+bx (a = y-intercept; b = slope)
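A sketch of how a and b can be found by ordinary least squares, the usual way a line of best fit is computed. The data and the name `fit_line` are made up for illustration:

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-deviation of x and y divided by the squared deviation of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the line passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)        # intercept and slope
print(a + b * 5)   # predicted y when x = 5
```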
standard error of the estimate
the error when you predict y from x from a regression line of best fit
the higher the correlation between X & Y…
the greater the accuracy of the prediction of y from x, and the smaller the standard error of the estimate
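One common textbook form of the standard error of the estimate makes this relationship explicit. Here $s_y$ is the SD of the Y scores and $r$ is the correlation between X and Y; textbooks differ on the divisor used (n, n-1, or n-2), so treat this as a sketch of the idea rather than the exact formula for any given course:

```latex
s_{est} = s_y \sqrt{1 - r^2}
% Example with s_y = 10:
%   r = .60  ->  s_{est} = 10\sqrt{1 - .36} = 8.0
%   r = .90  ->  s_{est} = 10\sqrt{1 - .81} \approx 4.4
```

When r = 0 the error is as large as the SD of Y itself; when r = 1 or -1 it drops to zero.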
we are more confident of predictions near the ___ of an interval
middle (mean) - more data points. we are less certain at the tails - fewer data points
multiple regression
when you have several predictor variables predicting one outcome. predictors will be weighted, bc some are better predictors than others. more variables = generally better prediction