Chapter 2 Flashcards
frequency distribution
records the number of times each possible value occurs during an experiment
histogram
groups adjacent values together to give a visual picture, obscuring noise while preserving important data trends, looks like bar graph
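A minimal Python sketch of the two cards above, using made-up scores (`scores`, `width`, and the other names are illustrative assumptions, not from the chapter). It tallies a frequency distribution and then groups adjacent values into intervals, as a histogram does:

```python
from collections import Counter

scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 12]  # hypothetical data

# Frequency distribution: how many times each value occurs.
freq = Counter(scores)

# Histogram-style grouping: adjacent values share an interval of width 5.
width = 5
binned = Counter((x // width) * width for x in scores)

# Text "bars", one row per interval.
for start in sorted(binned):
    print(f"{start:>2}-{start + width - 1:<2} {'*' * binned[start]}")
```

Grouping hides the small differences between neighboring values while the overall shape stays visible.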
real lower limit
smallest value that would be classed as falling into the interval once rounding is taken into account (e.g., 9.5 for the interval 10-14 when scores are whole numbers)
real upper limit
largest value that would be classed as being in the interval once rounding is taken into account (e.g., 14.5 for the interval 10-14 when scores are whole numbers)
midpoint
average of the upper and lower limits of an interval, presented for convenience
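A short Python sketch of real limits and the midpoint, assuming scores are recorded to the nearest `unit` (the function name and the 10-14 interval are illustrative assumptions):

```python
def real_limits(lower, upper, unit=1):
    """Real limits extend half a measurement unit beyond the stated limits."""
    half = unit / 2
    return lower - half, upper + half

lo, hi = real_limits(10, 14)   # interval reported as 10-14
midpoint = (lo + hi) / 2       # same result as (10 + 14) / 2
print(lo, hi, midpoint)
```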
outlier
extreme value that is widely separated from the rest of the data, frequently representing errors in recording data (but not always)
normal curve
bell-shaped curve that is symmetrical around the center of the distribution
kernel density plot
smoothed picture of a distribution's shape that pays no attention to the mean and standard deviation, instead holding to the idea that each observation might have been slightly different
stem-and-leaf display
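display developed by Tukey as part of exploratory data analysis; shows the shape of a distribution while keeping the individual values, helpful for comparing 2 different distributions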
leading digits
most significant digits, form the stem (vertical axis) of the stem-and-leaf display
stem
vertical axis of the stem-and-leaf display, formed by the leading digits/most significant digits
trailing digits
less significant digits, form the leaves (horizontal elements) of the stem-and-leaf display
leaves
horizontal elements of the stem-and-leaf display, formed by the trailing digits/less significant digits
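A rough Python sketch of a stem-and-leaf display, assuming non-negative two-digit whole numbers so that the tens digit is the stem and the units digit is the leaf (the data and function name are hypothetical):

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digit) to its sorted leaves (trailing digits)."""
    display = defaultdict(list)
    for v in sorted(values):
        display[v // 10].append(v % 10)
    return dict(display)

data = [12, 15, 21, 23, 23, 30, 38]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```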
bimodal
graph having two predominant peaks instead of one (even when these peaks are not exactly the same height)
unimodal
distribution having only one major peak
modality
refers to the number of major peaks in a distribution
negatively skewed
distribution with tail going out to the left (the tail points toward the negative end of the scale)
positively skewed
distribution with tail going out to the right (the tail points toward the positive end of the scale)
skewness
statistical measure of the degree of asymmetry in a distribution
kurtosis
the relative concentration of scores in the center, the upper and lower ends (tails), and the shoulders (between the center and the tails) of a distribution
mesokurtic
a normal distribution, with tails normally proportioned (neither too thick nor too thin) and with center normally shaped (neither too many nor too few scores concentrated there)
platykurtic
flatter-shaped distribution where scores are concentrated in the shoulders (pulled in from the tails and down from the center)
leptokurtic
distribution with higher-than-normal center peak and thicker-than-normal tails
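The cards above describe skewness and kurtosis in words only. One common way to put numbers on them, sketched here as an assumption rather than as the chapter's own formula, uses standardized central moments: the third moment for skewness and the fourth moment minus 3 for kurtosis (so a normal, mesokurtic shape scores about 0).

```python
def central_moment(values, k):
    """Average of the k-th power of deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    # Positive: tail to the right. Negative: tail to the left.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Above 0 suggests leptokurtic, below 0 suggests platykurtic.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]     # hypothetical data
right_tail = [1, 2, 2, 3, 10]   # one extreme high score
print(skewness(symmetric), skewness(right_tail), excess_kurtosis(symmetric))
```

Other definitions (for example, with small-sample corrections) give slightly different values.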
sigma
uppercase Greek letter sigma (Σ), the standard notation for summation (add up everything that follows)
measures of central tendency
different statistics that measure the “center” of the distribution
measures of location
reflect where on the scale the distribution is centered
mode
Mo-most common score, advantage: represents the largest number of people & unaffected by extreme scores
median
Mdn-corresponds to the point at or below which 50% of the scores fall when the data are arranged in numerical order, advantage: unaffected by extreme scores
median location
(N+1)/2, the position of the median in the ordered data
mean
X bar, sum of scores divided by # of scores, disadvantage: influenced by extreme scores, value may not actually exist in the data; advantage: can be manipulated algebraically, estimates the population well
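A small Python sketch comparing the three measures on made-up scores that include one extreme value (the data are hypothetical; the functions come from Python's standard `statistics` module):

```python
import statistics

scores = [2, 3, 3, 4, 5, 40]  # hypothetical data with one extreme score

mode = statistics.mode(scores)            # most common score
median_location = (len(scores) + 1) / 2   # halfway between the 3rd and 4th scores
median = statistics.median(scores)        # average of the 3rd and 4th scores
mean = statistics.mean(scores)            # pulled upward by the 40

print(mode, median_location, median, mean)
```

The mode and median stay near the bulk of the scores, while the mean is a value that does not appear in the data at all.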
relation of measures of central tendency to one another
whenever the distribution is normal (unimodal and symmetric), the mean, median, and mode will all be close to one another
trimmed means
means calculated on data from which a fixed percentage of scores has been discarded at each end of the distribution, to weaken the effect of extreme scores on the mean and to obtain a population estimate with a small standard error
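A minimal sketch of a trimmed mean in Python, assuming the trimming proportion is applied to each end and rounded down to a whole number of scores (the function name and data are hypothetical):

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the scores from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # scores dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = [1, 2, 3, 3, 4, 4, 5, 5, 6, 47]  # hypothetical data
print(sum(scores) / len(scores))          # ordinary mean
print(trimmed_mean(scores, 0.10))         # 10% trimmed mean
```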
dispersion
variability around a measure of central tendency (usually around the mean)
range
measure of distance from lowest to highest score, relies on the extremes and so may be a distorted picture of the variability
interquartile range
discards the upper and lower 25% of scores, leaving the middle half to define the range (Q3-Q1); can discard too much of the data to represent a sample well
first quartile
point that cuts off the lowest 25% of a distribution, Q1
third quartile
point that cuts off the upper 25% of a distribution, Q3
second quartile
median of a distribution, Q2
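A quick Python illustration of the range, the quartiles, and the interquartile range on hypothetical data. It relies on `statistics.quantiles` with its default settings; textbooks and software use several slightly different quartile conventions, so other tools may give slightly different cut points.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data

data_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # three cut points for quarters
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```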
Winsorized sample
sample used to estimate variability in which a percentage of the highest and lowest scores is dropped and replaced with copies of the highest and lowest remaining scores
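A minimal sketch of Winsorizing in Python, under the same assumption as a trimmed mean that the proportion applies to each end and is rounded down (the function name and data are hypothetical):

```python
def winsorize(values, proportion):
    """Replace the extreme scores at each end with the nearest remaining score."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # scores replaced per end
    if k == 0:
        return ordered
    core = ordered[k:len(ordered) - k]
    return [core[0]] * k + core + [core[-1]] * k

scores = [1, 2, 3, 3, 4, 4, 5, 5, 6, 47]  # hypothetical data
print(winsorize(scores, 0.10))
```

The sample keeps its original size, which is what distinguishes it from a trimmed sample.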
absolute value
magnitude of a number regardless of its sign (for example, the absolute value of -3 is 3)
mean absolute deviation
average of the absolute values of the deviations from the mean: take each score's deviation (X-Xbar), drop the sign, and average the results
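A short Python sketch of the mean absolute deviation on hypothetical scores. It also shows why the signs are dropped: the signed deviations from the mean always sum to zero.

```python
scores = [2, 4, 6, 8]  # hypothetical data

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

signed_sum = sum(deviations)                              # cancels out to zero
mad = sum(abs(d) for d in deviations) / len(deviations)   # average distance from the mean

print(signed_sum, mad)
```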
standard deviation
s, the square root of the variance: sum all squared deviations (X-Xbar), divide by N-1, then take the square root
sample variance
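(s squared), variance computed from a sample, which is only part of the whole population: sum of squared deviations from X bar divided by N-1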
population variance
(sigma squared), variance of the whole population: sum of squared deviations from mu divided by N
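A worked Python sketch of the variance and standard deviation on hypothetical scores, computed by hand and checked against the standard `statistics` module. Treating the same four scores first as a sample and then as a whole population is only for illustration.

```python
import math
import statistics

scores = [2, 4, 6, 8]  # hypothetical data
n = len(scores)
xbar = sum(scores) / n

ss = sum((x - xbar) ** 2 for x in scores)   # sum of squared deviations

sample_variance = ss / (n - 1)      # s squared: divide by N-1
population_variance = ss / n        # sigma squared: divide by N
sd = math.sqrt(sample_variance)     # s: square root of the sample variance

print(sample_variance, population_variance, sd)
print(statistics.variance(scores), statistics.pvariance(scores), statistics.stdev(scores))
```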
coefficient of variation
CV=(standard deviation / mean) X 100, expressing the standard deviation as a percentage of the mean; used to compare the relative variability of two groups/tests measured on different scales
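A small Python sketch of the coefficient of variation, using two hypothetical tests whose scores differ only in scale (names and data are assumptions):

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

test_a = [8, 10, 12]      # hypothetical scores on a 20-point test
test_b = [80, 100, 120]   # same pattern on a 200-point test

print(coefficient_of_variation(test_a), coefficient_of_variation(test_b))
```

The standard deviations differ by a factor of 10, but the CVs match, because the CV is scale-free.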
statistics
characteristics of samples, designated by Roman letters
parameters
characteristics of populations, designated by Greek letters
population mean
symbolized by the Greek mu
expected value
long-run average of a statistic over many repeated samples
unbiased estimate
estimator whose expected value equals the parameter to be estimated
degrees of freedom
df: losing one degree of freedom (dividing by N-1 instead of just by N) because mu is not known and must be estimated from the sample mean
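The link between expected value, unbiased estimates, and dividing by N-1 can be checked by brute force. The Python sketch below uses a tiny made-up population and every possible sample of size 2 drawn with replacement (all names and numbers are illustrative assumptions). It averages the sample variances under both divisors and compares them with the population variance.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population
mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

def variance(sample, divisor):
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely samples

# Expected value = long-run average over all possible samples.
expected_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
expected_n = sum(variance(s, n) for s in samples) / len(samples)

print(sigma_sq, expected_n_minus_1, expected_n)
```

Dividing by N-1 gives an average equal to the population variance, while dividing by N underestimates it.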
boxplot
aka box-and-whisker plot; method of looking at data designed by Tukey; includes a scale that covers the whole range of obtained values, a rectangular box drawn from Q1 to Q3 with a vertical line representing the median, and lines called whiskers from the quartiles out to the adjacent values
quartile location
position of the 1st and 3rd quartiles in the ordered data, counted in from each end: (median location +1)/2
inner fences
points that fall 1.5 times the interquartile range below Q1 and above Q3
adjacent values
those actual values in the data that are no more extreme (no farther from the median) than the inner fences
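A rough Python sketch of the numbers behind a boxplot, on hypothetical data. It takes the quartiles from `statistics.quantiles` with default settings, which is an assumption: that convention can differ slightly from quartiles found by the quartile-location rule.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 30]  # hypothetical data with one extreme score

q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Inner fences: 1.5 interquartile ranges beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Adjacent values: the most extreme actual scores still inside the fences.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
adjacent = (min(inside), max(inside))

# Scores beyond the fences are flagged as possible outliers.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, adjacent, outliers)
```

The whiskers would run from the box out to the two adjacent values, with the outlier plotted separately.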
deciles
like quartiles, but divide the distribution into 10ths rather than quarters
percentiles
divide the distribution into hundredths
quantiles/fractiles
dividing data into chunks for statistical purposes, like percentiles
linear transformations
multiplying a value by a constant and adding a constant to express the same value in a new way (like converting Celsius degrees into Fahrenheit degrees)
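A short Python sketch of the Celsius-to-Fahrenheit example, using made-up temperatures. It also shows how a linear transformation carries over to the mean and standard deviation: the mean is multiplied and shifted, and the standard deviation is only multiplied.

```python
import statistics

celsius = [0, 10, 20, 30]   # hypothetical temperatures
a, b = 9 / 5, 32            # F = a * C + b

fahrenheit = [a * c + b for c in celsius]

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

print(mean_f, a * mean_c + b)   # new mean = a * old mean + b
print(sd_f, a * sd_c)           # new sd = |a| * old sd (adding b changes nothing)
```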
nonlinear transformations
using exponents, logarithms, trigonometric functions, etc. to transform values into another expression, usually involve a change in shape of a distribution
centering
subtracting sample mean from all of the observations, rendering the new mean 0.00 but not affecting the standard deviation or the variance
reflection
reversing the scoring of a scale by a linear transformation (e.g., on a 1-5 scale, new score = 6 - old score); needed when the phrasing of some questions is reversed (half could be positive, like “strongly agree”, and half could be negative, like “strongly disagree”) to prevent subjects from simply checking the same point on the scale all the way down without thinking
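The linear transformation used for reflection is commonly written as (lowest + highest) - score. A minimal Python sketch, assuming a 1-5 rating scale (the function name, scale limits, and responses are hypothetical):

```python
def reflect(score, low=1, high=5):
    """Reverse a rating so that high becomes low and low becomes high."""
    return (low + high) - score

responses = [1, 2, 5]   # hypothetical answers to a reverse-worded item
reflected = [reflect(r) for r in responses]
print(reflected)
```

Reflecting twice returns the original score, and the scale midpoint stays where it is.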
deviation scores
employed to rescale data, subtracting mean from each observation
standard scores
creating deviation scores and then dividing them by the standard deviation
standardization
creating standard scores from raw scores
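A compact Python sketch of centering and standardization on hypothetical raw scores. Deviation scores have a mean of 0 and the same standard deviation as the raw scores, and standard scores have a mean of 0 and a standard deviation of 1.

```python
import statistics

raw = [2, 4, 6, 8]  # hypothetical raw scores

mean = statistics.mean(raw)
sd = statistics.stdev(raw)

deviation_scores = [x - mean for x in raw]           # centering
standard_scores = [d / sd for d in deviation_scores]  # standardization

print(statistics.mean(deviation_scores), statistics.stdev(deviation_scores))
print(statistics.mean(standard_scores), statistics.stdev(standard_scores))
```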