Ch2 - Stats for Testing Flashcards
Measurement
the use of certain devices/rules for assigning numbers to objects/events
Variables + types
anything that varies
• Visible/invisible
• Discrete (errors in counting)/continuous (measurement errors)
• Dichotomous (only 2 values) / Polytomous (discrete variables assuming more than 2 values)
Nominal scales type of data
Categorical data: data related to variables such as gender, color, that derive from assigning people, objects or events in categories/classes
• The only property of the numbers given to define the categories is identity
• We can only count the frequencies within each category
Ordinal scales properties
Added property of rank order: the elements in a set can be lined up in a series arranged on the basis of a single variable (ex: birth order)
• Rank orders carry no information regarding the distance between positions
In psych, rank-ordered tests are reported as percentile rank scores (PR)
• Ordinal numbers from 1 to 100, rank indicates the % of individuals in a group who fall at or below a given level of performance
○ Ex: 70 - level of performance that equals or exceeds 70% of the group
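A minimal sketch of the percentile rank idea, assuming the "at or below" definition given above. The function name `percentile_rank` and the scores are hypothetical, for illustration only.

```python
def percentile_rank(scores, value):
    """Percent of scores in the group that fall at or below `value`."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

# Hypothetical raw scores for a group of 10 people
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]

pr = percentile_rank(scores, 80)
print(pr)  # 70.0 -> a raw score of 80 equals or exceeds 70% of the group
```

Note that the PR describes position in the group only; it says nothing about the distance between raw scores.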
Ordinal data can be manipulated like nominal data, but also with Spearman’s rho correlation coefficient
Interval scales properties
The difference between any 2 consecutive numbers reflects an equal difference in what the numbers represent (equal units)
• Ex: if 2 days are 12 days apart, they are exactly 3 times as far apart as 2 days that are 4 days apart
○ *some months are longer than others so it does not apply to months
• There is no agreed upon starting point for the calendar, so no absolute 0 - can’t be interpreted as ratios
Ratio scales properties
Numbers achieve additivity: they can be added, subtracted, multiplied, divided and the result will be a meaningful ratio
• Have a true/absolute 0 - represents NONE of what is measured
• Ex: a 16-pound object is 2x as heavy as an 8-pound object, and 0 pounds indicates weightlessness
Problem with ratio IQs
Ratio IQs were obtained by:
• (Mental Age (result on S-B test) / Child’s chronological age) x 100 = ratio IQ
• Idea: average children would have an IQ of 100 (since their mental age would equal their actual age)
• BUT: did not work with adolescents/adults bc their development is less uniform / less intense
○ Mental age = ordinal-level measurement
○ Chronological age = ratio scale
○ Dividing the 2 can't lead to a meaningful number
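A small sketch of the ratio IQ arithmetic above. The function name `ratio_iq` and all ages are hypothetical; the last line only shows what the formula does mechanically if mental-age scores stop rising while chronological age keeps increasing.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = (mental age / chronological age) x 100."""
    return 100 * mental_age / chronological_age

print(ratio_iq(10, 10))  # 100.0 -> mental age equals actual age
print(ratio_iq(12, 10))  # 120.0 -> mental age ahead of actual age
print(ratio_iq(15, 30))  # 50.0  -> same mental age of 15, but at age 30
```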
2 types of stats
- Descriptive: used to organize, summarize, and describe data
- Inferential: used to estimate pop values based on sample values
Statistics def
relate to sample data
Parameter def
relate to population data
○ Mathematically exact numbers (or constants) that are not usually attainable unless a population is so fixed and circumscribed that all of its members can be accounted for
Frequency distributions
frequency with which each score occurs in a distribution
• Can also include percentile rank scores (Cumulative Percent Column)
• Grouped frequency distributions: used when the range of scores is too large to list every value, so scores are grouped into intervals
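A rough sketch of a frequency distribution with a cumulative percent column, using made-up scores. The variable names and the `(score, frequency, cumulative percent)` row layout are assumptions for illustration.

```python
from collections import Counter

scores = [3, 5, 5, 7, 7, 7, 8, 8, 9, 10]  # hypothetical test scores
freq = Counter(scores)                     # how often each score occurs
n = len(scores)

table = []
cumulative = 0
for score in sorted(freq):
    cumulative += freq[score]
    # cumulative percent = % of cases at or below this score
    table.append((score, freq[score], 100 * cumulative / n))

for row in table:
    print(row)
# first row: (3, 1, 10.0); last row: (10, 1, 100.0)
```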
Graphs (+ best types for discrete vs continuous data)
Frequency tables can be made into graphs for even easier reading
• Discrete/categorical data: pie charts or bar graphs
• Continuous/metric data: histograms/frequency polygons
• Measures of central tendency
○ Mode: most frequent value in a distribution (bimodal/multimodal = 2 or more values tie for the highest frequency)
○ Median: value that divides a distribution that has been arranged in order of magnitude into 2 halves
○ Mean: arithmetic average (μ for pop and M for sample)
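The three measures can be checked quickly with Python's standard `statistics` module. The scores below are hypothetical and chosen so the distribution is bimodal.

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 6]  # hypothetical scores, already in order

modes = statistics.multimode(scores)  # all values tied for highest frequency
median = statistics.median(scores)    # middle value of the ordered scores
mean = statistics.mean(scores)        # sum of scores / number of scores

print(modes)   # [3, 5] -> bimodal
print(median)  # 4
print(mean)    # 4
```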
Range
distance between the 2 extreme points
Semi-interquartile range
1/2 of the interquartile range (IQR) - the middle 50% of a distribution
IQR
distance between the points that demarcate the tops of the first and third quarters of a distribution
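A sketch of the range, IQR, and semi-interquartile range on made-up scores. Quartile conventions differ between textbooks and software; this uses the default ("exclusive") method of `statistics.quantiles`, which is an assumption and may not match the method used in the course.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical scores

score_range = max(scores) - min(scores)        # distance between the extremes

q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                   # spread of the middle 50%
semi_iqr = iqr / 2                              # half of the IQR

print(score_range)  # 10
print(iqr)          # 6.0
print(semi_iqr)     # 3.0
```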
Variance
sum of squared differences between each value and the mean of the distribution, divided by n (AKA average sum of squares, SS / n) - represents average variability in the distribution
Sum of squares
represents total amount of variability in a score distribution
Standard deviation
square root of the variance - represents the average variability in a set of scores
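A worked sketch linking sum of squares, variance, and standard deviation, using hypothetical scores. It divides by n, as in the definition above; many programs divide by n - 1 when estimating a population value from a sample, so results can differ slightly.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n                                  # 5.0

# Sum of squares: total variability around the mean
sum_of_squares = sum((x - mean) ** 2 for x in scores)   # 32.0

variance = sum_of_squares / n                           # 4.0
sd = math.sqrt(variance)                                # 2.0

print(sum_of_squares, variance, sd)  # 32.0 4.0 2.0
```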
Properties of the Normal Curve Model
- Its limits extend to +/- infinity
- Is unimodal
- Mean = median = mode = center
Standard Normal Distribution
Normal Curve with a mean of 0 and a standard deviation of 1
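A short sketch of the standard normal distribution using `statistics.NormalDist` from the Python standard library. The helper `z_score` and the example values (mean 100, SD 15) are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the area under the curve within 1 and 2 SDs of the mean
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)  # about 0.68
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)  # about 0.95

def z_score(x, mean, sd):
    """Express a raw score as a distance from the mean in SD units."""
    return (x - mean) / sd

print(round(within_1_sd, 2), round(within_2_sd, 2))
print(z_score(115, 100, 15))  # 1.0 -> one SD above the mean
```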
Uses of the Normal Curve
descriptive/inferential
• Descriptive Uses
○ Normalizing scores: transforming them so that they have the same meaning, in terms of their position, as if they were coming from a normal distribution
• Inferential Uses
○ Estimating population parameters
○ Testing hypotheses about differences
Sampling distribution
hypothetical distributions of a statistic (e.g., the mean), built on the assumption that an infinite number of samples of a given size could be drawn from a population - if this were done, the resulting distributions of statistics (sampling distributions) would be approximately normal
Standard Error
the standard deviation of the sampling distribution
Z distribution
the standard normal distribution
○ Smaller samples: Student’s t-distribution
Standard error of the mean
(SEm) -> s / √n
• s = standard deviation of the sample
• n = number of cases in the sample
Sample mean ± 1 SEm gives a 68% confidence interval around the mean
For more confidence, use ± 2 SEm (approx. 95% CI; more exactly ± 1.96 SEm)
We can establish ranges within which the pop parameters are likely to be found
Assuming that
• The sample mean is the best estimate of the pop mean
• The st error of the mean = the st dev of the hypothetical sampling distribution of means, assumed to be normal
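A sketch of the SEm and the confidence intervals built from it. The sample values (mean 100, s = 15, n = 25) are hypothetical, and the 95% interval uses the rounded multiplier of 2 rather than 1.96.

```python
import math

sample_mean = 100   # hypothetical sample mean (M)
s = 15              # hypothetical sample standard deviation
n = 25              # hypothetical number of cases

sem = s / math.sqrt(n)                            # standard error of the mean

ci_68 = (sample_mean - sem, sample_mean + sem)          # M +/- 1 SEm
ci_95 = (sample_mean - 2 * sem, sample_mean + 2 * sem)  # M +/- 2 SEm (approx.)

print(sem)    # 3.0
print(ci_68)  # (97.0, 103.0)
print(ci_95)  # (94.0, 106.0)
```

With a larger n the SEm shrinks, so the range in which the pop mean is likely to be found gets narrower.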
Nonnormal distributions
Proportions under the curve no longer apply
• All values may have the same frequency (flat/rectangular shape)
• May have more than 1 mode
Kurtosis
• Kurtosis: stems from Greek for “convexity” - flatness or peakedness of a distribution
○ Platykurtic: flatter than the normal curve, most dispersion
○ Leptokurtic: more peaked than the normal curve, least dispersion
○ Mesokurtic: normal distribution
Skewness
lack of symmetry
○ Normal distribution: Sk = 0
○ Negatively skewed: Sk < 0
○ Positively skewed: Sk > 0
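A sketch showing how the sign of Sk follows the direction of the longer tail. The notes do not say which skewness formula is meant; this assumes one common moment-based version (third central moment divided by the SD cubed), and all data are made up.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (an assumed formula)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]            # no skew
right_tail = [1, 2, 2, 3, 3, 3, 10]    # long tail toward high scores
left_tail = [-x for x in right_tail]   # mirror image: tail toward low scores

print(skewness(symmetric))        # 0.0
print(skewness(right_tail) > 0)   # True -> positively skewed
print(skewness(left_tail) < 0)    # True -> negatively skewed
```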
Univariate vs bivariate stats
Univariate statistics: measures of a single variable
Bivariate or multivariate: at least 2 sets of measurements on the same groups of people or matched pairs for 2 sets of individuals
Coefficients of determination (in correlations)
• Coefficient of determination (r²): tells us how much of the variance in Y can be explained by the variance in X (obtained by squaring the correlation coefficient)
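A sketch of Pearson's r and the coefficient of determination, computed by hand on hypothetical paired scores. The function name `pearson_r` is an assumption for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical scores on variable X
y = [2, 4, 5, 4, 5]  # hypothetical scores on variable Y

r = pearson_r(x, y)
r_squared = r ** 2   # coefficient of determination

print(round(r, 2))          # 0.77
print(round(r_squared, 2))  # 0.6 -> about 60% of the variance in Y
```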
Regression towards the mean
extreme scores on one variable tend to be associated with scores closer to the mean on another
Regression line
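slope represents the expected change in Y per unit change in X; with standardized (z) scores the slope equals r, so it reflects the strength/magnitude of the relationship between 2 variables
○ Greater slope (in standardized units) = stronger relationship between the variables
○ Allows us to predict a value of Y from a known value of X that is related to it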
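A sketch of fitting a regression line and using it for prediction, assuming the usual least-squares solution (slope = cross-products / sum of squares of X). The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores

slope, intercept = fit_line(x, y)
predicted_y = intercept + slope * 6  # predict Y for a new X of 6

print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
print(round(predicted_y, 2))                 # 5.8
```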
Conditions for using Pearson’s r
- The pairs of observations are independent of one another
- The variables to be correlated are continuous and measured on interval or ratio scales
- The relationship between the variables is linear, that is, it approximates a straight-line pattern
Heteroscedasticity
the dispersion in the scatterplot is not uniform throughout the range of values
Homoscedasticity
uniform dispersion throughout the range
Spearman’s rho (rs)
correlation for ordinal variables
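A sketch of Spearman's rho using the rank-difference shortcut, rho = 1 - 6Σd² / (n(n² - 1)). This shortcut assumes there are no tied ranks; the two sets of rankings are made up.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho from two sets of ranks (no ties assumed)."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of the same 5 people by two judges
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_1, judge_2)
print(round(rho, 2))  # 0.8
```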
Eta (η)
correlation for curvilinear relationships
Point biserial correlation (rpb)
when one variable is dichotomous and the other is continuous
Dichotomous variables: often in true/false answer tests
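A sketch of the point biserial correlation for a right/wrong item and a total score, assuming the standard formula r_pb = ((M1 - M0) / SD) × √(pq), with SD computed by dividing by n. The function name and data are hypothetical.

```python
import math

def point_biserial(item, total):
    """r_pb between a 0/1 item and a continuous score."""
    n = len(item)
    group_1 = [t for i, t in zip(item, total) if i == 1]  # answered correctly
    group_0 = [t for i, t in zip(item, total) if i == 0]  # answered incorrectly
    mean_1 = sum(group_1) / len(group_1)
    mean_0 = sum(group_0) / len(group_0)
    mean_all = sum(total) / n
    sd = math.sqrt(sum((t - mean_all) ** 2 for t in total) / n)
    p = len(group_1) / n   # proportion scoring 1
    q = 1 - p              # proportion scoring 0
    return (mean_1 - mean_0) / sd * math.sqrt(p * q)

item = [1, 1, 1, 0, 0]    # hypothetical true/false item (1 = correct)
total = [9, 8, 7, 5, 4]   # hypothetical total test scores

r_pb = point_biserial(item, total)
print(round(r_pb, 2))  # about 0.92
```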
Phi or fourfold coefficient
when both variables are dichotomous
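A sketch of the phi coefficient from a 2x2 (fourfold) table, assuming the usual formula phi = (ad - bc) / √((a+b)(c+d)(a+c)(b+d)). The cell labels a, b, c, d and the counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: rows = item 1 right/wrong, columns = item 2 right/wrong
a, b = 30, 10
c, d = 10, 30

phi = phi_coefficient(a, b, c, d)
print(phi)  # 0.5
```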
Multiple correlation coefficient (R)
when a single dependent variable is correlated with 2+ predictors