Week 3 Flashcards
What is variability in statistics?
A measure of the differences between scores in a distribution; it describes how clustered or spread out the scores are.
Distributions with little variability are better for inferential statistics, because patterns in the data are easier to see.
What is the range?
The range is the difference between the minimum score and the maximum score.
For discrete variables, range = Xmax - Xmin.
For continuous variables, range = URL for Xmax - LRL for Xmin (URL = upper real limit, LRL = lower real limit).
The problem with the range is its susceptibility to extreme scores.
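As a quick illustration, here is a minimal Python sketch of both range formulas. The function name `score_range`, the example scores, and the assumption that scores are measured in units of 1 (so each real limit sits 0.5 beyond the score) are illustrative and not from the notes.

```python
def score_range(scores, continuous=False, unit=1.0):
    """Return the range of a list of scores.

    For continuous variables the real limits are assumed to extend half a
    measurement unit beyond the highest and lowest scores.
    """
    highest, lowest = max(scores), min(scores)
    if continuous:
        upper_real_limit = highest + unit / 2   # URL for Xmax
        lower_real_limit = lowest - unit / 2    # LRL for Xmin
        return upper_real_limit - lower_real_limit
    return highest - lowest                     # Xmax - Xmin


scores = [3, 7, 8, 5, 12]
print(score_range(scores))                   # 9
print(score_range(scores, continuous=True))  # 10.0

# One extreme score changes the range dramatically.
print(score_range([3, 7, 8, 5, 120]))        # 117
```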
What are the three measures of variability?
- Range
- Standard deviation
- Variance
What is the deviation?
The distance from the mean for each individual score.
What is the variance?
The mean of the squared deviations.
What is the standard deviation?
The square root of the variance; it gives approximately the average distance of the scores from the mean.
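The three definitions above (deviation, variance, standard deviation) can be traced step by step in a short Python sketch. The five scores are a made-up population used only for illustration.

```python
import math

scores = [1, 9, 5, 8, 7]                 # hypothetical population, N = 5
N = len(scores)
mu = sum(scores) / N                     # population mean = 6.0

# Deviation: distance from the mean for each individual score.
deviations = [x - mu for x in scores]    # [-5.0, 3.0, -1.0, 2.0, 1.0]

# Variance: the mean of the squared deviations.
variance = sum(d ** 2 for d in deviations) / N   # 40 / 5 = 8.0

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)

print(sum(deviations))   # 0.0 (deviations always sum to zero)
print(variance)          # 8.0
print(round(sd, 2))      # 2.83
```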
What does SS represent?
The sum of squares: the sum of the squared deviations from the mean.
What is the definitional formula for SS? What are its weaknesses?
SS = ∑(X - μ)².
It can become difficult to use or inaccurate when the mean is not a whole number, because every deviation then involves decimals or fractions.
What is the computational formula for SS?
SS = ∑X² - ((∑X)²/N).
This works with the scores themselves rather than the deviations, so it is better for dealing with decimals and fractions.
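A small Python sketch, under the assumption of a made-up population of five scores, showing that the definitional and computational formulas give the same SS. The function names are illustrative.

```python
def ss_definitional(scores):
    """SS = sum of (X - mu)^2."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum of X^2 minus (sum of X)^2 / N."""
    N = len(scores)
    sum_x = sum(scores)                         # sum of X   = 30
    sum_x_squared = sum(x ** 2 for x in scores) # sum of X^2 = 220
    return sum_x_squared - (sum_x ** 2) / N     # 220 - 900/5


scores = [1, 9, 5, 8, 7]          # hypothetical population
print(ss_definitional(scores))    # 40.0
print(ss_computational(scores))   # 40.0
```

The computational version never calculates a deviation, which is why it stays manageable when the mean is not a whole number.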
What does σ represent?
Population standard deviation.
What does σ² represent?
Population variance.
When is a sample statistic considered biased?
When it consistently overestimates or underestimates a corresponding population parameter.
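What adjustments need to be made to the definitional and computational formulas for SS when used for samples?
N changes to n, so the computational formula becomes SS = ∑X² - ((∑X)²/n).
μ changes to M, so the definitional formula becomes SS = ∑(X - M)².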
What does s represent?
Sample standard deviation (sometimes called the estimated population standard deviation).
What does s² represent?
Sample variance (sometimes called the estimated population variance). *VERY IMPORTANT* The equation for s² is s² = SS/(n - 1), NOT SS/n. Dividing by n - 1 corrects for the tendency of samples to underestimate the population variability.
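A worked example in Python may help here. The seven scores are a hypothetical sample; the check against `statistics.variance` relies on the fact that Python's standard library also divides by n - 1 for sample variance.

```python
import math
import statistics

sample = [1, 6, 4, 3, 8, 7, 6]           # hypothetical sample, n = 7
n = len(sample)
M = sum(sample) / n                      # sample mean = 5.0
SS = sum((x - M) ** 2 for x in sample)   # 36.0

s2 = SS / (n - 1)        # sample variance: 36 / 6 = 6.0
s = math.sqrt(s2)        # sample standard deviation
wrong = SS / n           # dividing by n gives a smaller value

print(s2)                # 6.0
print(round(s, 2))       # 2.45
print(round(wrong, 2))   # 5.14

# The standard library's sample variance uses n - 1 as well.
print(math.isclose(statistics.variance(sample), s2))   # True
```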
What does n - 1 represent?
The degrees of freedom (df): the number of scores in the sample that are free to vary. Once the mean (M) is known, the remaining score is determined.
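To see why only n - 1 scores are free to vary, consider this short Python sketch. The sample size, mean, and the three freely chosen scores are all made up for illustration.

```python
n = 4
M = 5                       # the sample mean is already known
free_scores = [3, 8, 6]     # n - 1 scores can take any value

# The scores must sum to n * M, so the last score has no freedom left.
last_score = n * M - sum(free_scores)    # 20 - 17
scores = free_scores + [last_score]

print(last_score)           # 3
print(sum(scores) / n)      # 5.0
```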
Why does using n - 1 in calculating s² produce an unbiased statistic, whereas using n produces a biased one?
Because it takes into account the tendency for samples to have less variability than the populations they come from.
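The bias can be demonstrated exactly, without random simulation, by listing every possible sample from a tiny population. This is a sketch under assumptions: the six-score population and the sample size of 2 (drawn with replacement) are hypothetical choices made to keep the enumeration small.

```python
from itertools import product

population = [0, 0, 3, 3, 9, 9]           # hypothetical population
N = len(population)
mu = sum(population) / N                  # 4.0
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 2
biased, unbiased = [], []
# Every possible ordered sample of size n, drawn with replacement.
for sample in product(population, repeat=n):
    M = sum(sample) / n
    SS = sum((x - M) ** 2 for x in sample)
    biased.append(SS / n)            # divides by n
    unbiased.append(SS / (n - 1))    # divides by n - 1

mean_biased = sum(biased) / len(biased)
mean_unbiased = sum(unbiased) / len(unbiased)

print(sigma2)          # 14.0
print(mean_biased)     # 7.0  (consistently underestimates)
print(mean_unbiased)   # 14.0 (matches the population variance on average)
```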
What is the empirical rule in statistics?
A rule for normal distributions (sometimes called the 68-95-99.7 rule) that states:
Roughly 68% of the scores will fall within one standard deviation of the mean (34% on each side of the mean).
Roughly 95% of the scores will fall within two standard deviations of the mean (a further 13.5% on each side beyond the first standard deviation, 47.5% on each side of the mean).
Roughly 99.7% of the scores will fall within three standard deviations of the mean (roughly 2% more on each side beyond the second standard deviation, a little less than 50% on each side of the mean).
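The percentages can be checked against a normal distribution with Python's standard library. This is a minimal sketch; the helper name `proportion_within` is illustrative, and it assumes the scores are exactly normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```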
Why is variance important for inferential statistics?
When comparing two sets of scores, high variance within the sets makes it harder to identify meaningful differences between them.
What steps should be taken in choosing the correct analysis for a single variable?
1. Define the level of measurement.
2. In SPSS, use the Frequencies procedure for categorical variables and the Explore procedure for metric variables.
Are grouped frequency tables metric or categorical?
Categorical; once scores are grouped into intervals, the groups are treated as ordinal categories.
What do outliers in boxplots represent?
Each outlier is a score that lies unusually far from the rest of the distribution; its label shows you the location of that particular score in the data sheet.
When is it best to use the definitional formula for SS?
The definitional formula is best used when the mean is a whole number and there are relatively few scores. The range is irrelevant.
When is it best to use the computational formula for SS?
The computational formula is best used when the mean is not a whole number or when there are many scores. The range is irrelevant.