1.2 Flashcards
Population
The collection of units to which we want to generalize a set of findings or a statistical model.
(e.g. people, plankton, plants, cities, suicidal authors, etc.)
Sample
A smaller (but hopefully representative) collection of units from a population used to determine truths about that population.
The mean is a model of
what happens in the real world: the typical score.
It is not a perfect representation of the data.
A deviation is…
the difference between an actual data point and the mean (observed score minus mean).
Sum of Squared Errors
- We could add the deviations to find out the total error.
- Deviations cancel out because some are positive and others negative.
- Therefore, we square each deviation.
- If we add these squared deviations we get the sum of squared errors (SS).
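The steps above can be sketched in a few lines of Python. This is a minimal illustration; the five scores are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical scores, used only for illustration.
scores = [1, 2, 3, 3, 4]

mean = sum(scores) / len(scores)            # the model: the typical score
deviations = [x - mean for x in scores]     # observed score minus mean

# Positive and negative deviations cancel, so their sum is (numerically) zero.
total_error = sum(deviations)

# Squaring removes the sign, so the squared deviations can be added up.
ss = sum(d ** 2 for d in deviations)

print(f"mean = {mean}")
print(f"sum of deviations = {total_error:.10f}")
print(f"sum of squared errors (SS) = {ss:.2f}")
```

For these scores the mean is 2.6 and SS is 5.20, while the raw deviations sum to zero apart from floating-point rounding.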
Variance
- The sum of squares is a good measure of overall variability, but is dependent on the number of scores.
- We calculate the average variability by dividing by the number of scores minus 1 (which is called the degrees of freedom).
- This value is called the variance (s^2).
s^2 = SS / (N - 1)
The variance has one problem:
It is measured in squared units, which makes it difficult to interpret.
The standard deviation
Because the variance is measured in squared units, we take its square root to make it a meaningful metric in the original units: s = √(SS / (N - 1)).
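As a rough sketch of how the variance and standard deviation follow from the sum of squares, the snippet below computes both by hand and checks them against Python's standard `statistics` module, which also divides by N - 1. The scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores, used only for illustration.
scores = [1, 2, 3, 3, 4]
n = len(scores)

mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared errors

variance = ss / (n - 1)      # average variability, using N - 1 degrees of freedom
sd = math.sqrt(variance)     # back in the original units of measurement

# statistics.variance and statistics.stdev use the same N - 1 denominator.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))

print(f"variance = {variance:.2f}, standard deviation = {sd:.2f}")
```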
The sum of squares, variance, and standard deviation represent the same thing:
- The ‘fit’ of the mean to the data
- The variability in the data
- How well the mean represents the observed data
- Error
Central Limit Theorem
As the sample size grows, the distribution of sample means becomes approximately normal, regardless of the shape of the population distribution.
Central Limit Theorem
How can we measure the accuracy of this average?
- We can use the standard deviation of the sample means.
- In principle, we could collect a very large number of samples and calculate the standard deviation of their means.
- Because this is tedious and almost impossible in practice, statisticians have found an approximation.
-> approximation = standard error (s / √N)
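A small simulation can make the idea concrete. The sketch below assumes a hypothetical, deliberately skewed population (exponential), draws many samples from it, and compares the standard deviation of the sample means with the usual shortcut of dividing the standard deviation by the square root of the sample size. The population, sample size, and number of samples are all illustrative choices.

```python
import random
import statistics

random.seed(42)

# Hypothetical skewed population (exponential with mean 1).
population = [random.expovariate(1.0) for _ in range(20_000)]
sigma = statistics.pstdev(population)

n = 50             # size of each sample
n_samples = 2000   # number of repeated samples (the "tedious" part)

# The tedious route: draw many samples and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(n_samples)
]
empirical_se = statistics.stdev(sample_means)

# The shortcut: standard deviation divided by the square root of n.
approx_se = sigma / n ** 0.5

# In practice only one sample is available, so s / sqrt(n) is used instead.
one_sample = random.sample(population, n)
estimated_se = statistics.stdev(one_sample) / n ** 0.5

print(f"SD of sample means:     {empirical_se:.4f}")
print(f"sigma / sqrt(n):        {approx_se:.4f}")
print(f"s / sqrt(n), 1 sample:  {estimated_se:.4f}")
```

The first two values should agree closely; the single-sample estimate varies more from run to run because it relies on one sample's standard deviation.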
Test Statistics
- A statistic for which the frequency of particular values is known.
- Observed values can be used to test hypotheses.
Type I error
- occurs when we believe that there is a genuine effect in our population when, in fact, there isn’t
- The probability is the α-level (usually .05)
Type II error
- occurs when we believe that there is no effect in the population when, in reality, there is.
- Or, put differently: when we use tests, do not find an effect, but there really is one.
- The probability is the β-level (often .2)
Examples of Type I and Type II errors
Type I: a pregnancy test says you are pregnant, but you are not
Type II: a pregnancy test says you are not pregnant, but you are
Type I: a COVID test says you have it, but you don’t
Type II: a COVID test says you don’t have it, but you do
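The two error rates can also be estimated by simulation. The sketch below is an illustration under simplifying assumptions: a two-sided z-test with a known standard deviation of 1, a sample size of 25, and a hypothetical true effect of 0.56 standard deviations (picked so that β comes out near .2). The helper `rejects_null` is made up for this example.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)

ALPHA = 0.05
N = 25          # sample size per simulated study (illustrative)
TRIALS = 4000   # number of simulated studies
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96

def rejects_null(true_mean):
    """Simulate one study and run a two-sided z-test of H0: mean = 0 (sigma = 1 known)."""
    sample = [random.gauss(true_mean, 1.0) for _ in range(N)]
    z = fmean(sample) * N ** 0.5
    return abs(z) > z_crit

# No effect in the population: every rejection is a Type I error.
type1_rate = sum(rejects_null(0.0) for _ in range(TRIALS)) / TRIALS

# A real effect of 0.56 SD: every failure to reject is a Type II error.
type2_rate = sum(not rejects_null(0.56) for _ in range(TRIALS)) / TRIALS

print(f"Type I error rate  (target alpha = {ALPHA}): {type1_rate:.3f}")
print(f"Type II error rate (beta):                  {type2_rate:.3f}")
```

With these settings the Type I rate should land near .05 and the Type II rate near .2, up to simulation noise.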
What Does Statistical Significance Tell Us?
The importance of an effect?
No. Significance depends on sample size as well as on the size of the effect, so a tiny effect can be significant in a large sample. When doing tests, always interpret the effect size as well.
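To see how sample size drives significance, the sketch below holds a small effect fixed and only changes N. It assumes a one-sample, two-sided z-test with the effect expressed in standard deviation units; `two_sided_p` is a hypothetical helper written for this illustration.

```python
from statistics import NormalDist

def two_sided_p(effect_size, n):
    """Two-sided p-value of a z-test for an observed mean that lies
    `effect_size` standard deviations from the null value, with sample size n."""
    z = effect_size * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

d = 0.1  # the same small effect in every case (illustrative)

for n in (25, 400, 2500):
    p = two_sided_p(d, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"N = {n:5d}: p = {p:.4f} ({verdict})")
```

The effect is identical in all three cases; only the sample size decides whether it crosses the .05 threshold.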
What Does Statistical Significance Tell Us?
That the null hypothesis is false?
No. Strictly, the null hypothesis (an effect of exactly zero) is always false, so rejecting it tells us little on its own.
What Does Statistical Significance Tell Us?
That the null hypothesis is true?
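No. The null hypothesis is never exactly true; a non-significant result only means the effect was too small to detect with this sample.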
Assessing Normality
We don’t have access to the sampling distribution, so we usually test the observed data instead.
Shapiro-Wilk Test
- Tests if data differ from a normal distribution
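- Significant (p < .05) = the data differ from a normal distribution
- Non-significant = no evidence of non-normality (which is not proof that the data are normal)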
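A minimal sketch of running the test in Python, assuming NumPy and SciPy are installed. `scipy.stats.shapiro` returns the W statistic and a p-value. The two data sets are synthetic and generated only for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data for illustration only.
normal_scores = rng.normal(loc=50, scale=10, size=100)   # drawn from a normal distribution
skewed_scores = rng.exponential(scale=10, size=100)      # strongly right-skewed

w_normal, p_normal = stats.shapiro(normal_scores)
w_skewed, p_skewed = stats.shapiro(skewed_scores)

for label, w, p in [("normal", w_normal, p_normal), ("skewed", w_skewed, p_skewed)]:
    verdict = "differs from normal" if p < 0.05 else "no evidence of non-normality"
    print(f"{label}: W = {w:.3f}, p = {p:.4f} -> {verdict}")
```

The skewed sample should be flagged as non-normal. The normally generated sample will usually, though not always, come out non-significant, since about 5% of truly normal samples are rejected at the .05 level.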
Shapiro-Wilk test for exam and numeracy for whole sample