lecture 6 - statistical reasoning, normal distributions and z-scores Flashcards
basic premise of psychology
Human behaviour is variable but still predictable. Experimental methodology and statistics predict behaviour
examples of predictable behaviour
Individual people are less likely to help a stranger in need if there are other people around than if they’re the only other person around (the Bystander Effect, Darley & Latané, 1968).
Most Norwegian children (75%) have started walking by the time they’re 14 months old (Størvold, Aarethun, Bratberg, 2013)
Human behaviour including ours is a lot more predictable than most of us are comfortable admitting….
what does experimental method get you?
It allows you to figure out what causes what.
Understanding and explanation using causal mechanisms.
Helps to get things you want:
Facilitates prediction and ….
2nd basic premise of psychology
Human behaviour is variable but still partly controllable.
Experimental methodology and statistics control behaviour.
examples of controlling behaviour
perception - the Ponzo illusion
social interaction
memory - primacy and recency effects
reinforcement learning - reward and punishment
Psychology is less about knowing facts and more about learning to figure things out!
- Observing/ measuring behaviour
- Evaluating behaviour
- Predicting and controlling behaviour
- Evaluating behaviour
why do we need statistics?
we observe (measure) some kind of behaviour - it’s variable
is that variability predictable/controllable in some way that’s useful? or is it just “noise”?
the main question in statistics
what’s the probability that this data sample came from that population?
the main question in psychology
I’ve got this data sample, now what do I “know” about that population?
The main statistics and the main psychology question are not the same question
However, why they’re not the same but are nonetheless related is something you’ll gradually come to understand better….
The key thing for now is to know the goal: using the answers to statistics questions to answer psychological questions.
an example if the population is known - Howell
Some psychological test for the healthy population might be normally distributed with mean = 50 and SD = 10, and the sample is an individual with a score of 70. Is 70 sufficiently extreme to reject the hypothesis that it came from this population?
For example (Howell), there is a neurological finger-tapping test on which people who have had brain injuries can’t tap as fast as a healthy individual. The individual typically doesn’t know their tapping is impaired, and this can be used to help diagnose brain injury. So how does this relate to the example in the figure? The line of reasoning is that if this individual’s score is sufficiently extreme to be unlikely for individuals in the healthy population, we might reject the hypothesis that they are from this population and use that to argue, for example, that they are likely to have had a brain injury. Note: what the psychology question is here, in terms of a sample and a population, is quite subtle. I’ve rejected the null hypothesis that this individual is from the healthy population and used that to draw the psychological conclusion that this individual has a brain injury. So the data sample is the collected measurements from the individual, and I am now using that sample to draw a conclusion about “the population”, which is still the individual.
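The reasoning for the score of 70 can be sketched in a few lines of Python. This is only an illustration, assuming the healthy population really is normal with mean 50 and SD 10, and assuming a one-tailed question (how likely is a score this high or higher?). The function name `upper_tail_probability` is made up for this sketch.

```python
import math

def upper_tail_probability(score, mean, sd):
    """Return (z, P(X >= score)) for a normally distributed population."""
    z = (score - mean) / sd
    # Area above z under the standard normal curve,
    # computed with the complementary error function.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Values from the example: healthy population mean = 50, SD = 10, individual = 70.
z, p = upper_tail_probability(70, mean=50, sd=10)
print(f"z = {z:.2f}, P(score >= 70) = {p:.4f}")

# Traditional cut-off from these notes: p < 0.05 (one-tailed here, by assumption).
decision = "reject" if p < 0.05 else "do not reject"
print(f"{decision} the hypothesis that this score came from the healthy population")
```

A score of 70 sits 2 SDs above the mean, so only a little over 2% of the healthy population would be expected to score that high or higher.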
mostly the statistics question takes the form
What’s the probability of this data sample if there’s nothing other than just noisy (“unsystematic”) variability?
If that probability is small, traditionally p < 0.05, then we reject the null hypothesis.
The elaborated psychology question is (roughly) :
Given this (sample) data and statistical analysis of it (e.g. which rejected the null hypothesis), what can I reasonably conclude about the population I’m interested in?
The statistical reasoning process is mostly based on distributions
for populations, samples and sample statistics -
Population distribution
- Sometimes the entire population is known exactly, e.g. stats exam scores from this class for students in the second year.
- Sometimes the distribution of the entire population is known quite precisely even though the attributes of every single individual aren’t known, e.g. the distribution of heights for everyone in Wales even though I might not know your height.
But in research, we don’t commonly know the population distribution we’re interested in very precisely.
sample distribution
In contrast, we usually do know the sample distribution, e.g. I measure the heights of 20 people walking by and construct a histogram.
sample statistic distribution
Calculate some attribute of a sample, e.g. its mean. Do it for another sample, etc., and that attribute also will have a distribution. For example, measure 20 heights, calculate a mean. Repeat.
What is a distribution? It’s a summary of the prevalence of an attribute across some set, e.g. the distribution of IQs above 145 in the general population is…. ?
the central limit theorem
The distribution of the sample means will be approximately normal if the sample size is large enough, regardless of the shape of the population distribution! This is one reason normal distributions are so important for statistics, but it’s also related to why so many things are normally distributed in reality.
even more generally in statistics we regularly know the shape of the distribution for some attribute of a sample and regularly use this to draw some conclusion about a particular sample.
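A small simulation can make the central limit theorem concrete. The sketch below is an illustration only. The exponential population, the sample size of 50, the number of samples and the seed are all assumptions chosen for the demo. It compares how much of the data falls within 1 SD of the mean for the raw (skewed) scores and for the sample means; a normal distribution has about 68% there.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(n, n_samples=5000):
    """Means of n_samples samples of size n from a skewed (exponential) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]

def share_within_one_sd(values):
    """Proportion of values lying within 1 SD of their own mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)

raw_scores = [random.expovariate(1.0) for _ in range(5000)]  # skewed population scores
means = sample_means(n=50)                                   # distribution of sample means

print(f"raw scores within 1 SD:   {share_within_one_sd(raw_scores):.3f}")
print(f"sample means within 1 SD: {share_within_one_sd(means):.3f}")
```

The raw scores should come out well above 68% (the population is strongly skewed), while the sample means should land close to the 68% expected of a normal distribution.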
normal distributions shape
Normal distributions all have the same basic bell shape but can differ widely in their central tendency (their mean) and their variability (their standard deviation).
areas and standard deviation
for all normal distributions
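within the 1st SD on either side of the mean - approx 34% of data lives there (about 68% across both sides)
between the 1st and 2nd SD - approx 14% of data lives there
between the 2nd and 3rd SD - approx 2% of data lives there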
so all the different normal distributions can be reduced to one standardized normal distribution
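The approximate areas above can be checked against the standard normal curve. This is a minimal sketch using only Python's `math` module; the helper names `phi` and `band_area` are made up for illustration.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def band_area(lower_z, upper_z):
    """Area under the standard normal curve between two z values."""
    return phi(upper_z) - phi(lower_z)

# Bands on one side of the mean: 0-1 SD, 1-2 SD, 2-3 SD.
for lower_z, upper_z in [(0, 1), (1, 2), (2, 3)]:
    print(f"{lower_z} to {upper_z} SD above the mean: {band_area(lower_z, upper_z):.1%}")
```

Because the curve is symmetric, the same areas apply below the mean, and because they depend only on distance in SD units, they hold for every normal distribution.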
z-scores and the standardised normal distribution
z-score - number of standard deviations (σ) a score (x) is from mean (μ)
e.g. for a population of IQ scores with μ = 100, σ = 15
for a standard normal distribution (a z distribution)
mean = 0
SD = 1
convert score to z-score
e.g. what percentage of people have an IQ score greater than 130?
convert 130 to a z-score -
IQ scores have μ = 100, σ = 15
z = (x − μ) / σ = (130 − 100) / 15 = 2
so approx 2% (more precisely about 2.3%) of the population have IQ scores greater than 130
number of standard deviations (σ)
score (x)
mean (μ)
converting scores into percentages using z-scores
convert your score into a z-score
the z table gives the % of scores higher than that particular score
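The table lookup can be mimicked with Python's standard library. The sketch below is illustrative rather than a replacement for the printed z table: `z_score` and `percent_above` are hypothetical helper names, and the half-SD steps in the mini table are an arbitrary choice.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)  # the z distribution: mean 0, SD 1

def z_score(x, mu, sigma):
    """Number of standard deviations that score x lies from the mean."""
    return (x - mu) / sigma

def percent_above(z):
    """Percentage of scores higher than z, as a z table would list it."""
    return 100 * (1 - STANDARD_NORMAL.cdf(z))

# A miniature z table in steps of half a standard deviation.
for half_steps in range(0, 7):
    z = half_steps * 0.5
    print(f"z = {z:.1f}: {percent_above(z):5.2f}% of scores are higher")

# The IQ example from these notes: mu = 100, sigma = 15, score = 130.
iq_z = z_score(130, mu=100, sigma=15)
print(f"IQ 130 -> z = {iq_z:.1f}, {percent_above(iq_z):.1f}% score higher")
```

For the IQ example this gives z = 2 and roughly 2.3% of scores above 130, in line with the "approx 2%" rule of thumb.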
why does the z-distribution matter?
- What if you could assume the sampling distribution of the mean was normal for your null hypothesis? (Can you assume that….?)
- What if you could use the mean and standard deviation of your sample to produce a z-score? (Could you….?)
- What could you do with that score? (Could you reject the null hypothesis….?)
Is rejecting the null hypothesis useful?
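One way to see where these questions lead is a rough sketch of a z-test on a sample mean. Everything specific here is an assumption for illustration: the scores are invented, the null-hypothesis mean of 50 is borrowed from the earlier healthy-population example, the test is two-tailed, and the sample SD stands in for the population SD (with a small sample like this a t-test would normally be preferred).

```python
import math
from statistics import NormalDist, fmean, stdev

def z_test(sample, null_mean):
    """Return (z, two-tailed p), treating the sampling distribution of the mean as normal."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # sample SD used as an estimate
    z = (fmean(sample) - null_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical test scores, made up for this example only.
scores = [52, 61, 48, 57, 63, 55, 49, 60, 58, 54, 62, 51, 59, 56, 53, 64]

z, p = z_test(scores, null_mean=50)
print(f"sample mean = {fmean(scores):.3f}, z = {z:.2f}, p = {p:.6f}")
print("reject the null hypothesis" if p < 0.05 else "do not reject the null hypothesis")
```

Whether rejecting the null hypothesis is useful still depends on the psychology question: the z-score only says how surprising the sample mean would be if nothing but noisy variability were at work.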
probability distribution
A probability distribution is just like a histogram except that the lumps and bumps have been smoothed out so that we see a nice smooth curve. However, like a frequency distribution, the area under this curve tells us something about the probability of a value occurring.
standard deviation
The standard deviation tells us about how well the mean represents the sample data. However, if we’re using the sample mean to estimate this parameter in the population, then we need to know how well it represents the value in the population, especially because samples from a population differ. If we were to take several samples from the same population, then each sample would have its own mean, and some of these sample means will be different. For each sample we calculate the average rating, known as the sample mean; the sample mean in the second sample will generally be different from that in the first.
sampling variation
Samples vary because they contain different members of the population.
sampling distribution
A sampling distribution is the frequency distribution of sample means (or whatever parameter you’re trying to estimate) from the same population. The sampling distribution of the mean tells us about the behaviour of samples from the population, and you’ll notice that it is centred at the same value as the mean of the population (i.e., 3). Therefore, if we took the average of all sample means we’d get the value of the population mean. We can use the sampling distribution to tell us how representative a sample is of the population.
SE
The standard deviation of sample means is known as the standard error of the mean (SE) or standard error for short. As such, it is a measure of how representative of the population a sample mean is likely to be. A large standard error (relative to the sample mean) means that there is a lot of variability between the means of different samples and so the sample mean we have might not be representative of the population mean. A small standard error indicates that most sample means are similar to the population mean (i.e., our sample mean is likely to accurately reflect the population mean).
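The standard error can be illustrated by simulation. The sketch below assumes an IQ-like normal population (mean 100, SD 15), samples of 25 people, and a fixed random seed; all of these are choices made for the demo. It also uses the standard result, not stated in these notes, that the standard error of the mean equals σ / √n.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD, N = 100, 15, 25  # assumed population and sample size

def draw_sample(n):
    """Draw n scores from the assumed normal population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]

# Take many samples and keep each sample's mean: the sampling distribution of the mean.
means = [statistics.fmean(draw_sample(N)) for _ in range(4000)]

empirical_se = statistics.stdev(means)     # SD of the sample means
theoretical_se = POP_SD / math.sqrt(N)     # sigma / sqrt(n)

# In practice only one sample is available, so the SE is estimated from it.
one_sample = draw_sample(N)
estimated_se = statistics.stdev(one_sample) / math.sqrt(N)

print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"SD of sample means:   {empirical_se:.2f}")
print(f"sigma / sqrt(n):      {theoretical_se:.2f}")
print(f"SE from one sample:   {estimated_se:.2f}")
```

The mean of the sample means should sit close to the population mean, and the SD of the sample means should sit close to σ / √n, which is what the standard error is summarising.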
the assumption of normality with categorical predictors
Because we can’t know for sure what the shape of the sampling distribution is, researchers tend to look at the scores on the outcome variable (or the residuals) when assessing normality. When you have a categorical predictor variable you wouldn’t expect the overall distribution of the outcome (or residuals) to be normal.