Data distributions, z scores and p-values Flashcards
What are distributions of data and how are data distributions commonly visualised?
- The distribution of a variable is the way its values are spread over its range
- Data distributions are commonly visualised using a histogram
What does the shape of the histogram tell us?
- Are some scores more common than others? (mode)
- Is there a range in which the data are more concentrated?
- What is the chance of randomly picking someone/something from the sample within a particular range? (probability)
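The three questions above can be answered from the bar counts of a histogram. The sketch below uses a small made-up sample (the data are hypothetical, for illustration only) to show the mode as the tallest bar and a range probability as the fraction of scores falling in that range.

```python
from collections import Counter

# Hypothetical sample of test scores (made up for illustration only)
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 9]

# Frequency of each score: these counts are the bar heights of a histogram
freq = Counter(scores)

# Mode: the most common score (the tallest bar)
mode_score, mode_count = freq.most_common(1)[0]

# Probability of randomly picking a score in the range 4..6 (inclusive)
in_range = sum(1 for s in scores if 4 <= s <= 6)
p_range = in_range / len(scores)

for value in sorted(freq):
    print(f"{value}: {'#' * freq[value]}")  # crude text histogram
print("mode:", mode_score)
print("P(4 <= score <= 6):", p_range)
```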
What shape is a normally distributed histogram and what does this show?
- Bell curve
- It has a peak in the middle and trails off either side
- Shows that most scores cluster around the average (e.g. height), with fewer and fewer people at lower and higher heights
When can we find it hard to see normality in a histogram?
If we don’t have much data (small sample size)
What are examples of non-normally distributed data?
- Positively skewed data has a tail to the right (e.g. reaction time)
- Negatively skewed data has a tail to the left
- Danger: the mean is pulled towards the tail! (the median is a good way to get around this)
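A quick way to see the mean being pulled by a tail is to compare mean and median on positively skewed data. The reaction times below are made up for illustration: most responses are fast and two very slow ones form a long right tail.

```python
import statistics

# Hypothetical reaction times in ms (made up): most are fast, a few very slow
# responses form a long tail to the right (positive skew)
reaction_times = [310, 320, 330, 335, 340, 345, 350, 900, 1200]

mean_rt = statistics.mean(reaction_times)
median_rt = statistics.median(reaction_times)

print("mean:", round(mean_rt, 1))   # dragged upwards by the two slow responses
print("median:", median_rt)         # stays with the bulk of the data
```

The median is the middle value once the data are sorted, so the size of the extreme values in the tail does not affect it.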
What are features of bimodal data?
- Two distinct populations
- Two peaks
- The mean will fall in the middle, between the peaks, and will not be representative of the common scores in the data
- Bimodal data usually indicates that something has gone wrong with your experiment and you may actually have two populations
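The point about the mean can be checked with a made-up bimodal sample (hypothetical data, two clusters around 2 and 9): the mean lands in the gap where there are no observations at all, while `statistics.multimode` reports both peaks.

```python
import statistics

# Hypothetical bimodal data (made up): one cluster around 2, another around 9
data = [1, 2, 2, 3, 8, 9, 9, 10]

mean_value = statistics.mean(data)
modes = statistics.multimode(data)   # every value tied for most common

# How far the mean is from the nearest value that was actually observed
gap = min(abs(x - mean_value) for x in data)

print("mean:", mean_value)
print("modes (peaks):", modes)
print("distance from mean to nearest observed value:", gap)
```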
What is the normal distribution specified by? (equation)
- Mean and standard deviation
- N(μ, σ) is commonly used to describe it
- μ = mean
- σ = standard deviation
- N = normal distribution
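For reference, the standard density function of the normal distribution shows why μ and σ are all that is needed: they are the only two parameters in it (μ sets where the peak sits, σ sets how wide the curve is). Note that many textbooks write N(μ, σ²), giving the variance rather than the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```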
Give features of normally distributed histograms
- The mean is the line down the centre of the curve (at the peak)
- The standard deviation determines the width (spread) of the curve
- All are bell shaped
- Symmetric about the centre
- Tails never reach zero (though they get very close)
- The area under the curve is always equal to 1
- The curve is very close to 0 by 3 standard deviations away from the mean (about 99.7% of the area lies within mean ± 3 standard deviations)
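Several of these features can be checked numerically with Python's built-in `statistics.NormalDist`. The sketch assumes an IQ-style example distribution N(100, 15); it computes the area within 1, 2 and 3 standard deviations of the mean, checks symmetry, and confirms the tail height is tiny but not zero.

```python
from statistics import NormalDist

# Example normal distribution: IQ-style scores, mean 100, SD 15
iq = NormalDist(mu=100, sigma=15)

# Area under the curve within k standard deviations either side of the mean
areas = {
    k: iq.cdf(iq.mean + k * iq.stdev) - iq.cdf(iq.mean - k * iq.stdev)
    for k in (1, 2, 3)
}
for k, area in areas.items():
    print(f"within +/-{k} SD: {area:.4f}")

# Symmetry: same height at equal distances either side of the mean
print(iq.pdf(85) == iq.pdf(115))

# Tails never quite reach zero: the height 5 SD above the mean is tiny but positive
print(iq.pdf(100 + 5 * 15) > 0)
```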
Why do we need to understand probability?
- A major goal for us in using statistics is to be able to use our data to test experimental hypotheses
- We can never be certain about the validity of our hypotheses, so probabilities are fundamentally important
- Probability underlies null hypothesis testing
What is probability?
- We can think of a probability as a measure of how likely it is that an uncertain event will occur
- Probabilities can be expressed as a percentage or as a proportion (from 0 to 1)
- 0% (0) = impossible
- 100% (1) = certain
What is the equation to work out the probability of an event occurring, P(event)?
P(event) = number of possible outcomes consistent with the event / total number of possible outcomes (when all outcomes are equally likely)
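The counting rule can be written as a tiny function. `probability` is a hypothetical helper name chosen for illustration; it assumes every listed outcome is equally likely, and uses `fractions.Fraction` so the answer stays an exact fraction.

```python
from fractions import Fraction

def probability(outcomes, is_event):
    """Hypothetical helper: P(event) = outcomes consistent with the event / all outcomes.
    Assumes every outcome in `outcomes` is equally likely."""
    favourable = [o for o in outcomes if is_event(o)]
    return Fraction(len(favourable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]
print(probability(die, lambda n: n % 2 == 0))  # P(even): 3 of 6 outcomes
print(probability(die, lambda n: n == 6))      # P(rolling a 6): 1 of 6 outcomes
```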
What is conditional probability and give an example
- Probability of an event given that something else is known or assumed, i.e. when we are given (or assume) some additional information
- E.g. I close my eyes and roll one die. An honest observer tells me I have rolled a number <4. Before I open my eyes, what’s the probability that I have rolled an even number?
Number of possible outcomes = 3 (1, 2 or 3)
Only one of these (2) is even
So P(event) = 1/3
This is written P(even | <4), where the vertical line means "given"
The calculation works the same way whether the extra information is known or only assumed
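The die example can be reproduced by filtering the outcomes: the given information shrinks the set of possible outcomes, and the event is then counted only within that smaller set. A minimal sketch, assuming a fair six-sided die:

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]

# The given information ("I rolled a number < 4") shrinks the set of possible outcomes
given = [n for n in die if n < 4]

# Among those remaining outcomes, keep the ones consistent with the event (even)
even_given = [n for n in given if n % 2 == 0]

p_even_given_lt4 = Fraction(len(even_given), len(given))
print("P(even | <4) =", p_even_given_lt4)

# For comparison: the probability of an even number without the extra information
p_even = Fraction(sum(1 for n in die if n % 2 == 0), len(die))
print("P(even) =", p_even)
```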
Give an example of working out the area under a data distribution for a uniform distribution (where all outcomes are equally likely)
- You have a friend who always arrives somewhere in the range from 5 minutes before to 5 minutes after the agreed meeting time. They are never earlier than 5 mins before and never later than 5 mins after, and the time they arrive is completely random within that range
- You keep track of arrival times for 100 meetings in a year
- Q. Based on your sample of data, what is the probability that your friend is at least two minutes late?
- A. Your friend was at least 2 minutes late in 29 of the 100 meetings, so the probability is 29/100 = 0.29
- It is useful to express the histogram as proportions:
- Divide each bar by 100, so the heights of all the bars add up to 1
- Because all the bars add up to 1, adding up the heights of the bars in our range of interest gives the proportion (probability) directly
- Remember your friend is equally likely to arrive at any time in the range from 5 minutes before to 5 minutes after the agreed meeting time
- So the histogram should look flat (uniform); any unevenness is just due to the limited sample size
- The area of the distribution over a range tells us the probability: for the ideal uniform curve (height 1/10 per minute), P(at least 2 minutes late) = 3 × 1/10 = 0.3
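The friend example can be simulated to show both ideas at once: bar proportions that sum to 1, and a sample answer that moves closer to the area under the uniform curve as the number of meetings grows. This is a hypothetical simulation (arrival times drawn with `random.uniform`, a fixed seed, ten 1-minute bars), not the original data.

```python
import random

LOW, HIGH = -5.0, 5.0   # arrival time in minutes relative to the agreed meeting time
THRESHOLD = 2.0         # "at least two minutes late"

def simulate_arrivals(n_meetings, seed=0):
    """Hypothetical simulation: arrival times drawn uniformly between LOW and HIGH."""
    rng = random.Random(seed)
    return [rng.uniform(LOW, HIGH) for _ in range(n_meetings)]

def bar_proportions(arrivals):
    """Ten 1-minute bars; each count is divided by the total so the bars sum to 1."""
    counts = [0] * 10
    for t in arrivals:
        counts[min(int(t - LOW), 9)] += 1   # an arrival exactly at HIGH goes in the last bar
    return [c / len(arrivals) for c in counts]

# Area under the ideal uniform curve: height 1 / (HIGH - LOW) = 0.1 per minute,
# over the 3 minutes from THRESHOLD to HIGH
p_theory = (HIGH - THRESHOLD) / (HIGH - LOW)

for n in (100, 100_000):
    bars = bar_proportions(simulate_arrivals(n))
    p_sample = sum(bars[7:])   # bars for 2-3, 3-4 and 4-5 minutes late
    print(f"n={n}: bars sum to {sum(bars):.2f}, P(>= 2 min late) = {p_sample:.3f}")

print("theoretical:", p_theory)
```

With only 100 meetings the bars are uneven and the estimate can be off by a few percentage points; with many more meetings each bar approaches 0.1 and the estimate approaches 0.3.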
What is a z score (including equation)?
- The z score is obtained by subtracting the population mean (μ) from x and then dividing by the standard deviation (σ): z = (x - μ) / σ
- x - μ shows how far away your score is from the mean
- Dividing by σ shows how many standard deviations this distance is
- E.g. for someone with an IQ of 120 (population mean 100, SD 15): z = (120 - 100) / 15 = 1.33. This shows that their score is 1.33 standard deviations above the mean
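The z score formula translates directly into code. `z_score` is a hypothetical helper name; the IQ figures (mean 100, SD 15) are the ones used in the example above.

```python
def z_score(x, mu, sigma):
    """Hypothetical helper: how many SDs x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma

print(round(z_score(120, 100, 15), 2))  # IQ of 120: above the mean
print(round(z_score(85, 100, 15), 2))   # IQ of 85: exactly one SD below the mean
```

A positive z means the score is above the mean and a negative z means it is below.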
Give features of z-transformation in general
- Converts any score from normally distributed data into a z score
- If you find z for all scores, you get a normal distribution with mean 0 and standard deviation 1
- N(0,1)
- This means there is one standard normal distribution that all normally distributed data can be converted to
- Therefore, by looking up z values in a standard normal table, we can work out the area (which tells us the probability)
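In place of a printed z table, `statistics.NormalDist().cdf` gives the area to the left of a z value under N(0, 1). The sketch reuses the IQ example, checks that the area is the same whether it is read from N(0, 1) or from the original N(100, 15), and z-transforms a small made-up set of scores (treated as the whole population) to show the mean-0, SD-1 result.

```python
from statistics import NormalDist, mean, pstdev

standard_normal = NormalDist(mu=0, sigma=1)   # N(0, 1)
iq = NormalDist(mu=100, sigma=15)             # example population N(100, 15)

z = (120 - 100) / 15

# Area to the left of z under N(0, 1) = proportion of people with IQ below 120
area_below = standard_normal.cdf(z)
# Upper-tail area = probability of randomly picking someone with IQ above 120
area_above = 1 - area_below

print(f"z = {z:.2f}, area below = {area_below:.4f}, area above = {area_above:.4f}")

# Same area straight from N(100, 15): z-transforming does not change the area
print(abs(iq.cdf(120) - area_below) < 1e-12)

# Hypothetical scores treated as a whole population: their z scores have mean 0, SD 1
scores = [85, 95, 100, 105, 115]
zs = [(s - mean(scores)) / pstdev(scores) for s in scores]
print(mean(zs), pstdev(zs))
```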