Data distributions, z scores and p-values Flashcards

1
Q

What are distributions of data and how are data distributions commonly visualised?

A
  • Distributions of data are the manner in which the data for a particular variable are spread over its range
  • Data distributions are commonly visualised using a histogram
2
Q

What does the shape of the histogram tell us?

A
  • Whether some scores are more common than others in our data set (the mode)
  • Whether there is a range in which the data are more concentrated
  • The chance of randomly picking someone/something from the sample in a particular range (probability)
3
Q

What shape is a normally distributed histogram and what does this show?

A
  • Bell curve
  • It has a peak in the middle and trails off either side
  • Shows that most scores sit near the average (e.g. average height), with progressively fewer people at lower and higher heights
4
Q

When can we find it hard to see normality in a histogram?

A

If we don’t have much data (a small sample size)

5
Q

What are examples of Non-normally distributed data?

A
  • Positively skewed data has a tail to the right (e.g. reaction time)
  • Negatively skewed data has a tail to the left
  • Danger: the mean is distorted by the tails! (the median is a good way to get around that)
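
The effect of a long tail on the mean can be illustrated with a small sketch. The reaction times below are made-up values chosen only to show a right tail; they are not data from the course.

```python
from statistics import mean, median

# Hypothetical reaction times in ms: most are fast, a few are very slow,
# which produces a tail to the right (positive skew).
reaction_times = [310, 320, 325, 330, 340, 345, 350, 360, 900, 1500]

skewed_mean = mean(reaction_times)      # pulled upwards by the two slow trials
skewed_median = median(reaction_times)  # stays with the bulk of the scores

print(f"mean = {skewed_mean}, median = {skewed_median}")
```

With these values the mean lands well above almost every individual score, while the median stays among the typical ones.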
6
Q

what are features of bimodal data?

A
  • Two distinct populations
  • Two peaks
  • The mean will fall in the middle, between the two peaks, and will not be representative of the common scores in the data
  • Bimodal data usually indicates that something has gone wrong with your experiment and that you may actually have sampled two populations
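
A minimal sketch of why the mean misleads for bimodal data, using invented scores with one cluster around 3 and another around 9:

```python
from statistics import mean, multimode

# Hypothetical scores drawn from two distinct groups (peaks at 3 and 9).
bimodal_scores = [2, 3, 3, 3, 4, 8, 9, 9, 9, 10]

bimodal_mean = mean(bimodal_scores)    # falls between the two peaks
peaks = multimode(bimodal_scores)      # the most common scores

print(f"mean = {bimodal_mean}, modes = {peaks}")
print("mean is an observed score:", bimodal_mean in bimodal_scores)
```

Here the mean sits at a value that nobody in the sample actually scored, which is the sense in which it is not representative.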
7
Q

what is the normal distribution specified by? (equation)

A
  • Mean and standard deviation
  • N(μ, σ) is commonly used to describe it
  • μ = mean
  • σ = standard deviation
  • N = normal (distribution)
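
For reference, the curve itself can be written in terms of these two parameters. This is the standard textbook form of the normal density, not something stated on the card; note that some texts give the second parameter of N as the variance σ² rather than σ.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```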
8
Q

Give features of normally distributed histograms

A
  • Mean is the line down the centre of the curve
  • Standard deviation is related to the width of the curve
  • All are bell shaped
  • symmetric about the centre
  • The tails never reach zero (though they get very close)
  • The area under the curve is always equal to 1
  • The curve is very close to 0 by 3 standard deviations either side of the mean (i.e. at mean ± 3 standard deviations)
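
The claim about 3 standard deviations can be checked with a short sketch. It uses Python's standard-library `statistics.NormalDist`; the helper name `area_within` is just an illustrative choice.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```

Roughly 68%, 95% and 99.7% of the area lies within 1, 2 and 3 standard deviations, so very little is left in the tails beyond 3.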
9
Q

Why do we need to test probability?

A
  • A major goal for us in using statistics is to be able to use our data to test experimental hypotheses
  • We can never be certain about the validity of our hypotheses so probabilities will be fundamentally important
  • Probability underlies null hypothesis testing
10
Q

What is probability?

A
  • We can think of a probability as a measure of how likely it is that an uncertain event will occur
  • Probabilities can be expressed as a percentage or proportion
  • 0% = impossible
  • 100% = certain
11
Q

What is the equation to work out the probability of an event occurring, P(event)?

A

P(event) = number of possible outcomes consistent with the event / total number of possible outcomes (assuming all outcomes are equally likely)
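
As a small sketch of this counting rule, the function below (a hypothetical helper, named `probability` here only for illustration) counts the outcomes consistent with an event and divides by the total. It assumes every outcome is equally likely.

```python
from fractions import Fraction

def probability(outcomes, is_event):
    """P(event) = outcomes consistent with the event / all possible outcomes."""
    consistent = [o for o in outcomes if is_event(o)]
    return Fraction(len(consistent), len(outcomes))

die = [1, 2, 3, 4, 5, 6]

# Example event: rolling an even number on one fair die.
p_even = probability(die, lambda n: n % 2 == 0)
print(p_even)  # 3 of the 6 outcomes are even
```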

12
Q

What is conditional probability and give an example

A
  • Probability of an event given that something else is known/assumed, i.e. when given/assuming some other additional information
  • E.g. I close my eyes and roll one die. An honest observer tells me I have rolled a number <4. Before I open my eyes, what’s the probability that I have rolled an even number?
     No. of possible outcomes = 3 (1, 2 or 3)
     Only 1 of these is even (2)
     So P(event) = 1/3
     Written P(even | <4), where the vertical line means ‘given’
     Assuming something is treated the same way as being told it: the calculation is the same
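
The die example can be worked through by enumeration. This is only a sketch of the reasoning on the card: first restrict the outcomes to those consistent with what we are given, then count within that restricted set.

```python
from fractions import Fraction

die = range(1, 7)

# Step 1: keep only the outcomes consistent with the given information (< 4).
given_lt4 = [n for n in die if n < 4]

# Step 2: within those, count the outcomes consistent with the event (even).
even_given_lt4 = [n for n in given_lt4 if n % 2 == 0]

p_even_given_lt4 = Fraction(len(even_given_lt4), len(given_lt4))
print(p_even_given_lt4)
```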
13
Q

Give an example of working out area under data distributions for uniform distribution (where outcomes are equally likely)

A
  • You have a friend who always arrives to meet you somewhere in the range from 5 minutes before to 5 minutes after the agreed meeting time. They are never earlier than 5 mins before and never later than 5 mins after, and the time they arrive is completely random within that range
  • You keep track of arrival times for 100 meetings in a year
  • Q. Based on your sample of data, what is the probability that your friend is at least two minutes late?
  • A. Your friend was at least 2 minutes late 29 times out of 100 meetings, so the probability is 29/100
  • Useful to express as a proportion:
  • Divide each bar's count by 100, so that the total of all bars is 1
  • If all the bars sum to 1, then the heights of the bars in our range of interest give us the proportion directly
  • Remember your friend is equally likely to arrive at any time in the range from 5 minutes before to 5 minutes after the agreed meeting time
  • So the histogram should look uniform (it was just messed up by the small sample size)
  • The area under the distribution within a range tells us the probability of falling in that range
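
For the idealised uniform distribution, the area is just a rectangle, so the probability can be computed directly. The sketch below assumes arrival times are measured in minutes relative to the agreed time (negative = early); the function name is hypothetical.

```python
def uniform_probability(low, high, a, b):
    """Area under a uniform density on [low, high] between a and b."""
    height = 1 / (high - low)              # total area must equal 1
    width = min(b, high) - max(a, low)     # overlap of [a, b] with the range
    return max(width, 0) * height

# Friend arrives uniformly between -5 and +5 minutes.
# "At least two minutes late" is the strip from +2 to +5.
p_late = uniform_probability(-5, 5, 2, 5)
print(round(p_late, 2))
```

The rectangle is 3 minutes wide and 1/10 high, giving an area of 0.3, which the sample proportion of 29/100 approximates.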
14
Q

What is a z score (including equation)?

A
  • The z score is obtained by subtracting the population mean (μ) from x and then dividing by the standard deviation (σ): z = (x − μ)/σ
  • x − μ shows how far away your score is from the mean
  • Dividing by σ expresses this distance in units of the standard deviation
  • E.g. for someone’s IQ: (120 − 100)/15 = 1.33. This shows us that the score is 1.33 standard deviations higher than the mean
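
The IQ example can be reproduced in a couple of lines. The function name `z_score` is an illustrative choice; the numbers (score 120, mean 100, standard deviation 15) are the ones used on the card.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

iq_z = z_score(120, 100, 15)
print(round(iq_z, 2))
```

Note the brackets around the subtraction: without them, 120 - 100/15 would divide first and give a different answer.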
15
Q

Give features of z-transformation in general

A
  • Will take any normally distributed data and convert it to z scores
  • If you find z for all scores you will get a normal distribution with mean 0 and standard deviation 1
  • N(0,1)
  • This means there is one standard normal distribution that all normally distributed data will follow once transformed
  • Therefore, by matching a z score with the values in a table, we can work out the area (which tells us the probability)
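
A quick sketch of the transformation applied to a whole data set. The scores are invented, and the population standard deviation (`pstdev`) is used so that the transformed values come out with a standard deviation of exactly 1.

```python
from statistics import mean, pstdev

# Hypothetical raw scores.
raw_scores = [70, 85, 100, 115, 130]

mu = mean(raw_scores)
sigma = pstdev(raw_scores)  # population standard deviation

# z-transform every score.
z_scores = [(x - mu) / sigma for x in raw_scores]

mean_z = mean(z_scores)
sd_z = pstdev(z_scores)
print(round(mean_z, 6), round(sd_z, 6))
```

The transformation only shifts and rescales the data; it does not change the shape, so the result is normal only if the original data were.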
16
Q

What are the three columns in the table of the standard normal distribution used to work out the area?

A
  • z score
  • proportion below score
  • proportion above score
  • The table doesn’t list negative z values, but by symmetry these mirror the positive values: e.g. the proportion to the left of -2 is the same as the proportion to the right of +2
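
The three columns, and the symmetry that makes negative rows unnecessary, can be sketched with the standard library's `statistics.NormalDist` standing in for the printed table. The helper name `table_row` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def table_row(z):
    """Return (z score, proportion below, proportion above) for one table row."""
    below = standard_normal.cdf(z)
    return z, below, 1 - below

# Symmetry: area to the left of -2 equals area to the right of +2.
below_neg2 = standard_normal.cdf(-2)
_, _, above_pos2 = table_row(2)
print(round(below_neg2, 4), round(above_pos2, 4))
```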
17
Q

So how do you work out the probability of normally distributed data?

A
  • Work out the z score
  • Find the row for this z score in the table
  • Read off the proportion below or above the score (whichever the question asks for): this is your probability
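
These steps can be sketched end to end, again using `statistics.NormalDist` in place of the printed table. The example reuses the IQ figures from the z-score card (mean 100, standard deviation 15, score 120); treating IQ as exactly normal is an assumption.

```python
from statistics import NormalDist

# Step 1: work out the z score.
x, mu, sigma = 120, 100, 15
z = (x - mu) / sigma

# Step 2: look up that z in the standard normal "table".
standard_normal = NormalDist()
proportion_below = standard_normal.cdf(z)

# Step 3: read off whichever proportion the question asks for.
proportion_above = 1 - proportion_below
print(f"P(IQ < 120) = {proportion_below:.3f}, P(IQ > 120) = {proportion_above:.3f}")
```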
18
Q

How do you work out a range in the middle?

A
  • Find z-scores for the two extreme values
  • From the table, look up the proportion below the lower value and the proportion above the upper value
  • Subtract both of these from 1
  • You will be left with the area in the middle, which is the probability (p-value) of a score falling in that range
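
A sketch of the middle-range calculation, following the steps above. The range (IQ between 85 and 115, with mean 100 and standard deviation 15) is an illustrative choice, and `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

mu, sigma = 100, 15
lower, upper = 85, 115

# z-scores for the two extreme values.
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma

standard_normal = NormalDist()
tail_below = standard_normal.cdf(z_lower)      # proportion below the lower value
tail_above = 1 - standard_normal.cdf(z_upper)  # proportion above the upper value

# Whatever is not in either tail lies in the middle.
middle = 1 - tail_below - tail_above
print(round(middle, 4))
```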