Stats only Flashcards
What is important to consider when looking at sample size?
- Size matters
- Sampling error can result if your sample is not large enough
- Trade off between size and time/cost
Factors in deciding sample size?
o Design
o Response rate
o Heterogeneity of population
What is a population parameter?
- a quantity that describes some characteristic of a population with respect to a specific variable
- E.g., population mean, population range etc.
- Not usually possible to calculate
- Might be given to you if available
What is a sample statistic?
- A quantity that describes some characteristic of a sample with respect to a specific variable
- E.g., sample mean, sample range etc.
- We can always calculate these from a sample
- Sample statistics provide an estimate of population parameters
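A minimal Python sketch of a sample statistic estimating a population parameter. The population here is simulated and entirely made up (mean 100, SD 15, 10,000 scores); in practice the whole population is rarely available.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 made-up scores
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # population parameter

# A random sample of 50 scores drawn from that population
sample = random.sample(population, 50)
sample_mean = statistics.fmean(sample)          # sample statistic

print("population mean:", round(population_mean, 2))
print("sample mean:    ", round(sample_mean, 2))
```

The two values should be close but not identical; the gap between them is the sampling error.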
Why is it important to summarise data?
- Data can be very complex and therefore it is useful to summarise it
- Allows for interpretation
What are measures of central tendency?
They provide an indication of a “typical” score in the data set
What is the mean?
o Provides an estimate of the average score in the data set
o Is affected by extreme data points
What is the median?
o Is insensitive to extreme scores in the data set
o Doesn’t reflect the shape of the scores e.g., doesn’t care how far away the extreme scores are
What is the mode?
o Easy to calculate from a histogram and easy to understand – the most common value
o Data might have more than 1 mode or no mode at all
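A short Python sketch of the three measures of central tendency, using the standard-library `statistics` module. The scores are made up, with one extreme value (40) included to show its effect on the mean.

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 6, 40]  # made-up scores; 40 is an extreme value

mean = statistics.mean(scores)        # 8.5, pulled upward by the 40
median = statistics.median(scores)    # 4.5, unaffected by how extreme the 40 is
modes = statistics.multimode(scores)  # [3, 5], this data set has two modes

print(mean, median, modes)
```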
What is the range?
o Difference between min and max scores
o Range doesn’t always change for distributions with different shapes
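A small Python sketch of two made-up data sets with different shapes but the same range. The standard deviation (covered in later cards) is used only to show that the spread really does differ.

```python
import statistics

a = [1, 5, 5, 5, 5, 5, 9]  # most scores bunched in the middle
b = [1, 1, 1, 5, 9, 9, 9]  # most scores at the two ends

range_a = max(a) - min(a)  # 8
range_b = max(b) - min(b)  # 8

sd_a = statistics.stdev(a)
sd_b = statistics.stdev(b)

print(range_a, range_b)  # same range
print(sd_a < sd_b)       # but b is more spread out
```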
What is a deviation?
o The signed distance of a score from the mean
How to calculate simple variance?
o Calc mean
o Calc deviations
o Square deviations
o Calc a slightly adjusted average squared deviation - you divide by n-1
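The steps above can be sketched in Python as follows. The data (hours) are made up, and the variable names are illustrative only.

```python
import statistics

hours = [2, 4, 4, 5, 10]  # made-up data, in hours
n = len(hours)

mean = sum(hours) / n                    # step 1: mean = 5.0
deviations = [x - mean for x in hours]   # step 2: -3, -1, -1, 0, 5
squared = [d ** 2 for d in deviations]   # step 3: 9, 1, 1, 0, 25
variance = sum(squared) / (n - 1)        # step 4: 36 / 4 = 9.0

print(variance)
print(statistics.variance(hours))        # standard-library check, same value
```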
What is the issue with simple variance?
- Potential issue is the units: if deviations are in hours, squaring them gives hours squared, which is hard to interpret
How to calculate sample standard deviation?
o Calc mean
o Calc deviations
o Square deviations
o Calc sample variance
o Take square root of sample variance, which puts it back in the original units
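A minimal Python sketch of the last step, using made-up data in hours. `statistics.variance` and `statistics.stdev` are the standard-library sample versions (they divide by n-1).

```python
import math
import statistics

hours = [2, 4, 4, 5, 10]  # made-up data, in hours

sample_variance = statistics.variance(hours)  # 9, in hours squared
sample_sd = math.sqrt(sample_variance)        # 3.0, back in hours

print(sample_sd)
print(statistics.stdev(hours))                # standard-library check
```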
What is a histogram?
- Good way to inspect data
o Can see if there are any odd-looking scores
o Can see the mode
o Can see how spread out the scores are
o Can see how the data is distributed
What is a box plot?
- Summarises a variable by its median, quartiles and extreme scores
- Can be drawn vertically or horizontally
What is a scatter plot?
shows the relationship between variables
What is a data summary plot?
- Plot bar showing mean (categorical data) or line graph (numerical data)
- Plot error bars showing +/- 1 SD
What is distribution of data?
The manner in which data for a particular variable is spread over its range is commonly referred to as its distribution
What is normally distributed data?
- Many naturally occurring variables are Normal
- E.g., height, IQ (not naturally occurring, but defined so that scores are normally distributed)
- If we don’t have much data then the normality can be difficult to see in a histogram
- As sample size increases, the normality will emerge
What is non-normal data?
- Has a tail either to the right or left – skewed data
- Positively skewed = long tail to the right, peaks at the left
- Negatively skewed = long tail to left, peaks at the right
- E.g., reaction time – tends to be positively skewed
What is the danger of non-normal data?
- Danger – mean is distorted by the tails which are the more extreme values
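A short Python sketch of the distortion, using made-up, positively skewed reaction times in milliseconds. The single slow response (900) sits in the right-hand tail.

```python
import statistics

# Made-up reaction times (ms); 900 is one slow response in the right tail
reaction_times_ms = [280, 300, 310, 320, 330, 350, 400, 900]

mean_rt = statistics.mean(reaction_times_ms)      # 398.75
median_rt = statistics.median(reaction_times_ms)  # 325.0

print(mean_rt, median_rt)
```

Here the mean is higher than seven of the eight scores, while the median stays near the bulk of the data.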
What is the danger of bimodal data?
- Danger – mean is not representative
- Tends to suggest an issue with your experiment – more than one underlying population
What is bimodal data?
Data that has two modes
What is the normal distribution?
- Bell-shaped
- Symmetric about the centre
- Tails never reach 0; they extend towards infinity in both directions
- The area under the curve is always equal to 1
- Very close to 0 by the time it gets to 3 SD from the mean – can use this to draw a rough idea of a normal distribution
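These properties can be checked with the standard-library `statistics.NormalDist` class. The mean of 100 and SD of 15 are assumed values chosen for illustration only.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # assumed illustrative mean and SD

# Symmetric about the centre: same height 1 SD either side of the mean
left_height = dist.pdf(85)
right_height = dist.pdf(115)

# Area within 3 SD of the mean (between 55 and 145)
within_3sd = dist.cdf(145) - dist.cdf(55)

# Height at 3 SD from the mean: very small, but never exactly 0
height_at_3sd = dist.pdf(145)

print(left_height, right_height)
print(round(within_3sd, 4))
print(height_at_3sd)
```

Almost all of the area lies within 3 SD of the mean, which is why the curve can be sketched as reaching roughly 0 at that point.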
What is probability?
- A measure of how likely it is that an uncertain event will occur
What is conditional probability?
- Probability of an event given that something else is known/assumed e.g., A|B
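A small Python sketch using the standard definition P(A|B) = P(A and B) / P(B), which is not stated in the card itself. The example of a fair six-sided die, with A = "roll is even" and B = "roll is greater than 3", is made up for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))               # fair six-sided die
A = {x for x in outcomes if x % 2 == 0}   # roll is even: {2, 4, 6}
B = {x for x in outcomes if x > 3}        # roll is greater than 3: {4, 5, 6}

p_a = Fraction(len(A), len(outcomes))            # P(A) = 1/2
p_b = Fraction(len(B), len(outcomes))            # P(B) = 1/2
p_a_and_b = Fraction(len(A & B), len(outcomes))  # P(A and B) = 1/3

p_a_given_b = p_a_and_b / p_b                    # P(A|B) = 2/3

print(p_a, p_a_given_b)
```

Knowing that B happened changes the probability of A from 1/2 to 2/3.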
What is a z-score?
- Z measures how far a score x is from the population mean, in multiples of the population SD
- If you were to find z-scores for all points on a normal distribution, you would find that it would form a normal distribution with mean 0 and SD 1 – N (0, 1)
- The area underneath a normal distribution above/below some variable value of x EQUALS the area underneath N (0, 1) above/below z
How do you obtain a z-score?
- Obtained by subtracting the population mean from x and then dividing by the population SD – (x-µ)/σ
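A minimal Python sketch of the z-score formula and of the area property, using the standard-library `statistics.NormalDist`. The population mean (100), SD (15) and score (130) are assumed values for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # assumed population mean and SD
x = 130               # made-up score

z = (x - mu) / sigma  # (130 - 100) / 15 = 2.0

# Area above x under N(mu, sigma) and area above z under N(0, 1)
area_above_x = 1 - NormalDist(mu, sigma).cdf(x)
area_above_z = 1 - NormalDist(0, 1).cdf(z)

print(z)
print(round(area_above_x, 4), round(area_above_z, 4))
```

The two areas agree, so a single N(0, 1) table or function is enough for any normal distribution once the score has been converted to z.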