Stats only Flashcards
What is important to consider when looking at sample size?
- Size matters: larger samples give more reliable estimates
- Sampling error can result if your sample is not large enough
- Trade-off between sample size and time/cost
What factors affect the choice of sample size?
o Design
o Response rate
o Heterogeneity of population
What is a population parameter?
- a quantity that describes some characteristic of a population with respect to a specific variable
- E.g., population mean, population range etc.
- Usually not possible to calculate, because we rarely have data for the whole population
- Might be given to you if available
What is a sample statistic?
– a quantity that describes some characteristic of a sample with respect to a specific variable
- E.g., sample mean, sample range etc.
- We can always calculate these from a sample
- Sample statistics provide an estimate of population parameters
Why is it important to summarise data?
- Data can be very complex and therefore it is useful to summarise it
- Allows for interpretation
What are measures of central tendency?
They provide an indication of a “typical” score in the data set
What is the mean?
o Provides an estimate of the average score in the data set (the sum of the scores divided by the number of scores)
o Is affected by extreme data points
What is the median?
o Is insensitive to extreme scores in the data set
o Doesn’t reflect the shape of the scores e.g., doesn’t care how far away the extreme scores are
What is the mode?
o Easy to calculate from a histogram and easy to understand – the most common value
o Data might have more than 1 mode or no mode at all
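To see how the three measures of central tendency react to an extreme score, here is a minimal sketch using Python's standard `statistics` module. The reaction-time values are hypothetical, chosen only so that one score is far from the rest.

```python
from statistics import mean, median, multimode

# Hypothetical reaction times (ms) with one extreme score
scores = [310, 320, 320, 335, 340, 900]

avg = mean(scores)         # pulled upwards by the extreme score of 900
mid = median(scores)       # middle of the sorted scores; ignores how extreme 900 is
modes = multimode(scores)  # list of the most common value(s); several if tied

print(avg, mid, modes)
```

Here the mean (about 420.8) sits well above every score except 900, while the median (327.5) stays with the bulk of the data.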
What is the range?
o Difference between min and max scores
o Range can stay the same for distributions with very different shapes, so it says nothing about how the scores are spread between the two extremes
What is a deviation?
o The signed distance of a score from the mean
How to calculate sample variance?
o Calculate the mean
o Calculate the deviations
o Square the deviations
o Calculate a slightly adjusted average squared deviation: sum the squared deviations and divide by n-1 (not n)
What is the issue with sample variance?
A potential issue is the units: if the deviations are in hours, the squared deviations (and so the variance) are in hours squared, which is hard to interpret
How to calculate sample standard deviation?
o Calculate the mean
o Calculate the deviations
o Square the deviations
o Calculate the sample variance
o Take the square root of the sample variance: the result is back in the original, interpretable units
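The range, deviation, sample variance and standard deviation steps can be followed one line at a time in this sketch. The hours-of-sleep scores are hypothetical; only the `math` standard library module is used.

```python
import math

# Hypothetical scores (hours of sleep)
x = [6.0, 7.0, 7.5, 8.0, 9.5]
n = len(x)

score_range = max(x) - min(x)           # range: max minus min
m = sum(x) / n                          # 1. mean
deviations = [xi - m for xi in x]       # 2. signed distance of each score from the mean
squared = [d ** 2 for d in deviations]  # 3. square the deviations
variance = sum(squared) / (n - 1)       # 4. divide by n - 1, not n (units: hours squared)
sd = math.sqrt(variance)                # 5. square root: back in hours

print(score_range, m, variance, sd)
```

The deviations always sum to 0 (up to rounding), which is why they are squared before averaging.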
What is a histogram?
- Good way to inspect data:
o Can see if there are any odd-looking scores
o Can see the mode
o Can see how spread out the scores are
o Can see how the data are distributed
What is a box plot?
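- Summarises the distribution of scores: the box runs from the lower to the upper quartile with a line at the median, and whiskers extend towards the minimum and maximum (outliers may be plotted as separate points)
- Usually drawn vertically, but can also be drawn horizontally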
What is a scatter plot?
Shows the relationship between two variables: each point is one observation, plotted by its value on each variable
What is a data summary plot?
- Plot a bar showing the mean of each group (categorical independent variable) or a line graph (numerical independent variable)
- Plot error bars showing ±1 s.d. around each mean
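As a sketch of what a data summary plot would draw, the code below computes the bar height (mean) and the ±1 s.d. error-bar ends for each group. The two groups and their scores are hypothetical, and no plotting library is assumed; any plotting tool could take these numbers.

```python
from statistics import mean, stdev

# Hypothetical scores for two conditions (a categorical independent variable)
groups = {
    "control": [12, 15, 11, 14, 13],
    "training": [16, 18, 15, 17, 19],
}

# For each group: bar height = mean, error bar from mean - 1 SD to mean + 1 SD
summary = {}
for name, scores in groups.items():
    m, s = mean(scores), stdev(scores)  # stdev is the sample SD (n - 1)
    summary[name] = (m, m - s, m + s)
    print(f"{name}: mean={m:.2f}, error bar {m - s:.2f} to {m + s:.2f}")
```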
What is distribution of data?
The manner in which data for a particular variable is spread over its range is commonly referred to as its distribution
What is normally distributed data?
- Many naturally occurring variables are Normal
- E.g., height, IQ (not naturally occurring, but the IQ scale has been defined to be Normal)
- If we don’t have much data then the normality can be difficult to see in a histogram
- As sample size increases, the normality will emerge
What is non-normal data?
- Has a tail either to the right or left – skewed data
- Positively skewed = long tail to the right, peaks at the left
- Negatively skewed = long tail to left, peaks at the right
- E.g., reaction time – tends to be positively skewed
What is the danger of non-normal data?
- Danger – mean is distorted by the tails which are the more extreme values
What is the danger of bimodal data?
- Danger – mean is not representative
- Tends to suggest an issue with your experiment – more than one underlying population
What is bimodal data?
Data that has two modes
What is the normal distribution?
- Bell-shaped
- Symmetric about the centre
- Tails never reach 0: they extend towards ±infinity, getting ever closer to 0
- The total area under the curve is always equal to 1
- Very close to 0 by the time it gets to 3 SD from the mean – can use this to draw a rough idea of a normal distribution
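The area facts above can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8+). This sketch uses the standard normal N(0, 1); the same proportions hold for any normal distribution measured in SD units.

```python
from statistics import NormalDist

snd = NormalDist(mu=0, sigma=1)  # the standard normal distribution N(0, 1)

# Area under the curve within k SDs of the mean (the whole curve has area 1)
areas = {k: snd.cdf(k) - snd.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")

# Height of the curve 3 SD out, relative to the peak: already close to 0
print(snd.pdf(3) / snd.pdf(0))
```

Roughly 68%, 95% and 99.7% of the area lies within 1, 2 and 3 SDs, which is why ±3 SD is enough for a rough sketch.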
What is probability?
– a measure of how likely it is that an uncertain event will occur
What is conditional probability?
- Probability of an event given that something else is known/assumed, e.g., P(A|B) is the probability of A given B
What is a z-score?
- z measures how far a score x is from the population mean, in multiples of the population SD
- If you converted every value of a normal distribution to a z-score, the z-scores would form a normal distribution with mean 0 and SD 1, i.e., N(0, 1), the standard normal distribution (SND)
- The area underneath a normal distribution above/below some variable value of x EQUALS the area underneath N (0, 1) above/below z
How do you obtain a z-score?
- Obtained by subtracting the population mean from x and then dividing by the population SD – (x-µ)/σ
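The formula (x-µ)/σ translates directly into a one-line function. The IQ-style numbers (µ = 100, σ = 15) are assumed for illustration.

```python
def z_score(x, mu, sigma):
    """How many population SDs the score x lies from the population mean mu."""
    return (x - mu) / sigma

# Hypothetical IQ example, assuming population mean 100 and SD 15
print(z_score(130, 100, 15))  # 2.0: two SDs above the mean
print(z_score(85, 100, 15))   # -1.0: one SD below the mean
```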
What is a SND table and how do you use it?
- Table that provides values of areas underneath the SND in different ranges
- Find z-score (first column) then decide if you want the area above or below this score
- If z-score is negative, use the positive value in the table but be careful when choosing above or below because the scores will be flipped
o E.g., z-score = -2 and you want the area below: on the table, use z-score 2 but take the area above
- If you have a range that is bounded e.g., 70
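Instead of a printed SND table, the cumulative distribution function of `statistics.NormalDist` gives the area below any z directly. This sketch (the helper names `area_below` and `area_above` are hypothetical) also shows the symmetry trick for negative z-scores.

```python
from statistics import NormalDist

snd = NormalDist()  # N(0, 1); cdf(z) is the area below z

def area_below(z):
    return snd.cdf(z)

def area_above(z):
    return 1 - snd.cdf(z)

# Negative z-scores: by symmetry, the area below -2 equals the area above +2
print(area_below(-2), area_above(2))  # both roughly 0.0228
```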
What is a sampling error?
Sampling error – the error associated with examining statistics calculated from a sample rather than the population
Why do sampling errors occur?
- It occurs because in our sample we don’t have all the members of the population
What does the magnitude of a sampling error depend on?
The sample size
- Bigger sample = big sampling error less likely
- Smaller sample = big sampling error more likely
How do we generate a sampling distribution?
- Take a sample (size N) from a population
- Calculate a sample statistic (e.g., mean, SD etc.)
- Add the new statistic to a frequency plot (a histogram) of the sample statistic
- Repeat the above 3 steps many times
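The four steps can be simulated with Python's `random` module. This is a sketch under stated assumptions: the parent population is hypothetical (10,000 scores drawn from a normal with mean 100 and SD 15), and `sampling_distribution` is an illustrative helper, not a library function.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical parent population of 10,000 individual scores
population = [random.gauss(100, 15) for _ in range(10_000)]

def sampling_distribution(pop, n, repeats, statistic=mean):
    """Repeatedly draw a sample of size n and record the chosen statistic."""
    results = []
    for _ in range(repeats):
        sample = random.sample(pop, n)      # 1. take a sample of size N
        results.append(statistic(sample))  # 2-3. calculate the statistic and record it
    return results                          # these values would be plotted as a histogram

means = sampling_distribution(population, n=25, repeats=2000)
print(mean(means), mean(population))
```

Passing a different `statistic` (e.g., `statistics.stdev`) builds the sampling distribution of that statistic instead.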
What does the sampling distribution tell us?
- Tells us important info about how a statistic changes from sample to sample
- What is the mean value of the statistic over all samples?
- How variable is the statistic over all samples?
- What shape is the distribution of the statistic over all samples?
What are the properties of the sampling distribution of the mean (SDM)?
- Its mean is the same as the mean of the parent population (µ)
- Its SD is different from that of the parent population: it is σ/√N, where σ is the parent population SD and N is the sample size
- This SD is called the standard error of the mean (s.e.m.) or standard error (s.e.)
- The s.e.m. is smaller than the SD of the parent population (for N > 1) because σ is divided by √N, which is bigger than 1
What is a parent population a distribution of?
Parent population is a distribution of individual scores x (e.g., from an individual person or thing)
What is SDM a distribution of?
SDM is a distribution of sample means for samples of size N drawn at random from the parent population
What is central limit theorem?
- Given a population with mean µ and SD σ, the sampling distribution of the mean approaches a normal distribution with mean µ and SD σ/√N as N increases
- This is true regardless of the shape of the parent population: even if the population is not normal, the distribution of sample means drawn from it will be approximately normal for large enough N
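A quick simulation can illustrate the central limit theorem and the σ/√N formula. The parent population here is an assumption: a hypothetical, strongly positively skewed (exponential) population with mean about 300, loosely like reaction times. Samples are drawn with replacement via `random.choices`.

```python
import math
import random
from statistics import mean, pstdev, stdev

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical, strongly positively skewed parent population (like reaction times)
population = [random.expovariate(1 / 300) for _ in range(20_000)]
mu, sigma = mean(population), pstdev(population)

N = 30
sem = sigma / math.sqrt(N)  # predicted SD of the SDM (standard error of the mean)
sample_means = [mean(random.choices(population, k=N)) for _ in range(3000)]

print(mean(sample_means), mu)    # mean of the SDM is close to mu
print(stdev(sample_means), sem)  # SD of the SDM is close to sigma / sqrt(N)

# If the SDM is roughly normal, about 95% of means fall within 1.96 s.e.m. of mu
coverage = sum(abs(m - mu) < 1.96 * sem for m in sample_means) / len(sample_means)
print(coverage)
```

Even though the parent population is heavily skewed, close to 95% of the sample means land within 1.96 s.e.m. of µ, as a normal SDM would predict.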
How do you find a z-score for a SDM?
z-score = (m-µ)/(σ/√N), where m is the sample mean
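The only change from an ordinary z-score is dividing by the standard error σ/√N instead of σ. The population values and the sample mean below are hypothetical.

```python
import math

def z_for_sample_mean(m, mu, sigma, n):
    """z-score of a sample mean m on the SDM: (m - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (m - mu) / standard_error

# Hypothetical: population mean 100, SD 15; sample of N = 25 with mean 106
print(z_for_sample_mean(106, 100, 15, 25))  # s.e.m. = 15 / 5 = 3, so z = 6 / 3 = 2.0
```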
What is a point estimate?
– a single value estimate of a population parameter e.g., sample mean
What is an interval estimate?
– a range of possible values of a population parameter e.g., confidence interval
What is a confidence interval?
– describes an interval (e.g., a range) of values for our population parameter, together with a specified level of confidence that the parameter is in that range
For a sample drawn at random from a normal population N (µ, σ) with known s.d. σ ,the 95% CI for the population mean is centred on the sample mean m and goes from?
m – (1.96 x σ/√N) to m + (1.96 x σ/√N)
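The interval formula above can be computed as follows. The critical value is taken from `NormalDist().inv_cdf` rather than hard-coding 1.96, so the same sketch works for other confidence levels; the sample values are hypothetical.

```python
import math
from statistics import NormalDist

def ci_known_sigma(m, sigma, n, level=0.95):
    """CI for the population mean when sigma is known: m +/- z_crit * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    half_width = z_crit * sigma / math.sqrt(n)
    return m - half_width, m + half_width

# Hypothetical: sample mean 106 from N = 25, known population SD 15
low, high = ci_known_sigma(106, 15, 25)
print(round(low, 2), round(high, 2))
```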
What does a 95% confidence interval mean?
A 95% confidence level means that if we repeated our sampling many times and worked out a new CI each time centred on our new sample mean we would expect the population mean to be in the interval on 95% of those repeats
True or false: for a 95% confidence interval centred on the sample mean, there is a 95% chance that the population mean is also in the range (and vice versa)?
TRUE
True or false: for a 95% confidence interval centred on the sample mean, there is a 5% chance that the population mean falls outside this range (and vice versa)?
TRUE
What are the steps for null hypothesis testing?
- Formulate hypotheses
o Null hypothesis (H0)
o Research hypothesis (H1)
- Collect data
- Evaluate the inconsistency between H0 and the data
o How inconsistent are the data with H0?
- Reject or fail to reject H0?
- Interpret in context
True or false: if we are able to reject the null (H0) in favour of the research hypothesis (H1), then we can claim to have evidence for the research hypothesis?
True
True or false: if we fail to reject the null (H0), then we can claim to have evidence for the null hypothesis?
False
What do values of p > α suggest?
suggest not inconsistent with H0: fail to reject null
What do values of p < α suggest?
suggest inconsistent with H0: reject the null
What is the usual value of α (the significance level)?
α = 0.05 by convention
What is the p-value?
p-value = the conditional probability of obtaining a sample statistic at least as extreme as yours, given that H0 is true
How do you conduct a z-test?
- Use NHST framework
- Calculate how inconsistent the sample mean is with H0 by calculating its z-score, use the SND table to find the associated p-value, and compare this to α = 0.05 to decide whether to reject or fail to reject the null hypothesis
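Here is a minimal two-tailed z-test sketch following those steps, with `NormalDist().cdf` standing in for the SND table. The function name `z_test_two_tailed` is hypothetical, and the numbers assume a population with mean 100 and SD 15 under H0.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(m, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test: z-score, then p-value, then compare with alpha."""
    z = (m - mu) / (sigma / math.sqrt(n))   # inconsistency of m with H0, in s.e.m. units
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area beyond |z| in both tails
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision

# Hypothetical: H0 says mean 100, SD 15; sample of N = 25 with mean 106
z, p, decision = z_test_two_tailed(106, 100, 15, 25)
print(z, round(p, 4), decision)  # z = 2.0, p about 0.0455, so reject H0
```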
When is a z-test used?
- To check if a sample mean that has been obtained is different from some population mean
What is a 1 tailed hypothesis that is right hand tailed?
- Something is better than the population
- H1: sample mean > population mean
- p-value = area above the z-score
What is a 1 tailed hypothesis that is left hand tailed?
- Something is worse than the population
- H1: sample mean < population mean
- p-value = area below the z-score
What is a two tailed hypothesis?
- Something is different from the population
- H1: sample mean ≠ population mean
- Look at the areas both above and below: take the sample mean and also a second value the same distance from the population mean but on the other side. E.g., population mean = 67.5, sample mean = 70.7, the difference is 3.2, so the other value to consider is 64.3 (the two z-scores have the same size but opposite signs)
- p-value = 2 x the one-tailed area
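The three cases differ only in which area is taken. The sketch below (the `p_value` helper is hypothetical) reuses the card's 67.5 vs 70.7 example, but the card gives no standard error, so an s.e.m. of 2 is assumed purely for illustration.

```python
from statistics import NormalDist

snd = NormalDist()

def p_value(z, tail):
    """p-value for a z-score; tail is 'right', 'left' or 'two'."""
    if tail == "right":  # H1: sample mean > population mean -> area above z
        return 1 - snd.cdf(z)
    if tail == "left":   # H1: sample mean < population mean -> area below z
        return snd.cdf(z)
    if tail == "two":    # H1: sample mean != population mean -> both tails
        return 2 * (1 - snd.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

# Card example, assuming a hypothetical s.e.m. of 2
z = (70.7 - 67.5) / 2  # 1.6; the mirror value 64.3 gives z = -1.6
for tail in ("right", "left", "two"):
    print(tail, round(p_value(z, tail), 4))
```

The two-tailed p-value is exactly twice the right-tailed one here, so a result can be significant one-tailed but not two-tailed.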
When can you formulate a 1 tailed hypothesis?
- There is previous research
- You can predict the effect
What is a type I error? Why does it occur?
- Rejecting the null hypothesis when it is actually true; occurs due to sampling error
What is a type II error? Why does it occur?
- Failing to reject the null hypothesis when it is actually false
- Can arise for a number of reasons, such as a biased sample, an error in the experimental task, a sample size that was too small etc.
Why do we use α = 0.05?
- It is small so it is difficult to reject the null hypothesis but not so small that it is impossible to do so
- It is a compromise between type I and type II errors
How is Student’s t distribution similar to the SND?
- Bell-shaped, symmetric, unimodal
How is Student’s t distribution different from the SND?
- Has a lower peak and heavier tails, so it has more variance; it gets closer to the SND as the degrees of freedom increase
When is Student’s t distribution used?
- When the population s.d. is unknown
Is Student’s t distribution used in a variety of t tests?
yes
How do you find the t statistic?
T(m) = (m-µ) / (s/√N)
How do you find the estimated standard error?
(s/√N) is the estimated standard error (e.s.e.)
When using t distribution table, what value should you use for v?
Use t(v = N-1): the degrees of freedom v are the sample size minus 1
How do you find confidence intervals when population s.d. is unknown?
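- For 95% of repeated samples, the sample mean m would be within:
o some number c of e.s.e.’s of µ, i.e., from µ - (c x s/√N) to µ + (c x s/√N)
- So the 95% CI for µ is m - (c x s/√N) to m + (c x s/√N)
- To find c:
o Find the t value for v = N-1 with 0.025 (2.5%) in one tail, i.e., 0.05 (5%) across both tails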
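This sketch computes the e.s.e. and the t-based 95% CI for a hypothetical sample of 10 scores. Python's standard library has no t distribution, so the critical value c = 2.262 (v = 9, 0.025 in each tail) is hard-coded as if read from a t table; with SciPy available it could be computed instead.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of N = 10 scores; population SD unknown
sample = [72, 68, 75, 71, 69, 74, 70, 73, 76, 67]
n = len(sample)
m, s = mean(sample), stdev(sample)  # stdev divides by n - 1
ese = s / math.sqrt(n)              # estimated standard error

# c = critical t for v = N - 1 = 9 with 0.025 in each tail (0.05 across both),
# hard-coded from a t table (assumption: no t distribution in the stdlib)
c = 2.262

ci = (m - c * ese, m + c * ese)
print(round(m, 2), [round(v, 2) for v in ci])
```

Because c (2.262) is larger than 1.96, this interval is wider than a z-based one with the same SD, reflecting the extra uncertainty from estimating σ with s.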
How do you conduct a 1 sample t test?
- Same as a z test except:
- Work out e.s.e.
- Find t statistic
- Compare the t statistic with the critical value for t(v = N-1) at the chosen significance level: if it is more extreme, the data are inconsistent with H0
- Reject or fail to reject H0
- Interpret in context
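The one-sample t test steps can be sketched as below. The sample, the hypothesised population mean of 68, and the helper name `one_sample_t` are all hypothetical. As before, the two-tailed critical value for v = 9 at α = 0.05 (2.262) is assumed from a t table because the standard library cannot compute it.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """Return the t statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    ese = stdev(sample) / math.sqrt(n)  # work out the e.s.e.
    t = (mean(sample) - mu) / ese       # find the t statistic
    return t, n - 1                     # compare against t(v = N - 1)

# Hypothetical: do these 10 scores differ from a population mean of 68?
sample = [72, 68, 75, 71, 69, 74, 70, 73, 76, 67]
t, df = one_sample_t(sample, mu=68)

t_crit = 2.262  # two-tailed, alpha = 0.05, v = 9, read from a t table (assumed)
decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
print(round(t, 2), df, decision)
```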
When do you use a 1 sample t test?
- Use to test whether sample mean you have is different from some given or hypothetical population mean