WEEK 2: Variance and Sampling Flashcards
Learning objectives:
- Understand and explain the difference between a standard deviation and a standard error
- Understand and explain a 95% confidence interval
- Understand and explain error bar graphs
- Understand and explain why a t value less than 1 means there is no effect
What is variance?
It is the variability of the data: how spread out the scores are around the mean.
How is variance calculated?
Calculated by determining how much each score differs from the mean of the sample, squaring each of these deviations, then adding them all up and dividing by the number of scores
Squaring the deviations stops negative and positive values cancelling each other out
Dividing by n gives the variance when your data are the whole population
Dividing by n-1 gives an estimate of the variance in the population when working with a sample of that population
Variance is in squared units, so it is difficult to see how it relates to the measure you have (the dependent variable); taking the square root gets you back to the original units - this is the standard deviation
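A minimal Python sketch of this calculation, using made-up scores (the values are just for illustration):

```python
# Variance and SD calculated step by step, as described above.
scores = [4.0, 7.0, 6.0, 5.0, 8.0]                # hypothetical sample scores
n = len(scores)
mean = sum(scores) / n

# Deviations from the mean, squared and added up = the sum of squares
sum_of_squares = sum((x - mean) ** 2 for x in scores)

population_variance = sum_of_squares / n          # divide by n: whole population
sample_variance = sum_of_squares / (n - 1)        # divide by n-1: estimate from a sample

# Standard deviation: square root of the variance, back in the original units
sample_sd = sample_variance ** 0.5
print(mean, sample_variance, sample_sd)
```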
What is standard deviation?
How is it calculated?
Standard deviation describes how the data are dispersed around the mean, in the same units as the original measure; it is the square root of the variance
It gives a yardstick for identifying outliers, e.g. ±1 SD, ±2 SD; any data point more than 3 SD either side of the mean is treated as a statistical outlier
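A small sketch of using the SD as an outlier rule, assuming the ±3 SD cut-off mentioned above and made-up data:

```python
# Flag any score more than 3 SD from the mean as a statistical outlier.
import statistics

scores = [5, 6, 7] * 4 + [30]                 # hypothetical scores with one extreme value
mean = statistics.mean(scores)
sd = statistics.stdev(scores)                 # sample SD (n - 1 denominator)

outliers = [x for x in scores if abs(x - mean) > 3 * sd]
print(outliers)                               # here only the extreme value (30) is flagged
```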
What are the aims of stats?
- To generalise from a sample
- If you could test an entire population there would be no need for inferential stats as any effects found would be found in the entire population
- We usually take a sample from a population and test it using stats to assess the probability that any effects found will also be found in the whole population
What is sampling error?
Sampling from a population introduces an element of error
We need to know how much error there is in the sample data (how much the data differs from that we would see in the entire population)
We estimate the amount of deviation between the population and the sample to get standard error
What is standard error?
The estimated amount of ‘deviation’ between the sample and the population. Because we do not have lots of samples from the population, we estimate it from the SD of our single sample; it allows us to compare effects against error
It is the standard deviation of the sampling distribution of the mean
Standard error of the mean = Sample SD / √n
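As a quick sketch, the same formula in Python (sample values are hypothetical):

```python
# Standard error of the mean = sample SD / sqrt(n)
import math
import statistics

sample = [32.0, 28.0, 35.0, 30.0, 31.0, 29.0, 33.0]   # e.g. driving speeds in mph
sd = statistics.stdev(sample)                          # sample SD (n - 1 denominator)
se = sd / math.sqrt(len(sample))                       # standard error of the mean
print(se)
```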
What is sampling distribution?
So say we take several samples from a population and calculate their mean values e.g. how fast on average a person drives in mph. These mean values would be different in each sample.
We put these mean values into a histogram and see whether they are normally distributed (form a bell curve). This distribution of sample means is the sampling distribution.
To estimate the deviation we will find between our sample and the population, we need to calculate the SD of this sampling distribution
The standard deviation of the sampling distribution…
So imagine each sample mean is just one data point in a larger, condensed ‘experiment’; we calculate the standard deviation of these means in exactly the same way we would for a normal set of scores.
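A simulation sketch of this idea, assuming a made-up population of driving speeds: draw many samples, keep each sample mean, and compare the SD of those means with SD / √n from the formula above.

```python
# The SD of the sampling distribution of the mean should be close to SD / sqrt(n).
import math
import random
import statistics

random.seed(0)
population_mean, population_sd, n = 30.0, 5.0, 25      # hypothetical population of speeds (mph)

sample_means = []
for _ in range(10_000):                                # take many samples of size n
    sample = [random.gauss(population_mean, population_sd) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

sd_of_sample_means = statistics.stdev(sample_means)    # SD of the sampling distribution
theoretical_se = population_sd / math.sqrt(n)          # what the standard error estimates
print(sd_of_sample_means, theoretical_se)              # the two values should be close
```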
So once again…
Sum of the scores divided by n = the mean
Then we see how far each score is from the mean (the deviations)
Square each deviation and total them = the sum of squares
Divide the sum of squares by n-1 = the mean square (the variance estimate)
Standard deviation = the square root of the variance
Definitions for the variance terms:
Sum of squares - the deviations between the scores and the mean, squared and then added up
Variance (the mean square) - the sum of squares divided by n-1
SD - square root of the variance
Standard error = Sample SD / √n
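A short worked run-through of the whole chain on five made-up scores, with the hand-worked numbers in the comments:

```python
# mean -> deviations -> sum of squares -> mean square -> SD -> standard error
import math

scores = [2.0, 4.0, 4.0, 6.0, 9.0]
n = len(scores)
mean = sum(scores) / n                                  # 25 / 5 = 5.0
sum_of_squares = sum((x - mean) ** 2 for x in scores)   # 9 + 1 + 1 + 1 + 16 = 28.0
mean_square = sum_of_squares / (n - 1)                  # 28 / 4 = 7.0 (the variance)
sd = math.sqrt(mean_square)                             # ~2.65
se = sd / math.sqrt(n)                                  # ~1.18
print(mean, sum_of_squares, mean_square, sd, se)
```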
Effect vs Error
The effect is divided by the standard error (Effect / Error); this compares the effect against the error and establishes whether you have more effect than error
Effect
Effect is the mean difference between scores
Error (Revisited)
Standard error –> So the sample standard deviation divided by the square root of the number of observations
This shows how much our sample mean deviates from the population mean. It is the standard deviation of the sampling distribution of the mean
**Looking at the T-test output:
- -> t statistic = (mean 1 - mean 2), i.e. the effect, divided by the standard error of the differences
- -> If error is larger than effect the t value will be less than one
- -> meaning the amount of deviation that would be expected between a sample and the population is larger than your effect (mean difference between scores)
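A sketch of this calculation for two made-up groups, using the pooled (equal-variances) version of the independent-samples t statistic:

```python
# t = effect (mean difference) / standard error of the difference
import math
import statistics

group1 = [12.0, 15.0, 14.0, 10.0, 13.0, 16.0]   # hypothetical scores, condition 1
group2 = [9.0, 11.0, 8.0, 12.0, 10.0, 9.0]      # hypothetical scores, condition 2

n1, n2 = len(group1), len(group2)
effect = statistics.mean(group1) - statistics.mean(group2)          # mean difference

pooled_var = ((n1 - 1) * statistics.variance(group1) +
              (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
se_of_difference = math.sqrt(pooled_var * (1 / n1 + 1 / n2))        # the error term

t = effect / se_of_difference
print(t)          # t < 1 would mean there is more error than effect
```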
Socrative answers
- What does SD represent?
The standardised amount of difference between the scores and the mean
- Why do the graphs for SD vs SE look so different?
So SD is measuring the standardised amount of difference between scores and the mean within a SAMPLE, and SE is an estimation of how much the sample mean deviates from the POPULATION mean
Error bars and confidence intervals
Error bar graphs with 95% confidence intervals have an upper value, a lower value and the mean marked on the bars
When looking at 95% confidence intervals we assume the data are normally distributed, so about 95% of values fall within roughly 2 SD (more precisely 1.96 SD) of the mean; applying the same logic to the sampling distribution of the mean gives mean ± 1.96 × SE as the interval.
**95% CI and Standard error
So we can be 95% confident that the population mean will fall between the upper and lower boundaries
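A minimal sketch of computing those boundaries, assuming a roughly normal sampling distribution and using mean ± 1.96 × SE (sample values are made up):

```python
# 95% confidence interval around a sample mean
import math
import statistics

sample = [32.0, 28.0, 35.0, 30.0, 31.0, 29.0, 33.0]
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))   # standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(lower, mean, upper)          # the values a 95% CI error bar would show
```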
–> How much do the groups overlap?
small effect/ large error = large overlap
large effect/ small error = no overlap