Populations and samples Flashcards
What it says on the tin
RECAP: What is the definition of standard deviation?
How dispersed scores are within a data set i.e. how well the mean represents the data
What is the Standard Error?
How well the sample mean represents the true population mean
What is the 95% Confidence Interval?
Range of scores constructed such that the true population mean will fall within this range in 95% of samples
OR
The sample mean ± 1.96 SEs, since 95% of sample means fall within 1.96 SEs of the population mean
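A rough sketch of how the 95% CI could be worked out from a sample, using the 1.96 SE rule above. The scores and the function name `confidence_interval_95` are made up for illustration, and the 1.96 multiplier assumes the normal approximation is reasonable for the sample.

```python
import math
import statistics


def confidence_interval_95(sample):
    """Return (lower, upper) for an approximate 95% CI around the sample mean."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample SD (N - 1 denominator)
    se = sd / math.sqrt(len(sample))     # standard error of the mean
    margin = 1.96 * se                   # 1.96 SEs either side of the mean
    return mean - margin, mean + margin


# Hypothetical test scores, not real data
scores = [48, 52, 55, 47, 50, 53, 49, 51, 46, 54]
lower, upper = confidence_interval_95(scores)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```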
What are parameters?
Features of the population (e.g. the true mean or SD), where the population is the collection of ALL data of interest
What are Statistics?
Features calculated from a sample (a subset of the population), which should be generalisable to the population when sampling is done properly
i.e. statistics ESTIMATE parameters, and we can use probability to work out the likelihood that our inferences are true
What are the features of a standard Normal Distribution?
MEAN = 0
SD = 1
The distribution is bell-shaped and symmetrical around 0 (any other normal distribution is symmetrical around its own mean)
Why are z-scores calculated?
Data sets collected from our samples will almost never have a mean of 0 and SD of 1, so we calculate z-scores to standardise values. This makes scores measured on different scales directly comparable and lets us use the standard normal distribution
How are z-scores calculated?
z = (value - sample mean)/SD of sample
A student scores 55 in a verbal test, and 60 in numerical reasoning.
Class scores are normally distributed - verbal and numerical means are both 50, while SDs are 5 and 12 respectively.
We want to know how the student performed relative to everyone else. Calculate the z-scores to answer this question
Verbal z=1.00
Numerical z=0.83
z is bigger for the verbal test, i.e. her score sits further right on the standardised distribution, so she did better on the verbal test relative to everyone else than on the numerical test (1 SD above average compared to <1 SD above average)
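The worked example can be checked with a few lines of Python. This is only a sketch of the formula z = (value - mean)/SD; the function name `z_score` is made up for illustration.

```python
def z_score(value, mean, sd):
    """Standardise a value: how many SDs it lies above (+) or below (-) the mean."""
    return (value - mean) / sd


verbal_z = z_score(55, 50, 5)       # (55 - 50) / 5
numerical_z = z_score(60, 50, 12)   # (60 - 50) / 12

print(f"Verbal z = {verbal_z:.2f}")        # Verbal z = 1.00
print(f"Numerical z = {numerical_z:.2f}")  # Numerical z = 0.83
```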
How can we use z-scores?
We can look up z in tables that give the probability of a value falling between the mean (0) and z, and of a value falling beyond z (from z to infinity), i.e. the larger and smaller portions of the area under the curve
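Instead of a printed table, the same areas can be sketched with `statistics.NormalDist` from Python's standard library (3.8+). The helper name `areas_for_z` is made up for illustration, and it relies on the curve being symmetrical so that negative z values are treated like positive ones.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def areas_for_z(z):
    """Return (area between the mean and z, area beyond z) under the standard normal curve."""
    below = standard_normal.cdf(abs(z))  # area from -infinity up to |z|
    mean_to_z = below - 0.5              # half the curve lies below the mean
    beyond_z = 1 - below                 # tail from |z| to infinity
    return mean_to_z, beyond_z


for z in (1.00, 1.96):
    mean_to_z, beyond_z = areas_for_z(z)
    print(f"z = {z:.2f}: mean to z = {mean_to_z:.4f}, beyond z = {beyond_z:.4f}")
```

For z = 1.96 the area beyond z comes out at roughly 0.025 in each tail, which is where the 1.96 in the 95% Confidence Interval comes from.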
What is meant by “Sampling Error”?
Deviation of selected sample from true population data - we can multiply up from samples to estimate parameters but we are still only getting an incomplete picture.
LAW OF LARGE NUMBERS - using larger samples gets sample mean closer to true population mean
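A small simulation can illustrate this. It is a sketch only: the population (mean 50, SD 10), the seed and the sample sizes are all made-up assumptions, and the exact numbers printed depend on the random draws.

```python
import random
import statistics

rng = random.Random(42)        # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 50, 10      # hypothetical population parameters


def sample_mean(n):
    """Draw n scores from the hypothetical population and return their mean."""
    return statistics.fmean([rng.gauss(POP_MEAN, POP_SD) for _ in range(n)])


for n in (10, 100, 10_000):
    m = sample_mean(n)
    # Sampling error = how far this sample mean is from the true mean
    print(f"N = {n:>6}: sample mean = {m:.2f}, sampling error = {m - POP_MEAN:+.2f}")
```

Any single small sample can land close to 50 by chance, but the sampling error tends to shrink as N grows.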
How do we calculate Standard Error of the mean?
SE is the SD of the sampling distribution of the mean, i.e. how much sample means vary around the true mean.
If we calculate many sample means from new samples, the mean of all of these means should approach the true population mean (law of large numbers). By the central limit theorem, means from samples of about 30 or more are treated as approximately normally distributed
Calculation is sample SD/square root of N (value gets closer to 0 as N increases)
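A quick sketch of the calculation, showing how the SE shrinks as N increases. The function name `standard_error` and the SD of 12 are chosen purely for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sample SD divided by the square root of N."""
    return sd / math.sqrt(n)


# Same SD, increasing sample size
for n in (9, 36, 144):
    print(f"SD = 12, N = {n:>3}: SE = {standard_error(12, n):.1f}")
# SD = 12, N =   9: SE = 4.0
# SD = 12, N =  36: SE = 2.0
# SD = 12, N = 144: SE = 1.0
```

Note that quadrupling N only halves the SE, because N sits under a square root.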
What is the magnitude of the Standard Error determined by?
Sample size (larger N gives a smaller SE) and the SD of the population from which the sample was selected (larger SD gives a larger SE)
Define sampling bias
The weighting of a sample with an over-representation of one particular category of people, so that the sample no longer reflects the population
What are 3 ways in which we can control for participant variables when drawing samples?
1) Random allocation
2) Pre-testing (however may still miss some variables, and is also very time-consuming)
3) Representative allocation - make sure groups equally representative on several relevant variables, deciding which ones are going to be most important to balance for in the given circumstances of the research question and aims
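As a rough illustration of option 1, random allocation to two groups could be sketched like this. The function name `randomly_allocate` and the participant IDs are hypothetical.

```python
import random


def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into two groups of (near) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs
people = [f"P{i:02d}" for i in range(1, 11)]
group_a, group_b = randomly_allocate(people, seed=1)
print("Group A:", group_a)
print("Group B:", group_b)
```

Each participant ends up in exactly one group, and chance alone decides which, so participant variables should even out across groups on average.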