research skills 7 data distribution Flashcards
what is normal distribution ?
tendency of data to cluster around the mean - bell shaped curve
what are the characteristics of normal distribution ?
- x axis - continuous scale of measurements
- peak of curve = mean
- y axis - probability density
what is probability density ?
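- the height of the curve shows how densely observations are packed around each value on the x axis
- the whole population lies under the curve, so the total area under the curve is 1 (100%)
- the area under the curve between two values is the probability of an observation falling between them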
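A minimal sketch of the link between area under the curve and probability, using Python's standard `statistics.NormalDist`. The mean of 100 and SD of 15 are assumed values for illustration only.

```python
from statistics import NormalDist

# Illustrative normal distribution (assumed values: mean 100, SD 15).
dist = NormalDist(mu=100, sigma=15)

# Height of the curve at the mean: a density, not a probability.
density_at_mean = dist.pdf(100)

# Probability = area under the curve between two x values.
within_one_sd = dist.cdf(115) - dist.cdf(85)

# Area over a very wide range (10 SD either side) covers essentially everyone.
total_area = dist.cdf(250) - dist.cdf(-50)

print(f"density at mean: {density_at_mean:.4f}")
print(f"P(85 < X < 115): {within_one_sd:.4f}")
print(f"total area: {total_area:.4f}")
```

About 68% of the population falls within one SD of the mean, and the total area comes out at 1.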
how is the Z score calculated ?
Z = (X - mean) / SD
X = value on x axis
use the z score to look up the percentage in a z table
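The calculation can be sketched in Python as below. The score of 130, mean of 100 and SD of 15 are assumed example values, and the standard library's normal CDF stands in for the printed z table.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Assumed example values: a score of 130 on a scale with mean 100 and SD 15.
z = z_score(130, mean=100, sd=15)

# The standard normal CDF plays the role of the z table:
# it returns the proportion of the population below z.
proportion_below = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"percentage below: {proportion_below * 100:.1f}%")
```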
what is a population and a sample ?
A population is the entire collection of individuals or data points of interest, and a sample is a smaller group drawn from that population.
what are the types of sampling ?
Random – chosen entirely by chance
Systematic – selected at regular intervals
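Stratified – population divided into subgroups that share a characteristic first, then members sampled from every subgroup
Clustered – population divided into subgroups (clusters), then whole clusters chosen at random to form the sample
Convenience – whoever is easiest to reach, e.g. the first volunteers through the door
Quota – a set number is needed from each subgroup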
Purposive – relies on the judgment of the collector
Snowball – one chosen person recruits the next person
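A rough sketch of the four probability-based methods (random, systematic, stratified, clustered) using Python's `random` module. The population of 100 people, the `year` subgroup label and the sample sizes are all hypothetical, chosen only to make the differences visible.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people, each tagged with a year group (1 to 4).
population = [{"id": i, "year": 1 + i % 4} for i in range(100)]

# Random: every member has an equal chance of selection.
random_sample = rng.sample(population, 12)

# Systematic: every k-th member after a random starting point.
k = len(population) // 12
start = rng.randrange(k)
systematic_sample = population[start::k][:12]

# Stratified: split into subgroups by year, then sample from every subgroup.
strata = {}
for person in population:
    strata.setdefault(person["year"], []).append(person)
stratified_sample = [p for group in strata.values() for p in rng.sample(group, 3)]

# Clustered: pick whole subgroups at random and use everyone in them.
chosen_years = rng.sample(sorted(strata), 2)
cluster_sample = [p for year in chosen_years for p in strata[year]]

print(len(random_sample), len(systematic_sample),
      len(stratified_sample), len(cluster_sample))
```

Stratified sampling takes a few people from every year group, whereas clustered sampling takes everyone from only some year groups.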
what is a sampling distribution ?
the distribution of a statistic (such as the mean) across many different samples drawn from the same population
what is the central limit theorem ?
It says that if the sample size is large enough, the sampling distribution of the mean is approximately a normal distribution, even when the population itself is not normally distributed.
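A small simulation sketch of the theorem, assuming a deliberately skewed (exponential) population with mean 1 and SD 1. The sample sizes of 2 and 50 and the 2000 repeats are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Draw many samples from a skewed population and return each sample's mean."""
    # Exponential population with mean 1 and SD 1: clearly not bell shaped.
    return [
        mean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def share_within_one_sd(values):
    """Fraction of values within one SD of their mean (about 0.68 if normal)."""
    centre, spread = mean(values), stdev(values)
    return sum(abs(v - centre) <= spread for v in values) / len(values)

means_small = sample_means(sample_size=2)
means_large = sample_means(sample_size=50)

for label, means in (("n=2", means_small), ("n=50", means_large)):
    print(label, round(mean(means), 3), round(stdev(means), 3),
          round(share_within_one_sd(means), 3))
```

With the larger sample size the means centre on the population mean, their spread shrinks towards SD / √n, and the share within one SD moves towards the 68% expected of a normal distribution.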
what is a confidence interval ?
a range of values, calculated from sample data, that is likely to contain the true population parameter (like the mean) with a certain level of confidence, typically 95%
Sample size = a smaller sample gives a less precise, less representative estimate (wider interval); a larger sample gives a more precise one (narrower interval). A sample of 30 or more is usually considered large.
Variation = if the variation in the real population is high then the variation in the sample will also be high, which widens the interval
how do you calculate a confidence interval ?
Find:
the number of observations (n)
the mean (X)
and the standard deviation (s)
Decide what confidence interval we want: 95% or 99% are common choices. Then find the "Z" value for that confidence interval (1.96 for 95%, 2.576 for 99%)
Plug all the numbers into the equation: X ± Z × s / √n
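The steps above can be sketched in Python as follows. The function name `z_confidence_interval` and the 40 data values are made up for illustration; the Z value is computed with the standard library rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for the mean using X ± Z × s / √n."""
    n = len(data)          # number of observations
    x_bar = mean(data)     # sample mean
    s = stdev(data)        # sample standard deviation
    # Z value leaving (1 - confidence) / 2 in each tail, about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Illustrative sample of 40 made-up measurements.
data = [60 + (i * 7) % 11 for i in range(40)]

low95, high95 = z_confidence_interval(data, confidence=0.95)
low99, high99 = z_confidence_interval(data, confidence=0.99)

print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

The 99% interval is wider than the 95% interval because a larger Z value is needed to be more confident of capturing the true mean.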
when do you use the t score ?
use instead of the z value when calculating a confidence interval
More suitable for sample sizes of under 30.
need the degrees of freedom (n - 1) to look up the t value for the chosen confidence level
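A sketch of the same interval calculation with a t value in place of Z. The small lookup `T_95` is only an excerpt of a standard two-tailed t table at 95% confidence, and the function name and the 10 sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Excerpt of a standard two-tailed t table for 95% confidence,
# keyed by degrees of freedom (df = n - 1).
T_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}

def t_confidence_interval_95(data):
    """Return (lower, upper) for the mean using X ± t × s / √n at 95%."""
    n = len(data)
    df = n - 1  # degrees of freedom
    if df not in T_95:
        raise ValueError(f"no t value stored for df={df}; consult a full t table")
    margin = T_95[df] * stdev(data) / sqrt(n)
    return mean(data) - margin, mean(data) + margin

# Illustrative small sample of 10 made-up measurements (df = 9).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = t_confidence_interval_95(sample)

print(f"95% CI (t, df=9): {low:.2f} to {high:.2f}")
```

The t values are all larger than 1.96, so for small samples the interval is wider than the z version would give.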