Analysing data Flashcards
Greek symbols
population mean: µ
sample mean: x̄
population mean estimate: μ̂
SD: σ
normal distribution
- bell-shaped curve
- peak is its mean
- mean, median and mode have the same value
- centring; changing mean, shifting curve left/right
- SD determines steepness of curve
- scaling; changing SD
- 68.3% of data within +/- 1 SD of mean
- 95.4% of data within +/- 2 SD of mean
- 99.7% of data within +/- 3 SD of mean
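The three proportions above can be checked numerically. This is a minimal sketch using only Python's standard library; the function name `proportion_within` is made up for illustration.

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {proportion_within(k):.4f}")
```

The result does not depend on the mean or SD of the distribution, because centring and scaling only relabel the x-axis.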
critical values
if SD is known, can calculate critical value for any proportion of normally distributed data
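A sketch of how a critical value could be computed, assuming a normal distribution and using `statistics.NormalDist` from the standard library. The function name `critical_value` and the example mean of 100 and SD of 15 are hypothetical.

```python
from statistics import NormalDist

def critical_value(proportion: float) -> float:
    """z such that the central `proportion` of a standard normal lies within +/- z."""
    tail = (1 - proportion) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

z95 = critical_value(0.95)
print(f"z for central 95%: {z95:.2f}")   # 1.96

# Convert to the variable's own units: mean +/- z * SD (hypothetical mean and SD).
mean, sd = 100.0, 15.0
print(f"central 95% of data: {mean - z95 * sd:.1f} to {mean + z95 * sd:.1f}")  # 70.6 to 129.4
```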
sampling from distributions
- collecting data on variable includes randomly sampling from distribution
- underlying distribution assumed to be normal
- some variables may come from other distributions: log-normal distribution, Poisson distribution, binomial distribution
- sample statistics differ from population parameters
- sampling distribution centred around population mean
standard error
standard deviation of sampling distribution
estimated from any sample
SE = SD/ square root of N
gauge accuracy of parameter estimate in sample
smaller SE, more likely parameter estimate is close to population parameter
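A minimal sketch of the SE formula applied to one sample. The scores are made-up numbers and `standard_error` is an illustrative name, not part of any library.

```python
import math
import statistics

def standard_error(sample) -> float:
    """Estimate the SE of the mean as sample SD / sqrt(N)."""
    sd = statistics.stdev(sample)        # sample SD (N - 1 in the denominator)
    return sd / math.sqrt(len(sample))

scores = [4, 8, 6, 5, 3, 7, 9, 6]        # hypothetical data: mean 6, SD 2
print(f"SE = {standard_error(scores):.3f}")  # SE = 0.707
```

Because N sits under a square root, quadrupling the sample size only halves the SE.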
central limit theorem
- sampling distribution of mean is approximately normal, true no matter shape of population distribution
- as N gets larger, sampling distribution of sample mean tends towards normal distribution
- sampling distribution of the mean has mean = µ and SD = σ / square root of N (the standard error)
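The theorem can be illustrated with a small simulation. This sketch assumes an exponential population (mean 1, SD 1, strongly right-skewed), chosen only because it is clearly non-normal; the function name, seed and sample counts are arbitrary.

```python
import math
import random
import statistics

def simulate_sample_means(n: int, n_samples: int = 5000, seed: int = 1) -> list:
    """Draw `n_samples` samples of size `n` and return the mean of each."""
    rng = random.Random(seed)
    # Exponential population with rate 1: mean 1, SD 1, not normal.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

for n in (2, 10, 50):
    means = simulate_sample_means(n)
    print(
        f"N={n:>2}  mean of sample means={statistics.fmean(means):.3f}  "
        f"SD of sample means={statistics.stdev(means):.3f}  "
        f"SD/sqrt(N)={1 / math.sqrt(n):.3f}"
    )
```

The mean of the sample means should stay close to the population mean of 1, and their SD should track SD / square root of N as N grows.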
point estimates
- single numbers that are best guesses about corresponding population parameters
- central tendency, measures of spread
- relationships between variables can be expressed using point estimates
what does SE of mean express?
- uncertainty about relationship between sample and population mean
- sample mean is best estimate of population mean, true for all point estimates
interval estimates
- communicate uncertainty around point estimate
- indicates how confident we can be that estimate is representative of population parameter
confidence interval (CI)
- using SE and sampling distribution to calculate CI with certain coverage
- 95% CI: across repeated samples, 95% of intervals built around the sample estimates will contain value of population parameter
- 95% of sampl. distr. lies within +/- 1.96 SE, so 95% CI for pop. mean is mean +/- 1.96 SE
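A sketch of what "95% coverage" means in practice: build a 95% CI as mean +/- 1.96 SE from many simulated samples and count how often the interval contains the true mean. The population values (mean 50, SD 10), sample size and seed are assumptions for illustration; N = 100 is used so that the normal critical value 1.96 is a reasonable approximation.

```python
import math
import random
import statistics

Z_95 = 1.96  # normal critical value for the central 95%

def ci95(sample):
    """Return (lower, upper) for mean +/- 1.96 * SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - Z_95 * se, mean + Z_95 * se

def coverage(mu=50.0, sigma=10.0, n=100, repeats=2000, seed=7) -> float:
    """Share of simulated 95% CIs that contain the true population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = ci95(sample)
        hits += lower <= mu <= upper
    return hits / repeats

print(f"coverage: {coverage():.3f}")
```

The printed share should land near 0.95; it will not be exactly 0.95 because the simulation is finite and the SD is estimated from each sample.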
t-distribution
- used when the population SD is unknown and has to be estimated from the sample
- symmetrical and centred around 0
- shape changes based on degrees of freedom
- ‘fat tailed’ when df=1; identical to normal dist. when df=infinite
- as df increases, tails get thinner
- critical value changes based on df
- df = N - 1 for a mean (N is the sample size; 1 is the number of estimated parameters)
what you need to calculate confidence intervals
estimated mean
sample SD
N
critical value from t-distribution with df = N - 1
- 95% CI around estimated pop. mean is mean +/- critical value x SE
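The ingredients above can be combined as follows. This is a sketch only: the standard library has no t-distribution, so the two-tailed 95% critical values for a few df are copied from standard t tables, and the data are made up. The names `T_CRIT_95` and `t_ci95` are illustrative.

```python
import math
import statistics

# Two-tailed 95% critical values of the t-distribution for selected df,
# taken from standard t tables (rounded to 3 decimals).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def t_ci95(sample):
    """95% CI for the mean: mean +/- t critical value * SE, with df = N - 1."""
    n = len(sample)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for df={df}")
    mean = statistics.fmean(sample)                 # estimated mean
    se = statistics.stdev(sample) / math.sqrt(n)    # sample SD / sqrt(N)
    margin = T_CRIT_95[df] * se
    return mean - margin, mean + margin

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # hypothetical data, N = 10
lower, upper = t_ci95(sample)
print(f"95% CI: {lower:.3f} to {upper:.3f}")        # 95% CI: 12.369 to 14.631
```

With df = 9 the critical value is 2.262 rather than 1.96, so the interval is wider than the normal-based one. If SciPy is available, `scipy.stats.t.ppf(0.975, df)` gives the critical value for any df.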
CIs are useful:
- width of interval tells us how much we expect the mean of a different sample of the same size to vary from the one we got
- in the long run, x% of x% CIs computed from repeated samples contain the true population mean
- can be calculated for any point estimate
hypothesis
- statement about something in terms of differences or relationships between things/people/groups
- must be testable
- about a single thing
levels of hypotheses
- conceptual: expressed in normal language on level of concepts/constructs
- operational: restates conceptual hypothesis in terms of how constructs are measured in given study
- statistical: translates operational hypothesis into language of mathematics