Analysing data Flashcards
Greek symbols
population mean: µ
sample mean: x̄
population mean estimate: μ̂
SD: σ
normal distribution
- bell-shaped curve
- peak is at its mean
- mean, median and mode have the same value
- centring: changing the mean shifts the curve left/right
- SD determines spread of curve (larger SD gives a wider, flatter curve)
- scaling: changing the SD
- 68.3% of data within +/- 1 SD of mean
- 95.4% of data within +/- 2 SD of mean
- 99.7% of data within +/- 3 SD of mean
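The proportions above can be checked numerically. A minimal sketch using Python's standard library; the function name `proportion_within` is made up for illustration:

```python
from statistics import NormalDist

# Standard normal (mean 0, SD 1); any normal distribution gives the same proportions.
z = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k_sd: float) -> float:
    """Proportion of a normal distribution lying within +/- k_sd SDs of the mean."""
    return z.cdf(k_sd) - z.cdf(-k_sd)

coverage = {k: proportion_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within +/- {k} SD: {p:.1%}")
```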
critical values
if SD is known, can calculate the critical value for any proportion of normally distributed data
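As a sketch of how a critical value follows from a proportion: the helper `critical_value` below is hypothetical and assumes a normal distribution centred on 0 with a known SD.

```python
from statistics import NormalDist

def critical_value(proportion: float, sd: float = 1.0) -> float:
    """Distance from the mean that encloses the central `proportion` of a
    normal distribution with the given SD (illustrative helper)."""
    tail = (1.0 - proportion) / 2.0          # area left in each tail
    return NormalDist(0.0, sd).inv_cdf(1.0 - tail)

z95 = critical_value(0.95)   # roughly 1.96 SDs
z99 = critical_value(0.99)   # roughly 2.58 SDs
print(round(z95, 2), round(z99, 2))
```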
sampling from distributions
- collecting data on a variable amounts to randomly sampling from a distribution
- underlying distribution is assumed to be normal
- some variables may come from other distributions: log-normal, Poisson, binomial
- sample statistics differ from population parameters
- sampling distribution of the mean is centred around the population mean
standard error
standard deviation of the sampling distribution
can be estimated from any sample
SE = SD / √N
gauges accuracy of a parameter estimate in the sample
the smaller the SE, the more likely the parameter estimate is close to the population parameter
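A short worked example of SE = SD / √N. The scores are made-up data and `standard_error` is an illustrative name:

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample SD / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Made-up scores, for illustration only (mean 5, sample SD about 1.22).
scores = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0, 6.0, 4.0, 5.0]
se = standard_error(scores)
print(f"SE = {se:.3f}")
```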
central limit theorem
- for large enough N, the sampling distribution of the mean is approximately normal, no matter the shape of the population distribution
- as N gets larger, the sampling distribution of the sample mean tends towards a normal distribution
- sampling distribution has mean = µ and SD = population SD / square root of N
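The theorem can be illustrated by simulation. This sketch assumes a skewed exponential population (mean 1, SD 1); the seed, N and number of samples are arbitrary choices:

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
N = 30                    # size of each sample
n_samples = 5000          # number of samples drawn

# Population: exponential with rate 1 (clearly non-normal, mean 1, SD 1).
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(N)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1.0 / math.sqrt(N)   # population SD / sqrt(N)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

The mean of the sample means should land close to the population mean, and their SD close to the predicted value.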
point estimates
- single numbers that are best guesses about corresponding population parameters
- central tendency, measures of spread
- relationships between variables can be expressed using point estimates
what does SE of mean express?
- uncertainty about relationship between sample and population mean
- sample mean is best estimate of population mean, true for all point estimates
interval estimates
- communicate uncertainty around a point estimate
- indicate how confident we can be that the estimate is representative of the population parameter
confidence interval (CI)
- uses SE and the sampling distribution to calculate a CI with a certain coverage
- 95% CI: 95% of intervals built this way around sample estimates (from repeated samples) will contain the value of the population parameter
- 95% of the sampling distribution lies within +/- 1.96 SE, so the 95% CI for the population mean is sample mean +/- 1.96 × SE
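A sketch of a normal-approximation interval, sample mean +/- 1.96 × SE. The data are made up and `normal_ci` is an illustrative name; with a sample this small a t-based interval would usually be preferred.

```python
import math
import statistics
from statistics import NormalDist

def normal_ci(sample, coverage=0.95):
    """CI for the mean using a normal critical value (about 1.96 for 95%)."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    crit = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mean - crit * se, mean + crit * se

# Made-up scores, for illustration only.
scores = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0, 6.0, 4.0, 5.0]
low, high = normal_ci(scores)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```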
t-distribution
- used when the sampling distribution is not known exactly (e.g. population SD has to be estimated from the sample)
- symmetrical and centred around 0
- shape changes based on degrees of freedom
- 'fat-tailed' when df = 1; identical to the normal distribution as df approaches infinity
- as df increases, tails get thinner
- critical value changes based on df
- df = N - number of estimated parameters; for a single mean, one parameter is estimated, so df = N - 1
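To see the tails thin out as df grows, the density of the t-distribution can be compared with the normal density at a point far from 0. A minimal sketch; `t_pdf` is a hand-written helper using the standard density formula, not a library function:

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # Work on the log scale so large df does not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

normal_at_3 = NormalDist().pdf(3.0)
for df in (1, 5, 30, 10000):
    print(f"df={df:>5}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"normal density at 3   = {normal_at_3:.5f}")
```

The density at 3 is largest for df = 1 and approaches the normal value as df increases.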
what you need to calculate confidence intervals
estimated mean
sample SD
N
critical value from t-distribution with df = N - 1
- 95% CI around estimated pop. mean is mean +/- critical value × SE
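Putting the four ingredients together: a sketch of a t-based 95% CI. The standard library has no t-distribution, so this assumes a small lookup of standard two-tailed 95% critical values; the data and the names `T_CRIT_95` and `t_ci_95` are illustrative.

```python
import math
import statistics

# Two-tailed 95% critical values of t for a few df (standard table values).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def t_ci_95(sample):
    """95% CI for the mean: mean +/- t critical value (df = N - 1) x SE."""
    n = len(sample)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabled critical value for df={df}")
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    margin = T_CRIT_95[df] * se
    return mean - margin, mean + margin

# Made-up data: N = 10, mean 13.5, SE 0.5.
data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
low, high = t_ci_95(data)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```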
CIs are useful:
- width of interval tells us how much we expect the mean of a different sample of the same size to vary from the one we got
- x% of all x% CIs computed from repeated samples contain the true population mean
- can be calculated for any point estimate
hypothesis
- statement about something in terms of differences or relationships between things/people/groups
- must be testable
- about a single thing
levels of hypotheses
- conceptual: expressed in normal language on level of concepts/constructs
- operational: restates conceptual hypothesis in terms of how constructs are measured in given study
- statistical: translates operational hypothesis into language of mathematics
operationalisation
- process of defining variables in terms of how they are measured
- e.g. intelligence as total score on Raven's Progressive Matrices
Statistical hypothesis
- operational hypothesis in terms of language of maths
- deals with specific values of population parameters
- mean of population can be hypothesised to be of given value
- can hypothesise a difference in means between two populations
problems with samples that test hypothesis
not representative of population
larger the sample the better as fluctuations become less important as N increases
means converge to true value of population mean as N increases
CIs get narrower as N increases, in proportion to 1 / square root of N
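Because the interval's half-width is critical value × SD / √N, quadrupling N only halves it. A quick sketch with an assumed SD of 10 and the normal critical value 1.96; `ci_half_width` is an illustrative name:

```python
import math

def ci_half_width(sd: float, n: int, crit: float = 1.96) -> float:
    """Half-width of a CI for the mean: critical value x SD / sqrt(N)."""
    return crit * sd / math.sqrt(n)

# Assumed SD of 10; each step multiplies N by 4.
widths = {n: ci_half_width(10.0, n) for n in (25, 100, 400)}
for n, w in widths.items():
    print(f"N={n:>3}: mean +/- {w:.2f}")
```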
null hypothesis
states there is no difference
used to test for statistical significance
distribution of test statistic under H0
even if the true difference in the population (delta) is zero, D can be non-zero in a sample
Assume A is normally distributed in the population with µ = 0 and σ = 1: the expected value of D under H0 is 0, but more often than not D will not equal 0 in a sample
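A simulation sketch of this point. It assumes D is the difference between the means of two groups drawn from the same normal population (µ = 0, SD = 1), so the null hypothesis is true by construction; group size, seed and number of repetitions are arbitrary.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable

def simulate_difference(n_per_group: int) -> float:
    """D = difference in sample means when both groups share one population."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    return statistics.fmean(a) - statistics.fmean(b)

diffs = [simulate_difference(20) for _ in range(2000)]
mean_diff = statistics.fmean(diffs)
n_exact_zero = sum(1 for d in diffs if d == 0.0)

print(f"average D over samples: {mean_diff:.3f}")
print(f"samples with D exactly 0: {n_exact_zero} of {len(diffs)}")
```

D averages out near 0 across samples, yet individual samples essentially never give exactly 0.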
what is a p-value
- the probability of getting a test statistic at least as extreme as the one observed if the null hypothesis is true, i.e. how likely data this extreme are if there is no difference/effect in the population
- if p-value is less than chosen significance level, call result statistically significant
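A minimal sketch of a two-sided p-value for a sample mean. It assumes a normal sampling distribution with a known population SD (a z-test); the numbers and the name `two_sided_p_value` are made up for illustration.

```python
import math
from statistics import NormalDist

def two_sided_p_value(sample_mean: float, mu0: float, sd: float, n: int) -> float:
    """P(test statistic at least as extreme as observed | null hypothesis true)."""
    se = sd / math.sqrt(n)
    z = (sample_mean - mu0) / se                    # test statistic
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # both tails

alpha = 0.05   # chosen significance level
# Made-up example: observed mean 103, null value 100, SD 15, N = 100 (z = 2).
p = two_sided_p_value(103.0, 100.0, 15.0, 100)
significant = p < alpha
print(f"p = {p:.4f}, statistically significant: {significant}")
```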
retain or reject null
reject the null hypothesis when we judge our result to be unlikely under H0
retain H0 if we judge the result to be likely under it
continuous data
- matter of degree, e.g. how much?
- score or measurement
- makes sense to have mean value
categorical data
- matter of membership, e.g. which group?
- group or label
- membership of each category is binary (in the group or not)
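The distinction shows up in how each type is summarised: a mean makes sense for continuous data, counts per category for categorical data. A small sketch with made-up values and variable names:

```python
from collections import Counter
from statistics import fmean

# Continuous: a measurement, so a mean is meaningful (made-up values).
reaction_times_ms = [512.0, 478.5, 530.2, 495.0]
mean_rt = fmean(reaction_times_ms)

# Categorical: a label, so count group membership instead (made-up values).
handedness = ["left", "right", "right", "right"]
counts = Counter(handedness)

print(f"mean reaction time: {mean_rt:.1f} ms")
print(f"group counts: {dict(counts)}")
```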