Analysing data Flashcards by Corinna Smith

Greek symbols

population mean: µ
sample mean: x̄
population mean estimate: μ̂
SD: σ

normal distribution

bell-shaped curve
peak is at its mean
mean, median and mode have the same value
centring: changing the mean shifts the curve left/right
SD determines the steepness (spread) of the curve
scaling: changing the SD makes the curve wider or narrower
68.3% of data within +/- 1 SD of mean
95.4% of data within +/- 2 SD of mean
99.7% of data within +/- 3 SD of mean
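
The coverage figures above can be checked numerically. A minimal sketch in base R using the standard normal CDF `pnorm()`; the helper name `within_k_sd` is made up for illustration.

```r
# Proportion of a normal distribution lying within k SDs of the mean.
# Hypothetical helper: area under the curve between -k and +k.
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

coverage <- sapply(1:3, within_k_sd)
round(coverage * 100, 1)  # percentages for 1, 2 and 3 SDs
```

Centring and scaling do not change these proportions, which is why the same figures apply to any normal distribution.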

critical values

if the SD (and mean) is known, can calculate the critical value for any proportion of normally distributed data
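
As a rough illustration, `qnorm()` returns the critical values that cut off a given proportion of a normal distribution. The mean of 100 and SD of 15 below are hypothetical numbers chosen only for the example.

```r
# Critical values enclosing the central 95% of a standard normal distribution
z_crit <- qnorm(c(0.025, 0.975))

# Rescale to a variable with a hypothetical mean of 100 and SD of 15
cutoffs <- 100 + z_crit * 15
cutoffs
```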

sampling from distributions

collecting data on a variable amounts to randomly sampling from a distribution
underlying distribution is often assumed to be normal
some variables may come from other distributions, e.g. log-normal, Poisson or binomial
sample statistics differ from the population parameters
sampling distribution of the mean is centred around the population mean

standard error

standard deviation of the sampling distribution
can be estimated from any sample
SE = SD / √N
gauges the accuracy of a parameter estimate from a sample
the smaller the SE, the more likely the parameter estimate is close to the population parameter
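
A minimal sketch of the SE formula in base R; the scores in `x` are made-up data for illustration.

```r
# Hypothetical sample of eight scores
x <- c(4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4)

N  <- length(x)
se <- sd(x) / sqrt(N)  # SE = SD / sqrt(N)
se
```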

central limit theorem

sampling distribution of the mean is approximately normal when N is large enough, whatever the shape of the population distribution
as N gets larger, the sampling distribution of the sample mean tends towards a normal distribution
its mean is µ and its SD (the standard error) is σ / √N
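
The theorem can be illustrated with a small simulation. This sketch assumes a skewed (exponential) population with mean 1 and SD 1; the sample size, number of replications and seed are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

N <- 50  # size of each sample
# Draw 5000 samples from a skewed population and keep each sample mean
sample_means <- replicate(5000, mean(rexp(N, rate = 1)))

mean(sample_means)  # should be close to the population mean, 1
sd(sample_means)    # should be close to sigma / sqrt(N)
1 / sqrt(N)         # theoretical standard error
```

Plotting `hist(sample_means)` would show a roughly bell-shaped histogram even though the population itself is strongly skewed.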

point estimates

single numbers that are best guesses about corresponding population parameters
e.g. measures of central tendency and measures of spread
relationships between variables can also be expressed using point estimates

what does SE of mean express?

uncertainty about how close the sample mean is to the population mean
the sample mean is the best estimate of the population mean; the same logic holds for all point estimates

interval estimates

communicate uncertainty around a point estimate
indicate how confident we can be that the estimate is representative of the population parameter

confidence interval (CI)

uses the SE and the sampling distribution to calculate an interval with a certain coverage
for a 95% CI, 95% of intervals constructed around sample estimates in this way will contain the value of the population parameter
95% of the sampling distribution lies within +/- 1.96 SE, so the 95% CI for the population mean is mean +/- 1.96 × SE
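
Coverage can be checked by simulation. This sketch assumes a normal population with a hypothetical mean of 100 and a known SD of 15, so the 1.96 multiplier applies directly; all numbers are illustrative.

```r
set.seed(42)  # arbitrary seed for reproducibility

mu <- 100; sigma <- 15; N <- 30  # hypothetical population and sample size
se <- sigma / sqrt(N)            # SE when the population SD is known

# For each of 2000 samples, check whether mean +/- 1.96 * SE captures mu
covers <- replicate(2000, {
  m <- mean(rnorm(N, mean = mu, sd = sigma))
  (m - 1.96 * se) <= mu && mu <= (m + 1.96 * se)
})

mean(covers)  # proportion of intervals containing mu; should be near 0.95
```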

t-distribution

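used when the sampling distribution is not known exactly, e.g. because the population SD has to be estimated from the sample
symmetrical and centred around 0
shape changes based on degrees of freedom (df)
'fat-tailed' when df = 1; identical to the normal distribution when df = infinity
as df increases, tails get thinner
critical value changes based on df
df = N - 1 for a single mean (N is the sample size, 1 is the number of estimated parameters)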
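
How the critical value depends on df can be seen with `qt()`. A small sketch; the df values are arbitrary.

```r
dfs <- c(1, 5, 30, 1000)

t_crit <- qt(0.975, df = dfs)  # two-tailed 95% critical values for each df
z_crit <- qnorm(0.975)         # normal critical value for comparison

data.frame(df = dfs, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t critical values shrink towards the normal value as df grows, reflecting the thinner tails.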

what you need to calculate confidence intervals

estimated mean
sample SD
N
critical value from the t-distribution with df = N - 1

95% CI around the estimated population mean is mean +/- critical value × SE
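
Putting the ingredients together, a 95% CI could be computed as below. The data in `x` are hypothetical; the result is cross-checked against the interval that `t.test()` reports.

```r
# Hypothetical sample of scores
x <- c(12, 15, 9, 14, 11, 13, 10, 16)

N        <- length(x)
est_mean <- mean(x)                 # estimated mean
se       <- sd(x) / sqrt(N)         # sample SD / sqrt(N)
crit     <- qt(0.975, df = N - 1)   # critical t value for a 95% CI

ci <- c(lower = est_mean - crit * se, upper = est_mean + crit * se)
ci

# Cross-check: t.test() computes the same interval
t.test(x)$conf.int
```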

CIs are useful:

width of the interval tells us how much we expect the mean of a different sample of the same size to vary from the one we got
x% of all x% CIs computed from repeated samples contain the true population mean
can be calculated for any point estimate

hypothesis

statement about something in terms of differences or relationships between things/people/groups
must be testable
about a single thing

levels of hypotheses

conceptual: expressed in normal language on level of concepts/constructs
operational: restates conceptual hypothesis in terms of how constructs are measured in given study
statistical: translates operational hypothesis into language of mathematics

operationalisation

process of defining variables in terms of how they are measured
e.g. intelligence as the total score on Raven's Progressive Matrices

Statistical hypothesis

operational hypothesis in terms of language of maths
deals with specific values of population parameters
mean of population can be hypothesised to be of given value
can hypothesise a difference in means between two populations

problems with samples used to test hypotheses

may not be representative of the population
the larger the sample the better: random fluctuations matter less as N increases
sample means converge to the true value of the population mean as N increases
CIs get narrower as N increases (width shrinks in proportion to 1/√N)
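
The effect of N on interval width can be sketched directly from the CI formula. The sample SD of 10 and the sample sizes are hypothetical values for illustration.

```r
sample_sd <- 10            # hypothetical sample SD, held fixed
Ns <- c(25, 100, 400)      # each sample is four times larger than the last

# Half-width of the 95% CI: critical t value * SD / sqrt(N)
half_width <- qt(0.975, df = Ns - 1) * sample_sd / sqrt(Ns)
half_width

# Ratio of each half-width to the previous one
ratios <- half_width[-1] / half_width[-length(half_width)]
ratios
```

Quadrupling N roughly halves the width, which is the 1/√N pattern rather than an exponential one.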

null hypothesis

states there is no difference

used to test for statistical significance

distribution of test statistic under H0

even if the true difference in the population (delta) is zero, the difference D observed in a sample can be non-zero
assume A is normally distributed in the population with µ = 0 and σ = 1: the expected value of D under H0 is 0, but more often than not D will not equal 0 in a sample
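
A simulation makes this concrete. The sketch assumes two groups of 20 drawn from the same standard normal population, so the true difference is zero; the group size, number of replications and seed are arbitrary.

```r
set.seed(7)  # arbitrary seed for reproducibility

n_per_group <- 20
# Difference in sample means, D, for 5000 pairs of samples drawn under H0
D <- replicate(5000, mean(rnorm(n_per_group)) - mean(rnorm(n_per_group)))

mean(D)       # average D across samples: close to the expected value of 0
mean(D == 0)  # proportion of samples where D is exactly 0
sd(D)         # spread of D under H0
```

`hist(D)` would show the distribution of the statistic under H0, centred on 0.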

what is a p-value

the probability of getting a test statistic at least as extreme as the one observed if the null hypothesis is true, i.e. how likely the data are if there is no difference/effect in the population
if the p-value is less than the chosen significance level, call the result statistically significant
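
As a sketch of the definition, a two-tailed p-value can be read off the t-distribution with `pt()`. The observed t of 2.3, the df of 18 and the significance level of 0.05 are hypothetical values.

```r
t_obs <- 2.3   # hypothetical observed test statistic
df    <- 18    # hypothetical degrees of freedom
alpha <- 0.05  # chosen significance level

# Probability of a t at least as extreme as t_obs, in either tail, under H0
p_value <- 2 * pt(-abs(t_obs), df = df)
p_value

p_value < alpha  # TRUE means the result is called statistically significant
```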

retain or reject null

reject the null hypothesis when we judge our result to be unlikely under H0
retain H0 if we judge the result to be likely under it

continuous data

matter of degree eg how much
score or measurement
makes sense to have mean value

categorical data

matter of membership eg which group?
group or label
membership is binary

for each statistical analysis we need:

data
test statistic
distribution of the test statistic
probability of the value of the test statistic under the null hypothesis

correlation

quantifies degree and direction of a numeric relationship
used with two or more continuous variables, or if one is categorical
use the Pearson correlation coefficient (r)
only use the word 'correlated' when reporting r as evidence

what code in R is used to get Pearson's correlation

library(dplyr)
data %>% select(variable_1, variable_2) %>% cor(method = 'pearson')
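
For a runnable illustration in base R, the sketch below uses the built-in `mtcars` dataset as a stand-in for `data`, with `wt` and `mpg` as the two variables (an assumption made purely for the example). `cor.test()` additionally returns a confidence interval and p-value for r.

```r
# Pearson correlation between car weight and fuel efficiency
r <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
r

# Same estimate plus a significance test and 95% CI
result <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
result$estimate
result$conf.int
result$p.value
```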

what can you suggest when confidence intervals overlap

the two estimates may share the same population value

chi squared test

quantifies relationship between two or more categorical variables
compare observed counts with what we would expect under the null and calculate X^2 to quantify the discrepancy
only use X^2 if the expected count is at least 5 in each cell
the bigger the X^2 value, the bigger the difference between our data and what we expect under the null
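
The calculation can be sketched by hand and checked against `chisq.test()`. The 2 × 2 table of counts is made up; `correct = FALSE` switches off the continuity correction so the two values match.

```r
# Hypothetical observed counts: group (rows) by answer (columns)
observed <- matrix(c(30, 20,
                     15, 35),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group = c("A", "B"), answer = c("yes", "no")))

# Expected counts under the null: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
all(expected >= 5)  # check the cell-size condition

# X^2: squared differences between observed and expected, scaled by expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

test <- chisq.test(observed, correct = FALSE)
test$statistic
test$p.value
```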

important note about chi squared

the test only assesses how likely the data are if the null hypothesis is true; it provides no direct evidence for the alternative

using t distribution

t is the difference in sample means relative to the standard error of the difference in means
the larger the t, the bigger the difference between sample means compared to error
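
A sketch of this ratio for two independent groups, assuming equal variances (a pooled SE). The scores are hypothetical, and the result is compared with `t.test()`.

```r
# Hypothetical scores for two independent groups
group_a <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7)
group_b <- c(6.2, 6.8, 5.9, 7.1, 6.5, 6.4)
n_a <- length(group_a); n_b <- length(group_b)

mean_diff <- mean(group_a) - mean(group_b)

# Pooled variance and the standard error of the difference in means
pooled_var <- ((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) / (n_a + n_b - 2)
se_diff    <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual <- mean_diff / se_diff
t_manual

# Cross-check against the equal-variance t-test
t.test(group_a, group_b, var.equal = TRUE)$statistic
```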

t and r

p-value for r comes from the t-distribution
t can be converted into r
t quantifies the difference in means between two groups
r quantifies the degree and direction of the relationship between two variables
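
The link between r and t can be sketched with the standard conversion t = r × √df / √(1 − r²), where df = N − 2. The built-in `mtcars` variables are used purely as example data.

```r
x <- mtcars$wt
y <- mtcars$mpg

r  <- cor(x, y)
df <- length(x) - 2  # df for a correlation between two variables

t_from_r <- r * sqrt(df) / sqrt(1 - r^2)       # convert r to t
r_from_t <- t_from_r / sqrt(t_from_r^2 + df)   # and back again
p_from_t <- 2 * pt(-abs(t_from_r), df = df)    # two-tailed p from the t-distribution

# Cross-check against cor.test()
check <- cor.test(x, y)
c(t_manual = t_from_r, t_cor_test = unname(check$statistic))
```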

what is a predictor

variable that may have relationship with outcome

what is an outcome

variable we want to explain
outcome = model + error

linear model

fits a linear model between an outcome variable and a predictor variable in the dataset
look at lm() %>% summary()
R^2 is the proportion of variance in the outcome explained by the predictor
adjusted R^2 estimates R^2 if the same model were applied to the population
R^2 and adjusted R^2 should ideally be similar and large

R code for linear model

lm(outcome ~ predictor, data = data)

equation of linear model

outcome = b0 + b1 × predictor1 + error
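
The lm() call and the equation can be tied together in a short sketch. It assumes the built-in `mtcars` data, with `mpg` as the outcome and `wt` as the predictor; these choices are illustrative only.

```r
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)

b <- coef(model)  # b[1] is b0 (intercept), b[2] is b1 (slope)
b

# outcome = b0 + b1 * predictor + error
predicted <- b[1] + b[2] * mtcars$wt  # the "model" part
error     <- mtcars$mpg - predicted   # the "error" part (residuals)

model_summary$r.squared      # proportion of variance in mpg explained by wt
model_summary$adj.r.squared  # adjusted R^2
```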

f statistic

F = (what the model can explain) / (what it can't explain)
ratio of variance explained relative to variance unexplained
ratio > 1 means the model can explain more than it can't
associated p-value tells how likely we are to find an F statistic as large as the one observed if the null is true
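
The ratio can be rebuilt from sums of squares and compared with what `summary()` reports. A sketch assuming a one-predictor model on the built-in `mtcars` data.

```r
model <- lm(mpg ~ wt, data = mtcars)

ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total variability in the outcome
ss_resid <- sum(residuals(model)^2)                 # what the model can't explain
ss_model <- ss_total - ss_resid                     # what the model can explain

df_model <- 1                   # one predictor
df_resid <- df.residual(model)  # N - 2 for this model

# F: explained variance per df relative to unexplained variance per df
F_manual <- (ss_model / df_model) / (ss_resid / df_resid)
p_value  <- pf(F_manual, df_model, df_resid, lower.tail = FALSE)

F_manual
summary(model)$fstatistic  # value, numerator df, denominator df
```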

how to compare linear models

compare R^2 and the change in R^2
compare the F statistic and its associated p-value
look at standardised versions of b1
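
One way these comparisons might look in practice, sketched with two nested models on the built-in `mtcars` data. The choice of `wt` and `hp` as predictors is an assumption for illustration.

```r
m1 <- lm(mpg ~ wt, data = mtcars)       # simpler model
m2 <- lm(mpg ~ wt + hp, data = mtcars)  # adds a second predictor

# Change in R^2 when the extra predictor is added
r2_change <- summary(m2)$r.squared - summary(m1)$r.squared
r2_change

# F test of whether the larger model explains significantly more variance
comparison <- anova(m1, m2)
comparison
p_change <- comparison[["Pr(>F)"]][2]

# Standardised b values: refit the model on z-scored variables
mtcars_std <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
m2_std <- lm(mpg ~ wt + hp, data = mtcars_std)
coef(m2_std)
```

Standardised coefficients are expressed in SD units, so predictors measured on different scales can be compared directly.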