Discovering statistics Flashcards
what does the SPINE of statistics stand for
Standard error, Parameters, Interval estimates, Null hypothesis significance testing, Estimation
general linear model
outcome = b0 + b1(predictor) + error
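a minimal R sketch of fitting this model (the data frame and variable names are hypothetical):
set.seed(1)
dat <- data.frame(predictor = rnorm(100))
dat$outcome <- 2 + 0.5 * dat$predictor + rnorm(100)   # simulate b0 = 2, b1 = 0.5, plus error
fit <- lm(outcome ~ predictor, data = dat)
summary(fit)   # b0 is the (Intercept) estimate, b1 is the predictor estimate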
chi-squared test
chisq.test(data$variable1, data$variable2, correct = FALSE)
for categorical or count data
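a minimal usage sketch (the variables are hypothetical; correct = FALSE turns off Yates' continuity correction):
set.seed(1)
training <- sample(c("reward", "punishment"), 200, replace = TRUE)
danced   <- sample(c("yes", "no"), 200, replace = TRUE)
chisq.test(table(training, danced), correct = FALSE)   # association between two categorical variables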
spearman correlation
data %>% correlation::correlation(., method = "spearman")
for ordinal or non-normally distributed continuous data
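an equivalent base-R sketch with hypothetical vectors, in case the correlation package isn't installed:
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)
cor.test(x, y, method = "spearman")   # Spearman's rho with a p-value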
what do the parts of GLM stand for
- b0 = estimated value of the outcome when the predictor = 0 (the intercept)
- b1 = the difference in means when the linear model compares two categorical groups
- bn = the parameter estimate for predictor n: the direction/strength of its effect, or a difference in means
least squares estimation
- when there are no predictors, we predict the outcome from the intercept alone
- outcome = b0 + e
- b0 will be the mean value of the outcome in this scenario
- given data, estimate the mean
- rearrange the equation -> error = outcome - b0
- square the errors, sum them, and plot the total against the estimate
- repeat for different estimates of the mean
- the lowest point of the curve is the least squares estimate (the minimum squared error); see the sketch below
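a minimal R sketch of this procedure (the scores are hypothetical):
scores <- c(3, 5, 4, 6, 7)                  # hypothetical outcome data
candidates <- seq(2, 8, by = 0.1)           # candidate estimates of b0
sse <- sapply(candidates, function(b0) sum((scores - b0)^2))
plot(candidates, sse, type = "l", xlab = "estimate of b0", ylab = "sum of squared errors")
candidates[which.min(sse)]                  # lowest point: equals mean(scores) = 5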
standard error
- frequency distribution -> plot sample means against their frequency (the sampling distribution)
- the SD of the sampling distribution is called the standard error; a narrower sampling distribution means a smaller standard error
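a minimal simulation sketch (the population mean and SD are hypothetical):
set.seed(42)
sample_means <- replicate(10000, mean(rnorm(25, mean = 100, sd = 15)))
sd(sample_means)    # empirical standard error of the mean
15 / sqrt(25)       # analytic SE = sigma / sqrt(n) = 3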
central limit theorem
as sample size increases, the sampling distribution of the mean approaches a normal distribution, with the majority of sample means close to the population mean
normal distribution
the range within ±1.96 SD of the mean contains 95% of the data
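a quick check in R:
qnorm(0.975)                 # ~1.96: the cut-off leaving 2.5% in each tail
pnorm(1.96) - pnorm(-1.96)   # ~0.95: proportion of scores within ±1.96 SD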
confidence intervals
express estimates as intervals constructed so that, across repeated samples, the population value lies within them a known proportion of the time
for a 95% CI, 95% of such intervals contain the true population parameter (not a 95% chance that any single interval does)
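a minimal sketch of getting 95% CIs for model parameters (the data are hypothetical):
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rnorm(100)
fit <- lm(y ~ x, data = dat)
confint(fit, level = 0.95)   # interval estimates for b0 and b1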
interpreting parameter estimates
the raw effect size is the b estimate itself, in the outcome's original units
the standardised effect size comes from fitting the model to data converted to z-scores, so estimates are expressed in standard-deviation units
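a minimal sketch of the standardised version (hypothetical data):
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rnorm(100)
coef(lm(scale(y) ~ scale(x), data = dat))   # slope is now in SD units (a standardised beta)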
long run probability: parameters represent effects
relationships between variables
differences in means
long run probability: parameters reflect hypotheses
h0 : b = 0, or b1 = b2
h1 : b ≠ 0, or b1 ≠ b2
long run probability: test statistic
t = b / SE(b)
we can work out how likely a value this large would be if the null hypothesis were true
plot: value of t on the x-axis, probability density on the y-axis
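a minimal sketch with hypothetical numbers:
b  <- 0.52                                  # parameter estimate
se <- 0.21                                  # its standard error
n  <- 100                                   # sample size; df = N - p, with p = 2 here
t  <- b / se
2 * pt(abs(t), df = n - 2, lower.tail = FALSE)   # two-tailed p: how likely a |t| this large is if b = 0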
type 1 error
reject null when it is true
believing in effects that don't exist
type 2 error
accept the null when it is false (missing effects that do exist)
statistical power
probability of a test avoiding a type 2 error (power = 1 − β)
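a minimal base-R sketch: solving for the sample size per group that gives 80% power (the effect size is hypothetical):
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)   # n is left blank, so it is solved for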
problems with null hypothesis testing
- does not tell us the importance of an effect
- provides little evidence about the null hypothesis
- encourages all-or-nothing thinking
- based on long-run probability
problem with long run probability
p is the relative frequency of the observed test statistic relative to all test statistics from an infinite number of identical experiments with exactly the same a priori sample size
in any single experiment the type 1 error rate is either 0 or 1 (the error is either made or not); the 5% rate only holds in the long run
comparing sums of squares
- sums of squares represent total error
- only compare totals when they are based on the same number of scores
illusory truth effect
repetition increases perceived truthfulness
equally true for plausible and implausible statements
SSt
- total variability between the scores and the mean
- SSt = SSm + SSr
- each SS has an associated df
- dfT = N - p (p = number of parameters estimated, N = number of independent pieces of information)
SSr
- total residual/error variability
- error in model
- to get SSr we estimate it using two parameters (b0 and b1)
- dfR = N - p, so here p = 2
SSm
- total model variability
- improvement due to model
- the estimated model is a rotation of the null model (the flat line at the mean)
- the null and estimated models are distinguished by b1
- dfM = dfT - dfR
mean squared error
- the sum (total amount) of squared errors depends on the amount of information used to compute it
- we can't compare sums based on different amounts of information
- MSr = SSr/dfR (average residual error)
- MSm = SSm/dfM (average model variability)
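a minimal sketch tying SS, df, and MS together (the data are hypothetical; the resulting F-ratio matches summary(fit)):
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rnorm(100)
fit <- lm(y ~ x, data = dat)
sst <- sum((dat$y - mean(dat$y))^2)   # total variability
ssr <- sum(residuals(fit)^2)          # residual variability
ssm <- sst - ssr                      # improvement due to the model
msr <- ssr / fit$df.residual          # MSr, with dfR = N - 2
msm <- ssm / 1                        # MSm, with dfM = dfT - dfR = 1
msm / msr                             # F-ratio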