Discovering statistics Flashcards

1
Q

what does SPINE of statistics stand for

A
standard error
parameter
interval estimates
null hypothesis testing
estimation
2
Q

general linear model

A

outcome = b0 + b1(predictor) + error
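As a minimal sketch (hypothetical data), this is the model that R's lm() fits:

```r
# Hypothetical data; lm() estimates b0 (intercept) and b1 (slope)
data <- data.frame(predictor = c(1, 2, 3, 4, 5),
                   outcome   = c(2.1, 3.9, 6.2, 7.8, 10.1))
model_lm <- lm(outcome ~ predictor, data = data)
coef(model_lm)  # "(Intercept)" is b0; "predictor" is b1
```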

3
Q

chi squared test

A

chisq.test(data$variable, data$variable, correct = FALSE)

for categorical or count data

4
Q

spearman correlation

A

data %>% correlation::correlation(., method = "spearman")

for ranked (ordinal) or non-normal continuous data

5
Q

what do the parts of GLM stand for

A
b0 = estimated value of the outcome when predictor = 0
b1 = represents the difference in means if the linear model has two categorical groups
bn = estimate of the parameter for a predictor; the direction/strength of the effect, or a difference of means
6
Q

least squares estimation

A
  • when no predictors, we predict the outcome from the intercept alone
  • outcome = b0 + e
  • b0 will be the mean value of the outcome in this scenario
  • if given data, estimate the mean
  • rearrange the equation -> error = outcome - b0
  • square the errors and plot their sum against candidate values of b0
  • keep estimating the mean
  • the minimum of the curve is the least squared error
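The steps above can be sketched numerically with hypothetical scores: try candidate values of b0, sum the squared errors for each, and find the minimum.

```r
# Hypothetical scores; the sum of squared errors is minimised at the mean
outcome <- c(4, 6, 7, 5, 8)
candidates <- seq(3, 9, by = 0.1)
sse <- sapply(candidates, function(b0) sum((outcome - b0)^2))
candidates[which.min(sse)]  # 6, which equals mean(outcome)
```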
7
Q

standard error

A
  • frequency distribution -> plot sample means against frequency
  • the SD of the sampling distribution is called the standard error; a narrower sampling distribution means a smaller standard error
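From a single sample the standard error of the mean is estimated as sd/sqrt(n); a quick sketch with hypothetical scores:

```r
# Standard error of the mean from one sample (hypothetical scores)
x <- c(4, 6, 7, 5, 8)
sd(x) / sqrt(length(x))  # ≈ 0.71
```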
8
Q

central limit theorem

A

majority of scores cluster around the mean
the sampling distribution of the mean approaches a normal distribution
1.96 SD from the mean contains 95% of the data
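The 1.96 figure can be checked directly against the normal distribution:

```r
# Proportion of a normal distribution within 1.96 SDs of the mean
pnorm(1.96) - pnorm(-1.96)  # ≈ 0.95
```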

9
Q

confidence intervals

A

express estimates as intervals such that we expect the population value to lie in them
95% of intervals constructed this way contain the true population parameter

10
Q

interpreting parameter estimates

A

raw effect size is the beta estimate

standardised effect size fits the model to raw data converted to z-scores (expressed in standardised units)

11
Q

long run probability: parameters represent effects

A

relationships between variables

differences in means

12
Q

long run probability: parameters reflect hypotheses

A

h0 : b = 0, b1 = b2

h1 : b =/= 0, b1 =/= b2

13
Q

long run probability: test statistic

A

t = b/SEb
can work out how likely this value is if the null is true
value of t on the x axis and probability on the y axis
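A sketch of the computation with a hypothetical estimate and standard error (values invented for illustration):

```r
# Hypothetical values: b = 0.8, SE = 0.35, residual df = 28
b <- 0.8; se_b <- 0.35; df <- 28
t <- b / se_b        # test statistic
2 * pt(-abs(t), df)  # two-tailed p-value under the null
```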

14
Q

type 1 error

A

reject null when it is true

believe in effects that don't exist

15
Q

type 2 error

A

accept the null when it's false

16
Q

statistical power

A

probability of test avoiding type 2 error

17
Q

problems with null hypothesis testing

A
  • doesn't tell us the importance of an effect
  • gives little evidence about the null hypothesis
  • encourages all-or-nothing thinking
  • based on long run probability
18
Q

problem with long run probability

A

p is the relative frequency of the observed test statistic relative to all test statistics from an infinite number of identical experiments with the exact same a priori sample size
for any single experiment the type 1 error rate is either 0 or 1

19
Q

comparing sums of squares

A
  • sum of squares represents total error
  • only compare the totals when based on the same number of scores

20
Q

illusory truth effect

A

repetition increases perceived truthfulness

equally true for plausible and implausible statements

21
Q

SSt

A

-total variability between the mean and scores
-SSm + SSr
-each SSt has an associated df
dfT = N - p (p = number of parameters, N = number of independent pieces of information)

22
Q

SSr

A
  • total residual/error variability
  • error in model
  • to get SSr we estimate using ‘two’ parameters
  • dfR = N - P, so P is 2
23
Q

SSm

A
  • total model variability
  • improvement due to model
  • the model is a rotation of the null model
  • null and estimated model are distinguished by b1
  • dfM = dfT - dfR
24
Q

mean squared error

A
  • the sum/total amount of squared errors depends on the amount of information used to compute it
  • can't compare sums based on different amounts of info
  • MSr = SSr/df (average residual error)
  • MSm = SSm/df (average model variability)
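With hypothetical sums of squares, the mean squares (and the F-statistic of the next card) fall out directly:

```r
# Hypothetical sums of squares and degrees of freedom
ss_m <- 20; df_m <- 2   # model
ss_r <- 54; df_r <- 27  # residual
ms_m <- ss_m / df_m     # average model variability = 10
ms_r <- ss_r / df_r     # average residual error = 2
ms_m / ms_r             # F = 5
```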
25
Q

F statistic

A
  • testing fit
  • sig fit represents sig effect of experimental manipulation
  • if model results in better prediction than the mean then MSm > MSr
  • car::Anova(model_lm)
26
Q

testing the model

A
  • R^2: proportion of variance accounted for by the model
  • the squared Pearson correlation coefficient between observed and predicted scores
  • R^2 = SSm/SSt
  • adjusted R^2: estimate of R^2 in the population
    broom::glance(data_lm)
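A quick numeric sketch (hypothetical sums of squares):

```r
# R^2 as the proportion of total variability explained by the model
ss_m <- 20; ss_t <- 80
ss_m / ss_t  # 0.25: the model accounts for 25% of the variance
```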
27
Q

how to enter predictors

A
  • hierarchical (experimenter decides)
  • forced entry (all entered simultaneously)
  • stepwise (only used for exploratory analysis, predictors selected using semi partial correlation with outcome)
28
Q

influential case

A
  • outliers distort the linear model and estimates of beta values
  • detect them in: graphs, standardised residuals, Cook's distance, DFBeta statistics
  • ggplot2::autoplot(data_lm, which = 4, …) + theme_minimal() plots Cook's distance for each case to flag influential outliers
29
Q

robust estimation

A
  • used when we can't justify removing outliers from the model
  • lm_rob <- robust::lmRob(outcome ~ predictor, data = data)
  • summary(lm_rob)
30
Q

key assumptions of linear model

A

linearity (relationship between predictor and outcome is linear) and additivity (effects of predictors combine additively)
spherical errors (population model has homoscedastic and independent errors)
normality of errors

31
Q

errors vs residuals

A
  • model errors refer to differences between predicted and observed values of the outcome variable in the POPULATION model
  • residuals refer to differences between predicted and observed values of the outcome in the SAMPLE model
32
Q

spherical errors

A
  • should be independent
  • pop error in prediction for one case should not be related to error in prediction for another case
  • errors should be homoscedastic
  • violation of assumption
33
Q

homoscedasticity of errors

A

variance of pop errors should be consistent at different values of predicted variable

34
Q

violation of assumption

A

b's are unbiased but not optimal

standard errors are incorrect

35
Q

robust procedures

A

bootstrap -> standard errors derived empirically using a resampling technique; designed for small samples; gives robust b, p, and CIs
heteroscedasticity-consistent SEs -> use HC3 or HC4 methods

36
Q

dummy coding

A
  • code the control group with 0 and the other with 1
  • b for the dummy variable is the difference between the means of the two conditions
  • mean condition 1 = b0 + b1(0)
  • mean condition 2 - mean condition 1 = b1
  • dummy-coded comparisons aren't independent, as each one is tested against the same control group
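A small check (hypothetical two-group data) that b1 recovers the difference in means under dummy coding:

```r
# Control coded 0, experimental coded 1 (hypothetical scores)
data <- data.frame(group   = rep(c(0, 1), each = 4),
                   outcome = c(3, 4, 5, 4, 6, 7, 8, 7))
coef(lm(outcome ~ group, data = data))
# b0 = control mean (4); b1 = experimental mean - control mean (3)
```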
37
Q

contrast coding model

A
  • outcome = b0 + b1(contrast 1) + b2(contrast 2)
  • b0 is the value for the control/baseline
  • b1 is the difference encoded by contrast 1
  • b2 is the difference encoded by contrast 2
38
Q

planned contrasts

A

the variability explained by the model, SSm, is due to participants being assigned to different groups
this variability represents the experimental manipulation

39
Q

what to consider when choosing contrasts

A
  • independent - to control the type 1 error rate; if a group is singled out in one contrast it shouldn't be used again
  • only contrast 2 chunks of variation at a time
  • k - 1: end up with one fewer contrast than the number of groups
  • the first contrast compares the control to all experimental groups
40
Q

rules of coding planned contrasts

A

1 - groups coded with positive weights are compared to groups coded with negative weights
2 - the sum of the weights equals 0
3 - if a group is not used in a contrast, code it as 0
4 - the initial weight assigned equals the number of groups in the opposite chunk
5 - final weight = initial weight / number of groups with a non-zero weight
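Applying these rules to three groups (a control and two hypothetical experimental conditions) gives the weights below:

```r
# Contrast 1: control vs both experimental groups; contrast 2: exp1 vs exp2
# Each column's weights sum to 0, and the cross-products sum to 0 (orthogonal)
group <- factor(c("control", "exp1", "exp2"))
contrasts(group) <- cbind(ctrl_vs_exp  = c(-2/3, 1/3, 1/3),
                          exp1_vs_exp2 = c(0, -1/2, 1/2))
```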

41
Q

post hoc tests

A

in the absence of a hypothesis, compare all means
inflates the type 1 error rate
use bonferroni to correct
modelbased::estimate_contrasts(data_lm, adjust = "bonferroni")

42
Q

trend analysis

A

polynomial contrast
only for ordered groups
contrasts(data$predictor)

43
Q

comparing means

A
  • when we know an extraneous/confounding variable influences the outcome, adjust for it
  • reduce error variance by explaining some of the unexplained variance
  • gain greater insight into the effects of the predictor
44
Q

partitioning variance

A

total variance = variance explained by the predictor + unexplained variance
the covariate explains some otherwise-unexplained variance, and can overlap with the variance explained by the predictor

45
Q

how do you get beta estimates

A

broom::tidy(data_lm, conf.int = TRUE)

46
Q

adjusting means using predicted b values from broom::tidy

covariate and contrasts

A
  • use dummy coding and mean of covariate
  • outcome = b0 + b1(contrast 1) + b2(contrast 2) + b3(covariate)
  • outcome = 1.7 + 2.2(contrast 1) + 1.7(contrast 2) + 0.4(covariate)
  • code contrast 1 as 0, and covariate as its mean, outcome = 2.9
  • repeat with contrast 2 coded as 0
47
Q

unadjusted model

A

no covariate
predicted values are raw group means
the beta attached to contrast 1 is the difference between the means of the conditions within that contrast

48
Q

f statistic with multiple predictors

A

calculated from sums of squares
type 1: default in R; each predictor is evaluated taking into account previous predictors, so the order of predictors matters
type 3: each predictor is evaluated taking into account all other predictors; order doesn't matter

49
Q

code for type 3 sums

A

data_lm %>% car::Anova(., type = 3)

50
Q

bias in f-statistics: heterogeneity of regression

A
  • for the significance of the f-stat to be accurate, we assume the relationship between outcome and covariate is similar across groups
  • known as homogeneity of regression
  • when the assumption is met, the f stat is assumed to follow the f distribution, with a corresponding p-value
51
Q

factorial design

A

2 or more predictors have been manipulated

52
Q

moderator

A

acts on relationship between predictor and outcome

outcome = b0 + b1(predictor) + b2(moderator) + b3(predictor x moderator)

53
Q

interaction term

A

predictor x moderator
if the interaction term's p-value is significant you ignore all other rows
if it's significant there is a significant moderation effect
the parameter estimate quantifies the raw effect size of the interaction term

in factorial designs, the effect of the moderator is stronger in certain categories of the predictor

54
Q

fitting the model factorial design

A

afex::aov_4(outcome ~ predictor*moderator + (1|id), data = data) doesn’t show parameter estimates, diagnostic plots or robust methods, but afex_plot() plots the interaction

55
Q

why does the p-value not tell us anything about importance?

A

it depends upon sample size

56
Q

what is value range for fstat

A

0 to infinity; a value greater than 1 means the model explains more variance than it leaves unexplained

57
Q

what test is never used for normality

A

K-S test

58
Q

what does the assumption of normality primarily apply to?

A

sampling distribution of parameters

59
Q

what is an outlier

A

data point that is unrepresentative of relationship being investigated

60
Q

what do the f stat and its associated pvalue tell you

A
  • the ratio of variance explained by the model to residual variance
  • whether the model explains variance in the outcome better than the grand mean
  • the likelihood of obtaining a value this large if there were no true difference in group means
61
Q

characteristics of orthogonal contrasts

A

hypothesis driven
control type 1 error rate
planned a priori

62
Q

how does the f ratio change when using dummy, contrast or post hoc tests

A

it doesn't, as F looks at the model as a whole

63
Q

why do you use an interaction term in model?

A

expecting the effect of one predictor to vary as function of another predictor

64
Q

what is orthogonal

A

independent contrasts whose weights cross-multiply to 0 and sum to 0

65
Q

what is main effect

A

the effect of just one of the independent variables on the dependent variable
the effect of a predictor alone, ignoring all other predictors in the model

66
Q

assumption of sphericity

A

automatically met when variable has only two levels

if not met it is remedied by adjusting degrees of freedom by the degree to which data are not spherical

67
Q

interpret effects of interaction term

A

The extent to which the effect of variable A on the outcome depended on variable B, and vice versa

68
Q

types of variance

A

systematic-created by our manipulation

unsystematic-created by unknown factors

69
Q

benefits of repeated measure design

A

more sensitive - unsystematic variance is reduced, so more sensitive to experimental effects
more economical - fewer participants
drawback: possible fatigue effects

70
Q

repeated measures and linear model

A

all participants in all conditions, so scores correlate
violates the assumption of independent residuals
need to adjust the model to estimate this dependency:
outcome = b0j + b1j(predictor) + eij
b0j = b0 + u0j
b1j = b1 + u1j
u is the variability across different participants

71
Q

approaches to repeated measure and GLM

A

assume sphericity: estimate and correct for it

fit a multilevel growth model

72
Q

what is sphericity

A

the differences between pairs of conditions should have equal variances
assumption that the variances of these differences are the same between conditions
Greenhouse-Geisser estimate (e)
e = 1 means perfect sphericity

73
Q

what to do for sphericity

A

R multiplies the df by e to correct for the effect of sphericity
given that e quantifies deviation from perfect sphericity
the df get smaller, which makes it harder for the test statistic to be significant
routinely apply the Greenhouse-Geisser correction

74
Q

repeated measure linear model

A

afex::aov_4(outcome ~ predictor + (predictor|id), data = data)
ges is the effect size (generalised eta squared)

75
Q

how to set contrasts for self using afex and emmeans

A

data_cons <- emmeans::emmeans(model_afx, ~predictor, model = "multivariate")
data_cons

76
Q

robust model of repeated measures

A

WRS2::rmanova(y = data$outcome, groups = data$predictor, blocks = data$id)
gives f stat and df
use WRS2::rmmcp() for robust post hoc tests

77
Q

simple effects analysis

A

emmeans::joint_tests(data_afx, "predictor b")

effect of predictor a within predictor B

78
Q

post hoc test for repeated measures

A

pairs(int_emm, adjust = "holm")

don't do this if the interaction is non-significant

79
Q

mixed design contrasts

A

categorical predictors must be coded as contrast variables

extract them using emmeans::contrast()

80
Q

what are the group of methods that can be used in a mixed design

A

eff -> each category compared to average of all categories
pairwise -> each category compared to all others
poly -> polynomial contrasts
trt.vs.ctrl -> compares each category to a declared reference category, ref = x
consec -> compares each level/category to the previous

81
Q

code to look at main effect

A

only important if the interaction term is non-significant
emmeans::emmeans(data_afx, ~predictor, model = "multivariate")
look at each predictor separately: emmeans::emmeans(data_afx, c("predictor", "predictor"), model = "multivariate")

82
Q

how to adjust for sphericity

A

R multiplies the df by the value of epsilon, which makes the result more conservative

83
Q

what percent of variance in ‘festivity’ is explained by ‘film’

A

look at the ges column

a ges of 0.15 means 15%

84
Q

what element of model has largest effect size?

A

largest value of F

85
Q

why can't categorical outcomes be modelled with a normal linear model?

A

violates assumption of linearity

86
Q

model for predicting probability of outcome

A

ln(P(Y)/(1 - P(Y))) = b0 + b1(X) + e

the outcome is the log odds of the outcome occurring
b1 is the change in log odds of the outcome associated with a unit change in the predictor
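A log odds value can be converted back to a probability with the inverse logit:

```r
# Inverse logit: probability from log odds
log_odds <- 0
exp(log_odds) / (1 + exp(log_odds))  # 0.5: log odds of 0 means 50/50
```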

87
Q

log and exponents

A

log of 1 = 0

exponent of 0 = 1

88
Q

odds ratio: b0

A

the log odds of the outcome when the predictor is 0

exp(b0) gives the odds, which are easier to interpret

89
Q

odds ratio: b1

A
  • change in log odds of outcome associated with unit change in predictor
  • easier to interpret exp(b1), odds ratio associated with unit change in predictor
  • OR >1; as predictor increases probability of outcome increases
  • OR <1; as predictor increases, probability of outcome decreases
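A quick numeric illustration with an invented b1:

```r
# Hypothetical b1 = 0.7 on the log odds scale
exp(0.7)  # odds ratio ≈ 2.01: each unit increase roughly doubles the odds
```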
90
Q

classification table

A

states the number of each type of 'present' and how many were 'delivered' and 'undelivered'

91
Q

odds(delivery)

A

number of delivered/number undelivered

92
Q

odds(delivered after treat1)

A

number delivered after treat1 / number undelivered after treat1

93
Q

odds ratio

A

odds(delivered after treat2) / odds(delivered after treat1)
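Worked through with a hypothetical classification table:

```r
# Hypothetical counts from a classification table
delivered_t1 <- 30; undelivered_t1 <- 10
delivered_t2 <- 15; undelivered_t2 <- 45
odds_t1 <- delivered_t1 / undelivered_t1  # 3
odds_t2 <- delivered_t2 / undelivered_t2  # ≈ 0.33
odds_t2 / odds_t1                         # odds ratio ≈ 0.11
```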

94
Q

how to interpret an odds ratio of 0.15

A

the odds of delivery are much smaller for treat 2 than treat 1

0.15 times the odds of treat 1 (i.e. 85% lower)

95
Q

fitting GLM to categorical outcome

A

glm(outcome ~ predictor, data = data, family = binomial())

data_glm %>% parameters::parameters() %>% parameters::parameters_table(p_digits = 3)

96
Q

how can you convert the log odds of glm to exponentials

A

insert "exponentiate = TRUE" into parameters::parameters()

97
Q

two predictors in categorical outcome

A

ln(P(Y)/(1 - P(Y))) = b0 + b1(pred1) + b2(pred2) + b3(pred1 x pred2) + e

98
Q

things that can go wrong with a categorical outcome model

A
linearity
spherical residuals
multicollinearity
incomplete information
complete separation
99
Q

what is incomplete information

A

empty cells
inflates standard errors
problem escalates quickly with continuous predictors

100
Q

what is complete separation

A

outcome variable can be perfectly predicted

101
Q

how are log odds produced from a glm

A
  • if it produces a log odds of -1.03
  • create a glm subsetting each type of treat
  • -1.03 = log odds of treat 2 - log odds of treat 1
102
Q

planned contrasts and parameter estimates

A

-control_vs_exp