Modelling Flashcards
multilevel modelling operates between 2 extremes
at one extreme, data within a unit are so highly correlated that effectively n = 1 per unit; at the other, data are completely uncorrelated
works by estimating how much ‘pooling’ (shrinkage) the data need; at the fully correlated extreme this reduces df to 1 for each unit
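A minimal sketch of the shrinkage idea, assuming the within-unit and between-unit variances are already known (a real multilevel model estimates them from the data). The function name and arguments are hypothetical, chosen only for illustration. Each unit's mean is pulled towards the grand mean, and units with little data are pulled hardest.

```python
def partial_pool(unit_mean, n, grand_mean, within_var, between_var):
    """Precision-weighted compromise between a unit's own mean and the grand mean.

    Assumes within_var and between_var are known; in practice they are estimated.
    """
    w_unit = n / within_var       # precision of the unit's own mean
    w_grand = 1.0 / between_var   # precision contributed by the population of units
    return (w_unit * unit_mean + w_grand * grand_mean) / (w_unit + w_grand)


# A unit with only 2 observations is shrunk further than one with 200.
small_unit = partial_pool(10.0, 2, 0.0, within_var=4.0, between_var=1.0)
large_unit = partial_pool(10.0, 200, 0.0, within_var=4.0, between_var=1.0)
print(small_unit, large_unit)
```

With very large n the estimate approaches the unit's own mean (no pooling); with very small between-unit variance it approaches the grand mean (complete pooling).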
what are summary measures not good with?
unbalanced design
what is a binomial distribution?
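the number of successes in a fixed number of independent trials, each with two possible outcomes and the same probability of success (the two outcomes need not be equally likely)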
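As a quick illustration, the binomial probability of k successes in n trials can be sketched as below. The function name is made up for this example; p is the probability of success on each trial and is not restricted to 0.5.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of 3 'present' records out of 10 sites when each site has a 20% chance.
print(binomial_pmf(3, 10, 0.2))
```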
what is a binary response?
Y/N, absent/present
what does a general linear model assume about the error distribution?
assumes that unexplained variation is normally distributed
what does a generalised linear model assume about the error distribution?
assumes that unexplained variation can follow some other known distribution
types of distributions and what they are
log-normal = if effects are multiplicative not additive
exponential = e.g. latency/survival; waiting time for an event when the probability of the event remains constant over time
Weibull = survival with non-constant mortality
Poisson = random, rare, discrete events
negative binomial = clustered discrete events
when is bootstrapping used?
when you want to estimate a population parameter and get an estimate of its CI
when is bootstrapping used, and with what sample size?
when you want to estimate a population parameter and get an estimate of its CI, for a sample size larger than 50
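A minimal sketch of a percentile bootstrap CI, assuming the statistic of interest is the mean. The function name, the number of resamples and the seed are illustrative choices, not fixed rules. The sample is resampled with replacement many times and the middle 95% of the resampled statistics is taken as the interval.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample with replacement, same size as the original sample, n_boot times.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


data = list(range(1, 61))  # made-up sample of 60 values
print(bootstrap_ci(data))
```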
what are contrasts?
allow testing of specific comparisons between groups (e.g. pair-wise differences) after ANOVA
forward multiple regression
start with no variables, then add the most significant, then the next most significant
backwards multiple regression
start with all variables, remove the least significant, and so on
Stepwise multiple regression
start the same as forwards, with no variables, but non-significant terms can be removed at any step
What measure do we use for the balance between the fit of a model and the number of parameters it estimates?
Akaike information criterion (AIC)
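For reference, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood. A small sketch, with made-up log-likelihood values, of how two candidate models might be compared (lower AIC is preferred):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical models: the second fits slightly better but uses two more parameters.
simple_model = aic(log_likelihood=-100.0, n_params=3)
complex_model = aic(log_likelihood=-99.5, n_params=5)
print(simple_model, complex_model)
```

Here the small gain in fit does not pay for the extra parameters, so the simpler model has the lower AIC.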
what is logistic regression? and what statistic does it use?
Characterised by a response distribution (binomial) and a link function (the logit) which transforms the mean value so that it is linear in the predictors
Uses z statistic
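A short sketch of the logit link and its inverse. The intercept and slope below are hypothetical numbers used only to show how a linear predictor is turned back into a probability.

```python
import math


def logit(p):
    """Link function: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-x))


# Hypothetical fitted coefficients on the logit scale.
intercept, slope = -2.0, 0.5
for x in (0, 4, 8):
    print(x, inv_logit(intercept + slope * x))
```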
what is deviance?
Deviance is a measure of goodness of fit of a generalized linear model.
Deviance in logistic regression should not be
residual deviance should not be 2x as large as the residual df (a sign of overdispersion)
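A rough sketch of that check for grouped binomial data, assuming the fitted probabilities come from a model fitted elsewhere and lie strictly between 0 and 1. The function names and the numbers in the example are illustrative only.

```python
import math


def _term(obs, fit):
    # obs * ln(obs / fit), using the limit 0 when obs is 0
    return obs * math.log(obs / fit) if obs > 0 else 0.0


def binomial_deviance(successes, trials, fitted_p):
    """Residual deviance of a binomial model, given fitted probabilities per group."""
    total = 0.0
    for y, n, p in zip(successes, trials, fitted_p):
        total += _term(y, n * p) + _term(n - y, n * (1 - p))
    return 2 * total


def dispersion_ratio(deviance, n_obs, n_params):
    """Residual deviance divided by residual df (n_obs - n_params)."""
    return deviance / (n_obs - n_params)


# Made-up data: 4 groups, fitted probabilities from a hypothetical 2-parameter model.
dev = binomial_deviance([1, 9, 2, 10], [10, 10, 10, 10], [0.3, 0.5, 0.6, 0.8])
print(dev, dispersion_ratio(dev, n_obs=4, n_params=2))
```

A ratio well above 1, and certainly one near 2 or more, would prompt a closer look at the model.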
what is a GLM?
The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. It is a generalization of the multiple linear regression model to the case of more than one dependent variable
GLM: error is Normal (mean = 0, sd = σ)
3 examples of a generalised linear model
linear regression, logistic regression and Poisson regression.
Generalized LM: error is… lots of possibilities
Skewed data because of too many zeros?
zero inflated data
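One common way to describe zero-inflated counts, offered here as an assumption rather than something the card specifies, is a zero-inflated Poisson: with probability pi_zero the count is a 'structural' zero, otherwise it is drawn from an ordinary Poisson. A minimal sketch with hypothetical parameter names:

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: extra zeros occur with probability pi_zero."""
    base = (1 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


# Zeros are much more likely than a plain Poisson with the same rate would predict.
print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3))
```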
what tests can you do when you have outliers?
parametric tests on ranked data, non-parametric tests, or permutation tests
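A minimal sketch of a two-sample permutation test, assuming the test statistic is the absolute difference in group means. The function name, the number of permutations and the seed are illustrative choices. Group labels are shuffled repeatedly to see how often a difference at least as large as the observed one arises by chance.

```python
import random
import statistics


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means between a and b."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


print(permutation_test([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]))
```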
what is survival analysis?
analysing the expected duration of time until one or more events happen, e.g. death
what is censoring?
when the actual data point isn't known but you can set boundaries based on what it must have been
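A small sketch of how censored observations can still contribute information, using the Kaplan-Meier estimator and assuming right-censoring only (the subject was still alive when last seen, so the true time is at least the recorded one). The function name and the data are made up for illustration.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier survival curve.

    times: recorded durations; observed: True if the event was seen,
    False if the record is right-censored at that time.
    Returns a list of (event_time, estimated survival probability).
    """
    event_times = sorted({t for t, seen in zip(times, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        # Censored subjects count as 'at risk' up to the time they were last seen.
        at_risk = sum(1 for ti in times if ti >= t)
        events = sum(1 for ti, seen in zip(times, observed) if ti == t and seen)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Four subjects; one of the two recorded at time 3 was censored (event not seen).
print(kaplan_meier([2, 3, 3, 5], [True, False, True, True]))
```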
brackets in R () {} [] <>
() = used to enclose the arguments when calling a function; {} = used to enclose the body when creating a function; [] = used to subscript an object; < > = denote less than and greater than
what is data mining?
lots of candidate predictors but no strong theory to predict which should be important
2 general classes of cluster-based analysis
supervised learning - know the true identity of some clusters and use these to make predictive models for when you don't know group membership
unsupervised learning - don't know what's right or wrong, so try to find natural clustering patterns in the data