Linear Mixed Models Flashcards
What is an outlier?
An observation that lies outside the overall pattern of the distribution
What are the 5 most common causes of outliers?
Data entry or processing error
Measurement error
Experimental/Intentional Error
Sampling Error
No error at all, just a novelty in the data (think Malcolm Gladwell)
What are the 3 methods to define an outlier? Pros and cons of each
Z-score (e.g. more than two standard deviations from the mean)
Pro: easy implementation, effective if the data are normally distributed
Con: can eliminate natural tail scores
Inter-quartile range (IQR) method
Pro: easy implementation, more robust to slight deviations from normality
Con: cutoffs are based only on the middle 50% of the data (between the 25th and 75th percentiles)
Cluster Analysis
In k-means outlier detection, the data are partitioned into groups by assigning them to the closest cluster
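The first two methods above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended pipeline: the function names, the sample `scores`, and the 1.5 x IQR fence multiplier are assumptions (the multiplier is a common convention, not something stated in these cards); the two-standard-deviation cutoff comes from the card itself.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values whose absolute z-score exceeds the threshold (sample SD)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical scores with one extreme value.
scores = [10, 12, 11, 13, 12, 11, 10, 12, 13, 40]
print(zscore_outliers(scores))  # [40]
print(iqr_outliers(scores))     # [40]
```

Note that the extreme value inflates the mean and standard deviation used by the z-score, whereas the quartiles used by the IQR method are barely affected by it.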
______ can occur as a result of an outlier or experimental mortality
Missing data
How you deal with an outlier is largely influenced by the nature of the data, the type of experimental design and the statistical analysis. What should you consider?
Is missingness random or systematic?
Cost-benefit
Cost of running another participant?
Missing one data point in a two-year longitudinal study
Availability of participants
What are the 4 solutions to missing data?
Discard data
Remember, in a repeated measures design discarding one observation leads to exclusion of all observations for that subject.
Imputation (i.e. replace missing data with substitute value)
- Replace missing data point with the mean of the observed variable
- Last value carried forward
- Estimation based upon individual and effect means
- Use information from related observations (e.g., mother's income for father's)
Replace data
- Most common for a between groups design without repeated measures
For data with more than two repeated measures, use an analysis that models subject as a random factor
- mixed models
- Based upon regression; each subject is treated as a random factor. The model establishes a linear trend over the data that estimates any missing data points
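Two of the imputation options listed above, mean replacement and last value carried forward, can be sketched as follows. The function names and the sample `series` are hypothetical, and `None` is assumed to mark a missing observation.

```python
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def last_value_carried_forward(values):
    """Replace each missing value with the most recent observed value.

    A gap at the very start has nothing to carry forward and stays None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical repeated measurements for one subject.
series = [4.0, None, 6.0, None, 8.0]
print(mean_impute(series))                 # [4.0, 6.0, 6.0, 6.0, 8.0]
print(last_value_carried_forward(series))  # [4.0, 4.0, 6.0, 6.0, 8.0]
```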
How are ANOVA and regression the same?
Both fit the same underlying linear model; they differ in what they report
ANOVA reports each mean and a p-value that says at least two means are different
Regression reports only one mean (the reference category, as the intercept) and the difference between the reference category and each of the other means
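A small sketch of the two reporting styles on the same data, assuming a one-way design with dummy (treatment) coding. The group names and values are made up for illustration. Rather than calling a regression library, the code checks the least-squares condition directly: the residuals sum to zero within every group, which is what the normal equations require for this design.

```python
import statistics

# Hypothetical one-way design with three groups.
groups = {
    "control": [4.0, 5.0, 6.0],
    "drug_a": [7.0, 8.0, 9.0],
    "drug_b": [1.0, 2.0, 3.0],
}
reference = "control"

# ANOVA-style report: one mean per group.
means = {name: statistics.mean(ys) for name, ys in groups.items()}

# Regression-style report with dummy coding:
# intercept = reference mean, each slope = difference from the reference.
intercept = means[reference]
slopes = {name: means[name] - intercept for name in groups if name != reference}

def predict(name):
    """Fitted value for a group under the dummy-coded regression."""
    return intercept + slopes.get(name, 0.0)

# Least-squares check: residuals sum to zero within each group.
residual_sums = {
    name: sum(y - predict(name) for y in ys) for name, ys in groups.items()
}

print(means)      # {'control': 5.0, 'drug_a': 8.0, 'drug_b': 2.0}
print(intercept)  # 5.0
print(slopes)     # {'drug_a': 3.0, 'drug_b': -3.0}
```

The fitted values are identical under both views; only the parameterization of the report changes.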
What is a mixed model?
Mixed Model
- A statistical model that contains both fixed effects and random effects
Fixed Effect
- All levels of the effect are sampled
e.g. the effect of an independent variable — all levels of the independent variable are measured
Random Effect
- Only a random sample of levels is acquired
What is the equation for a mixed model?
Outcome = Intercept + (fixed effects) + (random effects) + (error)
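To make the terms of the equation concrete, here is a sketch that simulates data from a random-intercept model: a fixed effect of time, one random offset per subject, and residual error. All parameter values, names, and the choice of a random intercept only (no random slopes) are assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the simulation is repeatable

INTERCEPT = 50.0   # grand intercept
TIME_SLOPE = 2.0   # fixed effect: change in outcome per unit of time
SUBJECT_SD = 5.0   # spread of the random subject intercepts
ERROR_SD = 1.0     # spread of the residual error

rows = []
for subject in range(4):
    # Random effect: one draw per subject, shared by all of that subject's rows.
    subject_offset = rng.gauss(0.0, SUBJECT_SD)
    for time in range(3):
        error = rng.gauss(0.0, ERROR_SD)  # fresh error for every observation
        outcome = INTERCEPT + TIME_SLOPE * time + subject_offset + error
        rows.append({
            "subject": subject,
            "time": time,
            "subject_offset": subject_offset,
            "error": error,
            "outcome": outcome,
        })

for row in rows[:3]:
    print(row["subject"], row["time"], round(row["outcome"], 2))
```

Each subject's observations share the same `subject_offset`, which is what makes repeated measures from one subject correlated.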
What are the 5 advantages of linear mixed models over ANOVAs w/ repeated measures?
Missing data
- ANOVA: all of that subject's data are dropped
- LMM: only that time point is dropped
Post hoc tests
- ANOVA: can't run
- LMM: can run
Flexibility with time
- ANOVA: time is categorical
- LMM: time can be continuous
Easier to build more complex models that account for quantifiable sources of error
- ANOVA: can include covariates to attempt to account for the effect of an extraneous variable
- LMM: can directly model the effect of additional variables to explain their effect on the dependent variable
Differing number of repeats/different time points
- ANOVA: time is categorical, so all categories must be present and all data points must be measured at the same time points
- LMM: data can be measured at different time points since time is continuous
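The missing-data advantage can be illustrated by counting how many observations each approach can use. This sketch assumes long-format data (one row per subject per time point) with `None` marking a missing outcome; the subjects and values are hypothetical, and no model is actually fitted.

```python
# Long format: (subject, time, outcome); None marks a missing outcome.
long_data = [
    ("s1", 0, 10.0), ("s1", 1, 11.0), ("s1", 2, 12.5),
    ("s2", 0, 9.0),  ("s2", 1, None), ("s2", 2, 11.0),
    ("s3", 0, 10.5), ("s3", 1, 11.5), ("s3", 2, 13.0),
]

# Repeated-measures ANOVA needs complete cases:
# any subject with a missing time point is excluded entirely.
incomplete = {subject for subject, _, y in long_data if y is None}
anova_rows = [row for row in long_data if row[0] not in incomplete]

# A linear mixed model can use every observed row:
# only the missing time point itself is dropped.
lmm_rows = [row for row in long_data if row[2] is not None]

print(len(anova_rows), len(lmm_rows))  # 6 8
```

Here one missing value costs the ANOVA three observations (all of subject s2), while the mixed model loses only one.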
When is ANOVA better than LMM?
Simple models
A pre-post test design with only two levels of the factor time
A mixed measures ANOVA with two groups and two levels of time
ANOVA is typically the simpler model; always run the simplest model that gives the most accurate result