Midterm 2 Flashcards
What are the consequences of violating these assumptions?
P-values will not be meaningful
Parameters may not be accurate
What is independence?
Knowing the error of one data point, or of a subset of data points, provides no knowledge of the error of any others
What are the 3 ways non-independence arises?
Heterogeneity in the dataset (ignoring natural subsets related to the response)
Replicate measurements per test subject, which incorrectly inflate the residual degrees of freedom (df residual)
Nested data (ignoring subsampling or another hierarchy causes heterogeneity in the data)
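A minimal sketch of how replicate measurements inflate the residual degrees of freedom, assuming a simple regression with an intercept and one slope. The subject and replicate counts are made-up illustration values, not from the course.

```python
# Hypothetical study: 10 subjects, each measured 5 times (illustrative numbers).
n_subjects = 10
replicates_per_subject = 5
n_params = 2  # intercept + slope in a simple linear regression


def residual_df(n_observations, n_parameters):
    """Residual degrees of freedom = observations minus estimated parameters."""
    return n_observations - n_parameters


# Wrong: treat every replicate as if it were an independent data point.
df_pseudo = residual_df(n_subjects * replicates_per_subject, n_params)

# Safer: average the replicates so each subject contributes one value.
df_per_subject = residual_df(n_subjects, n_params)

print(df_pseudo, df_per_subject)
```

The first figure is far larger than the second even though no extra independent subjects were measured, which is why the resulting p-values look better than they should.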
What are 3 warning signs of non-independence in a study?
Too many data points
Indication of any kind of repeated measurements
Any implausible result
What is homogeneity of variance?
Assumption that the scatter around the model is of equal magnitude throughout the fitted model
What is the ideal approach for homogeneity of variance?
Residuals spread equally above and below the fitted line. Plot residuals ~ fitted values
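A rough sketch of the residuals ~ fitted values check without a plotting library, assuming a simple straight-line fit. The data are simulated so that the noise grows with x, and the helper name fit_line is hypothetical. Instead of eyeballing a plot, it compares the residual spread in the lower and upper halves of the fitted values.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


random.seed(1)
x = [i / 2 for i in range(1, 201)]
# Simulated data whose noise grows with x (violates homogeneity of variance).
y = [2 + 3 * xi + random.gauss(0, 0.2 * xi) for xi in x]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Compare residual spread for the lower and upper halves of the fitted values.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
sd_low = statistics.stdev([r for _, r in pairs[:half]])
sd_high = statistics.stdev([r for _, r in pairs[half:]])
print(sd_low, sd_high)
```

If the variance were homogeneous, the two standard deviations would be similar; here the upper half is clearly more scattered, which on a plot would look like a funnel opening to the right.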
What is normality of error?
When the residuals are normally distributed
Which type of plot is a more precise way to visualize the normality of a distribution?
QQ norm. If the points on the QQ-norm plot fall along a straight line between -2 and +2, then the data are normally distributed
What is the Shapiro-Wilk test?
It tests for normality, giving a specific p-value for the null hypothesis that the data are normally distributed
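A minimal sketch of what a QQ-norm plot computes, assuming the plotting positions (i - 0.5) / n, which is one common convention and not necessarily the one used in class. The helper names qq_points and straightness are hypothetical. The correlation between theoretical and sample quantiles stands in for judging how straight the line looks.

```python
import random
import statistics
from statistics import NormalDist


def qq_points(values):
    """Return (theoretical, sample) quantiles for a normal QQ plot."""
    n = len(values)
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    # Standardize and sort the sample so it is comparable to N(0, 1) quantiles.
    sample = sorted((v - mean) / sd for v in values)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sample


def straightness(theoretical, sample):
    """Pearson correlation of the two quantile sets (1.0 = perfectly straight)."""
    mt, ms = statistics.fmean(theoretical), statistics.fmean(sample)
    num = sum((t - mt) * (s - ms) for t, s in zip(theoretical, sample))
    den = (sum((t - mt) ** 2 for t in theoretical)
           * sum((s - ms) ** 2 for s in sample)) ** 0.5
    return num / den


random.seed(2)
normal_data = [random.gauss(0, 1) for _ in range(500)]
skewed_data = [random.expovariate(1.0) for _ in range(500)]

r_normal = straightness(*qq_points(normal_data))
r_skewed = straightness(*qq_points(skewed_data))
print(r_normal, r_skewed)
```

The normally distributed sample gives a correlation very close to 1, while the right-skewed sample bends away from the line and scores lower.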
How do we fix non-normality and inhomogeneity of variance?
Transform the variables, e.g. apply log, sqrt, or exp
If the residuals ~ frequency plot has a right-hand tail, how do we fix the problem?
Transform the response variable with log, sqrt, or 1/y
If the residuals ~ frequency plot has a left-hand tail, how do we fix the problem?
Transform the response variable with e^y
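A small sketch of why log or sqrt helps with a right-hand tail. It is simplified to look at the skewness of a simulated response directly (as if the model were intercept-only), and the skewness helper is a hypothetical name. Positive skewness means a right-hand tail; values near zero mean a roughly symmetric distribution.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: > 0 means a right-hand tail, < 0 a left-hand tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


random.seed(3)
# Simulated right-skewed response (log-normal), for illustration only.
y = [random.lognormvariate(0, 0.8) for _ in range(1000)]

skew_raw = skewness(y)
skew_sqrt = skewness([math.sqrt(v) for v in y])
skew_log = skewness([math.log(v) for v in y])
print(skew_raw, skew_sqrt, skew_log)
```

The sqrt transform pulls the tail in partway and the log transform removes most of it for this simulated data. In practice the model is refitted on the transformed response and the residuals are checked again.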
What is linearity/additivity?
A linear relationship between the response and the explanatory variables
How is non-linearity detected?
Plot the residuals against each of the explanatory variables. If the relationship is linear, the plot for each variable should show an equal distribution of points above and below zero
What other strategies fix non-linearity?
Inclusion of variable interactions
Inclusion of higher powers of the explanatory variables
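A sketch of the higher-powers strategy, assuming simulated data with a true quadratic relationship. The helpers solve, least_squares, and residual_ss are hypothetical names written for this example; a statistics package would normally do this fitting. An interaction would be handled the same way, by adding a column that is the product of two explanatory variables.

```python
import random


def solve(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def least_squares(columns, y):
    """Fit y on the design-matrix columns via the normal equations."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def residual_ss(columns, coef, y):
    """Residual sum of squares for the fitted coefficients."""
    fitted = [sum(b * col[i] for b, col in zip(coef, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


random.seed(4)
x = [i / 10 for i in range(-30, 31)]
# Simulated response with a genuine x^2 component (illustrative values).
y = [1 + 2 * xi + 1.5 * xi ** 2 + random.gauss(0, 0.5) for xi in x]

ones = [1.0] * len(x)
linear = [ones, x]
quadratic = [ones, x, [xi ** 2 for xi in x]]

rss_linear = residual_ss(linear, least_squares(linear, y), y)
coef_quad = least_squares(quadratic, y)
rss_quadratic = residual_ss(quadratic, coef_quad, y)
print(rss_linear, rss_quadratic, coef_quad)
```

Adding the squared term cuts the residual sum of squares sharply and recovers a coefficient close to the simulated 1.5, which is the pattern to look for when a residual plot shows curvature.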
What is model criticism?
Testing the key assumptions of general linear models. The residuals should:
Be normally distributed with mean zero
Not systematically vary for different values of the predicted response
Not systematically vary for different values of any of the explanatory variables
From a scientific-understanding standpoint, what is considered a best model?
The fewest explanatory variables that yield a model with a small p-value
From a predictive-accuracy standpoint, what is considered a best model?
The highest r^2 without regard for the number of variables, while avoiding over-fitting
What are the three principles of model choice?
Economy of variables
Considerations of marginality
Multiplicity of p-values
What is economy of variables?
The simpler the better
What is multiplicity of p-values?
If you calculate enough p-values, some models will be significant just by chance
What is considerations of marginality?
The simplest terms have priority and inclusion of interaction terms requires the inclusion of their simpler parts
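A minimal sketch of a marginality check, assuming model terms are written as strings with a colon for interactions (for example "A:B"), a notation borrowed from common model-formula syntax. The function name missing_lower_order_terms is hypothetical.

```python
from itertools import combinations


def missing_lower_order_terms(terms):
    """List the simpler terms that the model needs but does not include.

    Each term is a string such as "A", "B", or "A:B" (interaction of A and B).
    """
    present = {frozenset(t.split(":")) for t in terms}
    missing = set()
    for term in present:
        # Every smaller combination of an interaction's parts must be present.
        for size in range(1, len(term)):
            for subset in combinations(sorted(term), size):
                if frozenset(subset) not in present:
                    missing.add(":".join(subset))
    return sorted(missing)


print(missing_lower_order_terms(["A", "B", "A:B"]))  # []
print(missing_lower_order_terms(["A", "A:B"]))       # ['B']
```

An empty list means the hierarchy is respected; any names returned are the simpler parts that have to be added before the interaction term is allowed.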
What is the goal of economy of variables?
To identify the minimum adequate model
How do we deal with the problem of multiplicity of p-values?
Reduce the p value cutoff
Use more specialized statistical tests
Reduce the number of explanatory variables by combining multiple terms into a single term
Focus, don’t fish
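A short sketch of why many p-values cause trouble and what reducing the cutoff looks like, assuming the tests are independent. Dividing the cutoff by the number of tests is the Bonferroni correction, one common choice that the notes above do not name specifically.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_cutoff(alpha, n_tests):
    """Reduced per-test cutoff that keeps the overall error rate near alpha."""
    return alpha / n_tests


alpha = 0.05
for n_tests in (1, 5, 20):
    print(n_tests,
          round(familywise_error(alpha, n_tests), 3),
          bonferroni_cutoff(alpha, n_tests))
```

With 20 tests at the usual 0.05 cutoff, the chance of at least one spurious "significant" result is roughly 64%, which is the arithmetic behind "focus, don't fish".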
What is the importance of marginality?
Hierarchies must be respected in model formulae