Stats and Modeling Flashcards
What is stats used for?
discern justifiable differences in quantitative patterns
What are some examples of stats tests?
Basic t-tests, ANOVA, Chi-squared (X2) are
useful
How many outcomes do Chi-squared tests measure?
Two
When are these basic stats tests too burdensome?
situations where multiple variables and their interactions are in question/confounding!
T-tests
This tests whether significant differences, not attributable to random chance, are present in continuous data
X2 Test
Test significance between groups of non-continuous data.
Infected vs. not infected, or vaccinated vs. non-vaccinated. Useful with 2 x 2 contingency tables.
Fisher’s Exact test
This tests for significance in much the same way as Chi-squared test does, but works with any size data set. It also gives an “exact” P-value, rather than an approximation; Applicable to 2x2 tables.
Multivariable regression
Multiple independent variables are regressed on a single dependent variable. All variables are considered together to discern correlations either individually or in combinations
Multivariate analysis (MVA)
Techniques allow more than two variables to be analyzed at once
What is not usually included under MVAs?
Multiple regression is not typically included under this heading, but can be thought of as a multivariate analysis
Two general types of MVA technique?
Analysis of dependence and analysis of interdependence
Analysis of dependence
Where one (or more) variables are dependent variables, to be explained or predicted by others; E.g.: Multiple regression, PLS, MDA
Analysis of interdependence
No variables thought of as “dependent”; Look at the relationships among variables, objects or cases; E.g. : cluster analysis, factor analysis
What is important to bring to statistician?
The question you are trying to answer
In most cases data analysis is done to test?
the association between two or more variables (bivariate)
In health research the aim is to establish the association between?
risk factor and disease or between therapy and outcome
Simple linear regression
Y = a + bX
Y
Dependent variable
a
intercept
b
slope/regression coefficient (change in Y with a one unit
change in X)
Multiple Linear Regression equation example
predictor value
Multiple linear regression model
Y=a+b1X1+b2X2+b3X3 + ..
What is the slope of the regression line mean?
Strength of correlation
Model
A set of theoretically and or evidence based associations between variables
What does data analysis test?
the extent to which
the proposed theoretical associations are observed in the data
What is model development based on?
Previous knowledge (previous research evidence about associations) and new and hypothetical ones.
What is model development about?
It is mostly about deciding which independent variables (risk factors) should be in the multivariate analysis.
What does SIR stand for?
Susceptible, infectious, recovered
Susceptible S(t)
Disease susceptible people in population
Infectious I(t)
Those capable of transmitting an infection to others
Recovered R(t)
Those that have been infected and recovered
Kermack-McKendrick Model Assumptions
Assumes a well-defined and closed and infective and this lasts the duration of the disease state. An infective person is instantly infective and this lasts the duration of the disease state. Homogenous population with no genetic variation or demographic differences.
What does a vaccine do to a SIR model?
Takes people from susceptible to recovered