Week 8 Flashcards
What is epidemiology?
The study of the determinants of disease, injury, or dysfunction in populations
Epidemiology is another way of saying ____
Epidemiology is another way of saying risk
Risk in PT can be expressed in terms of _____
• Experiencing an adverse outcome
• Patients not improving with treatment
• Requiring more invasive or expensive subsequent interventions in spite of treatment
Epidemiology generally uses observational designs with ___ variables
Epidemiology generally uses observational designs with dichotomous variables
What studies are intended to study risk factors?
Case-Control & Cohort Studies
Case-Control & Cohort Studies look at the ____ between disease & exposure
Case-Control & Cohort Studies look at the association (“cause”) between disease & exposure
The IV and DV in case-control & cohort studies are what kind of variables?
Dichotomous
In case-control & cohort studies, there is ___ strength in concluding that one thing causes the other
In case-control & cohort studies, there is less strength in concluding that one thing causes the other
How are subjects in a cohort study selected?
Subjects are selected based on whether or not they were exposed
Is a cohort study usually prospective or retrospective?
Usually prospective, but can be retrospective
Does a cohort study work for rare conditions?
Doesn’t work well for very rare conditions
What does a cohort study examine?
Examines whether the incidence of disease differs between exposed and unexposed subjects
How are subjects in a case control study selected?
Subjects are selected based on whether or not they have the disorder
Where should the controls in a case-control study be selected from?
Controls should be selected from the same population as the cases
What does a case-control study examine?
Examines whether exposure differs between cases and controls
What condition does a case-control study work especially well for?
Works especially well for very rare conditions
What are the primary ways to quantify risk?
- Relative Risk (RR)
- Odds Ratio (OR)
What do the primary ways to quantify risk actually quantify?
Both quantify strength of association between “exposure” and “disease”
In what study is RR used and in what study is OR used?
- RR in cohort studies
- OR in case-control studies
What does it mean when an RR or OR = 1 ?
- The “null value”
- No association between an exposure and a disease
What does it mean when an RR or OR > 1?
- A positive association between an exposure and a disease
- The exposure is considered to be harmful
What does it mean when an RR or OR < 1?
- A negative association between an exposure and a disease
- The exposure is considered to be protective
RR is the ratio of ___ compared to ____
Incidence of disease among exposed individuals compared to incidence of disease among unexposed individuals
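As a minimal sketch of this ratio, assuming made-up cohort counts (the function name `relative_risk` and the numbers are hypothetical, for illustration only):

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence among the exposed divided by incidence among the unexposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the disease.
rr = relative_risk(30, 100, 10, 100)
print(round(rr, 2))  # RR > 1, so the exposure looks harmful in this made-up cohort
```

Here the incidence is 0.30 in the exposed group and 0.10 in the unexposed group, so the exposed group has about three times the risk.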
Since subjects in a case-control study are selected based on whether they have the disease or not, OR can’t determine the rate of ___
Since subjects in a case-control study are selected based on whether they have the disease or not, OR can’t determine the rate of “incidence”
OR is the ratio of ___ compared to ____
Odds of exposure among cases (with disease) compared to Odds of exposure among controls (w/o disease)
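A minimal sketch of this ratio, assuming a made-up case-control sample (the function name `odds_ratio` and the counts are hypothetical):

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls


# Hypothetical sample: 40 of 100 cases were exposed, 20 of 100 controls were exposed.
or_value = odds_ratio(40, 60, 20, 80)
print(round(or_value, 2))
```

Note that the inputs are counts of exposed and unexposed subjects within each group, not incidence rates, because subjects were chosen by disease status.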
The computation of OR is kinda like ___
The computation of OR is kinda like kappa
____ uses relationships (correlation) as a basis for prediction
Regression uses relationships (correlation) as a basis for prediction
What are the characteristics of a linear regression?
• X and Y are correlated
• X = independent variable (= predictor variable)
• Y = dependent (or criterion) variable
• We use X to predict Y
• The value of Y depends on X (that's why Y is called the dependent variable)
What is the error from the line (residual) in a regression?
The distance between each data point and the line of best fit
Residuals are squared to eliminate ___ and penalize for ___
Residuals are squared to eliminate sign and penalize for worse errors
What is the line of best fit?
The line with the smallest sum of squared errors (residuals)
Is regression a parametric or nonparametric statistic?
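A minimal sketch of the least-squares idea for one predictor, assuming a small made-up data set (`fit_line` and `sum_squared_residuals` are hypothetical helper names). It fits the line, then shows that nudging the slope away from the fitted value makes the sum of squared residuals larger:

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Square each residual (removes the sign, penalizes big misses) and add them up."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, worse)
```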
Parametric
What are the assumptions of a linear regression analysis?
- Linear relationship = approximation of true line in population
- For every X there is a normal distribution of Y
  • Sample data include random samplings from these distributions on Y
- Homogeneity of variance
What is a way to test the assumptions of a linear regression?
Analysis of residuals: plot the residuals on the y-axis against the predicted values on the x-axis
What assumption of linear regression does the analysis of residuals test the most?
Homogeneity of variance
What are you looking for in the analysis of residuals when the assumptions of linear regression are met?
The residuals (the distances between the predicted and actual values) should be symmetric and consistent across the whole range of predicted values
What does the analysis of residuals graph look like when the assumptions of linear regression are not met?
- The graph gets wider the further it goes (the data fall further from the line the higher you go)
- The residuals are not symmetric
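A rough sketch of what the residual plot is checking, assuming made-up data and an already-fitted line with slope 1 and intercept 0 (both assumptions for illustration). The `spread_ratio` helper is a hypothetical eyeball substitute, not a formal statistical test: a value well above 1 suggests the residuals fan out as predicted values increase.

```python
def residual_pairs(xs, ys, slope, intercept):
    """Return (predicted, residual) pairs, i.e. the points of a residual plot."""
    pairs = []
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x
        pairs.append((predicted, y - predicted))
    return pairs


def spread_ratio(pairs):
    """Mean absolute residual in the upper half of predicted values over the lower half."""
    ordered = sorted(pairs)
    half = len(ordered) // 2
    lower = [abs(residual) for _, residual in ordered[:half]]
    upper = [abs(residual) for _, residual in ordered[half:]]
    return (sum(upper) / len(upper)) / (sum(lower) / len(lower))


# Made-up data whose scatter widens as X increases.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.5, 5.9, 5.0]

pairs = residual_pairs(xs, ys, slope=1.0, intercept=0.0)
ratio = spread_ratio(pairs)
print(round(ratio, 2))  # well above 1 here, hinting at unequal variance
```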
What happens if the assumptions of linear regression are not met?
Use a nonlinear regression
What helps a researcher determine whether to retain or discard an outlier?
• Is it due to peculiar circumstances?
• Can discard if an error is identified
• Discarding is generally not justified on statistical grounds alone
What peculiar circumstances have to be taken into consideration when determining whether to retain or discard an outlier?
- Measurement error
- Recording error
- Equipment malfunction
- Miscalculation
- Aberrant subject (should have been excluded)
What are the things that look at the accuracy of prediction of the regression equation?
• Correlation coefficient (R)
• Coefficient of determination (R²)
• ANOVA of Regression
What are the characteristics of a correlation coefficient as it relates to the accuracy of prediction?
- Rough indicator of goodness of fit for regression line
- Same as correlation coefficient (r)
What does the coefficient of determination represent?
Proportion of variance in Y scores that can be explained by X scores
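A minimal sketch, on made-up data, showing that for one predictor the squared correlation coefficient and the "proportion of variance explained" come out to the same number (the function name is hypothetical):

```python
def coefficient_of_determination(xs, ys):
    """Return R squared computed two ways for a one-predictor regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    # Route 1: square the correlation coefficient.
    r = sxy / (sxx * syy) ** 0.5
    r2_from_r = r ** 2

    # Route 2: 1 minus the share of Y's variation left unexplained by the line.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2_from_variance = 1 - ss_residual / syy
    return r2_from_r, r2_from_variance


# Made-up data for illustration.
r2_from_r, r2_from_variance = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2_from_r, 3), round(r2_from_variance, 3))
```

With these numbers, about 60% of the variance in Y is explained by X.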
What does the ANOVA of regression test?
Tests the hypothesis that the predictive relationship occurred by chance (H0: b = 0)
What does it mean when b=0 in an ANOVA of regression?
If b (slope) = 0, line is horizontal = no relationship
What happens when p < alpha in an ANOVA of regression?
If p < alpha, reject the null and conclude that the predictive relationship is significant
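A sketch of the F statistic behind the ANOVA of regression for a single predictor, assuming made-up data (the function name is hypothetical). It only computes F; turning F into a p-value needs the F distribution with 1 and n − 2 degrees of freedom, which is left to a table or statistics package.

```python
def regression_f_statistic(xs, ys):
    """F = mean square for regression / mean square for residuals (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_regression = ss_total - ss_residual

    df_regression = 1        # one predictor
    df_residual = n - 2      # n minus the two estimated values (slope, intercept)
    return (ss_regression / df_regression) / (ss_residual / df_residual)


# Made-up data for illustration.
f_stat = regression_f_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(f_stat, 2))
```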
How many predictors are in a simple linear regression model and how many are in a multiple linear regression model?
There is only 1 predictor in a simple model and there are multiple predictors in a multiple linear regression model
What are the assumptions of a multiple linear regression analysis?
- Linear relationship = approximation of true line in population
- For every X there is a normal distribution of Y
  • Sample data include random samplings from these distributions on Y
- Homogeneity of variance
- DV = continuous measure
Coefficient of determination is the square of ____
Coefficient of determination is the square of the correlation coefficient
What is an adjusted R squared and what do you get punished for?
Chance-corrected R²; you get penalized for having more predictor variables
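A minimal sketch of the penalty, using the usual adjusted R² formula (not stated on the card) and made-up values for R², sample size, and number of predictors:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n subjects and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same hypothetical R squared (0.50) and sample size (30), different numbers of predictors.
adj_two = adjusted_r_squared(0.50, n=30, k=2)
adj_six = adjusted_r_squared(0.50, n=30, k=6)
print(round(adj_two, 3), round(adj_six, 3))
```

With the same raw R², the model with six predictors ends up with the lower adjusted value, which is the "punishment" for extra predictors.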
What is the goal of a linear regression?
The more you can predict with fewer variables, the better
What is a regression coefficient?
- The value/slope in the linear equation
- The rate of change in Y for each unit change of X
What is a standardized beta weight helpful for?
Helpful to know the relative contribution of each predictor variable
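A sketch of why standardizing helps, assuming the common conversion beta = b × (SD of X / SD of Y), which is not stated on the card. All coefficients and standard deviations below are made up:

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert a raw regression coefficient to a standardized beta weight."""
    return b * sd_x / sd_y


# Hypothetical model: outcome SD is 10.
# Predictor A: raw b = 2.0, measured on a scale with SD 1.5.
# Predictor B: raw b = 0.05, measured on a scale with SD 100.
beta_a = standardized_beta(2.0, sd_x=1.5, sd_y=10.0)
beta_b = standardized_beta(0.05, sd_x=100.0, sd_y=10.0)
print(round(beta_a, 2), round(beta_b, 2))
```

Predictor A has the much larger raw coefficient, but once both are put on the same scale, predictor B makes the larger relative contribution.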
Which will always be higher or the same: R squared or adjusted R squared?
R squared will always be higher than or equal to adjusted R squared
What is multicollinearity?
When the Xs in the model are substantially correlated with each other
What does multicollinearity create a problem with?
Creates problems with interpretations of b weights
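One simple way to spot the problem is to correlate the predictors with each other before building the model. A minimal sketch with made-up predictor data (the variable names and `pearson_r` helper are hypothetical):

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


# Made-up predictors for five subjects.
height_cm = [150, 160, 170, 180, 190]
leg_length_cm = [70, 76, 79, 86, 90]
age_years = [30, 25, 40, 22, 35]

r_height_leg = pearson_r(height_cm, leg_length_cm)
r_height_age = pearson_r(height_cm, age_years)
print(round(r_height_leg, 2), round(r_height_age, 2))
```

Height and leg length are almost perfectly correlated here, so their b weights would be hard to interpret if both were entered; height and age overlap very little.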
What are the risks of forced entry of all possible predictors in a multiple regression?
- Risk of multicollinearity (correlation between predictors)
- Risk of retaining non-contributing predictors
- Risk of more predictors than justified by sample size
How are the criteria in a stepwise procedure set?
Criteria are set to retain or reject predictors
Which predictor is entered first in a stepwise procedure?
Predictor with highest partial correlation entered first
What does a stepwise procedure result in?
Should result in the model with the greatest parsimony and least multicollinearity
What is a parsimonious model?
A model that is the most predictive with the fewest variables
What is a simple correlation?
The overlap between 2 variables
What is a partial correlation?
The unique correlation between 2 variables
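A minimal sketch of the difference, using the standard first-order partial correlation formula (not given on the card) with made-up simple correlations. It removes the part of the X–Y overlap that is shared with a third variable Z:

```python
def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the influence of Z removed from both."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Hypothetical simple correlations among X, Y, and a third variable Z.
simple = 0.6
partial = partial_correlation(r_xy=simple, r_xz=0.5, r_yz=0.4)
print(round(partial, 3))
```

In this example the partial correlation is smaller than the simple correlation because some of the X–Y overlap is also shared with Z.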
What is a forward stepwise regression method?
A method that starts with no predictors, then adds them, starting with the strongest
What is a backward stepwise regression method?
A method that starts with all predictors, then removes them, starting with the weakest
What is a stepwise regression method?
A method that starts with no predictors, then adds them, but can also remove them
What is the level of measurement for predictors (IVs) in a stepwise multiple linear regression model?
- Most predictors are continuous scales
- Can also use dichotomous or ordinal scale predictors
- But not multicategory nominal (e.g. race)
A large number of predictors in a stepwise multiple linear regression requires ___
A large number of predictors in a stepwise multiple linear regression requires a very large sample size
What is the rule of thumb for the predictors of a stepwise multiple linear regression model?
At least 10-15 subjects per predictor in model
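A small sketch of the rule of thumb as arithmetic, assuming a hypothetical study with 120 subjects (the function name `max_predictors` is made up):

```python
def max_predictors(sample_size, subjects_per_predictor):
    """Largest number of predictors the rule of thumb would allow."""
    return sample_size // subjects_per_predictor


# Hypothetical study with 120 subjects.
lenient = max_predictors(120, 10)   # at 10 subjects per predictor
strict = max_predictors(120, 15)    # at 15 subjects per predictor
print(strict, lenient)
```

So with 120 subjects, the rule would support somewhere between 8 and 12 predictors.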
What happens if there are too many predictors for the sample size in a stepwise multiple linear regression model?
The model becomes susceptible to “model overfit” (chance associations, i.e., Type I error)
What is a logistic regression?
A regression used when you are trying to predict a dichotomous variable
What is the DV level of measurement of a logistic regression?
Dichotomous
What is the predictor (IV) level of measurement of a logistic regression?
Continuous, ordinal, or dichotomous
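A minimal sketch of how a logistic model turns a continuous predictor into a yes/no prediction, assuming the standard logistic function and made-up coefficients (the intercept, slope, predictor, and 0.5 cutoff are all hypothetical):

```python
import math


def predicted_probability(intercept, slope, x):
    """Probability of the outcome (coded 1) for predictor value x."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


def predicted_class(probability, cutoff=0.5):
    """Turn the probability into a dichotomous prediction."""
    return 1 if probability >= cutoff else 0


# Hypothetical model: predictor is age in years.
p_40 = predicted_probability(intercept=-4.0, slope=0.1, x=40)
p_60 = predicted_probability(intercept=-4.0, slope=0.1, x=60)
print(round(p_40, 2), round(p_60, 2), predicted_class(p_60))
```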
What are the pros of MANOVA?
• MANOVA gets around the multiplicity problem (familywise alpha: increased Type I error risk)
• MANOVA can be more powerful if DVs related
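A quick sketch of the multiplicity problem that MANOVA avoids, assuming the tests are independent (a simplifying assumption) and using the usual formula 1 − (1 − alpha)^k for the chance of at least one Type I error across k separate tests:

```python
def familywise_error_rate(alpha, number_of_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** number_of_tests


# Hypothetical study running a separate ANOVA on each of 5 DVs at alpha = 0.05.
fw = familywise_error_rate(0.05, 5)
print(round(fw, 3))
```

Five separate tests at alpha = 0.05 push the overall Type I error risk to roughly 23%.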
What are the cons of MANOVA?
• “Combo DV” is not directly interpretable
• If statistically significant, then must follow up with post-hoc ANOVAs
What is a factor analysis?
Method of simplifying & organizing large sets of variables into fewer abstract components
What is a path analysis?
Visual modeling of both direct & indirect relationships
Path analysis is an extension of ____
Path analysis is an extension of multiple regression
Compared to a multiple regression, a path analysis is more __ and ____
Compared to a multiple regression, a path analysis is more flexible and comprehensive
What can a path analysis analyze?
Can analyze both direct and indirect relationships between 1 or more exogenous variables (IVs) and 1 or more endogenous variables (DVs)
What is hierarchical linear modeling also known as?
- Multilevel linear modeling
- Linear mixed modeling
Hierarchical linear modeling is used for what type of analysis?
The type of analysis where some variables are nested within other variables (e.g., students nested in classrooms when studying schools)
Hierarchical linear modeling has far fewer ___ and is highly ___
Hierarchical linear modeling has far fewer assumptions and is highly flexible
What is the Number Needed to Treat (NNT)?
How many patients you have to provide treatment to in order to prevent one bad outcome
What is Control Event Rate (CER)?
Percent of patients in control group with bad outcome
What is Experimental Event Rate (EER)?
Percent of patients in experimental group with bad outcome
What is the equation for RR?
EER/CER
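A minimal sketch tying the last four cards together, assuming a made-up trial. RR = EER/CER comes from the card above; the NNT formula used here, 1 / (CER − EER), is the standard definition and is not stated on the cards.

```python
def event_rate(bad_outcomes, group_size):
    """Proportion of a group with the bad outcome."""
    return bad_outcomes / group_size


# Hypothetical trial: 20 of 100 controls and 10 of 100 treated patients have a bad outcome.
cer = event_rate(20, 100)          # Control Event Rate
eer = event_rate(10, 100)          # Experimental Event Rate

rr = eer / cer                     # Relative Risk
risk_reduction = cer - eer         # absolute drop in the bad-outcome rate
nnt = 1 / risk_reduction           # Number Needed to Treat
print(round(rr, 2), round(risk_reduction, 2), round(nnt, 1))
```

In this example the treatment halves the risk, and about 10 patients must be treated to prevent one bad outcome. In practice NNT is usually rounded up to a whole patient.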