HRSS SEM 2 Flashcards
Cross-sectional study design
Observational or descriptive
Collects data from a population at 1 specific point in time
Groups determined by existing differences, not random allocation
Advantages of cross-sectional study design (5)
- Snapshot of a population at one time
- Can draw inferences from existing relationships or differences
- Large numbers of subjects
- Relatively inexpensive
- Can generate odds ratio, absolute/relative risk and prevalence
Disadvantages of cross-sectional study designs
- Results are static: no sequence of events
- Doesn’t randomly sample
- Can’t establish cause and effect relationship
Pearson's correlation coefficient
Measures linear relationship between 2 variables
r = 0 suggests no linear relationship
Correlation coefficients offer only a crude measure of linear association and cannot adjust for other variables
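A minimal Python sketch (hypothetical paired data) of computing Pearson's r with scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g. age and systolic blood pressure)
x = np.array([23, 31, 45, 52, 60, 38, 49])
y = np.array([118, 125, 130, 141, 150, 128, 135])

r, p_value = stats.pearsonr(x, y)   # r: strength and direction of the linear association
print(f"r = {r:.2f}, p = {p_value:.3f}")  # r near 0 suggests no linear relationship
```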
Regression modelling
Investigates if an association exists between variables of interest
Measures strength and direction of association between variables
Studies the form of relationships
How are continuous linear relationships examined
By linear or non-linear regression models
How are categorical outcomes examined
By logistic regression
Describe Linear Regression
H0: no relationship between the DV and IV
The DV must be continuous; IVs can be continuous or categorical
Assumptions for linear regression
Linear relationship between DV and IV
Observations are independently and randomly selected
Effects are additive
Homogeneity of variances (homoscedasticity)
Residuals are independent + normally distributed
Absence of outliers and multicollinearity
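A minimal statsmodels sketch (hypothetical variable names dv/iv and simulated data) of fitting a simple linear regression; the printed summary includes diagnostics related to several of the assumptions above (Durbin-Watson for residual independence, Omnibus/Jarque-Bera for residual normality, condition number as a multicollinearity signal):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: continuous outcome (dv) and predictor (iv)
rng = np.random.default_rng(1)
df = pd.DataFrame({"iv": rng.normal(50, 10, 100)})
df["dv"] = 2.0 + 0.5 * df["iv"] + rng.normal(0, 5, 100)

model = smf.ols("dv ~ iv", data=df).fit()
print(model.summary())      # coefficient for iv tests H0: B = 0
residuals = model.resid     # inspect/plot these to check normality and homoscedasticity
```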
Describe skewness and kurtosis, mean/median/mode for a normally distributed variable
Skewness and (excess) kurtosis should both be approximately 0
- the further the value is from 0, the more likely it is that the variable isn't normally distributed
Mean, median and mode should be equal for a normally distributed variable
Tests of normality
Tests of normality (Shapiro-Wilk) compare the shape of the sample distribution to the shape of a normally distributed curve
- A non-significant test suggests the distribution of the sample is not significantly different from a normal distribution
- A significant test suggests the distribution in question is significantly different from a normal distribution
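A sketch (hypothetical sample) of checking skewness, kurtosis, and the Shapiro-Wilk test with scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)    # hypothetical variable

print("skewness:", stats.skew(sample))            # ~0 for a normally distributed variable
print("excess kurtosis:", stats.kurtosis(sample)) # ~0 for a normally distributed variable
W, p = stats.shapiro(sample)
print(f"Shapiro-Wilk: W = {W:.3f}, p = {p:.3f}")  # p > 0.05: not significantly different from normal
```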
Multicollinearity
Refers to IVs that are correlated with other IVs
In the presence of multicollinearity, regression models may not give valid estimates of individual predictors
Variance inflation factor (VIF)
Measure of how much the variance of an estimated regression coefficient is inflated by the existence of correlation among the IVs in the model
VIF = 1: no correlation among predictors
VIF > 4: warrants further investigation
VIF > 10: signs of serious multicollinearity
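A sketch (hypothetical IVs, with x3 deliberately correlated with x1) of computing VIFs with statsmodels:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(2)
X = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
X["x3"] = X["x1"] * 0.9 + rng.normal(scale=0.3, size=100)   # correlated predictor

X_const = add_constant(X)   # include an intercept column
for i, col in enumerate(X_const.columns):
    if col == "const":
        continue
    print(col, variance_inflation_factor(X_const.values, i))  # VIF > 4 warrants a closer look
```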
In a simple linear regression model if B > 0 …
Positive association between IV and DV
For each unit increase in the IV, the DV increases by B units.
Building a regression model
If an IV is associated with the outcome (DV) and is not affected by multicollinearity, it can be entered into a multivariable (multiple) linear regression model for the DV
The fitted regression model presents regression coefficients representing adjusted associations between the DV and the IVs, each adjusted for the others
Interpreting a regression model
For each unit increase in an IV, the estimated DV increases by B units, after adjusting for the other IVs
Adjustment may change the (crude) association
R^2
% of variability in the DV explained by the fitted model
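A sketch of a multivariable model (hypothetical IVs age and bmi, simulated data); each coefficient is the change in the DV per unit increase in that IV, adjusted for the other, and rsquared gives the proportion of variability explained:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "age": rng.normal(45, 12, 150),   # hypothetical IVs
    "bmi": rng.normal(27, 4, 150),
})
df["dv"] = 10 + 0.3 * df["age"] + 1.2 * df["bmi"] + rng.normal(0, 5, 150)

fit = smf.ols("dv ~ age + bmi", data=df).fit()
print(fit.params)     # adjusted regression coefficients (B)
print(fit.rsquared)   # proportion of DV variability explained by the fitted model
```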
Observational studies
Subjects observed in their natural state; they can be measured and tested but no intervention or treatment is applied
Cohort longitudinal studies
Population of subjects identified by common link
Researcher can follow across time to see what happens
Useful for establishing the natural history (Hx) of a condition
Cohort can be divided at outset into subgroups of people whose experience is to be compared
Can identify those most likely to develop outcome
Internal comparison
Only 1 cohort is involved in the study
It is sub-classified and an internal comparison is made
External comparison
>1 cohort is included in the study for the purpose of comparison
Strengths of cohort studies
Can give incidence rate and risk
Can help establish cause-effect relationships (exposure precedes outcome)
Good when exposure is rare
Minimises selection and information bias
Weaknesses of cohort studies
Loss to follow up
Requires large sample
Ineffective for rare diseases
Time-consuming and expensive
Chi Square Test (+ assumption)
Determines if there is a significant relationship between 2 categorical variables
Assumption: expected frequency in each cell ≥ 5
Fisher’s exact test
Used if the Chi Square Test assumption is not met (i.e. cells have expected frequency < 5)
Independent samples t-test
Used to compare means of a normally distributed continuous variable for 2 groups
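Minimal scipy sketches (hypothetical 2x2 table and group data) for the three tests above:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = exposed/unexposed, columns = outcome yes/no
table = np.array([[20, 30],
                  [10, 40]])

chi2, p, dof, expected = stats.chi2_contingency(table)   # use when expected counts are >= 5
odds_ratio, p_fisher = stats.fisher_exact(table)         # use when expected counts are small

# Hypothetical continuous, normally distributed outcome in two independent groups
group_a = np.random.default_rng(4).normal(50, 10, 30)
group_b = np.random.default_rng(5).normal(55, 10, 30)
t_stat, p_t = stats.ttest_ind(group_a, group_b)
```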
Odds Ratio
Measure of the strength of association between an exposure and an outcome. Represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of the exposure
OR = 1: exposure doesn't affect the odds of the outcome
OR > 1: exposure associated with higher odds of the outcome
OR < 1: exposure associated with lower odds of the outcome
Confidence Intervals and OR
If the CI of the OR contains 1 (the null value of the OR), the relationship is unlikely to be statistically significant
If the CI of the OR does not contain 1, the relationship is likely to be statistically significant
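A worked sketch (hypothetical 2x2 counts) of the odds ratio and its 95% CI using the standard log-OR formula:

```python
import numpy as np

# Hypothetical 2x2 table
#              outcome+   outcome-
# exposed        a = 30     b = 70
# unexposed      c = 15     d = 85
a, b, c, d = 30, 70, 15, 85

or_ = (a * d) / (b * c)                              # odds ratio
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)           # SE of ln(OR)
lower = np.exp(np.log(or_) - 1.96 * se_log_or)
upper = np.exp(np.log(or_) + 1.96 * se_log_or)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")  # CI excluding 1 suggests significance
```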
Logistic Regression
A regression with an outcome variable that is categorical and IVs that can be a mix of continuous and categorical
Predicts which of the possible outcomes will occur, given information on the IVs
Identifies factors that determine whether an individual is likely to benefit from a certain type of rehab program/outcome
Logistic Regression Assumptions
Ratio of cases to variables (enough responses in each category)
Linearity in the logit (the regression equation should have a linear relationship with the logit form of the outcome)
Absence of multicollinearity and outliers
Independence of errors (observations)
Logistic Regression Models
Dichotomous outcome - Binary LR
Polytomous (unordered) outcome - multinomial LR
Ordered outcome - ordinal logistic regression
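A sketch of a binary logistic regression with statsmodels (hypothetical dichotomous outcome "improved" and IVs age/treated, simulated data); exponentiated coefficients give adjusted odds ratios:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"age": rng.normal(60, 10, 200),
                   "treated": rng.integers(0, 2, 200)})          # hypothetical IVs
logit_p = -5 + 0.05 * df["age"] + 0.8 * df["treated"]
df["improved"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))     # dichotomous outcome

fit = smf.logit("improved ~ age + treated", data=df).fit()
print(fit.summary())
print(np.exp(fit.params))   # adjusted odds ratios
```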
What tests are used to examine the relationship between the outcome variable (DV) and each IV
Chi Square test
Fishers test
T-Test
LR - Omnibus Test
Asks whether the fitted (new) model is an improvement over the null model
If p < 0.05, the fitted (new) model as a whole fits significantly better than a null model without any predictors
LR model summary
- A lower -2 log likelihood (-2LL) for the new model than for the null model indicates a better fit
- The Nagelkerke R^2 value shows how much of the variation is explained by the new model
Goodness of fit test
Used to examine if estimated LR model fits sample data
P<0.05 = poor fit
p>0.05 = good fit
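A standalone sketch (hypothetical data) showing the quantities behind these cards in statsmodels: the likelihood-ratio (omnibus) test against the null model, -2 log likelihood, and a pseudo-R^2 (note that statsmodels reports McFadden's pseudo-R^2, not Nagelkerke's):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

# Hypothetical data: binary outcome and one predictor
rng = np.random.default_rng(7)
x = rng.normal(size=150)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.7 * x))))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

# Omnibus (likelihood-ratio) test: fitted model vs null (intercept-only) model
lr_stat = 2 * (fit.llf - fit.llnull)
p_value = stats.chi2.sf(lr_stat, df=fit.df_model)
print(f"LR chi2 = {lr_stat:.2f}, p = {p_value:.4f}")   # p < 0.05: fitted model better than null

print("-2 log likelihood:", -2 * fit.llf)      # lower than the null model's -2LL when fit improves
print("McFadden pseudo R^2:", fit.prsquared)
```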
RCT
Individuals are allocated at random to receive one of a number of interventions (minimum 2)
Equal chance of allocation to each intervention
Allocation is NOT determined by the researcher and is NOT predictable
Random allocation
Eliminates bias
Allows researchers to make causal inferences - randomisation ensures any baseline difference between groups is due to chance
Covariates are evenly distributed across groups at baseline = unbiased distribution of confounders
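A minimal sketch (hypothetical participant IDs) of simple random allocation to two arms:

```python
import numpy as np

participants = [f"P{i:02d}" for i in range(1, 21)]   # hypothetical participant IDs
rng = np.random.default_rng()                        # no fixed seed: allocation is not predictable

shuffled = rng.permutation(participants)
arm_a, arm_b = shuffled[:10], shuffled[10:]          # equal chance of either intervention
print("Intervention A:", list(arm_a))
print("Intervention B:", list(arm_b))
```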
Sources of bias in RCT
Selection bias - inadequate concealment of allocation/incorrect generation of randomisation sequence
Performance bias - inadequate blinding/masking
Detection bias
Attrition bias
Reporting bias
How are sample and effect size related
Sample size is inversely related to effect size: the smaller the expected effect size, the larger the sample needed to detect it
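A sketch using statsmodels power analysis (assumed two-sided alpha = 0.05 and power = 0.80) showing that the required sample per group grows as the expected effect size shrinks:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.2, 0.5, 0.8):   # small, medium, large (Cohen's d)
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.80)
    print(f"d = {effect_size}: ~{n:.0f} participants per group")
```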
Intention to treat analysis
Compares treatment groups as originally allocated, irrespective of whether patients received or adhered to treatment protocol
Promotes external validity
Per protocol analysis
Compares treatment groups as originally allocated but includes only those patients who completed the treatment protocol, which compromises internal validity
Longitudinal Data Analysis
Assesses change in response variable over time, measures temporal patterns of response to Rx, identifies factors that influence change
- Mixed effects model - compares individual change over time
- Marginal model (GEE) - compares populations over time, evaluates interventions/informs public policy
Generalised Estimating Equations (GEE)
GEE is an extension of the generalised linear model of statistical regression for modelling clustered/correlated data
Offers robust estimate of standard errors to allow for clustering of observations
Produces consistent estimates of regression coefficients and their standard errors
Can deal with normal and non-normal outcome data
If the GEE intervention effect is significant, the two interventions are significantly different
Assumptions of GEE
Assumptions of the generalised linear model
Responses come from a known family of distributions, with a specified mean and variance where the variance is a function of the mean
Mean is a linear function of predictors
A correlation structure for the responses must be specified
Any missing data are missing completely at random (or missing at random)
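A sketch of a GEE fit with statsmodels (hypothetical repeated-measures data, exchangeable working correlation assumed); the robust standard errors account for clustering of observations within subjects:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical longitudinal data: 50 subjects, 3 time points each
rng = np.random.default_rng(8)
n, t = 50, 3
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n), t),
    "time": np.tile(np.arange(t), n),
    "group": np.repeat(rng.integers(0, 2, n), t),
})
df["score"] = 40 + 2 * df["time"] + 3 * df["group"] + rng.normal(0, 5, n * t)

model = smf.gee("score ~ time * group", groups="subject", data=df,
                family=sm.families.Gaussian(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())   # robust (sandwich) standard errors allow for within-subject correlation
```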
Sources of variation
In LDA the unexplained variation is divided into components, making the ultimate error variance smaller
LDA decreases unexplained variability in response - provides better estimates of effect
Post estimation distribution of residuals
A non-significant p value supports normality of the residuals: no significant difference between the distribution of the residuals and a normal distribution