Multivariate analyses Flashcards
Multivariable analysis
- used for data with one dependent outcome variable but more than one independent variable
- multivariable analysis determines the relative contributions of different causes to a single event or outcome
Multivariate analysis
-used for data with more than one dependent outcome variable as well as more than one independent variable
Multiple regression
-used if both the dependent and independent variables consist of continuous data
Logistic regression
-used if the dependent variable consists of dichotomous categorical data (two outcomes)
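A minimal sketch of how a fitted logistic model turns predictors into a probability for a two-outcome variable. The coefficient values and the function name `predicted_probability` are hypothetical, chosen only for illustration.

```python
import math

def predicted_probability(intercept, slope, x):
    """Convert the linear predictor (the log-odds) into a probability."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients, not taken from any real dataset.
intercept, slope = -2.0, 0.5

# At x = 4 the log-odds are -2.0 + 0.5 * 4 = 0, so the probability is 0.5.
p_at_4 = predicted_probability(intercept, slope, 4)

# exp(slope) is the odds ratio for a one-unit increase in x.
odds_ratio_per_unit = math.exp(slope)
```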
Cox proportional hazards model
-used if the dependent variable also includes a time factor (e.g. a survival curve)
Log-linear analysis
-used if the dependent variable consists of nominal categorical data (i.e. more than two outcomes)
Analysis of variance (ANOVA)
-used if the dependent variable is continuous and the independent variables are categorical
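As an illustration, a one-way ANOVA compares variation between group means with variation within groups. The sketch below computes the F statistic by hand; the function name `one_way_anova_f` and the scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical continuous scores for three categorical treatment groups.
scores_by_treatment = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(scores_by_treatment)
```

A larger F statistic means the group means differ more than the within-group scatter would suggest; judging significance additionally requires the F distribution with the two degrees of freedom.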
Analysis of covariance (ANCOVA)
-used if there are both categorical and continuous independent variables
Path analysis
- an extension of multiple regression
- examines situations in which there are several final dependent variables with ‘chains’ of influence, i.e. variable A influences variable B, which in turn affects variable C
Cluster analysis
- a multivariate tool used to organise variables into relatively homogeneous groups or ‘clusters’
- involves the generation of a similarity matrix
- produces a dendrogram
Canonical correlation
- multivariate tool used to explore the relationship between two sets of variables
- involves the computation of eigenvalues
Discriminant function analysis
- a multivariate technique used to detect which of several variables best discriminates between two or more groups
- similar to the multivariate analysis of variance MANOVA
Factor analysis
- refers to a set of statistical methods used to detect underlying patterns in the relationships among a number of observed variables
- aims to identify whether the correlations between a set of multiple observed variables can be summarised in terms of a smaller number of underlying, latent, unobserved variables called ‘factors’
Two main types of factor analysis
1. exploratory factor analysis
2. confirmatory factor analysis
Exploratory factor analysis
- used for the preliminary investigation of a set of multiple observed variables
- doesn’t make assumptions about the compositions of underlying latent variables or factors
Applications of exploratory factor analysis
- data reduction when multiple (over 25) variables have been measured
- classification of symptoms into meaningful concepts
- definition of subscales of new measures
Confirmatory factor analysis
- method for testing whether a specified factor structure remains valid with a new dataset
- primarily used for assessing the construct validity of questionnaires or tests.
Conducting a factor analysis
1. construct a correlation matrix
2. extraction
3. rotation
4. define the factors to be retained
5. labelling
Extraction in factor analysis
-common factor analysis and principal components analysis are most frequently used
Rotation
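-reorients the extracted factors (e.g. varimax rotation) so that the pattern of loadings is easier to interpret; eigenvalues are used when deciding which factors to retain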
Eigenvalue
-the amount of total variance explained by each factor
Kaiser rule
-only factors with eigenvalues greater than 1 are retained
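A small sketch of the Kaiser rule applied to a list of eigenvalues. The eigenvalues and the function name `kaiser_retained` are hypothetical; in practice the eigenvalues would come from the correlation matrix of the observed variables.

```python
def kaiser_retained(eigenvalues):
    """Return (factor_number, eigenvalue, share_of_variance) for each
    factor whose eigenvalue is greater than 1."""
    # For a correlation matrix the eigenvalues sum to the number of variables.
    total = sum(eigenvalues)
    return [(i + 1, ev, ev / total)
            for i, ev in enumerate(eigenvalues) if ev > 1.0]

# Hypothetical eigenvalues for six observed variables.
eigenvalues = [2.8, 1.4, 0.8, 0.5, 0.3, 0.2]
retained = kaiser_retained(eigenvalues)
retained_factor_numbers = [factor for factor, _, _ in retained]
```

Here only the first two factors pass the threshold, and each retained factor's share of variance is its eigenvalue divided by the total.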
Scree plot
- plot the component numbers against eigenvalues
- choose the number that forms the elbow or bend before the plot levels off on the right side
Labelling
-there is a general consensus that variables with a factor loading greater than or equal to 0.40 are probably making a significant contribution to that factor, in contrast to those with smaller factor loadings
Defining the factors to be retained
- a step in factor analysis
- eigenvalues are used to work out which factors to keep
Path analysis
- refers to causal modelling and prediction beyond simple regression
- independent variables are exogenous variables and dependent variables are endogenous
- arrows display presumed causal relations
Arrows in path analysis
- single headed arrow flows from a putative cause to the effect
- double-headed curved arrow indicates mere correlation but no predictive causal links
Path coefficient
- used in path analysis
- indicates the direct effect of a variable assumed to be a cause on another variable assumed to be an effect
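A brief worked example, assuming standardised path coefficients for a chain A -> B -> C with an additional direct path A -> C. The coefficient values are hypothetical. The indirect effect along a chain is conventionally taken as the product of the path coefficients on that chain.

```python
# Hypothetical standardised path coefficients.
path_a_to_b = 0.5   # direct effect of A on B
path_b_to_c = 0.4   # direct effect of B on C
path_a_to_c = 0.2   # direct effect of A on C

# Indirect effect of A on C through B: product of the coefficients on the chain.
indirect_effect = path_a_to_b * path_b_to_c

# Total effect of A on C: direct effect plus indirect effect.
total_effect = path_a_to_c + indirect_effect
```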
Stratification
-often used to control or analyse the effect of confounder variables
Stratum
-a sub-group within a sample often defined by the presence or absence of a variable of interest
Mantel-Haenszel procedure
-applies a method of weighting for each stratum to produce a summary score to help create an adjusted relative risk (RR)
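A minimal sketch of the Mantel-Haenszel adjusted relative risk for cumulative-incidence data. The counts and the function name `mantel_haenszel_rr` are hypothetical. Each stratum is given as (exposed cases, exposed total, unexposed cases, unexposed total).

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary relative risk across strata."""
    numerator = 0.0
    denominator = 0.0
    for exposed_cases, exposed_total, unexposed_cases, unexposed_total in strata:
        stratum_total = exposed_total + unexposed_total
        numerator += exposed_cases * unexposed_total / stratum_total
        denominator += unexposed_cases * exposed_total / stratum_total
    return numerator / denominator

# Hypothetical counts for two strata of a confounder (e.g. two age bands).
strata = [
    (10, 100, 5, 100),    # stratum-specific RR = 0.10 / 0.05 = 2
    (40, 200, 10, 100),   # stratum-specific RR = 0.20 / 0.10 = 2
]
adjusted_rr = mantel_haenszel_rr(strata)

# Crude RR ignores the strata and pools all counts.
exposed_cases = sum(s[0] for s in strata)
exposed_total = sum(s[1] for s in strata)
unexposed_cases = sum(s[2] for s in strata)
unexposed_total = sum(s[3] for s in strata)
crude_rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
```

With these numbers the crude RR is about 2.22 while the adjusted RR is 2, the value shared by both strata; the gap is the distortion introduced by the stratifying variable.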
Standardisation
- another method of stratification used in large data sets for public health statistics to produce adjusted rates
- age is the most often standardised variable
- a hypothetical ‘world standard population’ is often used
Direct standardisation
- stratum specific rates from study sample are applied to the standard population
- summary score is produced from this data
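Direct standardisation can be sketched as follows, assuming hypothetical age bands, study rates and standard population sizes (none of these figures come from a real standard population).

```python
def directly_standardised_rate(study_rates, standard_population):
    """Apply the study's stratum-specific rates to the standard population
    and return the overall rate that population would then experience."""
    expected_events = sum(rate * standard_population[stratum]
                          for stratum, rate in study_rates.items())
    return expected_events / sum(standard_population.values())

# Hypothetical stratum-specific rates observed in the study sample.
study_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.030}
# Hypothetical standard population counts per stratum.
standard_population = {"0-39": 60000, "40-64": 30000, "65+": 10000}

standardised_rate = directly_standardised_rate(study_rates, standard_population)
per_100000 = standardised_rate * 100000
```

The expected events are 60 + 150 + 300 = 510 in a standard population of 100,000, so the age-standardised rate is 510 per 100,000.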
Indirect standardisation
- stratum specific rates from the standard population are applied to the study sample
- this gives expected rates
- the observed rates are divided by the expected rates to arrive at standardised ratios, e.g. the standardised mortality ratio (SMR)
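Indirect standardisation can be sketched in the same way, using the conventional definition of the standardised mortality ratio as observed deaths divided by expected deaths. The rates, population sizes, observed count and the function name `standardised_mortality_ratio` are hypothetical.

```python
def standardised_mortality_ratio(observed_deaths, standard_rates, study_population):
    """Apply the standard population's stratum-specific rates to the study
    sample to get expected deaths, then return (SMR, expected_deaths)."""
    expected_deaths = sum(standard_rates[stratum] * size
                          for stratum, size in study_population.items())
    return observed_deaths / expected_deaths, expected_deaths

# Hypothetical stratum-specific rates in the standard population.
standard_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.030}
# Hypothetical study sample sizes per stratum.
study_population = {"0-39": 2000, "40-64": 1000, "65+": 500}
observed_deaths = 33

smr, expected_deaths = standardised_mortality_ratio(
    observed_deaths, standard_rates, study_population)
```

Expected deaths are 2 + 5 + 15 = 22, so the SMR is 33 / 22 = 1.5, i.e. 50% more deaths than expected if the study sample had experienced the standard population's rates.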
Key learning points
- stratification is useful only for known confounders
- adjustment can be applied to odds ratios as well as RRs
- multivariate techniques such as regression can also be used for analysing confounders