Module 4 Flashcards
What do the values have to be between for Pearson’s Correlation coefficient?
-1 and +1
values of r close to -1 or +1 indicate a strong linear association
What does a r value close to 0 indicate?
- little linear association between variables
What are the hypotheses for Pearson’s correlation?
H0: ρ = 0 (no linear association)
H1: ρ ≠ 0 (linear association)
What p-value shows a significant linear correlation?
p<0.05
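A minimal Python sketch of Pearson’s r and the usual test statistic for H0: ρ = 0, assuming paired numeric data with no missing values. The data and the function names (pearson_r, t_statistic) are illustrative only. Under H0 the statistic follows a t distribution with n - 2 degrees of freedom, and the p-value from that distribution is what gets compared with 0.05.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0 on n - 2 degrees of freedom (undefined if |r| = 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


x = [1, 2, 3, 4, 5]  # illustrative data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}")
```

Here r ≈ 0.775 and t ≈ 2.121 on 3 degrees of freedom, below the two-sided 5% critical value of about 3.18, so p > 0.05: with only five points even a fairly large r is not statistically significant.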
What are the assumptions of Pearson’s correlation coefficient?
- linear association
What test is used if the association between two variables is not linear?
- Spearman’s (rank) correlation coefficient
- requires the association to be monotonic
What does monotonic mean?
- always increasing or always decreasing (but not necessarily at the same rate)
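Spearman’s coefficient can be sketched as Pearson’s r applied to the ranks of the data, with tied values sharing the average of their ranks. This minimal Python version uses illustrative data and hypothetical function names; it shows a monotonic but non-linear relationship where Spearman’s rho is exactly 1 while Pearson’s r is lower.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # always increasing, but not along a straight line
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
print(f"Pearson r    = {pearson_r(x, y):.3f}")
```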
Does a correlation between 2 variables mean there is a cause and effect relationship?
- no
- there may be an unobserved variable that can explain this
What does correlation measure?
- magnitude of the association between 2 variables
What does regression measure?
- magnitude of dependence of one variable upon another
What is the idea of linear regression?
- find the relationship between the independent (X) and dependent (Y) variables
- want to determine the straight line that best ‘fits’ the data
Can you have more than one independent variable for regression?
- yes
What is the linear regression model formula?
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
What are the three main steps in regression analysis?
- estimate equation (find coefficients)
- assess model (significance and assumptions)
- use a good model to make predictions
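In R Commander, where are β0 and β1 in the regression output?
- β0 is the value in the (Intercept) row under the Estimate column (the column before Std. Error)
- β1 is the Estimate value in the row directly below it, labelled with the X variable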
What is β1?
- regression coefficient (slope of the line)
What is β0?
- y-intercept
- the value of Y when X=0
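With one X, the least-squares estimates have a closed form: the slope is the sum of cross-products of deviations divided by the sum of squared X deviations, and the fitted line passes through (mean X, mean Y). A minimal Python sketch with illustrative data and a hypothetical function name; for the same data these estimates should match the Estimate column that R reports for lm(y ~ x).

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx              # slope: change in Y per one-unit change in X
    b0 = mean_y - b1 * mean_x     # intercept: predicted Y when X = 0
    return b0, b1


x = [1, 2, 3, 4, 5]  # illustrative data
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```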
What are the assumptions for regression?
- Y and X are linearly related
- the values of Y are independent from each other
- the random part of Y (error) is normally distributed around 0 with constant variance
What is a residual?
- the difference between what we observe and what our model predicts at a given value of X (observed Y minus predicted Y)
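A minimal Python sketch computing residuals as observed Y minus predicted Y, with illustrative data and hypothetical function names. For a least-squares fit that includes an intercept, the residuals sum to zero by construction, so in practice residual checks focus on normality and constant variance (for example, plotting residuals against fitted values).

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1


def residuals(x, y, b0, b1):
    """Residual = observed Y - predicted Y at each X."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]  # illustrative data
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear_regression(x, y)
res = residuals(x, y, b0, b1)
print([round(e, 2) for e in res])
print(f"mean residual = {sum(res) / len(res):.2e}")
```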
What are the assumptions for residual analysis?
- normally distributed
- mean of zero
- constant variance (homoscedasticity)
What do you do if the variance is not constant in a residual analysis?
- transform the data (e.g. Ynew = log(Yi))
- use different methods (weighted least squares regression)
What are the two types of prediction?
- interpolation (predict Y values using X values within the range of the data)
- extrapolation (predict Y values using X values beyond the range of the sample data)
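A minimal Python sketch, assuming an already fitted simple regression with illustrative coefficients b0 = 2.2 and b1 = 0.6 (the function name is hypothetical). The only difference between the two kinds of prediction is whether the new X lies inside the observed X range; an extrapolated prediction assumes the straight line continues beyond the data, which the sample cannot confirm.

```python
def predict(b0, b1, x_new, x_observed):
    """Predict Y at x_new and label it as interpolation or extrapolation."""
    y_hat = b0 + b1 * x_new
    inside = min(x_observed) <= x_new <= max(x_observed)
    return y_hat, ("interpolation" if inside else "extrapolation")


x_observed = [1, 2, 3, 4, 5]
b0, b1 = 2.2, 0.6  # illustrative coefficients
print(predict(b0, b1, 3.5, x_observed))
print(predict(b0, b1, 10, x_observed))
```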
What is a simple linear regression?
- one dependent variable and one independent variable
What is multiple linear regression (MLR)?
- one dependent variable and 2 or more independent variables
What is the formula for MLR?
$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \dots + \beta_k X_{i,k} + \varepsilon_i$ (k independent variables)
How do you assess the data for an MLR?
- look at each (Y, X) bivariate pair (e.g. with scatterplots)
What are the main steps in MLR analysis?
- estimate the regression equation
- assess the model and test hypotheses (ANOVA F test, model validity, explanatory power/adjusted R^2, multicollinearity, parsimony)
- test hypotheses regarding individual Xs
- if model is ‘good’, use to predict value of dependent variable
Why do we use an adjusted R^2?
- the unadjusted R^2 never decreases when an X is added, so it becomes increasingly (upwardly) biased as the number of X’s increases
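The standard adjustment scales the unexplained proportion (1 - R^2) by (n - 1)/(n - k - 1), where k is the number of X’s, so adding an X only raises adjusted R^2 if it explains enough extra variation. A minimal Python sketch with made-up sums of squares, holding R^2 fixed at 0.6 while k grows, shows the penalty; the function names are hypothetical.

```python
def r_squared(sse, sst):
    """Proportion of the variation in Y explained: 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent (X) variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(sse=40.0, sst=100.0)  # illustrative sums of squares -> R^2 = 0.6
adj = [adjusted_r_squared(r2, n=20, k=k) for k in (1, 2, 3)]
for k, value in zip((1, 2, 3), adj):
    print(k, round(value, 3))
```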
What is an example of hypotheses about the partial regression coefficients (overall F test)?
- H0: all partial regression coefficients are zero
- H1: at least one partial regression coefficient is not equal to zero
What is parsimony?
- the principle of explaining the most variation with the fewest variables
What are information criteria (IC)?
- statistics that consider both parsimony and explanatory power together
- AIC (Akaike IC)
- BIC (Bayesian IC)
What is the basic formula for an IC?
IC = lack of fit (based on observed Y - predicted Y) + penalty (based on the number of parameters); a lower IC indicates a better model
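For least-squares regression the lack-of-fit term is commonly written n·ln(SSE/n), where SSE is the sum of squared residuals; AIC adds a penalty of 2p and BIC adds p·ln(n) for p estimated coefficients. A minimal Python sketch with made-up SSE values and hypothetical function names; these forms drop a constant that is the same for every model fitted to the same data, so only differences between models matter (absolute values will differ from, e.g., R’s AIC()).

```python
import math


def aic_least_squares(sse, n, p):
    """AIC for a least-squares fit, up to an additive constant: n*ln(SSE/n) + 2p."""
    return n * math.log(sse / n) + 2 * p


def bic_least_squares(sse, n, p):
    """BIC for a least-squares fit, up to an additive constant: n*ln(SSE/n) + p*ln(n)."""
    return n * math.log(sse / n) + p * math.log(n)


n = 30  # illustrative: two candidate models fitted to the same 30 observations
small = {"sse": 40.0, "p": 2}  # intercept + 1 X
large = {"sse": 38.0, "p": 4}  # intercept + 3 X's: slightly better fit
for name, m in (("small", small), ("large", large)):
    print(name,
          round(aic_least_squares(m["sse"], n, m["p"]), 2),
          round(bic_least_squares(m["sse"], n, m["p"]), 2))
```

The larger model fits slightly better (smaller SSE), but both criteria are lower for the smaller model, so parsimony wins here. BIC’s ln(n) penalty per parameter exceeds AIC’s 2 once n is 8 or more.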
What is multicollinearity?
- occurs when the independent (X) variables are correlated with each other (not independent)
How can you identify multicollinearity on a correlation matrix?
- a correlation between two X variables that is high compared with the others
How can you calculate the magnitude of multicollinearity?
- Variance inflation factor (VIF)
What does VIF indicate?
- the increase in the variance of a β estimate due to the presence of other collinear variables in the model
- VIF< 5 is ok
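In general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on all the other X’s. A minimal Python sketch for the special case of exactly two X’s, where R_j^2 is simply the squared Pearson correlation between them; the general case needs a multiple regression and is not shown. Data and function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two X's: R_j^2 is then r^2 between them."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]  # illustrative predictor values
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))
```

Here r^2 = 0.6 between the two X’s, giving a VIF of 2.5, below the rule-of-thumb cut-off of 5.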
What is confounding?
- a variable that changes the effect (slope) of an explanatory variable when it is added to the model
What is the minimum sample size?
- the minimum number of people needed to declare clinically important effects that are also statistically significant
What is power?
- the probability of declaring an effect statistically significant when the effect truly exists
- a larger sample size increases power
What is the ethical principle behind sample size?
- sample sizes that are inadequate for the posed question (too large or too small) waste resources and, in clinical trials, raise ethical issues
What is the alpha level?
alpha = probability of a type 1 error = significance level, usually 0.05 (a result is significant when p < alpha)
What is the beta level?
beta = probability of a type 2 error; power = 1 - beta
power is usually set at 0.8 or 0.9
so beta (type 2 error) = 0.2 or 0.1
What are 4 sources of the expected effect size for a sample size calculation?
- pilot study
- scientific literature
- expert suggestion
- wild guess
What are the three variables for sample size?
- expected difference
- power
- sample size
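For the common case of comparing two group means, a standard normal-approximation formula is n per group = 2(z_{1-α/2} + z_{1-β})² σ² / Δ², where Δ is the expected difference and σ the assumed standard deviation. A minimal Python sketch, assuming a two-sided test, equal group sizes and a known σ (the σ and Δ values and the function name are made up for illustration). With alpha and σ fixed, any two of expected difference, power and sample size determine the third.

```python
import math
from statistics import NormalDist


def n_per_group(diff, sd, alpha=0.05, power=0.8):
    """Sample size per group to compare two means (two-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diff ** 2)


print(n_per_group(diff=5, sd=10, power=0.8))
print(n_per_group(diff=5, sd=10, power=0.9))
print(n_per_group(diff=10, sd=10, power=0.8))
```

Raising power from 0.8 to 0.9 raises n from 63 to 85 per group, and doubling the expected difference cuts it to 16.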
What is the difference between correlation and agreement?
- correlation means the two sets of values vary together consistently (e.g. a constant ratio), not that they are equal
- agreement means the numbers are the same
What is used to measure agreement for continuous data?
Bland-Altman plots
What is bias?
- a systematic difference
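A Bland-Altman plot puts the mean of each pair of measurements on the x-axis and their difference on the y-axis; the mean difference is the bias, and bias ± 1.96 SD of the differences gives the 95% limits of agreement. A minimal Python sketch with illustrative readings and a hypothetical function name that computes those quantities (plotting is left out to keep it dependency-free).

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return pair means, differences, bias and 95% limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # x-axis of the plot
    diffs = [a - b for a, b in zip(method_a, method_b)]        # y-axis of the plot
    bias = mean(diffs)                                         # systematic difference
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


a = [10, 12, 14, 16, 18]  # illustrative readings from two methods on the same 5 subjects
b = [11, 12, 15, 15, 19]
means, diffs, bias, (lower, upper) = bland_altman(a, b)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```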
What is used to measure agreement for categorical data?
- cross tabulation
- with perfect agreement, the off-diagonal cells would be 0
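A minimal Python sketch of the cross tabulation for two raters (or two methods) classifying the same cases, with illustrative labels and hypothetical function names. Agreement sits on the diagonal, so the observed proportion of agreement is the diagonal total divided by the grand total, and it equals 1 only when every off-diagonal cell is 0. This simple proportion does not correct for chance agreement.

```python
from collections import Counter


def cross_tab(rater1, rater2):
    """Rows = rater1's category, columns = rater2's category."""
    categories = sorted(set(rater1) | set(rater2))
    counts = Counter(zip(rater1, rater2))
    table = [[counts[(row, col)] for col in categories] for row in categories]
    return categories, table


def observed_agreement(table):
    """Proportion of cases on the diagonal (both raters chose the same category)."""
    total = sum(sum(row) for row in table)
    on_diagonal = sum(table[i][i] for i in range(len(table)))
    return on_diagonal / total


rater1 = ["yes", "yes", "no", "no", "yes"]
rater2 = ["yes", "no", "no", "no", "yes"]
categories, table = cross_tab(rater1, rater2)
print(categories, table)
print(observed_agreement(table))
```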
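What is sensitivity?
- proportion of actual positives correctly classified as positive (not the same as the positive predictive value, the proportion of positive results that are truly positive)
What is specificity?
- proportion of actual negatives correctly classified as negative (not the same as the negative predictive value, the proportion of negative results that are truly negative)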
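A minimal Python sketch using made-up counts from a 2x2 table of test result against true status (TP, FN, TN, FP); the function name is hypothetical. It shows that sensitivity and specificity condition on the true status, while the predictive values condition on the test result, so the two pairs generally differ.

```python
def diagnostic_measures(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # actual positives correctly classified
        "specificity": tn / (tn + fp),  # actual negatives correctly classified
        "ppv": tp / (tp + fp),          # positive results that are truly positive
        "npv": tn / (tn + fn),          # negative results that are truly negative
    }


m = diagnostic_measures(tp=90, fn=10, tn=160, fp=40)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Here sensitivity is 0.900 but the PPV is only about 0.692, because 40 of the 130 positive results are false positives.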
What can be used to test a questionnaire’s reliability?
- Cronbach’s alpha
What are the rules of thumb for ranges of Cronbach’s alpha?
- 0-0.7 = unreliable
- 0.7-0.8 = adequate
- 0.8-0.95 = good
- 0.95-1.0 = too similar
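Cronbach’s alpha compares the sum of the individual item variances with the variance of respondents’ total scores: alpha = k/(k - 1) × (1 - Σ item variances / variance of totals) for k items. A minimal Python sketch with made-up scores and hypothetical function names; the interpret helper applies the rule-of-thumb bands listed in this card, and treating 0.8 and 0.95 as "good" is an assumption, since the bands share their endpoints.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per question, all answered by the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def interpret(alpha):
    """Rule-of-thumb bands; boundary values are assigned by assumption."""
    if alpha < 0.7:
        return "unreliable"
    if alpha < 0.8:
        return "adequate"
    if alpha <= 0.95:
        return "good"
    return "too similar"


items = [
    [3, 4, 3, 5, 2],  # question 1, scores from 5 respondents (illustrative)
    [3, 5, 4, 4, 2],  # question 2
    [2, 4, 3, 5, 3],  # question 3
]
a = cronbach_alpha(items)
print(f"alpha = {a:.3f} ({interpret(a)})")
```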