Lecture 13 - Regression Flashcards
(!) Describe different test types & some assumptions
Assumptions:
- Homogeneity of variance between groups
- Independent obs.
- Covariate unrelated to experimental treatment: ANCOVA
____________
Test-type:
Independent t-test:
- 1 binary IV
- Continuous DV
- Between design
Paired sample t-test:
- 1 binary IV (same subjects, two conditions)
- Continuous DV
- Within design
One-way ANOVA:
- 1 categorical IV (factor)
- Continuous DV
N-way ANOVA:
- More than 1 categorical IV (factors)
- Continuous DV
ANCOVA:
- 1 or more IVs plus a covariate
- Gives "purified" (adjusted) means
- C = Covariance: relation between variables
- Read almost like ANOVA after the adjustment
- Covariates should be independent of the treatment
- Randomly selected groups are assumed equal in all respects
F-test:
- Compares variances, while ANOVA compares means
- Only tells whether means differ, not how: thus often combined with ANOVA (tests sketched below)
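A minimal sketch of how these test types map to function calls, assuming scipy and made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1, g2, g3 = rng.normal(0, 1, 30), rng.normal(0.5, 1, 30), rng.normal(1, 1, 30)

# Independent t-test: 1 binary IV (group), continuous DV, between design
t_ind, p_ind = stats.ttest_ind(g1, g2)

# Paired t-test: same subjects measured twice, within design
t_rel, p_rel = stats.ttest_rel(g1, g2)

# One-way ANOVA: 1 categorical IV with 3 levels, continuous DV
F, p_anova = stats.f_oneway(g1, g2, g3)
```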
(!) Describe multiple regression & what it test
General:
- How the DV mean changes as a function of the IVs: partial effect of each IV
- Adding variables changes the calculation: IVs correlate with each other
- Every partial effect is estimated with the others controlled for & held constant
- Can model non-linear relationships (e.g. squared terms)
Test:
- If relationship between IV & DV is significant: Non-zero
- Magnitude of relationship between IV & DV
- Direction of relationship: Positive or negative
Doesn't test:
- Causality: inferred from the research design
(?) Describe the elements of multiple regression models
Y = DV
Beta_0 = Intercept: DV value when all IVs are zero
Beta_k * X_k = coefficient times IV: Beta_k is the change in Y for a one-unit change in X_k
u = Error term: factors other than the X's influencing Y
i = obs. number
k = number of IVs
n = number of obs.
Hat (^) = estimated
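Written out in full (reconstructed from the definitions above), the population model and its estimated version are:

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + u_i, \quad i = 1, \dots, n
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \dots + \hat{\beta}_k X_{ik}
```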
What does linear regression model?
Conditional mean of the DV: a linear combination of the IVs
Describe centering & standardization
Centering:
- Subtract mean of variable
- Meaningful intercept
- Easier interpretation of interactions
- Don’t do for binary variables
- Coefficients keep the same interpretation/conclusions
- Variable mean is set to 0
Standardization:
- Subtract mean & divide by std.dev
- Standard variables: Mean: 0, Std. dev.: 1
- Comparable: Despite different scales
- Interpretation/conclusion changes
- SPSS calculates standardized coefficients for you: no reason to change the raw scores (both transformations sketched below)
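A minimal numpy sketch of both transformations (the variable values are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Centering: subtract the mean; scale unchanged, mean becomes 0
x_centered = x - x.mean()

# Standardization: subtract the mean & divide by the std. dev.; mean 0, sd 1
x_standardized = (x - x.mean()) / x.std(ddof=1)
```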
(!) Describe methods of choosing predictors
Hierarchical entry:
- Building up or down: Enter or remove
- Entire block of variables in gradual steps
- First block is variables from previous research: Controls
__________
Forced entry:
- Recommended
- All variables entered simultaneously: you decide beforehand which ones go in
___________
Stepwise methods: SPSS chooses for you
General:
- Not recommended
- You > SPSS: the software doesn't consider theory, previous research & the hypotheses tested
- Better: enter controls & basic effects -> variables in hypotheses -> interactions & mediations
Forward:
- Start with intercept only
- Choose variable with highest simple correlation with DV
- Choose variable best explaining remaining variance
Stepwise:
- Same as forward, but variables can also be removed again once they stop being useful
Backward:
- Starts with all the variables and removes them gradually based on p-values or t-statistics (forward logic sketched below)
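An illustrative sketch of the forward-selection logic (not recommended in practice, per the above; assumes statsmodels and a hypothetical 0.05 entry threshold):

```python
import statsmodels.api as sm

def forward_select(X, y, threshold=0.05):
    """Greedily add the IV whose entry p-value is lowest, until none pass."""
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining:
        pvals = {}
        for j in remaining:
            model = sm.OLS(y, sm.add_constant(X[:, chosen + [j]])).fit()
            pvals[j] = model.pvalues[-1]  # p-value of the candidate IV
        best = min(pvals, key=pvals.get)
        if pvals[best] > threshold:       # no remaining IV is significant
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen
```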
(!) Explain goodness of fit
General:
- How much of the actual DV variance the model explains
- SST = SSE + SSR
___________
Terms:
Total sum of squares / SST:
- Total variance of DV
Explained sum of squares / SSE:
- Total variance explained by model
Sum of squared residuals / SSR
- Unexplained left over variance
- Never increases when an IV is added, even an insignificant one: so R^2 can be inflated
R^2
- Show how much variance model explain
- Between 0 & 1
- Higher = Better
Adjusted R^2:
- Better than plain R^2
- Penalizes R^2 for each IV added
- Shows the quality of the model
F-change:
- Significance of the R^2 change
- Tells whether an added variable increases the explanatory power of the model (computation sketched below)
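A minimal statsmodels sketch of these quantities on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(X)).fit()

sst = np.sum((y - y.mean()) ** 2)  # total variance of the DV
ssr = np.sum(fit.resid ** 2)       # unexplained left-over variance
sse = sst - ssr                    # variance explained by the model
r2 = sse / sst                     # matches fit.rsquared
adj_r2 = fit.rsquared_adj          # penalized for each added IV
```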
(!) Describe control variables & how to handle them
General:
- IV´s not of interest
- Explain “left over” variance
Handling:
- Compare to previous research
- Explain the reason for inclusion
- Transparency: show the effect with & without them (sketched below)
- Quality check: same as for variables of interest
- Consider extreme correlations: multicollinearity
- Avoid using them to show the result you want
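A sketch of the with/without comparison on simulated data (assuming statsmodels; the variable names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
control = rng.normal(size=200)
iv = 0.5 * control + rng.normal(size=200)   # IV of interest, correlated w. control
y = 1.0 + 0.8 * iv + 0.6 * control + rng.normal(size=200)

without = sm.OLS(y, sm.add_constant(iv)).fit()
with_ctl = sm.OLS(y, sm.add_constant(np.column_stack([iv, control]))).fit()

# Transparency: report the coefficient of interest under both specifications
print(without.params[1], with_ctl.params[1])
```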
(!) Describe statistical test types
General:
- Models try to approximate reality as closely as possible
- Test how much DV variance the model explains: goodness of fit
- Significance test: is the coefficient significantly different from 0?
- A test statistic summarizes the dataset in one number used in the hypothesis test
- DV must be continuous
T-statistics:
- Compares 1 categorical IV to 1 continuous DV
- Compares the means of two groups
- Shows how good a predictor X is
- Increases with larger samples
- Based on the variance of the estimated beta coefficient
- Reflects coefficient accuracy
- t = estimated beta / standard error of estimated beta
- Needs a reference point (critical value) for judgement
- Used in regression
F-test:
- Compares 2 continuous variables: IV & DV
- Tests if variances are the same
- Variance explained by the model (between groups) divided by error variance
- Used in ANOVA
Chi-squared test:
- Compares two categorical variables: IV & DV
- Compares expected & observed results
- Tests if the variables are independent of each other
- The result of a fitting function: would be 0 if the observed data matched expectations exactly
- Summarizes the fitting process, reported together with degrees of freedom (all three statistics sketched below)
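A minimal sketch computing the three statistics on simulated data (assuming statsmodels/scipy; the 2x2 counts are made up):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = 2.0 + 1.5 * x + rng.normal(size=80)

fit = sm.OLS(y, sm.add_constant(x)).fit()
t = fit.params[1] / fit.bse[1]  # t = estimated beta / its standard error
F = fit.fvalue                  # explained variance over error variance

# Chi-squared: two categorical variables, expected vs. observed counts
observed = np.array([[30, 10], [20, 20]])
chi2, p, dof, expected = stats.chi2_contingency(observed)
```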
(!) Describe beta, error, slope, fitted value, intercept & critical values
Beta:
- Coefficient
- Show degree of change in DV for every 1-unit change in IV
- Hat (^) if estimated: not in the general (population) model
- We seek to reduce error in beta estimate
Error:
- Residual
- Space between real obs. & slope
- Can lie above or below the line
- What is left after fitting a model
- Diff. between obs. & fitted values
Slope:
- Shows the predicted values
Fitted value:
- Lies on the regression line
- Predicted value of Y
Intercept:
- Value of the DV when all IVs = 0
- Equals the DV mean if all IVs are centered
Critical values:
- Can be found in the back of the book
- Tell whether a value is abnormal given the distribution
(!) How should different variables be inferred?
General:
- Negative beta coefficient: IV increases = DV decreases
- Positive beta coefficient: IV increases = DV increases
- Gender: -0.904 = being female means 0.904 units less engaged
- Frequency: more frequent = more engaged
- Acknowledgement: more acknowledgement = more engaged
Dummy variable:
- Binary variable: e.g. male = 0, female = 1
- Shows the effect for 1 compared to 0 (coding sketched below)
Categorical:
- One category as the baseline
- Three categories = read two coefficients
- Compare to the base group
- E.g. single = 1, married = 2
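A minimal pandas sketch of dummy coding a categorical variable (the "status" column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"status": ["single", "married", "divorced", "single"]})

# drop_first makes the first category the baseline, so three categories
# yield two coefficients, each read against the base group
dummies = pd.get_dummies(df["status"], drop_first=True)
```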
(!) Describe the basic assumptions for t-statistics
Assumptions for unbiased OLS estimation:
General:
- OLS = Ordinary least squares
- Technique to find linear regression
1. Linearity:
- Scatterplot: should look rectangular
- The relationship between X & the mean of Y is linear
- Y-axis: DV. X-axis: IV
- Variables can be logarithmic or squared: e.g. age
- Squared: positive linearity up to a top point, negative after
- E.g. 10, 20, 30, 40
2. Random sample:
- The sample must reflect the population
- Set at the research-design stage
3. No perfect collinearity:
- No exact linear combination of the IVs: perfect multicollinearity
- No constant IVs
- At least as many obs. as parameters: n >= k + 1
- E.g. allow age & age^2, but not age in years & age in months
4. Zero conditional mean of error:
- Scatterplot: errors should mirror each other around zero
- The error term is expected to be 0 at each IV value
- A large sample can balance it out
- Violated if important variables are omitted
- Include control variables to hinder systematic variance
- This assumption is important for minimizing residuals & calculating betas
___________
Statistical significance:
- If these are not fulfilled, we cannot be sure of significance
5. Homoscedasticity:
- Scatterplot: requires no cone shape
- Concerns the variance of the error term
- Error terms of the IVs must have the same variance: sigma^2
- Violated if the distance to the line keeps growing
- DV is a linear combination of the IVs
- The variance should not depend on changing IV values
- Errors must not autocorrelate
6. Normality:
- Shapiro-Wilk: must be non-significant: p above 0.05 (in R: above 0.01)
- Includes assumptions 4 & 5
- Hardest to pass
- Can be loosened
- Helped by larger samples
- Helped by a normally distributed DV
- The unobserved error is normally distributed in the population
- If not fulfilled -> bootstrapping: resample with replacement from the observed data (diagnostics sketched below)
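A sketch of checking assumptions 5 & 6 on a fitted model, assuming statsmodels; the Breusch-Pagan test is one common homoscedasticity check, not named in the notes:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

fit = sm.OLS(y, X).fit()

# Assumption 6 (normality): Shapiro-Wilk on the residuals, want p > 0.05
w, p_normality = stats.shapiro(fit.resid)

# Assumption 5 (homoscedasticity): Breusch-Pagan, want p > 0.05 (no cone shape)
lm_stat, p_homosced, f_stat, p_f = het_breuschpagan(fit.resid, X)
```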
(?) Describe a robust regression
- Bootstrapping: used when the sample may not fully represent the population
- Replaces the assumed normal distribution
- Draws mini-samples (resamples) of the current sample
- Used to estimate the variance of beta when we cannot trust the analytical estimate (sketched below)
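A minimal bootstrap sketch for the standard error of beta (assuming statsmodels; the sample size, error distribution and 1,000 resamples are all made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=60)
y = 1.0 + 0.7 * x + rng.standard_t(3, size=60)  # deliberately non-normal errors

betas = []
for _ in range(1000):
    idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
    refit = sm.OLS(y[idx], sm.add_constant(x[idx])).fit()
    betas.append(refit.params[1])

se_boot = np.std(betas, ddof=1)                 # bootstrapped SE of beta
ci = np.percentile(betas, [2.5, 97.5])          # 95% percentile interval
```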
(!) Describe multicollinearity
General:
- The assumptions above are more critical
- Check the correlation between IVs / the variance of the beta coefficients: must be low
- A problem, not a violation: the betas just become untrustworthy
- Detected in a correlation table or via VIF
- VIF above 4: common warning threshold
- VIF above 10: serious multicollinearity
Elements influencing beta variance:
- Higher sigma^2 = Higher beta variance
- Higher IV variance = Lower beta variance
- Higher correlation of IV´s = Higher beta variance: Multicollinearity
Variance inflation factor / VIF:
- Output from the statistics program
- How much of an IV's variance is explained by the other IVs: inflation
- VIF_j = 1 / (1 - R^2_j) (computed below)
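A minimal statsmodels sketch (the near-duplicate x2 is constructed to inflate the VIF):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)  # highly correlated with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF_j = 1 / (1 - R^2_j); compute for each IV, skipping the constant
vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]
```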
(!) Describe outliers
General:
- Not an assumption, but good to check
- Extreme values
- Pull the regression line in the wrong direction
- Create large residuals
- Often a typing or data-entry mistake
- Enlarge the error term
- Predictive power increases if they are eliminated
__________
Assumptions:
Normality of errors:
- Within the accepted range; check histogram & normal P-P plot
- Ref. assumption 6
Plot residuals against the predicted DV:
- Look for curves, funnel shapes or heteroscedasticity
- Ref. assumption 5
___________
Different ways of detection (Cook's distance sketched below):
- Adjusted predicted values
- Deleted residuals
- Cook's distance
- Mahalanobis distance
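A minimal sketch of Cook's distance with statsmodels (the planted outlier and the 4/n cutoff, a common rule of thumb, are assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
y[0] = 25.0                                  # plant an obvious outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()

cooks_d, _ = influence.cooks_distance        # Cook's distance per obs.
flagged = np.where(cooks_d > 4 / len(y))[0]  # rule-of-thumb cutoff
```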