Lecture 13 - Regression Flashcards
(!) Describe different test types & some assumptions
Assumptions:
- Homogeneity of variance between groups
- Independent obs.
- Covariate unrelated to experimental treatment: ANCOVA
____________
Test-type:
Independent t-test:
- 1 binary IV
- Continuous DV
- Between design
Paired sample t-test:
- 1 binary IV (same subjects, two conditions)
- Continuous DV
- Within design
One-way ANOVA:
- 1 categorical IV (factor)
- Continuous DV
N-way ANOVA:
- More than 1 categorical IV (factors)
- Continuous DV
ANCOVA:
- 1 or more IVs plus a covariate
- Gives "purified" (adjusted) means
- C = Covariance: relation between variables
- Read almost like ANOVA after the adjustment
- Covariates should be independent of the treatment
- Randomly selected groups are assumed equal in all respects
F-test:
- Compares variances, while ANOVA compares means
- Only tells whether means differ, not how: thus often combined with ANOVA (tests sketched below)
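A minimal sketch of how these test types map to function calls, assuming scipy and made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1, g2, g3 = rng.normal(0, 1, 30), rng.normal(0.5, 1, 30), rng.normal(1, 1, 30)

# Independent t-test: 1 binary IV (group), continuous DV, between design
t_ind, p_ind = stats.ttest_ind(g1, g2)

# Paired t-test: same subjects measured twice, within design
t_rel, p_rel = stats.ttest_rel(g1, g2)

# One-way ANOVA: 1 categorical IV with 3 levels, continuous DV
F, p_anova = stats.f_oneway(g1, g2, g3)
```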
(!) Describe multiple regression & what it test
General:
- How the DV mean changes as a function of the IVs: partial effect of each IV
- Adding variables changes the calculation: IVs correlate with each other
- Every partial effect is estimated with the others controlled for & held constant
- Can model non-linear relationships (e.g. squared terms)
Test:
- If relationship between IV & DV is significant: Non-zero
- Magnitude of relationship between IV & DV
- Direction of relationship: Positive or negative
Doesn't test:
- Causality: inferred from the research design
(?) Describe the elements of multiple regression models
Y = DV
Beta_0 = Intercept: DV value when all IVs are zero
Beta_k * X_k = coefficient times IV: Beta_k is the change in Y for a one-unit change in X_k
u = Error term: factors other than the X's influencing Y
i = obs. number
k = number of IVs
n = number of obs.
Hat (^) = estimated
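Written out in full (reconstructed from the definitions above), the population model and its estimated version are:

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + u_i, \quad i = 1, \dots, n
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \dots + \hat{\beta}_k X_{ik}
```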
What does linear regression model?
Conditional mean of the DV: a linear combination of the IVs
Describe centering & standardization
Centering:
- Subtract mean of variable
- Meaningful intercept
- Easier interpretation of interactions
- Don’t do for binary variables
- Coefficients keep the same interpretation/conclusions
- Variable mean is set to 0
Standardization:
- Subtract mean & divide by std.dev
- Standard variables: Mean: 0, Std. dev.: 1
- Comparable: Despite different scales
- Interpretation/conclusion changes
- SPSS calculates standardized coefficients for you: no reason to change the raw scores (both transformations sketched below)
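A minimal numpy sketch of both transformations (the variable values are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Centering: subtract the mean; scale unchanged, mean becomes 0
x_centered = x - x.mean()

# Standardization: subtract the mean & divide by the std. dev.; mean 0, sd 1
x_standardized = (x - x.mean()) / x.std(ddof=1)
```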
(!) Describe methods of choosing predictors
Hierarchical entry:
- Building up or down: Enter or remove
- Entire block of variables in gradual steps
- First block is variables from previous research: Controls
__________
Forced entry:
- Recommended
- All variables entered simultaneously: you decide beforehand which ones go in
___________
Stepwise methods: SPSS chooses for you
General:
- Not recommended
- You > SPSS: the software doesn't consider theory, previous research & the hypotheses tested
- Better: enter controls & basic effects -> variables in hypotheses -> interactions & mediations
Forward:
- Start with intercept only
- Choose variable with highest simple correlation with DV
- Choose variable best explaining remaining variance
Stepwise:
- Same as forward, but variables can also be removed again once they stop being useful
Backward:
- Starts with all the variables and removes them gradually based on p-values or t-statistics (forward logic sketched below)
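An illustrative sketch of the forward-selection logic (not recommended in practice, per the above; assumes statsmodels and a hypothetical 0.05 entry threshold):

```python
import statsmodels.api as sm

def forward_select(X, y, threshold=0.05):
    """Greedily add the IV whose entry p-value is lowest, until none pass."""
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining:
        pvals = {}
        for j in remaining:
            model = sm.OLS(y, sm.add_constant(X[:, chosen + [j]])).fit()
            pvals[j] = model.pvalues[-1]  # p-value of the candidate IV
        best = min(pvals, key=pvals.get)
        if pvals[best] > threshold:       # no remaining IV is significant
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen
```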
(!) Explain goodness of fit
General:
- How much of the actual DV variance the model explains
- SST = SSE + SSR
___________
Terms:
Total sum of squares / SST:
- Total variance of DV
Explained sum of squares / SSE:
- Total variance explained by model
Sum of squared residuals / SSR
- Unexplained left over variance
- Never increases when an IV is added, even an insignificant one: so R^2 can be inflated
R^2
- Show how much variance model explain
- Between 0 & 1
- Higher = Better
Adjusted R^2:
- Better than plain R^2
- Penalizes R^2 for each IV added
- Shows the quality of the model
F-change:
- Significance of the R^2 change
- Tells whether an added variable increases the explanatory power of the model (computation sketched below)
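A minimal statsmodels sketch of these quantities on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(X)).fit()

sst = np.sum((y - y.mean()) ** 2)  # total variance of the DV
ssr = np.sum(fit.resid ** 2)       # unexplained left-over variance
sse = sst - ssr                    # variance explained by the model
r2 = sse / sst                     # matches fit.rsquared
adj_r2 = fit.rsquared_adj          # penalized for each added IV
```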
(!) Describe control variables & how to handle them
General:
- IV´s not of interest
- Explain “left over” variance
Handling:
- Compare to previous research
- Explain the reason for inclusion
- Transparency: show the effect with & without them (sketched below)
- Quality check: same as for variables of interest
- Consider extreme correlations: multicollinearity
- Avoid using them to show the result you want
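A sketch of the with/without comparison on simulated data (assuming statsmodels; the variable names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
control = rng.normal(size=200)
iv = 0.5 * control + rng.normal(size=200)   # IV of interest, correlated w. control
y = 1.0 + 0.8 * iv + 0.6 * control + rng.normal(size=200)

without = sm.OLS(y, sm.add_constant(iv)).fit()
with_ctl = sm.OLS(y, sm.add_constant(np.column_stack([iv, control]))).fit()

# Transparency: report the coefficient of interest under both specifications
print(without.params[1], with_ctl.params[1])
```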
(!) Describe statistical test types
General:
- Models try to approximate reality as closely as possible
- Test how much DV variance the model explains: goodness of fit
- Significance test: is the coefficient significantly different from 0?
- A test statistic summarizes the dataset in one number used in the hypothesis test
- DV must be continuous
T-statistics:
- Compares 1 categorical IV to 1 continuous DV
- Compares the means of two groups
- Shows how good a predictor X is
- Increases with larger samples
- Based on the variance of the estimated beta coefficient
- Reflects coefficient accuracy
- t = estimated beta / standard error of estimated beta
- Needs a reference point (critical value) for judgement
- Used in regression
F-test:
- Compares 2 continuous variables: IV & DV
- Tests if variances are the same
- Variance explained by the model (between groups) divided by error variance
- Used in ANOVA
Chi-squared test:
- Compares two categorical variables: IV & DV
- Compares expected & observed results
- Tests if the variables are independent of each other
- The result of a fitting function: would be 0 if the observed data matched expectations exactly
- Summarizes the fitting process, reported together with degrees of freedom (all three statistics sketched below)
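A minimal sketch computing the three statistics on simulated data (assuming statsmodels/scipy; the 2x2 counts are made up):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = 2.0 + 1.5 * x + rng.normal(size=80)

fit = sm.OLS(y, sm.add_constant(x)).fit()
t = fit.params[1] / fit.bse[1]  # t = estimated beta / its standard error
F = fit.fvalue                  # explained variance over error variance

# Chi-squared: two categorical variables, expected vs. observed counts
observed = np.array([[30, 10], [20, 20]])
chi2, p, dof, expected = stats.chi2_contingency(observed)
```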
(!) Describe beta, error, slope, fitted value, intercept & critical values
Beta:
- Coefficient
- Show degree of change in DV for every 1-unit change in IV
- Hat (^) if estimated: not in the general (population) model
- We seek to reduce error in beta estimate
Error:
- Residual
- Space between real obs. & slope
- Can lie above or below the line
- What is left after fitting a model
- Diff. between obs. & fitted values
Slope:
- Shows the predicted values
Fitted value:
- Lies on the regression line
- Predicted value of Y
Intercept:
- Value of the DV when all IVs = 0
- Equals the DV mean if all IVs are centered
Critical values:
- Can be found in the back of the book
- Tell whether a value is abnormal given the distribution
(!) How should different variables be inferred?
General:
- Negative beta coefficient: IV increases = DV decreases
- Positive beta coefficient: IV increases = DV increases
- Gender: -0.904 = being female means 0.904 units less engaged
- Frequency: more frequent = more engaged
- Acknowledgement: more acknowledgement = more engaged
Dummy variable:
- Binary variable: e.g. male = 0, female = 1
- Shows the effect for 1 compared to 0 (coding sketched below)
Categorical:
- One category as the baseline
- Three categories = read two coefficients
- Compare to the base group
- E.g. single = 1, married = 2
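A minimal pandas sketch of dummy coding a categorical variable (the "status" column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"status": ["single", "married", "divorced", "single"]})

# drop_first makes the first category the baseline, so three categories
# yield two coefficients, each read against the base group
dummies = pd.get_dummies(df["status"], drop_first=True)
```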
(!) Describe the basic assumptions for t-statistics
Assumptions for unbiased OLS estimation:
General:
- OLS = Ordinary least squares
- Technique to find linear regression
1. Linearity:
- Scatterplot: should look rectangular
- The relationship between X & the mean of Y is linear
- Y-axis: DV. X-axis: IV
- Variables can be logarithmic or squared: e.g. age
- Squared: positive linearity up to a top point, negative after
- E.g. 10, 20, 30, 40
2. Random sample:
- The sample must reflect the population
- Set at the research-design stage
3. No perfect collinearity:
- No exact linear combination of the IVs: perfect multicollinearity
- No constant IVs
- At least as many obs. as parameters: n >= k + 1
- E.g. allow age & age^2, but not age in years & age in months
4. Zero conditional mean of error:
- Scatterplot: errors should mirror each other around zero
- The error term is expected to be 0 at each IV value
- A large sample can balance it out
- Violated if important variables are omitted
- Include control variables to hinder systematic variance
- This assumption is important for minimizing residuals & calculating betas
___________
Statistical significance:
- If these are not fulfilled, we cannot be sure of significance
5. Homoscedasticity:
- Scatterplot: requires no cone shape
- Concerns the variance of the error term
- Error terms of the IVs must have the same variance: sigma^2
- Violated if the distance to the line keeps growing
- DV is a linear combination of the IVs
- The variance should not depend on changing IV values
- Errors must not autocorrelate
6. Normality:
- Shapiro-Wilk: must be non-significant: p above 0.05 (in R: above 0.01)
- Includes assumptions 4 & 5
- Hardest to pass
- Can be loosened
- Helped by larger samples
- Helped by a normally distributed DV
- The unobserved error is normally distributed in the population
- If not fulfilled -> bootstrapping: resample with replacement from the observed data (diagnostics sketched below)
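A sketch of checking assumptions 5 & 6 on a fitted model, assuming statsmodels; the Breusch-Pagan test is one common homoscedasticity check, not named in the notes:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

fit = sm.OLS(y, X).fit()

# Assumption 6 (normality): Shapiro-Wilk on the residuals, want p > 0.05
w, p_normality = stats.shapiro(fit.resid)

# Assumption 5 (homoscedasticity): Breusch-Pagan, want p > 0.05 (no cone shape)
lm_stat, p_homosced, f_stat, p_f = het_breuschpagan(fit.resid, X)
```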
(?) Describe a robust regression
- Bootstrapping: used when the sample may not fully represent the population
- Replaces the assumed normal distribution
- Draws mini-samples (resamples) of the current sample
- Used to estimate the variance of beta when we cannot trust the analytical estimate (sketched below)
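A minimal bootstrap sketch for the standard error of beta (assuming statsmodels; the sample size, error distribution and 1,000 resamples are all made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=60)
y = 1.0 + 0.7 * x + rng.standard_t(3, size=60)  # deliberately non-normal errors

betas = []
for _ in range(1000):
    idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
    refit = sm.OLS(y[idx], sm.add_constant(x[idx])).fit()
    betas.append(refit.params[1])

se_boot = np.std(betas, ddof=1)                 # bootstrapped SE of beta
ci = np.percentile(betas, [2.5, 97.5])          # 95% percentile interval
```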
(!) Describe multicollinearity
General:
- The assumptions above are more critical
- Check the correlation between IVs / the variance of the beta coefficients: must be low
- A problem, not a violation: the betas just become untrustworthy
- Detected in a correlation table or via VIF
- VIF above 4: common warning threshold
- VIF above 10: serious multicollinearity
Elements influencing beta variance:
- Higher sigma^2 = Higher beta variance
- Higher IV variance = Lower beta variance
- Higher correlation of IV´s = Higher beta variance: Multicollinearity
Variance inflation factor / VIF:
- Output from the statistics program
- How much of an IV's variance is explained by the other IVs: inflation
- VIF_j = 1 / (1 - R^2_j) (computed below)
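A minimal statsmodels sketch (the near-duplicate x2 is constructed to inflate the VIF):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)  # highly correlated with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF_j = 1 / (1 - R^2_j); compute for each IV, skipping the constant
vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]
```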
(!) Describe outliers
General:
- Not an assumption, but good to check
- Extreme values
- Pull the regression line in the wrong direction
- Create large residuals
- Often a typing or data-entry mistake
- Enlarge the error term
- Predictive power increases if they are eliminated
__________
Assumptions:
Normality of errors:
- Within the accepted range; check histogram & normal P-P plot
- Ref. assumption 6
Plot residuals against the predicted DV:
- Look for curves, funnel shapes or heteroscedasticity
- Ref. assumption 5
___________
Different ways of detection (Cook's distance sketched below):
- Adjusted predicted values
- Deleted residuals
- Cook's distance
- Mahalanobis distance
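A minimal sketch of Cook's distance with statsmodels (the planted outlier and the 4/n cutoff, a common rule of thumb, are assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
y[0] = 25.0                                  # plant an obvious outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()

cooks_d, _ = influence.cooks_distance        # Cook's distance per obs.
flagged = np.where(cooks_d > 4 / len(y))[0]  # rule-of-thumb cutoff
```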