Lecture 13 - Regression Flashcards

1
Q

(!) Describe different test types & some assumptions

A

Assumptions:
- Homogeneity of variance between groups
- Independent obs.
- Covariate unrelated to experimental treatment: ANCOVA

____________

Test-type:

Independent t-test:
- 1 binary IV
- Continuous DV
- Between design

Paired-sample t-test:
- 1 binary IV
- Continuous DV
- Within design

One-way ANOVA:
- 1 categorical IV (factor)
- Continuous DV

N-way ANOVA:
- More than 1 categorical IV (factors)
- Continuous DV

ANCOVA:
- 1 or more IVs
- Gives "purified" (adjusted) means
- C = Covariance: the relation between variables
- Reads almost like ANOVA after the adjustment
- Covariates should be independent of the treatment
- Randomly assigned groups assumed equal in all other respects

F-test:
- Compares variances, while ANOVA compares means
- Only tells if means differ, not how: thus often combined w. ANOVA

2
Q

(!) Describe multiple regression & what it tests

A

General:
- How the DV mean changes as a function of the IVs: partial effect of each IV
- Adding variables changes the calculation: IVs correlate w. each other
- Every IV's effect controlled for & held constant
- Can model non-linear relationships

Tests:
- Whether the relationship between IV & DV is significant: non-zero
- Magnitude of the relationship between IV & DV
- Direction of the relationship: positive or negative

Doesn't test:
- Causality: inferred from the research design

3
Q

(?) Describe the elements of multiple regression models

A

Y = DV
Beta_0 = Intercept: DV value when all IVs are zero
Beta_1*X_1 = Beta coefficient times IV: change in Y with respect to change in X_1
u = Error term: factors other than the X's influencing Y
i = obs. number
k = number of IVs
n = number of obs.
Apostrophe (or hat) over a symbol = Estimated
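Put together, the elements above form the standard multiple-regression model (written here in common notation; hats mark estimates):

```latex
% Population model for observation i with k IVs:
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + u_i,
\qquad i = 1, \dots, n
% Estimated (fitted) model:
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \dots + \hat{\beta}_k X_{ik}
```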

4
Q

What does linear regression model?

A

Conditional mean of the DV: modeled as a linear combination of the IVs

5
Q

Describe centering & standardization

A

Centering:
- Subtract the variable's mean
- Gives a meaningful intercept
- Easier interpretation of interactions
- Don't do it for binary variables
- Same interpretation/conclusion of the coefficients
- Mean set to 0

Standardization:
- Subtract the mean & divide by the std. dev.
- Standardized variables: mean 0, std. dev. 1
- Comparable despite different scales
- Interpretation/conclusion changes
- SPSS calculates standardized coefficients for you: no reason to change the raw scores
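Both transformations above can be sketched in a few lines (hypothetical scores, using numpy rather than SPSS):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical raw IV scores

centered = x - x.mean()              # mean becomes 0, scale unchanged
standardized = centered / x.std()    # mean 0, std. dev. 1

print(centered)       # [-3. -1.  1.  3.]
print(standardized)
```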

6
Q

(!) Describe methods of choosing predictors

A

Hierarchical entry:
- Building up or down: enter or remove
- Entire blocks of variables entered in gradual steps
- First block is variables from previous research: controls

__________

Forced entry:
- Recommended
- All variables entered simultaneously: you decide which ones stay

___________

Stepwise methods: SPSS chooses for you

General:
- Not recommended
- You > SPSS: SPSS doesn't consider theory, previous research & the hypotheses tested
- Controls & basic effects –> variables in hypotheses –> interactions & mediations

Forward:
- Start with the intercept only
- Add the variable with the highest simple correlation with the DV
- Then keep adding the variable best explaining the remaining variance

Stepwise:
- Same as forward, but variables already entered can also be removed if no longer useful

Backward:
- Opposite of forward: start with all variables & remove them gradually based on p-values or t-statistics

7
Q

(!) Explain goodness of fit

A

General:
- How much of the actual DV variance the model explains
- SST = SSE + SSR

___________

Terms:

Total sum of squares / SST:
- Total variance of the DV

Explained sum of squares / SSE:
- Variance explained by the model

Sum of squared residuals / SSR:
- Unexplained, left-over variance

R^2:
- Shows how much variance the model explains
- Between 0 & 1
- Higher = better
- Can be inflated: doesn't decrease when IVs are added, even insignificant ones

Adjusted R^2:
- Better than the normal R^2
- Penalizes R^2 for each IV added
- Shows the quality of the model
- Tells whether an added variable increases the model's explanatory power
- F-change = significance of the R^2 change
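A minimal numeric sketch of the decomposition and the two R^2 formulas (hypothetical observed & fitted values; note the lecture's convention that SSE is the *explained* sum):

```python
import numpy as np

# hypothetical observed DV values and model-fitted values
y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.5])

sst = np.sum((y - y.mean()) ** 2)      # total variance of the DV
sse = np.sum((y_hat - y.mean()) ** 2)  # variance explained by the model
ssr = np.sum((y - y_hat) ** 2)         # unexplained residual variance

r2 = sse / sst                         # share of variance explained
n, k = len(y), 1                       # obs. and number of IVs (hypothetical)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalized for each IV
```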

8
Q

(!) Describe control variables & how to handle them

A

General:
- IVs not of interest in themselves
- Explain "left-over" variance

Handling:
- Compare to previous research
- Explain the reason for inclusion
- Transparency: show the effects with & without them
- Quality check: same as for variables of interest
- Check for extreme correlations: multicollinearity
- Avoid using them to show the result you want

9
Q

(!) Describe statistical test types

A

General:
- Try to model reality as closely as poss.
- Test how much DV variance the model explains: goodness of fit
- Significance test: is the coefficient significantly different from 0?
- Summarize the dataset in one number used in a hypothesis test
- DV must be continuous

T-statistic:
- Compares 1 categorical IV to 1 continuous DV
- Compares the means of two groups
- How good a predictor x is
- Increases with larger samples
- Based on the variance of the estimated beta coefficient
- Coefficient accuracy
- t = Estimated beta / standard error of the estimated beta
- Needs a reference point (critical value) for judgement
- Used in regression
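The t-formula above as a tiny sketch, with hypothetical numbers standing in for regression output:

```python
# t-statistic for one coefficient: estimated beta over its standard error
beta_hat = 0.42          # hypothetical estimated coefficient
se_beta  = 0.10          # hypothetical standard error of that estimate

t = beta_hat / se_beta   # compare against a critical value, e.g. ~1.96 for 5%
```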

F-test:
- Compares 2 continuous variables: IV & DV
- Tests if the variances are the same
- Variance explained by the model between groups, divided by the error variance
- Used in ANOVA

Chi-squared test:
- Compares two categorical variables: IV & DV
- Compares expected & observed results
- Tests if the variables are independent of each other
- Result of the fitting function
- Would be 0 if the model matched reality perfectly
- Summarizes the process of fitting the function
- Incl. degrees of freedom

10
Q

(!) Describe beta, error, slope, fitted value, intercept & critical values

A

Beta:
- Coefficient
- Shows the change in the DV for every 1-unit change in the IV
- Hat/apostrophe if estimated: not in the general model
- We seek to reduce the error in the beta estimate

Error:
- Residual
- Distance between a real obs. & the slope
- Positive above the line, negative below
- What is left after fitting a model
- Diff. between observed & fitted values

Slope:
- Shows the predicted values

Fitted value:
- On the regression line
- Predicted value of y

Intercept:
- Value of the DV when all IVs = 0
- Meaningful if all IVs are centered

Critical values:
- Can be found in the back of the book
- Tell if a value is abnormal given the distribution

11
Q

(!) How should different variables be interpreted?

A

General:
- Negative beta coefficient: IV increases = DV decreases
- Positive beta coefficient: IV increases = DV increases
- G: -0.904 = being female means 0.904 less engaged
- F: more frequent = more engaged
- A: more acknowledged = more engaged

Dummy variable:
- Binary variable: e.g. male = 0, female = 1
- Shows the effect of 1 compared to 0

Categorical:
- One category as baseline
- Three categories = read two coefficients
- Compare to the base group
- E.g. single = 1, married = 2
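Dummy coding of a categorical variable can be sketched like this (hypothetical marital-status data, numpy in place of SPSS recoding):

```python
import numpy as np

# hypothetical codes: 1 = single (baseline), 2 = married, 3 = divorced
status = np.array([1, 2, 3, 2, 1])

# one dummy per non-baseline category; baseline rows are 0 in every dummy,
# so each coefficient is read as a comparison against the base group
d_married  = (status == 2).astype(int)
d_divorced = (status == 3).astype(int)
```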

12
Q

(!) Describe the basic assumptions for t-statistics

A

Assumptions for unbiased OLS estimation:

General:
- OLS = Ordinary least squares
- Technique for fitting a linear regression

  1. Linearity:
    - Scatterplot: should look rectangular
    - Relationship between X & the mean of Y is linear
    - Y-axis: DV. X-axis: IV
    - A variable can enter as a logarithm or squared: e.g. age
    - Squared: positive slope up to a turning point, negative after
    - Linear steps, e.g. 10, 20, 30, 40
  2. Random sample:
    - Sample must reflect the population
    - Settled at the research-design stage
  3. No perfect collinearity:
    - No exact linear combination of IVs: multicollinearity
    - No constant IVs
    - At least as many obs. as parameters: n >= k + 1
    - E.g. age & age^2 are allowed, but not age in years & age in months
  4. Zero conditional mean of error:
    - Scatterplot: should mirror around the line
    - Error term expected to be 0 at each value of the IVs
    - A large sample can balance it out
    - Violated if important variables are omitted
    - Incl. control variables to hinder systematic variance
    - Assumption important for minimizing residuals & calculating betas
    ___________

Statistical significance:
- If not fulfilled, we cannot be sure of significance

  5. Homoscedasticity:
    - Scatterplot: should show no cone shape
    - Concerns the variance of the error term
    - Error terms must have the same variance for all IV values: sigma^2
    - Violated if the distances to the line grow larger
    - DV a linear combination of the IVs
    - Variance should not depend on changing IV values
    - Errors must not autocorrelate
  6. Normality:
    - Shapiro-Wilk: p-value required above 0.05 (in R: > 0.01)
    - Builds on assumptions 4 & 5
    - Hardest to pass
    - Can be loosened
    - Helped by larger samples
    - Helped by a normally distributed DV
    - Unobserved error is normally distributed in the population
    - If not fulfilled –> bootstrapping: resample w. replacement from the obs. data
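The Shapiro-Wilk check above can be sketched with scipy (hypothetical simulated residuals; assumes scipy is available):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
residuals = rng.normal(size=100)  # stand-in for a fitted model's residuals

# H0 of Shapiro-Wilk: the data are normally distributed.
# A p-value above the chosen alpha (e.g. 0.05) means we do not reject normality.
W, p = shapiro(residuals)
```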
13
Q

(?) Describe a robust regression

A
  • Bootstrapping: used when the sample may not fully represent the population
  • Does not assume a normal distribution
  • Draws mini-samples (w. replacement) from the current sample
  • Used to estimate the variance of the beta estimates when we cannot derive it analytically
14
Q

(!) Describe multicollinearity

A

General:
- Makes the assumption checks more critical
- Test the correlation between IVs / beta coefficients: must be low
- A problem, not invalid: the betas just become untrustworthy
- Detected in a correlation table or via VIF
- VIF above 4: the usual cut-off for concern
- VIF above 10: serious multicollinearity

Elements influencing beta variance:
- Higher sigma^2 = higher beta variance
- Higher IV variance = lower beta variance
- Higher correlation between IVs = higher beta variance: multicollinearity

Variance inflation factor / VIF:
- Output from the statistics program
- How much of an IV's variance is explained by the other IVs: inflation
- VIF = 1 / (1 - R^2_j)
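The VIF formula can be computed directly from the definition above (hypothetical data; numpy in place of SPSS output):

```python
import numpy as np

def vif(X, j):
    """VIF for IV j: 1 / (1 - R^2_j), where R^2_j comes from
    regressing X_j on all the other IVs (plus an intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)     # OLS fit of X_j on the rest
    y_hat = A @ beta
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # nearly collinear with x1 -> high VIF
x3 = rng.normal(size=200)             # unrelated -> VIF near 1
X = np.column_stack([x1, x2, x3])
```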

15
Q

(!) Describe outliers

A

General:
- Not an assumption, but good to check
- Extreme values
- Pull the regression line in the wrong direction
- Create large residuals
- Often typing or data-entry mistakes
- Enlarge the error term
- Predictive power increases if they are eliminated

__________

Assumptions:

Normality of errors:
- Check within the accepted range, histogram, normal P-plot
- Ref. assumption 6

Plot residuals against the predicted DV:
- Look for curves, funnel shapes or heteroscedasticity
- Ref. assumption 5

___________

Different ways of detection:
- Adjusted predicted values
- Deleted residuals
- Cook's distance
- Mahalanobis distance

16
Q

What is meant by ceteris paribus?

A
  • Other things equal
  • Differentiates the specific effect of each IV
  • Coefficients change depending on what else is included in the regression
  • Keep everything else in the regression constant
  • Control variables matter a lot
  • Poss. to tease out the effects of IVs that might be correlated
17
Q

(!) Describe what is meant by significance, p-value, H_0, confidence intervals, the standardized coefficient & the meaning of the intercept when variables are centered

A

Significant coefficient:
- Tested by p-value: the probability of being wrong
- No safe conclusion from an insignificant coefficient
- Coefficient likely differs from 0, but the effect may be small
- Gives an increase in R^2
- Significance levels: 0.001, 0.01, 0.05
- The accepted rate of type 1 errors

H_0:
- Relationship between IV & DV = 0: the IV doesn't influence the DV
- Rejected when the test statistic is large enough
- Type 1 error: a true H_0 is rejected

Standard error:
- Accuracy of the coefficient: its estimated std. deviation, which shrinks as the sample grows

Confidence interval:
- +/- 1.96 x standard error: for a 95% interval
- If the study were re-run, the interval would contain the value 95% of the time
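The 95% interval rule above, sketched with hypothetical numbers:

```python
beta_hat = 0.50   # hypothetical estimated coefficient
se       = 0.10   # hypothetical standard error

# 95% confidence interval: estimate +/- 1.96 standard errors
lower = beta_hat - 1.96 * se
upper = beta_hat + 1.96 * se
# if the interval excludes 0, the coefficient is significant at the 5% level
```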

Standardized coefficient:
- IV effect expressed in standard deviations
- Allows comparison across IVs

Intercept:
- DV score when all IVs are zero: often not valuable data
- When all IVs are centered:
- DV score of the average person in the sample
- DV score when all IVs are at their mean values

18
Q

(!) Describe the difference between between- & within-subject designs

A

Between:
- Statistical diff. between groups

Within:
- Statistical diff. between treatments in the same group
- Randomized order prevents learning-curve effects

Full factorial design:
- Matrix
- Both between & within
- Data collected for each IV combination
- Answers more interesting hypotheses

19
Q

Describe the difference between crossed & nested models

A

Crossed model:
- Poss. to calculate interactions

Nested model:
- Model A is nested in Model B if A's parameters are a subset of B's

20
Q

Describe logistic regression

A
  • DV categorical: binary
  • Uses z-statistics
  • Odds ratios for interpretation
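How a logistic coefficient is read can be sketched like this (hypothetical numbers; the course itself reads these off SPSS output):

```python
import math

beta = 0.7  # hypothetical logit coefficient for one IV

# Odds ratio: the odds of DV = 1 are multiplied by this
# factor for every 1-unit increase in the IV
odds_ratio = math.exp(beta)

def predicted_prob(b0, b1, x):
    """Logistic (sigmoid) response: P(DV = 1) given the IV value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
```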