Week 6 Assumptions Part III and Multicollinearity Flashcards

1
Q

Homoskedasticity

A

An assumption of equal or similar variances across the different groups (or error terms) being compared

This is an important assumption of many parametric statistical tests because they are sensitive to unequal variances. Uneven variances across samples distort standard errors, producing unreliable test results.

2
Q

Heteroskedasticity

definition and consequence

A

Occurs when the standard deviation of the predicted variable, observed over different values of an independent variable or over time, is non-constant.

consequence:
- Inefficiency (OLS is no longer "best")
- SEs are wrong (biased), resulting in unreliable confidence intervals and hypothesis tests.
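A quick way to see the consequence is a Breusch-Pagan-style check: regress the squared OLS residuals on the regressors and compute LM = n·R² of that auxiliary regression. The sketch below is illustrative only; the simulated data and the use of plain numpy (rather than a stats library) are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
# Error standard deviation grows with x -> heteroskedastic
y = 2 + 3 * x + rng.normal(0, x, n)

# Fit OLS: y = b0 + b1*x
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Breusch-Pagan idea: regress squared residuals on the regressors;
# LM statistic = n * R^2 of that auxiliary regression
g = resid**2
gamma, *_ = np.linalg.lstsq(X, g, rcond=None)
g_hat = X @ gamma
r2_aux = 1 - np.sum((g - g_hat)**2) / np.sum((g - g.mean())**2)
lm_stat = n * r2_aux
print(lm_stat)  # compare with chi-square(1) critical value of 3.84
```

A large LM statistic relative to the chi-square critical value leads us to reject homoskedasticity, which is exactly what this simulated data should produce.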

3
Q

Causes of heteroskedasticity

A
  • Necessary but not sufficient conditions
  • Measurement precision that improves with the independent variables
  • Presence of outliers (an outlier is an observation that lies an abnormal distance from other values in a random sample from a population)
  • Specification error (especially omission of a relevant variable and nonadditivity)
4
Q

Omission

A

Samples with invalid data are discarded from further analysis

5
Q

Omitted-variable bias (OVB)

A

Occurs when a statistical model leaves out one or more relevant variables

6
Q

Weighted Least Squares (WLS)

A

Weighted least squares is a version of least squares where you give importance (or weight) to some data points over others. In WLS, points with higher weights are considered more reliable, so they have more influence on finding the best-fitting line or model.

In the presence of heteroskedasticity (with known error variances), WLS is BLUE.
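WLS with weights wᵢ = 1/Var(eᵢ) is equivalent to running OLS after scaling each row by √wᵢ. A minimal numpy sketch, assuming (for illustration) that the error standard deviation is proportional to x, so the weights are 1/x²:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, x, n)  # error sd proportional to x

X = np.column_stack([np.ones(n), x])

# WLS with weights w_i = 1 / Var(e_i) = 1 / x_i^2:
# equivalent to OLS on rows scaled by sqrt(w_i)
w = 1 / x**2
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls)  # should be close to the true [2, 3]
```

The row-scaling trick works because dividing each observation by its error standard deviation makes the transformed errors homoskedastic, restoring the conditions under which least squares is BLUE.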

7
Q

Autocorrelation

A

Measures the relationship between a variable’s current value and its past values.

An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of -1 represents a perfect negative correlation.
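The lag-k autocorrelation can be computed directly from the definition: the sum of cross-products of mean-deviations k steps apart, divided by the total sum of squared deviations. A small sketch (the example series are mine):

```python
import numpy as np

def autocorr(x, lag=1):
    """Lag-k autocorrelation: correlation of x_t with x_{t-k}."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[lag:] * d[:-lag]) / np.sum(d**2)

print(autocorr([1, 2, 3, 4, 5]))        # 0.4: upward trend -> positive
print(autocorr([1, -1, 1, -1, 1, -1]))  # alternating series -> negative
```

A steadily trending series gives a positive lag-1 autocorrelation, while a series that flips sign every period gives a negative one, matching the +1/-1 interpretation above.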

8
Q

Cross-sectional

A

Observations are recorded for different individuals/objects at one time

9
Q

Time series

A

Observations are recorded for the same individual/object across time

10
Q

Time series, cross sectional (or repeated cross-sections)

A

Observations are recorded across time but for different individuals each time

11
Q

Panel

A

Observations are recorded for the same individuals across time

12
Q

Serial autocorrelation

A

The correlation between neighbouring observations ordered in time

13
Q

Spatial autocorrelation

A

The correlation between neighbouring observations ordered in space

Example: crime rates in neighboring city wards

14
Q

Multicollinearity

A

Occurs when two or more predictor variables in a regression model are highly correlated, making it hard to determine their individual effects on the outcome. This can lead to unstable estimates and reduce the reliability of the model.

15
Q

Perfect multicollinearity

A

When one independent variable is an exact linear combination of one or more other independent variables. This makes it impossible for the model to estimate unique coefficients for those variables, as they provide the same information

  • Results in an R² of 1 in the auxiliary regression, but an indeterminate model (coefficients cannot be estimated!)
  • High but imperfect multicollinearity does not violate Gauss-Markov; estimates are still BLUE, but their variances are high
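The standard diagnostic for the high-but-imperfect case is the variance inflation factor, VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A minimal numpy sketch with simulated data (the data and the rule-of-thumb threshold of 10 are common conventions, not from the card):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2_j),
    where R^2_j comes from regressing x_j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    return 1 / (1 - r2)

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)   # nearly collinear with x1
x3 = rng.normal(size=300)                   # independent of the others
X = np.column_stack([x1, x2, x3])
print(vif(X, 0))  # large (rule of thumb: > 10 signals trouble)
print(vif(X, 2))  # close to 1
```

As R²_j approaches 1 (approaching perfect multicollinearity), the VIF blows up, which is exactly the "high variances" point above; at R²_j = 1 the formula is undefined, matching the indeterminate-model case.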
16
Q

Dummy variable

A

A dummy variable is a numerical variable used in regression models to represent categorical data. It takes values of 0 or 1 to indicate the absence or presence of a particular category
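Dummy coding is just a 0/1 comparison per category. A small sketch (the color example is mine), which also ties back to perfect multicollinearity: including a dummy for every category plus an intercept makes the columns sum to the constant term, so one category is dropped as the reference:

```python
import numpy as np

colors = np.array(["red", "blue", "red", "green"])

# One 0/1 dummy column per category
categories = sorted(set(colors))            # ['blue', 'green', 'red']
dummies = {c: (colors == c).astype(int) for c in categories}
print(dummies["red"])   # [1 0 1 0]

# For regression, drop one category (the reference): keeping all of
# them alongside an intercept creates perfect multicollinearity
# (the "dummy variable trap")
```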