Linear regression Flashcards

1
Q

Linear regression

A

For dependent variables measured on interval and ratio scales (not suited to ordinal dependent variables).

Notation: Yk = b0 + b1 Xk + ek

Yk: dependent variable
Xk: independent variable
b0: intercept (value of Y when X = 0)
b1: slope (change in Y when X increases by 1)
ek: error term / residual
k: data row k
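The notation above can be sketched in a few lines of Python using the closed-form least-squares formulas (the data points are invented for illustration):

```python
# Fit Y_k = b0 + b1*X_k by least squares using the closed-form
# formulas b1 = cov(X, Y) / var(X) and b0 = mean(Y) - b1*mean(X).
# Illustrative data, not from the source.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
     / sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e_k
```

With an intercept in the model, the residuals always sum to (numerically) zero.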

2
Q

Dependent variable

A

A variable for which at least some of the variation is theorized to be caused by one or more independent variables.

Also termed response variable (experiments), outcome variable, criterion variable, target variable and output variable.

3
Q

Independent variable

A

A variable that is theorized to cause variation in the dependent variable.

Also termed predictor variable (regressions), explanatory variable, treatment variable (experiments), manipulated variable (experiments) and input variable.

4
Q

Null hypothesis

A

H0 is a theory-based statement about what we would expect to observe if there were no relationship between an independent variable and the dependent variable. It assumes that two possibilities are the same, i.e., that any observed differences are due to chance alone.

5
Q

p-value

A

Basis of a statistical hypothesis test. The probability, assuming the null hypothesis is true, of finding a relationship at least as strong as the one observed in the sample. Ranges between 0 and 1; the closer to zero, the less likely the result is due to chance alone.
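As a minimal Python sketch (assuming a normal test statistic), a two-sided p-value can be computed from the standard normal CDF; the conventional cutoff z = 1.96 corresponds to p ≈ 0.05:

```python
import math

def normal_cdf(z):
    """Standard normal CDF built from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p(z):
    """Two-sided p-value for a z-statistic."""
    return 2 * (1 - normal_cdf(abs(z)))

p = two_sided_p(1.96)  # about 0.05
```

The larger the test statistic in absolute value, the smaller the p-value.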

6
Q

Bivariate relational hypotheses

A

Relationships between two variables. Directed and undirected relationships. Test for statistical significance via correlation or univariate regression analysis.

7
Q

Multivariate relational hypotheses

A

Relationships between more than two variables. Directed relationships. Test for statistical significance via multivariate regression analysis.

8
Q

Type I error

A

False positive. Rejecting the null hypothesis although the null hypothesis is true.

9
Q

Type II error

A

False negative. Failing to reject the null hypothesis although the null hypothesis is false. Typically happens because the sample is too small.

10
Q

Correlation (r)

A

A statistical measure of covariation which summarizes the direction and strength of the linear relationship between two variables.

Squaring r gives the proportion of shared variance: for example, r = 0.8 implies r² = 0.64, i.e., 64% of the variance in one variable can be explained by the variance in the other variable.

0.9 < |r| ≤ 1.0: Very strong correlation
0.7 < |r| ≤ 0.9: Strong correlation
0.5 < |r| ≤ 0.7: Average correlation
0.2 < |r| ≤ 0.5: Weak correlation
0.0 ≤ |r| ≤ 0.2: Very weak correlation
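A minimal Python sketch computes r directly from its definition (illustrative data):

```python
import math

# Pearson's r = cov(X, Y) / (SD(X) * SD(Y)); invented data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
r_squared = r ** 2  # share of variance explained
```

Here r ≈ 0.77 (a strong correlation on the scale above), so r² ≈ 0.6 of the variance is shared.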

11
Q

Spearman’s correlation

A

Rank correlation, i.e., a statistical dependence between the rankings of two variables. Dichotomous and ordinal scales.
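When there are no tied values, Spearman's ρ can be sketched via the rank-difference formula ρ = 1 − 6·Σd² / (n(n² − 1)):

```python
def spearman_rho(xs, ys):
    """Rank correlation via 1 - 6*sum(d^2)/(n*(n^2 - 1)); assumes no ties."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        out = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            out[i] = rank
        return out

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Because only rankings matter, any increasing monotone relationship gives ρ = 1 even when it is not linear (e.g., y = x²  on positive x).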

12
Q

Pearson’s correlation

A

Linear correlation, i.e., a statistical dependence between two metric variables. Interval and ratio scales.

13
Q

Degrees of freedom

A

In general terms, degrees of freedom can be thought of as the number of independent pieces of information available for estimating a parameter or for calculating a statistic. It reflects the number of values in a calculation that are free to vary. It helps determine the appropriate distribution to use when making inferences about the population from a sample.

Degrees of freedom (df) play a critical role in various statistical tests, including ANOVA, the F-statistic, the t-statistic, Pearson’s r, and the chi-squared test.

14
Q

Scatterplots

A

Scatterplots are ideal for visualizing the relationship between two continuous variables. They help to identify whether a relationship exists (e.g., positive, negative, or no correlation). In other words, they help detect outliers, identify correlations, and assess linear vs. non-linear relationships.

15
Q

Ordinary-Least-Squares (OLS) regression

A

The most popular type of linear regression. The OLS estimator minimizes the sum of all squared estimation errors (i.e., residuals) in the sample.
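A small Python sanity check of this minimizing property: the closed-form OLS coefficients give a smaller sum of squared residuals than any perturbed line (illustrative data):

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Invented data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
     / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

best = sse(xs, ys, b0, b1)
# Moving away from the OLS solution increases the squared error.
assert best < sse(xs, ys, b0 + 0.1, b1)
assert best < sse(xs, ys, b0, b1 - 0.1)
```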

16
Q

Assumptions for OLS

A
  1. Linear relationship between the dependent and independent variables
  2. No multicollinearity
  3. Homoskedasticity (not heteroskedasticity), i.e., constant error variance; can be checked in a residual scatterplot
  4. Normally distributed error terms
17
Q

U-shaped relationships

A

You can still use a linear regression; just include a quadratic term (e.g., X²) as an additional predictor. The model remains linear in its coefficients.
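A minimal sketch with NumPy (assuming it is available): adding an X² column to the design matrix lets OLS capture a U-shape, using synthetic data where Y = X² exactly:

```python
import numpy as np

# Synthetic U-shaped data: y = x^2 exactly.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Design matrix with intercept, linear term, and quadratic term.
X = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef  # recovers roughly (0, 0, 1)
```

A plain line fitted to these points would have slope zero and miss the curvature entirely; the quadratic term captures it.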

18
Q

T-statistic

A

Tests significance against the hypothesis that the regression coefficient is equal to 0. It indicates how strongly each independent variable is associated with the dependent variable. A higher absolute value of the t-statistic suggests a stronger relationship, while a t-statistic close to zero indicates that the variable may not be a significant predictor.
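For a simple regression, the slope's t-statistic can be sketched as t = b1 / SE(b1), with SE(b1) = s / √Σ(x − x̄)² and s² = SSE / (n − 2) (illustrative data):

```python
import math

# Invented data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
b0 = my - b1 * mx

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s2 = sse / (n - 2)             # residual variance (n - 2 degrees of freedom)
se_b1 = math.sqrt(s2 / sxx)    # standard error of the slope
t = b1 / se_b1                 # large |t| -> slope clearly differs from 0
```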

19
Q

Regression coefficients (β)

A

Indicate the change in the dependent variable (Y) for a one-unit change in the independent variable (X), holding other variables constant.

Interpretation: A positive coefficient indicates a positive relationship, while a negative coefficient indicates a negative relationship.

If β is 2, for every one-unit increase in the independent variable, the dependent variable increases by 2 units.

20
Q

Intercept

A

The expected value of the dependent variable when all independent variables are zero.

21
Q

R² coefficient of determination

A

A goodness of fit measure that varies between 0 and 1 representing the proportion of variation in the dependent variable that is accounted for by the model, i.e. how much variance the independent variables are able to explain. Higher values indicate that a larger proportion of variance is explained by the model.

R² = 0.70 means that 70% of the variance in the dependent variable is explained by the independent variables in the model.

Interpretation of R²:
Substantial: R² > 0.26
Moderate: 0.13 < R² < 0.26
Weak: R² < 0.13
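A minimal Python sketch: R² = 1 − SSE/SST, where SSE is the unexplained (residual) variation and SST the total variation around the mean of Y (illustrative data):

```python
# Invented data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
     / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
sst = sum((y - my) ** 2 for y in ys)                          # total
r_squared = 1 - sse / sst
```

For this near-perfect linear pattern, R² comes out just below 1.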

22
Q

Standard Error (SE)

A

Standard error is a measure of the precision of a sample mean as an estimate of the population mean. It quantifies how much the sample mean is expected to vary from the true population mean due to sampling variability.
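A minimal Python sketch: the standard error of the mean is the sample standard deviation divided by √n (illustrative data):

```python
import math

# Invented data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sample standard deviation (n - 1 in the denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
se = sd / math.sqrt(n)
```

Because of the √n divisor, the SE shrinks as the sample grows, while the SD describes the spread of the data itself.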

23
Q

Standard deviation (SD)

A

Measures variability within a dataset. Indicates how spread out the data points are around the mean

24
Q

Delta R² (ΔR²)

A

Delta R² indicates how much additional variance in the dependent variable is explained by the new predictors. A positive value suggests that the new predictors improve the model's explanatory power, while a negative or very small value suggests that the new predictors do not significantly improve the model.
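A sketch with NumPy (assuming it is available): fit a model with one predictor, add a second, and take the difference of the two R² values, using synthetic data where the second predictor genuinely matters:

```python
import numpy as np

# Synthetic data: y depends on both x1 and x2, plus noise.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 * x1 + 1 * x2 + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    """R^2 of an OLS fit with design matrix X (intercept column included)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones(n)
r2_base = r_squared(np.column_stack([ones, x1]), y)
r2_full = r_squared(np.column_stack([ones, x1, x2]), y)
delta_r2 = r2_full - r2_base  # extra variance explained by x2
```

Note that adding a predictor can never lower R² in-sample, so ΔR² must be weighed against the cost of the extra parameter (e.g., via an F-test).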

25
Q

F-statistic

A

The F-statistic is a ratio that compares the variance explained by the regression model to the variance that is not explained (the residual variance). It assesses whether at least one of the independent variables in the model significantly predicts the dependent variable.

It serves as the test criterion for whether the estimated model is also valid for the population beyond the sample. Significance can be read from p(F).
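Given R², the F-statistic can be computed as F = (R²/k) / ((1 − R²)/(n − k − 1)), with k predictors and n observations; the numbers below are illustrative (R² = 0.70, k = 3, n = 50):

```python
# F-statistic from R^2: ratio of explained to unexplained variance,
# adjusted for model (k) and residual (n - k - 1) degrees of freedom.
r2, k, n = 0.70, 3, 50

f = (r2 / k) / ((1 - r2) / (n - k - 1))  # about 35.8
```

The corresponding p(F) would then be read from the F-distribution with (k, n − k − 1) degrees of freedom.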