4: Simple linear regression Flashcards

1
Q

Simple linear regression

A

A model for a continuous response variable and a continuous explanatory variable, between which a linear relationship is assumed. The errors of the data-generating process are assumed to follow a normal distribution.

2
Q

Confidence Interval (CI)

A

A range of values that is likely to contain the true mean, at a specified confidence level (commonly 95% or 99%).

The CI is computed from the data, so it is different every time the experiment is repeated, and any single interval may or may not contain the true mean. If the experiment is repeated many times, about 95% of the 95% confidence intervals will contain the true mean.
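The repeated-experiment reading of a confidence interval can be checked with a small simulation. The sketch below is only an illustration: it assumes normally distributed data with a known standard deviation (so a z-interval applies), and the function names `z_interval` and `coverage` as well as all parameter values are made up for this example.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean, assuming the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sigma=2.0, n=30, repeats=2000, seed=1):
    """Fraction of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high  # True counts as 1
    return hits / repeats


# Should come out close to 0.95, although each single interval is different.
print(coverage())
```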

3
Q

Difference between Total Sum of Squares (TSS) and Residual Sum of Squares (RSS)

A

TSS gives you an idea of the total variation in your data, while RSS tells you how much variation remains after accounting for the model’s predictions
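As a rough illustration, both quantities can be computed by hand for a fitted line. The data below (study hours and test scores) are invented for the example, and the helper name `sums_of_squares` is hypothetical. The ratio RSS/TSS is the share of variation left unexplained, so 1 - RSS/TSS is the usual R-squared.

```python
from statistics import mean


def sums_of_squares(x, y):
    """Return (TSS, RSS) for a least-squares line fitted to x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # variation around the mean
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variation around the line
    return tss, rss


hours = [1, 2, 3, 4, 5]        # made-up explanatory values
scores = [52, 58, 61, 70, 74]  # made-up response values

tss, rss = sums_of_squares(hours, scores)
r_squared = 1 - rss / tss      # share of the total variation explained by the line
print(tss, rss, r_squared)
```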

4
Q

homoscedasticity-heteroscedasticity

A

Homoscedasticity means constant variance of the residuals; heteroscedasticity means non-constant variance.
In a residual plot, an equal spread indicates homoscedasticity, while a cone (funnel) shape indicates heteroscedasticity. Many statistical tests, including linear regression, assume homoscedasticity. If this assumption is violated, the validity of the test results can be affected.

Heteroscedasticity can be tested with the Breusch-Pagan test or the White test.
If heteroscedasticity is detected, transforming the dependent variable (for example, taking the logarithm) can sometimes help stabilize the variance.
Heteroscedasticity indicates that the model might not be a good fit for the data, and addressing it is crucial for obtaining reliable statistical results.
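The idea behind the Breusch-Pagan test can be sketched without a statistics package: regress the squared residuals on the explanatory variable and check whether that auxiliary regression explains anything. The version below is the studentized (Koenker) variant with a single predictor, so the statistic n * R-squared is compared against a chi-square distribution with 1 degree of freedom. The function names and the data are assumptions made for the example; in practice a library routine would normally be used.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """Share of the variation in y explained by a straight line in x."""
    b0, b1 = fit_line(x, y)
    y_bar = mean(y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one predictor."""
    b0, b1 = fit_line(x, y)
    squared_resid = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, squared_resid)  # auxiliary regression on x
    p_value = math.erfc(math.sqrt(lm / 2))     # chi-square survival function, 1 df
    return lm, p_value


# Made-up data whose scatter grows with x (a cone shape in a residual plot).
x = list(range(1, 21))
y = [2 + 3 * xi + 0.5 * xi * (-1) ** xi for xi in x]

lm, p_value = breusch_pagan(x, y)
print(lm, p_value)  # a small p-value points to heteroscedasticity
```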

5
Q

Response Variable

A

Also known as the dependent variable; it is the outcome, or the variable that you are trying to predict.
Its value changes in response to the explanatory variables.
## Footnote

Example: In a study examining how study hours affect test scores, the test score is the response variable because it is what you want to measure or predict based on study hours.

6
Q

Explanatory Variable

A

Also known as the independent variable or predictor variable; it is the variable that you manipulate or observe to see how it affects the response variable. It is used to explain changes in the response variable.

Example: In a study examining how study hours affect test scores, the number of study hours is the explanatory variable, as it is the factor you think will influence the test scores.

7
Q

Linear model

A

A linear model assumes a straight-line relationship between the response variable and the explanatory variable.

Linear function: y = B0 + B1*x, where B0 is the intercept and B1 is the slope.

This means that as the explanatory variable changes, the response variable changes in a predictable linear manner.

8
Q

Model Diagnostics for linear models

A
  1. Residuals vs Fitted plot: checks linearity.
  2. QQ-plot: assesses normality of the residuals. Normality can also be assessed with the Shapiro-Wilk test (a significant p-value means non-normality) or with the Omnibus test, which also includes skewness and kurtosis checks.
  3. Scale-Location plot: checks whether the residuals have constant variance across the fitted values (homoscedasticity). Alternatively, use the Breusch-Pagan (BP) test; a significant result means heteroscedasticity (non-constant variance).
  4. Cook’s distance: checks for influential observations (outliers that strongly affect the fit).

  1. Non-linear pattern -> try a transformation; a parabolic shape suggests a quadratic relationship.
  2. If the points fall along a straight line (typically the diagonal), the residuals are approximately normally distributed. A skewed or S-shaped pattern can indicate Poisson or binomial data; points falling outside the confidence band indicate non-normality -> consider a GLM.
  3. Difference is ≤0.5. A good plot will have points scattered randomly, while a funnel-shaped pattern suggests a problem with heteroscedasticity. Non-constant variance often comes together with non-linearity -> try a transformation.
  4. A Cook’s distance above 0.5 is anomalous and above 1 is an outlier; high leverage means the point pulls hard on the estimates.
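To make the Cook's distance thresholds concrete, the sketch below computes leverage and Cook's distance by hand for a simple linear regression, using the standard formula D = e^2 / (p * s^2) * h / (1 - h)^2 with p = 2 coefficients. The function name `cooks_distance` and the data are invented for the illustration; the last point lies far out on the x-axis and well below the trend of the others.

```python
from statistics import mean


def cooks_distance(x, y):
    """Cook's distance of every observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)                 # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # hat values
    return [e ** 2 / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]


# Made-up data: nine points near y = 2x + 1 and one high-leverage point at x = 20.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 20.0]

distances = cooks_distance(x, y)
flagged = [i for i, d in enumerate(distances) if d > 1]  # rule of thumb from the card
print(flagged)
```

Only the last observation exceeds the threshold of 1 here: it combines high leverage with a large residual, so it pulls hard on the estimates.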
9
Q

Box-Cox

A

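Used to check which power transformation of the response fits best. Any transformation whose lambda value falls within the confidence interval of the estimated lambda is a viable transformation.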
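A minimal sketch of how such a check can work, assuming a positive response and a simple linear regression: for each candidate lambda the response is transformed, a line is fitted, and the profile log-likelihood is recorded. Lambdas whose log-likelihood lies within 1.92 of the maximum (half of 3.84, the 95% chi-square quantile with 1 degree of freedom) form an approximate 95% confidence interval. The function names, the grid and the data are assumptions made for the illustration.

```python
import math
from statistics import mean


def box_cox(y, lam):
    """Box-Cox transform of positive values; lambda = 0 is the log transform."""
    return [math.log(v) if lam == 0 else (v ** lam - 1) / lam for v in y]


def profile_loglik(x, y, lam):
    """Profile log-likelihood of lambda for a line fitted to the transformed y."""
    z = box_cox(y, lam)
    x_bar, z_bar = mean(x), mean(z)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    n = len(y)
    # The second term is the Jacobian of the transformation.
    return -n / 2 * math.log(rss / n) + (lam - 1) * sum(math.log(v) for v in y)


def lambda_interval(x, y, grid):
    """Best lambda on the grid and the ends of its approximate 95% interval."""
    scores = [profile_loglik(x, y, lam) for lam in grid]
    best = max(scores)
    inside = [lam for lam, s in zip(grid, scores) if s >= best - 1.92]
    return grid[scores.index(best)], min(inside), max(inside)


# Made-up data that grow roughly exponentially in x, with some scatter.
x = list(range(1, 13))
y = [math.exp(0.3 * xi) * (1.1 if xi % 2 else 0.9) for xi in x]

grid = [i / 10 for i in range(-20, 21)]  # lambda from -2 to 2 in steps of 0.1
best, low, high = lambda_interval(x, y, grid)
print(best, low, high)
```

If, for example, lambda = 0 lies between `low` and `high`, the log transformation is a viable choice.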

10
Q

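Intercept and slope in regression table

A

The intercept is where the line crosses the y-axis (the value of y at x = 0). The slope of X1 is the change in y for one step on the x-axis. For example, with an intercept of 2 and a slope of 9.87, one step on the x-axis raises y by 9.87, so y = 11.87 at x = 1. These two points are enough to draw the line.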
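The arithmetic of this example can be written out as a tiny function. The name `predict` is hypothetical; the intercept 2 and slope 9.87 are the example values from the card.

```python
def predict(x, intercept=2.0, slope=9.87):
    """Value of y on the regression line at a given x."""
    return intercept + slope * x


at_zero = predict(0)  # the intercept: where the line crosses the y-axis
at_one = predict(1)   # one step along the x-axis

# The rise over one step on the x-axis recovers the slope.
rise_per_step = at_one - at_zero
print(at_zero, at_one, rise_per_step)
```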

11
Q

Ordinary Least Squares (OLS)

A

Method that estimates the intercept and slope by choosing the line that minimizes the residual sum of squares (RSS), i.e. the total squared error (noise) between the observed and the fitted values.
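For a single explanatory variable the minimizing line has a closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below uses invented study-hours data and hypothetical function names, and compares the RSS of the fitted line with that of slightly shifted lines.

```python
from statistics import mean


def ols_fit(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rss(x, y, intercept, slope):
    """Residual sum of squares of a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


hours = [1, 2, 3, 4, 5]        # made-up explanatory values
scores = [52, 58, 61, 70, 74]  # made-up response values

b0, b1 = ols_fit(hours, scores)
best_rss = rss(hours, scores, b0, b1)

# Any other line, for example a shifted or tilted one, has a larger RSS.
shifted_rss = rss(hours, scores, b0 + 1, b1)
tilted_rss = rss(hours, scores, b0, b1 + 0.5)
print(b0, b1, best_rss, shifted_rss, tilted_rss)
```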

12
Q

Difference between correlation and causation

A

Correlation shows that two variables are related, while causation means that a change in one variable produces a change in the other. Correlation alone does not establish causation.
