4: Simple linear regression Flashcards

Question 1

Q

Simple linear regression

Answer

A

A model for a continuous response variable and a single continuous explanatory variable, between which a linear relationship is assumed. The errors (the deviations of the response from the line) are assumed to follow a normal distribution.

Question 2

Q

Confidence Interval (CI)

Answer

A

A range of values that may contain the true mean, with a specified level of confidence (commonly 95% or 99%).

The interval is computed from the data, so it is different every time you repeat the experiment, and any single interval may or may not contain the true mean. With a 95% confidence level, about 95% of the intervals from repeated experiments will contain the true mean.
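
The repeated-experiment reading of a 95% CI can be checked with a small simulation. This is a minimal sketch, not part of the original card: the population mean, the standard deviation (treated as known, so a simple z-interval applies), the sample size, and the number of experiments are all made-up values.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0   # hypothetical population mean
SIGMA = 10.0       # hypothetical standard deviation, assumed known
N = 30             # sample size per experiment
Z95 = 1.96         # normal quantile for a 95% interval


def one_interval():
    """Draw one sample and return its 95% z-interval for the mean."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = statistics.fmean(sample)
    half_width = Z95 * SIGMA / N ** 0.5
    return centre - half_width, centre + half_width


experiments = 2000
intervals = [one_interval() for _ in range(experiments)]

# Count how many of the intervals contain the true mean
hits = sum(low <= TRUE_MEAN <= high for low, high in intervals)
coverage = hits / experiments
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Each interval is different, and a single one either contains the true mean or does not; the 95% describes the long-run share.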

Question 3

Q

Difference between Total Sum of Squares (TSS) and Residual Sum of Squares (RSS)

Answer

A

TSS gives you an idea of the total variation in your data, while RSS tells you how much variation remains after accounting for the model’s predictions
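
A minimal sketch of both quantities on made-up data (the study hours and test scores below are hypothetical). It also computes R² = 1 - RSS/TSS, the usual summary of how much of the total variation the model accounts for.

```python
# Hypothetical data: study hours (x) and test scores (y)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares line through the data
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * xi for xi in x]

# TSS: variation of y around its mean (no model)
tss = sum((yi - y_mean) ** 2 for yi in y)
# RSS: variation of y left over around the fitted line
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r_squared = 1 - rss / tss

print(f"TSS={tss:.2f}  RSS={rss:.2f}  R^2={r_squared:.3f}")
```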

Question 4

Q

Homoscedasticity vs. heteroscedasticity

Answer

A

Homoscedasticity means constant variance of the residuals; heteroscedasticity means non-constant variance.
In a residual plot, an equal spread indicates homoscedasticity, while a cone shape indicates heteroscedasticity. Many statistical tests, including linear regression, assume homoscedasticity. If this assumption is violated, the validity of the test results can be affected.

Tests: the Breusch-Pagan test or the White test.
If heteroscedasticity is detected, a transformation of the dependent variable (such as taking the logarithm) can sometimes help stabilize the variance.
Heteroscedasticity indicates that the model might not be a good fit for the data, and addressing it is crucial for obtaining reliable statistical results.
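
The idea behind the Breusch-Pagan test can be sketched without a statistics library. The version below is the studentized (Koenker) form for one explanatory variable: regress the squared residuals on x and use LM = n · R², compared against a chi-square distribution with 1 degree of freedom. The function names and both data sets are illustrative assumptions; the noise is a deterministic alternating pattern chosen only to make the two cases obvious.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one predictor."""
    b0, b1 = fit_line(x, y)
    sq_resid = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of squared residuals on x: LM = n * R^2
    n = len(x)
    xm, um = sum(x) / n, sum(sq_resid) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    suu = sum((ui - um) ** 2 for ui in sq_resid)
    sxu = sum((xi - xm) * (ui - um) for xi, ui in zip(x, sq_resid))
    lm = n * sxu ** 2 / (sxx * suu)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


x = list(range(1, 21))
signs = [(-1) ** i for i in x]
y_equal = [2 + 3 * xi + 2 * s for xi, s in zip(x, signs)]        # equal spread
y_cone = [2 + 3 * xi + 0.5 * xi * s for xi, s in zip(x, signs)]  # spread grows with x

lm_equal, p_equal = breusch_pagan(x, y_equal)
lm_cone, p_cone = breusch_pagan(x, y_cone)
print(f"equal spread: LM={lm_equal:.3f}, p={p_equal:.3f}")
print(f"cone shape:   LM={lm_cone:.3f}, p={p_cone:.5f}")
```

A significant p-value for the cone-shaped data signals heteroscedasticity; the equal-spread data gives no such signal.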

Question 5

Q

Response Variable

Answer

A

Also known as the dependent variable; it is the outcome, or the variable that you are trying to predict.
Its value changes with the explanatory variables.
## Footnote

Example: In a study examining how study hours affect test scores, the test score is the response variable because it is what you want to measure or predict based on study hours.

Question 6

Q

Explanatory Variable

Answer

A

Also known as the independent variable or predictor variable; it is the variable that you manipulate or observe to see how it affects the response variable. It is used to explain changes in the response variable.

Example: In a study examining how study hours affect test scores, the number of study hours is the explanatory variable, as it is the factor you think will influence the test scores.

Question 7

Q

Linear model

Answer

A

A linear model assumes a straight-line relationship between the response variable and the explanatory variable.

Linear function: y = β0 + β1·x, where β0 is the intercept and β1 is the slope.

This means that as the explanatory variable changes, the response variable changes in a predictable linear manner.

Question 8

Q

Model Diagnostics for linear models

Answer

A

Residuals vs Fitted plot: checks linearity.
QQ-plot: assesses the normality of the residuals. Normality can also be assessed with the Shapiro-Wilk test (a significant p-value means non-normality) or the Omnibus test, which also includes skewness and kurtosis checks.
Scale-Location plot: checks whether the residuals have constant variance across the fitted values (homoscedasticity). Alternatively, use the Breusch-Pagan (BP) test; a significant result means heteroscedasticity (non-constant variance).
Cook’s distance: checks for outliers and influential points.

Residuals vs Fitted: if the pattern is non-linear, try a transformation; a parabolic shape suggests a quadratic relationship.
QQ-plot: if the points fall along a straight line (typically the diagonal), the residuals are approximately normally distributed. A skewed or S-shaped pattern can point to Poisson or binomial data; points falling outside the confidence band indicate non-normality, in which case consider a GLM.
Scale-Location: a difference of ≤0.5 is fine. A good plot has points scattered randomly, while a funnel-shaped pattern suggests heteroscedasticity. Non-constant variance often comes together with non-linearity; try a transformation.
Cook’s distance: a value above 0.5 is anomalous and a value above 1 is an outlier; high leverage means the point pulls hard on the estimates.
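
Cook's distance for a one-predictor model can be computed by hand, which shows how it combines residual size with leverage. This is a rough sketch on invented data (nine points on y = 2x plus one point far out on the x-axis); the 0.5 and 1 cut-offs are the rules of thumb from the card.

```python
# Hypothetical data: nine points on the line y = 2x plus one point far out on x
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 10]

n, p = len(x), 2  # p = number of coefficients (intercept and slope)
x_mean, y_mean = sum(x) / n, sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
intercept = y_mean - slope * x_mean

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mse = sum(r ** 2 for r in residuals) / (n - p)

# Leverage: grows with the distance of x from the mean of x
leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]

# Cook's distance combines residual size and leverage
cooks_d = [
    (r ** 2 / (p * mse)) * (h / (1 - h) ** 2)
    for r, h in zip(residuals, leverage)
]

for xi, d in zip(x, cooks_d):
    flag = "outlier" if d > 1 else "anomalous" if d > 0.5 else ""
    print(f"x={xi:>2}  D={d:.3f}  {flag}")
```

The point at x = 20 has both a large residual and high leverage, so it dominates the fitted line.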

Question 9

Q

Box-Cox

Answer

A

Used to check which transformation of the response fits best. If the value of λ for a transformation falls within the confidence interval, that transformation is viable.
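
A minimal sketch of how such a confidence interval for λ can be obtained, assuming an intercept-only normal model rather than a full regression (a simplification for illustration). It evaluates the profile log-likelihood on a grid of λ values and keeps those within 1.92 (half of the chi-square 95% quantile 3.84) of the maximum. The data are invented so that log(y) is symmetric.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of positive values; lam = 0 is the log transform."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lam for an intercept-only normal model."""
    n = len(y)
    t = box_cox(y, lam)
    mean_t = sum(t) / n
    var_t = sum((v - mean_t) ** 2 for v in t) / n
    return -n / 2 * math.log(var_t) + (lam - 1) * sum(math.log(v) for v in y)


# Hypothetical right-skewed data, built so that log(y) is symmetric
y = [math.exp(z / 4) for z in range(-8, 9)]

grid = [i / 10 for i in range(-20, 21)]  # candidate lambdas from -2 to 2
loglik = {lam: profile_loglik(y, lam) for lam in grid}
best_lam = max(grid, key=loglik.get)

# Approximate 95% interval: lambdas within 1.92 of the maximum
cutoff = loglik[best_lam] - 1.92
ci = [lam for lam in grid if loglik[lam] >= cutoff]
ci_low, ci_high = min(ci), max(ci)
print(f"best lambda={best_lam}, 95% CI=({ci_low}, {ci_high})")
```

Here λ = 0 (the log transform) lies inside the interval, so it is a viable transformation, while λ = 1 (no transformation) lies outside.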

Question 10

Q

Intercept and slope in a regression table

Answer

A

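The intercept is where the line crosses the y-axis (the predicted y at x = 0). The slope of X1 is the change in y for one step on the x-axis. For example, with an intercept of 2 and a slope of 9.87, one step on the x-axis gives y = 2 + 9.87 = 11.87. The two points (0, 2) and (1, 11.87) are enough to draw the line.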
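
The arithmetic in this card can be written as a tiny function. The intercept and slope are the card's example values; the function name `predict` is just an illustrative choice.

```python
INTERCEPT = 2.0   # example intercept from the card
SLOPE = 9.87      # example slope from the card


def predict(x):
    """Predicted y on the regression line for a given x."""
    return INTERCEPT + SLOPE * x


y_at_0 = predict(0)   # where the line crosses the y-axis
y_at_1 = predict(1)   # one step along the x-axis
print(y_at_0, round(y_at_1, 2))
```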

Question 11

Q

Ordinary Least Squares (OLS)

Answer

A

Method that estimates the intercept and slope by minimizing the residual sum of squares (RSS), that is, the total squared error (noise) around the fitted line.
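
For one explanatory variable, the OLS estimates have a closed form. The sketch below (invented data and function names) computes them and then checks that nudging the line in any direction makes the RSS larger.

```python
def rss(x, y, intercept, slope):
    """Residual sum of squares of the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def ols_fit(x, y):
    """Closed-form least-squares estimates for one explanatory variable."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

b0, b1 = ols_fit(x, y)
best = rss(x, y, b0, b1)

# RSS of four slightly different lines, for comparison with the OLS line
nudged = [
    rss(x, y, b0 + db0, b1 + db1)
    for db0 in (-0.1, 0.1)
    for db1 in (-0.05, 0.05)
]
print(f"intercept={b0:.3f}, slope={b1:.3f}, RSS={best:.4f}")
print("RSS of nudged lines:", [round(r, 4) for r in nudged])
```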

Question 12

Q

Difference between correlation and causation

Answer

A

Correlation shows that two variables are related, while causation means that a change in one variable produces a change in the other. Correlation alone does not prove causation.

(12 cards)