Linear Regression Flashcards
What is one of the most common methods of prediction?
Regression Analysis
it is used whenever we have a causal relationship between variables
What is a Linear Regression?
a linear regression is a linear approximation of a causal relationship between two or more variables
How is the Dependent Variable labeled? (the predicted variable)
as Y
How are Independent Variables labeled? (the predictors)
x1, x2, etc
In Y hat - what does the hat denote?
An estimated or predicted value
What is the simple linear regression formula?
Y hat = b0 + b1 * x1
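As a minimal sketch of the formula above, the prediction is one multiplication and one addition. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(x1: float, b0: float, b1: float) -> float:
    """Return Y hat = b0 + b1 * x1 for a simple linear regression."""
    return b0 + b1 * x1


# Hypothetical coefficients: intercept b0 = 2.0, slope b1 = 0.5.
y_hat = predict(x1=10.0, b0=2.0, b1=0.5)
print(y_hat)  # 7.0
```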
You have an ice-cream shop. You noticed a relationship between the number of cones you order and the number of ice-creams you sell. Is this a suitable situation for regression analysis?
Yes
No
No
You are trying to predict the amount of beer consumed in the US, depending on the state. Is this regression material?
Yes
No
Yes
What does correlation measure?
The degree to which two variables move together
it doesn’t capture causality; it only shows that two variables move together, and it is symmetric (the correlation of x with y equals the correlation of y with x)
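The symmetry of correlation can be checked on a toy example. This is a sketch using the Pearson correlation coefficient, which is an assumption since the card does not name a specific measure; the helper `pearson` and the data are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data in which y tends to rise with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson(x, y)
r_yx = pearson(y, x)
print(r_xy == r_yx)  # swapping the variables does not change the result
```

The value is positive here because the two variables move in the same direction, but the number alone says nothing about which one drives the other.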
What is the purpose of regression analysis?
To see how one variable affects another, that is, what changes it causes in the other
it is not about the degree of connection (as correlation is) but about cause and effect
Which statement is false?
Correlation does not imply causation.
Correlation is symmetrical regarding both variables.
Correlation could be represented as a line.
Correlation does not capture the direction of the causal relationship.
Correlation could be represented as a line.
What does it mean if x and y have a positive correlation?
An increase in x translates to a decrease in y.
An increase in y translates to a decrease in x.
The variables x and y tend to move in the same direction.
None of the above
The variables x and y tend to move in the same direction.
Assume you have the following sample regression: y = 6 + x. If we draw the regression line, what would be its slope?
1
6
x
None of the above
1
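A quick way to confirm the slope is to compute rise over run between any two points on the line. This sketch assumes nothing beyond the equation y = 6 + x from the card; the function name `line` is hypothetical.

```python
def line(x: float) -> float:
    """The sample regression from the card: y = 6 + x."""
    return 6 + x


# Slope = change in y divided by change in x between two points.
slope = (line(3) - line(1)) / (3 - 1)
# Intercept = value of y where x is 0.
intercept = line(0)
print(slope, intercept)  # 1.0 6
```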
What does a p-value of 0.503 suggest about the intercept coefficient?
It is significantly different from 0.
It is not significantly different from 0.
It is equal to 0.503.
None of the above.
It is not significantly different from 0.
What does a p-value of 0.000 suggest about the coefficient (x)?
It is significantly different from 0.
It is not significantly different from 0.
It does not tell us anything.
None of the above.
It is significantly different from 0.
What is the predicted GPA of students with an SAT score of 1850? (Unlike in the lectures, this time assume that any coefficient with a p-value greater than 0.05 is not significantly different from 0)
3.42
3.06
3.23
3.145
3.145
Using the coefficients of const and SAT from the regression table, the fitted equation is:
GPA = 0.2750 + 0.0017 * SAT
The const term has a p-value of 0.503, which is greater than 0.05, so under the rule stated in the question it is treated as not significantly different from 0. Dropping it reduces the equation to:
GPA = 0.0017 * SAT
Plugging in SAT = 1850 gives GPA = 0.0017 * 1850 = 3.145.
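The arithmetic can be reproduced in a few lines. This is a sketch: the coefficients and the const p-value are the ones quoted in the explanation, while the SAT p-value of 0.000 is assumed to be the one from the earlier card about the coefficient of x. The dictionary layout and variable names are hypothetical.

```python
# Values quoted in the cards; the layout of this dict is only illustrative.
coefficients = {"const": 0.2750, "SAT": 0.0017}
p_values = {"const": 0.503, "SAT": 0.000}  # SAT p-value is an assumption
alpha = 0.05

# Treat any coefficient whose p-value exceeds alpha as 0.
kept = {
    name: (value if p_values[name] <= alpha else 0.0)
    for name, value in coefficients.items()
}

sat = 1850
gpa_reduced = kept["const"] + kept["SAT"] * sat
gpa_full = coefficients["const"] + coefficients["SAT"] * sat

print(round(gpa_reduced, 3))  # 3.145
print(round(gpa_full, 2))     # 3.42
```

Keeping the intercept would give 3.42, which is why that value appears among the answer options.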
What is the Sum of Squares Total?
denoted: SST, or TSS
the sum of squared differences between the observed dependent variable and its mean
measures the total variability of the dataset
What is the Sum of Squares Regression?
SSR or ESS
the sum of squared differences between the predicted value and the mean of the dependent variable
a measure of how well your line fits the data
if equal to SST then the model captures all the variability and is perfect
What is the Sum of Squares Error?
SSE or RSS
the sum of squared differences between the observed value and the predicted value
the smaller the error the better the estimation power of the regression
What is the connection between SST, SSR, and SSE?
SST = SSR + SSE
the total variability of the dataset = the explained variability by the regression line + the unexplained variability
for a given SST, a lower SSE means a higher SSR, and so a more powerful regression
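The identity can be verified numerically on a toy dataset. This is a sketch with made-up data; the coefficients b0 = 0.6 and b1 = 0.8 are the least squares fit for these five points, worked out by hand. The decomposition holds for a least squares line with an intercept, not for an arbitrary line.

```python
# Made-up observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

# Least squares coefficients for this data (computed by hand).
b0, b1 = 0.6, 0.8

mean_y = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

# Total variability: observed values vs. their mean.
sst = sum((yi - mean_y) ** 2 for yi in y)
# Explained variability: predicted values vs. the mean.
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)
# Unexplained variability: observed vs. predicted values.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(round(sst, 6), round(ssr, 6), round(sse, 6))  # 10.0 6.4 3.6
```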
Which of the following is true?
SST = SSR + SSE
SSR = SST + SSE
SSE = SST + SSR
SST = SSR + SSE
What is the OLS?
Ordinary Least Squares
The most common method to estimate the linear regression equation
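For a single predictor, the OLS estimates have a closed form. The sketch below uses that closed form on made-up data and then checks the "least squares" property by nudging the coefficients. The function names `ols_fit` and `sum_squared_error` are hypothetical, not taken from any library.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_error(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

b0, b1 = ols_fit(x, y)
best = sum_squared_error(x, y, b0, b1)

# Any nudge to the intercept or slope should increase the error.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
worse = [sum_squared_error(x, y, b0 + d0, b1 + d1) for d0, d1 in nudges]

print(round(b0, 6), round(b1, 6))  # 0.6 0.8
print(all(w > best for w in worse))  # True
```

In practice a statistics library would be used instead of hand-written formulas; the point here is only that OLS picks the line with the smallest sum of squared errors.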
What software do beginner statisticians prefer?
Excel, SPSS, SAS, STATA
What software do data scientists prefer?
Programming languages like R and Python
they offer limitless capabilities and unmatched speed
What are other methods for determining the regression line?
- Generalized Least Squares
- Maximum likelihood estimation
- Bayesian Regression
- Kernel Regression
- Gaussian Process Regression
Since OLS (Ordinary Least Squares) is simple enough to understand, why do advanced statisticians prefer using programming languages to solve regressions?
Limitless capabilities and unmatched speed.
Other software cannot compute so many calculations.
Huge datasets cannot be used in Excel
None of the above.
Limitless capabilities and unmatched speed.