Regression Analysis (Udemy Statistics for Data Science and Business Analysis) Flashcards
What is a linear regression?
A linear regression is a linear approximation of a causal relationship between two or more variables
What is the basic three-part process of linear regression?
1. Get sample data
2. Design a model that works for that sample
3. Make predictions for the whole population
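The three steps above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the synthetic data, the seed, and the helper name `predict` are made up for the example and are not part of the course material.

```python
import numpy as np

# Step 1: get sample data (synthetic here; the values are illustrative only)
rng = np.random.default_rng(0)
x_sample = rng.uniform(0, 10, size=50)
y_sample = 2.0 + 3.0 * x_sample + rng.normal(0, 1, size=50)

# Step 2: design a model that fits the sample (a straight line y = b0 + b1 * x)
# np.polyfit returns coefficients from highest degree to lowest: slope, intercept
b1, b0 = np.polyfit(x_sample, y_sample, deg=1)

# Step 3: use the fitted line to predict for values that were not in the sample
def predict(x_new):
    return b0 + b1 * np.asarray(x_new, dtype=float)

predictions = predict([11.0, 12.5])
```

The fitted slope should land close to the value used to generate the data, though how close depends on the noise and the sample size.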
What is the difference between regression and correlation?
Correlation measures the relationship between two variables; regression models how one variable affects another. Correlation does not capture causality, whereas regression is built on an assumed cause-and-effect relationship.
What is the sum of squares total?
SST, the sum of squared differences between the observed values of the dependent variable and its mean; a measure of the total variability of the dataset
What is the sum of squares regression?
SSR, the sum of squared differences between the predicted values and the mean of the dependent variable; a measure of how much of the variability your model explains, and so of how well it fits the data
What is the sum of squares error?
SSE, the sum of squared differences between the observed and predicted values; the smaller the error, the better the estimation power of the regression. Also known as RSS, the residual sum of squares
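The three sums of squares can be computed directly from a fitted line. The sketch below assumes NumPy and uses a small made-up dataset; it also checks the standard identity SST = SSR + SSE, which holds for a least-squares line with an intercept.

```python
import numpy as np

# Illustrative data, not taken from the course
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a least-squares line and compute the predicted values
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
y_mean = y.mean()

sst = np.sum((y - y_mean) ** 2)      # total variability
ssr = np.sum((y_hat - y_mean) ** 2)  # variability explained by the line
sse = np.sum((y - y_hat) ** 2)       # unexplained (residual) variability

decomposition_holds = np.isclose(sst, ssr + sse)
```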
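What is R²?
R-squared, the share of the total variability of the data that the regression explains: R² = SSR / SST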
What does an R² of 0 mean?
That your regression line explains none of the variability of the data
What does an R² of 1 mean?
That your regression line explains all of the variability of the data
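The two extremes can be reproduced numerically. This sketch assumes the usual definition R² = 1 - SSE / SST, which the cards do not spell out; the function name `r_squared` and the data are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variability in y explained by the predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

y = np.array([1.0, 2.0, 3.0, 4.0])

# Predictions equal to the observations: all variability is explained
r2_perfect = r_squared(y, y)

# Always predicting the mean: none of the variability is explained
r2_mean_only = r_squared(y, np.full_like(y, y.mean()))
```

Here `r2_perfect` comes out as 1.0 and `r2_mean_only` as 0.0.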
What is a good R²?
In physics or chemistry, roughly 0.7 to 0.99; in the social sciences, 0.2 could be fantastic. It depends on the complexity of the topic and on how many variables are believed to be in play
What is the OLS?
Ordinary least squares: the line through the data with the least error, that is, the one that minimizes the sum of squares error
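For a single predictor, the least-error line has a well-known closed form. The sketch below assumes NumPy and made-up data; it computes the slope and intercept by hand and checks that moving either coefficient away from those values does not reduce the squared error.

```python
import numpy as np

# Illustrative data, not taken from the course
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Closed-form OLS estimates for the line y = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

def sse(intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return np.sum((y - (intercept + slope * x)) ** 2)

best = sse(b0, b1)

# Lines with slightly different coefficients, for comparison
shifted_intercept = sse(b0 + 0.1, b1)
shifted_slope = sse(b0, b1 - 0.1)
```

Both `shifted_intercept` and `shifted_slope` are at least as large as `best`, which is what "least error" means here.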
What’s generally better, multiple or simple regression? And why?
Multiple regression is generally better than simple regression: with each additional variable, the explanatory power can only increase or stay the same
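The "can only increase or stay the same" claim can be checked numerically. This is a sketch under stated assumptions: NumPy is available, the data are synthetic, explanatory power is measured as R² = 1 - SSE / SST, and the helper name `fit_r2` is hypothetical. The second variable is generated independently of y on purpose.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def fit_r2(columns, y):
    """Fit OLS with an intercept on the given columns and return R-squared."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

r2_simple = fit_r2([x1], y)        # one explanatory variable
r2_multiple = fit_r2([x1, x2], y)  # same variable plus an irrelevant one
```

Even with an irrelevant variable added, `r2_multiple` is not lower than `r2_simple`, so a higher R² on its own does not show that the extra variable is useful.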
What does the F Test Do?
The F test tests the overall significance of the model. The lower the F statistic, the closer the model is to being non-significant
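To see how the F statistic behaves, the sketch below computes it with the standard formula F = (SSR / k) / (SSE / (n - k - 1)), where k is the number of explanatory variables and n the number of observations. The formula is not given on the card, and the function name `f_statistic` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def f_statistic(X, y):
    """Overall F statistic for an OLS fit with an intercept."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variability
    sse = np.sum((y - y_hat) ** 2)         # residual variability
    return (ssr / k) / (sse / (n - k - 1))

rng = np.random.default_rng(2)
x = rng.normal(size=(80, 1))

# y depends strongly on x: expect a large F statistic
f_related = f_statistic(x, 3.0 * x[:, 0] + rng.normal(size=80))

# y is pure noise: expect a much smaller F statistic
f_unrelated = f_statistic(x, rng.normal(size=80))
```

Turning the statistic into a p-value requires the F distribution with k and n - k - 1 degrees of freedom, which is left out here to keep the example dependency-free beyond NumPy.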