Lecture 6, 7 and 8 Flashcards
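- What does the correlation coefficient ’r’ explain? What do positive and negative correlation mean?
r measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1. When r is close to either 1 or -1 there is a strong correlation between the two variables, and when r is close to 0 there is little or no linear relationship. A positive r means the two variables tend to move in the same direction; a negative r means one tends to fall as the other rises.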
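As a rough illustration of the answer above, r can be computed by hand. This is a minimal sketch using only plain Python; the function name `pearson_r` and the small datasets are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of x and y around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one series rises with x, the other falls with x
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_pos = pearson_r(x_values, rising)   # perfectly positive relationship
r_neg = pearson_r(x_values, falling)  # perfectly negative relationship
print(r_pos, r_neg)
```

With these made-up series the two values sit at the extremes of the range, one at +1 and one at -1.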
- Why do we use OLS linear regression models?
To model the linear relationship between a dependent variable and one or more independent variables, so that we can quantify the relationship and predict outcomes.
- What is the difference between a correlation matrix and a linear regression model?
A correlation matrix quantifies associations between pairs of variables but does not provide a predictive model, while a linear regression model aims to model and quantify the relationship between independent and dependent variables to predict outcomes and understand variable associations more deeply.
- In the linear regression equation 𝑌 = 𝛼 + 𝛽 𝑋 + 𝜖, what are variables X and Y?
Y is the dependent variable (the outcome we want to explain or predict) and X is the independent variable (the predictor used to explain Y).
- In the linear regression equation 𝑌 = 𝛼 + 𝛽 𝑋 + 𝜖, what are parameters alpha and beta?
- Alpha is the intercept: the expected value of Y when X is 0.
- Beta is the slope: how much Y changes, on average, each time X increases by one unit.
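The two parameters can be estimated with the standard OLS formulas for a single predictor. The sketch below is only an illustration; the helper name `fit_simple_ols` and the data are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Estimate alpha (intercept) and beta (slope) of Y = alpha + beta * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and co-deviations of X and Y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # change in Y per one-unit change in X
    alpha = mean_y - beta * mean_x   # fitted value of Y when X is 0
    return alpha, beta

# Hypothetical data that roughly follows Y = 1 + 2 * X
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

alpha, beta = fit_simple_ols(xs, ys)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

The estimates should land close to the intercept of 1 and slope of 2 that the made-up data was built around.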
- In the linear regression model, why do we require variation in values of X? What happens if there is no variation in values of X?
The slope is estimated by comparing how Y differs across different values of X. If there is no variation in the values of X, the model cannot determine the strength or direction of the relationship, and the slope coefficient cannot be estimated.
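One way to see this: the OLS slope divides by the sum of squared deviations of X, which is zero when every X is the same. The following sketch (with the hypothetical helper `ols_slope` and made-up numbers) returns `None` in that case instead of dividing by zero.

```python
def ols_slope(xs, ys):
    """Return the OLS slope, or None when X has no variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Every X equals the mean, so the slope formula would divide by zero
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

ys = [10, 12, 9, 14]          # hypothetical outcomes
constant_x = [3, 3, 3, 3]     # no variation in X
varying_x = [1, 2, 3, 4]      # variation in X

slope_constant = ols_slope(constant_x, ys)
slope_varying = ols_slope(varying_x, ys)
print(slope_constant, slope_varying)
```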
- What are dummy or indicator variables? Provide examples.
- They are variables that take the value 0 or 1, used to represent categorical data or to convert categorical data into a format that can be used in quantitative models.
- Use an example from your own project.
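- How do we include nominal independent variables in regression analysis?
By converting them into dummy variables: one dummy for each category except one, which is left out as the reference category. We did this in our project, so refer to that.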
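A minimal sketch of dummy coding for a nominal variable follows. The variable `fuel`, its categories, and the helper `make_dummies` are hypothetical and not taken from the project; the reference category is left out, so it is represented by all dummies being 0.

```python
def make_dummies(values, reference):
    """Turn a nominal variable into 0/1 dummies, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]

# Hypothetical nominal variable with three categories
fuel = ["petrol", "diesel", "electric", "petrol"]

rows = make_dummies(fuel, reference="petrol")
for value, row in zip(fuel, rows):
    print(value, row)
```

Each observation gets one column per non-reference category, and these columns can then enter the regression like any other numeric predictor.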
- What is an outlier and why can they be a problem in your analysis? Provide 1 or 2 ways of dealing with outliers.
An outlier is an observation or data point that is significantly different from the rest of the dataset. Outliers can be a problem because a few extreme values can pull the regression line towards them and distort the estimates.
- Identify and examine: make a visualization to identify the outliers.
- You can replace the outlier, but it will affect your result.
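Besides a visualization, outliers can be flagged numerically. The sketch below uses the common 1.5 × IQR rule, which is an assumption here and not necessarily the method used in the lectures; the data is made up.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [value for value in data if value < lower or value > upper]

# Hypothetical measurements with one extreme value
measurements = [10, 12, 11, 13, 12, 11, 95]

outliers = iqr_outliers(measurements)
print(outliers)
```

Flagged values should still be examined before anything is replaced or removed, since changing them affects the result.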
- In our linear regression model output, an R-squared (R^2) is reported, what does it mean and what do we use it for?
R^2 explains how much of the variability in the dependent variable is accounted for by the model. If it is close to 0 the model explains almost none of the variability in the dependent variable, but if it is close to 1 the model explains almost all of it. We use it to judge how well the model fits the data.
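To make the definition concrete, R^2 can be computed as 1 minus the ratio of unexplained variation to total variation. This is a small sketch with made-up numbers; `r_squared` is a hypothetical helper name.

```python
def r_squared(ys, predictions):
    """Share of the variation in ys that the predictions account for."""
    mean_y = sum(ys) / len(ys)
    # Total variation of Y around its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left unexplained by the predictions
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - ss_residual / ss_total

ys = [2, 4, 6, 8]  # hypothetical observed outcomes

perfect = r_squared(ys, [2, 4, 6, 8])    # predictions match every observation
mean_only = r_squared(ys, [5, 5, 5, 5])  # predictions ignore X and use the mean
print(perfect, mean_only)
```

Predictions that match the data give the top of the range, while always predicting the mean explains none of the variability.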
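- Can we assume that the regression model with the highest number of predictors is the best model? Why or why not?
No. More predictors can just add noise to the model. R^2 is a useful indicator of a regression model's fit, but it never decreases when a predictor is added, so the highest R^2 alone does not show that the model with the most predictors is the best.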
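One common way to account for the number of predictors is the adjusted R^2, which penalizes each extra predictor. It is shown here only as an illustration and is an addition to the flashcard answer; the R^2 values, sample size, and predictor counts below are invented.

```python
def adjusted_r_squared(r2, n_observations, n_predictors):
    """Adjusted R^2: penalizes R^2 for the number of predictors used."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r2) * penalty

# Hypothetical comparison on 30 observations:
# the larger model has a slightly higher R^2 but many more predictors
small_model = adjusted_r_squared(r2=0.80, n_observations=30, n_predictors=3)
large_model = adjusted_r_squared(r2=0.81, n_observations=30, n_predictors=10)
print(f"small: {small_model:.3f}, large: {large_model:.3f}")
```

In this made-up comparison the larger model has the higher R^2 but the lower adjusted R^2, which is why the number of predictors alone does not identify the best model.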
- The slope parameter 𝛽̂ of a simple linear regression is +2. Interpret this number.
The regression line slopes upward: when X is one unit higher, Y is on average 2 units higher.
- The slope parameter 𝛽̂ of a simple linear regression is -2. Interpret this number.
The regression line slopes downward: when X is one unit higher, Y is on average 2 units lower.
- An analysis relates the age of used cars (in years) to their price (in USD), using data on a specific type of car. In a linear regression of the price on age, the slope parameter 𝛽̂ is -700. Interpret the coefficient.
It means that, on average, a car that is one year older is priced 700 USD lower.
- An analysis relates the size of apartments (in square meters) to their price (in USD), using data from one city. In a linear regression of the price on size, the slope parameter is +600. Interpret the coefficient.
It means that, on average, an apartment that is one square meter larger is priced 600 USD higher.
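The slope interpretations in the car and apartment questions can be checked by comparing two predictions that differ by one unit of X. In this sketch the slopes come from the questions, while the intercepts are hypothetical because the questions do not report them; the gap does not depend on the intercept.

```python
def predicted_value(alpha, beta, x):
    """Predicted Y from the fitted line Y = alpha + beta * X."""
    return alpha + beta * x

# Used cars: slope of -700 USD per year of age (intercept is hypothetical)
car_alpha, car_beta = 20000, -700
car_gap = predicted_value(car_alpha, car_beta, 5) - predicted_value(car_alpha, car_beta, 4)

# Apartments: slope of +600 USD per square meter (intercept is hypothetical)
apartment_alpha, apartment_beta = 50000, 600
apartment_gap = (
    predicted_value(apartment_alpha, apartment_beta, 81)
    - predicted_value(apartment_alpha, apartment_beta, 80)
)

print(car_gap, apartment_gap)
```

Whatever intercept is used, the difference between the two predictions equals the slope: -700 for one extra year of age and +600 for one extra square meter.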