L17 - Regression 1 Flashcards
What is a model?
A model is a mathematical, physical or otherwise logical representation of a system, used to describe real-world events or phenomena.
What is the main difference between correlation and regression?
Correlation only measures how strongly two variables are related. Regression identifies a dependent variable (Y) and an independent variable (X) and describes how Y changes with X.
Regression attempts to make a causal inference (changes in X explain changes in Y), although a regression on its own does not prove causation.
How do you hypothesize a regression model between two variables?
Determine which variable may depend on the other (i.e. which one is X and which is Y).
Then hypothesize the nature of the relationship: positive or negative?
What is the equation for the linear regression model?
Y = B0 + B1X + E
where:
Y = dependent variable
B0 = y-intercept
B1 = slope of the line
X = independent variable
E = error
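A minimal sketch, using Python's standard `random` module, of how data arise under this model: B0 + B1X gives the line, and the error term E scatters each observed Y around it. The values of B0, B1 and the normally distributed error with standard deviation `sigma` are hypothetical choices for illustration; `simulate` is a hypothetical helper.

```python
import random

# Hypothetical "true" parameters, chosen only for illustration.
B0 = 2.0     # y-intercept
B1 = 0.5     # slope
SIGMA = 1.0  # assumed standard deviation of the error term E

def simulate(xs, b0=B0, b1=B1, sigma=SIGMA, seed=0):
    """Generate Y = B0 + B1*X + E, with E drawn from a normal distribution."""
    rng = random.Random(seed)
    return [b0 + b1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = list(range(1, 11))
ys = simulate(xs)  # points scattered around the line

# With sigma = 0 there is no error term, so every point lies on the line.
exact = simulate(xs, sigma=0.0)
print(exact[:3])  # [2.5, 3.0, 3.5]
```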
What are the four assumptions of simple linear regression?
- The data are interval or ratio
- The relationship between X and Y is linear
- One variable can be identified as dependent (Y) on the other (X)
- The X values are measured without error
What does a residual/error plot do?
It plots the residual (error) for each observation on a graph to see whether a trend is visible. If a trend is visible, the variables under study may be linked through another variable that has not been considered.
In order for the regression model to be accepted, four requirements must be satisfied:
- The errors must be normally distributed
- The errors have a mean of zero
- The standard deviation of the errors is constant for all values of X (homoscedasticity)
- The errors associated with different values of Y are all independent
What is error in a regression model?
Error is the difference between the observed and predicted values of Y for each value of X.
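In symbols, with the fitted line $\hat{Y} = B_0 + B_1 X$, the error for observation $i$ is given below, followed by a worked example using hypothetical values $B_0 = 2$, $B_1 = 0.5$, $x = 4$ and observed $y = 5$:

$$e_i = y_i - \hat{y}_i = y_i - (B_0 + B_1 x_i)$$

$$\hat{y} = 2 + 0.5(4) = 4, \qquad e = 5 - 4 = 1$$

A positive error means the observed point lies above the line; a negative error means it lies below.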
Draw the sum of squares diagram, including Total, Explained (regression) and Unexplained (error) variation.
refer to slides
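For reference alongside the diagram, the decomposition for a least squares line with an intercept is usually written as below, where $\hat{y}_i$ is the predicted value and $\bar{y}$ is the mean of Y. Labels vary between textbooks (some use SSR for the *residual* sum of squares), so the names here are an assumption to check against the slides.

$$\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{SST (total)}} = \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{SSR (explained)}} + \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{SSE (unexplained, error)}}$$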
Why do we find the line of best fit? What does it tell us?
The line of best fit is a model used to predict values of Y for given values of X, including X values that are not in the dataset. Predictions are most reliable within the range of X values that were observed.
What is the coefficient of determination?
R^2 tells us the proportion of the variation in Y that is explained by X.
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
What does R^2 = 0.67 mean?
It means that 67% of the variation in Y is explained by X; the remaining 33% is unexplained (error).
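As a worked example with hypothetical sums of squares, suppose the total variation is SST = 150 and the unexplained (error) variation is SSE = 49.5, so the explained variation is SST − SSE:

$$R^2 = \frac{SST - SSE}{SST} = \frac{150 - 49.5}{150} = \frac{100.5}{150} = 0.67$$

The remaining $49.5 / 150 = 0.33$, or 33%, of the variation in Y is left unexplained.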
What is the error term within a residual plot?
It is the amount of error in a regression model for each observation. Depending on the distribution of the error, there will be different …