Lecture 3 Flashcards
Residual Standard Error
Average amount that the response deviates from the regression line
y~ is the estimate
n is the sample size
RSE small implies model fits data well
True
RSE high implies model does not fit data well
True
Any prediction of lpsa based on lweight will still be off by 1.046 units on average.
Whether this error is acceptable or not depends on the problem
True
RSE is measured in units of the output
TRUE
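A minimal sketch of how the RSE could be computed for a simple linear regression, assuming the usual formula RSE = sqrt(RSS / (n - 2)). The helper names `fit_line` and `residual_standard_error` and the four data points are made up for illustration; they are not the lpsa/lweight data from the lecture.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y ~ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual_standard_error(x, y):
    """RSE = sqrt(RSS / (n - 2)), in the same units as y."""
    b0, b1 = fit_line(x, y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (len(x) - 2))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(residual_standard_error(x, y), 4))  # 0.9487
```

Here a prediction of y from x is off by roughly 0.95 units of y on average.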
R-squared is also a measure of the fit, but it has no units
YES
RSS: Amount of variability that is left unexplained after performing the regression
True
TSS: total variance in the response Y
Amount of variability in response before regression is performed
R squared measures the proportion of variability in the response Y that can be explained using X
True
R squared close to 1: a large proportion of the variability is explained by X, which is good
True
R squared close to 0 => Regression did not explain much of the variability
True
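The cards above fit together as R2 = 1 - RSS / TSS. A small sketch of that calculation, assuming the fitted values are already available; the function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(y, y_hat):
    """Proportion of the variability in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)               # variability before regression
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # variability left unexplained
    return 1 - rss / tss

# Hypothetical responses and fitted values from a straight-line fit.
y = [1, 3, 2, 4]
y_hat = [1.3, 2.1, 2.9, 3.7]
print(round(r_squared(y, y_hat), 2))  # 0.64
```

Predicting the mean of y for every observation gives RSS = TSS, so R2 = 0.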
A linear regression model can thus be the wrong choice
When the relationship we are trying to approximate is far from what the model can capture, R2 will be near zero
True
R2 is highly affected by the number of predictors we have (it never decreases when a predictor is added)
True
Since R2 is highly affected by the number of predictors, we have what is called adjusted R2 (it helps us pick predictors)
Regards
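A sketch of adjusted R2, assuming the standard definition 1 - (1 - R2)(n - 1) / (n - p - 1), where n is the sample size and p the number of predictors. The function name and the example values are made up.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R2: penalizes R2 for the number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: R2 = 0.64 from 4 observations and 1 predictor.
print(round(adjusted_r_squared(0.64, n=4, p=1), 2))  # 0.46
```

With R2 held fixed, adding a predictor lowers the adjusted value, so an extra predictor only pays off if it raises R2 enough.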
We want the F-statistic to be as far above 1 as it can be
True
The larger the F-statistic, the more it indicates that we have a relation between the predictors we are feeding into the model and the response
True
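A sketch of the overall F-statistic, assuming the standard form F = ((TSS - RSS) / p) / (RSS / (n - p - 1)) for a model with p predictors and n observations. The function name and the inputs are hypothetical.

```python
def f_statistic(tss, rss, n, p):
    """Overall F-statistic: explained variability per predictor over residual variance."""
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Hypothetical values: TSS = 5.0, RSS = 1.8, 4 observations, 1 predictor.
print(round(f_statistic(tss=5.0, rss=1.8, n=4, p=1), 2))  # 3.56
```

When the predictors explain nothing (RSS close to TSS), the value drops toward 0 rather than rising above 1.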
Correlation: a measure of the linear relationship between X and Y
True
Correlation does not imply causality; it means how much the values vary in the same way
True
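A sketch of the sample (Pearson) correlation, which is the usual measure meant by "correlation" in this setting; that choice, the function name `pearson_correlation`, and the data are assumptions made for illustration.

```python
import math

def pearson_correlation(x, y):
    """Sample correlation between x and y, a unitless value in [-1, 1]."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(pearson_correlation(x, y), 2))  # 0.8
```

For a simple linear regression of y on x, the square of this value (0.64 here) equals R2.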
Multiple linear regression model: we want to use more predictors to explain our response variable
True
Interaction effect, also known as the synergy effect in marketing
Accounting for possible interactions between the predictors
I introduce a new coefficient and a new variable given by X1 * X2, which allows me to account for the interaction
True
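A sketch of adding the X1 * X2 column to a two-predictor model. The data are generated, without noise, from a made-up model y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2, and the helpers `solve` and `least_squares` (normal equations solved by Gaussian elimination) are hypothetical stand-ins for whatever fitting routine is actually used.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def least_squares(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free data on a 3 x 3 grid of (x1, x2) values.
points = [(x1, x2) for x1 in (0, 1, 2) for x2 in (0, 1, 2)]
y = [1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 for x1, x2 in points]

# Design rows: intercept, x1, x2, and the new interaction variable x1 * x2.
rows = [[1.0, x1, x2, x1 * x2] for x1, x2 in points]
beta = least_squares(rows, y)
print([round(b, 3) for b in beta])  # [1.0, 2.0, 3.0, 0.5]
```

The last coefficient is the one attached to X1 * X2; a value far from zero suggests the effect of one predictor depends on the level of the other.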
Linear regression: I am assuming the relationship between response and the predictor is linear
True
Since the relationship between the predictor and the response is not always linear, we can generate a polynomial regression model
True
We should always ask ourselves, is it worth it to create a higher order model?
True
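One way to ask whether a higher-order model is worth it is to compare the fit of a degree-1 and a degree-2 model on the same data. The sketch below does that by comparing RSS; the data (exactly y = x^2) and the helper names are made up, and in practice the comparison would be made on test error or adjusted R2 rather than training RSS alone.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def polynomial_rss(x, y, degree):
    """Fit y ~ b0 + b1*x + ... + bd*x^d by least squares and return the RSS."""
    rows = [[float(xi) ** k for k in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical data that follow y = x^2 exactly.
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
rss_linear = polynomial_rss(x, y, degree=1)
rss_quadratic = polynomial_rss(x, y, degree=2)
print(round(rss_linear, 6), round(rss_quadratic, 6))  # 14.0 0.0
```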
In data science we take a sample from the population in order to say something about the population
TRUE
We want to keep part of our data aside in order to test our model
TRUE
We want to see how our model is performing on different subsets of data
TRUE
We want to estimate the test prediction error of our model
TRUE
Resampling: Given one sample, we repeatedly draw samples from it in order to refit our model
TRUE
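One concrete form of resampling is the bootstrap: draw n observations with replacement from the original sample, refit, and repeat. The card does not name a specific method, so the bootstrap is an assumption here, as are the helper names and the data.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y ~ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def bootstrap_slopes(x, y, n_resamples=1000, seed=0):
    """Refit the line on resamples drawn with replacement; return the fitted slopes."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:  # a resample with a single x value has no defined slope
            continue
        slopes.append(fit_line(xs, ys)[1])
    return slopes

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.2, 4.7, 7.1, 9.4, 10.8, 13.3, 14.9, 17.2]
slopes = bootstrap_slopes(x, y)
print("mean slope:", statistics.mean(slopes))
print("spread of slope:", statistics.stdev(slopes))
```

The spread of the refitted slopes gives a rough idea of how much the estimate would change from sample to sample.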
Cross validation is when we want to evaluate the performance of our model by estimating its test error
TRUE
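A sketch of k-fold cross validation for a straight-line model: each fold is held out once, the model is fitted on the remaining folds, and the held-out mean squared errors are averaged. The function names, the choice k = 5, and the data are assumptions made for illustration.

```python
import random

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y ~ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def k_fold_mse(x, y, k=5, seed=0):
    """Estimate the test MSE by averaging the held-out MSE over k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)         # random assignment to folds
    folds = [idx[i::k] for i in range(k)]
    fold_mse = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in held_out]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k

# Hypothetical data, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 2.8, 5.3, 6.9, 9.2, 10.7, 13.1, 15.2, 16.8, 19.1]
print("estimated test MSE:", k_fold_mse(x, y))
```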
When you have a flexible model the training error might underestimate the test error
True
We divide our data into a part for creating the model and a part for testing it
TRUE
We need to randomly split our data
TRUE
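A sketch of a random split into a part for fitting and a part for testing. The 25% test fraction, the fixed seed, the helper names, and the data are all assumptions made for illustration.

```python
import random

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y ~ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def random_split(n, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n-1 and return (train_indices, test_indices)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def mse(x, y, idx, b0, b1):
    """Mean squared prediction error over the observations listed in idx."""
    return sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in idx) / len(idx)

# Hypothetical data: a line plus an alternating +/- 0.5 disturbance.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]

train, test = random_split(len(x))
b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
print("training MSE:", mse(x, y, train, b0, b1))
print("test MSE:", mse(x, y, test, b0, b1))
```

The model never sees the test part while it is being fitted, so the test MSE is the more honest estimate of how it will do on new data.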