Regression through MT 1 Flashcards
For assessing the normality assumption of the ANOVA model, we can only use the quantile-quantile normal plot of the residuals.
False (the histogram of the residuals can also be used)
The constant variance assumption is diagnosed using the histogram
False (it is diagnosed with the plot of residuals against fitted values)
The estimator sigma-hat^2 is a random variable
True
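A minimal sketch of this fact, using made-up simulated data: refitting the same simple linear regression on fresh samples gives a different value of sigma-hat^2 each time, which is exactly what it means for the estimator to be a random variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 3.0
x = np.linspace(0, 10, n)

estimates = []
for _ in range(5):
    y = 1 + 2 * x + rng.normal(0, sigma, n)      # fresh noise each draw
    b1, b0 = np.polyfit(x, y, 1)                 # least-squares fit
    resid = y - (b0 + b1 * x)
    estimates.append(resid @ resid / (n - 2))    # sigma-hat^2 = SSE / (n - 2)

print(estimates)  # five different estimates scattered around sigma^2 = 9
```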
The regression coefficients are used to measure the linear dependence between two variables
False (measuring the linear dependence between two variables is the role of the correlation coefficient)
The mean sum of squared errors in ANOVA measures variability within groups
True
Beta-hat-1 is an unbiased estimator for Beta-0
False (it is an unbiased estimator for Beta-1)
Under the normality assumption, the estimator Beta-hat-1 is a linear combination of normally distributed random variables
True
In simple linear regression models, we lose three degrees of freedom because of the estimation of the three model parameters B-0, B-1, Sigma^2
False (we lose two degrees of freedom, one for each of the estimated coefficients B-0 and B-1; Sigma^2 does not cost a degree of freedom)
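To see the two lost degrees of freedom concretely, here is a minimal sketch with made-up data (statsmodels assumed available): the residual degrees of freedom reported for a simple linear regression fit is n - 2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.df_resid)  # 38.0, i.e., n - 2: only B-0 and B-1 cost degrees of freedom
```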
The assumptions to diagnose with a linear regression model are independence, linearity, constant variance, and normality
True
The sampling distribution for the variance estimator in ANOVA is chi-square regardless of the assumptions of the data
False (the chi-square sampling distribution holds only under the normality assumption)
If the constant variance assumption in ANOVA does not hold, the inference on the equality of the means will not be reliable
True
A negative value of B-1 is consistent with an inverse relationship between x and y
True
If one confidence interval in the pairwise comparison does not include zero, we conclude that the two means are plausibly equal
False. If it DOES include zero, we conclude the two means are plausibly equal
The mean sum of squared errors in ANOVA measures variability between groups
False (it measures the variability within groups)
The linear regression model with a qualitative predicting variable with k levels/classes will have k+1 parameters to estimate
True
We assess the assumption of constant-variance by plotting the response variable against fitted values
False (we plot the residuals, not the response, against the fitted values)
The number of degrees of freedom of the chi-square distribution for the variance estimator is N-k+1, where k is the number of samples
False (it is N-k)
The prediction interval will never be smaller than the confidence interval for data points with identical predictor values
True (the prediction interval accounts for both the uncertainty in estimating the mean response and the variability of a new observation, so it is at least as wide as the confidence interval)
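A minimal sketch with made-up data (statsmodels assumed) comparing the two intervals at the same predictor value; the prediction interval comes out wider because it also carries the variance of a new observation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 60)
y = 1 + 2 * x + rng.normal(0, 2, 60)

res = sm.OLS(y, sm.add_constant(x)).fit()
frame = res.get_prediction(np.array([[1.0, 5.0]])).summary_frame(alpha=0.05)

ci_width = frame["mean_ci_upper"] - frame["mean_ci_lower"]   # mean response CI
pi_width = frame["obs_ci_upper"] - frame["obs_ci_lower"]     # new-response PI
print(float(ci_width.iloc[0]), float(pi_width.iloc[0]))      # PI width > CI width
```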
If one confidence interval in the pairwise comparison includes only positive values, we conclude that the difference in means is statistically significantly positive
True
Conducting t-tests on each beta parameter in a multiple regression model is the best way for testing the overall significance of the model
False (the overall significance of the model is tested with the F-test)
In the case of a multiple linear regression model containing 6 quantitative predicting variables and an intercept, the number of parameters to estimate is 7
False (the 6 coefficients, the intercept, and the error variance make 8 parameters)
The regression coefficient corresponding to one predictor in multiple linear regression is interpreted as the estimated expected change in the response variable for a one-unit change in the corresponding predicting variable, holding all other predictors fixed
True
The proportion of variability in the response variable that is explained by the predicting variables is called correlation
False (that proportion is the coefficient of determination, R^2)
Predicting values of the response variable for values of the predictors that lie within the data range is known as extrapolation
False. It is extrapolation if the predictor values are outside of the known data range
In multiple linear regression, we study the relationship between a single response variable and several predicting quantitative and/or qualitative variables
True
The sampling distribution used for estimating confidence intervals for the regression coefficients is the normal distribution
False (confidence intervals for the coefficients use the t-distribution)
A partial f-test can be used to test whether a subset of regression coefficients are all equal to zero
True
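A minimal sketch with made-up data (variable names x1, x2, x3 are hypothetical): anova_lm on two nested OLS fits carries out the partial F-test of H0: the coefficients of x2 and x3 are both zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.normal(size=80),
                   "x2": rng.normal(size=80),
                   "x3": rng.normal(size=80)})
df["y"] = 1 + 2 * df.x1 + 0.5 * df.x2 + rng.normal(size=80)

reduced = smf.ols("y ~ x1", data=df).fit()           # subset coefficients dropped
full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()    # all predictors included
print(anova_lm(reduced, full))                       # partial F statistic and p-value
```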
Prediction is the only objective of multiple linear regression
False (estimation and inference about the relationship between the predictors and the response are also objectives)
The equation to find the estimated variance of the error terms of a multiple linear regression model with intercept can be obtained by summing up the squared residuals and dividing that by n-p, where n is the sample size and p is the number of predictors
False (it is n-p-1)
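A minimal sketch with made-up data showing the n - p - 1 divisor in action: summing the squared residuals and dividing by n - p - 1 reproduces the error-variance estimate that statsmodels reports as mse_resid.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, p = 50, 2                                     # sample size, number of predictors
X = rng.normal(size=(n, p))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
sigma2_hat = np.sum(res.resid ** 2) / (n - p - 1)
print(sigma2_hat, res.mse_resid)                 # the two values match
```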
For a given predicting variable, the estimated coefficient of regression associated with it will likely be different in a model with other predicting variables or in the model with only the predicting variable alone
True
Observational studies allow us to make causal inference
False (causal inference generally requires a randomized experiment)
In the case of multiple linear regression, controlling variables are used to control for sample bias
True
In the case of a multiple regression model with 10 predictors, the error term variance estimator follows a chi-squared distribution with n-10 degrees of freedom
False (it is n-10-1)
The estimated coefficients obtained by using the method of least squares are unbiased estimators of the true coefficients
True
Before making statistical inference on regression coefficients, estimation of the variance of the error terms is necessary
True
An example of a multiple regression model is Analysis of Variance (ANOVA)
True
Given a qualitative predicting variable with 7 categories in a linear regression model with intercept, 7 dummy variables need to be included in the model
False (with an intercept, only 6 dummy variables are included; the omitted category serves as the baseline)
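A minimal sketch (hypothetical 7-level factor) of the dummy-variable count: with an intercept in the model, dropping the first level leaves the 6 dummies that are actually estimated; the dropped level is the baseline absorbed by the intercept.

```python
import pandas as pd

levels = pd.Series(list("ABCDEFG"), name="level")    # 7 hypothetical categories
dummies = pd.get_dummies(levels, drop_first=True)    # baseline level "A" dropped
print(dummies.shape[1])                              # 6 dummy variables
```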
It is good practice to create a multiple linear regression model using a linearly dependent set of predictor variables
False (linearly dependent predictors make the least-squares estimates impossible to compute uniquely)
If the confidence interval for a regression coefficient contains the value zero, we interpret that the regression coefficient is definitely equal to zero
False (zero is a plausible value for the coefficient, but that does not mean it is definitely zero)
The larger the coefficient of determination (r-squared), the higher the variability explained by the simple linear regression model
True
The estimators of the error term variance and of the regression coefficients are random variables
True
The one-way ANOVA is a linear regression model with one qualitative predicting variable
True
We can assess the assumption of constant-variance in multiple linear regression by plotting the standardized residuals against fitted values
True
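A minimal sketch with made-up data of that diagnostic plot: standardized residuals against fitted values, where a roughly even band around zero supports constant variance.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=100)

res = sm.OLS(y, sm.add_constant(X)).fit()
std_resid = res.resid / np.sqrt(res.mse_resid)   # standardized residuals

plt.scatter(res.fittedvalues, std_resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Standardized residuals")
plt.show()                                       # look for an even band around zero
```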
If one confidence interval in the pairwise comparison includes zero under ANOVA, we conclude that the two corresponding means are plausibly equal
True
We do not need to assume independence between data points for making inference on the regression coefficients
False
Assuming the model is a good fit, the residuals in simple linear regression have constant variance
True
We cannot estimate a multiple linear regression model if the predicting variables are linearly independent
False (it is linear dependence among the predictors, not independence, that prevents estimation)
If a predicting variable is categorical with 5 categories in a linear regression model without intercept, we will include 5 dummy variables in the model
True
In ANOVA, the number of degrees of freedom of the chi-squared distribution for the variance estimator is N-k-1 where k is the number of groups
False (it is N-k)
The only assumptions for a simple linear regression model are linearity, constant variance, and normality
False (independence of the errors is also assumed)
In simple linear regression, the confidence interval of the response increases as the distance between the predictor values and the mean value of the predictors decreases
False (the interval narrows as the predictor value gets closer to the mean of the predictors)
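A minimal sketch with made-up data of why this is False: the standard error of the estimated mean response, s * sqrt(1/n + (x* - xbar)^2 / Sxx), shrinks as x* moves toward the mean of the predictor, so the interval narrows there.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(0, 2, n)

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(resid @ resid / (n - 2))             # estimated error std deviation
sxx = np.sum((x - x.mean()) ** 2)

for x_star in (x.mean(), x.mean() + 3.0):
    se = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / sxx)
    print(x_star, se)                            # smaller standard error at the mean
```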
The sampling distribution of the estimated variance of the error terms of a multiple linear regression model with k predictors and an intercept is a t-distribution with n-k-1 degrees of freedom
False (it is a chi-squared distribution with n-k-1 degrees of freedom)
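A minimal simulation sketch (made-up model with k = 2 predictors): across repeated samples, SSE / sigma^2 behaves like a chi-squared variable with n - k - 1 degrees of freedom, matching its mean (n - k - 1) and variance (2(n - k - 1)).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, sigma2 = 30, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])

stats = []
for _ in range(20000):
    y = X @ beta + rng.normal(0, np.sqrt(sigma2), n)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    stats.append(resid @ resid / sigma2)             # SSE / sigma^2

print(np.mean(stats), np.var(stats))  # ~27 and ~54, i.e., df and 2*df with df = n-k-1
```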
The assumption of normality in simple linear regression is required for the derivation of confidence intervals, prediction intervals, and hypothesis testing
True
Outliers will always have a significant influence on the estimated slope in simple linear regression
False (only outliers with high leverage have a large influence on the estimated slope)
In simple linear regression we can assess whether the errors in the model are correlated using the plot of residuals vs. fitted values
True
In the ANOVA test for equal means, the alternative hypothesis is that all means are not equal
False (the alternative is that not all means are equal, i.e., at least two means differ; it does not require all means to be different)
If a predicting variable is a qualitative variable with three categories in a linear regression model with intercept, we should include all three dummy variables
False (with an intercept, only two dummy variables should be included)
The estimation of the mean response has a wider confidence interval than for the prediction of a new response
False (the prediction interval for a new response is the wider one)
The assumption of normality is required in ANOVA in order to assess the null hypothesis of equal means
True
The Box-Cox transformation is applied to the predicting variables if the normality assumption does not hold
False (it is applied to the response variable)
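A minimal sketch (hypothetical right-skewed response, scipy assumed): boxcox is applied to the response y, not to the predictors, and returns the transformed values along with the estimated lambda.

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(8)
y = np.exp(rng.normal(size=200))      # positive, right-skewed response
y_transformed, lam = boxcox(y)        # transform the response, estimate lambda
print(lam)                            # lambda near 0 corresponds to a log transform
```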
If the confidence interval of a regression coefficient does not include zero, we interpret the regression coefficient to be statistically significant
True
A nonlinear relationship between the response variable and a predicting variable cannot be modeled using regression
False (nonlinear relationships can be modeled within regression, e.g., with transformations or polynomial terms)
R-squared is the best measure to check if a linear regression model is a good fit to the data
False (goodness of fit is assessed through residual analysis against the model assumptions, not by R-squared alone)
The F-test in ANOVA compares the between variability versus the within variability
True
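A minimal sketch with three made-up groups: computing the between-group and within-group mean squares by hand and taking their ratio reproduces the F statistic from scipy's one-way ANOVA.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(9)
groups = [rng.normal(mu, 1.0, 20) for mu in (0.0, 0.5, 1.0)]

N, k = sum(len(g) for g in groups), len(groups)
grand = np.mean(np.concatenate(groups))
msb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (k - 1)   # between
msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)        # within

print(msb / msw, f_oneway(*groups).statistic)    # identical F values
```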
In ANOVA testing, the variance of the response variable is different for each sub-population
False (ANOVA assumes the variance is the same across all sub-populations)
If one or more of the regression assumptions does not hold, then the model does not fit the data well, thus it is not useful in modeling the response
False (the model may still be useful; assumption violations can often be remedied, e.g., with transformations)