Linear Regression Flashcards
Backward elimination
An iterative variable selection procedure that starts with a model with all independent variables and considers removing an independent variable at each step.
Best subsets
A variable selection procedure that constructs and compares all possible models with up to a specified number of independent variables.
Coefficient of determination
A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
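As a rough illustration of the definition above, the coefficient of determination can be computed as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of y. The function name and the sample numbers below are made up for this sketch, and the "proportion explained" reading assumes the fitted values come from a least squares fit that includes an intercept.

```python
def r_squared(y, y_hat):
    """Coefficient of determination computed as 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    # SSE: variation in y left unexplained by the fitted values
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # SST: total variation of y around its mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical observed and fitted values, for illustration only
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
```

A value close to 1 means the fitted values track the observed values closely; a value near 0 means the model does little better than predicting the mean of y.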
Confidence interval
An estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence.
Confidence level
An indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.
Cross-validation
Assessment of the performance of a model on data other than the data that were used to generate the model.
Dependent variable
The variable that is being predicted or explained. It is denoted by y and is often referred to as the response.
Dummy variable
A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.
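A minimal sketch of how a categorical variable might be turned into dummy variables. The helper name, the `is_` prefix, and the colour data are hypothetical. The sketch follows the common convention of using one fewer dummy than the number of categories, with the omitted category serving as the baseline.

```python
def dummy_encode(values, baseline):
    """Return one dict of 0/1 dummy variables per observation.

    The baseline category gets no dummy of its own: an observation in the
    baseline category has every dummy equal to 0.
    """
    levels = sorted(set(values) - {baseline})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


# Hypothetical categorical data, for illustration only
rows = dummy_encode(["red", "blue", "green", "red"], baseline="red")
```

Here each observation ends up with two dummies, `is_blue` and `is_green`; a red observation has both set to 0.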
Estimated regression equation
The estimate of the regression equation developed from sample data by using the least squares method.
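For the special case of a single independent variable, the least squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below assumes that one-variable case; the function name and the data are made up for illustration.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (b0, b1) for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical sample that lies exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_regression(x, y)
```

With several independent variables the same idea applies, but the estimates are usually obtained with a linear algebra routine rather than by hand.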
Experimental region
The range of values for the independent variables x1, x2, ..., xq for the data that are used to estimate the regression model.
Extrapolation
Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, ..., xq that are outside the experimental region.
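One coarse way to flag possible extrapolation is to check whether each value of a new observation falls outside the range seen for that variable in the estimation data. This is only a sketch with a hypothetical helper name and made-up data, and it is a simplification: a point can sit inside every per-variable range and still lie outside the joint region covered by the data.

```python
def outside_observed_ranges(X, point):
    """For each variable, report whether the new value is outside the observed range."""
    flags = []
    for j, value in enumerate(point):
        column = [row[j] for row in X]
        flags.append(value < min(column) or value > max(column))
    return flags


# Hypothetical estimation data: one row per observation, columns x1 and x2
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
flags = outside_observed_ranges(X, [2.5, 25.0])
```

In this example x1 = 2.5 is within the observed range of 1 to 3, while x2 = 25 exceeds the observed maximum of 20, so a prediction at this point would be an extrapolation.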
Forward selection
An iterative variable selection procedure that starts with a model with no variables and considers adding an independent variable at each step.
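The greedy loop behind forward selection can be sketched as follows. The `score` argument is a hypothetical callable that rates a list of variables (higher is better); in practice it might be based on a significance test or a fit statistic, which this sketch does not implement. The toy scoring function and its numbers are invented purely to make the example runnable.

```python
def forward_selection(candidates, score, min_gain=1e-9):
    """Greedily add the variable that most improves score(); stop when none helps."""
    selected = []
    best = score(selected)  # score of the model with no variables
    remaining = list(candidates)
    while remaining:
        # Score every model formed by adding one remaining variable
        trial_scores = {v: score(selected + [v]) for v in remaining}
        best_var = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_var] - best <= min_gain:
            break  # no candidate improves the model enough
        selected.append(best_var)
        remaining.remove(best_var)
        best = trial_scores[best_var]
    return selected


# Toy score (assumption): made-up usefulness per variable minus a size penalty
usefulness = {"x1": 0.5, "x2": 0.3, "x3": 0.01}


def toy_score(variables):
    return sum(usefulness[v] for v in variables) - 0.05 * len(variables)


chosen = forward_selection(["x1", "x2", "x3"], toy_score)
```

With this toy score, x1 and x2 are added in turn and x3 is rejected because its small contribution does not offset the penalty. Backward elimination mirrors the loop: start with every variable and consider removing one at each step.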
Holdout method
Method of cross-validation in which sample data are randomly divided into mutually exclusive and collectively exhaustive sets, then one set is used to build the candidate models and the other set is used to compare model performances and ultimately select a model.
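A minimal sketch of the random split used by the holdout method. The function name, the 30% validation fraction, and the fixed seed are assumptions chosen for illustration, not recommendations from the source.

```python
import random


def holdout_split(rows, validation_fraction=0.3, seed=0):
    """Randomly split rows into two non-overlapping sets that together cover all rows."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    validation = shuffled[:n_val]  # used to compare candidate models
    training = shuffled[n_val:]    # used to build candidate models
    return training, validation


# Hypothetical data set of ten observations, identified by index
data = list(range(10))
train, validation = holdout_split(data)
```

Every observation lands in exactly one of the two sets, which is what "mutually exclusive and collectively exhaustive" means here.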
Hypothesis testing
The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture.
Independent variable
The variable(s) used for predicting or explaining values of the dependent variable. They are denoted by x and are often referred to as predictor variables.