Final Exam Flashcards
What is backward elimination?
An iterative variable selection procedure that starts with a model with all independent variables and considers removing an independent variable at each step.
What is the best subsets procedure?
A variable selection procedure that constructs and compares all possible models with up to a specified number of independent variables.
What is the coefficient of determination?
A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
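As a rough illustration of the arithmetic, the coefficient of determination can be computed as 1 - SSE/SST. The sketch below assumes the fitted values are already available; the function name `r_squared` and the numbers are hypothetical and only show the calculation.

```python
def r_squared(y, y_hat):
    """Return 1 - SSE/SST for observed values y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    # SSE: variability left unexplained by the fitted values.
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # SST: total variability of y around its mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical observed and fitted values, for illustration only.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(y, y_hat)
print(round(r2, 2))
```

A value near 1 means the fitted values account for most of the variability in y; a value near 0 means they account for very little.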
What is a confidence interval?
An estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence.
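A minimal sketch of one such interval, for a population mean, assuming a normal approximation is acceptable. The function name `mean_confidence_interval` and the sample values are hypothetical; with a sample this small, a t critical value would usually be preferred over the z value used here.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Critical value leaving (1 - confidence) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

# Hypothetical measurements, for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
lower, upper = mean_confidence_interval(sample)
print(lower, upper)
```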
What is cross validation?
Assessment of the performance of a model on data other than the data that were used to generate the model.
What is the dependent variable?
The variable that is being predicted or explained. It is denoted by y and is often referred to as the response.
What is a dummy variable?
A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.
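One common way to build dummy variables is to pick a baseline category and create one 0/1 variable for each remaining category. The sketch below assumes that convention; `make_dummies` and the region labels are hypothetical names used only for illustration.

```python
def make_dummies(values, baseline):
    """Encode a categorical column as 0/1 variables, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

# Hypothetical categorical data with three levels; "east" is the baseline.
regions = ["east", "west", "north", "east"]
rows = make_dummies(regions, baseline="east")
for region, row in zip(regions, rows):
    print(region, row)
```

A categorical variable with k levels needs only k - 1 dummy variables, because the baseline level is represented by all dummies being zero.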
What is the estimated regression equation?
The estimate of the regression equation developed from sample data by using the least squares method.
What is the experimental region?
The range of values for the independent variables x1, x2, …, xq for the data that are used to estimate the regression model.
What is extrapolation?
Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, …, xq that are outside the experimental region.
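A simple check for extrapolation can be sketched by comparing each independent variable against the range seen in the estimation data. This per-variable check is a simplification: a point can sit inside every individual range and still be far from any observed combination. The name `is_extrapolation` and the data are hypothetical.

```python
def is_extrapolation(point, training_rows):
    """Return True if any coordinate of point falls outside the observed range."""
    for j, value in enumerate(point):
        column = [row[j] for row in training_rows]
        if value < min(column) or value > max(column):
            return True
    return False

# Hypothetical estimation data with two independent variables.
training_rows = [(1.0, 10.0), (2.0, 30.0), (4.0, 20.0)]
inside = is_extrapolation((3.0, 15.0), training_rows)
outside = is_extrapolation((5.0, 15.0), training_rows)
print(inside, outside)
```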
What is forward selection?
An iterative variable selection procedure that starts with a model with no variables and considers adding an independent variable at each step.
What is the holdout method?
Method of cross-validation in which sample data are randomly divided into mutually exclusive and collectively exhaustive sets, then one set is used to build the candidate models and the other set is used to compare model performances and ultimately select a model.
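The random division can be sketched as follows, assuming a single training/validation split. The function name `holdout_split`, the 30% validation fraction, and the fixed seed are illustrative assumptions, not prescribed values.

```python
import random

def holdout_split(rows, validation_fraction=0.3, seed=0):
    """Randomly split rows into disjoint training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Ten hypothetical observation indices.
train, validation = holdout_split(range(10))
print(len(train), len(validation))
```

Candidate models would be built on `train` only and then compared on `validation`.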
What is hypothesis testing?
The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture.
What are independent variables?
The variables used for predicting or explaining values of the dependent variable. They are denoted by x and are often referred to as predictor variables.
What is interaction?
Regression modeling technique used when the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
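One common way to model interaction is to add the product x1*x2 as an extra term. In that form, the effect of x1 on the prediction depends on x2. The sketch below assumes this product form; the function names and coefficient values are hypothetical.

```python
def predict_with_interaction(x1, x2, b0, b1, b2, b3):
    """Prediction from a model with main effects and an x1*x2 interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_of_x1(x2, b1, b3):
    """Change in the prediction per unit increase in x1, at a given x2."""
    return b1 + b3 * x2

# Hypothetical coefficients: the slope of x1 grows as x2 increases.
slope_low = slope_of_x1(0.0, b1=2.0, b3=0.5)
slope_high = slope_of_x1(4.0, b1=2.0, b3=0.5)
print(slope_low, slope_high)
```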
What is interval estimation?
The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.
What is a knot?
The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model; also called the breakpoint or the joint.
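A piecewise linear model with one knot is often written with an extra term that is zero below the knot and grows linearly above it. The sketch below assumes that form; `piecewise_predict` and all coefficient values are hypothetical.

```python
def piecewise_predict(x, b0, b1, b2, knot):
    """Slope is b1 below the knot and b1 + b2 above it."""
    return b0 + b1 * x + b2 * max(0.0, x - knot)

# Hypothetical model: slope 2.0 up to x = 5, then slope 0.5 afterwards.
below = piecewise_predict(4.0, b0=10.0, b1=2.0, b2=-1.5, knot=5.0)
above = piecewise_predict(7.0, b0=10.0, b1=2.0, b2=-1.5, knot=5.0)
print(below, above)
```

Because the extra term is zero at the knot itself, the two line segments meet there and the fitted relationship stays continuous.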
What is the least squares method?
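A procedure for using sample data to find the estimated regression equation by minimizing the sum of squared differences between the observed and predicted values of the dependent variable.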
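For a single independent variable, the least squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below shows that case only; the function name and data are hypothetical.

```python
def fit_simple_least_squares(x, y):
    """Return (intercept, slope) of the least squares line for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_least_squares(x, y)
print(b0, b1)
```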
What is multicollinearity?
The degree of correlation among independent variables in a regression model.
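A first look at multicollinearity is often the pairwise correlation between two independent variables. The sketch below computes a Pearson correlation by hand; the variable names and data are hypothetical, and a pairwise check will not reveal dependence that involves three or more variables at once.

```python
import math

def pearson_correlation(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Two hypothetical independent variables that move almost in lockstep.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_correlation(x1, x2)
print(round(r, 3))
```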
What is linear regression?
Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.
What is multiple linear regression?
Regression analysis involving one dependent variable and more than one independent variable.
What is overfitting?
Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population.
What is the p-value?
The probability that a random sample of the same size, collected from the same population using the same procedure, will yield evidence against a hypothesis at least as strong as the evidence in the sample data, given that the hypothesis is actually true.
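As an illustration, a two-sided p-value can be computed from a test statistic when that statistic is assumed to follow a standard normal distribution under the hypothesis being tested. The function name `two_sided_p_value` and the value 1.96 are hypothetical choices for the example.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A hypothetical test statistic near the familiar 5% cutoff.
p = two_sided_p_value(1.96)
print(round(p, 3))
```

Smaller p-values indicate stronger evidence against the hypothesis.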
What is a parameter?
A measurable factor that defines a characteristic of a population, process, or system.