Linear Regression Flashcards

Question 1

Q

Backward Elimination

Answer

A

An iterative variable selection procedure that starts with a model with all independent variables and considers removing an independent variable at each step.

Question 2

Q

Best subsets

Answer

A

A variable selection procedure that constructs and compares all possible models with up to a specified number of independent variables.
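
A minimal Python sketch of the enumeration step behind best subsets, assuming the candidate variables are identified by name (the function name `candidate_subsets` is hypothetical). It only lists the candidate models; fitting and comparing them is left out.

```python
from itertools import combinations

def candidate_subsets(variables, max_size):
    """Yield every subset of `variables` with 1..max_size members.

    Each subset corresponds to one candidate regression model that a
    best-subsets procedure would fit and compare.
    """
    for size in range(1, max_size + 1):
        for subset in combinations(variables, size):
            yield subset

subsets = list(candidate_subsets(["x1", "x2", "x3"], max_size=2))
# 6 candidate models: three with one variable, three with two variables
```

The number of candidates grows quickly with the number of variables, which is why the procedure caps the model size.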

Question 3

Q

Coefficient of determination

Answer

A

A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
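
As a sketch, the coefficient of determination can be computed as r² = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of y about its mean. The function name and the small data set below are illustrative assumptions.

```python
def r_squared(y, y_hat):
    """Proportion of the variability in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
    return 1 - sse / sst

y = [1, 2, 3, 4]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y, y_hat), 4))  # 0.98
```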

Question 4

Q

Confidence interval

Answer

A

An estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence.

Question 5

Q

Confidence level

Answer

A

An indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.

Question 6

Q

Cross-validation

Answer

A

Assessment of the performance of a model on data other than the data that were used to generate the model.

Question 7

Q

Dependent variable

Answer

A

The variable that is being predicted or explained. It is denoted by y and is often referred to as the response.

Question 8

Q

Dummy variable

Answer

A

A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.
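
A minimal sketch of dummy coding in Python, assuming one level is chosen as the baseline (the function `dummy_encode` and the region labels are hypothetical). A categorical variable with k levels becomes k - 1 zero/one columns; giving the baseline its own column as well would make the dummies perfectly collinear with the intercept.

```python
def dummy_encode(values, levels):
    """Encode a categorical variable as 0/1 dummy variables.

    levels[0] is the baseline and gets no column, so k levels yield
    k - 1 dummies, one per non-baseline level.
    """
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]

rows = dummy_encode(["North", "South", "West", "North"],
                    levels=["North", "South", "West"])
# rows == [[0, 0], [1, 0], [0, 1], [0, 0]]
```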

Question 9

Q

Estimated regression equation

Answer

A

The estimate of the regression equation developed from sample data by using the least squares method.

Question 10

Q

Experimental region

Answer

A

The range of values for the independent variables x_1, x_2, ..., x_q for the data that are used to estimate the regression model.

Question 11

Q

Extrapolation

Answer

A

Prediction of the mean value of the dependent variable y for values of the independent variables x_1, x_2, ..., x_q that lie outside the experimental region.
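
A rough Python check for extrapolation, assuming the experimental region is approximated by the observed minimum and maximum of each independent variable separately (the function name is hypothetical). This per-variable check is a simplification: a point can sit inside every individual range and still be far from the observed combinations of x values.

```python
def is_extrapolation(point, data):
    """True if any coordinate of `point` falls outside the observed
    min/max of that independent variable in `data` (a list of rows)."""
    for j, value in enumerate(point):
        column = [row[j] for row in data]
        if value < min(column) or value > max(column):
            return True
    return False

data = [(1, 10), (2, 20), (3, 30)]  # rows of (x_1, x_2)
print(is_extrapolation((2, 25), data))  # False
print(is_extrapolation((4, 20), data))  # True: x_1 = 4 exceeds the observed max of 3
```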

Question 12

Q

Forward selection

Answer

A

An iterative variable selection procedure that starts with a model with no variables and considers adding an independent variable at each step.

Question 13

Q

Holdout method

Answer

A

Method of cross-validation in which the sample data are randomly divided into two mutually exclusive and collectively exhaustive sets: one set (the training set) is used to build the candidate models, and the other (the validation set) is used to compare model performances and ultimately select a model.
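
A minimal sketch of the random split in Python, assuming a 70/30 split and a fixed seed for reproducibility (both are illustrative choices, and `holdout_split` is a hypothetical name).

```python
import random

def holdout_split(data, train_fraction=0.7, seed=0):
    """Randomly divide `data` into disjoint training and validation sets
    that together contain every observation exactly once."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, validation = holdout_split(range(10))
# 7 observations to build the candidate models, 3 to compare them
```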

Question 14

Q

Hypothesis testing

Answer

A

The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture.

Question 15

Q

Independent variable

Answer

A

The variable(s) used for predicting or explaining values of the dependent variable. It is denoted by x and is often referred to as the predictor variable.

Question 16

Q

Interaction

Answer

A

The situation in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
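
Interaction is commonly modeled by adding a product term, as in y = b0 + b1*x1 + b2*x2 + b3*x1*x2. The Python sketch below uses hypothetical coefficients to show that the effect of a one-unit increase in x1 is b1 + b3*x2, so it changes with the value of x2.

```python
def predict(b, x1, x2):
    """Prediction from a model with an x1*x2 interaction term."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

b = (10.0, 2.0, 1.0, 0.5)  # hypothetical coefficients b0, b1, b2, b3

# Effect of raising x1 from 0 to 1, at two different values of x2
effect_low = predict(b, 1, 0) - predict(b, 0, 0)   # 2.0 when x2 = 0
effect_high = predict(b, 1, 4) - predict(b, 0, 4)  # 4.0 when x2 = 4
```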

Question 17

Q

Interval estimation

Answer

A

The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.

Question 18

Q

K-fold cross-validation

Answer

A

Method of cross-validation in which the sample data are randomly divided into k equal-sized, mutually exclusive and collectively exhaustive subsets (folds). In each of k iterations, a different subset is held out to evaluate the candidate model, and the remaining k - 1 subsets are used to build it.
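
A minimal Python sketch that generates the k train/validation index splits, assuming observations are referred to by their positions 0..n-1 and that n need not be an exact multiple of k (folds then differ in size by at most one). The function name is hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, nearly equal-sized folds
    for i in range(k):
        validation = folds[i]                                         # held-out fold
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]   # other k - 1 folds
        yield train, validation

splits = list(k_fold_indices(10, 5))
# 5 iterations, each building on 8 observations and evaluating on 2
```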

Question 19

Q

Knot

Answer

A

The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model; also called the breakpoint or the joint.

Question 20

Q

Least squares method

Answer

A

A procedure for using sample data to find the estimated regression equation by choosing the coefficients that minimize the sum of the squared residuals.
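
For simple linear regression the least squares coefficients have a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. A minimal Python sketch (the function name and data are illustrative):

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
# The data lie exactly on y = 1 + 2x, so b0 == 1.0 and b1 == 2.0
```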

Question 21

Q

Leave-one-out cross-validation

Answer

A

Method of cross-validation in which candidate models are repeatedly fit using n - 1 observations and evaluated on the single remaining observation, so that each of the n observations is held out exactly once.
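
A minimal Python sketch of the leave-one-out splits, assuming the data are held in a list (the function name is hypothetical). Fitting and scoring the model on each split is left out.

```python
def leave_one_out(data):
    """Yield (training_data, held_out_observation) for each of the n observations."""
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]  # the other n - 1 observations
        yield train, data[i]

splits = list(leave_one_out([10, 20, 30]))
# [([20, 30], 10), ([10, 30], 20), ([10, 20], 30)]
```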

Question 22

Q

Linear regression

Answer

A

Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.

Question 23

Q

Multicollinearity

Answer

A

The degree of correlation among independent variables in a regression model.

Question 24

Q

Multiple linear regression

Answer

A

Regression analysis involving one dependent variable and more than one independent variable, where the relationship is depicted by a hyperplane.

Question 25

Q

Overfitting

Answer

A

Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population.

Question 26

Q

p-value

Answer

A

The probability that a random sample of the same size, collected from the same population using the same procedure, will yield evidence against a hypothesis at least as strong as the evidence in the sample data, given that the hypothesis is actually true.

Question 27

Q

Parameter

Answer

A

A measurable factor that defines a characteristic of a population, process, or system.

Question 28

Q

Piecewise linear regression model

Answer

A

Regression model in which one linear relationship between the independent and dependent variables is fit for values of the independent variable below a prespecified value of the independent variable (the knot), a different linear relationship is fit for values above the knot, and the two fitted lines have the same estimated value of the dependent variable (i.e., are joined) at the knot.
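
One common way to encode a single-knot piecewise linear model is with a hinge term: y = b0 + b1*x + b2*max(0, x - knot). The hinge is zero at and below the knot, so the two pieces automatically meet there; the slope is b1 below the knot and b1 + b2 above it. The coefficients below are hypothetical.

```python
def piecewise_predict(x, b0, b1, b2, knot):
    """Prediction from a one-knot piecewise linear model (hinge-term form)."""
    return b0 + b1 * x + b2 * max(0.0, x - knot)

# Hypothetical fit: slope 2 below the knot at x = 10, slope 2 + (-1.5) = 0.5 above it
at_knot = piecewise_predict(10, 1, 2, -1.5, knot=10)  # 21.0
above = piecewise_predict(12, 1, 2, -1.5, knot=10)    # 22.0
```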

Question 29

Q

Point estimator

Answer

A

A single value used as an estimate of the corresponding population parameter.

Question 30

Q

Quadratic regression model

Answer

A

Regression model in which a nonlinear relationship between the independent and dependent variables is fit by including the independent variable and the square of the independent variable in the model.
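
A short Python sketch of a fitted quadratic model y = b0 + b1*x + b2*x², assuming the coefficients have already been estimated (the values below are hypothetical). The fitted curve turns at x = -b1 / (2*b2): a maximum when b2 < 0, a minimum when b2 > 0.

```python
def quadratic_predict(x, b0, b1, b2):
    """Prediction from y = b0 + b1*x + b2*x**2."""
    return b0 + b1 * x + b2 * x ** 2

def turning_point(b1, b2):
    """x at which the fitted parabola peaks (b2 < 0) or bottoms out (b2 > 0)."""
    return -b1 / (2 * b2)

x_peak = turning_point(4, -0.5)               # 4.0
y_peak = quadratic_predict(x_peak, 5, 4, -0.5)  # 13.0
```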

Question 31

Q

Random variable

Answer

A

A quantity whose value is determined by the outcome of a random experiment (such as the drawing of a random sample) and is therefore uncertain.

Question 32

Q

Regression analysis

Answer

A

A statistical procedure used to develop an equation showing how the variables are related.

Question 33

Q

Regression model

Answer

A

The equation that describes how the dependent variable y is related to the independent variables x_1, x_2, ..., x_q and an error term.

Question 34

Q

Residual

Answer

A

The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation.
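
In code, each residual is simply y minus ŷ; the function name and values below are illustrative.

```python
def residuals(y, y_hat):
    """Observed minus predicted value for each observation."""
    return [yi - fi for yi, fi in zip(y, y_hat)]

print(residuals([3, 5, 8], [3.5, 5.0, 7.5]))  # [-0.5, 0.0, 0.5]
```

A positive residual means the estimated regression equation underpredicted that observation; a negative one means it overpredicted.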

Question 35

Q

Simple linear regression

Answer

A

Regression analysis involving one dependent variable and one independent variable.

Question 36

Q

Stepwise selection

Answer

A

An iterative variable selection procedure that considers adding an independent variable and removing an independent variable at each step.

Question 37

Q

t-test

Answer

A

Statistical test based on the Student’s t probability distribution that can be used to test the hypothesis that a regression parameter β_j is zero; if this hypothesis is rejected, we conclude that there is a regression relationship between the jth independent variable and the dependent variable.
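
For simple linear regression the test statistic for the slope is t = b1 / s_b1, where s_b1 = s / sqrt(Σ(x - x̄)²) and s = sqrt(SSE / (n - 2)). The Python sketch below computes only the statistic; turning it into a p-value requires the t distribution with n - 2 degrees of freedom, which is not shown. The function name and data are illustrative.

```python
import math

def slope_t_statistic(x, y):
    """t statistic for H0: beta_1 = 0 in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
    return b1 / s_b1

t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 2.12
```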

Question 38

Q

Training set

Answer

A

The data set used to build the candidate models.

Question 39

Q

Validation set

Answer

A

The data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable.