Linear Models Flashcards
Explain the difference between descriptive, predictive, and prescriptive modeling
- Descriptive: Focuses on what happened in the past and aims to describe/explain observed trends by identifying relationships between variables
- Predictive: Focuses on what will happen in the future and aims to make accurate predictions
- Prescriptive: Uses a combination of optimization and simulation to investigate and quantify impacts of prescribed actions/decisions to answer “what if” questions
Explain the difference between supervised and unsupervised learning and what their goals are
- Supervised learning: Problems where there is a target variable supervising predictive analysis. Goals are to (1) understand the relationship between the target variable and the predictors and (2) make accurate predictions for the target variable based on the predictors
- Unsupervised learning: Problems where there is no target variable supervising predictive analysis. Goal is to identify relationships, structures, and patterns between different variables in the data
Explain how stratified sampling contributes to a more representative sample than random sampling
Stratified sampling ensures that every stratum is properly represented in the collected data. This is done by dividing the underlying population into non-overlapping groups in a non-random fashion, then randomly sampling a set number of observations from each stratum.
* Oversampling and undersampling –> designed for unbalanced data
* Systematic sampling –> draw observations according to a set pattern to arrive at pre-determined sampled observations (no random mechanism)
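As a minimal sketch (in Python, with made-up record and field names), stratified sampling can be implemented by grouping on the stratum variable and then drawing a fixed number of observations at random from each group:

```python
import random

def stratified_sample(records, stratum_key, n_per_stratum, seed=0):
    """Group records by stratum, then draw a fixed number at random from each.

    `records`, `stratum_key`, and `n_per_stratum` are illustrative names,
    not from any particular library.
    """
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(rec[stratum_key], []).append(rec)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(n_per_stratum, len(group))))
    return sample

# Toy population: 90 urban policyholders, only 10 rural ones.
population = [{"area": "urban", "id": i} for i in range(90)] + \
             [{"area": "rural", "id": i} for i in range(90, 100)]

sample = stratified_sample(population, "area", 5)
# Each stratum contributes exactly 5 observations, so rural records are
# properly represented even though they are only 10% of the population.
```

A purely random sample of size 10 could easily contain zero or one rural record; the stratified draw guarantees each stratum its quota.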
Explain the three data quality issues one should examine in practice
- Reasonableness: Do the key statistics for the variables make sense?
- Consistency: Are the records in the data inputted consistently?
- Sufficient documentation: Can other users easily gain an understanding of different aspects of the data?
Explain the problem with target leakage in predictive analytics
Target leakage is when some predictors in a model leak information about the target variable that will not be available when the model is applied in practice. This causes a problem because these variables cannot serve as predictors in practice, and mistakenly including them would lead to artificially good model performance.
Explain how to use a time variable to make the training/test set split and the advantage of doing so
A time variable can be used to make the training/test split on the basis of time: allocate the older observations to the training set and the more recent observations to the test set. This is useful for evaluating how well a model extrapolates time trends observed in the past to future, unseen years.
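The split described above can be sketched in a few lines of Python (the record fields and the 70/30 proportion are illustrative assumptions):

```python
# Sort by the time variable, then allocate the oldest 70% to training
# and the most recent 30% to testing. Field names are hypothetical.
records = [{"year": y, "loss": 100 + 3 * (y - 2010)} for y in range(2010, 2020)]

records.sort(key=lambda r: r["year"])
cutoff = int(0.7 * len(records))
train, test = records[:cutoff], records[cutoff:]
# Every training year precedes every test year, so evaluating on `test`
# measures how well the fitted model extrapolates forward in time.
```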
Explain what hyperparameters are and why they are important for a predictive model
Hyperparameters = tuning parameters, which are parameters that control some aspect of the fitting process itself. Their values must be set before the fitting begins (they are not estimated from the data) and are typically selected by cross-validation. They are important because they often control the complexity of the fitted model, and therefore its bias-variance trade-off and predictive performance.
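As a concrete sketch, consider ridge regression with a single predictor, where the closed-form slope is Sxy / (Sxx + lambda). The penalty lambda is a hyperparameter: it is fixed before fitting, and the slope is then estimated from the data under that choice (the data values below are made up):

```python
# Ridge regression on one predictor: lambda is a hyperparameter fixed
# *before* fitting; the slope is then estimated given that lambda.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def ridge_slope(xs, ys, lam):
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / (sxx + lam)   # larger lambda shrinks the slope toward 0

ols = ridge_slope(xs, ys, 0.0)      # lambda = 0 recovers ordinary least squares
shrunk = ridge_slope(xs, ys, 10.0)  # heavier penalty -> smaller slope
```

In practice one would refit this for a grid of lambda values and keep the one with the best cross-validated error.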
Explain the difference between bias and variance in a predictive analytics context
Bias = the difference between the expected value of prediction and the true value of the signal function
* Part of the test error caused by the model not being flexible enough (underfitting)
Variance = the amount of variability of prediction
* Part of the test error caused by the model being too complex (overfitting)
Explain the difference between variables and features in a predictive analytic context
Variables = predictors as they appear in the original dataset, before any transformations
Features = derivations from the original variables to provide a more useful view of the information in the dataset
Explain the difference between dimensionality and granularity
There are two main differences:
1. Applicability: Dimensionality is a concept specific to categorical variables. Granularity applies to both numeric and categorical variables.
2. Comparability: We can always order two categorical variables by dimension (simply count their levels), but it is not always possible to order them by granularity; that requires one variable's levels to be a refinement of the other's.
Explain the problem with RSS and R squared as model selection measures
They are merely goodness-of-fit measures of a linear model on the training data, with no explicit regard for model complexity or predictive performance on unseen data.
Explain the rationale behind and the difference between the AIC and BIC
Both the AIC and BIC can be used as model selection criteria; each trades goodness of fit against model complexity via a penalty term. However, the per-parameter penalty for the BIC (ln n, for n observations) is higher than that for the AIC (2) whenever n ≥ 8, so the BIC tends to result in a simpler final model than the AIC.
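A toy comparison (with made-up log-likelihoods and parameter counts) shows the two criteria disagreeing exactly as described:

```python
import math

# Two hypothetical fits: the larger model (k = 6 parameters) achieves a
# slightly better log-likelihood than the smaller one (k = 3).
n = 200
models = {"small": (3, -452.0), "large": (6, -448.0)}

aic = {name: 2 * k - 2 * ll for name, (k, ll) in models.items()}
bic = {name: k * math.log(n) - 2 * ll for name, (k, ll) in models.items()}

# AIC's penalty per parameter is 2; BIC's is ln(200) ~= 5.3, so BIC
# demands a bigger likelihood gain to justify the extra parameters.
aic_choice = min(aic, key=aic.get)  # "large" -- AIC accepts the complexity
bic_choice = min(bic, key=bic.get)  # "small" -- BIC prefers the simpler model
```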
Explain the advantages and disadvantages of polynomial regression
Pros: Polynomial regression can accommodate substantially more complex relationships between the target variable and the predictors than linear ones: the more polynomial terms included, the more flexible the fit.
Cons: Interpretability and the choice of the degree m. Regression coefficients in polynomial regression are more difficult to interpret. Additionally, there is no simple rule for choosing m, although it can be tuned by cross-validation (CV), and exploratory data analysis (EDA) can also help.
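A brief sketch with numpy: simulate data from a quadratic signal and fit a polynomial with `np.polyfit`, where the degree m is exactly the knob that CV or EDA would tune (the signal coefficients are made up):

```python
import numpy as np

# Data from an assumed quadratic signal y = 1 + 2x - 0.5x^2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 0.1, 200)

# Fitting with m = 2 is still *linear* regression -- linear in the
# coefficients of the columns 1, x, x^2 of the design matrix.
coefs = np.polyfit(x, y, 2)   # highest power first: [b2, b1, b0]
# With this little noise, the fit recovers roughly [-0.5, 2, 1].
```

Choosing m too small underfits the curvature; choosing it too large chases the noise, which is why m is treated as a hyperparameter.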
Explain the meaning of interaction
An interaction arises if the association between one predictor and the target variable depends on the value/level of another predictor
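In a linear model an interaction is captured by a product term: with y = b0 + b1·x1 + b2·x2 + b3·x1·x2, the slope in x1 is b1 + b3·x2, which depends on x2. A tiny sketch (all coefficients are made up):

```python
# Hypothetical fitted model with an interaction term b3 * x1 * x2.
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=3.0):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# The effect of a one-unit increase in x1 changes with the level of x2:
slope_at_x2_0 = predict(1, 0) - predict(0, 0)   # b1      = 2.0
slope_at_x2_1 = predict(1, 1) - predict(0, 1)   # b1 + b3 = 5.0
```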
Explain how best subset selection works and its limitations
Best subset selection is performed by fitting all p models that contain exactly one predictor (where p is the total number of predictors being considered) and picking the one with the smallest deviance, then fitting all p choose 2 models that contain exactly two predictors and picking the one with the smallest deviance, and so forth. A single best model is then selected from these picked models using a metric such as the AIC. Because every one of the 2^p possible subsets is examined, the procedure is guaranteed to find the global minimum of the selection criterion, but the search space grows exponentially and quickly becomes computationally infeasible as p increases.
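The exhaustive enumeration at the heart of the procedure can be sketched with `itertools.combinations` (the predictor names are hypothetical, and the model fitting/deviance scoring is left abstract):

```python
from itertools import combinations

# Enumerate every non-empty subset of p candidate predictors; a real
# implementation would fit a model to each subset and record its deviance.
predictors = ["age", "bmi", "smoker"]          # hypothetical names, p = 3
subsets = [s for r in range(1, len(predictors) + 1)
           for s in combinations(predictors, r)]

n_models = len(subsets)   # 2**3 - 1 = 7 non-empty subsets
# The count doubles with each added predictor, which is why best subset
# selection becomes infeasible for large p.
```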