6 Predictive Analytics: Regression and Artificial Neural Networks Flashcards

1
Q

Question 1
Level: medium
Which of the following statements is FALSE?
a) One versus all coding results in fewer models than one versus one coding.
b) Applying stepwise variable selection will always result in a logistic regression model with fewer variables than forward variable selection.
c) A logistic regression model can be regarded as a linear regression model, with the output being transformed by means of a sigmoid function to obtain a probability estimate between zero and one.
d) A logistic regression model can be regarded as a linear regression model that is fitted to estimate a transformed outcome variable, i.e., instead of estimating the binary target variable (positive or negative outcome), the continuous, normally distributed log-odds (the logarithm of the ratio of the probability to be positive and the probability to be negative) are estimated.

A

b) Applying stepwise variable selection will always result in a logistic regression model with fewer variables than forward variable selection.

Not necessarily. Stepwise selection starts as forward selection and only optionally removes previously added variables, so it can end with just as many variables as forward selection.
Forward selection = starts from the empty model and always adds variables based on low p-values
Backward elimination = starts from the full model and always removes variables based on high p-values
Stepwise = starts as forward selection, but checks whether previously added variables can be removed later

2
Q

Question 2
Level: difficult
Which of the following statements is FALSE? Give answer D if all statements are true.
a) Various convergence criteria can be adopted for deciding when to stop adjusting the weights when training a neural network, e.g., when the error function shows no progress, when the weight parameter estimates stop changing substantially, when the gradient is close to one. Alternatively, a fixed number of epochs may be adopted.
b) Early stopping can be adopted when training a neural network to avoid overfitting. By setting aside a validation set, we can observe when the network starts overfitting, i.e., when the performance on the validation set starts decreasing. At that moment, we stop adjusting the weights.
c) When adopting early stopping as well as when adopting hyperparameter tuning for determining the optimal number of hidden neurons, we can adopt an evaluation metric that differs from the objective function, i.e., the error-function that is used in the backpropagation training for adjusting the weights.
d) All of the above statements are true.

A

a) Various convergence criteria can be adopted for deciding when to stop adjusting the weights when training a neural network, e.g., when the error function shows no progress, when the weight parameter estimates stop changing substantially, when the gradient is close to one. Alternatively, a fixed number of epochs may be adopted.
-> the gradient should be close to 0 (the error function changes slowly, a flat region near a minimum), not 1 (rapid change, steep slope)

3
Q

Question 1
What is the objective function in logistic regression?

A

Objective function = maximum likelihood estimation!
= maximizing the probability of observing the sample at hand

Equivalently, the objective function can be stated as the cross-entropy loss, which measures the difference between the predicted probability of the positive class and the actual label: maximizing the log-likelihood is the same as minimizing the cross-entropy loss.

The logistic loss penalizes the model more heavily the further the predicted probability diverges from the actual class label. The goal during training is to minimize the average logistic loss across all training examples.

-> for regression: MSE, mean squared error
-> for classification: cross-entropy
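
A minimal sketch of this equivalence, with made-up data and hypothetical coefficients (not from the course material):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: 4 observations, 2 features, binary target
X = np.array([[1.0, 2.0], [2.0, 0.5], [0.5, 1.5], [3.0, 1.0]])
y = np.array([1, 0, 1, 0])
beta = np.array([0.4, -0.7])   # hypothetical candidate coefficients
b0 = 0.1                       # intercept

p = sigmoid(X @ beta + b0)     # predicted P(y = 1 | x)

# Average cross-entropy loss = negative mean log-likelihood:
# maximizing the likelihood of the sample minimizes this quantity
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)
```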

4
Q

Question 2
How does logistic regression relate to linear regression, and how is it different?

A

Logistic regression = linear regression with a transformation such that the output is always between 0 and 1 and can thus be interpreted as a probability.

Similarities:
- both assume a linear relationship (for logistic regression: linear in the log-odds)
- both estimate parameters that optimize an objective function

Differences:
- nature of the dependent variable (y) / application: linear = continuous, logistic = binary
- output: linear = a linear combination of the input values, logistic = a probability between 0 and 1
- interpretation: linear = change in y for a one-unit change in X, logistic = change in the log-odds for a one-unit change in X
- loss function (the difference between prediction and actual value should be minimized): linear = MSE, logistic = cross-entropy loss
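
A minimal sketch of the transformation, with hypothetical coefficients beta0 and beta1:

```python
import numpy as np

beta0, beta1 = -1.5, 0.8             # hypothetical coefficients
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

log_odds = beta0 + beta1 * x         # the linear-regression part: unbounded, linear in x
p = 1.0 / (1.0 + np.exp(-log_odds))  # sigmoid squashes it into (0, 1)

print(np.round(log_odds, 2))         # e.g. [-1.5 -0.7  0.1  0.9  1.7]
print(np.round(p, 2))                # valid probabilities between 0 and 1
```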

5
Q

Question 3
How does the decision boundary of a logistic regression look like? Can you explain?

A

The decision boundary separates the data points into classes: it is where the model switches from predicting one class to the other. On one side of the boundary a data point is more likely to be classified as class A; on the other side, as class B.

For logistic regression the decision boundary is linear: it is the set of points where the predicted probability equals the cutoff (typically 0.5), i.e., where the log-odds β0 + β1x1 + ... + βkxk = 0. In two dimensions this is a straight line; in higher dimensions, a hyperplane.

The goal of logistic regression is to split the data points so that a given observation's class can be predicted accurately from the information in the features. Examining decision boundaries is a great way to learn how the training data we select affects performance and the model's ability to generalize.
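
A minimal sketch, assuming a hypothetical fitted model with coefficients b0, b1, b2:

```python
import numpy as np

# Hypothetical fitted model: log-odds = b0 + b1*x1 + b2*x2
b0, b1, b2 = -2.0, 1.0, 3.0

# The boundary is where the predicted probability equals 0.5,
# i.e. where the log-odds are zero: b0 + b1*x1 + b2*x2 = 0.
# In two dimensions that is the straight line x2 = -(b0 + b1*x1) / b2.
x1 = np.linspace(-2, 2, 5)
x2_boundary = -(b0 + b1 * x1) / b2
print(np.round(x2_boundary, 2))

# A point above the line gets P(y=1) > 0.5, below it P(y=1) < 0.5
point = np.array([1.0, 2.0])
p = 1.0 / (1.0 + np.exp(-(b0 + b1 * point[0] + b2 * point[1])))
print(round(p, 3), p > 0.5)
```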

6
Q

Question 4
How to interpret the coefficients of a logistic regression model?

A

A coefficient βi is the change in the log-odds of a positive outcome for a one-unit increase in xi, holding the other variables constant.
- exp(βi) is the odds ratio: the multiplicative change in the odds per unit increase in xi
- βi > 0 raises the estimated probability of the positive class, βi < 0 lowers it
- significance can be assessed with the Wald test (small p-value = important variable)
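
A minimal sketch of the odds-ratio interpretation, using hypothetical coefficient values:

```python
import numpy as np

# Hypothetical fitted coefficients of a logistic regression model
coefs = {"age": 0.05, "income": -0.20}

for name, beta in coefs.items():
    # exp(beta) = odds ratio for a one-unit increase in the variable
    print(f"{name}: odds multiplied by {np.exp(beta):.3f} per unit increase")
```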
7
Q

Question 5
Why and how to select variables in a logistic regression model?

A

Perform a hypothesis test (Wald test) to decide upon VARIABLE IMPORTANCE (small p-value = important variable):
H0: βi = 0
H1: βi ≠ 0

This can be used in different ways (see the sketch below):
- Forward selection = starts from the empty model and always adds variables based on low p-values
- Backward elimination = starts from the full model and always removes variables based on high p-values
- Stepwise = starts as forward selection, but checks whether previously added variables can be removed later
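
A minimal forward-selection sketch based on Wald-test p-values, assuming statsmodels and made-up data (the threshold alpha = 0.05 is an illustrative choice):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y, alpha=0.05):
    """Greedy forward selection: repeatedly add the variable with the
    lowest Wald-test p-value, as long as that p-value is below alpha."""
    selected = []
    candidates = list(X.columns)
    while candidates:
        pvals = {}
        for var in candidates:
            model = sm.Logit(y, sm.add_constant(X[selected + [var]])).fit(disp=0)
            pvals[var] = model.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        candidates.remove(best)
    return selected

# Made-up data: x1 drives the outcome, x2 is pure noise
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = (X["x1"] + 0.5 * rng.normal(size=200) > 0).astype(int)
print(forward_selection(X, y))  # expected to pick x1 only
```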

8
Q

Question 6
Can weights-of-evidence be used in combination with logistic regression? Does that make sense? Why?

A

Yes, in certain situations it makes sense:
using WoE with logistic regression enhances model interpretability, handles non-linearity (continuous values are grouped into intervals and each interval is replaced by its weight-of-evidence), and provides a valuable approach for addressing specific challenges in binary classification problems.
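
A minimal sketch of the WoE transformation, using made-up data and the common definition WoE = ln(%non-events / %events) per bin:

```python
import numpy as np
import pandas as pd

# Made-up data: a continuous variable and a binary target
df = pd.DataFrame({
    "income": [20, 35, 50, 65, 80, 95, 110, 125],
    "default": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Group the continuous variable into intervals (here: 2 quantile bins)
df["bin"] = pd.qcut(df["income"], q=2)

# WoE per bin = ln( share of non-events in bin / share of events in bin )
grp = df.groupby("bin", observed=True)["default"]
events = grp.sum()
non_events = grp.count() - events
woe = np.log((non_events / non_events.sum()) / (events / events.sum()))
print(woe)  # substitute these values for the original variable before fitting
```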

9
Q

Question 7
How do coding schemes for multiclass classification work?

A

Coding schemes map a multiclass classification problem onto a set of binary classification problems:
* One versus One coding: one model per pair of classes (K(K-1)/2 models for K classes)
Better handling of imbalanced datasets.
Each classifier focuses on distinguishing between two specific classes.
* One versus All coding: one model per class (K models)
Fewer classifiers for a large number of classes.
Simplicity and efficiency.
* Minimum Output coding: represent each class with a unique binary code (only ⌈log2 K⌉ outputs needed)
Extra bits allow correction of misestimates (error-correcting output codes).
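
A minimal sketch contrasting the number of binary models per scheme, assuming scikit-learn's OneVsRestClassifier and OneVsOneClassifier:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Made-up 4-class problem
X, y = make_classification(n_samples=300, n_classes=4, n_informative=6,
                           random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 4 binary models (one per class)
print(len(ovo.estimators_))  # 6 binary models (4*3/2 class pairs)
```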

10
Q

Question 8
What is a multilayer perceptron, and how does it generalize upon logistic regression?

A
  • Multilayer Perceptron (MLP): a type of architecture (input, hidden, and output layers)
    Multiple hidden layers are possible, with no feedback connections;
    skip-layer connections (or no hidden layer at all) are also possible
    Every neuron has a bias input
    Combination functions can be linear, with additive terms (bias) and equal slopes
    Activation functions: logistic (sigmoid), hyperbolic tangent (tanh), linear
  • Generalization of logistic regression: an MLP with no hidden layer and a sigmoid output activation is exactly a logistic regression model; hidden layers generalize it by learning non-linear transformations of the inputs
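
A minimal forward-pass sketch with random (untrained) weights, to make the architecture concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 2.0])        # one observation, 3 inputs

# Hidden layer: 4 neurons, each with a weight vector and a bias input
W1 = np.random.default_rng(0).normal(size=(4, 3))
b1 = np.zeros(4)
h = np.tanh(W1 @ x + b1)              # tanh activation in the hidden layer

# Output layer: 1 neuron with sigmoid activation -> probability
w2 = np.random.default_rng(1).normal(size=4)
b2 = 0.0
p = sigmoid(w2 @ h + b2)

# With no hidden layer this collapses to sigmoid(w @ x + b),
# i.e., exactly a logistic regression model.
print(p)
```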
11
Q

Question 9
What is an error function and how is it used to train a neural network?

A

Error function = loss function = measures the difference between the predicted outputs of a neural network and the actual target values.
It is used to determine the weights, i.e., it drives learning.
The error function is the objective function (difference between target and prediction) and defines a surface in WEIGHT SPACE.

-> for simple statistical models, closed-form solutions give the optimum parameter estimates
-> for nonlinear models, the parameters are determined numerically via an iterative optimization algorithm (e.g., gradient descent with backpropagation)
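
A minimal sketch of iterative optimization on toy data: a single linear neuron trained by gradient descent on the MSE error surface:

```python
import numpy as np

# Toy data for a single linear neuron y_hat = w * x + b
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # true relation: y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for epoch in range(500):
    y_hat = w * x + b
    # Error (objective) function: mean squared error.
    # Gradients of the MSE with respect to the weights:
    grad_w = np.mean(2 * (y_hat - y) * x)
    grad_b = np.mean(2 * (y_hat - y))
    # Step downhill on the error surface in weight space
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges near w=2, b=1; gradient near 0
```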

12
Q

Question 10
How can overfitting be avoided when training a neural network?

A
  • Avoiding overfitting using early stopping (start from an oversized network with enough hidden neurons and stop training once validation performance starts decreasing; see the sketch below)
  • Avoiding overfitting using regularization (Bartlett, 1997): generalization depends more on the size of the weights than on the number of weights, so penalize large weights; the lambda parameter trades off over- and underfitting and is tuned via a validation set
  • Avoiding overfitting using input selection with pruning (prune the inputs whose input-to-hidden layer weights are closest to zero)
  • Hinton diagrams (visualize weight magnitudes to spot prunable inputs)
  • Avoiding overfitting using input selection with a wrapper approach (flexible backward or forward selection procedure)
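
A minimal early-stopping sketch, assuming scikit-learn and made-up data (the patience value is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, random_state=0)
# Set aside a validation set to watch for overfitting
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(50,), random_state=0)  # oversized net
best_score, patience, bad_epochs = -np.inf, 5, 0
for epoch in range(200):
    net.partial_fit(X_tr, y_tr, classes=np.unique(y))  # one pass of weight updates
    score = net.score(X_val, y_val)
    if score > best_score:
        best_score, bad_epochs = score, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:   # validation performance stopped improving
        break                    # -> stop adjusting the weights
print(epoch, round(best_score, 3))
```

Note that scikit-learn's MLPClassifier also offers a built-in early_stopping=True option; the explicit loop above just makes the mechanism visible.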
13
Q

Question 11
What is regularization?

A

Bartlett (1997) demonstrated that generalization depends more on the size of the weights than on the number of weights:
* A large network with small weights acts like a smaller, less complex network
* Thus, restraining the weights should prevent a bumpy, overfitted surface
* However, it might also prevent the model from adapting to true features of the data
* Penalize large weights (in the absolute sense) in the objective function!

The decay (shrinkage, smoothing) parameter lambda controls the severity of the penalty.
* Trade-off:
* Setting lambda too low might cause overfitting (variance, i.e., fitting noise)
* Setting lambda too high might cause underfitting (bias, i.e., unable to capture the pattern)
* Tune lambda using a separate validation set
Bias-variance trade-off!
* The same principle is used by ridge/lasso regression (extensions to linear regression adopting regularization)
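
A minimal sketch of the lambda trade-off, using the ridge closed form (the deck notes the same principle underlies ridge/lasso) and made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(size=30)

# Ridge-style penalized objective: minimize ||Xw - y||^2 + lam * ||w||^2,
# whose closed-form solution is (X'X + lam*I)^-1 X'y
for lam in (0.0, 1.0, 100.0):
    w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    print(lam, np.round(w, 2))
# Larger lam shrinks all weights toward zero:
# too low -> overfitting risk, too high -> underfitting (weights over-shrunk)
```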

14
Q

Question 12
How can variable selection be performed when developing a neural network model for classification?

A
  • Forward selection
    Starts from the empty model and always adds variables based on low p-values
  • Backward elimination
    Starts from the full model and always removes variables based on high p-values
  • Stepwise
    Starts as forward selection, but checks whether previously added variables can be removed later

For a neural network these procedures are typically applied as a wrapper approach: variables are added or removed based on the validation performance of the retrained network rather than on p-values (see the sketch below).
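
A minimal wrapper-approach sketch for a neural network: backward elimination driven by cross-validated AUC instead of p-values, assuming scikit-learn and made-up data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           random_state=0)
features = list(range(X.shape[1]))

def cv_auc(cols):
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
    return cross_val_score(net, X[:, cols], y, cv=3, scoring="roc_auc").mean()

# Backward elimination: drop a feature as long as removing it does not
# hurt the cross-validated AUC
score = cv_auc(features)
improved = True
while improved and len(features) > 1:
    improved = False
    for f in list(features):
        trial = [c for c in features if c != f]
        trial_score = cv_auc(trial)
        if trial_score >= score:
            features, score, improved = trial, trial_score, True
            break
print(features, round(score, 3))
```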