Machine Learning with SAS® Viya® 3.4, Lesson 4: Neural Networks Flashcards

1
Q

What is the neural network term for a parameter estimate or slope?

A

Weight estimate

2
Q

What is the mathematical transformation that is applied to a linear combination of the input values in an MLP or neural network called?

A

activation function

3
Q

Which neural network architecture is best for modeling nonstationary data?

A

A skip-layer perceptron is the best architecture to use with nonstationary data.

4
Q

Which hyperparameters or set of hyperparameters in Model Studio are used to control weight decay?

A

the L1 and L2 hyperparameters control weight decay

5
Q

What is the purpose of the Minibatch size option in a Neural Network node?

A

It defines the number of training observations to calculate the model error and update the model coefficients.

6
Q

Can neural networks select inputs in a similar fashion to a tree-based model?

A

No. Neural networks cannot select inputs the way a tree-based model can.

7
Q

To which optimization method does the minibatch option in a Neural Network apply?

A

The Minibatch size option defines the number of training observations to use in the SGD instead of using all training observations.

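As a rough illustration (plain Python, not SAS code), minibatch SGD updates the weights from the gradient computed on a small batch of training observations rather than on the full data set. The `sgd` function and the toy data below are hypothetical:

```python
import random

# Hypothetical sketch of minibatch SGD fitting y = w*x with squared error.
# minibatch_size plays the role of Model Studio's Minibatch size option.
def sgd(data, minibatch_size, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    w = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), minibatch_size):
            batch = data[i:i + minibatch_size]
            # Error gradient averaged over the minibatch only,
            # not over all training observations.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = sgd(data, minibatch_size=2)
print(round(w, 3))  # converges near the true slope 3.0
```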
8
Q

Why are neural networks referred to as universal approximators?

A

Neural networks are called universal approximators because they can model any input-output relationship, no matter how complex.

9
Q

What term refers to a parameter estimate or slope that is associated with an input in a neural network?

A

Weight estimate is the neural network term for a parameter estimate or slope.

10
Q

Which activation function is commonly used in the target layer when modeling a binary target?

A

The logistic function is the target layer activation function (or target layer link function) that is typically used with a binary target.

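A quick sketch (plain Python, not Model Studio code) of how the logistic function maps any real-valued linear combination to a probability in (0, 1):

```python
import math

# Sketch of the logistic (sigmoid) function used as the target-layer
# activation for a binary target: it maps any real value into (0, 1).
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

print(logistic(0.0))    # 0.5: no evidence for either class
print(logistic(4.0))    # close to 1: strong evidence for the event
```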
11
Q

What is the limitation of modeling with a neural network?

A

Neural networks are generally considered to be “black boxes.” Because they are minimally interpretable, at best, neural networks are most useful in pure prediction scenarios.

12
Q

What is the value of a standardized variable called?

A

Z-score or standard score

13
Q

What is standardization?

A

rescaling your data to have a mean of 0 and a standard deviation of 1

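The definition above can be sketched in plain Python (the `standardize` helper is hypothetical, and the population standard deviation is assumed):

```python
# Hypothetical helper: rescale values to mean 0 and standard deviation 1.
def standardize(values):
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (n in the denominator) is assumed here.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

z = standardize([2.0, 4.0, 6.0])
print(z)  # z-scores, centered at 0 with unit spread
```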
14
Q

When is the identity target activation function appropriate when modeling a neural network?

A

when the target error function is normally distributed

15
Q

What is the neural network term for an intercept estimate?

A

Bias estimate

16
Q

When early stopping is used to build a neural network model, which data partition does Model Studio use to select the final model?

A

The validate partition

17
Q

What may be affected when multi-scaled variables are used in a multivariate analysis?

A

Model stability and parameter estimate precision are affected during multivariate analysis when multi-scaled variables are used (for example, a variable that ranges between 0 and 100 will outweigh a variable that ranges between 0 and 1).

18
Q

What occurs during a neural network’s learning process?

A

Numerical optimization is an important part of the learning process.

19
Q

Why might you apply weight decay when building a neural network model?

A

Weight decay is one of two main methods used to avoid overfitting when building a neural network model

20
Q

What is the partition validation method?

A

With partition, you specify proportions to use for randomly assigning observations to each role.

21
Q

The Neural Network node can use weight decay to avoid overfitting. How are the L1 and L2 regularizations applied?

A

L1 penalizes the absolute value of the weights.

L2 penalizes the squared weights.

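A minimal sketch (not the Neural Network node's actual implementation) of how the two penalties enter the training objective; `lambda1` and `lambda2` stand in for the L1 and L2 hyperparameters:

```python
# Hypothetical sketch: adding L1 and L2 penalties to the training error.
def penalized_error(error, weights, lambda1, lambda2):
    l1 = sum(abs(w) for w in weights)    # L1 penalizes the absolute weight values
    l2 = sum(w * w for w in weights)     # L2 penalizes the squared weights
    return error + lambda1 * l1 + lambda2 * l2

print(penalized_error(1.0, [0.5, -2.0], lambda1=0.1, lambda2=0.01))  # ≈ 1.2925
```

Because both penalties grow with the weight magnitudes, minimizing the penalized error discourages the large weights that drive overfitting.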
22
Q

What is an activation function?

A

a mathematical transformation that is applied to a linear combination of the input values.

23
Q

What does it mean to normalize your data?

A

Normalizing your data refers to rescaling numeric data to the range 0 to 1 using (x − xmin) / (xmax − xmin), where xmin is the variable’s minimum value and xmax is the variable’s maximum value.
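A minimal Python sketch of min-max normalization as described above (the `normalize` helper is hypothetical):

```python
# Hypothetical helper: min-max normalization to the [0, 1] range.
def normalize(values):
    xmin, xmax = min(values), max(values)
    return [(v - xmin) / (xmax - xmin) for v in values]

print(normalize([10.0, 15.0, 20.0]))  # [0.0, 0.5, 1.0]
```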

24
Q

How are missing observations handled when the box is checked to include them in a Neural Network node?

A

Observations with missing values for interval inputs will be imputed with the mean and included.

25
Q

What are optimization methods used for?

A

Optimization methods are used to efficiently search the complex landscape of the error surface to find an error minimum.

26
Q

Which neural network architecture is best for modeling data with discontinuous input-output mappings?

A

a multilayer perceptron with two hidden layers

(adding a second hidden layer can improve performance by enabling the MLP to realize discontinuous input-output mappings)

27
Q

Does model generalization depend more on the number of weights or on the magnitude of the weights when modeling a neural network?

A

Model generalization depends more on the magnitude of the weights than on the number of weights. In fact, large weights are responsible for overfitting.

28
Q

What techniques do deep learning models use to overcome the computational challenges associated with multiple hidden layers?

A

Deep learning models use fast-moving gradient-based optimizations, such as stochastic gradient descent (SGD), for this purpose.

29
Q

Which hyperparameters or set of hyperparameters in Model Studio controls weight decay?

A

The L1 and L2 hyperparameters control weight decay.

30
Q

How is early stopping performed in SAS VDMML?

A

SAS Visual Data Mining and Machine Learning treats each iteration in the optimization process as a separate model. The iteration with the smallest value of the selected fit statistic is chosen as the final model. This method of model optimization is also called stopped training.

31
Q

What is the global minimum?

A

A global minimum is a set of weights that generates the smallest amount of error.

32
Q

What are the two steps important in the learning process for a neural network?

A
  1. find a good set of parameters that minimizes the error (avoid bad local minima)
  2. ensure that this set of parameters performs well in new data sets
33
Q

What does the Number of tries property specify?

A

The Number of tries property specifies the number of times the network is trained using a different starting point.

34
Q

What does early stopping prevent?

A

Early stopping keeps the sigmoids from becoming too steep (and steep sigmoids are thought to be responsible for overfitting)

35
Q

What property would you adjust to specify whether or not to stop training when the model begins to overfit?

A

Perform Early Stopping

36
Q

Which properties affect the number of iterations performed during model optimization?

A

Maximum iterations and Maximum time

37
Q

What are the methods available to prevent overfitting in the Perform Early Stopping property?

A
  1. Stagnation – Training stops after N consecutive iterations without improvement in the validation partition. (Note: Early stopping cannot be used if there is no validation partition. The default value is 5.)
  2. Validation error goal – Specifies a goal for early stopping based on the validation error rate. When the error gets below this value, the optimization stops. This option is in effect only for networks with fewer than 6 hidden layers. The value of 0 indicates that no validation error goal is set. The default value is 0.
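The stagnation rule, together with choosing the best iteration as the final model (stopped training), can be sketched as follows; this is plain Python, not Model Studio internals:

```python
# Hypothetical sketch: stop after `stagnation` consecutive iterations with no
# improvement in validation error, then keep the best iteration seen so far.
def stopped_training(validation_errors, stagnation=5):
    best_iter, best_err, since_best = 0, validation_errors[0], 0
    for i, err in enumerate(validation_errors[1:], start=1):
        if err < best_err:
            best_iter, best_err, since_best = i, err, 0
        else:
            since_best += 1
            if since_best >= stagnation:
                break
    return best_iter, best_err  # iteration chosen as the final model

# Training halts before reaching the last value; iteration 2 is selected.
print(stopped_training([0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.71, 0.72, 0.5]))
```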
38
Q

Describe the architecture of a Multilayer perceptron model.

A

The first layer, called the input layer, connects to a layer of neurons called a hidden layer, which, in turn, connects to a final layer called the target or output layer.

39
Q

Which of the two optimization methods will be chosen when the Optimization method is set to Automatic?

A

The optimization method selected is based on the number of hidden layers:
For 2 or fewer hidden layers, LBFGS is used;

otherwise, SGD is used.

40
Q

In Model Studio, which two optimization methods are currently available in the Neural Network node?

A

limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) and variants of gradient descent

41
Q

Which optimization method is well suited for optimization problems with a large number of variables?

A

LBFGS

42
Q

Which iteration does the optimal model come from when early stopping is used?

A

one of the iterations that occur earlier than the final training iteration

43
Q

Which data partition is used to select the final model when early stopping is used to build a neural network model?

A

validate

44
Q

Neural Network equation

A

The predicted estimate, ŷ, is a weighted sum of the input features x0 to xp: ŷ = w0·x0 + w1·x1 + … + wp·xp, where w0 to wp are the learned coefficients, or weights.
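In plain Python, the weighted sum can be sketched as follows (the `predict` helper is hypothetical; x0 = 1 plays the role of the bias/intercept input):

```python
# Hypothetical helper: ŷ as the weighted sum of the inputs,
# with the first input fixed at 1 to act as the bias term.
def predict(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

print(predict([1.0, 2.0, -0.5], [1.0, 3.0, 4.0]))  # 1 + 6 - 2 = 5.0
```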

45
Q

What does the Node Score Code window provide?

A
46
Q

Where is code to train the model based on different data sets or in different platforms located?

A

The training code window

47
Q

What shows the model’s performance based on the validation error throughout the training process as new iterations are added to achieve the final model?

A

The iteration plot

48
Q

How would you request a skip-layer perceptron in Model Studio?

A

In Model Studio, skip-layer perceptrons are constructed when the property Allow direct connections between input and target neurons, located on the Options tab, is selected

49
Q

How many hidden layers may be used in Model Studio?

A

The possible number of hidden layers in SAS Visual Data Mining and Machine Learning ranges from 0 to 10.

50
Q

What is the Bernoulli function?

A

The error function used for a binary target; the error that is minimized is −2 times the log likelihood.

51
Q

What is Model Studio’s default hidden unit activation function in a Neural Network node?

A

hyperbolic tangent (tanh)

53
Q

What is a global minimum?

A

a set of weights that generates the smallest amount of error

54
Q

What is the goal of numerical optimization?

A

to minimize the error function

55
Q

What do you call the local features of the error surface that make these algorithms vulnerable to getting stuck in regions with little improvement?

A

error plateaus or local minima

56
Q

The decision to stop is based on meeting at least one of what three convergence criteria?

A
  1. The specified error function stops improving.
  2. The gradient has no slope (in other words, the rate of change of the error function with respect to the weights is zero).
  3. The magnitude of the weights stops changing substantially.
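The three criteria above can be sketched as boolean checks; the tolerance `tol` and the helper are assumptions, not Model Studio’s actual settings:

```python
# Hypothetical sketch of the three convergence checks on one iteration.
def converged(prev_err, err, grad, prev_w, w, tol=1e-6):
    error_flat = abs(prev_err - err) < tol                 # 1. error stops improving
    zero_grad = max(abs(g) for g in grad) < tol            # 2. gradient has no slope
    weights_flat = max(abs(a - b)
                       for a, b in zip(prev_w, w)) < tol   # 3. weights stop changing
    # Stopping requires meeting at least one criterion.
    return error_flat or zero_grad or weights_flat

print(converged(0.500, 0.500, [0.2], [1.0], [1.1]))  # True: error stopped improving
```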