Machine Learning with Viya® 3.4 Lesson 5: Support Vector Machines (SVM) and Additional Topics Flashcards
What is a dot product?
A dot product is a way to multiply two vectors of equal length that results in a scalar, or a single number, as the answer. It is an element-by-element multiplication followed by a sum of the products.
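A minimal Python sketch of that definition. The function name `dot` is illustrative, not part of any SAS API.

```python
def dot(u, v):
    """Multiply element by element, then sum the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```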
How is a support vector machine constructed in order to avoid the curse of dimensionality?
By using only the observations closest to the separating hyperplane
How does using only the observations closest to the separating hyperplane avoid the curse of dimensionality?
By limiting the number of data points in the solution.
What kind of information is in the Training Results table in an SVM run?
The Training Results table shows the parameters for the final Support Vector Machine model such as the number of support vectors and the bias.
Where can you find the average squared error on the VALIDATE partition?
In the Fit Statistics table on the Assessment tab.
Where can you view the misclassification matrix?
The Output Window
What are the two constraints in the optimization problem that defines a support vector machine?
If the target variable equals 1, then H must be greater than or equal to 1. If the target is -1, then H must be less than or equal to -1.
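A small sketch that checks both constraints, assuming targets coded as +1/-1 and a linear classifier H(x) = w · x + b. The data and weights are made up for illustration. Note that the two constraints combine into the single condition y * H(x) >= 1.

```python
def H(x, w, b):
    # Linear classifier: H(x) = w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def satisfies_constraints(X, y, w, b):
    """Check the two SVM constraints for every observation.

    Target +1 requires H(x) >= 1; target -1 requires H(x) <= -1.
    Both cases are equivalent to y * H(x) >= 1.
    """
    for x, target in zip(X, y):
        h = H(x, w, b)
        if target == 1 and h < 1:
            return False
        if target == -1 and h > -1:
            return False
    return True

X = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
y = [1, 1, -1, -1]
print(satisfies_constraints(X, y, w=[0.5, 0.5], b=0.0))  # True
print(satisfies_constraints(X, y, w=[0.1, 0.1], b=0.0))  # False
```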
What term describes the separating hyperplane when the data points are not linearly separable?
soft margin hyperplane
What do you need to do when you encounter a soft margin hyperplane?
Account for errors that the separating hyperplane might make
TRUE or FALSE: When the data are not linearly separable, the process of optimizing the location of the hyperplane must account for classification errors.
TRUE: When the data are not linearly separable, the hyperplane will misclassify some data points. In this situation, the process of optimizing the location of the hyperplane must account for these classification errors.
What is a kernel function?
A kernel function operates as a dot product in a higher dimension (that is, in a feature space), but it is applied to the raw data.
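To make "a dot product in a feature space, applied to the raw data" concrete, the sketch below uses a degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-D inputs. The explicit feature map `phi` is an assumption chosen for illustration. Both routes give the same number, but the kernel never builds the transformed vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    # Explicit degree-2 feature map for a 2-D input (x1, x2) -> 3-D feature space
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x, z):
    # Same quantity computed directly on the raw 2-D inputs
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
print(dot(phi(x), phi(z)))  # dot product in the 3-D feature space
print(poly2_kernel(x, z))   # same value, no transformation needed
```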
Suppose you are modeling data with a binary target and three inputs. The data are linearly separable. How many possible solutions exist that classify the target?
An infinite number of solutions can classify the binary target when the data are linearly separable.
What type of target variable is supported in a support vector machine in Model Studio?
Support vector machines are used exclusively with binary targets in Model Studio.
What are the elements of a classifier model for a Support Vector Machine?
The classifier model (H) has two elements: a normal vector and a bias term
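A sketch of how the two elements are used, assuming the common form H(x) = w · x + b, where w is the normal vector and b is the bias. The values of w and b are hypothetical, and assigning H(x) = 0 to +1 is an arbitrary tie-breaking choice.

```python
def H(x, w, b):
    # w: normal vector (perpendicular to the hyperplane); b: bias term
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, w, b):
    # Points on the positive side of the hyperplane get +1, others -1
    return 1 if H(x, w, b) >= 0 else -1

w, b = [1.0, -1.0], 0.5
print(classify([2.0, 1.0], w, b))  # H = 1.5  -> +1
print(classify([0.0, 3.0], w, b))  # H = -2.5 -> -1
```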
What is the maximum-margin hyperplane in a two-dimensional input space?
The exact center of the thickest line that touches the innermost values of one target outcome and the innermost values of the other target outcome
What are support vectors?
Support vectors are the points in the data that are closest to the maximum-margin hyperplane.
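Once a hard-margin solution is scaled so that the closest points satisfy y * H(x) = 1, the support vectors are exactly those points. A sketch on toy data; for these four points the maximum-margin solution is w = (0.5, 0.5), b = 0, which is hand-derived here rather than produced by an optimizer.

```python
import math

def H(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def support_vectors(X, y, w, b):
    """Return the observations lying exactly on the margin boundaries (y * H(x) = 1)."""
    return [x for x, t in zip(X, y) if math.isclose(t * H(x, w, b), 1.0)]

X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -3.0]]
y = [1, 1, -1, -1]
print(support_vectors(X, y, w=[0.5, 0.5], b=0.0))  # [[1.0, 1.0], [-1.0, -1.0]]
```

Only these two points determine the hyperplane. Moving [2.0, 2.0] or [-2.0, -3.0] farther away would not change the solution, which is why the SVM avoids depending on every observation.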
In support vector machines, finding the separating hyperplane is an optimization problem with constraints that involve the values of the binary target.
a. True
b. False
A: True
Solving for the support vector machine is actually an optimization problem with two constraints. The first constraint is based on a target value of +1, and the second constraint is based on a target value of -1.
How is a feature space constructed?
A feature space is constructed by applying a nonlinear transformation to data so that linear separation exists in this higher-dimensional space
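A one-dimensional illustration, using a hypothetical transformation phi(x) = (x, x^2). Class +1 sits on both sides of class -1, so no single cut point on the original axis separates them. After the transformation, a straight line does.

```python
def phi(x):
    # Nonlinear transformation: 1-D input -> 2-D feature space
    return [x, x * x]

def H(z, w, b):
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# Not linearly separable on the original 1-D axis
X = [-3.0, -1.0, 0.0, 1.0, 3.0]
y = [1, -1, -1, -1, 1]

# In the feature space, the line "second coordinate = 5" separates the classes
w, b = [0.0, 1.0], -5.0
predictions = [1 if H(phi(x), w, b) >= 0 else -1 for x in X]
print(predictions == y)  # True
```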
What is a kernel function?
A kernel function is a math trick that avoids having to transform the data and then calculate dot products on the transformed data.
What information is provided in the Local Interpretable Model-Agnostic Explanation (LIME) plot when the Model Interpretability feature is used in Model Studio?
A LIME plot creates a localized linear regression model around a particular observation based on a perturbed sample set of data.
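The core LIME idea can be sketched with one input and a hypothetical black-box model: perturb around the observation, weight each perturbed point by its closeness, and fit a weighted linear regression. As assumptions, the perturbations here are evenly spaced for reproducibility (LIME samples them randomly), and the Gaussian proximity weight and its width are illustrative choices, not Model Studio settings.

```python
import math

def black_box(x):
    # Stand-in for the model being interpreted (hypothetical)
    return x ** 2

def lime_1d(model, x0, width=0.5, n=21, spread=1.0):
    """Fit a proximity-weighted linear regression around x0; return (intercept, slope)."""
    xs = [x0 - spread + 2 * spread * i / (n - 1) for i in range(n)]
    ys = [model(x) for x in xs]
    # Closer perturbations get larger weights (Gaussian kernel)
    ws = [math.exp(-((x - x0) ** 2) / width ** 2) for x in xs]
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * yv for w, yv in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (yv - ybar) for w, x, yv in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

intercept, slope = lime_1d(black_box, x0=3.0)
print(round(slope, 3))  # local effect of x near x0 = 3
```

The local slope approximates the derivative of x^2 at x = 3, even though the global model is nonlinear.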
How is the Input Relative Importance table that appears in the results calculated when the Model Interpretability feature is used in Model Studio?
The Input Relative Importance table is calculated by depth-one decision trees using each input to estimate the predicted values of the model being interpreted.
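A sketch of the depth-one tree idea. For each input, it finds the single split that best predicts the interpreted model's predicted values and records the reduction in squared error. Scaling so that the top input equals 1.0 is an assumed convention, and the inputs and predictions are made-up data; the exact Model Studio calculation may differ.

```python
def sse(values):
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def stump_reduction(x, pred):
    """Best SSE reduction from one split of x (a depth-one tree) when predicting pred."""
    total = sse(pred)
    best = 0.0
    for cut in sorted(set(x))[:-1]:
        left = [p for xi, p in zip(x, pred) if xi <= cut]
        right = [p for xi, p in zip(x, pred) if xi > cut]
        best = max(best, total - sse(left) - sse(right))
    return best

def relative_importance(inputs, pred):
    raw = {name: stump_reduction(col, pred) for name, col in inputs.items()}
    top = max(raw.values())
    if top == 0:
        return {name: 0.0 for name in raw}
    # Assumed convention: the most important input gets 1.0
    return {name: r / top for name, r in raw.items()}

# Hypothetical predictions from the model being interpreted
inputs = {"x1": [1, 2, 3, 4, 5, 6], "x2": [3, 1, 2, 3, 1, 2]}
pred = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]
print(relative_importance(inputs, pred))
```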
What are the three options used to increase the flexibility of a support vector machine model in Model Studio?
Penalty, kernel, and tolerance
What is the penalty term?
The penalty is a term that accounts for misclassification errors in model optimization.
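A sketch of where the penalty enters. This uses the standard textbook soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses), with C as the penalty. Model Studio's exact objective may differ, and the data and weights are hypothetical. The first term favors a wide margin (a simpler model); the second charges for training errors.

```python
import math

def H(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def soft_margin_objective(X, y, w, b, penalty):
    """0.5 * ||w||^2 + penalty * (sum of hinge-loss errors).

    The hinge loss max(0, 1 - y * H(x)) is zero for points that satisfy
    their constraint and grows for points inside the margin or misclassified.
    """
    margin_term = 0.5 * sum(wi * wi for wi in w)
    errors = sum(max(0.0, 1.0 - t * H(x, w, b)) for x, t in zip(X, y))
    return margin_term + penalty * errors

X = [[1.0, 1.0], [-1.0, -1.0], [0.2, 0.2]]
y = [1, -1, -1]  # the third point falls on the wrong side
w, b = [0.5, 0.5], 0.0
print(soft_margin_objective(X, y, w, b, penalty=1.0))
print(soft_margin_objective(X, y, w, b, penalty=10.0))
```

With the larger penalty, the same misclassified point dominates the objective, which pushes the optimizer toward hyperplanes that fit the training data more closely.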
What is tolerance?
The tolerance value balances the number of support vectors and model accuracy.
Which of the following machine learning models is the easiest to interpret?
a. decision tree
b. neural network
c. support vector machine
A: a. Decision tree. Decision trees are highly interpretable because they are based on English rules, which are rules that use Boolean logic.
What does the Penalty value do?
The Penalty value balances model complexity and training error.
What is the risk associated with a larger Penalty value?
A larger Penalty value tolerates fewer training errors and creates a more flexible model, at the risk of overfitting the training data.
What does the Tolerance value do?
The Tolerance value balances the number of support vectors and model accuracy.
What is the consequence of too large a Tolerance value?
A Tolerance value that is too large creates too few support vectors.
What is the consequence of a Tolerance value that is too small?
A Tolerance value that is too small overfits the training data.
What do intersecting slopes in an ICE plot indicate?
Intersecting slopes indicate that there is an interaction between the plot variable and one or more complementary variables.
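A sketch of why interactions produce crossing ICE curves. Each curve varies the plot variable x1 for one observation while holding that observation's x2 fixed. The model x1 * x2 is hypothetical. With an additive model, every curve would have the same slope and the curves would be parallel. Here the slope depends on x2, so the curves cross.

```python
def model(x1, x2):
    # Hypothetical model with an x1-by-x2 interaction
    return x1 * x2

def ice_curves(model, rows, grid):
    """One curve per observation: vary x1 over grid, hold that row's x2 fixed."""
    return [[model(g, x2) for g in grid] for (_, x2) in rows]

rows = [(0.5, 2.0), (1.5, -1.0)]  # observed (x1, x2) pairs
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = ice_curves(model, rows, grid)
print(curves[0])  # slope +2: increases with x1
print(curves[1])  # slope -1: decreases with x1
```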
Why is it useful to look among clusters for different relationships between the groups (or levels) of the categorical variable and the target when evaluating an ICE plot of a categorical input?
Significant differences in these relationships indicate group effects.
Where is the largest possible margin of error in a maximum-margin hyperplane?
This hyperplane has the largest possible margin of error on its positive and negative sides.
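For a hyperplane scaled so that the support vectors satisfy H(x) = +1 and H(x) = -1, the distance between those two boundaries is 2 / ||w||. Maximizing the margin is therefore equivalent to minimizing ||w|| subject to the constraints. A sketch with hypothetical values w = (0.5, 0.5), b = 0:

```python
import math

def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))

def margin_width(w):
    # Distance between the hyperplanes H(x) = +1 and H(x) = -1
    return 2.0 / norm(w)

def distance_to_hyperplane(x, w, b):
    # Perpendicular distance from x to the separating hyperplane H(x) = 0
    h = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(h) / norm(w)

w, b = [0.5, 0.5], 0.0
print(margin_width(w))                              # full width of the margin
print(distance_to_hyperplane([1.0, 1.0], w, b))     # positive side
print(distance_to_hyperplane([-1.0, -1.0], w, b))   # negative side
```

The two one-sided distances are equal, so the hyperplane sits in the exact center of the margin.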
What does the autoencoder method on the Feature Extraction node do?
The Autoencoder method builds a neural network that uses the inputs to reconstruct the inputs.
How is an autoencoder network different from an MLP network?
An autoencoder network is like an MLP network except that its output layer duplicates the input layer, so the inputs also serve as the targets.
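A minimal linear autoencoder with hand-set weights. This is an assumption for illustration: a real autoencoder learns its weights by training and usually uses nonlinear activations. The network has 2 inputs, a 1-unit hidden layer (the extracted feature), and 2 outputs that try to reproduce the inputs. Points on the line x2 = 2 * x1 reconstruct exactly, and other points do not.

```python
import math

# Hand-set weights: unit vector along the direction (1, 2)
u = [1 / math.sqrt(5), 2 / math.sqrt(5)]

def encode(x):
    # Hidden layer (bottleneck): one extracted feature
    return u[0] * x[0] + u[1] * x[1]

def decode(h):
    # Output layer has the same number of units as the input layer
    return [h * u[0], h * u[1]]

def reconstruction_error(x):
    # The training target is the input itself
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

for x in [[1.0, 2.0], [-0.5, -1.0], [2.0, 4.0], [1.0, 0.0]]:
    print(x, encode(x), reconstruction_error(x))
```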
When you scale the input variables for a binary target using support vector machines, what happens to the inputs?
Values are scaled to range from 0 to 1.
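A sketch of 0-to-1 (min-max) scaling for one input. Mapping a constant column to 0.0 is an assumed convention, because the formula would otherwise divide by zero.

```python
def min_max_scale(values):
    """Rescale a numeric input so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10.0, 15.0, 20.0, 30.0]))  # [0.0, 0.25, 0.5, 1.0]
```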
How are missing values for class variables handled when “Use missing” is specified for a Support Vector Machine node?
SVMs treat missing values as a separate category.
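A sketch of the "separate category" idea using one-hot encoding, where a missing value (None here) becomes its own level. The label `_MISSING_` is hypothetical; this shows the concept, not how SAS encodes class inputs internally.

```python
def one_hot_with_missing(values, missing_label="_MISSING_"):
    """One-hot encode a class input, treating missing (None) as its own level."""
    labeled = [missing_label if v is None else v for v in values]
    levels = sorted(set(labeled))
    return levels, [[1 if v == level else 0 for level in levels] for v in labeled]

levels, rows = one_hot_with_missing(["red", None, "blue", "red"])
print(levels)  # ['_MISSING_', 'blue', 'red']
print(rows)    # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```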
What is the only Global Interpretability plot available in Model Studio?
The Partial Dependence plot. Partial Dependence plots are based on an aggregation across all observations, so they provide global interpretability.
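A sketch of that aggregation. At each grid value of the plot variable x1, every observation's x1 is set to that value and the model's predictions are averaged. This is the average of the individual ICE curves. The model and rows are hypothetical.

```python
import math

def model(x1, x2):
    # Hypothetical model being interpreted
    return 3.0 * x1 + x2 * x2

def partial_dependence(model, rows, grid):
    """Average prediction over all observations at each grid value of x1."""
    pd = []
    for g in grid:
        preds = [model(g, x2) for (_, x2) in rows]  # x1 set to g for every row
        pd.append(sum(preds) / len(preds))
    return pd

rows = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]  # observed (x1, x2) pairs
pd = partial_dependence(model, rows, grid=[0.0, 1.0, 2.0])
print(pd)
```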
Which model interpretability tools can be used to help interpret a machine learning model for a single observation?
Local Interpretable Model-Agnostic Explanations (LIME) plots and Kernel SHAP (Shapley) plots
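Kernel SHAP estimates Shapley values. For a model with only a few inputs, the exact values can be computed by enumerating every coalition of inputs. This sketch swaps in that exact enumeration instead of Kernel SHAP's weighted-regression approximation, and it uses a single baseline vector for "absent" inputs, which is a simplifying assumption. The model is hypothetical.

```python
import math
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; inputs outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def model(z):
    # Hypothetical model: additive part plus an interaction
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[1]

phi = shapley_values(model, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(phi)  # contributions sum to model(x) - model(baseline)
```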
Where does the Open Source Code node execute?
CAS
Which assessment measure should be used to determine the champion when predicting an interval target?
Average Squared Error
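A sketch of average squared error for an interval target, using made-up actual and predicted values:

```python
import math

def average_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

print(average_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))  # (0.25 + 0 + 1) / 3
```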
Which assessment measure should be used to determine the champion for a decision-focused model?
Misclassification Rate
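A sketch of the misclassification rate, using made-up class labels:

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

print(misclassification_rate([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))  # 2 of 5 wrong -> 0.4
```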
Which assessment measure would you use to determine the champion for a model used for ranking?
The ROC Index or Gini Coefficient
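Both ranking measures depend only on how the scores order events relative to non-events. In this sketch, the ROC index is computed as the fraction of (event, non-event) pairs in which the event scores higher (ties count half), and the Gini coefficient is 2 * ROC index - 1. The labels and scores are made up.

```python
import math

def roc_index(actual, scores):
    """Probability that a random event outscores a random non-event (ties count half)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def gini(actual, scores):
    return 2.0 * roc_index(actual, scores) - 1.0

actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(roc_index(actual, scores), gini(actual, scores))  # 8 of 9 pairs ordered correctly
```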