Machine Learning with Viya® 3.4 Lesson 2: Data Preparation Flashcards by Nicole Fox

For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set that is not in memory but exists on your local machine. In which tab would this data set be located?

Data sets that are located on your local machine can be found in Import.

For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set that is not in memory but exists on your connected server (not the local machine). In which tab would this data set be located?

All data sets that exist on a connected server are found in Data Sources.

What are the four groups of Advanced project settings in Model Studio?

Advisor Options
Partition Data
Event-based Sampling
Node Configuration

What are the default proportions when data is partitioned in Model Studio?

60% Training
30% Validation
10% Test
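
As a rough illustration of a 60/30/10 partition, the sketch below shuffles rows and slices them into three lists. It is a plain random split written for this card; the function name `partition` and its parameters are hypothetical, and Model Studio's own partitioning may differ (for example, by stratifying on the target).

```python
import random

def partition(rows, train=0.6, validate=0.3, seed=0):
    """Shuffle rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    # Whatever is left after training and validation becomes the test set.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

train_rows, valid_rows, test_rows = partition(range(1000))
print(len(train_rows), len(valid_rows), len(test_rows))  # 600 300 100
```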

Where would you adjust the threshold for rejecting inputs with missing values?

Adjust the maximum percent missing under the Advisor Options group in the Advanced Project Settings
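
The idea behind a maximum-percent-missing threshold can be sketched in a few lines. This is an illustration only: `reject_by_missing` is a hypothetical helper, and the 50% default used here is an assumed value, not a statement about the product's default.

```python
def reject_by_missing(columns, max_percent_missing=50.0):
    """Return names of inputs whose share of missing values exceeds the threshold."""
    rejected = []
    for name, values in columns.items():
        missing = sum(1 for v in values if v is None)
        percent = 100.0 * missing / len(values)
        if percent > max_percent_missing:
            rejected.append(name)
    return rejected

# Made-up data: None marks a missing value.
data = {
    "income": [52000, None, 61000, 48000],        # 25% missing
    "fax_number": [None, None, None, "555-0100"],  # 75% missing
}
rejected = reject_by_missing(data)
print(rejected)  # ['fax_number']
```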

What is the validation data that the Variable Selection node creates from the training data used for?

Model assessment during the modeling process

The Data Exploration node enables you to do what?

View the most important inputs (Importance), or screen the data to see suspicious variables (Screening).

What is the best practice for handling high-cardinality input variables?

binning

Which Model Studio setting determines whether a numeric input is designated as interval or nominal?

The interval cutoff. If a numeric input has more distinct values (levels) than the interval cutoff value, it is declared interval. Otherwise, it is declared nominal.
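
The rule on this card can be written as a one-line comparison. The sketch below is only an illustration: `measurement_level` is a hypothetical helper, and the cutoff of 20 passed in the example is an assumed value chosen for the demo.

```python
def measurement_level(values, interval_cutoff):
    """Apply the card's rule: more distinct values than the cutoff means interval."""
    distinct = len({v for v in values if v is not None})
    return "interval" if distinct > interval_cutoff else "nominal"

ages = list(range(18, 80))             # 62 distinct values
region_codes = [1, 2, 3, 1, 2, 3, 1]   # 3 distinct values

print(measurement_level(ages, interval_cutoff=20))          # interval
print(measurement_level(region_codes, interval_cutoff=20))  # nominal
```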

Where would you specify the interval cutoff in Model Studio?

In the Advisor Options group under Advanced Project settings

How would you define variable metadata and assign rules to modify variables?

You can perform these tasks using either the Data tab or the Manage Variables node.

What does the temporary table produced in a Save Data node following a Decision Tree contain?

Following a decision tree, the table contains predicted probabilities and leaf IDs

What can you do using the Manage Variables node after a pipeline is run?

Set up imputation and transformation rules.

What does the Manage Variables node enable you to do in Model Studio?

Modify the data within a pipeline, such as changing the role of a variable or adding transformations, and turn on event-based sampling before a pipeline is run.

Which variable selection technique identifies the set of input variables that jointly explain the maximum amount of variance contained in the data when using the Variable Selection node in Model Studio?

Unsupervised Selection

What does the Text Mining node do?

Creates topics based on groups of terms that occur together in several documents

What is the purpose of the Feature Extraction node in Model Studio?

The Feature Extraction node transforms the existing features (variables) into a lower-dimensional space by generating new features that are composites of the original features.

What is the drawback to Feature Extraction?

Composite variables are no longer meaningful with respect to the original problem

What is another term for a feature in predictive modeling?

Input

How would you specify the threshold for rejecting categorical variables in Model Studio?

Set the maximum class levels under the Advisor Options group in Advanced Project settings

Where can you set up imputation and transformation rules in Model Studio?

In the Manage Variables node after a pipeline is run

For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set loaded into memory. In which tab would this data set be located?

CAS tables loaded into memory are seen in Available.

Does the Variable Selection node use supervised methods or unsupervised methods to select inputs?

The Variable Selection node can perform input selection based on both supervised and unsupervised methods.

What is the curse of dimensionality?

The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target
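
One way to picture this growth is to count the cells in a grid over the inputs. The sketch below assumes each input is cut into 10 ranges and that about 5 cases per cell are wanted; both numbers and the helper `cases_needed` are hypothetical, chosen only to show how quickly the requirement grows.

```python
def cases_needed(n_inputs, bins_per_input=10, cases_per_cell=5):
    """Cases needed to average `cases_per_cell` in every cell of a grid over the inputs."""
    return cases_per_cell * bins_per_input ** n_inputs

for n_inputs in (1, 2, 3, 6):
    print(n_inputs, cases_needed(n_inputs))
# 1 50
# 2 500
# 3 5000
# 6 5000000
```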

Why is it important to reduce the number of inputs during data preparation?

The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target

What does the Save Data node do?

The Save Data node produces a temporary table in a CAS library.

What does the Replacement node enable you to do?

replace outliers and unknown class values with specified values

How do the transformations available in the Transformations node minimize bias in model predictions?

by reducing the effect of extreme or unusual input values
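
A small sketch of how a transformation pulls in an extreme value, using a natural log as one example of such a transformation. The data and the helper `max_to_median` are made up for illustration, and the log is an assumed choice rather than a specific method from the lesson.

```python
import math
import statistics

def max_to_median(values):
    """Ratio of the largest value to the median: a rough gauge of how far the extreme sits."""
    return max(values) / statistics.median(values)

incomes = [30, 35, 40, 45, 50, 5000]     # one extreme value (made-up data)
logged = [math.log(x) for x in incomes]  # log-transformed copy of the input

ratio_raw = max_to_median(incomes)
ratio_logged = max_to_median(logged)
print(round(ratio_raw, 1), round(ratio_logged, 1))
```

After the log transformation, the extreme value sits far closer to the rest of the data, so it has less pull on a fitted model.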

What are some of the techniques used in the Feature Extraction node in Model Studio?

principal component analysis (PCA), robust PCA, singular value decomposition (SVD), and autoencoders

What is binning?

Binning is a method of transformation that converts numeric inputs to categories or groups the levels of a high-cardinality input.
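
For the high-cardinality case, one simple form of grouping is to collapse rare levels into a single catch-all level. The sketch below is a minimal illustration under that assumption; `group_rare_levels`, the `min_count` threshold, and the "Other" label are hypothetical.

```python
from collections import Counter

def group_rare_levels(values, min_count=2, other_label="Other"):
    """Replace levels seen fewer than `min_count` times with a single catch-all level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

cities = ["Austin", "Austin", "Boise", "Cary", "Cary", "Cary", "Dover"]
grouped = group_rare_levels(cities)
print(grouped)
# ['Austin', 'Austin', 'Other', 'Cary', 'Cary', 'Cary', 'Other']
```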

Which transformation creates bins for a numeric variable?

A quantile transformation creates bins for a numeric variable.
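
A quantile-based binning can be sketched with the Python standard library: compute cut points at the quantiles, then assign each value to the bin it falls into. The helper `quantile_bins` and the sample data are hypothetical, and the exact cut-point method in Model Studio may differ from the default used by `statistics.quantiles`.

```python
import bisect
import statistics

def quantile_bins(values, n_bins=4):
    """Assign each value a bin index (0 .. n_bins-1) using quantile cut points."""
    cut_points = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect.bisect_right(cut_points, v) for v in values]

ages = [21, 25, 30, 34, 40, 47, 52, 60]
bins = quantile_bins(ages)
print(bins)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Each bin ends up with roughly the same number of cases, which is the point of cutting at quantiles rather than at evenly spaced values.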

What would you use the Anomaly Detection node in Model Studio for?

The Anomaly Detection node is used to identify and exclude anomalies using the support vector data description (SVDD) method.

When would it be helpful to use the Anomaly Detection node?

when using a data set where most of the data belongs to one class and the other class is scarce or missing

What does the Filtering node do?

The Filtering node excludes certain observations such as rare or extreme values

Assume the Target has an event proportion of 2% in the original data. You want to build models where event-based sampling has been used such that the modeling data set will have a 50% event proportion. What are the two ways this can be done using Model Studio?

1. While the project is being created, after the data source has been selected, click the Advanced button. Select the Event-Based Sampling option. Turn on event-based sampling by checking the check box.
2. **After the project is created but before a pipeline has been run**, go into project settings and select the Event-Based Sampling option. Turn on event-based sampling by checking the check box.
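
To see what a 50% event proportion means for the modeling data, the sketch below keeps every event and draws just enough non-events to match. It is an illustration of the idea only: `event_based_sample` and its parameters are hypothetical, and it is not a description of how Model Studio performs the sampling internally.

```python
import random

def event_based_sample(rows, is_event, event_proportion=0.5, seed=0):
    """Keep every event and draw just enough non-events to hit the target proportion."""
    events = [r for r in rows if is_event(r)]
    non_events = [r for r in rows if not is_event(r)]
    n_non_events = round(len(events) * (1 - event_proportion) / event_proportion)
    n_non_events = min(n_non_events, len(non_events))
    sampled = random.Random(seed).sample(non_events, n_non_events)
    return events + sampled

# Made-up data with a 2% event proportion: 20 events in 1,000 rows.
rows = [{"id": i, "target": 1 if i < 20 else 0} for i in range(1000)]
sample = event_based_sample(rows, lambda r: r["target"] == 1)
event_rate = sum(r["target"] for r in sample) / len(sample)
print(len(sample), event_rate)  # 40 0.5
```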

What are some reasons for performing transformations on your data?

stabilizing variances, removing non-linearity, and correcting non-normality

What is the default maximum cardinality for determining whether or not to reject a nominal variable?

Is the **Maximum percent missing** option turned on by default?

Yes

What is **Singular Value Decomposition** (SVD)?

Singular value decomposition (SVD) projects the high-dimensional document and term spaces into a lower-dimensional space.

What is SVD used for when selecting model inputs?

The singular values can be thought of as providing a measure of importance used to decide how many dimensions to keep.
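
One common way to turn singular values into a "how many dimensions" decision is to keep the leading dimensions until their squared singular values reach a chosen share of the total. The sketch below assumes that rule with a 90% share; the rule, the threshold, the helper `dimensions_to_keep`, and the sample values are all assumptions for illustration, not taken from the lesson.

```python
def dimensions_to_keep(singular_values, min_share=0.9):
    """Smallest number of leading dimensions whose squared singular values
    reach `min_share` of the total."""
    squared = [s * s for s in sorted(singular_values, reverse=True)]
    total = sum(squared)
    running = 0.0
    for k, value in enumerate(squared, start=1):
        running += value
        if running / total >= min_share:
            return k
    return len(squared)

# Hypothetical singular values from a decomposition.
kept = dimensions_to_keep([9.0, 4.0, 2.0, 1.0, 0.5])
print(kept)  # 2
```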

What does the Transformations node enable you to do?

alter your data by replacing an input variable with some function of that variable

Prepare for SAS Machine Learning Specialist Exam (41 cards)