Machine Learning with Viya® 3.4 Lesson 2: Data Preparation Flashcards
Prepare for SAS Machine Learning Specialist Exam
For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set that is not in memory but exists on your local machine. In which tab would this data set be located?
Data sets that are located on your local machine can be found on the Import tab.
For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set that is not in memory but exists on your connected server (not the local machine). In which tab would this data set be located?
All data sets that exist on a connected server are found on the Data Sources tab.
What are the four groups of Advanced project settings in Model Studio?
- Advisor Options
- Partition Data
- Event-based Sampling
- Node Configuration
What are the default proportions when data is partitioned in Model Studio?
- 60% Training
- 30% Validation
- 10% Test
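To make the 60/30/10 proportions concrete, here is a minimal Python sketch of a simple random three-way split. It is not Model Studio code: the function name `partition`, the `seed` parameter, and the use of plain (non-stratified) random sampling are assumptions made for illustration only.

```python
import random


def partition(rows, fractions=(0.6, 0.3, 0.1), seed=42):
    """Randomly split rows into training, validation, and test lists.

    Hypothetical helper for illustration; it does not reproduce how
    Model Studio samples internally.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to test
    return train, valid, test


train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))
```

With 1,000 rows, the three lists hold 600, 300, and 100 rows, and every row lands in exactly one partition.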
Where would you adjust the threshold for rejecting inputs with missing values?
Adjust the maximum percent missing under the Advisor Options group in the Advanced project settings.
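The idea behind a maximum-percent-missing threshold can be sketched in Python as follows. This is an illustration only, not Model Studio code: the function name `reject_by_missing`, the threshold value of 50, the strict greater-than comparison, and the use of `None` to mark a missing value are all assumptions.

```python
def reject_by_missing(columns, max_percent_missing=50.0):
    """Return the names of columns whose percentage of missing values
    exceeds the threshold. Missing values are represented as None.

    Hypothetical helper; the threshold and the strict comparison are
    illustrative assumptions.
    """
    rejected = []
    for name, values in columns.items():
        if not values:
            continue  # skip empty columns rather than divide by zero
        pct_missing = 100.0 * sum(v is None for v in values) / len(values)
        if pct_missing > max_percent_missing:
            rejected.append(name)
    return rejected


data = {
    "age": [34, None, 51, 47],              # 25% missing
    "income": [None, None, None, 52000],    # 75% missing
    "region": ["N", "S", None, None],       # 50% missing
}
rejected = reject_by_missing(data)
print(rejected)
```

Here only `income` is rejected, because `region` sits exactly at the threshold rather than above it.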
What is the validation data that the Variable Selection node creates from the training data used for?
Model assessment during the modeling process
The Data Exploration node enables you to do what?
Explore a subset of the variables: select by Importance to view the most important inputs, or by Screening to view suspicious variables.
What is the best practice for handling high-cardinality input variables?
binning
Which Model Studio setting determines whether a numeric input is designated as interval or nominal?
The interval cutoff. If a numeric input has more distinct levels than the interval cutoff value, it is declared interval. Otherwise, it is declared nominal.
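The cutoff rule above amounts to a single comparison on the number of distinct levels. The Python sketch below illustrates that logic under stated assumptions: the function name `assign_level`, the cutoff value of 20, and the choice to ignore `None` when counting levels are hypothetical and are not taken from Model Studio.

```python
def assign_level(values, interval_cutoff=20):
    """Classify a numeric input as 'interval' or 'nominal' by comparing
    its number of distinct levels with the cutoff.

    Hypothetical helper; the cutoff value and the treatment of missing
    values (None is ignored) are illustrative assumptions.
    """
    distinct_levels = {v for v in values if v is not None}
    if len(distinct_levels) > interval_cutoff:
        return "interval"
    return "nominal"


many_levels = assign_level(range(100))      # 100 distinct levels
few_levels = assign_level([0, 1, 0, 1, 1])  # 2 distinct levels
at_cutoff = assign_level(range(20))         # exactly 20 distinct levels
print(many_levels, few_levels, at_cutoff)
```

An input with exactly as many levels as the cutoff is treated as nominal here, since the rule requires more levels than the cutoff to be declared interval.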
Where would you specify the interval cutoff in Model Studio?
In the Advisor Options group under Advanced project settings.
How would you define variable metadata and assign rules to modify variables?
You can perform these tasks using either the Data tab or the Manage Variables node.
What does the temporary table produced in a Save Data node following a Decision Tree contain?
Following a decision tree, the table contains predicted probabilities and leaf IDs.
What can you do using the Manage Variables node after a pipeline is run?
Set up imputation and transformation rules.
What does the Manage Variables node enable you to do in Model Studio?
Modify the data within a pipeline, for example by changing the role of a variable or adding transformations, and turn on event-based sampling before a pipeline is run.
Which variable selection technique identifies the set of input variables that jointly explain the maximum amount of variance contained in the data when using the Variable Selection node in Model Studio?
Unsupervised Selection
What does the Text Mining node do?
Creates topics based on groups of terms that occur together in several documents