Machine Learning with Viya® 3.4 Lesson 2: Data Preparation Flashcards

Prepare for SAS Machine Learning Specialist Exam

1
Q

For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set that is not in memory but exists on your local machine. In which tab would this data set be located?

A

Data sets that are located on your local machine can be found in Import.

2
Q

For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set that is not in memory but exists on your connected server (not the local machine). In which tab would this data set be located?

A

All data sets that exist on a connected server are found in Data Sources.

3
Q

What are the four groups of Advanced project settings in Model Studio?

A
  1. Advisor Options
  2. Partition Data
  3. Event-based Sampling
  4. Node Configuration
4
Q

What are the default proportions when data is partitioned in Model Studio?

A
  • 60% Training
  • 30% Validation
  • 10% Test
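
As an illustration only, a 60/30/10 split can be sketched in plain Python. The function name `partition`, the seed, and the use of simple (non-stratified) random sampling are assumptions made for this sketch, not a description of how Model Studio partitions data internally.

```python
import random

def partition(rows, fractions=(0.6, 0.3, 0.1), seed=42):
    """Randomly split rows into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains goes to test
    return train, valid, test

train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))
```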
5
Q

Where would you adjust the threshold for rejecting inputs with missing values?

A

Adjust the maximum percent missing under the Advisor Options group in the Advanced project settings.
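
The idea behind a maximum-percent-missing threshold can be sketched as follows. The helper `reject_by_missing`, the 50% value, the strict "greater than" comparison, and the sample columns are all hypothetical choices for illustration; they are not taken from Model Studio.

```python
def reject_by_missing(columns, max_percent_missing=50.0):
    """Return names of inputs whose percent missing exceeds the threshold."""
    rejected = []
    for name, values in columns.items():
        n_missing = sum(1 for v in values if v is None)
        percent_missing = 100.0 * n_missing / len(values)
        if percent_missing > max_percent_missing:
            rejected.append(name)
    return rejected

columns = {
    "income": [52000, None, 61000, 48000],          # 25% missing
    "fax_number": [None, None, None, "555-0100"],   # 75% missing
}
print(reject_by_missing(columns))
```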

6
Q

What is the validation data that the Variable Selection node creates from the training data used for?

A

Model assessment during the modeling process.

7
Q

The Data Exploration node enables you to do what?

A

View the most important inputs (Importance) or screen for suspicious variables (Screening).

8
Q

What is the best practice for handling high-cardinality input variables?

A

Binning.

9
Q

Which Model Studio setting determines whether a numeric input is designated as interval or nominal?

A

The interval cutoff. If a numeric input has more distinct values (levels) than the interval cutoff value, it is declared interval; otherwise, it is declared nominal.
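
The rule on this card can be written as a short function. This is a minimal sketch: the name `assign_level` and the cutoff of 20 are assumptions for illustration, and ignoring missing values when counting levels is also an assumption.

```python
def assign_level(values, interval_cutoff=20):
    """Apply the card's rule: more distinct values than the cutoff means interval."""
    distinct = len({v for v in values if v is not None})
    return "interval" if distinct > interval_cutoff else "nominal"

print(assign_level([0, 1, 0, 1, 1]))      # only two distinct values
print(assign_level(list(range(100))))     # 100 distinct values
```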

10
Q

Where would you specify the interval cutoff in Model Studio?

A

In the Advisor Options group under Advanced project settings.

11
Q

How would you define variable metadata and assign rules to modify variables?

A

You can perform these tasks using either the Data tab or the Manage Variables node.

12
Q

What does the temporary table produced in a Save Data node following a Decision Tree contain?

A

Following a decision tree, the table contains predicted probabilities and leaf IDs.

13
Q

What can you do using the Manage Variables node after a pipeline is run?

A

Set up imputation and transformation rules.

14
Q

What does the Manage Variables node enable you to do in Model Studio?

A

Modify the data within a pipeline, such as changing the role of a variable or adding transformations, and turn on event-based sampling before a pipeline is run.

15
Q

Which variable selection technique identifies the set of input variables that jointly explain the maximum amount of variance contained in the data when using the Variable Selection node in Model Studio?

A

Unsupervised Selection.

16
Q

What does the Text Mining node do?

A

It creates topics based on groups of terms that occur together in several documents.

17
Q

What is the purpose of the Feature Extraction node in Model Studio?

A

The Feature Extraction node transforms the existing features (variables) into a lower-dimensional space by generating new features that are composites of the original features.

18
Q

What is the drawback to Feature Extraction?

A

Composite variables are no longer meaningful with respect to the original problem.

19
Q

What is another term for a feature in predictive modeling?

A

Input

20
Q

How would you specify the threshold for rejecting categorical variables in Model Studio?

A

Set the maximum class levels under the Advisor Options group in Advanced project settings.
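
A rough sketch of what a maximum-class-levels threshold does is shown below. The helper `reject_by_class_levels` and the sample columns are hypothetical; the value 20 matches the default stated elsewhere in this deck, and the strict "greater than" comparison is an assumption.

```python
def reject_by_class_levels(columns, max_class_levels=20):
    """Return names of categorical inputs with more levels than the threshold."""
    return [
        name
        for name, values in columns.items()
        if len(set(values)) > max_class_levels
    ]

columns = {
    "region": ["N", "S", "E", "W"] * 25,                  # 4 levels
    "zip_code": [f"{27500 + i % 50}" for i in range(100)],  # 50 levels
}
print(reject_by_class_levels(columns))
```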

21
Q

Where can you set up imputation and transformation rules in Model Studio?

A

In the Manage Variables node after a pipeline is run.

22
Q

For a new project that you are creating in SAS Model Studio, you wish to use a SAS data set loaded into memory. In which tab would this data set be located?

A

CAS tables loaded into memory are seen in Available.

23
Q

Does the Variable Selection node use supervised methods or unsupervised methods to select inputs?

A

The Variable Selection node can perform input selection based on both supervised and unsupervised methods.

24
Q

What is the curse of dimensionality?

A

The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target.
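
One way to see this is to count how many cells the input space has if each input is cut into a fixed number of bins. The sketch below assumes 10 bins per input and one million cases, both arbitrary values chosen for illustration.

```python
def cells(n_inputs, bins_per_input=10):
    """Number of cells when every input is split into the same number of bins."""
    return bins_per_input ** n_inputs

n_cases = 1_000_000  # hypothetical data set size
for d in (1, 2, 3, 6, 9):
    n_cells = cells(d)
    # Average cases per cell shrinks by a factor of 10 with each added input.
    print(d, n_cells, n_cases / n_cells)
```

With a fixed number of cases, each added input spreads the data more thinly, which is why more cases are needed to learn the input-target relationship.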

25
Q

Why is it important to reduce the number of inputs during data preparation?

A

The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target.

26
Q

What does the Save Data node do?

A

The Save Data node produces a temporary table in a CAS library.

27
Q

What does the Replacement node enable you to do?

A

Replace outliers and unknown class values with specified values.

28
Q

How do the transformations available in the Transformations node minimize bias in model predictions?

A

By reducing the effect of extreme or unusual input values.
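
A log transformation is one common example of this effect. The sketch below uses made-up income values and a base-10 log purely for illustration; it is not a statement about which transformation the Transformations node applies by default.

```python
import math

def range_ratio(values):
    """Ratio of the largest value to the smallest, as a crude measure of spread."""
    return max(values) / min(values)

incomes = [20_000, 35_000, 50_000, 65_000, 5_000_000]  # one extreme value
logged = [math.log10(x) for x in incomes]

print(range_ratio(incomes))  # the extreme value dominates the raw scale
print(range_ratio(logged))   # on the log scale it is much closer to the rest
```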

29
Q

What are some of the techniques used in the Feature Extraction node in Model Studio?

A

Principal component analysis (PCA), robust PCA, singular value decomposition (SVD), and autoencoders.

30
Q

What is binning?

A

Binning is a method of transformation that converts numeric inputs to categories or groups the levels of a high-cardinality input.

31
Q

Which transformation creates bins for a numeric variable?

A

A quantile transformation creates bins for a numeric variable.
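
A minimal sketch of quantile binning with the Python standard library follows. The function `quantile_bins`, the choice of four bins, and the sample ages are assumptions for illustration, and `statistics.quantiles` uses its own default interpolation method, which may differ from what Model Studio does.

```python
import bisect
import statistics

def quantile_bins(values, n_bins=4):
    """Assign each value a bin index (0 .. n_bins - 1) using quantile cut points."""
    cut_points = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect.bisect_right(cut_points, v) for v in values]

ages = [18, 22, 25, 31, 38, 44, 52, 60, 67, 75, 81, 90]
bins = quantile_bins(ages)
print(list(zip(ages, bins)))
```

Because the cut points are quantiles, each bin receives roughly the same number of cases.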

32
Q

What would you use the Anomaly Detection node in Model Studio for?

A

The Anomaly Detection node is used to identify and exclude anomalies using the support vector data description (SVDD) algorithm.

33
Q

When would it be helpful to use the Anomaly Detection node?

A

When using a data set where most of the data belongs to one class and the other class is scarce or missing.

34
Q

What does the Filtering node do?

A

The Filtering node excludes certain observations, such as rare or extreme values.

35
Q

Assume the Target has an event proportion of 2% in the original data. You want to build models where event-based sampling has been used such that the modeling data set will have a 50% event proportion. What are the two ways this can be done using Model Studio?

A
  1. While the project is being created, after the data source has been selected, click the Advanced button. Select the Event-Based Sampling option. Turn on event-based sampling by checking the check box.
  2. After the project is created but before a pipeline has been run, go into project settings, select the Event-Based Sampling option. Turn on event-based sampling by checking the check box.
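
To make the 2% to 50% example concrete, the sketch below keeps every event and randomly undersamples non-events until the requested event proportion is reached. The function `event_based_sample`, the seed, and the row layout are hypothetical, and the keep-all-events approach is an assumption about the general technique rather than a description of Model Studio's internals.

```python
import random

def event_based_sample(rows, is_event, event_proportion=0.5, seed=1):
    """Keep all events and undersample non-events to reach the event proportion."""
    events = [r for r in rows if is_event(r)]
    non_events = [r for r in rows if not is_event(r)]
    # Non-events needed so that events make up event_proportion of the sample.
    n_non_events = round(len(events) * (1 - event_proportion) / event_proportion)
    n_non_events = min(n_non_events, len(non_events))
    sampled = random.Random(seed).sample(non_events, n_non_events)
    return events + sampled

# 1,000 cases with a 2% event proportion, as in the question above.
rows = [{"id": i, "target": 1 if i < 20 else 0} for i in range(1000)]
sample = event_based_sample(rows, lambda r: r["target"] == 1)
n_events = sum(r["target"] for r in sample)
print(len(sample), n_events / len(sample))
```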
36
Q

What are some reasons for performing transformations on your data?

A

Stabilizing variances, removing nonlinearity, and correcting non-normality.

37
Q

What is the default maximum cardinality for determining whether or not to reject a nominal variable?

A

20

38
Q

Is the Maximum percent missing option turned on by default?

A

Yes

39
Q

What is Singular Value Decomposition (SVD)?

A

Singular value decomposition (SVD) projects the high-dimensional document and term spaces into a lower-dimensional space.

40
Q

What is SVD used for when selecting model inputs?

A

The singular values can be thought of as providing a measure of importance used to decide how many dimensions to keep.
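
One simple way to turn singular values into a dimension count is to keep the smallest number of dimensions whose squared singular values reach a chosen share of the total. The 90% share, the function name `dimensions_to_keep`, and the singular values below are all assumptions for this sketch; other cutoff rules are equally plausible.

```python
def dimensions_to_keep(singular_values, min_share=0.9):
    """Smallest k whose leading squared singular values reach min_share of the total."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    running = 0.0
    for k, value in enumerate(squared, start=1):
        running += value
        if running / total >= min_share:
            return k
    return len(squared)

singular_values = [10.0, 6.0, 2.0, 1.0, 0.5]  # hypothetical, sorted descending
print(dimensions_to_keep(singular_values))
```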

41
Q

What does the Transformation node enable you to do?

A

Alter your data by replacing an input variable with some function of that variable.