Modules Flashcards

1
Q

Permutation Feature Importance

A

Feature Selection

Permutation feature importance works by randomly shuffling the values of each feature column, one column at a time, and then re-evaluating the model to see how much its score changes.

The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created.

This is because permutation feature importance doesn’t measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
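
The shuffle-and-rescore idea can be sketched in plain Python. This is an illustrative sketch only, not the Studio module's implementation: `toy_model`, the data, and the choice of mean squared error as the score are all assumptions made for the example.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, seed=0):
    """Return, per column, how much the error grows when that column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this one column permuted.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        score = mean_squared_error(targets, [predict(r) for r in permuted])
        importances.append(score - baseline)
    return importances

# Hypothetical already-trained model: it uses column 0 and ignores column 1.
def toy_model(row):
    return 3.0 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [3.0 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, targets)
print(importances)  # column 1 scores 0.0 because the model never reads it
```

Column 1 gets zero importance even though it might correlate with the target in other data, which is the difference from filter-based scoring described above.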

2
Q

Filter Based Feature Selection

A

Feature Selection

The Filter Based Feature Selection module provides multiple feature selection algorithms to choose from, including correlation methods such as Pearson's or Kendall's correlation, mutual information scores, and chi-squared values.
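
As a rough illustration of filter-based scoring, the sketch below ranks categorical features by their chi-squared statistic against the label, before any model is trained. The feature names and data are invented for the example, and the function is a hand-rolled stand-in, not the module's code.

```python
from collections import Counter

def chi_squared(feature, labels):
    """Chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
features = {
    "color": ["red", "red", "red", "red", "blue", "blue", "blue", "blue"],
    "size": ["S", "L", "S", "L", "S", "L", "S", "L"],
}
scores = {name: chi_squared(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Here `color` lines up perfectly with the label and scores highest, while `size` is independent of the label and scores zero.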

3
Q

Fisher Linear Discriminant Analysis

A

Feature Selection

Identifies the linear combination of feature variables that can best group data into separate classes.

Captures the combination of features that best separates two or more classes.

This method is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.
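
A minimal two-class, two-feature sketch of the idea: the projection direction is taken as the inverse within-class scatter matrix times the difference of the class means. The data points are made up, and the 2x2 inverse is written out by hand to avoid any library dependency.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 scatter matrix of the points around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

class_a = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
class_b = [[6.0, 7.0], [7.0, 6.0], [7.0, 8.0], [8.0, 7.0]]
w = fisher_direction(class_a, class_b)

# Project each 2-D point onto the single discriminant axis.
projected_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
projected_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
print(w, projected_a, projected_b)
```

After projection, two features have become one, and the classes still do not overlap on that axis.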

4
Q

Synthetic Minority Oversampling Technique (SMOTE)

A

Manipulation

Use the SMOTE module in Azure Machine Learning Studio to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
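
The difference from plain duplication can be sketched as follows: each synthetic case is interpolated between a minority case and one of its nearest minority neighbours. This is a simplified illustration with invented data; the parameter names `n_new` and `k` are assumptions, not the module's settings.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (squared distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_cases = smote(minority, n_new=6)
print(new_cases)
```

Every new case lies on a line segment between two real minority cases, so the rare class grows without exact copies.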

5
Q

Vowpal Wabbit

A

Text Analytics

Vowpal Wabbit (VW) is a fast, parallel machine learning framework that was developed for distributed computing by Yahoo! Research. Later it was ported to Windows and adapted by John Langford (Microsoft Research) for scientific computing in parallel architectures.

Features of Vowpal Wabbit that are important for machine learning include continuous learning (online learning), dimensionality reduction, and interactive learning. Vowpal Wabbit is also a solution for problems when you cannot fit the model data into memory.

6
Q

Root Mean Square Error

A

Evaluate Model - Regression

Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction.
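
A short sketch of the calculation, with made-up numbers, showing that over-prediction and under-prediction by the same amount give the same RMSE:

```python
import math

def rmse(actual, predicted):
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [10.0, 10.0]
over = rmse(actual, [12.0, 12.0])   # every prediction is 2 too high
under = rmse(actual, [8.0, 8.0])    # every prediction is 2 too low
print(over, under)
```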

7
Q

R-Squared

A

Evaluate Model - Regression

Coefficient of determination, often referred to as R², represents the predictive power of the model, typically as a value between 0 and 1. Zero means the model explains nothing (it does no better than always predicting the mean); 1 means there is a perfect fit. On held-out data the value can even be negative when the model does worse than predicting the mean. Caution should be used in interpreting R² values, as low values can be entirely normal and high values can be suspect.
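
The two reference points (1 for a perfect fit, 0 for a model no better than the mean) can be checked with a small sketch using the usual definition, one minus the ratio of residual to total sum of squares. The data is invented.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, [1.0, 2.0, 3.0, 4.0])
mean_only = r_squared(actual, [2.5, 2.5, 2.5, 2.5])
print(perfect, mean_only)
```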

8
Q

F1 score

A

Evaluate Model - Classification

The F1 score is computed as the harmonic mean of precision and recall. It ranges between 0 and 1, where the ideal value is 1.
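
A small worked sketch from confusion-matrix counts (the counts are invented). The F1 score is the harmonic mean of precision and recall, so it sits closer to the lower of the two than a plain average would.

```python
def f1_score(true_pos, false_pos, false_neg):
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# 8 true positives, 2 false positives, 8 false negatives:
# precision = 0.8, recall = 0.5
f1 = f1_score(true_pos=8, false_pos=2, false_neg=8)
print(f1)
```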

9
Q

k-fold cross-validation

A

Cross Validate Module - Regression / Classification

Cross-validation is a technique often used in machine learning to assess both the variability of a dataset and the reliability of any model trained through that data.

The Cross Validate Model module takes as input a labeled dataset, together with an untrained classification or regression model. It divides the dataset into some number of subsets (folds), builds a model on each fold, and then returns a set of accuracy statistics for each fold. By comparing the accuracy statistics for all the folds, you can interpret the quality of the data set. You can then understand whether the model is susceptible to variations in the data.
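
The fold mechanics can be sketched in a few lines. This is a toy illustration, not the module's behaviour in detail: the "model" simply predicts the mean of the training rows, the score is mean absolute error, and the fold assignment (every k-th row) is an assumption made for simplicity.

```python
def k_fold_indices(n_rows, k):
    """Assign row i to fold i % k, so every row is held out exactly once."""
    return [list(range(i, n_rows, k)) for i in range(k)]

def cross_validate(values, k):
    scores = []
    for test_idx in k_fold_indices(len(values), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(values) if i not in held_out]
        test = [values[i] for i in test_idx]
        prediction = sum(train) / len(train)  # toy model: training mean
        mae = sum(abs(v - prediction) for v in test) / len(test)
        scores.append(mae)
    return scores

values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
scores = cross_validate(values, k=3)
print(scores)
```

Similar scores across folds suggest the model is not very sensitive to which rows it was trained on; widely varying scores suggest the opposite.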

10
Q

Assign Data to Clusters

A

xxx

11
Q

Load Trained Model

A

xxx

12
Q

Partition and Sample

A

xxx

13
Q

Tune Model Hyperparameters

A
Integrated train and tune: You configure a set of parameters to use, and then let the module iterate over multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can choose which parameters should be changed during the training process, and which should remain fixed.
We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified parameters. Use Tune Model Hyperparameters to identify the optimal parameters.
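
The sweep over parameter combinations, with some parameters varied and others held fixed, might be sketched like this. Everything here is hypothetical: `toy_accuracy` stands in for training and scoring a real model, and the parameter names are invented for the example.

```python
from itertools import product

def sweep(param_grid, fixed_params, evaluate):
    """Try every combination in param_grid; keep the best-scoring parameters."""
    best_score, best_params = float("-inf"), None
    names = list(param_grid)
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(fixed_params, **dict(zip(names, combo)))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Hypothetical stand-in for "train a model with these settings and measure accuracy".
def toy_accuracy(params):
    return 1.0 - 0.1 * abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

param_grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 1.0]}  # varied
fixed_params = {"n_trees": 100}                                           # held fixed
best_params, best_score = sweep(param_grid, fixed_params, toy_accuracy)
print(best_params, best_score)
```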
14
Q

Build Counting Transform

A

Use the Build Counting Transform module in Azure Machine Learning Studio to analyze training data. From this data, the module builds a count table as well as a set of count-based features that can be used in a predictive model.
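
A rough sketch of what a count table and count-based features could look like for one categorical column and a binary label. The feature layout (positive count, negative count, positive fraction) is an assumption for illustration, not the module's exact output.

```python
from collections import Counter, defaultdict

def build_count_table(values, labels):
    """Map each categorical value to a count of the labels seen with it."""
    table = defaultdict(Counter)
    for value, label in zip(values, labels):
        table[value][label] += 1
    return table

def count_features(table, value):
    counts = table[value]
    positives, negatives = counts[1], counts[0]
    total = positives + negatives
    return [positives, negatives, positives / total if total else 0.0]

countries = ["US", "US", "US", "SE", "SE", "DE"]
labels = [1, 1, 0, 0, 0, 1]
table = build_count_table(countries, labels)
us_features = count_features(table, "US")
se_features = count_features(table, "SE")
print(us_features, se_features)
```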

15
Q

Missing Values Scrubber

A

The Missing Values Scrubber module is deprecated

16
Q

Feature Hashing

A

Feature hashing is used on text (linguistic) data, and works by converting unique tokens into integers.
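
A minimal sketch of the token-to-integer step: each token is hashed to a bucket index in a fixed-length vector. MD5 is used here only because it is deterministic and in the standard library; it is an assumption for the example and not necessarily the hash the Studio module uses. The bucket count is also arbitrary.

```python
import hashlib

def hash_features(text, n_buckets=8):
    """Turn text into a fixed-length vector of token counts per hash bucket."""
    vector = [0] * n_buckets
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n_buckets] += 1
    return vector

first = hash_features("the quick brown fox jumps over the lazy dog")
second = hash_features("The quick brown fox jumps over the lazy dog")
print(first)
```

The vector length stays fixed no matter how large the vocabulary grows; the trade-off is that different tokens can collide in the same bucket.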

17
Q

Clean Missing Data

A

Use this module to remove, replace, or infer missing values.
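
Two of these options, removing rows and replacing with the column mean, can be sketched as below. Missing values are represented as `None`, and the function names are invented for the example.

```python
def remove_rows_with_missing(rows):
    """Drop every row that contains at least one missing value."""
    return [row for row in rows if None not in row]

def replace_with_mean(rows, column):
    """Fill missing values in one column with the mean of the present values."""
    present = [row[column] for row in rows if row[column] is not None]
    mean = sum(present) / len(present)
    return [
        row[:column] + [mean if row[column] is None else row[column]] + row[column + 1:]
        for row in rows
    ]

rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]
dropped = remove_rows_with_missing(rows)
filled = replace_with_mean(rows, column=0)
print(dropped, filled)
```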

18
Q

Replace Discrete Values

A

The Replace Discrete Values module in Azure Machine Learning Studio is used to generate a probability score that can be used to represent a discrete value. This score can be useful for understanding the information value of the discrete values.

19
Q

Import Data

A

xxx

20
Q

Latent Dirichlet Transformation

A

Use the Latent Dirichlet Allocation module in Azure Machine Learning Studio to group otherwise unclassified text into a number of categories. Latent Dirichlet Allocation (LDA) is often used in natural language processing (NLP) to find texts that are similar. Another common term is topic modeling.

21
Q

Partition and Sample

A

xxx

22
Q

Convert to Indicator Values

A

Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.
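
The conversion can be sketched as follows. The `column-value` naming of the new columns is an assumption for the example.

```python
def to_indicator_columns(column_name, values):
    """Expand one categorical column into one 0/1 column per distinct value."""
    categories = sorted(set(values))
    names = [f"{column_name}-{category}" for category in categories]
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return names, rows

names, rows = to_indicator_columns("color", ["red", "green", "red", "blue"])
print(names)
print(rows)
```

Each row has exactly one 1, in the column for its original category.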

23
Q

Clean Missing Data

A

xxx

24
Q

Remove Duplicate Rows

A

xxx

25
Q

Synthetic Minority Oversampling Technique (SMOTE)

A

xxx

26
Q

Stratified split

A

xxx

27
Q

Compute Linear Correlation

A

The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of Pearson correlation coefficients for each possible pair of variables in the input dataset.
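
A small sketch of computing a Pearson coefficient for every pair of columns, using invented data and a hand-written `pearson` helper:

```python
import math

def pearson(xs, ys):
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pearson coefficient for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],   # rises with x
    "z": [4.0, 3.0, 2.0, 1.0],   # falls as x rises
}
matrix = correlation_matrix(columns)
print(matrix["x"]["y"], matrix["x"]["z"])
```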

28
Q

Export Count Table

A
The Export Count Table module is provided for backward compatibility with experiments that use the Build Count Table (deprecated) and Count Featurizer (deprecated) modules.
29
Q

Execute Python Script

A

With Python, you can perform tasks that aren’t currently supported by existing Studio modules such as:
Visualizing data using matplotlib
Using Python libraries to enumerate datasets and models in your workspace
Reading, loading, and manipulating data from sources not supported by the Import Data module

30
Q

Convert to Indicator Values

A

The purpose of the Convert to Indicator Values module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.

31
Q

Summarize Data

A

Summarize Data statistics are useful when you want to understand the characteristics of the complete dataset. For example, you might need to know:
How many missing values are there in each column?
How many unique values are there in a feature column?
What is the mean and standard deviation for each column?
The module calculates the important scores for each column, and returns a row of summary statistics for each variable (data column) provided as input.
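
The three example questions above (missing values, unique values, mean and standard deviation) can be answered per column with a sketch like this. The data is invented, missing values are represented as `None`, and the sample standard deviation is an assumption about which variant is wanted.

```python
import statistics

def summarize(column):
    """Return one row of summary statistics for a single data column."""
    present = [v for v in column if v is not None]
    return {
        "missing": len(column) - len(present),
        "unique": len(set(present)),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation
    }

table = {
    "age": [30.0, None, 40.0, 50.0, 40.0],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
}
summary = {name: summarize(column) for name, column in table.items()}
print(summary)
```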

32
Q

Test Hypothesis Using t-Test

A

xxx

33
Q

Remove stop words

A

Remove words to optimize information retrieval.
Remove stop words: Select this option if you want to apply a predefined stopword list to the text column. Stop word removal is performed before any other processes.

34
Q

Lemmatization

A

Ensure that multiple related words map to a single canonical form.
Lemmatization converts multiple related words to a single canonical form.

35
Q

Remove special characters

A

Remove special characters: Use this option to replace any non-alphanumeric special characters with the pipe | character.
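
The replacement can be sketched with a regular expression. Treating whitespace as something to keep (rather than replace) is an assumption in this example.

```python
import re

def replace_special_characters(text):
    """Replace every character that is not a letter, digit, or whitespace with '|'."""
    return re.sub(r"[^A-Za-z0-9\s]", "|", text)

cleaned = replace_special_characters("C# & .NET, 100%!")
print(cleaned)
```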

36
Q

Group data into bins

A

xxx

37
Q

Group data into bins

A

xxx

38
Q

Synthetic Minority Oversampling Technique (SMOTE)

A

xxx

39
Q

Scale and Reduce

A

xxx

40
Q

Boosted Decision Tree Regression

A

xxx

41
Q

Online Gradient Descent

A

xxx

42
Q

Bayesian Linear Regression

A

xxx

43
Q

Neural Network Regression

A

xxx

44
Q

Linear Regression

A

xxx

45
Q

Decision Forest Regression

A

xxx

46
Q

Clean Missing Data

A

xxx

47
Q

Multiple Imputation by Chained Equations (MICE)

A

xxx

48
Q

Equal Width with Custom Start and Stop binning

A

xxx

49
Q

Entropy MDL binning mode

A

xxx

50
Q

Apply a Quantiles binning mode with a PQuantile normalization

A

xxx

51
Q

Entropy MDL binning mode

A

xxx

52
Q

Synthetic Minority Oversampling Technique (SMOTE)

A

xxx

53
Q

Last Observation Carried Forward (LOCF)

A

xxx

54
Q

Multiple Imputation by Chained Equations (MICE)

A

xxx

55
Q

Permutation Feature Importance

A

xxx

56
Q

Edit Metadata

A

xxx

57
Q

Filter Based Feature Selection

A

xxx

58
Q

Execute Python Script

A

xxx

59
Q

Latent Dirichlet Allocation

A

xxx

60
Q

Continue from page 29

A

https://www.google.com/search?q=site:https://www.examtopics.com/exams/microsoft/dp-100/+%22studio-module-reference%E2%80%9D&rlz=1C1GCEA_enSE827SE827&sxsrf=ALeKk02QGVPNz-bJYoGOfw2svk4SdpRtPw:1591636394795&ei=qnHeXpGKMNL6qwGZoZGAAQ&start=10&sa=N&ved=2ahUKEwiRh7HP2_LpAhVS_SoKHZlQBBAQ8NMDegQIDBAz&biw=845&bih=927