Modules Flashcards by H O

Permutation Feature Importance

Feature Selection

Permutation feature importance works by randomly changing the values of each feature column, one column at a time, and then evaluating the model.

The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created.

This is because permutation feature importance doesn’t measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
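
The shuffle-and-re-evaluate procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Studio module's implementation: the function names, the mean-squared-error metric, the repeat count, and the toy model are all assumptions made for the example.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Per feature column: average increase in error after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving every other column untouched.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            error = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical "trained" model: it relies on feature 0 and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]

rows = [[float(i), float(i % 3)] for i in range(10)]
targets = [3.0 * r[0] for r in rows]
scores = permutation_importance(toy_model, rows, targets)
```

Here the score for feature 1 is zero because the toy model never reads it, even though that column could still be correlated with the target. That is the difference from a filter-based score that the card points out.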

Filter Based Feature Selection

Feature Selection

The Filter Based Feature Selection module provides multiple feature selection algorithms to choose from, including correlation methods such as Pearson’s or Kendall’s correlation, mutual information scores, and chi-squared values.
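
As a rough illustration of filter-based scoring, the sketch below ranks feature columns by the absolute value of their Pearson correlation with the target, before any model is trained. The helper names and the sample data are hypothetical, and the real module offers several other scoring methods.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return column names ordered from strongest to weakest correlation."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up data: "size" tracks the target closely, "noise" does not.
columns = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, -1.0, 0.5, -2.0, 1.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.9]
ranking = rank_features(columns, target)
```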

Fisher Linear Discriminant Analysis

Feature Selection

Identifies the linear combination of feature variables that can best group data into separate classes.

Captures the combination of features that best separates two or more classes.

This method is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.
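
For two classes and two features, the projection direction can be sketched as w = Sw⁻¹(mean_b − mean_a), where Sw is the within-class scatter matrix. The code below is a hand-rolled sketch under that assumption; the function names and sample points are invented for the example, and it makes no attempt to handle more than two classes or a singular scatter matrix.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter_matrix(points, mean):
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mean_a), scatter_matrix(class_b, mean_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(point, w):
    """Reduce a 2-D point to a single discriminant value."""
    return point[0] * w[0] + point[1] * w[1]

class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]]
w = fisher_direction(class_a, class_b)
projected_a = [project(p, w) for p in class_a]
projected_b = [project(p, w) for p in class_b]
```

After projection each point is a single number, and for this toy data every value in projected_a is smaller than every value in projected_b, so one threshold separates the classes.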

Synthetic Minority Oversampling Technique (SMOTE)

Manipulation

Use the SMOTE module in Azure Machine Learning Studio to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
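
The core idea of SMOTE, creating new minority cases by interpolating between an existing case and one of its nearest minority neighbours instead of copying cases, can be sketched as follows. This is a simplified illustration with invented names and data; it is not the Studio module's code.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
new_points = smote(minority, n_new=4)
```

Each synthetic point lies on a line segment between two real minority cases, so the new cases resemble the rare class without being exact duplicates.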

Vowpal Wabbit

Text Analytics

Vowpal Wabbit (VW) is a fast, parallel machine learning framework that was developed for distributed computing by Yahoo! Research. Later it was ported to Windows and adapted by John Langford (Microsoft Research) for scientific computing in parallel architectures.

Features of Vowpal Wabbit that are important for machine learning include continuous learning (online learning), dimensionality reduction, and interactive learning. Vowpal Wabbit is also a solution for problems when you cannot fit the model data into memory.

Root Mean Square Error

Evaluate Model - Regression

Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-prediction.
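
A small sketch of the calculation, using made-up numbers, shows why over-prediction and under-prediction by the same amount give the same RMSE.

```python
import math

def rmse(actual, predicted):
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [10.0, 20.0, 30.0]
over = [12.0, 22.0, 32.0]    # every prediction is 2 too high
under = [8.0, 18.0, 28.0]    # every prediction is 2 too low

rmse_over = rmse(actual, over)
rmse_under = rmse(actual, under)
```

Both calls return 2.0, because squaring removes the sign of each error.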

R-Squared

Evaluate Model - Regression

The coefficient of determination, often referred to as R², represents the predictive power of the model, usually as a value between 0 and 1. Zero means the model explains nothing (it does no better than always predicting the mean); 1 means there is a perfect fit. However, caution should be used in interpreting R² values, as low values can be entirely normal and high values can be suspect.
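
One common way to compute the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the function name and data are illustrative only.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)                       # perfect fit
mean_only = r_squared(actual, [2.5, 2.5, 2.5, 2.5])       # always predicts the mean
reversed_fit = r_squared(actual, [4.0, 3.0, 2.0, 1.0])    # worse than the mean
```

With this definition the perfect fit scores 1 and the mean-only predictor scores 0. A model that does worse than predicting the mean, like the reversed one here, produces a negative value.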

F1 score

Evaluate Model - Classification

The F1 score is computed as the harmonic mean of precision and recall. It ranges from 0 to 1, where the ideal value is 1.
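
The F1 score is commonly defined as the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch with invented labels, assuming binary classification with 1 as the positive class:

```python
def f1_score(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

In this example there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and so is the F1 score.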

k-fold cross-validation

Cross Validate Module - Regression / Classification

Cross-validation is a technique often used in machine learning to assess both the variability of a dataset and the reliability of any model trained through that data.

The Cross Validate Model module takes as input a labeled dataset, together with an untrained classification or regression model. It divides the dataset into some number of subsets (folds). For each fold, it trains a model on the remaining folds, evaluates it on the held-out fold, and returns a set of accuracy statistics for that fold. By comparing the accuracy statistics for all the folds, you can interpret the quality of the dataset and understand whether the model is susceptible to variations in the data.
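
The fold mechanics can be sketched in plain Python. In this illustration each fold is held out once while a model is fitted on the rest; the "model" is a deliberately trivial mean predictor, and all names and the error metric are assumptions for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate(ys, k):
    """Score a mean-predicting baseline on each fold (mean absolute error)."""
    fold_scores = []
    for train, test in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)   # "training"
        mae = sum(abs(ys[i] - prediction) for i in test) / len(test)
        fold_scores.append(mae)
    return fold_scores

ys = [2.0 * x for x in range(10)]
scores = cross_validate(ys, k=5)
folds = list(k_fold_indices(10, 3))
```

A large spread between the per-fold scores suggests that the model is sensitive to which rows it was trained on.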

Assign Data to Clusters

xxx

Load Trained Model

xxx

Partition and Sample

xxx

Tune Model Hyperparameters

Integrated train and tune: You configure a set of parameters to use, and then let the module iterate over multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can choose which parameters should be changed during the training process, and which should remain fixed.

We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified parameters. Use Tune Model Hyperparameters to identify the optimal parameters.
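
Iterating over parameter combinations and keeping the best-scoring one can be sketched as an exhaustive grid search. The scoring function below is a hypothetical stand-in for "train a model with these settings and return its accuracy", and the parameter names learning_rate and depth are invented for the example.

```python
from itertools import product

def tune(train_and_score, param_grid):
    """Try every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function; a real one would train and evaluate a model.
def toy_score(learning_rate, depth):
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(depth - 4)

grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}
best_params, best_score = tune(toy_score, grid)
```

Parameters that should remain fixed are simply given a single value in the grid.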

Build Counting Transform

Use the Build Counting Transform module in Azure Machine Learning Studio to analyze training data. From this data, the module builds a count table as well as a set of count-based features that can be used in a predictive model.

Missing Values Scrubber

The Missing Values Scrubber module is deprecated

Feature Hashing

Feature hashing is used for linguistics, and works by converting unique tokens into integers
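
A minimal sketch of the idea: each token is hashed to an integer, and that integer selects a slot in a fixed-length count vector. The use of MD5 here is only a deterministic stand-in chosen for the example, and the bucket count and whitespace tokenization are likewise assumptions.

```python
import hashlib

def hash_features(text, n_buckets=16):
    """Map a text to a fixed-length vector of token counts."""
    vector = [0] * n_buckets
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_buckets   # token -> integer -> bucket
        vector[index] += 1
    return vector

vec = hash_features("the cat sat on the mat")
```

The vector length stays the same no matter how many distinct tokens appear, at the cost of occasional collisions where two tokens share a bucket.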

Clean Missing Data

Use this module to remove, replace, or infer missing values.
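
Two of these strategies, removing rows and replacing with the column mean, can be sketched as follows. Missing values are represented by None, and the function names are invented for the example.

```python
def remove_missing_rows(rows):
    """Drop every row that contains at least one missing value."""
    return [row for row in rows if None not in row]

def replace_with_mean(column):
    """Fill missing entries with the mean of the values that are present."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

rows = [[1, 2], [None, 3], [4, 5]]
kept = remove_missing_rows(rows)
filled = replace_with_mean([1.0, None, 3.0])
```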

Replace Discrete Values

The Replace Discrete Values module in Azure Machine Learning Studio is used to generate a probability score that can be used to represent a discrete value. This score can be useful for understanding the information value of the discrete values.

Import Data

xxx

Latent Dirichlet Allocation

Use the Latent Dirichlet Allocation module in Azure Machine Learning Studio to group otherwise unclassified text into a number of categories. Latent Dirichlet Allocation (LDA) is often used in natural language processing (NLP) to find texts that are similar. Another common term is topic modeling.

Partition and Sample

xxx

Convert to Indicator Values

Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.
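
The conversion can be sketched as follows: one new 0/1 column per distinct category. The "column-value" naming of the output columns is an assumption made for this example, as are the function name and the data.

```python
def to_indicator_columns(values, column_name):
    """Turn one categorical column into a dict of binary indicator columns."""
    categories = sorted(set(values))
    return {
        f"{column_name}-{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }

colors = ["red", "green", "red", "blue"]
indicators = to_indicator_columns(colors, "color")
```

Each row has exactly one 1 across the new columns, marking which category it belonged to.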

Clean Missing Data

xxx

Remove Duplicate Rows

xxx

Synthetic Minority Oversampling Technique (SMOTE)

xxx

Stratified split

xxx

Compute Linear Correlation

The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of Pearson correlation coefficients for each possible pair of variables in the input dataset.

Export Count Table

The Export Count Table module is provided for backward compatibility with experiments that use the Build Count Table (deprecated) and Count Featurizer (deprecated) modules.

Execute Python Script

With Python, you can perform tasks that aren't currently supported by existing Studio modules, such as visualizing data using matplotlib; using Python libraries to enumerate datasets and models in your workspace; and reading, loading, and manipulating data from sources not supported by the Import Data module.

Convert to Indicator Values

The purpose of the Convert to Indicator Values module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.

Summarize Data

Summarize Data statistics are useful when you want to understand the characteristics of the complete dataset. For example, you might need to know: How many missing values are there in each column? How many unique values are there in a feature column? What is the mean and standard deviation for each column? The module calculates the important scores for each column, and returns a row of summary statistics for each variable (data column) provided as input.

Test Hypothesis Using t-Test

xxx

Remove stop words

Remove words to optimize information retrieval. Remove stop words: Select this option if you want to apply a predefined stopword list to the text column. Stop word removal is performed before any other processes.

Lemmatization

Ensure that multiple related words map to a single canonical form. Lemmatization converts multiple related words to a single canonical form.

Remove special characters

Remove special characters: Use this option to replace any non-alphanumeric special characters with the pipe | character.
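
The replacement described above can be approximated with a regular expression. This sketch assumes that "special characters" means anything other than ASCII letters, digits, and whitespace; the module's exact character set may differ.

```python
import re

def replace_special_characters(text):
    """Replace each non-alphanumeric, non-whitespace character with a pipe."""
    return re.sub(r"[^A-Za-z0-9\s]", "|", text)

cleaned = replace_special_characters("3 tests & 4 demos!")
```

For this input the result is "3 tests | 4 demos|".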

Group data into bins

xxx

Group data into bins

xxx

Synthetic Minority Oversampling Technique (SMOTE)

xxx

Scale and Reduce

xxx

Boosted Decision Tree Regression

xxx

Online Gradient Descent

xxx

Bayesian Linear Regression

xxx

Neural Network Regression

xxx

Linear Regression

xxx

Decision Forest Regression

xxx

Clean Missing Data

xxx

Multiple Imputation by Chained Equations (MICE)

xxx

Equal Width with Custom Start and Stop binning

xxx

Entropy MDL binning mode

xxx

Apply a Quantiles binning mode with a PQuantile normalization

xxx

Entropy MDL binning mode

xxx

Synthetic Minority Oversampling Technique (SMOTE)

xxxx

Last Observation Carried Forward (LOCF)

xxx

Multiple Imputation by Chained Equations (MICE)

xxx

Permutation Feature Importance

xxx

Edit Metadata

xxx

Filter Based Feature Selection

xxx

Execute Python Script

xxx

Latent Dirichlet Allocation

xxx

Continue: Page 29

https://www.google.com/search?q=site:https://www.examtopics.com/exams/microsoft/dp-100/+%22studio-module-reference%E2%80%9D&rlz=1C1GCEA_enSE827SE827&sxsrf=ALeKk02QGVPNz-bJYoGOfw2svk4SdpRtPw:1591636394795&ei=qnHeXpGKMNL6qwGZoZGAAQ&start=10&sa=N&ved=2ahUKEwiRh7HP2_LpAhVS_SoKHZlQBBAQ8NMDegQIDBAz&biw=845&bih=927

Modules Flashcards

(60 cards)