ML Studio Flashcards
Import data
If asked, “Dataset” is usually the correct answer.
The designer supports tabular datasets created from the following sources:
Delimited files
JSON files
Parquet files
SQL queries
Clean missing data
This module lets you define a cleaning operation. You can also save the cleaning operation so that you can apply it later to new data.
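A minimal Python sketch of the “define once, apply later” idea, assuming a replace-with-mean cleaning mode. The replacement values are learned from training data and then reused unchanged on new data. The functions `fit_cleaner` and `apply_cleaner` are hypothetical illustrations, not part of the designer.

```python
from statistics import mean
from typing import Optional

Row = dict[str, Optional[float]]

def fit_cleaner(rows: list[Row], columns: list[str]) -> dict[str, float]:
    """Learn one replacement value (the column mean) per column from training data.

    Assumes each column has at least one non-missing value.
    """
    return {col: mean(r[col] for r in rows if r[col] is not None) for col in columns}

def apply_cleaner(rows: list[Row], replacements: dict[str, float]) -> list[Row]:
    """Apply the saved cleaning operation: fill missing values with the stored means."""
    cleaned = []
    for r in rows:
        new_row = dict(r)  # copy so the input rows are not modified
        for col, value in replacements.items():
            if new_row.get(col) is None:
                new_row[col] = value
        cleaned.append(new_row)
    return cleaned

train = [{"age": 30.0}, {"age": None}, {"age": 50.0}]
saved = fit_cleaner(train, ["age"])        # {'age': 40.0}
new_data = [{"age": None}, {"age": 20.0}]
print(apply_cleaner(new_data, saved))      # [{'age': 40.0}, {'age': 20.0}]
```

Saving the learned values (rather than recomputing them on new data) keeps training and scoring data cleaned in exactly the same way.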
Partition and sample
Dividing your data into multiple subsections of the same size.
Separating data into groups and then working with data from a specific group.
Sampling.
Creating a smaller dataset for testing.
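A rough Python sketch of the partition and sample operations listed above: equal-sized folds, selecting one group, and random sampling to build a smaller dataset. The function names and the `region` column are hypothetical; the designer module exposes these as configuration options rather than code.

```python
import random

def partition_into_folds(rows, k, seed=0):
    """Shuffle, then deal rows round-robin into k folds of (nearly) equal size."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def select_group(rows, column, value):
    """Keep only the rows that belong to one group."""
    return [r for r in rows if r[column] == value]

def sample_fraction(rows, fraction, seed=0):
    """Random sample without replacement, e.g. to create a smaller test dataset."""
    n = round(len(rows) * fraction)
    return random.Random(seed).sample(rows, n)

data = [{"id": i, "region": "north" if i % 2 else "south"} for i in range(10)]
folds = partition_into_folds(data, k=5)
print([len(f) for f in folds])                      # [2, 2, 2, 2, 2]
print(len(select_group(data, "region", "north")))   # 5
print(len(sample_fraction(data, 0.3)))              # 3
```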
Put in the correct order: Train model, Linear regression, Evaluate model, Score model
Linear regression, Train model, Score model, Evaluate model
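A plain-Python analogue of the four steps in order, assuming a single feature and ordinary least squares. In the designer these are modules wired together on the canvas; the class and function names here are hypothetical stand-ins chosen to mirror the module names.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class LinearRegression:
    """Step 1: the untrained algorithm (what the Linear Regression module represents)."""
    slope: float = 0.0
    intercept: float = 0.0

def train_model(model, xs, ys):
    """Step 2: fit slope and intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    model.slope = sxy / sxx
    model.intercept = my - model.slope * mx
    return model

def score_model(model, xs):
    """Step 3: generate predictions for (test) data."""
    return [model.slope * x + model.intercept for x in xs]

def evaluate_model(ys, preds):
    """Step 4: compare predictions with actual labels (MAE and RMSE)."""
    errors = [p - y for y, p in zip(ys, preds)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}

model = train_model(LinearRegression(), [1, 2, 3, 4], [3, 5, 7, 9])
preds = score_model(model, [5, 6])
print(preds)                             # [11.0, 13.0]
print(evaluate_model([11, 14], preds))   # mae 0.5, rmse about 0.707
```

Each step consumes the previous step's output, which is why the order is fixed: you cannot score without a trained model, and you cannot evaluate without scores.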
Compute Instances
Development workstations that data scientists can use to work with data and models.
Compute Clusters
Scalable clusters of virtual machines for on-demand processing of experiment code.
Inference Clusters
Deployment targets for predictive services that use your trained models.
Attached Compute
Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.
Task types in AutoML
Classification, regression, and time-series forecasting
Featurization = ‘Enabled’ in AutoML
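This causes Azure Machine Learning to automatically preprocess the features before training (for example, imputing missing values, encoding categorical features, and scaling numeric features).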
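A simplified Python sketch of two typical featurization steps: standardizing numeric columns and one-hot encoding categorical columns. This is only an illustration under those assumptions; AutoML chooses its own featurization steps, and the `featurize` function and column names here are hypothetical.

```python
from statistics import mean, pstdev

def featurize(rows, numeric, categorical):
    """Standardize numeric columns and one-hot encode categorical columns."""
    # Learn per-column statistics (mean and population standard deviation).
    stats = {}
    for c in numeric:
        values = [r[c] for r in rows]
        stats[c] = (mean(values), pstdev(values) or 1.0)  # avoid dividing by zero
    # Learn the distinct categories of each categorical column, in a fixed order.
    categories = {c: sorted({r[c] for r in rows}) for c in categorical}

    features = []
    for r in rows:
        vec = [(r[c] - mu) / sd for c, (mu, sd) in stats.items()]
        for c, levels in categories.items():
            vec.extend(1.0 if r[c] == level else 0.0 for level in levels)
        features.append(vec)
    return features

rows = [
    {"size": 10.0, "color": "red"},
    {"size": 20.0, "color": "blue"},
    {"size": 30.0, "color": "red"},
]
for vec in featurize(rows, numeric=["size"], categorical=["color"]):
    print(vec)
```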
Evaluating k-means clustering
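Average Distance to Other Center: How close, on average, each point in the cluster is to the centroids of all other clusters.
Average Distance to Cluster Center: The closeness of all points in a cluster to the centroid of that cluster.
Number of Points: How many data points were assigned to each cluster.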
Maximal Distance to Cluster Center: The maximum of the distances between each point and the centroid of that point’s cluster. If this number is high, the cluster may be widely dispersed. This statistic in combination with the Average Distance to Cluster Center helps you determine the cluster’s spread.
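A Python sketch computing these four per-cluster statistics from points, cluster assignments, and centroids. How the designer averages “distance to other center” is an assumption here (mean over every point and every other centroid). It also assumes at least two clusters and no empty cluster; `kmeans_cluster_stats` is a hypothetical name.

```python
from math import dist

def kmeans_cluster_stats(points, labels, centroids):
    """Per-cluster statistics similar to those reported when evaluating k-means.

    Assumes len(centroids) >= 2 and every cluster has at least one point.
    """
    stats = {}
    for k, center in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        own = [dist(p, center) for p in members]
        other = [dist(p, c) for p in members for j, c in enumerate(centroids) if j != k]
        stats[k] = {
            "avg_distance_to_other_center": sum(other) / len(other),
            "avg_distance_to_cluster_center": sum(own) / len(own),
            "number_of_points": len(members),
            "max_distance_to_cluster_center": max(own),
        }
    return stats

points = [(0, 0), (0, 2), (10, 0), (10, 4)]
labels = [0, 0, 1, 1]
centroids = [(0, 1), (10, 2)]
for k, s in kmeans_cluster_stats(points, labels, centroids).items():
    print(k, s)
```

A well-separated clustering shows a small average distance to the cluster center relative to the average distance to other centers.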
Azure Container Instance
This type of compute is created dynamically and is useful for development and testing.
Create trainer mode option = “Parameter Range”
Set the Create trainer mode option to Parameter Range and use the Range Builder to specify a range of values to use in the parameter sweep.
Almost all the classification and regression modules support an integrated parameter sweep. For those learners that do not support configuring a parameter range, only the available parameter values can be tested.
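A Python sketch of an entire-grid parameter sweep: every combination of the ranged values is tried and the best-scoring one is kept. The parameter names, ranges, and `toy_validation_score` are made-up stand-ins for training a learner and measuring it on validation data.

```python
from itertools import product

def parameter_sweep(param_ranges, evaluate):
    """Entire-grid sweep: try every combination and keep the one with the best score."""
    names = list(param_ranges)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_ranges[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical ranges, as might be entered with a range builder.
ranges = {"learning_rate": [0.01, 0.1, 0.5], "num_trees": [10, 50, 100]}

def toy_validation_score(params):
    """Made-up stand-in for training a model and returning a validation metric."""
    return -abs(params["learning_rate"] - 0.1) - abs(params["num_trees"] - 50) / 100

best, score = parameter_sweep(ranges, toy_validation_score)
print(best)   # {'learning_rate': 0.1, 'num_trees': 50}
```

The grid grows multiplicatively with each ranged parameter (here 3 × 3 = 9 runs), which is why a random sweep over the same ranges is sometimes preferred.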