ML Studio Flashcards

1
Q

Import data

A

Dataset is usually the correct answer if asked.

The designer supports tabular datasets created from the following sources:

Delimited files
JSON files
Parquet files
SQL queries

2
Q

Clean missing data

A

This module lets you define a cleaning operation. You can also save the cleaning operation so that you can apply it later to new data.
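
As a rough illustration of defining a cleaning operation once and reusing it, the sketch below learns per-column means from one table and applies the same substitution to new rows. It is plain Python, not the designer module itself; `fit_mean_imputer` and `apply_imputer` are hypothetical names, and mean replacement is only one assumed cleaning mode.

```python
def fit_mean_imputer(rows):
    """Learn a cleaning operation: the mean of each column, ignoring None."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return means


def apply_imputer(means, rows):
    """Apply the saved operation: replace None with the stored column mean."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


train = [[1.0, 10.0], [3.0, None], [None, 30.0]]
saved_op = fit_mean_imputer(train)                     # learned from the original data
cleaned_train = apply_imputer(saved_op, train)
cleaned_new = apply_imputer(saved_op, [[None, 5.0]])   # reused later on new data
```

The point of keeping `saved_op` separate is that new data is filled with the statistics learned earlier rather than with statistics recomputed from the new rows.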

3
Q

Partition and sample

A

Use this module for:

Dividing your data into multiple subsections of the same size.

Separating data into groups and then working with data from a specific group.

Sampling.

Creating a smaller dataset for testing.
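
Two of these uses, equal-sized partitions and a small random sample, can be sketched in plain Python. This is an assumed stand-in for the idea, not the designer module; `partition_into_folds` and `sample_rows` are hypothetical helper names.

```python
import random


def partition_into_folds(rows, k, seed=0):
    """Shuffle rows and deal them into k folds of (near-)equal size."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def sample_rows(rows, rate, seed=0):
    """Draw a simple random sample containing roughly `rate` of the rows."""
    n = max(1, round(len(rows) * rate))
    return random.Random(seed).sample(rows, n)


data = list(range(12))
folds = partition_into_folds(data, k=3)   # three folds of four rows each
small = sample_rows(data, rate=0.25)      # three rows for quick testing
```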

4
Q

Put in correct order: Train model, Linear regression, Evaluate model, Score model

A

Linear regression, Train model, Score model, Evaluate model
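
The order is easier to remember when each step is seen consuming the output of the previous one. The sketch below mirrors the four steps with plain Python functions; the function names are hypothetical stand-ins for the designer modules, and mean absolute error is an assumed choice of metric.

```python
def linear_regression():
    """Step 1: an untrained learner (slope and intercept not yet set)."""
    return {"slope": None, "intercept": None}


def train_model(model, xs, ys):
    """Step 2: fit the learner with ordinary least squares."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    model["slope"] = sxy / sxx
    model["intercept"] = my - model["slope"] * mx
    return model


def score_model(model, xs):
    """Step 3: generate predictions for held-out inputs."""
    return [model["slope"] * x + model["intercept"] for x in xs]


def evaluate_model(predicted, actual):
    """Step 4: compare predictions with known labels (mean absolute error)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


model = train_model(linear_regression(), [1, 2, 3, 4], [3, 5, 7, 9])
predictions = score_model(model, [5, 6])
mae = evaluate_model(predictions, [11, 13])
```

Scoring cannot run before training because it needs the fitted parameters, and evaluation cannot run before scoring because it needs the predictions.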

5
Q

Compute Instances

A

Development workstations that data scientists can use to work with data and models.

6
Q

Compute Clusters

A

Scalable clusters of virtual machines for on-demand processing of experiment code.

7
Q

Inference Clusters

A

Deployment targets for predictive services that use your trained models.

8
Q

Attached Compute

A

Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.

9
Q

Task types in AutoML

A

Classification, Regression and Time series forecasting

10
Q

Featurization = ‘Enabled’ in AutoML

A

This causes Azure Machine Learning to automatically preprocess the features before training.

11
Q

Evaluating k-means clustering

A

Average Distance to Other Center

Average Distance to Cluster Center

Number of Points in the cluster

Maximal Distance to Cluster Center: The maximum of the distances between each point and the centroid of that point’s cluster. If this number is high, the cluster may be widely dispersed. This statistic in combination with the Average Distance to Cluster Center helps you determine the cluster’s spread.
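
To make the four statistics concrete, here is a minimal sketch that computes them per cluster with the standard library. It assumes Euclidean distance, non-empty clusters, and at least two centroids; "distance to other center" is taken here as the distance to the nearest other centroid, which is an assumption about the definition. `cluster_stats` is a hypothetical helper, not an Azure API.

```python
import math


def cluster_stats(points, labels, centers):
    """Per-cluster statistics similar in spirit to the ones listed above."""
    stats = {}
    for c, center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        own = [math.dist(p, center) for p in members]
        # Assumed definition: distance from each member to the nearest other centroid.
        other = [
            min(math.dist(p, o) for j, o in enumerate(centers) if j != c)
            for p in members
        ]
        stats[c] = {
            "num_points": len(members),
            "avg_to_center": sum(own) / len(own),
            "max_to_center": max(own),
            "avg_to_other_center": sum(other) / len(other),
        }
    return stats


points = [(0, 0), (0, 2), (10, 0), (10, 4)]
labels = [0, 0, 1, 1]
centers = [(0, 1), (10, 2)]
stats = cluster_stats(points, labels, centers)
```

A well-separated cluster shows a small average distance to its own center and a much larger average distance to the other center, as both clusters do in this toy data.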

12
Q

Azure Container Instance

A

This type of compute is created dynamically, and is useful for development and testing.

13
Q

Create trainer mode option = “Parameter Range”

A

Set the Create trainer mode option to Parameter Range and use the Range Builder to specify a range of values to use in the parameter sweep.

Almost all the classification and regression modules support an integrated parameter sweep. For those learners that do not support configuring a parameter range, only the available parameter values can be tested.
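
The idea of a parameter range feeding a sweep can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `build_range` and `sweep` are hypothetical helpers, and `validation_error` is a made-up scoring function standing in for training and evaluating a real model.

```python
def build_range(start, stop, steps):
    """Return `steps` evenly spaced values from start to stop inclusive."""
    width = (stop - start) / (steps - 1)
    return [start + i * width for i in range(steps)]


def sweep(values, score_fn):
    """Try every candidate value and keep the one with the lowest score."""
    return min(values, key=score_fn)


# Hypothetical validation error that is smallest near 0.3.
def validation_error(learning_rate):
    return (learning_rate - 0.3) ** 2


candidates = build_range(0.1, 0.5, 5)
best = sweep(candidates, validation_error)
```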
