Explore data and train models (35–40%) Flashcards
What is automatically created when a workspace is provisioned?
Azure Storage Account, Azure Key Vault, Application Insights, Azure Container Registry
What are the ways to create a workspace
The user interface in the Azure portal, an Azure Resource Manager (ARM) template, the Azure CLI, the Azure ML Python SDK
The Workspace class has what params?
name, display_name, location, description
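A minimal creation sketch using those params, assuming the v2 Python SDK (azure-ai-ml) and an MLClient already scoped to your subscription and resource group; all the values are placeholders:
from azure.ai.ml.entities import Workspace

ws = Workspace(
    name="aml-workspace",  # placeholder name
    display_name="My AML workspace",
    location="eastus",
    description="Workspace for training experiments",
)
ml_client.workspaces.begin_create(ws)  # ml_client is an authenticated MLClient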
How to give workspace access to others?
Use role-based access control (RBAC)
What are the three general built-in roles?
Owner, contributor, reader
Owner vs contributor
A Contributor can do everything an Owner can except grant access to others
What are the two built-in roles in AML?
AML Data Scientist: can perform all workspace actions except altering or deleting computes and editing workspace settings; AML Compute Operator: can create, change, and manage access to the compute resources
What are the 4 different computes in AML workspace?
Compute Instance, Compute Cluster, Inference Cluster, Attached Compute
Compute Instance details
Managed by workspace, good for small work
Compute cluster details
Workspace-managed, on-demand clusters of CPU or GPU nodes that scale automatically between a minimum and maximum node count
Inference Cluster details
Azure Kubernetes Service cluster for deployed ML models in production
Attached compute details
Lets you attach compute that lives outside the workspace, such as Azure Databricks clusters or Synapse Spark pools
What are AML assets
Models, environments, data, and components
Why do you want to register your model in the workspace
It will be available in your workspace rather than only on your local computer, and it gets a version number
What are environments for
Environments capture everything needed to execute the code, such as packages and environment variables. An environment needs a name and a version to be created
What do you need to create a data asset
Name, version, and path to the asset
What’s the point of components
Components let you reuse and share code. To create one, you need a name, version, code, and environment
What are the 4 options for model training in AML
Designer, Automated ML, Jupyter notebook, run a script as a job
What’s good about designer
Low code drag and drop components that are easy to manage and visualize
What’s good about Automated ML
Automated ML iterates through algorithms and hyperparameters to find the best selection for your use case
What are the different types of jobs?
Command, sweep, pipeline
What is a command job
Job that executes a single script
What is a sweep job
Command job with hyperparameter tuning
What is a pipeline job
A job made up of multiple steps
ML studio: Author tab options?
Notebooks, Automated ML, Designer
ML studio: Assets tab options?
Data, Jobs, Components, Pipelines, Environments, Models, Endpoints
ML studio: Manage tab options?
Compute, Linked Services, Data Labeling
What do you need to authenticate into a workspace?
subscription_id, resource_group, workspace_name
how is MLClient() used after authenticating?
You call methods on the MLClient whenever you interact with the workspace, such as creating or updating assets or resources
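A minimal connection sketch with the v2 SDK; DefaultAzureCredential and the three placeholder IDs are assumptions (any supported credential works):
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    credential=DefaultAzureCredential(),  # reuses e.g. your az CLI login
    subscription_id="<subscription_id>",
    resource_group_name="<resource_group>",
    workspace_name="<workspace_name>",
)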
What can you do on the azure cli?
Basically, manage anything. It is also good for automating tasks. See: https://learn.microsoft.com/en-us/cli/azure/ml?view=azure-cli-latest
What are the params of a command job?
code: path to the folder holding the training script, command: the command that runs the script, environment: the environment to run the script in, compute: the compute target to run it on, display_name: name of the job, experiment_name: name of the experiment the job belongs to
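A hedged sketch of those params with the command() builder from the v2 SDK; the paths, names, curated environment label, and compute target are all placeholder assumptions:
from azure.ai.ml import command

job = command(
    code="./src",  # folder containing train.py
    command="python train.py",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # substitute an environment from your workspace
    compute="aml-cluster",
    display_name="train-model",
    experiment_name="training-experiments",
)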
How to create a job with MLClient()
ml_client.jobs.create_or_update(job), called on the already-authenticated MLClient instance (not a fresh MLClient() call)
What is a URI?
Uniform Resource Identifier
What are the common URI protocols?
http(s): public/private Azure Blob Storage or a public web location, abfs(s): Azure Data Lake Storage Gen2, azureml: a datastore in Azure ML
What are the two authentication methods for datastores?
Credential-based: use service principal, shared access signature or account key, Identity-based: use Azure Active Directory identity or managed identity
what are the params of AzureBlobDatastore?
name, description, account_name, container_name, credentials
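A hedged registration sketch using account-key credentials; every name and the key value are placeholders:
from azure.ai.ml.entities import AzureBlobDatastore, AccountKeyConfiguration

store = AzureBlobDatastore(
    name="blob_training_data",
    description="Blob storage for training data",
    account_name="mystorageaccount",
    container_name="training-data",
    credentials=AccountKeyConfiguration(account_key="<account-key>"),
)
ml_client.create_or_update(store)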
What are the benefits of using data assets?
Share and reuse data with other members, seamlessly access data during model training, version the metadata of the data asset
What are the 3 main types of data assets?
URI file: points to a specific file, URI folder: points to a folder, MLTable: points to a folder or file and includes a schema for reading it as tabular data
When creating a URI file data asset, what are the supported paths?
local: ./<path>, Azure Blob Storage: wasbs://<account_name>.blob.core.windows.net/<container_name>/<folder>/<file>, Azure Data Lake Storage Gen2: abfss://<file_system>@<account_name>.dfs.core.windows.net/<folder>/<file>, Datastore: azureml://datastores/<datastore_name>/paths/<folder>/<file>
What are the params of the Data() class for making a data asset?
path to the item, type of item: uses AssetTypes class, description, name, version
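A hedged sketch registering a URI file data asset; the path and names are placeholders:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_data = Data(
    path="./data/sample.csv",  # local path; a datastore or blob URI works too
    type=AssetTypes.URI_FILE,
    description="Sample file data asset",
    name="sample-data",
    version="1",
)
ml_client.data.create_or_update(my_data)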
How to use argparse for a file data asset?
Call argparse.ArgumentParser(), add an argument with add_argument() specifying its type, then call parse_args(). The parsed value (e.g. args.input_data) is a path you can hand to whatever function reads the file
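Inside the training script that would look roughly like this; the --input_data argument name and the CSV format are assumptions:
import argparse
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()

# the mounted/downloaded asset behaves like a local file path
df = pd.read_csv(args.input_data)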
How to create a URI folder data asset?
Data(path, type=AssetTypes.URI_FOLDER, description, name, version), then pass the resulting Data object to ml_client.data.create_or_update()
How to use argparse for a folder data asset?
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
data_path = parser.parse_args().input_data
all_files = glob.glob(data_path + "/*.csv")
This makes a list of the CSV files in the folder (remember to import glob)
How to create a MLTable data asset?
You need a schema definition so you don’t have to redefine the schema every time you read the data; it lives in a yml file stored alongside the data. Then do Data(path, type=AssetTypes.MLTABLE, description, name, version)
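A minimal sketch of that definition file (a yml file literally named MLTable, next to the data); the path pattern and delimiter settings are assumptions:
paths:
  - pattern: ./*.csv
transformations:
  - read_delimited:
      delimiter: ','
      header: all_files_same_headers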
How to use argparse for a mltable asset?
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()
tbl = mltable.load(args.input_data)  # requires import mltable
df = tbl.to_pandas_dataframe()
how to list datastores?
stores = ml_client.datastores.list()
for ds in stores:
    print(ds.name)
How to make a datastore?
from azure.ai.ml.entities import <datastoreType>
store = <datastoreType>(name, description, account_name, container_name, credentials)
ml_client.create_or_update(store)
What are the three main AssetTypes?
AssetTypes.URI_FILE, AssetTypes.URI_FOLDER, AssetTypes.MLTABLE
When using ml_client to create or update a data asset, how should the command look?
ml_client.data.create_or_update(<data_asset>)
What do you need to create a compute instance?
A unique name and a size
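A hedged creation sketch; the name (which must be unique within the region) and size are placeholders:
from azure.ai.ml.entities import ComputeInstance

ci = ComputeInstance(name="ci-example-123", size="STANDARD_DS3_v2")
ml_client.begin_create_or_update(ci).result()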
What should you do if you keep forgetting to turn off your compute?
Schedule it to shut off at the end of the day
What are the params for a compute cluster
AmlCompute(name, type, size, location, min_instances, max_instances, idle_time_before_scale_down, tier)
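A hedged sketch with placeholder values throughout:
from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="aml-cluster",
    type="amlcompute",
    size="STANDARD_DS11_v2",
    location="westus2",
    min_instances=0,  # scales to zero when idle
    max_instances=2,
    idle_time_before_scale_down=120,  # seconds
    tier="low_priority",
)
ml_client.begin_create_or_update(cluster)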
what does the tier param mean for a compute cluster?
It specifies whether you have priority on the cluster’s nodes. Dedicated nodes are reserved for you; low-priority nodes are cheaper, but you may not get your cluster when demand is high. It’s like need vs want when rolling for loot in an MMO.
What are the three main scenarios where you want a compute cluster?
Running a pipeline job from designer, running an automated ML job, running a script as a job.
After you have created a compute cluster, what three things can you change about it?
the minimum number of nodes, the maximum number of nodes, and the idle time before scaling down
How to view all environments?
for env in ml_client.environments.list():
    print(env.name)
How to view environment details?
env = ml_client.environments.get(name="<environment_name>", version="<version_number>")
print(env.description, env.tags)
What to do when you have a docker image you want to use, but need a few more packages?
Add a conda specification file which will add more dependencies that you need
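A hedged sketch of exactly that, a base Docker image plus a conda specification file; the image tag and file path are assumptions:
from azure.ai.ml.entities import Environment

env = Environment(
    name="docker-image-plus-conda",
    version="1",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./conda-env.yml",  # lists the extra packages you need
    description="Docker image plus conda spec",
)
ml_client.environments.create_or_update(env)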
What is scaling and normalization?
Operations on the data that put all columns on the same scale, so no single column has disproportionate influence on model training
How to configure an automl job?
automl.<model_type>(compute,experiment_name,training_data,target_column_name,primary_metric,n_cross_validations,enable_model_explainability)</model_type>
What sort of data asset does automl need as an input?
Automl needs an MLTable as input
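A hedged classification example tying the last two cards together; the data asset name, target column, compute, and metric are placeholders:
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

my_training_data_input = Input(
    type=AssetTypes.MLTABLE,
    path="azureml:diabetes-training:1",  # a registered MLTable data asset
)

classification_job = automl.classification(
    compute="aml-cluster",
    experiment_name="auto-ml-class-dev",
    training_data=my_training_data_input,
    target_column_name="Diabetic",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True,
)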
how to look up classification primary metrics for automl?
from azure.ai.ml.automl import ClassificationPrimaryMetrics
list(ClassificationPrimaryMetrics)
What sort of limits can you set on an automl job?
timeout_minutes: max time for the whole experiment, trial_timeout_minutes: max time for one trial, max_trials: max number of models to be trained, enable_early_termination: whether to end the experiment if the score isn’t improving in the short term, max_concurrent_trials: limits how many trials run in parallel on the cluster
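Continuing the hedged classification_job sketch from above, limits are set on the job object; the values here are arbitrary:
classification_job.set_limits(
    timeout_minutes=60,
    trial_timeout_minutes=20,
    max_trials=5,
    enable_early_termination=True,
)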
What are the data guardrails for classification models in automl?
Class balancing detection, missing feature values imputation, high cardinality feature detection
What do you need for a training pipeline?
You need scripts to prepare the data and train the model, plus yml files that define each script as a component; then you build the pipeline from those components and run it
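A hedged sketch of wiring two components into a pipeline; the yml file names and the input/output names are assumptions that must match what the component files declare:
from azure.ai.ml import load_component
from azure.ai.ml.dsl import pipeline

# components defined by their yml files
prep_data = load_component(source="./prep-data.yml")
train_model = load_component(source="./train-model.yml")

@pipeline()
def training_pipeline(pipeline_job_input):
    clean = prep_data(input_data=pipeline_job_input)
    train = train_model(training_data=clean.outputs.output_data)
    return {"trained_model": train.outputs.model_output}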
What are discrete hyperparameters?
Hyperparameters that have a finite set of values
What are continuous hyperparameters?
Hyperparameters that can use any values along a scale, resulting in an infinite number of possibilities
How do you set up discrete hyperparameter with Choice()?
You can pass Choice() a Python list, a range, or an arbitrary series of comma-separated values
What are the discrete distributions that are available?
QUniform, QLogUniform, QNormal, QLogNormal
What are the continuous distributions that are available?
Uniform, LogUniform, Normal, LogNormal
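A hedged sketch combining one discrete and one continuous expression from the lists above; the hyperparameter names and ranges are placeholders:
from azure.ai.ml.sweep import Choice, Uniform

# passed later as overrides when calling the command job
batch_size = Choice(values=[16, 32, 64])                 # discrete
learning_rate = Uniform(min_value=0.01, max_value=0.1)   # continuous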
What are the three main types of sampling?
Grid, Random, and Bayesian
What is grid sampling?
Tries every possible combination
What is random sampling?
Randomly chooses values from the search space
What is Bayesian sampling?
Chooses new values based on previous results
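Putting sampling into practice, a hedged sketch of a grid-sampled sweep, assuming job is a command job whose script accepts --reg_rate and logs training_accuracy_score; those names, the compute, and the limits are all placeholders:
from azure.ai.ml.sweep import Choice

command_job_for_sweep = job(
    reg_rate=Choice(values=[0.01, 0.1, 1.0]),
)
sweep_job = command_job_for_sweep.sweep(
    compute="aml-cluster",
    sampling_algorithm="grid",
    primary_metric="training_accuracy_score",
    goal="Maximize",
)
sweep_job.set_limits(max_total_trials=4, max_concurrent_trials=2, timeout=7200)
returned_sweep_job = ml_client.create_or_update(sweep_job)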
What is Sobol Sampling?
Random but with a seed so you can reproduce results
What are some limitations of Bayesian sampling?
You can only use Choice, Uniform, and QUniform parameter expressions, and you can’t use an early-termination policy.
What is an early termination policy for?
So you don’t waste compute and time on tuning trials that are no longer improving
What are the two main parameters of an early termination policy?
evaluation_interval and delay_evaluation
What is evaluation_interval?
How often a termination check is performed, in intervals; each time the script logs the primary metric counts as one interval
What is delay_evaluation?
The number of intervals to wait before the first termination check, so every trial gets a minimum chance to run
What are the three options for early termination policies?
Bandit policy, Median stopping policy, and truncation selection policy
What is bandit policy?
You specify a slack amount, and a trial is stopped when its best metric falls short of the best performing run’s metric by more than the slack amount. You can specify a slack factor instead, which makes the allowed gap a ratio rather than a flat number.
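A hedged sketch attaching a bandit policy to the sweep_job from the earlier sketch; the values are arbitrary:
from azure.ai.ml.sweep import BanditPolicy

sweep_job.early_termination = BanditPolicy(
    slack_amount=0.2,        # absolute gap allowed vs. the best run
    evaluation_interval=1,   # check every interval
    delay_evaluation=5,      # but not before 5 intervals have passed
)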
What is median stopping policy?
A trial is stopped when its best primary metric is worse than the median of the running averages across all trials so far
What is truncation selection policy?
You set a truncation percentage; at each check, trials whose performance falls within that worst percentage are cancelled. EX: with 20%, a trial is stopped if its performance is in the worst 20% of the trials thus far