Questions set 1 Flashcards
[EXAM- UDEMY] You are asked to solve a classification task.
You must evaluate your model on a limited data sample by using k-fold cross-validation. You start by configuring a k parameter as the number of splits.
You need to configure the k parameter for the cross-validation.
Which value should you use?
- k = 10
- k = 0.9
- k = 0.5
- k = 1
Leave One Out (LOO) cross-validation
Setting k = n (the number of observations) yields n folds and is called leave-one-out cross-validation (LOO); this is a special case of the k-fold approach.
LOO CV is sometimes useful but typically doesn't shake up the data enough: the estimates from each fold are highly correlated, so their average can have high variance.
This is why the usual choice is k = 5 or k = 10, which provides a good compromise in the bias-variance tradeoff. Of the options listed, only k = 10 is a valid fold count.
[PERSONAL] what is the purpose of K-fold cross validation
- Maximize the use of the available data for training and then testing a model.
- Assess model performance, as it provides a range of accuracy scores across (somewhat) different data sets (see the sketch below).
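A minimal scikit-learn sketch of 10-fold cross-validation (the dataset and model are placeholders):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=10)  # k = 10 folds
print(scores.mean(), scores.std())            # average accuracy and its spread across the folds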
[PERSONAL] what is the purpose of cross-validation?
Cross-validation (CV) is a technique used to test the effectiveness of machine learning models; it is also a resampling procedure used to evaluate a model when we have limited data. To perform CV, we keep aside a sample/portion of the data that is not used to train the model, and later use this sample for testing/validating.
[PERSONAL] Give the variations on cross-validation
- Train/Test Split: Taken to one extreme, k may be set to 2 (not 1) such that a single train/test split is created to evaluate the model.
- LOOCV: Taken to the other extreme, k may be set to the total number of observations in the dataset such that each observation gets a chance to be held out of the dataset. This is called leave-one-out cross-validation, or LOOCV for short.
- Stratified: The splitting of data into folds may be governed by criteria such as ensuring that each fold has the same proportion of observations with a given categorical value, such as the class outcome value. This is called stratified cross-validation.
- Repeated: The k-fold cross-validation procedure is repeated n times, where, importantly, the data sample is shuffled prior to each repetition, which results in a different split of the sample.
- Nested: k-fold cross-validation is performed within each fold of cross-validation, often to perform hyperparameter tuning during model evaluation. This is called nested cross-validation or double cross-validation.
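For reference, a scikit-learn sketch of the splitters that correspond to several of these variations:
from sklearn.model_selection import KFold, LeaveOneOut, RepeatedKFold, StratifiedKFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)          # standard k-fold
loo = LeaveOneOut()                                           # k = number of observations (LOOCV)
skf = StratifiedKFold(n_splits=5)                             # keeps class proportions equal in each fold
rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)  # reshuffles and repeats the k-fold split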
[EXAM- UDEMY] Your manager asked you to analyze a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution:
Use the Last Observation Carried Forward (LOCF) method to impute the missing data points.
Explanation
Instead of using the Last Observation Carried Forward (LOCF) method, you need to use the Multiple Imputation by Chained Equations (MICE) method.
Replace using MICE: For each missing value, this option assigns a new value, which is calculated by using a method described in the statistical literature as “Multivariate Imputation using Chained Equations” or “Multiple Imputation by Chained Equations”. With a multiple imputation method, each variable with missing data is modeled conditionally using the other variables in the data before filling in the missing values.
Note:
Last observation carried forward (LOCF) is a method of imputing missing data in longitudinal studies. If a person drops out of a study before it ends, then his or her last observed score on the dependent variable is used for all subsequent (i.e., missing) observation points. LOCF is used to maintain the sample size and to reduce the bias caused by the attrition of participants in a study.
[PERSONAL] Pro’s and Cons of mean/median imputation
Pros:
Easy and fast.
Works well with small numerical datasets.
Cons:
Doesn’t factor the correlations between features. It only works on the column level.
Will give poor results on encoded categorical features (do NOT use it on categorical features).
Not very accurate.
Doesn’t account for the uncertainty in the imputations.
[PERSONAL] Pro’s and Cons of Most Frequent or Zero/Constant values
Pros:
Works well with categorical features.
Cons:
It also doesn’t factor the correlations between features.
It can introduce bias in the data.
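A minimal sketch of these simple imputation strategies (mean/median and most frequent/constant) with scikit-learn's SimpleImputer; the data is made up:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

num = pd.DataFrame({'age': [21.0, np.nan, 45.0], 'income': [50000.0, 65000.0, np.nan]})
print(SimpleImputer(strategy='mean').fit_transform(num))        # or strategy='median'

cat = pd.DataFrame({'city': ['Chicago', 'Seattle', np.nan]})
print(SimpleImputer(strategy='most_frequent').fit_transform(cat))
print(SimpleImputer(strategy='constant', fill_value='missing').fit_transform(cat))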
[PERSONAL] Pro’s and Cons
Imputation Using k-NN
Pros:
Can be much more accurate than the mean, median or most frequent imputation methods (It depends on the dataset).
Cons:
Computationally expensive.
KNN works by storing the whole training dataset in memory.
K-NN is quite sensitive to outliers in the data (unlike SVM)
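A minimal sketch of k-NN imputation with scikit-learn's KNNImputer (placeholder data):
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])
imputer = KNNImputer(n_neighbors=2)   # each missing value is averaged from the 2 most similar rows
print(imputer.fit_transform(X))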
[PERSONAL] Pro’s and Cons
Imputation Using Multivariate Imputation by Chained Equation (MICE)
Pros:
- Better than a single imputation, as it measures the uncertainty of the missing values.
- Flexible: can handle different data types.
- Can handle complexities such as bounds or survey skip patterns.
This type of imputation works by filling in the missing data multiple times. Multiple Imputations (MIs) are much better than a single imputation because they measure the uncertainty of the missing values in a better way. The chained equations approach is also very flexible and can handle variables of different data types (i.e., continuous or binary) as well as complexities such as bounds or survey skip patterns.
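scikit-learn's IterativeImputer is inspired by MICE (it returns a single imputation rather than multiple); a minimal sketch on placeholder data:
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 - enables the experimental API
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, 6.0], [4.0, 8.0], [np.nan, 3.0], [7.0, np.nan]])
# each column with missing values is modelled conditionally on the other columns;
# sample_posterior=True adds the random draw that multiple-imputation methods rely on
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
print(imputer.fit_transform(X))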
[PERSONAL] what is Hot-Deck imputation
Works by randomly choosing the missing value from a set of related and similar variables.
[PERSONAL] what is Extrapolation and Interpolation imputation?
It tries to estimate values from other observations within the range of a discrete set of known data points.
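A minimal pandas sketch of linear interpolation over a numeric series (placeholder values):
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 7.0])
print(s.interpolate(method='linear'))   # each gap is estimated from the neighbouring known points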
[PERSONAL] what is Stochastic regression imputation
It is quite similar to regression imputation: it tries to predict the missing values by regressing them on other related variables in the same dataset, plus some random residual value.
[EXAM - UDEMY]
You are a senior data scientist of your company and you use Azure Machine Learning Studio.
You are asked to normalize values to produce an output column into bins to predict a target column.
Solution:
Apply a Quantiles normalization with a QuantileIndex normalization.
Does the solution meet the goal?
Quantile Normalization (summary of the YouTube video below): rank the values of each distribution, calculate the mean of the values that share the same rank across the distributions, and put the elements of the different distributions on that mean, so the distributions end up with identical statistical properties.
https://www.youtube.com/watch?reload=9&v=ecjN6Xpv6SE
In statistics, quantile normalization is a technique for making two distributions identical in statistical properties. To quantile-normalize a test distribution to a reference distribution of the same length, sort the test distribution and sort the reference distribution.
No, the solution does not meet the goal: quantile normalization has nothing to do with producing bins. Use the Entropy MDL binning mode instead.
Entropy MDL: This method requires that you select the column you want to predict and the column or columns that you want to group into bins. It then makes a pass over the data and attempts to determine the number of bins that minimizes the entropy. In other words, it chooses a number of bins that allows the data column to best predict the target column. It then returns the bin number associated with each row of your data in a column named quantized.
[EXAM - UDEMY]
You are analyzing a raw dataset that requires cleaning.
You must perform transformations and manipulations by using Azure Machine Learning Studio.
You need to identify the correct module to perform the below transformation.
Which module should you choose?
Scenario:
Remove potential duplicates from a dataset
- remove duplicate rows
- SMOTE
- Convert to indicator values
- Clean missing data
- Threshold filter
Use the Remove Duplicate Rows module in Azure Machine Learning Studio (classic), to remove potential duplicates from a dataset.
[PERSONAL]
What are all the categories in the data transformation category?
Data Transformation - Filter
Learning with Counts
Data Transformation - Manipulation
Data Transformation - Sample and Split
Data Transformation - Scale and Reduce
[PERSONAL] Data Transformation - Filter
Give all the types of filters and what they do.
Apply Filter: Applies a filter to specified columns of a dataset.
FIR Filter: Creates an FIR filter for signal processing.
IIR Filter: Creates an IIR filter for signal processing.
Median Filter: Creates a median filter that’s used to smooth data for trend analysis.
Moving Average Filter: Creates a moving average filter that smooths data for trend analysis.
Threshold Filter: Creates a threshold filter that constrains values.
User-Defined Filter: Creates a custom FIR or IIR filter.
[PERSONAL] Data Transformation - Learning with Counts
The basic idea of count-based featurization is that by calculating counts, you can quickly and easily get a summary of what columns contain the most important information. The module counts the number of times a value appears, and then provides that information as a feature for input to a model.
Build Counting Transform: Creates a count table and count-based features from a dataset, and then saves the table and features as a transformation.
Export Count Table: Exports a count table from a counting transform. This module supports backward compatibility with experiments that create count-based features by using Build Count Table (deprecated) and Count Featurizer (deprecated).
Import Count Table: Imports an existing count table. This module supports backward compatibility with experiments that create count-based features by using Build Count Table (deprecated) and Count Featurizer (deprecated). The module supports conversion of count tables to count transformations.
Merge Count Transform: Merges two sets of count-based features.
Modify Count Table Parameters: Modifies count-based features that are derived from an existing count table.
[PERSONAL]
Data Transformation - Manipulation
Give some modules of this module.
Add Columns: Adds a set of columns from one dataset to another.
Add Rows: Appends a set of rows from an input dataset to the end of another dataset.
Apply SQL Transformation: Runs a SQLite query on input datasets to transform the data.
Clean Missing Data: Specifies how to handle values that are missing from a dataset. This module replaces Missing Values Scrubber, which has been deprecated.
Convert to Indicator Values: Converts categorical values in columns to indicator values.
Edit Metadata: Edits metadata that’s associated with columns in a dataset.
Group Categorical Values: Groups data from multiple categories into a new category.
Join Data: Joins two datasets.
Remove Duplicate Rows: Removes duplicate rows from a dataset.
Select Columns in Dataset: Selects columns to include in a dataset or exclude from a dataset in an operation.
Select Columns Transform: Creates a transformation that selects the same subset of columns as in a specified dataset.
SMOTE: Increases the number of low-incidence examples in a dataset by using synthetic minority oversampling.
[PERSONAL] Data Transformation - Sample and Split
Give the two modules and what they do.
Partition and Sample: Creates multiple partitions of a dataset based on sampling.
Split Data: Partitions the rows of a dataset into two distinct sets.
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/data-transformation-sample-and-split
[PERSONAL] Data Transformation - Scale and Reduce
Clip Values: Detects outliers, and then clips or replaces their values.
Group Data into Bins: Puts numerical data into bins.
Normalize Data: Rescales numeric data to constrain dataset values to a standard range.
Principal Component Analysis: Computes a set of features that have reduced dimensionality for more efficient learning.
[EXAM - UDEMY]
You are a data scientist using Azure Machine Learning Studio.
You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio.
The dataset contains categorical features that are highly correlated to the output label column.
You need to select the appropriate feature scoring statistical method to identify the key predictors.
Which method should you use?
- spearman correlation
- Kendall correlation
- Chi-squared
- Pearson correlation
Explanation
The chi-square statistic is used to show whether or not there is a relationship between two categorical variables.
Incorrect Answer:
Pearson’s correlation coefficient (r) is used to demonstrate whether two variables are correlated or related to each other.
[PERSONAL]
Explain CHI-squared test, for what is it used?
is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance.
How likely is it that two sets of observations arose from the same distribution?
YT: https://www.youtube.com/watch?v=2QeDRsxSF9M
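A minimal SciPy sketch of a chi-squared test of independence between two categorical variables (the contingency-table counts are made up):
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 contingency table of two categorical variables (made-up counts)
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)   # a small p-value suggests the two variables are unlikely to be independent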
[PERSONAL]
spearman correlation
Spearman correlation is often used to evaluate relationships involving ordinal variables. For example, you might use a Spearman correlation to evaluate whether the order in which employees complete a test exercise is related to the number of months they have been employed
Spearman’s Rank correlation coefficient is a technique which can be used to summarise the strength and direction (negative or positive) of a relationship between two variables. The result will always be between 1 and minus 1.
A Spearman correlation of 1 results when the two variables being compared are monotonically related, even if their relationship is not linear. This means that all data points with greater x values than that of a given data point will have greater y values as well. In contrast, this does not give a perfect Pearson correlation.
[PERSONAL]
Kendall correlation
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient (after the Greek letter τ, tau), is a statistic used to measure the ordinal association between two measured quantities. … can be formulated as special cases of a more general correlation coefficient.
In the normal case, the Kendall correlation is preferred over the Spearman correlation because of a smaller gross error sensitivity (GES) (more robust) and a smaller asymptotic variance (AV) (more efficient).
[PERSONAL] Pearson correlation
is a statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1. A value of +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
Correlation is a technique for investigating the relationship between two quantitative, continuous variables
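A quick SciPy sketch comparing the three coefficients on the same (made-up) data:
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 1, 4, 3, 5])
print(pearsonr(x, y))     # linear correlation
print(spearmanr(x, y))    # rank-based (monotonic) correlation
print(kendalltau(x, y))   # rank correlation from concordant/discordant pairs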
[EXAM - UDEMY]
You are a data scientist and you use Azure Machine Learning Studio for your experiments.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution:
You use the Principal Components Analysis (PCA) sampling mode.
Does the solution meet the goal?
Explanation
Instead of using Principal Components Analysis, use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note:
SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
Incorrect Answers:
The Principal Component Analysis module in Azure Machine Learning Studio (classic) is used to reduce the dimensionality of your training data. The module analyzes your data and creates a reduced feature set that captures all the information contained in the dataset, but in a smaller number of features.
[PERSONAL] Explain PCA
Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
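A minimal scikit-learn sketch of PCA on placeholder data (scaling first, since PCA is sensitive to feature scale):
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 10)                    # 100 samples, 10 features (placeholder data)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=3)                      # keep the 3 directions with the most variance
X_reduced = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)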
[EXAM - UDEMY] - (duplicate question)
You are a data scientist using Azure Machine Learning Studio.
You are using Azure Machine Learning Studio to perform feature engineering on a dataset.
You need to normalize values to produce a feature column grouped into bins.
Solution:
Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?
Explanation
Entropy MDL binning mode:
This method requires that you select the column you want to predict and the column or columns that you want to group into bins. It then makes a pass over the data and attempts to determine the number of bins that minimizes the entropy. In other words, it chooses a number of bins that allows the data column to best predict the target column. It then returns the bin number associated with each row of your data in a column named quantized.
[EXAM - UDEMY]
HOTSPOT
You are a data scientist of your company
You are working on a classification task.
You have a dataset indicating whether a student would like to play soccer and associated attributes.
The dataset includes the following columns:
- isPlayerSoccer: boolean
- Gender: M or F
- PrevExamMarks: stores values from 0 - 100
- Height: in centimeters
- Weight: in kilograms
Which are continuous variables?
Too obvious :)
- height
- weight
- PrevExamMarks
[EXAM - UDEMY]
HOTSPOT
Your manager has asked you to create a binary classification model to predict whether a person has a disease.
You need to detect possible classification errors.
Which error type should you choose for below description?
A person has a disease. The model classifies the case as having no disease.
False negative
A false negative is an outcome where the model incorrectly predicts the negative class.
Note:
Let’s make the following definitions:
“Wolf” is a positive class.
“No wolf” is a negative class.
We can summarize our "wolf-prediction" model using a 2x2 confusion matrix that depicts all four possible outcomes: true positive (a wolf is present and the model predicts "wolf"), false positive (no wolf is present but the model predicts "wolf"), false negative (a wolf is present but the model predicts "no wolf"), and true negative (no wolf is present and the model predicts "no wolf").
[EXAM - UDEMY]
You use the Azure Machine Learning service to create a tabular dataset named training_data. You plan to use this dataset in a training script.
You create a variable that references the dataset using the following code:
training_ds = workspace.datasets.get("training_data")
You define an estimator to run the script.
You need to set the correct property of the estimator to ensure that your script can access the training_data dataset.
Which property should you set?
- source_directory = training_ds
- inputs = [training_ds.as_named_input('training_ds')]
- environment_definition = {'training_ds': training_ds}
- script_params = {'--training_ds': training_ds}
Answer: inputs = [training_ds.as_named_input('training_ds')]
Estimator. Represents a generic estimator to train data using any supplied framework. This class is designed for use with machine learning frameworks that do not already have an Azure Machine Learning pre-configured estimator. Pre-configured estimators exist for Chainer, PyTorch, TensorFlow, and SKLearn
- inputs (list):
A list of DataReference or DatasetConsumptionConfig objects to use as input.
[PERSONAL]
What is an estimator?
Estimator. Represents a generic estimator to train data using any supplied framework. This class is designed for use with machine learning frameworks that do not already have an Azure Machine Learning pre-configured estimator. Pre-configured estimators exist for Chainer, PyTorch, TensorFlow, and SKLearn
The Estimator class wraps run configuration information to help simplify the tasks of specifying how a script is executed. It supports single-node as well as multi-node execution. Running the estimator produces a model in the output directory specified in your training script.
[PERSONAL] what are the parameters of an estimator
Parameters for estimator
source_directory (str)
A local directory containing experiment configuration and code files needed for a training job.
compute_target (AbstractComputeTarget or str)
The compute target where training will happen. This can either be an object or the string “local”.
vm_size (str)
The VM size of the compute target that will be created for the training. Supported values: Any Azure VM size.
vm_priority (str)
The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used. Supported values: 'dedicated' and 'lowpriority'. This takes effect only when the vm_size parameter is specified in the input.
entry_script (str)
The relative path to the file used to start training.
script_params (dict)
A dictionary of command-line arguments to pass to the training script specified in entry_script.
node_count (int)
The number of nodes in the compute target used for training. If greater than 1, an MPI distributed job will be run.
process_count_per_node (int)
The number of processes (or “workers”) to run on each node. If greater than 1, an MPI distributed job will be run. Only the AmlCompute target is supported for distributed jobs.
distributed_backend (str)
The communication backend for distributed training.
DEPRECATED. Use the distributed_training parameter.
Supported values: ‘mpi’. ‘mpi’ represents MPI/Horovod.
This parameter is required when node_count or process_count_per_node > 1.
When node_count == 1 and process_count_per_node == 1, no backend will be used unless the backend is explicitly set. Only the AmlCompute target is supported for distributed training.
distributed_training (Mpi)
Parameters for running a distributed training job.
For running a distributed job with MPI backend, use Mpi object to specify process_count_per_node.
use_gpu (bool)
Indicates whether the environment to run the experiment should support GPUs. If true, a GPU-based default Docker image will be used in the environment. If false, a CPU-based image will be used. Default Docker images (CPU or GPU) will be used only if the custom_docker_image parameter is not set. This setting is used only in Docker enabled compute targets.
use_docker (bool)
Specifies whether the environment to run the experiment should be Docker-based.
custom_docker_base_image (str)
The name of the Docker image from which the image to use for training will be built.
DEPRECATED. Use the custom_docker_image parameter.
If not set, a default CPU-based image will be used as the base image.
custom_docker_image (str)
The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image. Only specify images available in public docker repositories (Docker Hub). To use an image from a private docker repository, use the constructor’s environment_definition parameter instead.
image_registry_details (ContainerRegistry)
The details of the Docker image registry.
user_managed (bool)
Specifies whether Azure ML reuses an existing Python environment. If false, a Python environment is created based on the conda dependencies specification.
conda_packages (list)
A list of strings representing conda packages to be added to the Python environment for the experiment.
pip_packages (list)
A list of strings representing pip packages to be added to the Python environment for the experiment.
conda_dependencies_file_path (str)
The relative path to the conda dependencies yaml file. If specified, Azure ML will not install any framework related packages.
DEPRECATED. Use the conda_dependencies_file parameter.
Specify either conda_dependencies_file_path or conda_dependencies_file. If both are specified, conda_dependencies_file is used.
pip_requirements_file_path (str)
The relative path to the pip requirements text file.
DEPRECATED. Use the pip_requirements_file parameter.
This parameter can be specified in combination with the pip_packages parameter. Specify either pip_requirements_file_path or pip_requirements_file. If both are specified, pip_requirements_file is used.
conda_dependencies_file (str)
The relative path to the conda dependencies yaml file. If specified, Azure ML will not install any framework related packages.
pip_requirements_file (str)
The relative path to the pip requirements text file. This parameter can be specified in combination with the pip_packages parameter.
environment_variables (dict)
A dictionary of environment variables names and values. These environment variables are set on the process where user script is being executed.
environment_definition (Environment)
The environment definition for the experiment. It includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters to the Estimator construction can be set using this parameter. If this parameter is specified, it will take precedence over other environment-related parameters like use_gpu, custom_docker_image, conda_packages, or pip_packages. Errors will be reported on invalid combinations.
inputs (list)
A list of DataReference or DatasetConsumptionConfig objects to use as input.
source_directory_data_store (Datastore)
The backing data store for the project share.
shm_size (str)
The size of the Docker container’s shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used. For more information, see Docker run reference.
resume_from (DataPath)
The data path containing the checkpoint or model files from which to resume the experiment.
max_run_duration_seconds (int)
The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value.
[PERSONAL] Write code for an estimator that uses the remote compute.
# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")

# Create an estimator that uses the remote compute
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[diabetes_ds.as_named_input('diabetes')],  # Pass the dataset as an input
                          compute_target=cpu_cluster,
                          conda_packages=['pandas', 'ipykernel', 'matplotlib'],
                          pip_packages=['azureml-sdk', 'argparse', 'pyarrow'],
                          entry_script='diabetes_training.py')
source (this is a good source for general setup): https://notebooks.azure.com/GraemeMalcolm/projects/azureml-primers/html/04%20-%20Optimizing%20Model%20Training.ipynb
[EXAM - UDEMY]
You are creating a new experiment in Azure Machine Learning Studio.
You have a small dataset that has missing values in many columns.
The data does not require the application of predictors for each column.
You plan to use the Clean Missing Data module. You need to select a data cleaning method.
Which method should you use?
- SMOTE ( synthetic minority oversampling technique)
- Replace using probabilistic PCA
- Replace using MICE
- Normalization
In the Clean Missing Data module, select Replace using Probabilistic PCA as the cleaning method.
Replace using Probabilistic PCA: Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns.
[PERSONAL]
Replace using Probabilistic PCA
YT: https://www.youtube.com/watch?v=6z6yipdfe3o
Replaces the missing values by using a linear model that analyzes the correlations between the columns and estimates a low-dimensional approximation of the data, from which the full data is reconstructed. The underlying dimensionality reduction is a probabilistic form of Principal Component Analysis (PCA), and it implements a variant of the model proposed in the Journal of the Royal Statistical Society, Series B 21(3), 611–622 by Tipping and Bishop.
Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns.
The key limitations of this method are that it expands categorical columns into numerical indicators and computes a dense covariance matrix of the resulting data. It also is not optimized for sparse representations. For these reasons, datasets with large numbers of columns and/or large categorical domains (tens of thousands) are not supported due to prohibitive space consumption.
[PERSONAL] Pro’s and cons for using Probabilistic PCA
Pros:
- Does not require the application of predictors for each column; it approximates the covariance for the full dataset, so it might offer better performance for datasets that have missing values in many columns.
Cons:
- Expands categorical columns into numerical indicators and computes a dense covariance matrix.
- Not optimized for sparse representations.
- Not suitable for datasets with large numbers of columns and/or large categorical domains (tens of thousands).
[EXAM - UDEMY]
You are a data scientist using Azure Machine Learning Studio.
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
- box-plot
- binary classification confusion matrix
- violin plot
- gradient descent
Explanation
Correct answer: the binary classification confusion matrix. Precision = TP / (TP + FP), and both counts can be read directly from the confusion matrix.
Incorrect Answers:
1) A violin plot is a visual that traditionally combines a box plot and a kernel density plot.
2) Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.
3) A box plot lets you see basic distribution information about your data, such as median, mean, range and quartiles but doesn’t show you how your data looks throughout its range.
[EXAM - UDEMY] You are analyzing a raw dataset that requires cleaning.
You must perform transformations and manipulations by using Azure Machine Learning Studio.
You need to identify the correct module to perform the below transformation.
Which module should you choose?
Scenario:
Replace missing values by removing rows and columns
- clean missing data
- convert to indicator values
- remove duplicate rows
- threshold filter
- smote
Clean missing data
Each time that you apply the Clean Missing Data module to a set of data, the same cleaning operation is applied to all columns that you select. Therefore, if you need to clean different columns using different methods, use separate instances of the module.
Add the Clean Missing Data module to your pipeline, and connect the dataset that has missing values.
For Columns to be cleaned, choose the columns that contain the missing values you want to change. You can choose multiple columns, but you must use the same replacement method in all selected columns. Therefore, typically you need to clean string columns and numeric columns separately.
For example, to check for missing values in all numeric columns:
Select the Clean Missing Data module, and click on Edit column in the right panel of the module.
For Include, select Column types from the dropdown list, and then select Numeric.
Any cleaning or replacement method that you choose must be applicable to all columns in the selection. If the data in any column is incompatible with the specified operation, the module returns an error and stops the pipeline.
[EXAM - UDEMY]
You are a data scientist using Azure Machine Learning Studio.
You are creating a machine learning model.
You need to identify outliers in the data.
Which two visualizations can you use?
- random forest diagram
- Venn diagram
- Scatter plot
- ROC-curve
- BOX-plot
Explanation
The box-plot algorithm can be used to display outliers.
One other way to quickly identify Outliers and represent visually is to create scatter plots.
[PERSONAL]
ROC-curve
The ROC curve shows the trade-off between sensitivity (or TPR) and specificity (1 – FPR). Classifiers that give curves closer to the top-left corner indicate a better performance. As a baseline, a random classifier is expected to give points lying along the diagonal (FPR = TPR). The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
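A minimal scikit-learn sketch that computes the points of a ROC curve and the area under it, on synthetic data:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)   # points of the ROC curve (FPR vs TPR)
print(roc_auc_score(y_te, probs))               # area under the curve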
[PERSONAL] Fisher score
Fisher score is one of the most widely used supervised feature selection methods. However, it selects each feature independently according to their scores under the Fisher criterion, which leads to a suboptimal subset of features.
In mathematical statistics, the Fisher information (sometimes simply called information[1]) is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X.
Extra information: https://towardsdatascience.com/overview-of-feature-selection-methods-a2d115c7a8f7
[PERSONAL]
Mutual Information
The mutual information score is particularly useful in feature selection because it maximizes the mutual information between the joint distribution and target variables in datasets with many dimensions.
[EXAM - UDEMY]
You are a data science instructor at your company.
You plan to deliver a hands-on workshop to several students.
The workshop will focus on creating data visualizations using Python.
Each student will use a device that has internet access.
Student devices are not configured for Python development.
Students do not have administrator access to install software on their devices.
Azure subscriptions are not available for students.
You need to ensure that students can run Python-based data visualization code.
- azure notebooks
- Azure ML service
- Anaconda data science platform
- Azure Batch AI
- azure notebooks
[EXAM - UDEMY]
Your supervisor asked you to preprocess text from CSV files.
You load the Azure Machine Learning Studio default stop words list.
You need to configure the Preprocess Text module to meet the following requirements:
§ Ensure that multiple related words map to a single canonical form.
§ Remove pipe characters from text.
§ Remove words to optimize information retrieval.
Which three options should you select?
- remove stop words
- lemmatization
- remove special characters
[PERSONAL] Lemmatisation (or lemmatization) and difference with stemming.
Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word’s lemma, or dictionary form
Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech.
[PERSONAL] Stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form.
Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech.
[EXAM - UDEMY]
You plan to explore demographic data for home ownership in various cities. The data is in a CSV file with the following format:
age,city,income,home_owner
21,Chicago,50000,0
35,Seattle,120000,1
23,Seattle,65000,0
45,Seattle,130000,1
18,Chicago,48000,0
You need to run an experiment in your Azure Machine Learning workspace to explore the data and log the results. The experiment must log the following information:
- the number of observations in the dataset
- a box plot of income by home_owner
- a dictionary containing the city names and the average income for each city
You need to use the appropriate logging methods of the experiment’s run object to log the required information.
How should you complete the code?
log
log_image
log_table
Explanation
Box 1: log - The number of observations in the dataset.
run.log(name, value, description='') Scalar values: Log a numerical or string value to the run with the given name. Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the same metric multiple times within a run, the result being considered a vector of that metric.
Example: run.log("accuracy", 0.95)
Box 2: log_image - A box plot of income by home_owner.
log_image: Log an image to the run record. Use log_image to log a .PNG image file or a matplotlib plot to the run. These images will be visible and comparable in the run record. Example: run.log_image("ROC", plot=plt)
Box 3: log_table - A dictionary containing the city names and the average income for each city. log_table: Log a dictionary object to the run with the given name.
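A sketch of how the three calls might look inside the training script (the DataFrame reuses the sample rows above; this assumes the script runs in an Azure ML run context obtained with Run.get_context()):
from azureml.core import Run
import matplotlib.pyplot as plt
import pandas as pd

run = Run.get_context()
data = pd.DataFrame({'age': [21, 35, 23, 45, 18],
                     'city': ['Chicago', 'Seattle', 'Seattle', 'Seattle', 'Chicago'],
                     'income': [50000, 120000, 65000, 130000, 48000],
                     'home_owner': [0, 1, 0, 1, 0]})

run.log('observations', len(data))                 # scalar value

data.boxplot(column='income', by='home_owner')
run.log_image('income_by_home_owner', plot=plt)    # matplotlib plot

avg = data.groupby('city')['income'].mean()
run.log_table('avg_income_by_city',
              {'city': list(avg.index), 'avg_income': list(avg.values)})   # dictionary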
[EXAM - UDEMY]
You are a data scientist using Azure Machine Learning Studio.
You are creating a machine learning model in Python.
The provided dataset contains several numerical columns and one text column.
The text column represents a product’s category.
The product category will always be one of the following:
§ Bikes
§ Cars
§ Vans
§ Boats
You are building a regression model using the scikit-learn Python package.
You need to transform the text data to be compatible with the scikit-learn Python package.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
Import pandas as dataframe
Use .map method of the dataframe
Explanation
Box 1: pandas as df
Pandas takes data like a CSV or TSV file, or a SQL database and creates a Python object with rows and columns called data frame that looks very similar to table in a statistical software (think Excel or SPSS for example).
Box 2: map[ProductCategoryMapping]
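A minimal pandas sketch of the mapping approach (the column name ProductCategory and the numeric codes are assumptions for illustration):
import pandas as pd

df = pd.DataFrame({'ProductCategory': ['Bikes', 'Cars', 'Vans', 'Boats', 'Cars']})
category_mapping = {'Bikes': 0, 'Cars': 1, 'Vans': 2, 'Boats': 3}
df['ProductCategory'] = df['ProductCategory'].map(category_mapping)   # text -> numeric codes scikit-learn can consume
print(df)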
[EXAM - UDEMY]
DRAG DROP
You have a dataset that contains over 150 features.
You use the dataset to train a Support Vector Machine (SVM) binary classifier.
You need to use the Permutation Feature Importance module in Azure Machine Learning Studio to compute a set of feature importance scores for the dataset.
In which order should you perform the actions?
Add a Two-Class Support Vector Machine module to initialize the SVM classifier.
Add a dataset to the experiment
Set the Metric for measuring performance property to Classification - Accuracy and then run the experiment.
Add a Split Data module to create training and test dataset.
Add a Permutation Feature Importance module and connect to the trained model and test dataset.
Step 1: Add a Two-Class Support Vector Machine module to initialize the SVM classifier.
Step 2: Add a dataset to the experiment
Step 3: Add a Split Data module to create training and test dataset.
Step 4: Add a Permutation Feature Importance module and connect to the trained model and test dataset.
Step 5: Set the Metric for measuring performance property to Classification - Accuracy and then run the experiment.
[EXAM - UDEMY]
You are a data scientist and you use Azure Machine Learning Studio.
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?
Split data
[EXAM - UDEMY]
You are a data scientist using Azure Machine Learning Studio.
You are using the Azure Machine Learning Service to automate hyperparameter exploration of your neural network classification model.
You must define the hyperparameter space to automatically tune hyperparameters using random sampling according to following requirements:
§ The learning rate must be selected from a normal distribution with a mean value of 10 and a standard deviation of 3.
§ Batch size must be 16, 32 and 64.
§ Keep probability must be a value selected from a uniform distribution between the range of 0.05 and 0.1.
You need to use the param_sampling method of the Python API for the Azure Machine Learning Service.
How should you complete the code segment?
param_sampling = RandomParameterSampling({"learning_rate": ?, "batch_size": ?, "keep_probability": ?})
learning_rate = normal(10, 3)
batch_size = choice(16, 32, 64)
keep_probability = uniform(0.05, 0.1)
Explanation
Random sampling allows the search space to include both discrete and continuous hyperparameters.
In random sampling, hyperparameter values are randomly selected from the defined search space.
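With imports, the completed search-space definition might look like this (a sketch assuming the azureml.train.hyperdrive package):
from azureml.train.hyperdrive import RandomParameterSampling, choice, normal, uniform

param_sampling = RandomParameterSampling({
    'learning_rate': normal(10, 3),          # normal distribution, mean 10, standard deviation 3
    'batch_size': choice(16, 32, 64),        # discrete choices
    'keep_probability': uniform(0.05, 0.1)   # uniform between 0.05 and 0.1
})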
[EXAM - UDEMY]
You are a data scientist using Azure Machine Learning Studio.
You are analyzing a dataset by using Azure Machine Learning Studio.
You need to generate a statistical summary that contains the p-value and the unique count for each feature column.
Which two modules can you use?
- export count table
- computer linear correlation
- execute python script
- summarize data
- convert to indicator values
Explanation The Export Count Table module is provided for backward compatibility with experiments that use the Build Count Table (deprecated) and Count Featurizer (deprecated) modules.
Summarize Data statistics are useful when you want to understand the characteristics of the complete dataset. For example, you might need to know:
§ How many missing values are there in each column?
§ How many unique values are there in a feature column?
§ What is the mean and standard deviation for each column?
The module calculates the important scores for each column, and returns a row of summary statistics for each variable (data column) provided as input.
Incorrect Answers:
The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of Pearson correlation coefficients for each possible pair of variables in the input dataset.
With Python, you can perform tasks that aren’t currently supported by existing Studio modules such as:
§ Visualizing data using matplotlib
§ Using Python libraries to enumerate datasets and models in your workspace
§ Reading, loading, and manipulating data from sources not supported by the Import Data module
The purpose of the Convert to Indicator Values module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.
[EXAM - UDEMY]
You are building a regression model for estimating the number of calls during an event hosted by your company.
You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution.
- sign of the label data?
- type of number?
Label data must be positive whole numbers
Poisson regression is intended for use in regression models that are used to predict numeric values, typically counts. Therefore, you should use this module to create your regression model only if the values you are trying to predict fit the following conditions:
§ The response variable has a Poisson distribution.
§ Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.
§ A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression
[Personal]
Poisson distribution
is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.[1]
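Its probability mass function for a mean rate λ (a standard result, added for reference):
P(X = k) = (λ^k * e^(-λ)) / k!, for k = 0, 1, 2, …; both the mean and the variance equal λ.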
[Personal]
Melt function in pandas:
pd.melt(dataframe, id_vars='shop', value_vars=['2017', '2018'])
Pandas melt() function is used to change the DataFrame format from wide to long. It is used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis, leaving only two columns: variable and value.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html
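A small worked example of the call shown above (the shop and year values are made up):
import pandas as pd

sales = pd.DataFrame({'shop': ['A', 'B'], '2017': [100, 150], '2018': [120, 180]})
long_format = pd.melt(sales, id_vars='shop', value_vars=['2017', '2018'])
print(long_format)   # columns: shop, variable (the year) and value (the figure for that year)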