Principles of Machine Learning Flashcards by Knowledge and Wisdom Academy

Good to remember

In general, when doing machine learning, we need to represent each observation in the training and test sets as a vector of numbers, and the label is also represented by a number. (This also applies when classifying images.)

What is a feature?

Feature: a symbolic or numeric property of a real-world object that might be useful for determining its class. The word ‘attribute’ is used for this as well. Different objects may, however, have different numbers of attributes, whereas the same features can usually be measured for all objects in the same problem. Objects may therefore be represented by a feature vector or by a set of attributes.

When do you use the Classification algorithm?

Classification. When the data are being used to predict a category, supervised learning is also called classification.

This is the case when assigning an image as a picture of either a ‘cat’ or a ‘dog’. When there are only two choices, this is called two-class or binomial classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, this problem is known as multi-class classification.

Machine Learning Cheat Sheet : When to use which algorithm?

Loss functions for Classification

Statistical Learning Theory

The key principle in statistical learning theory is the principle of ________.

Ockham’s razor.

What is the curse of dimensionality?

Best Explanations here :

https://www.quora.com/What-is-the-curse-of-dimensionality

The curse of dimensionality refers to how certain learning algorithms may perform poorly in high-dimensional data.

Let’s say you have a straight line 100 yards long and you dropped a penny somewhere on it. It wouldn’t be too hard to find. You walk along the line and it takes two minutes.

Now let’s say you have a square 100 yards on each side and you dropped a penny somewhere on it. It would be pretty hard, like searching across two football fields stuck together. It could take days.

Now a cube 100 yards across. That’s like searching a 30-story building the size of a football stadium. Ugh.
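
The penny analogy can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the search space is cut into cells one yard on a side, and the helper name `cells_to_search` is made up for this example. It counts how many cells would have to be checked as the number of dimensions grows.

```python
def cells_to_search(side_length: int, dimensions: int) -> int:
    """Number of unit cells in a hypercube with the given side length."""
    return side_length ** dimensions


# Assumption: a 100-yard side split into one-yard cells.
for d in (1, 2, 3):
    print(f"{d} dimension(s): {cells_to_search(100, d):,} cells")
```

Each added dimension multiplies the number of cells by 100, which is why a fixed amount of data covers a high-dimensional space so thinly.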

What is regularization?

In machine learning and statistics, a common task is to fit a model to a set of training data. This model can be used later to make predictions or classify new data points.

When the model fits the training data but predicts poorly on new data, that is, it generalizes badly, we have an overfitting problem.

Regularization is a technique used to avoid this overfitting problem. The idea behind regularization is that models that overfit the data are complex models that have, for example, too many parameters. Regularization therefore penalizes complexity so that simpler models are preferred.

Logistic Regression

How do you use categorical features with scikit-learn?

Categorical features have to be converted to dummy variables, which in Python are often called indicator variables.

There is no difference between a dummy variable and an indicator variable.
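
To show what this conversion produces, here is a minimal pure-Python sketch. The function name `one_hot` and the sample data are assumptions for illustration. In practice, pandas' `get_dummies` or scikit-learn's `OneHotEncoder` would normally do this job.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, marking the category that the observation belongs to.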

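How do you calculate misclassification error?

(FP + FN) / N, where FP is the number of false positives, FN is the number of false negatives, and N is the total number of observations.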
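
As a quick check of the formula, the sketch below counts false positives and false negatives for a binary problem with labels 0 and 1. The function name and the sample labels are made up for illustration.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of wrong predictions for 0/1 labels: (FP + FN) / N."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (fp + fn) / len(y_true)


# Hypothetical labels: one false negative (index 2) and one false positive (index 4).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
error = misclassification_error(y_true, y_pred)
print(error)
```

Misclassification error is simply 1 minus accuracy.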

Creating a Classifier in Azure ML

https://courses.edx.org/courses/course-v1:Microsoft+DAT203.2x+6T2016/courseware/786e3fc9eacf4756b3a8091b4a618b4c/c6166f9b4a4641598cc43bab88403a64/

What are some of the ways to tune a model?

Feature Selection
Regularization
Hyperparameters
Cross Validation

Come up with stuff for Support Vector Machines

Confusion Matrix: Accuracy, Recall, Precision and F1 Score

Which three of the following are reasons for visualizing a data set before attempting to build a supervised machine learning model?

Develop an understanding of the relationship between the features and the label to determine which features are likely to be predictive of the label and should be used in training the machine learning model.
Develop an understanding of which features are redundant or collinear with other features and should be eliminated from the dataset before training the machine learning model.
Find features that are not likely to be predictive of the label and should be removed from the dataset before training the machine learning model.

You are training a supervised machine learning model. You want to ensure that when training and testing the dataset you do not introduce any unintentional bias.

What should you do?

Split the dataset into two non-overlapping portions, then train the model using one portion and test it using the other.

You are training a supervised machine learning model.

Which three kinds of features should be pruned from the dataset?

Features that are collinear or codependent on other features in the dataset.
Features that increase model error during training and testing.
Features that have little impact on model performance during training and testing.

While evaluating the performance of a regression model you discover that the residuals are randomly distributed and exhibit no significant structure with respect to the values of the label or the features.

This indicates which two of the following conditions are true?

The model is likely a good fit to the data.
The information in the features is being exploited for the most part.

When examining the results of cross validation for a machine learning model you notice that the following conditions are true:

The values of the metrics are similar across the folds.
The standard deviation of the metrics is small compared to the mean values.
The mean values of the metrics are in an acceptable range.

Given these conditions you can conclude that which of the following statements is true?

The performance of the model should generalize well

While exploring a dataset for training a two-class (binary) classification problem, you notice the following properties of the data set:

Certain features exhibit noticeable separation in the values or categories based on the categories of the label.
Certain features exhibit little separation in the values or categories based on the categories of the label.

Which two of the following feature selection actions should you take before training the model?

The features exhibiting separation should be retained in the dataset.
The features with poor separation should be pruned from the dataset.

When exploring the k-means clustering of a data set you examine the projections of the first two principal components of the cluster ellipses, for a certain number of clusters, and you observe:

The major and minor axes of each of the ellipses are of distinctly different length.
The directions of major axes of the ellipses are distinctly different.

Which of the following statements is true?

The number of clusters chosen fits this dataset well.

You create an experiment that uses a Train Matchbox Recommender module to train a recommendation model, and add a Score Matchbox Recommender module to generate a prediction. You want to use the model in an online retail site to predict additional items a user might want to purchase based on the item they are currently viewing.

Which recommender prediction kind should you configure the Score Matchbox Recommender module to use?

Related Items.

You create an experiment that uses a Train Matchbox Recommender module and a Score Matchbox Recommender module with the recommender prediction kind set to Item Recommendation. How should you evaluate the performance of your model?

Add an Evaluate Recommender module and review the Normalized Discounted Cumulative Gain (NDCG) value. The closer this value is to 1, the better the model is performing.
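
To see why an NDCG close to 1 is good, the sketch below uses a common textbook definition of NDCG (discounted cumulative gain divided by the gain of the ideal ordering). This is an assumption for illustration: the exact computation inside the Evaluate Recommender module is not described here and may differ. The relevance scores are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: items further down the list count less."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG normalized by the DCG of the best possible ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance of four recommended items, in the order shown to the user.
best_order = ndcg([3, 2, 1, 0])   # most relevant items first
worst_order = ndcg([0, 1, 2, 3])  # most relevant items last
print(best_order, worst_order)
```

The ideal ordering scores exactly 1, and any ordering that pushes relevant items down the list scores lower.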

You have trained and tested a regression model, and plotted the predicted labels against the actual label values in the training and test data. The resulting chart shows strong correlation between the predicted labels and the training data, but the actual label values from the test data vary significantly from the predicted values. What can you conclude about your model?

The model is over-fitted.

You create an Azure ML experiment for a dataset of credit card transactions. The experiment contains two classification models: one using a Two-Class Decision Forest algorithm and one using a Two-Class Support Vector Machine algorithm. When you evaluate the models, they exhibit the following performance metrics:

Two-Class Decision Forest: Accuracy = 0.964, Recall = 0.945, AUC = 0.987
Two-Class Support Vector Machine: Accuracy = 0.765, Recall = 0.768, AUC = 0.799

What should you conclude from these metrics?

For this dataset, the Two-Class Decision Forest model appears to be the best model.

You are preparing data for a classification model. The dataset you are using includes a large number of columns, not all of which are useful in predicting the label. You have used the Permutation Feature Importance module in Azure ML to determine the statistical importance of each feature in the dataset. What should you do next?

Iteratively remove the least important features until the model accuracy starts to degrade.

You are creating a classification model using the Two-Class Logistic Regression algorithm. You want to identify the optimal values for the regularization weight parameters for this algorithm. What should you do?

Split the test dataset and use one portion to try a range of parameters with the Tune Model Hyperparameters module, and another to score the trained model it produces.

When exploring the k-means clustering of a dataset you continue to increase the number of clusters one by one until you observe that the projection of the first two principal components shows:

The major and minor axes of each of the ellipses are of similar lengths.
The directions of major axes of the ellipses are relatively aligned.

Based on these observations, what should you do?

Reduce the number of clusters chosen.

You create a binary (two-class) classification machine learning model. When evaluating the model, you observe the following metrics:

Accuracy: 0.9
Area under the curve (AUC): 0.8
Recall: 0.3

What can you conclude about the performance of your model?

Proportionately, false negatives predominate the errors.
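
One way to see this is to build a confusion matrix that matches the stated accuracy and recall. The counts below are invented for illustration (they are one possible scenario, not data from the question), and AUC is not reproduced because it depends on the scores rather than on the counts alone.

```python
# Hypothetical counts for 1,000 observations, 100 of which are truly positive.
tp, fn = 30, 70    # true positives, false negatives
fp, tn = 30, 870   # false positives, true negatives

n = tp + fn + fp + tn
accuracy = (tp + tn) / n
recall = tp / (tp + fn)

print(f"accuracy = {accuracy}, recall = {recall}")
print(f"false negatives = {fn}, false positives = {fp}")
```

In this scenario 70 of the 100 errors are false negatives: the model is accurate overall because negatives are common, but it misses most of the positive cases.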

You create a regression model, and then cross-validate it. You observe the following results:

The values of the performance metrics change significantly from fold to fold.
The standard deviation of the performance metrics is close to the value of the metrics.

What can you conclude about your model?

The model will likely not generalize well.

You create an experiment that uses a Train Matchbox Recommender module to train a recommendation model, and add a Score Matchbox Recommender module to generate a prediction. You want to use the model in a music streaming service to recommend songs for the currently logged in user. Which recommender prediction kind should you configure the Score Matchbox Recommender module to use?

Item Recommendation

You have created two models in Azure ML, using two different classification algorithms. The following ROC chart shows the relative performance of these models, with Model A represented as the Scored dataset and Model B represented as the Scored dataset to compare. **What can you determine from the ROC chart?**

Model A performs better than Model B

Feature Selection : Good to Remember

Selecting the best features is essential to the optimal performance of machine learning models. Only features that contribute to measurably improving model performance should be used. Using extraneous features can contribute noise to training and predictions from machine learning models. This behavior can prevent machine learning models from generalizing from training data to data received in production.

Generally, how is feature selection done?

Feature selection is generally performed in one of two ways:

In forward stepwise feature selection, the most promising features are added one at a time.
In reverse stepwise feature selection, the least important features are pruned one, or a few, at a time.
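
Forward stepwise selection can be sketched as a greedy loop. Everything named below is hypothetical: `toy_score` stands in for "train a model on these features and return a validation score", and its numbers are invented so that the example runs on its own. A real scorer would fit and evaluate an actual model.

```python
def forward_stepwise(features, score, min_gain=0):
    """Greedily add the feature that improves the score most; stop when nothing helps."""
    selected = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score - best <= min_gain:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


def toy_score(subset):
    """Hypothetical validation score (correct predictions out of 100)."""
    chosen = set(subset)
    score = 50
    if "age" in chosen:
        score += 20
    if "income" in chosen or "income_copy" in chosen:
        score += 15  # the redundant copy adds nothing once income is present
    if "noise" in chosen:
        score -= 1
    return score


selected, best = forward_stepwise(["age", "income", "income_copy", "noise"], toy_score)
print(selected, best)
```

The redundant and noisy features are never added because neither improves the score. Reverse stepwise selection runs the same idea in the opposite direction, starting from all features and removing the least useful ones.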

Which module in Azure ML measures the relationship between the features and the label and ranks their importance (coefficients)?

Permutation Feature Importance Module. Note: When examining the results produced by the Permutation Feature Importance module, the absolute values of the feature importance measure should be compared.

What is Regularization?

Regularization is an alternative to feature selection. In summary, regularization forces the weights of less important features toward zero. This process helps to ensure that models will generalize in production. Regularization can be performed on models with very large numbers of features, for which manual feature selection is infeasible.
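
The shrinking effect can be illustrated with the simplest possible case: a single-feature linear model with no intercept and an L2 (ridge) penalty, where the best weight has the closed form sum(x*y) / (sum(x*x) + penalty). The choice of ridge, the function name, and the data are assumptions for illustration, not necessarily the form of regularization used by any particular Azure ML module.

```python
def ridge_weight(xs, ys, penalty):
    """Weight minimizing sum((y - w*x)^2) + penalty * w^2 for one feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

weights = {p: ridge_weight(xs, ys, p) for p in (0.0, 1.0, 10.0, 100.0)}
for penalty, weight in weights.items():
    print(f"penalty = {penalty:>5}: weight = {weight:.3f}")
```

As the penalty grows, the fitted weight moves steadily toward zero. A feature with weak evidence behind it is therefore suppressed more easily than one the data strongly supports.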

What is Cross Validation?

Cross validation uses a resampling scheme to train and evaluate a machine learning model. The model is trained and evaluated with a series of folds, created by resampling without replacement so that each observation is held out for evaluation exactly once. The model performance metrics are averaged over the results of the folds. This aggregated performance measure is more likely to represent the actual performance a model will achieve in production.
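
The procedure can be sketched in a few lines, assuming the common k-fold scheme in which each observation is held out exactly once. The helper name, the data, and the stand-in "model" (which just predicts the mean of the training labels) are all made up for illustration. Real code would usually shuffle the data first and fit an actual model.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical labels; the "model" predicts the mean of the training labels.
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

scores = []
for train, test in k_fold_indices(len(ys), 3):
    prediction = sum(ys[i] for i in train) / len(train)
    mean_abs_error = sum(abs(ys[i] - prediction) for i in test) / len(test)
    scores.append(mean_abs_error)

mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

Comparing the per-fold scores with their mean shows how stable the metric is: large variation from fold to fold suggests the model may not generalize well.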

What can Ada-boosted models be used for?

Ada-boosted tree models can be used for classification or regression.

How to Determine the Optimal Number of Clusters in Azure ML?

**K-Means Clustering** module

Good to Remember : Recommenders

Recommenders are an interesting and useful class of machine learning models. Creating good recommenders is challenging since there is no objective way to measure how good a recommendation is for a given individual. There is no way to know if the recommendation is the best possible for an individual. Further, the ratings provided by the users are based on their personal subjective judgement.

What does AUC stand for?

Area Under the Curve (typically the ROC curve).