Principles of Machine Learning Flashcards
Good to remember
In general, when doing machine learning, we need to represent each observation in the training and test sets as a vector of numbers, and the label is also represented by a number. (This also applies to classifying images.)
What is a feature?
Feature. A symbolic or numeric property of a real world object that might be useful to determine its class. The word ‘attribute’ is used for this as well. Different objects however may have different numbers of attributes, while usually for all objects in the same problem the same features can be measured. Thereby objects may be represented by a feature vector, or by a set of attributes.
When do you use a classification algorithm?
Classification. When the data are being used to predict a category, supervised learning is also called classification.
This is the case when assigning an image as a picture of either a ‘cat’ or a ‘dog’. When there are only two choices, this is called two-class or binomial classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, this problem is known as multi-class classification.
Machine Learning Cheat Sheet : When to use which algorithm?

Loss functions for Classification
Statistical Learning Theory
The key principle in statistical learning theory is the principle of ________
Ockham’s razor.
What is the curse of dimensionality?
Best Explanations here :
https://www.quora.com/What-is-the-curse-of-dimensionality
The curse of dimensionality refers to how certain learning algorithms may perform poorly in high-dimensional data.
Let’s say you have a straight line 100 yards long and you dropped a penny somewhere on it. It wouldn’t be too hard to find. You walk along the line and it takes two minutes.
Now let’s say you have a square 100 yards on each side and you dropped a penny somewhere on it. It would be pretty hard, like searching across two football fields stuck together. It could take days.
Now a cube 100 yards across. That’s like searching a 30-story building the size of a football stadium. Ugh.
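The penny analogy can be made concrete with a rough count. As a sketch, assume the search proceeds by checking one-yard cells (an assumption made here for illustration, not part of the original analogy). The number of cells then grows as 100 raised to the number of dimensions. The function name `cells_to_search` is hypothetical.

```python
# Count the 1-yard cells to search in a space 100 yards on each side.
# The 1-yard cell size is an illustrative assumption.
def cells_to_search(side_yards: int, dimensions: int) -> int:
    """Number of unit cells in a hypercube with the given side length."""
    return side_yards ** dimensions


for d in (1, 2, 3, 10):
    print(f"{d} dimension(s): {cells_to_search(100, d):,} cells")
```

Each added dimension multiplies the search space by 100, which is why the same amount of data covers a high-dimensional space so thinly.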
What is regularization?
In Machine learning and statistics, a common task is to fit a model to a set of training data. This model can be used later to make predictions or classify new data points.
When the model fits the training data well but predicts poorly on new data (it does not generalize), we have an overfitting problem.
Regularization is a technique used to avoid this overfitting problem. The idea behind regularization is that models that overfit the data are complex models that have, for example, too many parameters. Regularization therefore penalizes model complexity, typically by adding a term to the loss that grows with the size of the parameters.
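As a minimal sketch of the idea, the example below fits a one-feature linear model without an intercept and adds an L2 (ridge) penalty on the weight. The L2 penalty is only one common form of regularization and is an assumption here. The function name `fit_ridge_1d` and the toy data are hypothetical.

```python
def fit_ridge_1d(xs, ys, penalty):
    """Weight w minimizing sum((y - w*x)**2) + penalty * w**2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Toy data (hypothetical), roughly y = 2x with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for penalty in (0.0, 1.0, 10.0, 100.0):
    print(f"penalty={penalty:>5}: w={fit_ridge_1d(xs, ys, penalty):.3f}")
```

A larger penalty shrinks the weight toward zero, trading some fit on the training data for a simpler model.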
Logistic Regression
How do you use categorical features with scikit-learn?
Categorical features have to be converted to dummy variables, which are sometimes also called indicator variables.
So there is no difference between a dummy and an indicator variable.
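To illustrate what dummy (indicator) variables look like, here is a pure-Python sketch. The helper `to_indicator_columns` and the sample data are hypothetical. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` perform this conversion.

```python
def to_indicator_columns(values):
    """Map a list of category labels to one 0/1 column per category."""
    categories = sorted(set(values))
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }


# Hypothetical categorical feature with three distinct values.
colors = ["red", "green", "red", "blue"]
encoded = to_indicator_columns(colors)

for category, column in encoded.items():
    print(category, column)
```

Each row has exactly one 1 across the new columns, marking which category that observation belongs to.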
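How do you calculate misclassification error?
(FP + FN) / N, where FP is the number of false positives, FN is the number of false negatives, and N is the total number of observations.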
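For binary labels, counting the predictions that differ from the true labels gives FP + FN, so the formula can be sketched as below. The function name `misclassification_error` and the sample labels are hypothetical.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical labels: one false negative (index 2), one false positive (index 5).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

error = misclassification_error(y_true, y_pred)
print(error)  # 2 wrong out of 8
```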
Creating a Classifier in Azure ML
https://courses.edx.org/courses/course-v1:Microsoft+DAT203.2x+6T2016/courseware/786e3fc9eacf4756b3a8091b4a618b4c/c6166f9b4a4641598cc43bab88403a64/
What are some of the ways to tune a model?
- Feature Selection
- Regularization
- Hyperparameters
- Cross Validation
Come up with stuff for Support Vector Machines
Confusion Matrix : Accuracy, Recall, Precision and F1 Score
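All four metrics can be derived from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions. The function name `classification_metrics`, the example counts, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Returning 0.0 for an empty denominator is a convention assumed here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for 100 observations.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision asks how many predicted positives were right, recall asks how many actual positives were found, and F1 is their harmonic mean.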


Which three of the following are reasons for visualizing a data set before attempting to build a supervised machine learning model?
- Develop an understanding of the relationship between the features and the label to determine which features are likely to be predictive of the label and should be used in training the machine learning model.
- Develop an understanding of which features are redundant or collinear with other features and should be eliminated from the dataset before training the machine learning model.
- Find features that are not likely to be predictive of the label and should be removed from the dataset before training the machine learning model.
You are training a supervised machine learning model. You want to ensure that when training and testing the dataset you do not introduce any unintentional bias.
What should you do?
Split the dataset into two non-overlapping portions, then train the model using one portion and test it using the other.
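A non-overlapping split can be sketched with the standard library alone. The function name `train_test_split_indices`, the 30% test fraction, and the fixed seed are assumptions chosen for illustration. scikit-learn's `train_test_split` offers the same idea with more options.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and split them into disjoint train/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(10)
print("train rows:", sorted(train_idx))
print("test rows: ", sorted(test_idx))
```

Because every row index lands in exactly one of the two lists, no observation is used for both training and testing.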
You are training a supervised machine learning model.
Which three kinds of features should be pruned from the dataset?
- Features that are collinear or codependent on other features in the dataset.
- Features that increase model error during training and testing.
- Features that have little impact on model performance during training and testing.
While evaluating the performance of a regression model you discover that the residuals are randomly distributed and exhibit no significant structure with respect to the values of the label or the features.
This indicates which two of the following conditions are true?
- The model is likely a good fit to the data.
- The information in the features is being exploited for the most part.
When examining the results of cross validation for a machine learning model you notice that the following conditions are true:
- The values of the metrics are similar across the folds.
- The standard deviation of the metrics is small compared to the mean values.
- The mean values of the metrics are in an acceptable range.
Given these conditions you can conclude that which of the following statements is true?
The performance of the model should generalize well
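The three conditions above can be checked numerically once the per-fold metrics are available. The sketch below summarizes a list of fold scores. The function name `summarize_folds` and the accuracy values are hypothetical, and what counts as a "small" spread or an "acceptable" mean depends on the problem.

```python
import statistics


def summarize_folds(fold_scores):
    """Return mean, sample standard deviation, and std relative to the mean."""
    mean = statistics.mean(fold_scores)
    std = statistics.stdev(fold_scores)
    return mean, std, std / mean


# Hypothetical accuracy from a 5-fold cross validation.
fold_accuracy = [0.81, 0.79, 0.83, 0.80, 0.82]
mean, std, relative_spread = summarize_folds(fold_accuracy)

print(f"mean={mean:.3f} std={std:.4f} std/mean={relative_spread:.3f}")
```

Here the standard deviation is a small fraction of the mean, which matches the case described in the question.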
While exploring a dataset for training a two-class (binary) classification problem, you notice the following properties of the data set:
- Certain features exhibit noticeable separation in the values or categories based on the categories of the label.
- Certain features exhibit little separation in the values or categories based on the categories of the label.
Which two of the following feature selection actions should you take before training the model?
- The features exhibiting separation should be retained in the dataset.
- The features with poor separation should be pruned from the dataset.
When exploring the k-means clustering of a data set you examine the projections of the first two principal components of the cluster ellipses, for a certain number of clusters, and you observe:
- The major and minor axes of each of the ellipses are of distinctly different length.
- The directions of major axes of the ellipses are distinctly different.
Which of the following statements is true?
The number of clusters chosen fits this dataset well.
You create an experiment that uses a Train Matchbox Recommender module to train a recommendation model, and add a Score Matchbox Recommender module to generate a prediction. You want to use the model in an online retail site to predict additional items a user might want to purchase based on the item they are currently viewing.
Which recommender prediction kind should you configure the Score Matchbox Recommender module to use?
Related Items.
