Principles of Machine Learning Flashcards
Good to remember
In general, when doing machine learning we need to represent each observation in the training and test sets as a vector of numbers, and the label is also represented by a number. (This applies even when classifying images.)
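A minimal sketch of what that looks like in practice (assuming NumPy is available; the animal measurements below are made up for illustration):

```python
import numpy as np

# Each row is one observation (a feature vector); each column is one feature.
# Hypothetical example: [height_cm, weight_kg] for three animals.
X_train = np.array([
    [30.0, 4.5],   # cat
    [60.0, 25.0],  # dog
    [28.0, 4.0],   # cat
])

# Labels are encoded as numbers, e.g. cat = 0, dog = 1.
y_train = np.array([0, 1, 0])

# An image is handled the same way: its pixels become one long numeric vector,
# so a 28x28 grayscale image becomes a vector of 784 numbers.
image = np.random.rand(28, 28)     # stand-in for a real image
image_vector = image.reshape(-1)   # shape (784,)
print(X_train.shape, y_train.shape, image_vector.shape)
```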
What is a feature?
Feature. A symbolic or numeric property of a real-world object that might be useful for determining its class. The word 'attribute' is used for this as well. Different objects may have different numbers of attributes, but usually the same features can be measured for all objects in the same problem. Objects may therefore be represented by a feature vector, or by a set of attributes.
When do you use the Classification algorithm?
Classification. When the data are being used to predict a category, supervised learning is also called classification.
This is the case when assigning an image as a picture of either a ‘cat’ or a ‘dog’. When there are only two choices, this is called two-class or binomial classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, this problem is known as multi-class classification.
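A minimal two-class classification sketch with scikit-learn (the measurements and labels below are made up; any classifier could stand in for LogisticRegression):

```python
from sklearn.linear_model import LogisticRegression

# Toy feature vectors: [weight_kg, ear_length_cm]; labels: 0 = cat, 1 = dog.
X_train = [[4.0, 6.0], [30.0, 12.0], [5.0, 7.0], [25.0, 11.0]]
y_train = [0, 1, 0, 1]

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict the class of a new, unseen observation.
print(clf.predict([[6.0, 8.0]]))  # likely [0], i.e. cat-like measurements
```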
Machine Learning Cheat Sheet : When to use which algorithm?
Loss functions for Classification
Statistical Learning Theory
The key principle in statistical learning theory is the principle of ______________
Ockham’s razor.
What is the curse of dimensionality?
Best Explanations here :
https://www.quora.com/What-is-the-curse-of-dimensionality
The curse of dimensionality refers to how certain learning algorithms may perform poorly in high-dimensional data.
Let’s say you have a straight line 100 yards long and you dropped a penny somewhere on it. It wouldn’t be too hard to find. You walk along the line and it takes two minutes.
Now let’s say you have a square 100 yards on each side and you dropped a penny somewhere on it. It would be pretty hard, like searching across two football fields stuck together. It could take days.
Now a cube 100 yards across. That’s like searching a 30-story building the size of a football stadium. Ugh.
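A small numerical illustration of the same effect (a sketch assuming NumPy): as the number of dimensions grows, randomly placed points all end up roughly the same distance apart, so "nearest" neighbors stop being meaningfully near.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [1, 2, 10, 100, 1000]:
    # 1000 random points in the d-dimensional unit hypercube.
    points = rng.random((1000, d))
    # Distances from the first point to all the others.
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    # The ratio of nearest to farthest distance creeps toward 1 as d grows,
    # i.e. everything becomes almost equally far away.
    print(d, round(dists.min() / dists.max(), 3))
```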
What is regularization?
In Machine learning and statistics, a common task is to fit a model to a set of training data. This model can be used later to make predictions or classify new data points.
When the model fits the training data well but does not predict well on new data (poor generalization), we have an overfitting problem.
Regularization is a technique used to avoid this overfitting problem. The idea behind regularization is that models that overfit the data are overly complex models, for example ones with too many parameters.
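A minimal sketch of the idea using scikit-learn's Ridge (L2 regularization); the data below are synthetic, and alpha is the assumed penalty strength that pushes the model toward smaller coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# A noisy linear problem with only 15 training points and 10 features,
# a setup where an unregularized model can easily overfit.
X = rng.standard_normal((15, 10))
true_coef = np.array([2.0, -1.0] + [0.0] * 8)   # only 2 features matter
y = X @ true_coef + 0.5 * rng.standard_normal(15)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # alpha controls the penalty strength

# The regularized model keeps its coefficient vector smaller (a simpler model).
print(np.linalg.norm(plain.coef_), np.linalg.norm(ridge.coef_))
```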
Logistic Regression
How do you use categorical features with scikit-learn?
The categorical features have to be converted to dummy variables, also called indicator variables in Python (pandas) terminology.
So there is no difference between a dummy and an indicator variable; they are the same thing.
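A minimal sketch with pandas' get_dummies (scikit-learn's OneHotEncoder does the equivalent job); the column names below are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],   # categorical feature
    "size_cm": [10, 12, 9, 15],                 # numeric feature, left as-is
})

# Each category becomes its own 0/1 (dummy / indicator) column.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)
# Resulting columns: size_cm, color_blue, color_green, color_red
```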
How do you calculate Misclassification Error?
(FP + FN) / N
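A quick sketch of that calculation (FP = false positives, FN = false negatives, N = total observations), with the counts taken from scikit-learn's confusion matrix on made-up predictions:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
n = len(y_true)

misclassification_error = (fp + fn) / n   # the formula above
print(misclassification_error)            # 0.25 (2 errors out of 8)
```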
Creating a Classifier in Azure ML
https://courses.edx.org/courses/course-v1:Microsoft+DAT203.2x+6T2016/courseware/786e3fc9eacf4756b3a8091b4a618b4c/c6166f9b4a4641598cc43bab88403a64/
What are some of the ways to tune a model?
- Feature Selection
- Regularization
- Hyperparameters
- Cross Validation (see the sketch after this list, which combines hyperparameter tuning with cross-validation)
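A minimal tuning sketch, assuming scikit-learn's GridSearchCV and its bundled iris data set: each candidate value of the regularization strength C is scored with 5-fold cross-validation and the best one is kept.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Try several values of the regularization strength C and score each
# candidate with 5-fold cross-validation.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```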
Come up with material for Support Vector Machines.
Confusion Matrix: Accuracy, Recall, Precision, and F1 Score
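A brief sketch computing all four from the same made-up predictions with scikit-learn's metrics; in confusion-matrix terms, Accuracy = (TP + TN) / N, Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / N
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1       :", f1_score(y_true, y_pred))          # 2PR / (P + R)
```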
Which three of the following are reasons for visualizing a data set before attempting to build a supervised machine learning model?
- Develop an understanding of the relationship between the features and the label to determine which features are likely to be predictive of the label and should be used in training the machine learning model.
- Develop an understanding of which features are redundant or collinear with other features and should be eliminated from the dataset before training the machine learning model.
- Find features that are not likely to be predictive of the label and should be removed from the dataset before training the machine learning model.
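One common way to cover all three points at once is a pairwise scatter-plot matrix colored by the label; a minimal sketch, assuming seaborn (and its bundled iris example data set) is available:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# The iris data set stands in for any tabular data set with a label column.
df = sns.load_dataset("iris")

# Pairwise scatter plots of every feature against every other, colored by label.
# Clear class separation along a feature suggests it is predictive of the label;
# near-straight-line relationships between two features suggest collinearity or
# redundancy; features showing no separation at all are candidates for removal.
sns.pairplot(df, hue="species")
plt.show()
```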