Chapter 5: Machine Learning Flashcards
What is machine learning?
Computer algorithms that can learn from data to make determinations or predictions on new data without being explicitly programmed
Why isn’t search always effective?
- Can’t deal with new data
- Can’t deal with unforeseen circumstances
What are the different types of ML models?
- Classifier (chooses an output from a fixed set of classes)
- Regressor (generates a continuous numeric output)
What is the difference between discriminative and generative models?
Discriminative models draw decision boundaries in the data space, while generative models attempt to model the distribution of the data itself.
What are evaluation metrics? Which one is often best to use?
Precision, accuracy, and recall are the main ones, but each can give misleading results on its own (for example, accuracy on imbalanced classes). The F1 score, the harmonic mean of precision and recall, often gives the most meaningful result.
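A minimal sketch of how these metrics are computed for binary labels, assuming 1 marks the positive class. The function names and the toy data are illustrative, not from the chapter. The example shows accuracy looking good on imbalanced data while precision, recall, and F1 expose a useless model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no positive predictions/labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy imbalanced data (assumed): 1 positive in 10, model always predicts negative
y_true = [1] + [0] * 9
y_pred = [0] * 10
print(accuracy(y_true, y_pred))             # 0.9
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```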
What is an ROC curve?
A plot of the true positive rate against the false positive rate as the classification threshold varies; it shows how well a classifier separates the classes.
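A rough sketch of how the points of an ROC curve can be obtained: sweep a threshold over the classifier's scores and record the false positive rate and true positive rate at each step. The function name roc_points and the scores are hypothetical, and the sketch assumes both classes appear in the labels.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from highest to lowest score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Assumed toy data: true labels and the classifier's scores for each example
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A curve that rises quickly toward the top-left corner (high true positive rate at low false positive rate) indicates a better classifier.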
What is k-fold cross-validation?
Split the data into k folds; use k-1 folds for training/validation and the remaining fold for testing, repeating so that each fold is used for testing once
What is leave-one-out validation?
Use N-1 data points for training/validation and the 1 remaining point for testing, repeating for each of the N points
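Both validation schemes above can be sketched as index splits. This is a minimal illustration without shuffling; the function names are hypothetical. Leave-one-out is treated here as the special case where the number of folds equals the number of data points.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold_splits(n_samples, n_samples)

for train, test in k_fold_splits(6, 3):
    print(train, test)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```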
What is the no free lunch theorem?
There is no single machine learning algorithm that works best on all problems; different models must be tested.
What is regression?
Fitting a function, such as a polynomial curve, to data in order to predict a continuous output
How do we train a regression model?
By minimizing an error function
What is univariate linear regression?
A regression model that fits a polynomial of degree N = 1 (a straight line) with 1 input variable
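A minimal sketch of training a univariate linear regression by minimizing an error function. Mean squared error and gradient descent are assumptions chosen for illustration; the names fit_line, learning_rate, and steps are hypothetical, and the data is made up.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training point under the current line
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Assumed toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```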
What is the time complexity of gradient descent?
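For batch gradient descent on linear regression, each iteration costs O(nd) for n training examples and d features, so k iterations cost O(knd). A cubic cost, O(d^3), arises instead when solving the closed-form normal equation, which requires a matrix inversion.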
What are the possible outcomes of gradient descent?
- Converges
- Diverges
- Oscillates
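The three outcomes above can be reproduced on a toy problem. This sketch assumes the function f(x) = x**2 and three hand-picked learning rates; neither comes from the chapter. Each update multiplies x by (1 - 2 * learning_rate), so the learning rate alone decides the behavior.

```python
def descend(learning_rate, x=1.0, steps=20):
    """Run gradient descent on f(x) = x**2 and return the visited points."""
    path = [x]
    for _ in range(steps):
        x -= learning_rate * 2 * x  # the gradient of x**2 is 2x
        path.append(x)
    return path

converging = descend(0.1)   # x shrinks by a factor of 0.8 each step
oscillating = descend(1.0)  # x flips between 1.0 and -1.0 forever
diverging = descend(1.1)    # |x| grows by a factor of 1.2 each step

print(converging[-1])
print(oscillating[:4])  # [1.0, -1.0, 1.0, -1.0]
print(diverging[-1])
```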
What is logistic regression?
A model used for classification that outputs the probability of belonging to a class
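A small sketch of how logistic regression produces both a probability and a class: a linear score is passed through the sigmoid function, and the result is compared with a threshold. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration, not trained parameters.

```python
import math

def predict_proba(x, weights, bias):
    """Probability of the positive class: sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

weights, bias = [2.0, -1.0], 0.0  # assumed parameters, normally learned from data

print(predict_proba([0.0, 0.0], weights, bias))  # 0.5
print(predict([3.0, 1.0], weights, bias))        # 1
print(predict([-3.0, 1.0], weights, bias))       # 0
```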
What is Naive Bayes?
A generative classification model
Can Naive Bayes be used in regression?
Yes, although it is primarily used for classification
What is the key assumption for Naive Bayes to work?
All features are conditionally independent of one another given the class
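A minimal sketch of a Naive Bayes classifier for categorical features, showing where the independence assumption enters: the per-feature probabilities are simply multiplied together. Add-one (Laplace) smoothing, the function names, and the toy weather data are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[(label, feature_index)][value] = number of occurrences
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts

def classify(row, class_counts, feature_counts, n_values=2):
    """Pick the class maximizing P(class) * product of P(feature | class)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Independence assumption: multiply one term per feature.
            # Add-one smoothing; n_values = distinct values per feature (assumed 2).
            score *= (feature_counts[(label, i)][value] + 1) / (count + n_values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Assumed toy data: (outlook, temperature) -> activity
rows = [("sunny", "hot"), ("sunny", "cold"), ("rainy", "cold"), ("rainy", "hot")]
labels = ["play", "play", "stay", "stay"]
class_counts, feature_counts = train_naive_bayes(rows, labels)
print(classify(("sunny", "cold"), class_counts, feature_counts))  # play
```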
What is overfitting?
When a model has a low error rate on the training data but a high error rate on the test data.
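One way to see this gap is with a model that memorizes its training data. The sketch below uses a 1-nearest-neighbor classifier, which is an assumption chosen for illustration and not a method from the chapter. Two training labels are deliberately noisy; the model reproduces them perfectly on the training set and then repeats the mistakes on nearby test points.

```python
def nearest_neighbor_predict(x, train_xs, train_ys):
    """Return the label of the closest training point (1-nearest-neighbor)."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]

def error_rate(xs, ys, train_xs, train_ys):
    """Fraction of points in (xs, ys) that the memorizing model gets wrong."""
    wrong = sum(nearest_neighbor_predict(x, train_xs, train_ys) != y
                for x, y in zip(xs, ys))
    return wrong / len(xs)

# Assumed underlying rule: the label is 1 when x >= 5.
train_xs = [1, 2, 3, 4, 6, 7, 8, 9]
train_ys = [0, 0, 1, 0, 1, 1, 0, 1]  # labels at x=3 and x=8 are noise
test_xs = [0.5, 2.8, 3.2, 5.5, 7.8, 8.2, 9.5]
test_ys = [0, 0, 0, 1, 1, 1, 1]      # test labels follow the true rule

train_error = error_rate(train_xs, train_ys, train_xs, train_ys)
test_error = error_rate(test_xs, test_ys, train_xs, train_ys)
print(train_error)  # 0.0
print(test_error)   # 4 of the 7 test points are misclassified
```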
What is generalization?
The notion of learning from some data in order to draw conclusions about unseen/excluded/new data