Chapter 5 : Machine Learning Flashcards
What is machine learning?
Computer algorithms that can learn from data to make determinations or predictions on new data without being explicitly programmed
Why isn’t search always effective?
- Can’t deal with new data
- Can’t deal with unforeseen circumstances
What are different types of ML models?
- Classifier (chooses an output)
- Regressor (generates an output)
What is difference between discriminative and generative models?
Discriminative models draw boundaries in the data space, while generative models attempt to model the distribution of the data.
What are evaluation metrics? Which one is often best to use?
Precision, accuracy, and recall are the main ones, but they can sometimes give misleading results. The F1-score, the harmonic mean of precision and recall, often gives the most meaningful results.
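These metrics can be sketched directly from true/false positive counts; the labels below are made-up illustrative data:

```python
# A minimal sketch of precision, recall, and F1-score for a binary
# classifier, computed from predicted and true labels (1 = positive).
def f1_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# tp=2, fp=1, fn=1, so precision = recall = f1 = 2/3
p, r, f1 = f1_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```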
What is ROC curve?
Shows how well a classifier works by comparing the true positive rate against the false positive rate across classification thresholds
What is k-fold cross-validation?
Split the data into k folds; train/validate on k-1 folds and test on the remaining fold, rotating so each fold serves as the test set once
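A minimal sketch of the fold-splitting step (index bookkeeping only; a real pipeline would typically use a library routine such as scikit-learn's `KFold`):

```python
# Sketch of k-fold cross-validation index splitting: each of the k
# folds is used exactly once as the test set.
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 5 folds over 10 points: each test fold has 2 points, train has 8
folds = list(k_fold_indices(10, 5))
```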
What is leave-one-out validation?
Use N-1 data points for training/validation and the one remaining point for testing, repeating so each point is held out once
What is the no free lunch theorem?
There is no one machine learning algorithm that can be applied to all problems, different models must be tested.
What is regression?
Fitting a curve (such as a polynomial) to data to predict a continuous output
How do we train regression?
By minimizing an error (loss) function, such as the sum of squared errors
What is univariate linear regression?
A regression model with polynomial order N = 1 and a single input variable: h(x) = w0 + w1·x
What is the time complexity of gradient descent?
O(n^3) for the closed-form (normal-equation) least-squares solve; a single gradient-descent iteration is cheaper
Possible outcomes of gradient descent?
- Converges
- Diverges
- Oscillates
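The training procedure can be sketched for univariate linear regression; the learning rate and step count below are illustrative choices (too large a learning rate produces the divergent or oscillating outcomes listed above):

```python
# A minimal sketch of batch gradient descent for univariate linear
# regression h(x) = w0 + w1*x, minimizing mean squared error.
def gd_linear_fit(xs, ys, lr=0.05, steps=2000):
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the MSE with respect to w0 and w1
        g0 = sum(w0 + w1 * x - y for x, y in zip(xs, ys)) * 2 / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in zip(xs, ys)) * 2 / n
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1

# Data generated from y = 1 + 2x; descent should converge near (1, 2)
w0, w1 = gd_linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```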
What is logistic regression?
A classification model that outputs a probability (via the sigmoid/logistic function)
What is Naive Bayes?
A generative classification model
Can Naive bayes be used in regression?
True
What is key assumption for Naive Bayes to work?
All features are conditionally independent of one another given the class
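Under that independence assumption the model scores each class as the prior P(c) multiplied by the per-feature likelihoods P(x_i | c). A minimal sketch with made-up probability tables:

```python
# Sketch of the Naive Bayes scoring rule: P(c | x) is proportional to
# P(c) * product over features of P(x_i | c). The priors and
# likelihoods below are invented illustrative numbers.
import math

priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.2, "meeting": 0.7},
}

def nb_score(c, features):
    # Summing logs avoids underflow from multiplying many probabilities
    return math.log(priors[c]) + sum(
        math.log(likelihoods[c][f]) for f in features)

# 0.6*0.2*0.7 = 0.084 beats 0.4*0.8*0.1 = 0.032, so "ham" wins
best = max(priors, key=lambda c: nb_score(c, ["free", "meeting"]))
```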
What is Overfitting?
When a model has a low error rate in training but a high error rate in testing.
What is generalization?
Notion of learning from some data to make conclusions based on unseen/excluded/new data
How to prevent overfitting?
- Lower order models
- More data
- Regularization
What is bias-variance tradeoff?
Variance: variability of the model with respect to the training data
Bias: difference between the average model prediction and the average of the target data
Can K-nearest neighbor be used as a regressor?
True
How should k be chosen in K-nearest neighbor?
- K should be odd to avoid ties
- Not too small or else overfit
- Not too large or else underfit
Advantages/Disadvantage of kNN algorithm?
Advantage: Easy to train, intuitive algorithm
Disadvantage: Computationally expensive as dataset grows
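A minimal kNN classifier sketch with Euclidean distance and majority voting, on toy data (k chosen odd to avoid ties):

```python
# Sketch of k-nearest-neighbour classification: find the k closest
# training points to the query and take a majority vote of their labels.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs; query: (x, y)."""
    dists = sorted(
        (math.dist(point, query), label) for point, label in train)
    top_labels = [label for _, label in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
# The 3 nearest neighbours of (1, 1) are all labelled "A"
label = knn_predict(train, (1, 1), k=3)
```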
What is support vector machine?
Binary classifier that splits the feature space with a hyperplane. The goal is to find the maximum-margin hyperplane (possibly in a kernel-induced feature space).
What do we do if data is not linearly separable?
Use the kernel trick to project the data into a higher-dimensional space
What is entropy?
The measure of uncertainty (a fair coin has high entropy, a biased coin has low entropy)
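Shannon entropy H = -Σ p·log2(p) can be checked on the coin example:

```python
# Sketch of Shannon entropy over a discrete distribution, showing that
# a fair coin maximizes entropy while a biased coin lowers it.
import math

def entropy(probs):
    # Terms with p = 0 contribute nothing (lim p*log p = 0)
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair = entropy([0.5, 0.5])    # 1.0 bit: maximum uncertainty
biased = entropy([0.9, 0.1])  # ~0.47 bits: much more predictable
```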
Does the order of features in a decision tree make a difference?
True
What is ensemble learning?
Create several different classifiers for a problem. Samples (data) are run through all of them and the outcome is determined by averaging or voting.
What are two most common methods of ensemble learning?
Bagging and boosting
What is bootstrapping in bagging method?
The concept of randomly sampling from a dataset with replacement, producing multiple training sets from a single dataset
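Bootstrapping is just sampling with replacement; a minimal sketch (seeded for reproducibility):

```python
# Sketch of bootstrap sampling: draw n points from an n-point dataset
# with replacement, so some points repeat and others are left out.
import random

def bootstrap_sample(data, rng):
    return [rng.choice(data) for _ in data]

rng = random.Random(0)  # fixed seed, illustrative only
data = list(range(10))
# Same size as the original, but drawn with replacement
sample = bootstrap_sample(data, rng)
```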
What are the assumptions about the individual classifiers in ensemble learning?
- Slightly better than chance
- Somewhat different
- High variance
What is Boosting?
Does not use bootstrap sampling; instead, weak classifiers are trained sequentially, each one based on the errors of the previous classifiers
How does Adaboost prevent overfitting?
Gives more weight to samples that were classified incorrectly in previous rounds.
What are random forests?
Use many decision trees to make a decision
What is bagged tree ensemble?
A number of trees are created through bootstrapped-sampled data. The final classification is based on a vote or average.
Do boosted trees sample data?
While rare, it is possible using stochastic gradient boosted trees or XGBoost
What are the main fusion strategies of datasets?
- Input level fusion
- Feature-level fusion
- Score-level fusion
How is complexity of an ANN (Artificial Neural Network) Determined?
Hidden layers
What are some of the hyperparameters chosen for a regular ANN?
- Type of network
- Number of layers
- Transfer functions
What is Regularization?
Technique to avoid overfitting
What does the lambda (λ) symbol do in regularization?
Controls the regularization strength: increasing λ reduces overfitting (variance) but increases bias (underfitting)
What does p symbol mean in regularization?
Determines which type of regularization is done (p = 1 for L1/lasso and p = 2 for L2/ridge are most common)
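The generic Lp penalty λ·Σ|wᵢ|^p covers both common cases; a minimal sketch with illustrative weights:

```python
# Sketch of the generic Lp regularization penalty lambda * sum(|w|^p):
# p = 1 gives L1 (lasso), p = 2 gives L2 (ridge).
def penalty(weights, lam, p):
    return lam * sum(abs(w) ** p for w in weights)

w = [3.0, -4.0]
l1 = penalty(w, lam=0.1, p=1)  # 0.1 * (3 + 4)  = 0.7
l2 = penalty(w, lam=0.1, p=2)  # 0.1 * (9 + 16) = 2.5
```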
What is K-means?
Unsupervised machine learning algorithm that clusters data
How do we train K-means?
- Select k
- Randomly select centroids
- Assign each data point to its closest centroid
- Re-calculate each centroid as the mean of its assigned points
- Repeat the last two steps until convergence
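The training loop above can be sketched in one dimension (toy data; assignment uses absolute distance):

```python
# A minimal 1-D K-means sketch: assign each point to its closest
# centroid, recompute each centroid as its cluster mean, and repeat
# until the centroids stop changing.
def kmeans_1d(points, centroids, max_iters=100):
    for _ in range(max_iters):
        clusters = [[] for _ in centroids]
        for x in points:
            i = min(range(len(centroids)),
                    key=lambda i: abs(x - centroids[i]))
            clusters[i].append(x)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids

# Two well-separated groups: converges to their means, 2.0 and 11.0
centers = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```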
How do we choose K in K-means?
- Stop after a set number of iterations
- When the algorithm’s change in error becomes very small
What is CNN?
A Convolutional Neural Network with many hidden layers. Requires large datasets and large computing resources.
Other than convolution layer, what are other types of layer in CNN?
- Pooling layer (max pooling, average pooling)
- Fully connected layers
What are some parameters that can be trained in CNN?
- Number of filters
- Stride
- Linear Spatial Extent
- Batch size
Is the filter preferred to be odd size or even size?
Odd size
If filter size is N how much does the size of output decrease?
By (N-1)/2 on each side, i.e. N-1 overall (for stride 1). The output can retain its size if zero-padding is used.
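The output-size arithmetic (stride 1) can be sketched; the 32/5 sizes below are illustrative:

```python
# Sketch of convolution output-size arithmetic: with filter size N,
# stride S, and padding P per side, output = (input - N + 2P) / S + 1.
# "Same" zero-padding of P = (N - 1) // 2 preserves the size for odd N.
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    return (input_size - filter_size + 2 * padding) // stride + 1

no_pad = conv_output_size(32, 5)           # shrinks by N-1 = 4, to 28
same = conv_output_size(32, 5, padding=2)  # zero-padding keeps 32
```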