Chapter 5: Machine Learning Flashcards

1
Q

What is machine learning?

A

Computer algorithms that can learn from data to make determinations or predictions on new data without being explicitly programmed

2
Q

Why isn’t search always effective?

A
  1. Can’t deal with new data
  2. Can’t deal with unforeseen circumstances

3
Q

What are the different types of ML models?

A
  1. Classifier (chooses among discrete output labels)
  2. Regressor (generates a continuous output)

4
Q

What is the difference between discriminative and generative models?

A

Discriminative models draw boundaries in data space, while generative models attempt to model the distribution of the data.

5
Q

What are evaluation metrics? Which one is often best to use?

A

Precision, accuracy, and recall are the main ones, but each can give misleading results on its own (e.g. accuracy on imbalanced data). The F1-score, the harmonic mean of precision and recall, often gives the most meaningful results.

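To make these metrics concrete, here is a minimal sketch of how they are computed from confusion counts (the counts below are made up for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: 8 true positives, 2 false positives, 4 false negatives
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
# precision = 0.8, recall = 8/12, and F1 sits between them
```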
6
Q

What is a ROC curve?

A

Shows how well a classifier is working by plotting its true positive rate against its false positive rate as the decision threshold varies.

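One point on the ROC curve comes from picking a threshold on the classifier's scores; the full curve traces these points as the threshold sweeps from high to low. A small sketch with made-up scores and labels:

```python
def roc_point(scores, labels, threshold):
    """One ROC point: (TPR, FPR) when predicting positive for
    score >= threshold. labels are 1 (positive) / 0 (negative)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Illustrative scores only
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1,   1,   0,   1,   0]
tpr, fpr = roc_point(scores, labels, threshold=0.5)  # (2/3, 1/2)
```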
7
Q

What is k-fold cross validation?

A

Split the data into k folds; train/validate on k-1 folds and test on the remaining fold, rotating so that each fold is used for testing exactly once.

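The fold rotation can be sketched as follows (assuming for simplicity that the dataset size divides evenly by k):

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k-fold cross validation.
    Each fold serves once as the test set; the other k-1 folds train."""
    indices = list(range(n))
    fold_size = n // k  # assumes n is divisible by k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

folds = list(k_fold_splits(n=10, k=5))
# 5 folds: each test fold has 2 points, each train set has 8
```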
8
Q

What is leave-one-out validation?

A

Pick N-1 data points for training/validation and the one remaining point for testing; repeat so that each point is left out once (i.e. k-fold cross validation with k = N).

9
Q

What is the no free lunch theorem?

A

There is no single machine learning algorithm that works best on all problems; different models must be tested.

10
Q

What is regression?

A

Fitting a curve (e.g. a polynomial) to data in order to predict a continuous output

11
Q

How do we train regression?

A

By minimizing an error (loss) function over the model parameters, e.g. the sum of squared errors

12
Q

What is univariate linear regression?

A

A linear (order N = 1) regression model with a single input variable

13
Q

What is the time complexity of gradient descent?

A

O(n^3) is the cost of the exact closed-form (normal equation) solution; each gradient descent step is much cheaper, which is why gradient descent is preferred for large problems.

14
Q

Possible outcomes of gradient descent?

A
  1. Converges
  2. Diverges
  3. Oscillates
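All three outcomes can be seen on the toy objective f(w) = w² (gradient 2w), where each update multiplies w by (1 − 2·lr); the learning rates below are chosen purely to illustrate each case:

```python
def descend(lr, w0=1.0, steps=50):
    """Gradient descent on f(w) = w**2, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2*lr)
    return w

converged = descend(lr=0.1)            # |1 - 0.2| = 0.8 < 1: shrinks toward 0
diverged  = descend(lr=1.1)            # |1 - 2.2| = 1.2 > 1: blows up
oscillate = descend(lr=1.0, steps=51)  # factor = -1: bounces between +1 and -1
```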
15
Q

Logistic Regression

A

A classification model: it outputs the probability of class membership by passing a linear score through the sigmoid (logistic) function

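A minimal one-feature sketch of that idea (the weight, bias, and inputs are arbitrary example values):

```python
import math

def logistic_predict(w, b, x):
    """Logistic regression: linear score squashed through the sigmoid,
    giving P(class = 1 | x)."""
    z = w * x + b
    return 1 / (1 + math.exp(-z))

p = logistic_predict(w=2.0, b=-1.0, x=0.5)  # z = 0, so probability = 0.5
```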
16
Q

What is Naive Bayes?

A

A generative classification model

17
Q

Can Naive Bayes be used for regression?

A

Yes

18
Q

What is the key assumption for Naive Bayes to work?

A

All features are independent of one another

19
Q

What is Overfitting?

A

When a model has a low error rate in training but a high error rate in testing.

20
Q

What is generalization?

A

The notion of learning from some data in order to make accurate conclusions on unseen/excluded/new data

21
Q

How to prevent overfitting?

A
  1. Lower order models
  2. More data
  3. Regularization
22
Q

What is bias-variance tradeoff?

A

Variance: the variability of the model with respect to its training inputs
Bias: the difference between the average model and the average of the target data

23
Q

Can K-nearest neighbor be used as a regressor?

A

Yes, by averaging the values of the k nearest neighbors.

24
Q

How should k be chosen in K-nearest neighbour?

A
  1. K should be odd to avoid ties
  2. Not too small or else overfit
  3. Not too large or else underfit
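A minimal kNN classifier on one-dimensional toy data, using an odd k as the card above recommends (the data and labels are illustrative):

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    (by absolute distance) labelled training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two well-separated toy clusters
data = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (5.0, "b"), (5.2, "b")]
label = knn_classify(data, query=0.3, k=3)  # the 3 nearest points are all "a"
```

The same structure gives a regressor if the vote is replaced by an average of the neighbours' values.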
25
Q

Advantages/Disadvantages of the kNN algorithm?

A

Advantage: easy to train, intuitive algorithm.
Disadvantage: computationally expensive as the dataset grows.

26
Q

What is a support vector machine?

A

A binary classifier that splits feature space with a hyperplane. The goal is to find the kernelized max-margin hyperplane.

27
Q

What do we do if data is not linearly separable?

A

Use the kernel trick to project the data into a higher-dimensional space
28
Q

What is entropy?

A

The measurement of uncertainty (a fair coin has high entropy, a biased coin has low entropy)
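The coin example can be checked directly with the Shannon entropy formula H = −Σ p·log₂(p):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair = entropy([0.5, 0.5])    # fair coin: 1 bit, maximum uncertainty
biased = entropy([0.9, 0.1])  # biased coin: noticeably lower entropy
```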
29
Q

Does the order of features in a (binary) decision tree make a difference?

A

Yes
30
Q

What is ensemble learning?

A

Create several different classifiers for a problem. Samples (data) are run through all of them and the outcome is determined by averaging or voting.

31
Q

What are the two most common methods of ensemble learning?

A

Bagging and boosting
32
Q

What is bootstrapping in the bagging method?

A

The concept of randomly sampling from a dataset with replacement, to increase the size of the dataset
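Sampling with replacement can be sketched in a few lines (a fixed seed is used here only to make the example reproducible):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points from `data` uniformly WITH replacement,
    so some points typically repeat and others are left out."""
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
# same size as the original dataset, drawn only from its points
```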
33
Q

What is the assumption of classifiers using ensemble learning?

A
  1. Slightly better than chance
  2. Somewhat different
  3. High variance
34
Q

What is Boosting?

A

Does not use bootstrap sampling; instead, weak classifiers are trained based on the errors of previous iterations of classifiers
35
Q

How does Adaboost prevent overfitting?

A

Gives more weight to samples that were classified incorrectly in earlier iterations.
36
Q

What are random forests?

A

Use many decision trees to make a decision

37
Q

What is a bagged tree ensemble?

A

A number of trees are created through bootstrap-sampled data. The final classification is based on a vote or average.
38
Q

Do boosted trees sample data?

A

While rare, it is possible using stochastic gradient boosted trees or XGBoost
39
Q

What are the main fusion strategies of datasets?

A
  1. Input-level fusion
  2. Feature-level fusion
  3. Score-level fusion

40
Q

How is the complexity of an ANN (Artificial Neural Network) determined?

A

By its hidden layers

41
Q

What are some of the parameters trained in a regular ANN?

A
  1. Type of network
  2. Number of layers
  3. Transfer functions

42
Q

What is Regularization?

A

A technique to avoid overfitting
43
Q

What does the lambda (λ) parameter do in regularization?

A

Reduces overfitting (variance) but increases bias (underfitting)
44
Q

What does the p symbol mean in regularization?

A

Determines which type of regularization is done (p = 1 (L1/lasso) and p = 2 (L2/ridge) are most common)
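The roles of lambda and p can be shown with the penalty term itself (the weights below are arbitrary example values):

```python
def penalty(weights, lam, p):
    """Lp regularization term: lambda * sum(|w|^p).
    p = 1 gives L1 (lasso); p = 2 gives L2 (ridge)."""
    return lam * sum(abs(w) ** p for w in weights)

w = [3.0, -4.0]
l1 = penalty(w, lam=0.1, p=1)  # 0.1 * (3 + 4)  = 0.7
l2 = penalty(w, lam=0.1, p=2)  # 0.1 * (9 + 16) = 2.5
```

Raising lambda makes this term dominate the loss, shrinking the weights (less variance, more bias).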
45
Q

What is K-means?

A

An unsupervised machine learning algorithm that clusters data

46
Q

How do we train K-means?

A
  1. Select k
  2. Randomly select centroids
  3. Assign each data point to the closest centroid
  4. Re-calculate the center of each centroid
  5. Repeat 3 and 4 until convergence
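The assign/re-center loop can be sketched in one dimension (the points and starting centroids are toy values; a real implementation would also randomize initialization):

```python
def kmeans_1d(points, centroids, iters=10):
    """1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # empty clusters keep their old centroid
        centroids = [sum(pts) / len(pts) if pts else centroids[c]
                     for c, pts in clusters.items()]
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
centers = kmeans_1d(points, centroids=[0.0, 5.0])
# converges to roughly [1.0, 10.0], the two cluster means
```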
47
Q

When do we stop iterating in K-means?

A
  1. After a fixed number of iterations
  2. When the change in error between iterations becomes very small
48
Q

What is a CNN?

A

A Convolutional Neural Network: a network with many hidden layers built around convolution operations. Requires large datasets and large computing resources.
49
Q

Other than the convolution layer, what are the other types of layer in a CNN?

A
  1. Pooling layers (max pooling, average pooling)
  2. Fully connected layers

50
Q

What are some parameters that can be trained in a CNN?

A
  1. Number of filters
  2. Stride
  3. Linear spatial extent
  4. Batch size
51
Q

Is the filter preferred to be odd-sized or even-sized?

A

Odd-sized
52
Q

If the filter size is N, how much does the size of the output decrease?

A

By ⌊N/2⌋ on each side of the output. The output can retain its size if zero-padding is used.
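This follows from the standard spatial-size formula (W − F + 2P)/S + 1; a quick check with a 32-wide input and a 5-wide filter:

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution layer: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

no_pad = conv_output_size(32, 5)             # 32 - 5 + 1 = 28 (shrinks by 2 per side)
same   = conv_output_size(32, 5, padding=2)  # zero-padding of floor(5/2) keeps size 32
```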