Midterm-Yaseen Flashcards
What is supervised learning?
- Goal is to make accurate predictions for new, never-before-seen data
- we have input and output pairs to “learn” from
Examples:
- k-Nearest Neighbors
- Linear Models
- Naive Bayes Classifiers
- Decision Trees
- Ensembles of Decision Trees
- Kernelized Support Vector Machines
- Neural Networks (Deep Learning)
Describe the k-Nearest Neighbors algorithm.
- Using the training dataset, find the k nearest data points to the new point and classify it according to those neighbors.
Note:
For KNeighborsClassifier we use a majority vote among the neighbors to determine the classification
For KNeighborsRegressor we use the mean of the neighbors' target values (performance measured with R^2)
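A minimal sketch of the classifier on scikit-learn's built-in iris data (the dataset and n_neighbors=3 are illustrative choices, not part of the card); KNeighborsRegressor is used the same way but averages the neighbors' values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset; any labeled data works the same way
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors=3: each prediction is a majority vote among the
# 3 closest training points
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("Test set accuracy:", knn.score(X_test, y_test))
```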
Explain what is meant by “underfitting.”
A model that is too simple to capture the variation present in the training data; it performs poorly even on the training set.
Explain the concept of “overfitting.”
A model that focuses too much on the particulars of the training data and is not able to generalize well to new data.
When should we use Nearest Neighbors methods?
- ideal for small datasets
- good as a baseline
- easy to explain
When should we use Linear Models?
- go-to as a first method
- good for very large datasets
- good for very high-dimensional data
When should we use Naive Bayes?
- Only used for classification
- faster than linear models
- good for very large datasets and high-dimensional data
Disadvantage: often less accurate than linear models
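A quick illustrative fit with GaussianNB (the breast cancer dataset is an arbitrary example choice):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes learns per-class feature statistics in a single pass
# over the data, which is why training is so fast
nb = GaussianNB().fit(X_train, y_train)
print("Test set score:", nb.score(X_test, y_test))
```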
What are the advantages of Decision Tree methods?
- very fast
- don’t need scaling of the data
- can be visualized and easily explained
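A small sketch of the visualization point, printing a fitted tree's splits as readable rules (iris data and max_depth=2 are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# No scaling needed; a shallow tree's splits print as plain if/else rules
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=iris.feature_names))
```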
What are the advantages of Random Forest?
- Nearly always perform better than a single decision tree
- very robust and powerful
- Does NOT require scaling of data
What is a disadvantage of Random Forest? (When should they not be used)
Not good for high-dimensional sparse data
Compare Gradient Boosted Decision Trees and Random Forests in terms of advantages.
- Gradient boosted trees are often slightly more accurate than random forests
- Gradient boosted trees are slower to train than random forests
- Gradient boosted trees are faster to predict and smaller in memory than random forests
- Gradient boosted trees require more parameter tuning than random forests
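A hedged side-by-side sketch of the two ensembles (dataset and hyperparameters are illustrative; learning_rate and max_depth are the knobs that usually need tuning for boosting):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many deep trees trained independently; works well
# with little tuning
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Gradient boosting: shallow trees built sequentially, each correcting the
# last; slower to train but the final model is small and fast to predict
gb = GradientBoostingClassifier(learning_rate=0.1, max_depth=3,
                                random_state=0).fit(X_train, y_train)

print("Random forest test score:    ", rf.score(X_test, y_test))
print("Gradient boosting test score:", gb.score(X_test, y_test))
```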
Describe the advantages and disadvantages of Support Vector Machines.
- powerful for medium-sized datasets of features with similar meaning
- requires scaling of the data (see the sketch below)
- sensitive to parameter settings
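A minimal sketch of the scaling requirement using an RBF-kernel SVC in a pipeline (dataset and C value are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling, large-magnitude features dominate the RBF kernel distance
svm = SVC(C=1).fit(X_train, y_train)
print("Unscaled test score:", svm.score(X_test, y_test))

# StandardScaler puts every feature on a comparable scale first
scaled_svm = make_pipeline(StandardScaler(), SVC(C=1)).fit(X_train, y_train)
print("Scaled test score:  ", scaled_svm.score(X_test, y_test))
```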
Describe the advantages and disadvantages of Neural Networks.
- Can build very complex models (particularly for large datasets)
- Sensitive to scaling of the data and choice of parameters
- large models require a long time to train
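An illustrative scikit-learn MLP, scaled first because of the sensitivity noted above (dataset, layer size, and max_iter are example choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale first: neural networks are sensitive to feature magnitudes
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0),
).fit(X_train, y_train)
print("Test set score:", mlp.score(X_test, y_test))
```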
What are the two primary types of unsupervised learning?
- unsupervised transformations: creating a new representation of the data that might be easier for humans or other machine learning algorithms to understand than the original representation (e.g., dimensionality reduction, topic extraction)
- clustering algorithms: partition data into distinct groups of similar items
(like classification in supervised learning, but we have no known outputs to compare to)
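A compact sketch of both types on unlabeled data (PCA for the transformation, k-means for the clustering; the iris features and k=3 are illustrative choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels deliberately ignored

# Unsupervised transformation: reduce 4 features to 2 for easier inspection
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: partition the unlabeled data into 3 groups of similar items
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:10])
```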
What is the primary challenge in unsupervised learning?
- no outcome to compare to (how well did we do? nobody knows!)
- We must manually inspect the results to see how we did.
What is a common utilization of unsupervised algorithms?
Exploratory setting:
- useful for changing the representation of the data before applying a supervised learning method
Why is the k-nearest neighbors algorithm not often used in practice?
- slow prediction
- inability to handle many features
(although it is very easy to understand and gives reasonable performance without much adjustment)
What is k-nearest neighbors best utilized for?
- good baseline method before considering more advanced techniques
Consider the following training and test set scores:
Training set score: 0.96
Test set score: 0.63
What is this a sign of?
The discrepancy is a sign of overfitting.
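Scores like these come from calling score() on each split; a sketch of how the pattern arises with an unpruned decision tree (dataset and model are illustrative, and the exact numbers will differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree memorizes the training set: training score near 1.0
# with a noticeably lower test score -- the overfitting signature
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Training set score:", tree.score(X_train, y_train))
print("Test set score:    ", tree.score(X_test, y_test))
```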
Consider the two sets of training/test set score pairs for two different machine learning algorithms:
Model 1:
Training set score: 0.98
Test set score: 0.82
Model 2:
Training set score: 0.84
Test set score: 0.80
Which model would you choose and why?
Model 2.
Although Model 1 has a slightly higher test set score than Model 2, there is a much larger discrepancy between the training and test scores in Model 1, which is a sign of overfitting.
In Model 2, we see a lower training set score but a much smaller discrepancy between the training and test set scores.
I would expect Model 2 to generalize better than Model 1 to new data.
Explain the general concept of Ridge Regression.
Ridge regression is like Linear Regression with the added constraint that coefficients are chosen so their magnitude is small. We want the coefficients of each feature to be as close to zero as possible while still predicting well.
-This is an example of regularization, or explicitly restricting a model to avoid overfitting. (Specifically this is L2 regularization).
- Alpha is the parameter that controls the strength of the penalty applied to the coefficients
- smaller alpha means less of a penalty on large magnitudes, so as alpha approaches zero, ridge regression approaches ordinary linear regression
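A short sketch of the alpha trade-off (the diabetes dataset and the alpha grid are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger alpha = stronger penalty on coefficient magnitude;
# alpha near 0 behaves like ordinary linear regression
for alpha in (0.01, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha}: test R^2 = {ridge.score(X_test, y_test):.2f}")
```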
Explain the general concept of Lasso Regression
Lasso Regression is another example of regularization (specifically L1 regularization). It is like linear regression with an added constraint, but the L1 penalty allows some coefficients to become exactly zero. This means Lasso can essentially be used for feature selection (choosing the features that are important).
Again, a lower alpha gives results closer to the plain Linear Regression model.
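A minimal sketch of the feature-selection effect, counting nonzero coefficients (dataset and alpha are illustrative):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The L1 penalty drives some coefficients to exactly zero; the surviving
# nonzero coefficients act as an implicit feature selection
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("Features used:", int(np.sum(lasso.coef_ != 0)), "of", X.shape[1])
print("Test R^2:", lasso.score(X_test, y_test))
```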