Midterm-Yaseen Flashcards

1
Q

What is supervised learning?

A
  • Goal is to make accurate predictions for new, never-before-seen data
  • We have input and output pairs to “learn” from (a minimal sketch of this workflow follows the examples below)

Examples:

  • k-Nearest Neighbors
  • Linear Models
  • Naive Bayes Classifiers
  • Decision Trees
  • Ensembles of Decision Trees
  • Kernelized Support Vector Machines
  • Neural Networks (Deep Learning)
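A minimal sketch of this workflow, assuming scikit-learn (where all of the methods listed above are available) and the iris dataset as example data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Input/output pairs: X holds the inputs, y the known outputs
X, y = load_iris(return_X_y=True)

# Hold out data the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)  # "learn" from the input/output pairs

# The goal: accurate predictions for never-before-seen data
print("Test accuracy:", model.score(X_test, y_test))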
2
Q

Describe the k-Nearest Neighbors algorithm.

A

-Using the training dataset, find the k nearest data points to a new point and predict according to those neighbors (see the sketch below).

Note:
For KNeighborsClassifier we use a majority vote of the k neighbors to determine the classification.

For KNeighborsRegressor we use the mean of the k neighbors’ target values (the default score reported is R^2).
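A minimal sketch of both estimators, using scikit-learn with tiny made-up data (the numbers are assumptions for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[0], [1], [2], [10], [11], [12]])

# Classification: the predicted label is the majority vote of the k neighbors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, np.array([0, 0, 0, 1, 1, 1]))
print(clf.predict([[1.5]]))  # -> [0]

# Regression: the prediction is the mean of the k neighbors' target values
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, np.array([1.0, 2.0, 3.0, 20.0, 21.0, 22.0]))
print(reg.predict([[1.5]]))  # -> [2.0], the mean of 1.0, 2.0, 3.0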

3
Q

Explain what is meant by “underfitting.”

A

A model that is too simple to capture the variations present in the training data; it performs poorly even on the training set.

4
Q

Explain the concept of “overfitting.”

A

A model that focuses too much on the training data and is not able to generalize well to new data.

5
Q

When should we use Nearest Neighbors methods?

A
  • ideal for small datasets
  • good as a baseline
  • easy to explain
6
Q

When should we use Linear Models?

A
  • go-to as a first method
  • good for very large datasets
  • good for very high-dimensional data
7
Q

When should we use Naive Bayes?

A
  • Only used for classification
  • faster than linear models
  • good for very large datasets and high-dimensional data

Disadvantage: often less accurate than linear models

8
Q

What are the advantages of Decision Tree methods?

A
  • very fast
  • don’t need scaling of the data
  • can be visualized and easily explained (see the sketch below)
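A minimal sketch of the visualization point, assuming the iris dataset as example data:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The learned if/else rules can be printed (or drawn with sklearn.tree.plot_tree)
# and read off directly, which is what makes trees easy to explain
print(export_text(tree, feature_names=data.feature_names))

Note that the raw, unscaled features go straight into fit().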
9
Q

What are the advantages of Random Forest?

A
  • Nearly always perform better than a single decision tree
  • very robust and powerful
  • Does NOT require scaling of data
10
Q

What is a disadvantage of Random Forest? (When should they not be used)

A

Not good for high-dimensional sparse data

11
Q

Compare Gradient Boosted Decision Trees and Random Forests in terms of advantages.

A
  • Gradient boosted trees are slightly more accurate than random forests
  • Gradient boosted trees are slower to train than random forests
  • Gradient boosted trees are faster to predict and smaller in memory than random forests
  • Gradient boosted trees require more parameter tuning than random forests (see the sketch below)
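A minimal sketch of the comparison, assuming the breast cancer dataset and illustrative parameter values:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

# Random forests are robust to their default settings
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Gradient boosting usually needs learning_rate/max_depth tuned, but its
# shallow trees make the final model small and fast at prediction time
gbt = GradientBoostingClassifier(learning_rate=0.1, max_depth=3, random_state=0)

for name, model in [("random forest", rf), ("gradient boosting", gbt)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))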
12
Q

Describe the advantages of Support Vector Machines.

A
  • powerful for medium-sized datasets of features with similar meaning
  • caveat: DOES require scaling of the data
  • caveat: sensitive to parameter settings (see the sketch below)
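A minimal sketch of the scaling caveat, again assuming the breast cancer dataset as example data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

# Scaling is part of the pipeline because SVMs are sensitive to feature scales;
# C and gamma are the parameter settings the model is sensitive to
svm = make_pipeline(StandardScaler(), SVC(C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))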
13
Q

Describe the advantages of Neural Networks.

A
  • Can build very complex models (particularly for large datasets)
  • caveat: sensitive to scaling of the data and to the choice of parameters
  • caveat: large models require a long time to train
14
Q

What are the two primary types of unsupervised learning?

A
  • Unsupervised transformations: create a new representation of the data that might be easier for humans or other machine learning algorithms to understand than the original representation.
    (dimensionality reduction, topic extraction)
  • Clustering algorithms: partition data into distinct groups of similar items.
    (like classification in supervised learning, but we have no known output to compare to; see the sketch below)
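A minimal sketch of both types, assuming the iris features as example data:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # the labels are ignored: no known outputs here

# Unsupervised transformation: a new, lower-dimensional representation
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: partition the data into distinct groups of similar items
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(X_2d.shape, labels[:10])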
15
Q

What is the primary challenge in unsupervised learning?

A
  • no outcome to compare to (how well did we do? nobody knows!)
  • We must manually inspect the results to see how we did.
16
Q

What is a common utilization of unsupervised algorithms?

A

Exploratory settings:

-It is often useful to change the representation of the data and then apply a supervised learning method (see the sketch below).
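A minimal sketch of that pattern, assuming PCA as the transformation and logistic regression as the supervised method:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

# The unsupervised step changes the representation;
# the supervised step then learns on that new representation
model = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))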

17
Q

Why is the k-nearest neighbors algorithm not often used in practice?

A

-slow prediction
-inability to handle many features
(although it is very easy to understand and gives reasonable performance without a lot of adjustments)

18
Q

What is k-nearest neighbors best utilized for?

A

-good baseline method before considering more advanced techniques

19
Q

Consider the following training and test set scores:
Training set score: 0.96
Test set score: 0.63
What is this a sign of?

A

The large gap between the training and test scores is a sign of overfitting: the model fits the training data well but fails to generalize.

20
Q

Consider the two sets of training/test set score pairs for two different machine learning algorithms:

Model 1:
Training set score: 0.98
Test set score: 0.82

Model 2:
Training set score: 0.84
Test set score: 0.80

Which model would you choose and why?

A

Model 2.

Although Model 1 has a slightly higher Test set score than Model 2, there is a much larger discrepancy between the Training and Test scores in Model 1, which is a sign of overfitting.

In Model 2, we see a lower training set score and a much smaller discrepancy between the Training and Test set score.

I would expect Model 2 to generalize better than Model 1 on new data (a sketch of how such score pairs arise follows below).
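A minimal sketch of how such score pairs arise, with the dataset and model choices assumed for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)

# An unpruned tree memorizes the training set: expect a big train/test gap
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A pruned tree scores lower on the training set, but the gap shrinks
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, m in [("unpruned", deep), ("pruned", pruned)]:
    print(name, m.score(X_train, y_train), m.score(X_test, y_test))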

21
Q

Explain the general concept of Ridge Regression.

A

Ridge regression is like Linear Regression with the added constraint that coefficients are chosen so their magnitude is small. We want the coefficients of each feature to be as close to zero as possible while still predicting well.

-This is an example of regularization, or explicitly restricting a model to avoid overfitting. (Specifically this is L2 regularization).

  • Alpha is the parameter that controls the strength of the “penalty” applied to the coefficients
  • A smaller alpha means less of a penalty on coefficient magnitude, so a very small alpha behaves basically like linear regression; a larger alpha pushes the coefficients closer to zero (see the sketch below)
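A minimal sketch of the alpha trade-off on synthetic data (all values are assumptions for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=50)

lr = LinearRegression().fit(X, y)
print("linear regression:", np.abs(lr.coef_).max())

# Larger alpha -> stronger penalty -> coefficients pushed toward zero;
# a tiny alpha behaves almost exactly like plain LinearRegression
for alpha in [0.001, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print("ridge alpha=%g:" % alpha, np.abs(ridge.coef_).max())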
22
Q

Explain the general concept of Lasso Regression

A

Lasso Regression is another example of regularization (specifically L1 regularization). It is like linear regression with an added constraint on coefficient magnitude, but it allows some coefficients to be exactly zero. This means that Lasso can essentially be used for feature selection (choosing the features that are important).

Again, a lower alpha yields a model closer to plain Linear Regression (see the sketch below).
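A minimal sketch of the feature-selection effect on synthetic data (all values are assumptions for illustration):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Only the first two features actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)

# L1 regularization drives the irrelevant coefficients to exactly zero
print("coefficients:", np.round(lasso.coef_, 2))
print("features used:", np.sum(lasso.coef_ != 0))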