fa3 + logistic reg to gradient boosting Flashcards

1
Q

We can visualize the tree using the export_graph function from the tree module.

Group of answer choices:
True
False

A

False

2
Q

In the decision tree, the region can be found by traversing the tree from the root and going left or right.

Group of answer choices
True
False

A

True

3
Q

A decision tree is a model that learns a hierarchy of if/else questions, leading to a decision.

Group of answer choices
True
False

A

True

4
Q

The .dot file format is a _____ file format for storing graphs.

A

TEXT

5
Q

In the decision tree, the ______ represents the whole dataset.

Group of answer choices
Terminal Nodes
Edges
Root
Conditions

A

Root

6
Q

The .dot file format is an image file format for storing graphs.
Group of answer choices
True
False

A

False

7
Q

Decision trees in scikit-learn are implemented in the ________ and DecisionTreeClassifier classes.

Group of answer choices
DecisionRegressorTree
TreeDecisionRegressor
RegressorDecisionTree
DecisionTreeRegressor

A

DecisionTreeRegressor

8
Q

Which is not true about Random Forest?

Group of answer choices
Not in the options
Less memory usage.
Less burden or parameter tuning.
As many trees are created, detailed analysis is difficult.
Poor performance for large and sparse data.

A

Less memory usage.

9
Q

To build a random forest model, you need to decide on the __________ to build.

Group of answer choices
Depth of the tree
Height of the tree
Number of trees
Root
Node of the tree

A

Number of trees

10
Q

The _______ are methods that combine multiple machine learning models to create more powerful models.

A

ENSEMBLES

11
Q

In the decision tree, the terminal nodes represent the whole dataset.

Group of answer choices
True
False

A

False

12
Q

In the decision tree, the sequence of if/else questions are called qualifiers.

Group of answer choices
True
False

A

False

13
Q

Which is not true about Random Forest?

Group of answer choices
Reduces underfitting by averaging trees that predict well.
Reduces overfitting by averaging trees that predict well.
Selects candidate features at random when splitting nodes.
Randomly selects some of the data when creating a tree.

A

Reduces underfitting by averaging trees that predict well.

14
Q

What are the parameters for Gradient Boosting?

a. n_estimators, learning_rate
b. n_estimators, max_features
c. n_estimators, learning_rate, max_depth
d. n_estimators, max_features, max_depth

A

c

15
Q

Gradient boosting is used when you need more performance than random forests provide.

Group of answer choices
True
False

A

True

16
Q

In the decision tree, the sequence of if/else questions are called ______.

Group of answer choices
Qualifiers
Condition
Tests
Nodes

A

Tests

17
Q

Decision trees in scikit-learn are implemented in the DecisionTreeRegressor and _______ classes.

Group of answer choices
DecisionClassifier
TreeDecisionClassifier
DecisionTreeClassifier
DecisionClassifierTree

A

DecisionTreeClassifier

18
Q

We can visualize the tree using the ______ function from the tree module.

A

export_graphviz
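
A minimal sketch of this, assuming the iris dataset and the filename tree.dot purely for illustration:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_graphviz

    # Fit a small tree, then write it to a .dot file (a text format for graphs).
    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(iris.data, iris.target)
    export_graphviz(tree, out_file="tree.dot",
                    feature_names=iris.feature_names,
                    class_names=iris.target_names, filled=True)

The resulting tree.dot file can then be rendered with the graphviz package.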

19
Q

Two most common linear classification algorithms:

A

Logistic Regression
Linear Support Vector Machines

20
Q

Logistic Regression is implemented in which class?

A

linear_model.LogisticRegression

21
Q

Linear Support Vector Machines (Linear SVMs) are implemented in which class?

A

svm.LinearSVC
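
A minimal sketch fitting both classifiers, with make_classification standing in for real data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=100, random_state=0)

    # Both learn a linear decision boundary; they differ mainly in loss function.
    logreg = LogisticRegression().fit(X, y)
    svc = LinearSVC().fit(X, y)
    print(logreg.score(X, y), svc.score(X, y))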

22
Q

SVC stands for?

A

support vector classifier

23
Q

______ is a classification algorithm and not a regression algorithm, and it should not be confused with LinearRegression

A

LogisticRegression

24
Q

The trade-off parameter that determines the strength of the regularization is called _____.

A

C

25
Q

Higher values of C correspond to _____

A

LESS REGULARIZATION

26
Q

When you use a high value of the parameter C, LogisticRegression and LinearSVC will _______

A

try to fit the training set as best as possible

27
Q

With low values of the parameter C, the models put more emphasis on _______

A

finding a coefficient vector (w) that is close to zero

28
Q

Using low values of C will cause the algorithms to try to adjust to the _____ of data points

A

“majority”

29
Q

Using a higher value of C stresses the importance that each ______ be classified correctly

A

individual data point
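
A sketch of the trade-off, assuming the breast cancer dataset purely for illustration:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # Low C: strong regularization, w pushed toward zero, fits the "majority".
    # High C: weak regularization, tries to classify every point correctly.
    for C in [0.01, 1, 100]:
        model = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
        print(C, model.score(X_train, y_train), model.score(X_test, y_test))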

30
Q

_______ are a family of classifiers that are quite similar to the linear models

A

Naive Bayes classifiers

31
Q

In Naive Bayes, ____ is faster than for a linear classifier.

A

Training

32
Q

In Naive Bayes, _____ performance is slightly lower

A

Generalization

33
Q

The reason that Naive Bayes models are so efficient is that they ______ and collect simple per-class statistics from each feature

A

learn parameters by looking at each feature individually

34
Q

The reason that Naive Bayes models are so efficient is that they learn parameters by looking at each feature individually and _______

A

collect simple per-class statistics from each feature

35
Q

3 Kinds of Naive Bayes Classifier in Scikit-learn:

A

GaussianNB
BernoulliNB
MultinomialNB
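
A sketch fitting all three on synthetic data of the kind each expects (values chosen only for illustration):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB

    rng = np.random.RandomState(0)
    y = rng.randint(2, size=50)

    GaussianNB().fit(rng.normal(size=(50, 4)), y)           # continuous features
    BernoulliNB().fit(rng.randint(2, size=(50, 4)), y)      # binary features
    MultinomialNB().fit(rng.randint(10, size=(50, 4)), y)   # integer count features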

36
Q

GaussianNB -> ____ data

A

Continuous

37
Q

BernoulliNB -> ____ data, ___ data

A

Binary data, Text data

38
Q

MultinomialNB -> ____ data, ___ data

A

Integer count data, text data

39
Q

In Naive Bayes, it controls _____

A

model complexity with the alpha parameter

40
Q

In Naive Bayes, the model _____ by adding as many virtual positive data points as alpha.

A

smooths the statistics

41
Q

In Naive Bayes, ____ decreases the complexity of the model but does not change the performance

A

Large alpha
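
A sketch of sweeping alpha on synthetic count data (the dataset and alpha values are illustrative, not prescriptive):

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    rng = np.random.RandomState(0)
    X = rng.randint(5, size=(100, 10))
    y = rng.randint(2, size=100)

    # Larger alpha adds more virtual data points, smoothing the per-class
    # statistics and making the model less complex.
    for alpha in [0.01, 1.0, 100.0]:
        print(alpha, MultinomialNB(alpha=alpha).fit(X, y).score(X, y))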

42
Q

_____ is mostly used on high-dimensional data

A

GaussianNB

43
Q

_____ and ______ are widely used for sparse count data such as text.

A

BernoulliNB and MultinomialNB

44
Q

In Naive Bayes, _____ are fast, and the models are easy to understand.

A

Training and testing

45
Q

Naive Bayes works well with _____ and is not _____

A

sparse high-dimensional datasets, parameter sensitive

46
Q

______ are widely used models for classification and regression tasks

A

Decision trees

47
Q

In Decision Trees, they learn a hierarchy of ____, leading to a decision

A

if/else questions

48
Q

Learning a _____ means learning the sequence of if/else questions that gets us to the true answer most quickly

A

decision tree

49
Q

In the machine learning setting, if/else questions are called ___

A

tests

50
Q

To build a tree, the algorithm searches over all possible tests and finds the one that is ____ about the target variable

A

most informative

51
Q

The top node is called the ___, representing the whole dataset.

A

root

52
Q

Parts of a decision tree:

A

Root Node
Node
Edge (Connects tests to other nodes)
Terminal Node (Nodes with no further edges)
Characteristics (inside nodes)

53
Q

A prediction on a new data is made by checking which region of the ____ the point lies in, and then predicting the majority target (or the single target in the case of pure leaves) in that region

A

partition of the feature space

54
Q

The ____ can be found by traversing the tree from the root and going left or right, depending on whether the test is fulfilled or not

A

region

55
Q

Decision trees in scikit-learn are implemented in the ____ and ____ classes

A

DecisionTreeRegressor, DecisionTreeClassifier

56
Q

We can visualize the tree using the ___ function from the tree module

A

export_graphviz

57
Q

export_graphviz writes a file in the ____, which is a text file format for storing graphs

A

.dot file format

58
Q

export_graphviz writes a file in the .dot file format, which is a ____for storing graphs

A

text file format

59
Q

We can visualize the _____ in a way that is similar to the way we visualize the coefficients in the linear model

A

feature importances
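
A sketch of inspecting them, assuming the breast cancer dataset for illustration:

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier

    cancer = load_breast_cancer()
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(cancer.data, cancer.target)

    # One value per feature, summing to 1 -- analogous to reading the
    # coefficients of a linear model.
    for name, imp in zip(cancer.feature_names, tree.feature_importances_):
        print(f"{name}: {imp:.3f}")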

60
Q

In decision trees, ______ (predicting outside the range of the training data) is impossible.

A

Extrapolation
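
A tiny demonstration on made-up one-dimensional data:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Train on x in [0, 10) with target y = 2x.
    X_train = np.arange(0, 10, 0.5).reshape(-1, 1)
    tree = DecisionTreeRegressor().fit(X_train, 2 * X_train.ravel())

    # Outside the training range the tree repeats the last leaf value (19)
    # instead of continuing the trend toward 40 and 200.
    print(tree.predict([[20], [100]]))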

61
Q

____ is not affected by scale

A

Decision Tree Regression

62
Q

_____ are methods that combine multiple machine learning models to create more powerful models

A

Ensembles

63
Q

Two ensemble models that have proven to be effective on a wide range of datasets, for classification and regression, both of which use decision trees as their building blocks:

A

Random Forests
Gradient Boosted Decision Trees

64
Q

It is one of the ensemble methods that can avoid overfitting by combining multiple decision trees

A

Random Forests

65
Q

Random Forests Reduces overfitting by ______

A

averaging trees that predict well

66
Q

In Random Forests, Regression is:

A

average of predicted values

67
Q

In Random Forests, Classification is:

A

average of predicted probabilities

68
Q

It injects randomness when creating trees

A

Random Forests

69
Q

In Random Forests, it randomly selects _____ when creating a tree

A

some of the data

70
Q

In Random Forests, it selects ______ when splitting nodes

A

candidate features at random

71
Q

To build a random forest model, you need to decide on the _____ to build

A

number of trees

72
Q

To build a random forest model, you need to decide on the number of trees to build (the ____ parameter of RandomForestRegressor or RandomForestClassifier)

A

n_estimators

73
Q

To build a tree, we first take what is called a _____ of our data. That is, from our n_samples data points, we repeatedly draw an example randomly with replacement

A

bootstrap sample
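
A sketch of one bootstrap draw with plain numpy (the data is a stand-in):

    import numpy as np

    rng = np.random.RandomState(0)
    data = np.arange(10)  # pretend these are our n_samples data points

    # Draw n_samples points with replacement: some repeat, some are left out,
    # so each tree in the forest sees a slightly different dataset.
    print(rng.choice(data, size=len(data), replace=True))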

74
Q

A critical parameter in this process is ____. If we set _____ to n_features, that means that each split can look at all features in the dataset, and no randomness will be injected in the feature selection.

A

max_features

75
Q

The Advantages of Random Forests are:

A

Most widely used algorithm for regression and classification
Excellent performance, less burden of parameter tuning, no data scaling required
Can be applied to large datasets

76
Q

In Random Forests, it is the _______ algorithm in regression and classification

A

Most widely used

77
Q

In Random Forests, it has ___ performance, less burden of _____, and no ____ required

A

excellent performance, parameter tuning, data scaling

78
Q

In Random Forests, ____ datasets can be applied

A

Large

79
Q

The Disadvantages of Random Forests are:

A

As many trees are created, detailed analysis is difficult, and the trees tend to get deeper
Poor performance for large and sparse data
More memory usage and slower training and prediction than linear models

80
Q

In Random Forests, as many trees are created, _____ is difficult and the trees tend to get deeper

A

detailed analysis

81
Q

In Random Forests, it has poor performance for ___ and ____ data

A

large and sparse data

82
Q

In Random Forests, it has more ____ and slower ___ and ____ than linear models

A

memory usage
training and prediction

83
Q

The parameters used in Random Forests are:

A

n_estimators, max_features
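
A minimal sketch using both parameters, with the breast cancer dataset as a stand-in:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # n_estimators: number of trees; max_features: candidate features per split.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    random_state=0).fit(X_train, y_train)
    print(forest.score(X_test, y_test))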

84
Q

Another ensemble algorithm based on DecisionTreeRegressor

A

Gradient Boosted Regression Trees (gradient boosting machines)

85
Q

Gradient Boosted Regression Trees can be used for both ____ and ___

A

classification and regression

86
Q

In Gradient Boosted Regression Trees, unlike random forest, _____ is strongly applied instead of randomness

A

pre-pruning

87
Q

Used a lot in machine learning contests (Kaggle)

A

Gradient Boosted Regression Trees (gradient boosting machines)

88
Q

In Gradient Boosted Regression Trees, it is slightly more _____ and has slightly _____ than random forests

A

parameter sensitive
higher performance

89
Q

Create the next tree to compensate for the error of the previous tree using a ______

A

a shallow tree of depth 5 or less

90
Q

In Gradient Boosted Regression Trees, the regression is:

A

least squares error loss function

91
Q

In Gradient Boosted Regression Trees, the classification is:

A

logistic loss function

92
Q

Gradient Boosted Regression Trees use the _____ method

A

gradient descent

93
Q

In Gradient Boosted Regression Trees, the gradient descent method is used (the learning_rate parameter is important; default = ___)

A

0.1

94
Q

Gradient Boosting Advantages:

A

Use when you need more performance than random forests (xgboost for larger scales)
No need for feature scale adjustment and can be used for binary and continuous features

95
Q

In Gradient Boosting, use it when you need more _____ than random forests (____ for larger scales)

A

performance, xgboost

96
Q

In Gradient Boosting, no need for _________ and can be used for binary and continuous features

A

feature scale adjustment

97
Q

Gradient Boosting Disadvantages:

A

Doesn’t work well for sparse high-dimensional data
Sensitive to parameters, takes longer training time

98
Q

In Gradient Boosting, it doesn’t work well for sparse ___

A

high-dimensional data

99
Q

In Gradient Boosting, it is sensitive to _____, takes longer _____

A

parameters
training time

100
Q

Gradient Boosting Parameters are:

A

n_estimators
learning_rate
max_depth (<=5)
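
A minimal sketch wiring the three parameters together, again with the breast cancer dataset as a stand-in:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_breast_cancer(return_X_y=True), random_state=0)

    # Shallow trees (max_depth <= 5) each correct the errors of the previous
    # ones; learning_rate (default 0.1) scales how strongly they correct.
    gbrt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                      max_depth=3, random_state=0)
    print(gbrt.fit(X_train, y_train).score(X_test, y_test))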