Book - Chapter 7 Analytical Theory Classification Flashcards

1
Q

What applications does classification appear in

A

Data mining

2
Q

What is the primary task of a classifier

A

To assign class labels to new observations

3
Q

Are classification methods supervised or unsupervised

A

Supervised

4
Q

What is another name for a decision tree

A

Prediction tree

5
Q

What types of input variables can a decision tree use

A

Categorical or continuous

6
Q

In a decision tree structure, what is a test point

A

A node

7
Q

What is a node without further branches called

A

A leaf node

8
Q

What do leaf nodes return

A

They return class labels and, in some implementations, probability scores

9
Q

What are the two varieties of decision trees

A

Classification trees and regression trees

10
Q

What are classification trees

A

They usually apply to categorical output variables, which are often binary (for example, yes or no)

11
Q

What are regression trees

A

They can apply to output variables that are numeric or continuous, such as the predicted price of a consumer good or the likelihood a subscription will be purchased

12
Q

What does the term branch mean in decision trees

A

It refers to the outcome of a decision and is visualised as a line connecting two nodes

13
Q

What happens if the decision is numerical

A

The "greater than" branch is usually placed on the right

14
Q

What is an internal node

A

They are the decision or test points. Each internal node refers to an input variable or an attribute

15
Q

What is the top internal node called

A

The root

16
Q

What is the depth of a node

A

The minimum number of steps required to reach the node from the root
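
The structural terms from the cards above (root, internal node, branch, leaf node, depth) can be sketched in code. This is only an illustration: the `Node` class, its field names, and the example attributes and thresholds are all assumptions made here, not something the chapter defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure)."""
    attribute: Optional[str] = None    # input variable tested at an internal node
    threshold: Optional[float] = None  # numeric split point for that variable
    left: Optional["Node"] = None      # "less than or equal" branch
    right: Optional["Node"] = None     # "greater than" branch, drawn on the right
    label: Optional[str] = None        # class label, set only on leaf nodes

    def is_leaf(self) -> bool:
        # A leaf node has no further branches.
        return self.left is None and self.right is None


def depth_of(root: Node, target: Node, depth: int = 0) -> Optional[int]:
    """Return the number of steps from the root to target, or None if absent."""
    if root is target:
        return depth
    for child in (root.left, root.right):
        if child is not None:
            found = depth_of(child, target, depth + 1)
            if found is not None:
                return found
    return None


def classify(node: Node, record: dict) -> str:
    """Follow branches from the root until a leaf returns its class label."""
    while not node.is_leaf():
        node = node.right if record[node.attribute] > node.threshold else node.left
    return node.label


# Invented example: the root tests income, one internal node tests age.
leaf_no = Node(label="no")
leaf_yes = Node(label="yes")
leaf_older = Node(label="yes")
age_node = Node(attribute="age", threshold=40, left=leaf_no, right=leaf_older)
root = Node(attribute="income", threshold=50_000, left=age_node, right=leaf_yes)
```

In this sketch the root has depth 0, `age_node` has depth 1, and `leaf_no` has depth 2.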

17
Q

What are short trees also known as

A

Weak learners or base learners

18
Q

What is an ensemble method

A

A method that uses multiple predictive models to vote, with the final decision based on the combination of the votes

19
Q

Give examples of ensemble methods

A

Random forest, bagging, and boosting

20
Q

What is the simplest short tree called

A

Decision stump
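
To make the voting idea concrete, here is a minimal sketch in which three decision stumps each cast one vote and the most common label wins. It shows plain majority voting only, not random forest, bagging, or boosting as such. The helper names `make_stump` and `majority_vote`, and the attributes and thresholds, are invented for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, float]
Classifier = Callable[[Record], str]


def make_stump(attribute: str, threshold: float) -> Classifier:
    """Build a decision stump: a tree with a single split on one attribute."""
    def stump(record: Record) -> str:
        return "yes" if record[attribute] > threshold else "no"
    return stump


def majority_vote(models: List[Classifier], record: Record) -> str:
    """Each model votes for a class label; the most common label wins."""
    votes = Counter(model(record) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical weak learners; an odd number of voters avoids ties.
stumps = [
    make_stump("income", 50_000),
    make_stump("age", 40),
    make_stump("balance", 1_000),
]

record = {"income": 60_000, "age": 45, "balance": 500}
prediction = majority_vote(stumps, record)  # two "yes" votes against one "no"
```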

21
Q

At each split what does the decision tree algorithm do

A

It picks the most informative attribute out of the remaining attributes

22
Q

How is the most informative attribute determined

A

By measures such as entropy and information gain

23
Q

What does entropy measure

A

The impurity of an attribute

24
Q

What does information gain measure

A

The purity of an attribute

25
Q

When do you achieve maximum entropy

A

When all class labels are equally probable
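
The cards do not give the formula, so the sketch below assumes the standard Shannon entropy in bits, H = -sum(p * log2(p)) over the class labels. The function name and the toy label lists are illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p)) over the class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


balanced = entropy(["yes", "no", "yes", "no"])   # both labels equally probable
skewed = entropy(["yes", "yes", "yes", "no"])    # one label dominates
pure = entropy(["yes", "yes", "yes", "yes"])     # a single label, no impurity
```

With two classes the balanced case reaches the maximum of 1 bit, the skewed case falls below it, and the pure case is 0.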

26
Q

What is conditional entropy always

A

Less than or equal to the base entropy

27
Q

What is information gain defined as

A

The difference between base entropy and conditional entropy
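
As a rough illustration of this definition, the sketch below computes conditional entropy as the weighted average entropy of the labels within each attribute value, then subtracts it from the base entropy. The function names and the toy data are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Base entropy of the class labels, in bits."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def conditional_entropy(values: Sequence[Hashable],
                        labels: Sequence[Hashable]) -> float:
    """Weighted average entropy of the labels within each attribute value."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Base entropy minus conditional entropy for one attribute."""
    return entropy(labels) - conditional_entropy(values, labels)


# Invented toy data: one attribute separates the classes, the other does not.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A decision tree algorithm that splits on the most informative attribute would pick `informative` here, since it has the larger gain.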

28
Q

What is Bayes' theorem

A

It gives a relationship between the probabilities of two events and their conditional probabilities

29
Q

What is a naive Bayes classifier

A

A classifier that assumes the presence or absence of a particular feature of a class is unrelated to the presence or absence of other features

30
Q

What are the input variables of naive Bayes

A

Categorical and/or discrete

31
Q

What is the output of naive Bayes

A

Class label and its corresponding probability score. The probability score is not the true probability of the class label, but it’s proportional to the true probability
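
A minimal sketch of how such a score could be computed for categorical inputs: multiply the class prior by the per-feature conditional frequencies. The shared denominator is dropped, which is why the score is only proportional to the true probability. The function names and the toy spam data are invented, and smoothing for unseen values is left out to keep the sketch short.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def train(records: List[Record], labels: List[str]):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    for record, label in zip(records, labels):
        for feature, value in record.items():
            value_counts[(label, feature)][value] += 1
    return class_counts, value_counts


def score(record: Record, label: str, class_counts, value_counts) -> float:
    """P(label) times the product of P(value | label) over all features."""
    total = sum(class_counts.values())
    result = class_counts[label] / total
    for feature, value in record.items():
        result *= value_counts[(label, feature)][value] / class_counts[label]
    return result


def predict(record: Record, class_counts, value_counts) -> Tuple[str, float]:
    """Return the best class label and its (unnormalised) score."""
    scores = {c: score(record, c, class_counts, value_counts)
              for c in class_counts}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented training data for a toy spam filter.
records = [
    {"has_link": "yes", "has_offer": "yes"},
    {"has_link": "yes", "has_offer": "no"},
    {"has_link": "no", "has_offer": "no"},
    {"has_link": "no", "has_offer": "no"},
]
labels = ["spam", "spam", "ham", "ham"]

class_counts, value_counts = train(records, labels)
label, value = predict({"has_link": "yes", "has_offer": "yes"},
                       class_counts, value_counts)
```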

32
Q

What is naive Bayes most commonly used for

A

Spam filtering

33
Q

What notation does Bayes' theorem use for conditional probability

A

The conditional probability of event C occurring, given that event A has already occurred, is denoted as P(C|A)
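
Written as a formula with the same events C and A, the standard statement of Bayes' theorem is:

```latex
P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}, \qquad P(A) > 0
```

Here P(C) and P(A) are the probabilities of the two events on their own, and P(A|C) is the conditional probability of A given that C has occurred.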

34
Q

What should a good classifier have

A

Large numbers of true positives and true negatives, and small (ideally zero) numbers of false positives and false negatives

35
Q

What does accuracy mean

A

The rate at which a model has classified the records correctly

36
Q

What is recall

A

The percentage of positive instances that were correctly identified
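
Both metrics can be computed from the four counts mentioned earlier (true positives, true negatives, false positives, false negatives). The sketch below uses the standard formulas; the function names and the counts are made up for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Share of actual positive instances that were correctly identified."""
    return tp / (tp + fn)


# Invented counts: 40 true positives, 50 true negatives,
# 5 false positives, 5 false negatives.
acc = accuracy(tp=40, tn=50, fp=5, fn=5)  # 90 correct out of 100 records
rec = recall(tp=40, fn=5)                 # 40 found out of 45 actual positives
```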