Book - Chapter 7 Analytical Theory Classification Flashcards

Question 1

Q

What applications does classification appear in

Answer

A

Data mining

Question 2

Q

What is the primary task of a classifier

Answer

A

To assign class labels to new observations

Question 3

Q

Are classification methods supervised or unsupervised

Answer

A

Supervised

Question 4

Q

What is another name for a decision tree

Answer

A

Prediction tree

Question 5

Q

What types of input variables can a decision tree take

Answer

A

Categorical or continuous

Question 6

Q

In a decision tree structure what is a test point

Question 7

Q

What is a node without further branches called

Answer

A

A leaf node

Question 8

Q

What do leaf nodes return

Answer

A

They return class labels and, in some implementations, probability scores

Question 9

Q

What are the two varieties of decision trees

Answer

A

Classification trees and regression trees

Question 10

Q

What are classification trees

Answer

A

They usually apply to output variables that are categorical, often binary (for example, yes or no)

Question 11

Q

What are regression trees

Answer

A

They can apply to output variables that are numeric or continuous, such as the predicted price of a consumer good or the likelihood a subscription will be purchased

Question 12

Q

What does the term branch mean in decision trees

Answer

A

It refers to the outcome of a decision and is visualised as a line connecting two nodes

Question 13

Q

What happens if the decision is numerical

Answer

A

The greater than branch is usually placed on the right

Question 14

Q

What is an internal node

Answer

A

Internal nodes are the decision or test points. Each internal node refers to an input variable or an attribute

Question 15

Q

What is the top internal node called

Question 16

Q

What is the depth of a node

Answer

A

The minimum number of steps required to reach the node from the root

Question 17

Q

What are short trees also known as

Answer

A

Weak learners or base learners

Question 18

Q

What is an ensemble method

Answer

A

They use multiple predictive models to vote, and decisions can be made based on the combination of the votes
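
The voting idea can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple majority vote; the function name `majority_vote` and the sample votes are hypothetical, and real ensemble methods may weight or combine votes differently.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label that the largest number of models predicted."""
    # most_common(1) gives a one-item list: [(label, count)]
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes from three models for a single observation
votes = ["yes", "no", "yes"]
winner = majority_vote(votes)
```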

Question 19

Q

Give examples of ensemble methods

Answer

A

Random forest, bagging, and boosting

Question 20

Q

What is the simplest short tree called

Answer

A

Decision stump
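
A decision stump can be pictured as a tree with a single test at the root and two leaves. The sketch below assumes one numeric attribute and a fixed threshold; `stump_predict`, the threshold, and the labels are hypothetical names chosen for illustration.

```python
def stump_predict(value, threshold, left_label, right_label):
    """One-split tree: a single test at the root, then a leaf on each side."""
    # By convention the "greater than" outcome is the right-hand branch
    return right_label if value > threshold else left_label

# Hypothetical rule: predict "yes" when the attribute exceeds 25
above = stump_predict(30, threshold=25, left_label="no", right_label="yes")
below = stump_predict(20, threshold=25, left_label="no", right_label="yes")
```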

Question 21

Q

At each split what does the decision tree algorithm do

Answer

A

It picks the most informative attribute out of the remaining attributes

Question 22

Q

How is the most informative attribute determined

Answer

A

By measures such as entropy and information gain

Question 23

Q

What does entropy measure

Answer

A

The impurity of an attribute

Question 24

Q

What does information gain measure

Answer

A

The purity of an attribute

Question 25

Q

When do you achieve maximum entropy

Answer

A

When all class labels are equally probable
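
A small Python sketch can make this concrete. It assumes the usual base-2 Shannon entropy computed from label frequencies; the function name and the toy label lists are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

balanced = entropy(["yes", "no", "yes", "no"])  # both labels equally probable
skewed = entropy(["yes", "yes", "yes", "no"])   # one label dominates
pure = entropy(["yes", "yes", "yes", "yes"])    # only one label present
```

With two labels, the balanced list reaches the maximum of 1 bit, the skewed list falls below it, and the pure list has zero entropy.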

Question 26

Q

What is conditional entropy always

Answer

A

Less than or equal to the base entropy

Question 27

Q

What is information gain defined as

Answer

A

The difference between base entropy and conditional entropy
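
The definition can be sketched as follows. This assumes base-2 entropy and a categorical attribute; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Base entropy minus the conditional entropy given the attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Conditional entropy: entropy of each group, weighted by group size
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: one attribute separates the classes, one does not
labels = ["yes", "yes", "no", "no"]
gain_perfect = information_gain(["a", "a", "b", "b"], labels)
gain_useless = information_gain(["a", "b", "a", "b"], labels)
```

The attribute that splits the labels cleanly recovers the full base entropy as gain, while the attribute that leaves each group mixed gains nothing.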

Question 28

Q

What is Bayes' theorem

Answer

A

It gives a relationship between the probabilities of two events and their conditional probabilities
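
Written out for two events, here labelled C and A (the labels are an assumption for illustration), the standard form of the theorem is:

```latex
P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}, \qquad P(A) > 0
```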

Question 29

Q

What is a naive Bayes classifier

Answer

A

A classifier that assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of other features

Question 30

Q

What are the input variables of naive Bayes

Answer

A

Categorical and/or discrete

Question 31

Q

What is the output of naive Bayes

Answer

A

Class label and its corresponding probability score. The probability score is not the true probability of the class label, but it’s proportional to the true probability
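
A rough sketch of how such a score could be computed is shown below. It assumes categorical features, plain frequency counts, and no smoothing; the function name and the toy email data are hypothetical.

```python
from collections import Counter

def naive_bayes_scores(rows, labels, new_row):
    """Score each class as P(class) * product of P(feature value | class).

    No smoothing is applied, so a feature value never seen with a class
    drives that class's score to 0.
    """
    n = len(labels)
    scores = {}
    for cls, cls_count in Counter(labels).items():
        score = cls_count / n  # prior P(class)
        for j, value in enumerate(new_row):
            matches = sum(1 for row, lab in zip(rows, labels)
                          if lab == cls and row[j] == value)
            score *= matches / cls_count  # P(feature j = value | class)
        scores[cls] = score
    return scores

# Hypothetical toy emails: (contains "offer", contains "meeting")
rows = [("yes", "no"), ("yes", "no"), ("yes", "yes"),
        ("no", "yes"), ("no", "no"), ("yes", "yes")]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

scores = naive_bayes_scores(rows, labels, ("yes", "no"))
predicted = max(scores, key=scores.get)
```

The scores do not sum to 1, which is why they are treated as proportional to the class probabilities rather than as probabilities themselves.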

Question 32

Q

What is naive Bayes most commonly used for

Answer

A

Spam filtering

Question 33

Q

How is conditional probability denoted in Bayes' theorem

Answer

A

The conditional probability of event C occurring, given that event A has already occurred, is denoted as P(C|A)

Question 34

Q

What should a good classifier have

Answer

A

Large numbers of true positives and true negatives, and small (ideally zero) numbers of false positives and false negatives
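
These four counts can be tallied from actual and predicted labels. The sketch below assumes a binary problem with "yes" as the positive class; the function name and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

# Hypothetical labels for five observations
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
counts = confusion_counts(actual, predicted)
```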

Question 35

Q

What does accuracy mean

Answer

A

The rate at which a model has classified the records correctly
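
In terms of confusion-matrix counts, accuracy is commonly computed as the correct predictions divided by all predictions. A minimal sketch, with hypothetical counts:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 90 of 100 records classified correctly
acc = accuracy(tp=40, tn=50, fp=5, fn=5)
```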

Question 36

Q

What is recall

Answer

A

The percentage of positive instances that were correctly identified
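
Recall is commonly computed as true positives divided by all actual positives (true positives plus false negatives). A minimal sketch, with hypothetical counts:

```python
def recall(tp, fn):
    """Fraction of actual positive instances that were correctly identified."""
    return tp / (tp + fn)

# Hypothetical counts: 40 positives found, 5 positives missed
rec = recall(tp=40, fn=5)
```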