Classification: Basic Concepts Flashcards

1
Q

What is classification?

A

A task in which a model, or classifier, is constructed to predict class (categorical) labels.

2
Q

What is a numeric prediction?

A

A task in which the constructed model predicts a continuous-valued function, or ordered value, as opposed to a class label.

3
Q

What is regression analysis?

A

The statistical methodology most often used for numeric prediction.

4
Q

What are the two major types of prediction problems?

A

Classification and numeric prediction.

5
Q

What are the two steps in data classification?

A

learning step and classification step

6
Q

What is the learning step?

A

The training phase, in which a classification algorithm builds the classifier by analyzing, or learning from, a training set of tuples and their associated class labels.

7
Q

What is the classification step?

A

The step in which the model is used to predict class labels for given data.
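
The two steps can be sketched with a deliberately tiny classifier. This is only an illustration, not a standard algorithm: the function names `learn` and `classify_value`, the single-attribute "model", and the loan data are all hypothetical.

```python
from collections import Counter, defaultdict

def learn(training_set):
    """Learning step: build a model from (attribute_value, class_label) pairs.

    The toy model maps each attribute value to the class label most
    frequently seen with it in the training set.
    """
    counts = defaultdict(Counter)
    for value, label in training_set:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def classify_value(model, value, default=None):
    """Classification step: use the learned model to predict a class label."""
    return model.get(value, default)

# Hypothetical training tuples: (income level, loan decision).
training_set = [
    ("low", "risky"), ("low", "risky"), ("low", "safe"),
    ("high", "safe"), ("high", "safe"),
]
model = learn(training_set)
prediction = classify_value(model, "high")
```

The model is built once from labeled data and then reused for any number of unlabeled inputs; a value never seen during learning falls back to `default`.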

8
Q

What is the accuracy of a classifier?

A

On a given test set, the percentage of test set tuples that are correctly classified by the classifier.
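
As a minimal sketch of that definition (the function name `accuracy` and the sample labels are illustrative assumptions):

```python
def accuracy(true_labels, predicted_labels):
    """Return the percentage of test tuples whose predicted label is correct."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical test set: known labels versus the classifier's predictions.
known = ["yes", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no"]
score = accuracy(known, predicted)  # 3 of 4 tuples are correct
```

The test set should be kept separate from the training set; measuring accuracy on the training tuples tends to give an optimistic estimate.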

9
Q

What is decision tree induction?

A

learning of decision trees from class-labeled training tuples

10
Q

What is a decision tree?

A

flowchart-like tree structure

11
Q

What does each internal node (nonleaf node) denote?

A

a test on an attribute

12
Q

What does each branch in a decision tree represent?

A

outcome of the test

13
Q

What does each leaf node represent?

A

A class label; each leaf node (terminal node) holds one.

14
Q

How are decision trees used for classification?

A

Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple
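
The root-to-leaf trace can be sketched as follows. The nested-dictionary tree layout, the function name `predict_class`, and the attributes in the example tree are hypothetical choices made for illustration.

```python
# Internal nodes are dicts holding the tested attribute and one branch per
# test outcome; leaves are plain strings holding the class label.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "middle_aged": "yes",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}

def predict_class(node, tuple_x):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        outcome = tuple_x[node["attribute"]]
        node = node["branches"][outcome]
    return node

x = {"age": "youth", "student": "yes", "credit_rating": "fair"}
label = predict_class(tree, x)
```

Only the attributes on the traced path are consulted; here `credit_rating` is never tested for a `youth` tuple.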

15
Q

Why do we use attribute selection measures?

A

They are used to select the attribute that best partitions the tuples into distinct classes.

16
Q

What is an attribute selection method?

A

A heuristic procedure for selecting the attribute that best discriminates the given tuples according to class.

17
Q

What are two examples of attribute selection measures?

A

Information gain and the Gini index.
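
Both measures can be computed from class-label counts. The sketch below uses the usual textbook definitions (entropy-based information gain, and Gini impurity as one minus the sum of squared class proportions); the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    after = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - after

labels = ["a", "a", "b", "b"]
perfect_split = ["x", "x", "y", "y"]  # separates the classes exactly
useless_split = ["x", "y", "x", "y"]  # each partition stays 50/50
```

An attribute that separates the classes perfectly recovers all of the original entropy as gain, while one that leaves every partition as mixed as the whole set gains nothing.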

18
Q

Does the Gini index enforce binary or nonbinary trees?

A

Binary.

19
Q

What is the other name for attribute selection methods?

A

Splitting rules.

20
Q

What is the bias in information gain?

A

It is biased toward multivalued attributes, that is, attributes with a large number of distinct values.
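
The source of this bias can be seen from the standard definitions, written here in the common textbook notation (the symbols $D$, $D_j$, $p_i$, $m$, and $v$ are notational assumptions, not taken from the cards):

```latex
\begin{align*}
\mathrm{Info}(D)   &= -\sum_{i=1}^{m} p_i \log_2 p_i \\
\mathrm{Info}_A(D) &= \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \\
\mathrm{Gain}(A)   &= \mathrm{Info}(D) - \mathrm{Info}_A(D)
\end{align*}
```

If $A$ takes a different value for every tuple (a unique identifier, for example), every partition $D_j$ holds a single tuple, so $\mathrm{Info}(D_j) = 0$ and $\mathrm{Gain}(A) = \mathrm{Info}(D)$, the largest gain possible, even though such a split is useless for classifying new tuples.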

21
Q

What is the bias in gain ratio?

A

It tends to prefer unbalanced splits in which one partition is much smaller than the others.
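
For reference, gain ratio is usually defined by normalizing information gain with the split information of the attribute. The notation below follows the common textbook form and is an assumption, since the cards do not give the formula:

```latex
\begin{align*}
\mathrm{SplitInfo}_A(D) &= -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|} \\
\mathrm{GainRatio}(A)   &= \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}
\end{align*}
```

When one partition holds almost all of the tuples, $\mathrm{SplitInfo}_A(D)$ approaches zero, which inflates the ratio; this is where the preference for unbalanced splits comes from.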

22
Q

What is the Gini index biased toward?

A

It is biased toward multivalued attributes, and it favors tests that result in equal-sized partitions and purity in both partitions.

23
Q

When does the Gini index have difficulty?

A

When the number of classes is large.

24
Q

Which attribute selection measures have the least bias toward multivalued attributes?

A

Measures based on the minimum description length (MDL) principle.

25
Q

What does MDL stand for? What does it do?

A

MDL stands for minimum description length. It uses encoding techniques to define the best decision tree as the one that requires the fewest bits to both encode the tree and encode the exceptions to the tree. In short, the simplest solution is preferred.

26
Q

What is a multivariate split?

A

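A split in which tuples are partitioned on a combination of attributes rather than on a single attribute. It is a form of attribute (feature) construction, where new attributes are built from existing ones.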

27
Q

What does tree pruning do?

A

It addresses the problem of overfitting the data in decision trees.

28
Q

What type of measures does tree pruning use?

A

Statistical measures, used to remove the least-reliable branches.

29
Q

What are the two most common tree pruning approaches?

A

Prepruning and postpruning.

30
Q

How does prepruning work?

A

The tree is pruned by halting its construction early. Upon halting, the node becomes a leaf, which may hold the most frequent class among the subset tuples or the probability distribution of those tuples.
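
A minimal sketch of the two pieces involved, assuming the halting test compares the goodness of the best available split against a prespecified threshold. The function names, the threshold value, and the sample labels are hypothetical.

```python
from collections import Counter

def should_halt(best_split_goodness, threshold):
    """Stop growing when the best available split is not good enough."""
    return best_split_goodness < threshold

def make_leaf(labels, as_distribution=False):
    """Turn the tuples at a halted node into a leaf.

    The leaf holds either the most frequent class label or the
    probability distribution of the class labels.
    """
    counts = Counter(labels)
    if as_distribution:
        n = len(labels)
        return {label: c / n for label, c in counts.items()}
    return counts.most_common(1)[0][0]

# Hypothetical node: the best split has low goodness, so growth halts.
labels_at_node = ["yes", "yes", "yes", "no"]
if should_halt(best_split_goodness=0.01, threshold=0.05):
    leaf = make_leaf(labels_at_node)
    leaf_distribution = make_leaf(labels_at_node, as_distribution=True)
```

Choosing the threshold is the hard part: set too high, it yields oversimplified trees; set too low, it yields very little pruning.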

31
Q

How does postpruning work?

A

It removes subtrees from a fully grown tree. A subtree is pruned by removing its branches and replacing it with a leaf, which is labeled with the most frequent class in the subtree being replaced.
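
The replacement itself can be sketched as below. The tree layout (leaves carrying training-tuple class counts under a `"counts"` key) and the function names are assumptions made for illustration; deciding whether a given subtree should be pruned is left out.

```python
from collections import Counter

def class_counts(node):
    """Sum the class counts of all training tuples under a node."""
    if "counts" in node:  # leaf
        return Counter(node["counts"])
    total = Counter()
    for child in node["branches"].values():
        total += class_counts(child)
    return total

def prune_to_leaf(subtree):
    """Replace a subtree with a leaf labeled by its most frequent class."""
    counts = class_counts(subtree)
    return {"counts": dict(counts), "label": counts.most_common(1)[0][0]}

# Hypothetical subtree with two leaves.
subtree = {
    "attribute": "student",
    "branches": {
        "yes": {"counts": {"yes": 3, "no": 1}},
        "no": {"counts": {"yes": 1, "no": 2}},
    },
}
new_leaf = prune_to_leaf(subtree)
```

Here four "yes" tuples outweigh three "no" tuples across the subtree, so the replacement leaf is labeled "yes".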

32
Q

What is cost complexity pruning?

A

A pruning algorithm that considers the cost complexity of a tree to be a function of the number of leaves in the tree and the error rate of the tree.
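
One common concrete form, assumed here because the card does not give a formula, adds the error rate to a penalty proportional to the number of leaves. The parameter `alpha`, the function names, and the numbers are hypothetical.

```python
def cost_complexity(error_rate, num_leaves, alpha):
    """Error rate plus a per-leaf penalty (assumed linear form)."""
    return error_rate + alpha * num_leaves

def should_prune(subtree_error, subtree_leaves, leaf_error, alpha):
    """Prune when replacing the subtree by one leaf lowers the cost complexity."""
    kept = cost_complexity(subtree_error, subtree_leaves, alpha)
    pruned = cost_complexity(leaf_error, 1, alpha)
    return pruned < kept

# Hypothetical subtree: 5 leaves and 10% error, versus 14% error as one leaf.
decision = should_prune(
    subtree_error=0.10, subtree_leaves=5, leaf_error=0.14, alpha=0.02
)
```

With these numbers the subtree costs about 0.20 and the single leaf about 0.16, so pruning wins despite the higher error rate; with `alpha=0.0` the penalty disappears and the subtree is kept.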

33
Q

What is the error rate?

A

The percentage of tuples misclassified by the tree.