Basic Concept of Classification Flashcards

1
Q

Given a collection of records, each characterized by a tuple (x, y), where x is the attribute set and y is the class label: the task of learning a model that maps each attribute set x to one of the predefined class labels y.

A

Classification
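The sketch below shows the (x, y) view of the data on a small invented record set. The attribute names `refund` and `marital_status` and the function `learn_majority_classifier` are hypothetical. The "learning" is only a majority-class baseline, so the learned model ignores x. It shows the shape of the task, a function from an attribute set x to a class label y, and is not meant as a useful classifier.

```python
from collections import Counter

# Hypothetical toy records: each is a tuple (x, y), x = attribute set, y = class label.
records = [
    ({"refund": "yes", "marital_status": "single"}, "no"),
    ({"refund": "no", "marital_status": "married"}, "no"),
    ({"refund": "no", "marital_status": "single"}, "yes"),
    ({"refund": "no", "marital_status": "divorced"}, "no"),
]

def learn_majority_classifier(training_records):
    """Return a model f(x) -> y that always predicts the most common training label.

    Baseline only: a real classifier would make its prediction depend on x.
    """
    majority_label, _ = Counter(y for _, y in training_records).most_common(1)[0]
    return lambda x: majority_label

f = learn_majority_classifier(records)
print(f({"refund": "yes", "marital_status": "married"}))  # no
```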

2
Q

6 CLASSIFICATION TECHNIQUES

A
  1. Decision Tree Based Methods
  2. Rule-based Methods
  3. Memory-Based Reasoning
  4. Neural Network / Deep Learning
  5. Naive Bayes and Bayesian Belief Network
  6. Support Vector Machines
3
Q

DECISION TREE INDUCTION

A

Induction: the training set is fed to a learning algorithm, which learns a model (Learning Algorithm -> Learn Model); here, the model is a decision tree.

Deduction: the learned model is then applied to the test set to predict the class labels of its records.
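The sketch below shows the induction/deduction loop. It assumes nominal attributes and a one-level tree that splits on a single, hand-picked attribute. The helpers `induce_stump` and `deduce`, the attribute name `refund`, and the records are all hypothetical. A full decision-tree learner would choose its own splits and grow deeper trees.

```python
from collections import Counter, defaultdict

def induce_stump(training_set, attribute):
    """Induction: learn a one-level decision tree that splits on `attribute`.

    Each distinct value of the attribute becomes a leaf labeled with the
    majority class of the training records that reach it.
    """
    counts = defaultdict(Counter)
    for x, y in training_set:
        counts[x[attribute]][y] += 1
    leaves = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    default = Counter(y for _, y in training_set).most_common(1)[0][0]
    # The learned model routes x to a leaf; unseen values fall back to the overall majority.
    return lambda x: leaves.get(x[attribute], default)

def deduce(model, test_set):
    """Deduction: apply the learned model to each test record's attribute set."""
    return [model(x) for x in test_set]

training_set = [
    ({"refund": "yes"}, "no"),
    ({"refund": "yes"}, "no"),
    ({"refund": "no"}, "yes"),
    ({"refund": "no"}, "yes"),
    ({"refund": "no"}, "no"),
]
test_set = [{"refund": "no"}, {"refund": "yes"}]

model = induce_stump(training_set, "refund")
print(deduce(model, test_set))  # ['yes', 'no']
```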

4
Q

is an algorithm that grows a decision tree by recursively splitting the data on attributes until each subset contains records of only a single class.

A

Hunt’s Algorithm
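The sketch below follows the spirit of Hunt's algorithm. It assumes nominal attributes, multi-way splits, and records stored as (x, y) tuples. Hunt's scheme does not itself say which attribute to test, so this version simply tries attributes in the order given. `hunt`, `domains`, and the sample records are hypothetical. A child that receives no records becomes a leaf labeled with the parent's majority class. A node with no attributes left becomes a majority-class leaf.

```python
from collections import Counter

def majority(labels):
    """Most common class label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def hunt(records, attributes, domains, parent_majority=None):
    """Grow a tree from (x, y) records; return a label (leaf) or (attribute, {value: subtree}).

    Assumption: attributes are tried in the given order; a practical learner would
    pick the attribute test with the best split instead.
    """
    if not records:                    # empty subset: leaf labeled with the parent's majority
        return parent_majority
    labels = [y for _, y in records]
    if len(set(labels)) == 1:          # all records in one class: leaf with that class
        return labels[0]
    if not attributes:                 # nothing left to test: majority-class leaf
        return majority(labels)
    attr, rest = attributes[0], attributes[1:]
    node_majority = majority(labels)
    # Multi-way split over every possible value, then recurse on each subset.
    branches = {
        value: hunt([(x, y) for x, y in records if x[attr] == value], rest, domains, node_majority)
        for value in domains[attr]
    }
    return (attr, branches)

records = [
    ({"home_owner": "yes", "marital_status": "single"}, "no"),
    ({"home_owner": "no", "marital_status": "married"}, "no"),
    ({"home_owner": "no", "marital_status": "single"}, "yes"),
    ({"home_owner": "no", "marital_status": "divorced"}, "yes"),
]
domains = {"home_owner": ["yes", "no"], "marital_status": ["single", "married", "divorced", "widowed"]}
tree = hunt(records, ["home_owner", "marital_status"], domains)
print(tree)
```

No training record is `widowed`, so that branch receives no records and is labeled `"yes"`, the majority class of its parent node.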

5
Q

4 TYPES OF ATTRIBUTES

A
  1. Binary
  2. Nominal
  3. Ordinal
  4. Continuous
6
Q

2 TEST CONDITIONS FOR NOMINAL ATTRIBUTES

A
  • Multi-way Split – use as many partitions as distinct values.
  • Binary Split – divides values into two subsets.
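The sketch below enumerates the binary splits of a nominal attribute. It assumes the distinct values are given as a list of at least two items. The helper name and the car-type values are illustrative. Because any grouping is allowed, k distinct values yield 2^(k-1) - 1 binary splits, whereas the multi-way split is a single partition with k branches.

```python
from itertools import combinations

def nominal_binary_splits(values):
    """Enumerate every way to divide nominal values into two non-empty subsets."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keep `first` on the left so each unordered pair {left, right} is produced once.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = [first, *extra]
            right = [v for v in rest if v not in extra]
            if right:  # skip the trivial "split" with an empty right side
                splits.append((left, right))
    return splits

for left, right in nominal_binary_splits(["family", "sports", "luxury"]):
    print(left, "|", right)
```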
7
Q

2 TEST CONDITIONS FOR ORDINAL ATTRIBUTES

A
  • Multi-way Split – use as many partitions as distinct values.
  • Binary Split – divides values into two subsets while preserving the order among attribute values.
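For an ordinal attribute, an order-preserving binary split can only cut the sorted values at one point, so k values give k - 1 candidate splits. The sketch below assumes the values arrive already sorted. The helper and the size values are hypothetical.

```python
def ordinal_binary_splits(ordered_values):
    """Enumerate order-preserving binary splits: everything before a cut point goes left."""
    return [(ordered_values[:i], ordered_values[i:]) for i in range(1, len(ordered_values))]

sizes = ["small", "medium", "large", "extra_large"]
for left, right in ordinal_binary_splits(sizes):
    print(left, "|", right)
```

A grouping such as {small, large} vs. {medium, extra_large} never appears, because it would break the order.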
8
Q

is an approach to finding the best split in which nodes with a homogeneous class distribution are preferred.

A

Greedy Approach
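The sketch below shows the greedy choice. It assumes Gini impurity as the homogeneity measure and describes each candidate split only by the class counts of its child nodes. The candidate names and counts are hypothetical. The split whose children are most homogeneous wins. The decision is made locally at the current node, without looking ahead.

```python
def gini(counts):
    """Gini impurity of a node given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_impurity(children):
    """Average child impurity, weighted by the fraction of records in each child."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)

def greedy_best_split(candidates):
    """Pick the candidate whose children are most homogeneous (lowest weighted impurity)."""
    return min(candidates, key=lambda name: weighted_impurity(candidates[name]))

# Hypothetical candidates: each maps to the class counts [C0, C1] of its child nodes.
candidates = {
    "split_on_A": [[5, 5], [5, 5]],  # children as mixed as the parent
    "split_on_B": [[9, 1], [1, 9]],  # children much more homogeneous
}
print(greedy_best_split(candidates))  # split_on_B
```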

9
Q

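Formula for the general framework used to find the best split.

A

$M_0$ is the impurity of the parent node before splitting.

$M_{12}$ is the weighted impurity of child nodes N1 and N2 after split A: $M_{12} = \frac{n_1}{n} M_1 + \frac{n_2}{n} M_2$, where $M_i$ is the impurity of child $N_i$, $n_i$ is its number of records, and $n$ is the number of records in the parent.

$M_{34}$ is the weighted impurity of child nodes N3 and N4 after split B: $M_{34} = \frac{n_3}{n} M_3 + \frac{n_4}{n} M_4$.

$\text{Gain} = M_0 - M_{12}$ vs. $M_0 - M_{34}$: choose the split with the higher gain (equivalently, the lower weighted child impurity).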
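The sketch below works through the comparison. It assumes Gini impurity as the measure M (entropy or classification error could be substituted) and uses invented class counts for a parent with 12 records. Each child's impurity is weighted by the fraction of the parent's records that the child receives.

```python
def gini(counts):
    """Gini impurity of a node given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_children_impurity(children):
    """Sum of child impurities, each weighted by n_i / n."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

parent = [6, 6]              # hypothetical class counts [C0, C1] in the parent
split_a = [[4, 3], [2, 3]]   # children N1, N2
split_b = [[5, 1], [1, 5]]   # children N3, N4

m0 = gini(parent)
m12 = weighted_children_impurity(split_a)
m34 = weighted_children_impurity(split_b)
gain_a, gain_b = m0 - m12, m0 - m34
print(f"M0={m0:.3f} M12={m12:.3f} M34={m34:.3f}")
print(f"Gain A={gain_a:.3f} Gain B={gain_b:.3f}")
print("Choose split", "A" if gain_a > gain_b else "B")
```

With these counts, M0 = 0.5, M12 ≈ 0.486 and M34 ≈ 0.278, so split B has the larger gain.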

10
Q

3 WAYS TO MEASURE NODE IMPURITY

A
  1. Gini Index
  2. Entropy
  3. Classification Error
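For reference, the standard definitions of the three measures for a node t are given below. Here p(i | t) denotes the fraction of records at node t that belong to class i, and c is the number of classes (notation assumed, following common textbook usage). All three are 0 for a pure node and largest for a uniform class distribution. With two classes the maxima are 0.5, 1, and 0.5 respectively.

```latex
\begin{aligned}
\text{Gini}(t)    &= 1 - \sum_{i=1}^{c} \left[ p(i \mid t) \right]^2 \\
\text{Entropy}(t) &= -\sum_{i=1}^{c} p(i \mid t) \log_2 p(i \mid t)
                     \qquad (0 \log_2 0 \text{ is taken as } 0) \\
\text{Error}(t)   &= 1 - \max_{i} \, p(i \mid t)
\end{aligned}
```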
11
Q

is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.

A

Gini Impurity / Index
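The sketch below computes the Gini index from per-class record counts in two ways: with the usual formula 1 - Σ p_i², and directly from the card's definition. Under the definition, an element of class i (probability p_i) gets a label drawn from the same distribution, which is wrong with probability 1 - p_i. Summing p_i(1 - p_i) gives the same value. The function names and counts are illustrative.

```python
def gini_index(counts):
    """Gini index of a node from its per-class record counts: 1 - sum_i p_i**2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def random_mislabel_probability(counts):
    """The card's definition: P(element of class i) * P(random label is not i), summed over i."""
    n = sum(counts)
    probs = [c / n for c in counts]
    return sum(p * (1 - p) for p in probs)

for counts in ([0, 6], [3, 3], [1, 5]):
    print(counts, round(gini_index(counts), 3), round(random_mislabel_probability(counts), 3))
```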

12
Q

measures the homogeneity of a node; more generally, it measures the uncertainty of a random variable or the information content of a message.

A

Entropy
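The sketch below computes the entropy (in bits) of a node from its per-class record counts. The function name and counts are illustrative. Empty classes are skipped, which matches the convention 0 · log2 0 = 0. Each term is written as p · log2(1/p), which equals -p · log2 p.

```python
from math import log2

def entropy(counts):
    """Entropy of a node in bits: sum_i p_i * log2(1 / p_i), skipping classes with no records."""
    n = sum(counts)
    return sum((c / n) * log2(n / c) for c in counts if c > 0)

for counts in ([0, 6], [3, 3], [1, 5]):
    print(counts, round(entropy(counts), 3))  # 0.0 for the pure node, 1.0 for the 50/50 node
```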

13
Q

measures the misclassification error made by a node.

A

Classification Error
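The sketch below computes the classification error of a node: the fraction of its records that would be misclassified if the node predicted its majority class, i.e. 1 - max_i p_i. The function name and counts are illustrative.

```python
def classification_error(counts):
    """Classification error of a node from its per-class record counts: 1 - max_i p_i."""
    n = sum(counts)
    return 1.0 - max(counts) / n

for counts in ([0, 6], [3, 3], [1, 5]):
    print(counts, round(classification_error(counts), 3))
```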

14
Q

3 STOPPING CRITERIA FOR TREE INDUCTION

A
  1. Stop expanding a node when all the records belong to the same class.
  2. Stop expanding a node when all the records in the node have identical attribute values.
  3. Early Termination Criteria.
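The sketch below tests whether a single node should stop expanding. It assumes records are (x, y) tuples. The first two checks mirror criteria 1 and 2. The third is only one example of an early-termination rule, and its `min_records` threshold (default 5) is a hypothetical value chosen for illustration. `should_stop` is likewise a hypothetical name.

```python
def should_stop(records, min_records=5):
    """Return True if the node holding `records` should become a leaf."""
    labels = {y for _, y in records}
    if len(labels) <= 1:                       # 1. all records belong to the same class
        return True
    first_x = records[0][0]
    if all(x == first_x for x, _ in records):  # 2. all records have identical attribute values
        return True
    if len(records) < min_records:             # 3. early termination (illustrative size threshold)
        return True
    return False

print(should_stop([({"a": 1}, "yes"), ({"a": 2}, "yes")]))   # True: pure node
print(should_stop([({"a": 1}, "yes"), ({"a": 1}, "no")]))    # True: identical attributes
print(should_stop([({"a": 1}, "yes"), ({"a": 2}, "no"), ({"a": 3}, "no")], min_records=2))  # False
```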
15
Q

4 ADVANTAGES OF DECISION TREE BASED CLASSIFICATION

A
  1. Inexpensive to Construct
  2. Extremely Fast at Classifying Unknown Records
  3. Easy to Interpret for Small-Sized Trees
  4. Accuracy Comparable to Other Classification Techniques for Many Simple Data Sets