Classification - Part 1 Flashcards
What is the Goal of Classification?
Previously unseen records should be assigned a class from a given set of classes as accurately as possible
Explain the Approach of Classification
- You have a training set consisting of
  - attributes
  - a class attribute (label) with positive and negative examples
- The class attribute should be predicted based on the other attributes
- Learn a model as a function of the values of the other attributes to predict the class attribute
Which variants of Classification exist?
- Binary classification (fraud / no fraud)
- Multi-class classification (low, medium, high)
- Multi-label classification (more than one class per record e.g. user interests)
What are the steps in the Model Learning and Model Application Process?
Induction: Training Set -> Learning Algorithm -> Learn Model -> Model
Deduction: Model + Unseen Records -> Apply Model -> Predicted Classes
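In code, induction corresponds to fitting a model on the training set and deduction to applying it to unseen records. A minimal sketch, assuming scikit-learn and made-up data (the attributes follow the credit risk example on the next card):

```python
# Sketch of the induction/deduction flow, assuming scikit-learn.
# Data is invented; attributes mirror the credit risk example.
from sklearn.tree import DecisionTreeClassifier

X_train = [[25, 40000, 2], [50, 90000, 0], [35, 60000, 1]]  # age, income, debts
y_train = ["no", "yes", "yes"]                              # class attribute (label)

model = DecisionTreeClassifier().fit(X_train, y_train)      # induction: learn model

X_unseen = [[30, 55000, 1]]
print(model.predict(X_unseen))                              # deduction: apply model
```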
Give some classification examples
- Credit Risk Assessment:
- Attributes: age, income, debts
- Class: Credit (yes/no)
- Marketing:
- Attributes: previously bought products
- Class: Target Customer (yes/no)
- SPAM detection:
- Attributes: words and header fields of e-mail
- Class: Spam (yes/no)
List seven classification techniques
- K-nearest-neighbours
- Decision Trees
- Rule Learning
- Naive Bayes
- Support Vector Machines
- Artificial Neural Networks
- Deep Neural Networks
Explain the approach of K-nearest-neighbor
It requires:
- A set of stored records
- Distance measure
- k value (number of nearest neighbors to consider)
Approach:
For each unknown record:
1. Compute distance to each training record
2. Identify k-nearest neighbors
3. Use class labels of nearest neighbors to determine class of unknown record
- majority vote, or
- a vote weighted by distance (see the sketch below)
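A from-scratch sketch of the three steps above, assuming numeric attributes, Euclidean distance, and a plain majority vote (the helper knn_classify is made up, not a library function):

```python
# Sketch of the k-NN steps: compute distances, find the k nearest
# neighbors, take a majority vote over their class labels.
from collections import Counter
import math

def knn_classify(train, labels, x, k=3):
    # 1. Compute distance from x to every training record
    dists = [(math.dist(rec, x), lbl) for rec, lbl in zip(train, labels)]
    # 2. Identify the k nearest neighbors
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # 3. Majority vote over the neighbors' class labels
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

train = [[1, 1], [2, 1], [8, 9], [9, 8]]
labels = ["A", "A", "B", "B"]
print(knn_classify(train, labels, [2, 2]))  # -> "A"
```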
What is a k-nearest neighbor?
Given an unknown record x, the k-nearest neighbors are the k data points with the smallest distances to x
How to choose a good value for K?
Rule of thumb: test k values between 1 and 20 (see the sketch below)
If k is too small: the result is sensitive to noise points
If k is too large: the neighborhood may include points from other classes
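One way to apply the rule of thumb in practice, assuming scikit-learn and its bundled iris data: cross-validate each k from 1 to 20 and keep the best.

```python
# Sketch: score k = 1..20 with 5-fold cross-validation, keep the best k.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```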
What are the pros and cons of k-nearest-neighbor classification?
+ Often very accurate (e.g., for optical character recognition)
- But slow: each unseen record is compared to all training examples
Neutral aspects:
- Results depend on choosing a good proximity measure
- KNN can handle decision boundaries which are not parallel to the axes
Describe Lazy Learning
- Instance-based learning approach
- No explicit model is learned
- -> Less time for learning, but more time for classification
Single Goal: Classify unseen records as accurately as possible
Example: KNN
Describe Eager Learning
- Eager learning generates models that might be interpretable by humans
- -> Longer time for learning, but less time for classification
Compared to lazy learning, eager learning has two goals:
- classify unseen records
- help humans understand the application domain
Examples: decision tree learning, rule learning
What are the components of a decision tree classifier?
- Root of tree
- Attribute tests (splitting attributes)
- Leaf nodes (decision - no tests / splits)
Why are the decision boundaries of a decision tree parallel to the axes?
Because each test condition involves a single attribute at a time
-> It is a step by step division of the space into areas parallel to the axes
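A one-line illustration of such a test condition (attribute name and threshold are made up): the boundary it induces, x1 = 5.0, is a line parallel to the x2 axis.

```python
# A single-attribute test condition: the induced decision boundary
# x1 = 5.0 is parallel to the x2 axis.
def test_condition(record):
    return "left" if record["x1"] <= 5.0 else "right"
```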
How to learn a decision tree from training data?
- Training data (attributes + class attribute) -> Learning Algorithm -> Decision Tree Model
- Finding the optimal decision tree is NP-hard
- Algorithms use greedy, top-down, recursive partitioning to induce a reasonable solution
Example Algorithms
- Hunt's Algorithm
- ID3
- CHAID
Explain Hunt’s algorithm (decision tree)
- You have a training set and need to generate either leaf nodes or attribute tests
Dt: set of training records that reach a node t
1. If Dt contains only records of the same class, then t is a leaf node.
2. If Dt contains records of more than one class, use an attribute test to split the data into subsets with higher purity.
Recursively apply the procedure to each subset (see the sketch below)
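A minimal sketch of the recursion, assuming nominal attributes stored in dicts with a "label" key for the class attribute; the function hunts and the first-attribute split are simplifications, since real learners pick the split with the highest purity gain:

```python
# Sketch of Hunt's recursive partitioning. Splitting on the first
# remaining attribute is a placeholder for a proper purity-based choice.
def hunts(records, attributes):
    classes = {r["label"] for r in records}
    if len(classes) == 1:               # (1) Dt is pure -> leaf node
        return classes.pop()
    if not attributes:                  # no tests left -> majority-class leaf
        labels = [r["label"] for r in records]
        return max(set(labels), key=labels.count)
    attr = attributes[0]                # (2) attribute test: split on attr
    return {(attr, v): hunts([r for r in records if r[attr] == v], attributes[1:])
            for v in {r[attr] for r in records}}
```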
What are the design issues for learning decision trees?
- How should training records be split?
  - How to specify the attribute test condition?
    - Depends on the number of splits: binary or multi-way
    - Depends on the attribute data type: nominal, ordinal, continuous
  - How to determine the best split?
    - Can use different purity measures (e.g. Gini impurity; see the sketch below)
- When should the splitting procedure stop?
  - Shallow trees might generalize better to unseen records
  - Fully grown trees might overfit the training data
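As an example of a purity measure, a short sketch of Gini impurity (entropy would be used the same way): an impurity of 0 means the node is pure and can become a leaf.

```python
# Gini impurity of a set of class labels: 0.0 = pure,
# 0.5 = maximally impure for a binary class attribute.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))  # 0.5 -> keep splitting
print(gini(["yes", "yes", "yes"]))       # 0.0 -> pure, make it a leaf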
How can you split nominal attributes in decision trees?
- Multi-way split: use as many partitions as distinct values
- Binary split: Divides values into two subsets
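A small illustration of the two split types, assuming a hypothetical nominal attribute "marital status" with three distinct values:

```python
# Multi-way vs. binary split on a nominal attribute (values made up).
values = {"single", "married", "divorced"}

# Multi-way split: one partition per distinct value
multiway = [{v} for v in values]

# Binary split: divide the values into two subsets (one possible grouping;
# a learner would compare the purity of the alternatives)
binary = [{"married"}, {"single", "divorced"}]
```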