Final Flashcards
Given an input dataset and the Apriori algorithm, how to trace the algorithm for intermediate results? (Review)
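The trace can be reproduced mechanically. A minimal sketch of the Apriori level-wise search; the four transactions and the absolute min_sup of 2 are illustrative assumptions, and the join step is simplified to all k-combinations of frequent items (the prune step keeps it correct):

```python
from itertools import combinations

# Illustrative toy dataset and threshold (assumptions, not from the cards).
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]
min_sup = 2  # absolute support count

def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support_count} for all frequent itemsets."""
    # L1: count 1-itemsets and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_sup}
    result = dict(L)
    k = 2
    while L:
        # Join step (simplified): size-k candidates from frequent items.
        items = sorted({i for s in L for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)]
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))]
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        L = {s: c for s, c in counts.items() if c >= min_sup}
        result.update(L)
        k += 1
    return result

for s, c in sorted(frequent_itemsets(transactions, min_sup).items(),
                   key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(s), c)
```

On this data the trace gives L1 = {A:3, B:3, C:3}, L2 = {AB:2, AC:2, BC:2}, and {A,B,C} is pruned at the counting stage (support 1 < 2).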
How to derive strong rules from the given frequent itemsets L and a conf_rate?
Test each candidate rule and filter out those with conf < min_conf; the remaining rules (conf >= min_conf) are the strong rules.
How to improve the efficiency of the rule generation procedure by applying the apriori property?
Prune while generating rules: for a frequent itemset l, if the rule (l - s) => s fails min_conf, then any rule from l with a larger consequent (hence smaller antecedent) has confidence no higher, so those candidates can be skipped.
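A sketch of the rule-generation step with the confidence filter. The support counts and min_conf below are illustrative assumptions; the confidence-based pruning shortcut is noted in a comment rather than implemented, so every partition is checked:

```python
from itertools import combinations

# Assumed support counts for a toy set of frequent itemsets.
support = {
    frozenset("A"): 3, frozenset("B"): 3, frozenset("C"): 3,
    frozenset("AB"): 2, frozenset("AC"): 2, frozenset("BC"): 2,
}
min_conf = 0.6

def strong_rules(support, min_conf):
    """Derive strong rules antecedent -> consequent from frequent itemsets."""
    rules = []
    for itemset in [s for s in support if len(s) > 1]:
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = support[itemset] / support[antecedent]
                # Filter: keep only rules with conf >= min_conf.
                # (The Apriori-based pruning would skip larger consequents
                # of any rule that already failed here.)
                if conf >= min_conf:
                    rules.append((sorted(antecedent), sorted(consequent), conf))
    return rules

for a, c, conf in strong_rules(support, min_conf):
    print(a, "->", c, round(conf, 2))
```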
What are the two general purposes of DM? Use some examples of mined association patterns to explain each purpose.
- Find frequent itemsets
- Find all frequent itemsets from D
- Generate association rules
- Derive rules from each frequent itemset
How can the association mining process be mapped to the empirical cycle model of scientific research?
- Observation: observe all the data
- Analysis: generating all associations
- Theory: apply apriori knowledge (support / confidence)
- Prediction: predicting X given apriori knowledge
Why is classification mining a supervised learning process? How about association mining?
- Classification partitions the training data with a divide-and-conquer strategy, guided by known class labels.
- It trains on one portion of the labeled data and then tests on the other portion; if the accuracy is acceptable, the model can be used to predict.
- Association mining, by contrast, is unsupervised: it searches for frequent patterns without predefined class labels.
What are the major phases of conducting a classification mining application?
- Training
- Each tuple is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- Testing
- The known label of test sample is compared with the classified result from the model
- Accuracy rate is the percentage of test set samples that are correctly classified by the model
- Predicting
- If the accuracy is acceptable, use the model to classify unseen data
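The three phases can be sketched with a toy one-rule classifier; the weather tuples and the 0.8 acceptance threshold are assumptions for illustration:

```python
from collections import Counter, defaultdict

# Assumed toy dataset: feature dicts with known class labels.
data = [({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes"),
        ({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes"),
        ({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes")]
train, test = data[:4], data[4:]

# Training phase: build a model from tuples whose class labels are known
# (here, the majority label for each value of "outlook").
by_value = defaultdict(Counter)
for features, label in train:
    by_value[features["outlook"]][label] += 1
model = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}

# Testing phase: compare known labels of test samples with the model's output.
correct = sum(1 for features, label in test
              if model[features["outlook"]] == label)
accuracy = correct / len(test)

# Predicting phase: if the accuracy is acceptable, classify unseen data.
if accuracy >= 0.8:
    print(model["rain"])  # classify an unseen "rain" tuple -> yes
```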
Can you describe a mapping between a classification application process and the empirical cycle?
- Observation -> Training Data
- Analysis -> Classification algorithm
- Theory -> Classification Model
- Prediction -> Testing & Prediction
What is the general idea/strategy/method/algorithm of DT induction for classification mining?
- Supervised learning
- Derive a model from a training data set
- Inductive learning process
- Constructing a tree using a top-down, divide-and-conquer strategy
- Testing -> Choose -> Split
- Tree construction/induction by greedy search
- Depth-first search
- Heuristic function
What is the general strategy of Inductive Learning (via observing examples)?
- Divide and conquer strategy
- Continue dividing D into subsets, based on a search method, until each subset has only one label, i.e. all examples in the subset share the same class label.
What are the major technical issues of DT Induction approach for classification mining?
- Preparing datasets: (training & testing)
- A training dataset for learning a model
- A test dataset for evaluating the learned model
- Classification model discovery: (constructing a DT)
- Stopping criteria for testing at each node
- How to choose which attribute to split, and how to split (method)
- Control structure for tree construction (recursive process)
- Pruning method
What is the heuristic function used in ID3 algorithm for evaluating search directions?
Entropy Calculation
What is the notion of Information Gain, and how is it applied in the ID3 algorithm?
- Expected reduction in entropy
- Define a preferred sequence of attributes to investigate to most rapidly narrow down the state of X
- ID3 uses information gain to select among the candidate attributes at each step while growing the tree
How to convert the ID3 algorithm into an implementation code structure?
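One possible code structure (a hedged sketch, not the canonical implementation): a recursive function that mirrors the stop/choose/split steps, shown on a tiny assumed dataset:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple drawn from D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D), the expected reduction in entropy."""
    n = len(labels)
    split_info = 0.0
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        split_info += len(sub) / n * entropy(sub)
    return entropy(labels) - split_info

def id3(rows, labels, attrs):
    # Stopping criteria: pure node, or no attributes left (majority vote).
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Choose: the attribute with the highest information gain (greedy).
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    node = {}
    # Split: recurse depth-first on each subset (divide and conquer).
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        node[v] = id3(sub_rows, sub_labels, [a for a in attrs if a != best])
    return (best, node)

# Assumed toy dataset where "windy" perfectly separates the classes.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
labels = ["yes", "no", "yes", "no"]
print(id3(rows, labels, ["outlook", "windy"]))
```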
How to quantify information contained in a message?
Q(message) = P2(outcome_after) - P1(outcome_before)
Q, the quantity of information contained in a message.
P1, the probability of outcome before receiving a message.
P2, the probability of outcome after receiving the message.
- Information in general always associates with a question and a message for answering the question
- Information measure is to quantify the outcome of some expectation from the message for answering the question
Suppose a missing cow has strayed into a pasture represented as an
8 x 8 array of “cells”.
Question: Where is the cow?
Outcome: the probability of finding the cow.
Answer 1: Nobody knows.
Answer 2: The cow is in cell (4, 7).
What is the information received?
Outcome1 (cow before) = 1/64
Outcome2 (cow after) = 1
Information received = log2 P2 - log2 P1
= log2 (P2/P1)
= log2 (1 / (1/64))
= log2 (64)
= 6 bits
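The arithmetic checks out numerically:

```python
from math import log2

# The message raises the probability of locating the cow from 1/64 to 1.
p1, p2 = 1 / 64, 1.0
bits = log2(p2 / p1)  # information received, in bits
print(bits)  # 6.0
```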
What is the message and information received formulas?
- Q(message) = P2(after) - P1(before)
- Information received = log2 P2 - log2 P1 = log2(P2/P1)
How can this concept of information be applied to a classification method, such as the ID3 algorithm?
- Each attribute test acts as a "message": the entropy of D measures the information still needed to classify a tuple before the test, and the weighted entropy of the subsets measures what is still needed after the split.
- The reduction (information gain) is the information received from the test, and ID3 chooses the attribute that yields the most.
What is entropy and information gain? How to use information gain for choosing an attribute?
- Purpose is to select the best attribute with the highest information gain
- Entropy measures information required to classify any arbitrary tuple
- Information gain is used to determine the best split
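A small worked example of choosing between two attributes by information gain; the dataset and attribute names are assumptions, picked so that "windy" separates the classes perfectly while "outlook" tells us nothing:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): bits required to classify an arbitrary tuple from D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected reduction in entropy from splitting on attr."""
    n = len(labels)
    info_a = 0.0
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        info_a += len(sub) / n * entropy(sub)
    return entropy(labels) - info_a

# Assumed toy dataset: two attributes, one perfectly predictive.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
labels = ["yes", "no", "yes", "no"]

best = max(["outlook", "windy"], key=lambda a: info_gain(rows, labels, a))
print(best)  # windy: gain 1.0 vs 0.0 for outlook
```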