Next Deck Flashcards
a decision tree can be converted to a set of rules by mapping from
the root node to the leaf nodes one by one
P(x)
Predictor Prior Probability
in this method we assign each observation to its own cluster, then compute the similarity (distance) between each pair of clusters and join the two most similar clusters; repeat until there is only a single cluster left
agglomerative method
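A minimal sketch of the agglomerative procedure using SciPy's hierarchical-clustering helpers (the scipy dependency and the toy data are assumptions for illustration, not part of the card):

```python
# Agglomerative clustering sketch: start with one cluster per observation,
# repeatedly merge the two most similar clusters until only one remains.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D observations; each starts as its own cluster.
X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])

# 'average' linkage: cluster distance = mean pairwise distance between members.
Z = linkage(X, method="average")

# Each row of Z records one merge: (cluster i, cluster j, distance, new size).
print(Z)

# Cut the tree into, e.g., 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```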
P(x|c)
Likelihood
distance between two clusters is defined as the average distance from _ _ in one cluster to _ _ in the other cluster
average linkage, each point, every point
The _ (SSE) between the 2 clusters over all of the variables is used as the distance in _
error sum of squares, Ward’s method
Class prior probability
P(c)
ID3. If the sample is completely homogeneous the _
entropy is zero
in this method we assign all of the observations to a single cluster, then partition the cluster into the two least similar clusters, and proceed recursively on each cluster until there is one cluster for each observation
divisive method
to build a decision tree we need to calculate two types of entropy using frequency tables. What is the first, and write the formula.
(a) entropy of the target
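The two frequency-table entropies this card refers to are the standard ID3 formulas, written out here for reference (the notation is the usual one, not quoted from the deck):

```latex
% (a) entropy of the target, from the class frequency table alone
E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i

% (b) entropy of the target for a split on attribute X
% (frequency table of two attributes)
E(T, X) = \sum_{c \in X} P(c)\, E(c)
```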
OneR Algorithm
For each predictor
For each value of that predictor, make a rule as follows
Count how often each class appears
Find the most frequent class
Make the rule assign that class to this value
Calculate the total error of the rules of each predictor
Choose the predictor with the smallest total error
Mnemonic: Fat friends can’t find many charitable characters
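A compact Python sketch of the steps above (the column names, the `one_r` helper, and the pandas dependency are illustrative assumptions, not from the deck):

```python
# OneR sketch: one rule per predictor value, keep the predictor with the lowest total error.
import pandas as pd

def one_r(df: pd.DataFrame, target: str):
    best_predictor, best_rules, best_error = None, None, float("inf")
    for predictor in df.columns.drop(target):
        rules, errors = {}, 0
        for value, group in df.groupby(predictor):
            counts = group[target].value_counts()   # how often each class appears
            rules[value] = counts.idxmax()          # most frequent class -> rule
            errors += len(group) - counts.max()     # rows this rule gets wrong
        if errors < best_error:                     # smallest total error wins
            best_predictor, best_rules, best_error = predictor, rules, errors
    return best_predictor, best_rules, best_error

# Toy usage with hypothetical weather-style data.
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rainy", "rainy", "overcast", "overcast"],
    "windy":   ["yes", "no", "yes", "no", "yes", "no"],
    "target":  ["no", "no", "yes", "yes", "yes", "yes"],
})
print(one_r(df, "target"))
```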
constructing a decision tree is all about finding the attribute that _
returns the highest information gain (the most homogeneous branches)
two types of hierarchical clustering
divisive, agglomerative
decision tree: step 1. calculate the entropy of the target
step 2. split on the different attributes and calculate the entropy for each branch
step 3. add proportionally to get the _
step 4. _
The result is the _
total entropy for the split
subtract this from the entropy before the split
information gain or decrease in entropy
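A hedged Python sketch of steps 1–4; the class counts and branch counts are illustrative numbers, not data from the deck:

```python
# Information gain from frequency tables, following steps 1-4 above.
import math

def entropy(counts):
    """Entropy of a class-frequency list; 0 when homogeneous, 1 for a 50/50 split."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Step 1: entropy of the target (e.g. 9 "yes" vs 5 "no" -- illustrative counts).
target_counts = [9, 5]
e_target = entropy(target_counts)

# Step 2: split on an attribute and compute the entropy of each branch
# (branch -> class counts; again illustrative).
branches = {"sunny": [2, 3], "overcast": [4, 0], "rainy": [3, 2]}

# Step 3: weight each branch entropy by its proportion -> total entropy for the split.
n = sum(target_counts)
e_split = sum(sum(c) / n * entropy(c) for c in branches.values())

# Step 4: subtract from the entropy before the split -> information gain.
gain = e_target - e_split
print(round(e_target, 3), round(e_split, 3), round(gain, 3))

# Sanity checks from the other cards: homogeneous -> 0, equally divided -> 1.
assert entropy([7, 0]) == 0.0
assert entropy([5, 5]) == 1.0
```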
P(c|x)
Posterior Probability
K-means clustering: aims to partition n objects into k clusters in which _
each object belongs to the cluster with the nearest mean
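A minimal NumPy sketch of that idea (Lloyd's algorithm); the toy data, the k value, and the `kmeans` helper name are assumptions for illustration:

```python
# K-means sketch: assign each object to the nearest mean, recompute the means, repeat.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # initial means
    for _ in range(n_iter):
        # Distance from every object to every center -> nearest-mean assignment.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean (keep the old one if a cluster happens to be empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy usage with two obvious blobs (illustrative data).
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
labels, centers = kmeans(X, k=2)
print(centers)
```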
Bayesian: the _ can be calculated by first constructing a _
posterior probability, frequency table
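For reference, the four probabilities on these cards are related by Bayes' theorem (standard form, not quoted from the deck):

```latex
% posterior = likelihood * class prior / predictor prior
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}

% with several predictors x_1, ..., x_n (naive independence assumption),
% each P(x_i | c) is read straight off a frequency table
P(c \mid x_1, \dots, x_n) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)
```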
Posterior Probability
P(c|x)
The ID3 algorithm is run recursively on the _ until all data _.
non-leaf branches, is classified
ID3. If the sample is equally divided it has:
entropy of one
decision tree: topmost node
root node
proximity matrix in clustering: the following methods differ in how the distance between two clusters is measured:
- Single Linkage
- Complete Linkage
- Average Linkage
- Minimum Variance (Ward’s Method)
- Centroid Method
Mnemonic: Single clusters are most comforting
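A small NumPy illustration of how single, complete, average, and centroid linkage would measure the distance between two made-up clusters (Ward's method, which uses the increase in SSE, is covered on the card above):

```python
# Distance between two toy clusters under different linkage definitions.
import numpy as np

a = np.array([[0.0, 0.0], [0.0, 1.0]])          # cluster A (illustrative points)
b = np.array([[4.0, 0.0], [5.0, 1.0]])          # cluster B

# All pairwise distances between members of A and members of B.
pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

print("single  :", pairwise.min())    # closest pair
print("complete:", pairwise.max())    # farthest pair
print("average :", pairwise.mean())   # mean over every pair of points
print("centroid:", np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))  # centroid-to-centroid
```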
Decision tree: represents a classification or decision
Leaf Node