Next Deck Flashcards
a decision tree can be converted to a set of rules by mapping from
the root node to the leaf nodes one by one
P(x)
Predictor Prior Probability

in this method we assign each observation to its own cluster, then compute the similarity (distance) between each of the clusters and join the two most similar clusters; repeat until there is only a single cluster left
agglomerative method
P(x|c)
Likelihood
distance between two clusters is defined as the average distance between _ _ in one cluster to _ _ in the other cluster
average linkage, each point, every point
The _ (SSE) between the 2 clusters over all of the variables is used as the distance
error sum of squares, Ward’s method
Class prior probability
P(c)

ID3. If the sample is completely homogeneous the _
entropy is zero
in this method we assign all of the observations to a single cluster and then partition the cluster into the two least similar clusters; finally we proceed recursively on each cluster until there is one cluster for each observation
divisive method
to build a decision tree we need to calculate two types of entropy using freq tables as follows. What is the first, and write the formula.
(a) entropy of the target: Entropy(T) = Σ -p(i) log2 p(i), summed over the classes i

OneR Algorithm
For each predictor
    For each value of that predictor, make a rule as follows
        Count how often each class appears
        Find the most frequent class
        Make the rule assign that class to this value
    Calculate the total error of the rules of each predictor
Choose the predictor with the smallest total error
:Fat friends can’t find many charitable characters:
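A minimal Python sketch of the procedure above, assuming the data arrives as a list of dicts with one key per predictor plus the target column; the function and variable names are illustrative, not from any library.

```python
from collections import Counter, defaultdict

def one_r(rows, predictors, target):
    """Pick the predictor whose value -> class rules give the smallest total error.
    rows: list of dicts, e.g. {"Outlook": "Sunny", "Play golf": "no"}."""
    best = None
    for predictor in predictors:
        counts = defaultdict(Counter)              # predictor value -> class counts
        for row in rows:
            counts[row[predictor]][row[target]] += 1
        # each value's rule predicts its most frequent class
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # total error: rows whose class is not the most frequent one for their value
        error = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or error < best[0]:
            best = (error, predictor, rules)
    error, predictor, rules = best
    return predictor, rules, error
```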

constructing a decision tree is all about finding the attribute that _
returns the highest information gain (the most homogeneous branches)
two types of hierarchical clustering
divisive, agglomerative
decision tree: step 1. calculate the entropy of the target
step 2. split on the different attributes & calculate the entropy for each branch
step 3. add proportionally to get the _
step 4. _
The result is the _
total entropy for the split
subtract this from the entropy before the split
information gain or decrease in entropy
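A short Python sketch of steps 1–4 above, assuming the class counts have already been read off the frequency tables; the per-branch counts in the example are the classic Play Golf figures, used here only as an assumed illustration.

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given raw counts, e.g. [9, 5] -> ~0.94."""
    total = sum(counts)
    return -sum(n / total * log2(n / total) for n in counts if n)

def information_gain(target_counts, branch_counts):
    """branch_counts holds one list of class counts per branch of the split."""
    total = sum(target_counts)
    # step 3: add the branch entropies proportionally -> total entropy for the split
    split_entropy = sum(sum(b) / total * entropy(b) for b in branch_counts)
    # step 4: subtract from the entropy before the split -> information gain
    return entropy(target_counts) - split_entropy

# assumed example: target 9 yes / 5 no, split on Outlook = Sunny / Overcast / Rainy
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # about 0.25
```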
P(c|x)
Posterior Probability
K-means clustering: intends to partition n objects into k clusters in which _
each object belongs to the cluster with the nearest mean
Bayesian: the _ can be calculated first by constructing _
posterior probability, frequency table

Posterior Probability
P(c|x)
The ID3 algorithm is run recursively on the _ until all data _.
non-leaf branches, is classified
ID3. if the sample is equally divided it has:
entropy of one
decision tree: topmost node
root node
proximity matrix in clustering: the following methods differ in how the distance between each cluster is measured:
- Single Linkage
- Complete Linkage
- Average Linkage
- Minimum Variance (Ward’s Method)
- Centroid Method
Single clusters are most comforting
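For reference, a hedged sketch of how these linkage choices map onto SciPy's hierarchical clustering; the toy points are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

points = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.8], [9.0, 0.5]])

# the methods differ only in how the distance between two clusters is measured
for method in ["single", "complete", "average", "ward", "centroid"]:
    tree = linkage(points, method=method)               # build the hierarchy
    labels = fcluster(tree, t=2, criterion="maxclust")  # cut it into 2 clusters
    print(method, labels)
```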
Decision tree: represents a classification or decision
Leaf Node
ID3 uses _ & _ to construct a decision tree
entropy, information gain
K-means clustering produces _ different clusters. The best number of clusters leading to the greatest separation is not known a priori and must be computed from the data.
k

the objective of k-means clustering is to minimize the _
total intra-cluster variance or the squared error function

decision tree: step 1. calculate the entropy of the target
step 2. split on the different attributes & _
calculate the entropy for each branch
Decision tree: has two or more branches
decision node
Likelihood
P(x|c)
distance between two clusters is defined as the shortest distance between two points in each cluster
single linkage hierarchical clustering
The information gain is based on the _ after a dataset is split on an attribute
decrease in entropy
distance between two clusters is defined as the longest distance between two points in each cluster
complete linkage
P(c|x) =
P(x|c) P(c) / P(x)
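A tiny numeric sketch of that formula in Python; the likelihood and prior values below are placeholders, not taken from the deck.

```python
# P(c|x) = P(x|c) * P(c) / P(x), with P(x) obtained by summing over all classes
likelihood = {"yes": 0.33, "no": 0.40}   # P(x|c), placeholder values
prior = {"yes": 0.64, "no": 0.36}        # P(c),   placeholder values

evidence = sum(likelihood[c] * prior[c] for c in prior)              # P(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}  # P(c|x)
print(posterior)  # the class with the highest posterior is the prediction
```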
Draw the likelihood table and work out the posterior probability

answer.

frequency tables for numerical variables, first option
binning
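A small pandas sketch of the binning option; the temperature values and bin labels are made up for illustration.

```python
import pandas as pd

temperature = pd.Series([64, 68, 71, 75, 80, 85])   # hypothetical numeric predictor
# discretise into 3 equal-width bins so a frequency table can use the bin labels
bins = pd.cut(temperature, bins=3, labels=["cool", "mild", "hot"])
print(bins.value_counts())
```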
the _ is based on Bayes theorem with
naive Bayesian classifier, independence assumptions between predictors
Gain(T,X) =
Entropy(T) - Entropy(T,X)
Bayesian step (3)
the class with the highest posterior probability is the outcome of prediction
before any clustering is performed it is required to determine the _
proximity matrix containing the distance between each point using a distance function
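A sketch of building that proximity matrix with SciPy's pdist/squareform; the sample points are arbitrary.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
# pairwise Euclidean distance between every point, arranged as a square matrix
proximity = squareform(pdist(points, metric="euclidean"))
print(proximity)   # proximity[i, j] is the distance between point i and point j
```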
One Rule classification algorithm
OneR
the probability density function for the normal distribution is defined by two parameters
mean and standard deviation
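The same density written out in Python, parameterised by those two values; the example arguments are made up.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """f(x) = exp(-(x - mean)^2 / (2 * std^2)) / (std * sqrt(2 * pi))"""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

print(normal_pdf(66, mean=73, std=6.2))   # e.g. likelihood of one numeric value
```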
P(c)
Class Prior Probability
decision tree core algorithm by JR Quinlan _, uses a _, with no _
ID3, greedy search, backtracking
OneR, a low total error means a
higher contribution to the predictability of the model
OneR generates one rule for:
each predictor in the data, then selects the rule with the smallest total error as its one rule
decision tree: step 1. calculate the entropy of the _
target
One Rule, to create a rule, _
construct a frequency table for each predictor against the target

Information gain =
gain(T,X) = entropy(T) - entropy(T,X)
ID3 uses _ to calculate the _
entropy, homogeneity of a sample
Draw a freq table for Outlook: Sunny/Overcast/Rainy,
Play golf: yes/no
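A sketch of building such a frequency table with pandas; the rows below are a small made-up sample, not the full dataset behind this deck.

```python
import pandas as pd

# illustrative rows only
data = pd.DataFrame({
    "Outlook":   ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy"],
    "Play golf": ["no",    "yes",   "yes",      "yes",   "no"],
})
# frequency table: one row per Outlook value, one column per class
print(pd.crosstab(data["Outlook"], data["Play golf"], margins=True))
```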

the center of the group of objects (the centroid) is used to determine the average distance between clusters of objects
Centroid method

Adding 1 to all the counts when an attribute value doesn’t occur with every class value
the zero frequency problem
Work out the entropy of the target:
- play golf yes = 9
- play golf no = 5
Entropy(5,9)
= Entropy(0.36,0.64)
= -(0.36 * log2 0.36) - (0.64 * log2 0.64)
= 0.94
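The same arithmetic in a few lines of Python, just to confirm the 0.94:

```python
from math import log2

yes, no = 9, 5
p_yes, p_no = yes / (yes + no), no / (yes + no)          # 0.64 and 0.36
entropy = -(p_yes * log2(p_yes)) - (p_no * log2(p_no))
print(round(entropy, 2))   # 0.94
```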
work out the likelihood of yes and no, and the probability
answer: 0.014109
Predictor Prior Probability
P(x)

K-means clustering algorithm
- cluster the data into k groups, where k is predefined
- randomly select k points as cluster centers
- assign each object to its closest cluster center (Euclidean distance)
- recalculate each center as the centroid (mean) of the objects assigned to it
- repeat until the cluster assignments stop changing between rounds
Clustering: randomly assigning centroids & repeating
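A compact NumPy sketch of those steps (random initial centers, assignment, centroid update, repeat until stable); the helper name k_means and the toy data are illustrative.

```python
import numpy as np

def k_means(points, k, rng=np.random.default_rng(0)):
    """points: (n, d) array. Returns (centroids, labels)."""
    # randomly select k of the points as the initial cluster centers
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # assign each object to its closest center (Euclidean distance)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # recalculate each center as the centroid (mean) of its assigned objects
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # assignments have stopped changing
            return new_centroids, labels
        centroids = new_centroids

toy = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
centroids, labels = k_means(toy, k=2)
print(centroids)
```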
Bayesian step 2. Transform the freq tables into _ and calculate the _
likelihood tables, posterior probability for each class

A branch with what entropy needs no more splitting
zero
Creating clusters that have a predetermined order from top to bottom
hierarchical clustering