Next Deck Flashcards

1
Q

a decision tree can be converted to a set of rules by mapping from

A

the root node to the leaf nodes one by one

2
Q

P(x)

A

Predictor Prior Probability

2
Q

in this method we assign each observation to its own cluster, then compute the similarity (distance) between each of the clusters and join the two most similar clusters; repeat until there is only a single cluster left

A

agglomerative method
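
A minimal Python sketch of that agglomerative loop, assuming a precomputed distance function and single-linkage merging (function and variable names are illustrative, not taken from the course material):

    # Naive agglomerative sketch: start with one cluster per observation,
    # then repeatedly join the two most similar clusters (single linkage)
    # until only one cluster is left.
    def agglomerate(points, dist):
        clusters = [[i] for i in range(len(points))]   # one cluster per observation
        merges = []
        while len(clusters) > 1:
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    # single linkage: shortest distance between any pair of members
                    d = min(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
            d, a, b = best
            merges.append((list(clusters[a]), list(clusters[b]), d))   # record the join
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]
        return merges

Called as agglomerate([1.0, 1.2, 5.0], lambda p, q: abs(p - q)) it returns the sequence of joins; library routines such as scipy.cluster.hierarchy.linkage do the same job far more efficiently.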

3
Q

P(x|c)

A

Likelihood

3
Q

distance between two clusters is defined as the average distance from _ _ in one cluster to _ _ in the other cluster

A

average linkage, each point, every point

3
Q

The _ (SSE) between the 2 clusters over all of the variables is used as the distance

A

error sum of squares, Ward’s method

4
Q

Class prior probability

A

P(c)

4
Q

ID3. If the sample is completely homogeneous the _

A

entropy is zero

4
Q

in this method we assign all of the observations to a single cluster, then partition the cluster into the two least similar clusters; finally we proceed recursively on each cluster until there is one cluster for each observation

A

divisive method

4
Q

to build a decision tree we need to calculate two types of entropy using freq tables as follows. What is the first, and write the formula.

A

(a) entropy of the target: Entropy(S) = -sum(p * log2 p), summed over the classes of the target

5
Q

OneR Algorithm

A

For each predictor:

For each value of that predictor, make a rule as follows:

Count how often each class appears

Find the most frequent class

Make the rule assign that class to this value

Calculate the total error of the rules of each predictor

Choose the predictor with the smallest total error

(Mnemonic: Fat friends can't find many charitable characters)
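
A rough Python sketch of those OneR steps, assuming each row is a dict of predictor values and the class labels sit in a parallel list (names are illustrative):

    from collections import Counter

    # OneR sketch: for every predictor build a one-rule (each value -> its
    # most frequent class), total up the misclassifications, and keep the
    # predictor whose rule has the smallest total error.
    def one_r(rows, labels):
        best_rule, best_error = None, None
        for predictor in rows[0]:
            rule, error = {}, 0
            for value in set(r[predictor] for r in rows):
                counts = Counter(l for r, l in zip(rows, labels) if r[predictor] == value)
                majority, hits = counts.most_common(1)[0]
                rule[value] = majority                   # assign the most frequent class
                error += sum(counts.values()) - hits     # rows the rule gets wrong
            if best_error is None or error < best_error:
                best_rule, best_error = (predictor, rule), error
        return best_rule, best_error

For example, one_r([{"Outlook": "Sunny"}, {"Outlook": "Rainy"}], ["No", "Yes"]) returns the Outlook rule with a total error of 0.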

5
Q

constructing a decision tree is all about finding the attribute that _

A

returns the highest information gain (the most homogeneous branches)

6
Q

two types of hierarchical clustering

A

divisive, agglomerative

6
Q

decision tree: step 1. calculate the entropy of the target

step 2. split on the different attributes & calculate the entropy for each branch

step 3. add proportionally to get the _

step 4. _

The result is the _

A

total entropy for the split

subtract this from the entropy before the split

information gain or decrease in entropy
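
Those steps translate into a small entropy / information-gain helper; a Python sketch assuming categorical attribute values and class labels held in parallel lists (none of the names below come from the cards):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy(T) = -sum(p * log2(p)) over the class proportions
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(attribute_values, labels):
        # Entropy(T, X): each branch's entropy weighted by its share of the rows,
        # then Gain(T, X) = Entropy(T) - Entropy(T, X)
        total = len(labels)
        branches = {}
        for value, label in zip(attribute_values, labels):
            branches.setdefault(value, []).append(label)
        split_entropy = sum(len(b) / total * entropy(b) for b in branches.values())
        return entropy(labels) - split_entropy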

8
Q

P(c|x)

A

Posterior Probability

9
Q

K-means clustering: intends to partition n objects into k clusters in which _

A

each object belongs to the cluster with the nearest mean

10
Q

Bayesian: the _ can be calculated first by constructing _

A

posterior probability, frequency table

11
Q

Posterior Probability

A

P(c|x)

12
Q

The ID3 algorithm is run recursively on the _ until all data _.

A

non-leaf branches, is classified

13
Q

ID3. if the sample is equally divided it has:

A

entropy of one

15
Q

decision tree: topmost node

A

root node

15
Q

proximity matrix in clustering: the following methods differ in how the distance between each cluster is measured:

A
  • Single Linkage
  • Complete Linkage
  • Average Linkage
  • Minimum Variance (Ward’s Method)
  • Centroid Method

(Mnemonic: Single clusters are most comforting)

16
Q

Decision tree: represents a classification or decision

A

Leaf Node

17
Q

ID3 uses _ & _ to construct a decision tree

A

entropy, information gain

19
Q

K-means clustering produces _ different clusters. The best number of clusters, leading to the greatest separation, is not known a priori and must be computed from the data.

A

k

19
Q

the objective of k-means clustering is to minimize the _

A

total intra-cluster variance or the squared error function

19
Q

decision tree: step 1. calculate the entropy of the target. step 2. split on the different attributes & _

A

calculate the entropy for each branch

21
Q

Decision tree: has two or more branches

A

decision node

22
Q

Likelihood

A

P(x|c)

22
Q

distance between two clusters is defined as the shortest distance between two points, one in each cluster

A

single linkage hierarchical clustering

23
Q

The information gain is based on the _ after a dataset is split on an attribute

A

decrease in entropy

24
Q

distance between two clusters is defined as the longest distance between two points, one in each cluster

A

complete linkage

26
Q

P(c|x) =

A

P(x|c) P(c) / P(x)

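A quick numeric sanity check of that formula, with made-up numbers used purely for illustration:

    # Bayes' theorem: posterior = likelihood * class prior / predictor prior
    likelihood = 0.33        # P(x|c), made-up value
    class_prior = 0.64       # P(c), made-up value
    predictor_prior = 0.36   # P(x), made-up value
    print(likelihood * class_prior / predictor_prior)   # P(c|x) ~= 0.587
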
28
Q

Draw the likelihood table and work out the posterior probability

A

answer.

29
Q

frequency tables for numerical variables, first option

A

binning

30
Q

the _ is based on Bayes' theorem with _

A

naive Bayesian classifier, independence assumptions between predictors

31
Q

Gain(T,X) =

A

Entropy(T) - Entropy(T,X)

32
Q

Bayesian step (3)

A

the class with the highest posterior probability is the outcome of prediction

33
Q

before any clustering is performed it is required to determine the _

A

proximity matrix containing the distance between each point, using a distance function

34
Q

One Rule classification algorithm

A

OneR

35
Q

the probability density function for the normal distribution is defined by two parameters

A

mean and standard deviation

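For a numeric predictor those two parameters let the normal density stand in for the likelihood P(x|c); a minimal Python sketch with illustrative numbers only:

    from math import exp, pi, sqrt

    # Normal density used as the likelihood P(x|c) for a numeric predictor,
    # given that class's mean and standard deviation.
    def normal_pdf(x, mean, std):
        return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (sqrt(2 * pi) * std)

    print(normal_pdf(74, 80.0, 10.0))   # ~0.033 with these illustrative numbers
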
36
Q

P(c)

A

Class Prior Probability

37
Q

decision tree core algorithm by J. R. Quinlan: _, uses a _, with no _

A

ID3, greedy search, backtracking

38
Q

OneR, a low total error means a

A

higher contribution to the predictability of the model

39
Q

OneR generates one rule for:

A

each predictor in the data, then selects the rule with the smallest total error as its one rule

40
Q

decision tree: step 1. calculate the entropy of the _

A

target

41
Q

One Rule, to create a rule, _

A

construct a frequency table for each predictor against the target

42
Q

Information gain =

A

Gain(T,X) = Entropy(T) - Entropy(T,X)

44
Q

ID3 uses _ to calculate the _

A

entropy, homogeneity of a sample

46
Q

Draw a freq table for Outlook: Sunny/Overcast/Rainy, Play golf: yes/no

47
Q

the center of the group of objects (the centroid) is used to determine the average distance between clusters of objects

A

Centroid method

49
Q

Adding 1 to all the counts when an attribute value doesn't occur with every class value

A

the zero frequency problem

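A small Python sketch of that add-one (Laplace) correction, assuming the per-class counts sit in a plain dict (the helper name is mine, not from the cards):

    # Zero-frequency fix: add 1 to every attribute-value count within a class
    # so that no likelihood P(x|c) is ever exactly zero.
    def smoothed_likelihoods(counts, class_total, all_values):
        # counts: {attribute value: frequency within this class}
        return {v: (counts.get(v, 0) + 1) / (class_total + len(all_values))
                for v in all_values}

For instance, smoothed_likelihoods({"Sunny": 3, "Rainy": 2}, 5, ["Sunny", "Overcast", "Rainy"]) gives "Overcast" a likelihood of 1/8 instead of 0.
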
50
Q

Work out the entropy of the target: play golf yes = 9, play golf no = 5

A

Entropy(5,9) = Entropy(0.36, 0.64) = -(0.36 * log2 0.36) - (0.64 * log2 0.64) = 0.94

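The same arithmetic can be checked in one line (a sketch assuming Python's math.log2):

    from math import log2

    # Entropy with 9 "yes" and 5 "no": -(9/14)*log2(9/14) - (5/14)*log2(5/14)
    print(-(9/14) * log2(9/14) - (5/14) * log2(5/14))   # ~0.940
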
51
Q

Work out the likelihood of yes and no, and the probability

A

0.014109

52
Q

Predictor Prior Probability

A

P(x)

53
Q

K-means clustering algorithm

A

1. cluster the data into k groups, where k is predefined
2. randomly select k points as cluster centers
3. assign objects to their closest cluster center (Euclidean distance)
4. calculate the centroid (or mean) of each cluster
5. repeat until cluster assignments are unchanged between rounds

(Mnemonic: Clustering: randomly assigning centroids & repeating)

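A compact Python sketch of that loop for points stored as plain tuples, with random initial centers (the helper name and seeding are my own choices, not from the cards):

    import random

    # K-means sketch: pick k random points as centers, assign every point to
    # its nearest center (squared Euclidean distance), recompute each center
    # as the mean of its cluster, and repeat until assignments stop changing.
    def k_means(points, k, rng=random.Random(0)):
        centers = list(rng.sample(points, k))
        assignment = None
        while True:
            new_assignment = [
                min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(pt, centers[c])))
                for pt in points
            ]
            if new_assignment == assignment:
                return centers, assignment
            assignment = new_assignment
            for c in range(k):
                members = [pt for pt, a in zip(points, assignment) if a == c]
                if members:   # keep the old center if a cluster ends up empty
                    centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

For example, k_means([(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (7.9, 8.1)], 2) should separate the two obvious groups.
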
54
Q

Bayesian step 2. Transform the freq tables into _ and calculate the _

A

likelihood tables, posterior probability for each class

55
Q

A branch with what entropy needs no more splitting

A

zero

56
Q

Creating clusters that have a predetermined order from top to bottom

A

hierarchical clustering