Next Deck Flashcards

1
Q

a decision tree can be converted to a set of rules by mapping from

A

the root node to the leaf nodes one by one

2
Q

P(x)

A

Predictor Prior Probability

2
Q

in this method we assign each observation to its own cluster, then compute the similarity (distance) between each of the clusters and join the two most similar clusters; repeat until there is only a single cluster left

A

agglomerative method
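
A minimal Python sketch of that agglomerative loop, assuming a precomputed distance function and single-linkage merging (function and variable names are illustrative, not taken from the course material):

    # Naive agglomerative sketch: start with one cluster per observation,
    # then repeatedly join the two most similar clusters (single linkage)
    # until only one cluster is left.
    def agglomerate(points, dist):
        clusters = [[i] for i in range(len(points))]   # one cluster per observation
        merges = []
        while len(clusters) > 1:
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    # single linkage: shortest distance between any pair of members
                    d = min(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
            d, a, b = best
            merges.append((list(clusters[a]), list(clusters[b]), d))   # record the join
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]
        return merges

Called as agglomerate([1.0, 1.2, 5.0], lambda p, q: abs(p - q)) it returns the sequence of joins; library routines such as scipy.cluster.hierarchy.linkage do the same job far more efficiently.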

3
Q

P(x|c)

A

Likelihood

3
Q

distance between two clusters is defined as the average distance from _ _ in one cluster to _ _ in the other cluster

A

average linkage, each point, every point

3
Q

The _ (SSE) between the 2 clusters over all of the variables is used as the distance

A

error sum of squares, Ward’s method

4
Q

Class prior probability

A

P(c)

4
Q

ID3. If the sample is completely homogeneous the _

A

entropy is zero

4
Q

in this method we assign all of the observations to a single cluster, then partition the cluster into the two least similar clusters; finally we proceed recursively on each cluster until there is one cluster for each observation

A

divisive method

4
Q

to build a decision tree we need to calculate two types of entropy using freq tables as follows. What is the first, and write the formula.

A

(a) entropy of the target: Entropy(S) = -sum(p * log2 p), summed over the classes of the target

5
Q

OneR Algorithm

A

For each predictor:

For each value of that predictor, make a rule as follows:

Count how often each class appears

Find the most frequent class

Make the rule assign that class to this value

Calculate the total error of the rules of each predictor

Choose the predictor with the smallest total error

(Mnemonic: Fat friends can't find many charitable characters)
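
A rough Python sketch of those OneR steps, assuming each row is a dict of predictor values and the class labels sit in a parallel list (names are illustrative):

    from collections import Counter

    # OneR sketch: for every predictor build a one-rule (each value -> its
    # most frequent class), total up the misclassifications, and keep the
    # predictor whose rule has the smallest total error.
    def one_r(rows, labels):
        best_rule, best_error = None, None
        for predictor in rows[0]:
            rule, error = {}, 0
            for value in set(r[predictor] for r in rows):
                counts = Counter(l for r, l in zip(rows, labels) if r[predictor] == value)
                majority, hits = counts.most_common(1)[0]
                rule[value] = majority                   # assign the most frequent class
                error += sum(counts.values()) - hits     # rows the rule gets wrong
            if best_error is None or error < best_error:
                best_rule, best_error = (predictor, rule), error
        return best_rule, best_error

For example, one_r([{"Outlook": "Sunny"}, {"Outlook": "Rainy"}], ["No", "Yes"]) returns the Outlook rule with a total error of 0.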

5
Q

constructing a decision tree is all about finding the attribute that _

A

returns the highest information gain (the most homogeneous branches)

6
Q

two types of hierarchical clustering

A

divisive, agglomerative

6
Q

decision tree: step 1. calculate the entropy of the target

step 2. split on the different attributes & calculate the entropy for each branch

step 3. add proportionally to get the _

step 4. _

The result is the _

A

total entropy for the split

subtract this from the entropy before the split

information gain or decrease in entropy
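
Those steps translate into a small entropy / information-gain helper; a Python sketch assuming categorical attribute values and class labels held in parallel lists (none of the names below come from the cards):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy(T) = -sum(p * log2(p)) over the class proportions
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(attribute_values, labels):
        # Entropy(T, X): each branch's entropy weighted by its share of the rows,
        # then Gain(T, X) = Entropy(T) - Entropy(T, X)
        total = len(labels)
        branches = {}
        for value, label in zip(attribute_values, labels):
            branches.setdefault(value, []).append(label)
        split_entropy = sum(len(b) / total * entropy(b) for b in branches.values())
        return entropy(labels) - split_entropy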

8
Q

P(c|x)

A

Posterior Probability

9
Q

K-means clustering: intends to partition n objects into k clusters in which _

A

each object belongs to the cluster with the nearest mean

10
Q

Bayesian: the _ can be calculated first by constructing _

A

posterior probability, frequency table

11
Q

Posterior Probability

A

P(c|x)

12
Q

The ID3 algorithm is run recursively on the _ until all data _.

A

non-leaf branches, is classified

13
Q

ID3. if the sample is equally divided it has:

A

entropy of one

15
Q

decision tree: topmost node

A

root node

15
Q

proximity matrix in clustering: the following methods differ in how the distance between each cluster is measured:

A
  • Single Linkage
  • Complete Linkage
  • Average Linkage
  • Minimum Variance (Ward’s Method)
  • Centroid Method

(Mnemonic: Single clusters are most comforting)

16
Q

Decision tree: represents a classification or decision

A

Leaf Node

17
Q

ID3 uses _ & _ to construct a decision tree

A

entropy, information gain

19
Q

K-means clustering produces _ different clusters. The best number of clusters, leading to the greatest separation, is not known a priori and must be computed from the data.

A

k

19
Q

the objective of k-means clustering is to minimize the _

A

total intra-cluster variance or the squared error function

19
Q

decision tree: step 1. calculate the entropy of the target. step 2. split on the different attributes & _

A

calculate the entropy for each branch

21
Q

Decision tree: has two or more branches

A

decision node

22
Q

Likelihood

A

P(x|c)

22
Q

distance between two clusters is defined as the shortest distance between two points, one in each cluster

A

single linkage hierarchical clustering

23
Q

The information gain is based on the _ after a dataset is split on an attribute

A

decrease in entropy

24
Q

distance between two clusters is defined as the longest distance between two points, one in each cluster

A

complete linkage

26
Q

P(c|x) =

A

P(x|c) P(c) / P(x)

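A quick numeric sanity check of that formula, with made-up numbers used purely for illustration:

    # Bayes' theorem: posterior = likelihood * class prior / predictor prior
    likelihood = 0.33        # P(x|c), made-up value
    class_prior = 0.64       # P(c), made-up value
    predictor_prior = 0.36   # P(x), made-up value
    print(likelihood * class_prior / predictor_prior)   # P(c|x) ~= 0.587
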
28
Q

Draw the likelihood table and work out the posterior probability

A

answer.

29
Q

frequency tables for numerical variables, first option

A

binning

30
Q

the _ is based on Bayes' theorem with _

A

naive Bayesian classifier, independence assumptions between predictors

31
Q

Gain(T,X) =

A

Entropy(T) - Entropy(T,X)

32
Q

Bayesian step (3)

A

the class with the highest posterior probability is the outcome of prediction

33
Q

before any clustering is performed it is required to determine the _

A

proximity matrix containing the distance between each point, using a distance function

34
Q

One Rule classification algorithm

A

OneR

35
Q

the probability density function for the normal distribution is defined by two parameters

A

mean and standard deviation

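For a numeric predictor those two parameters let the normal density stand in for the likelihood P(x|c); a minimal Python sketch with illustrative numbers only:

    from math import exp, pi, sqrt

    # Normal density used as the likelihood P(x|c) for a numeric predictor,
    # given that class's mean and standard deviation.
    def normal_pdf(x, mean, std):
        return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (sqrt(2 * pi) * std)

    print(normal_pdf(74, 80.0, 10.0))   # ~0.033 with these illustrative numbers
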
36
Q

P(c)

A

Class Prior Probability

37
Q

decision tree core algorithm by J. R. Quinlan: _, uses a _, with no _

A

ID3, greedy search, backtracking

38
Q

OneR, a low total error means a

A

higher contribution to the predictability of the model

39
Q

OneR generates one rule for:

A

each predictor in the data, then selects the rule with the smallest total error as its one rule

40
Q

decision tree: step 1. calculate the entropy of the _

A

target

41
Q

One Rule, to create a rule, _

A

construct a frequency table for each predictor against the target

42
Q

Information gain =

A

Gain(T,X) = Entropy(T) - Entropy(T,X)

44
Q

ID3 uses _ to calculate the _

A

entropy, homogeneity of a sample

46
Q

Draw a freq table for Outlook: Sunny/Overcast/Rainy, Play golf: yes/no

47
Q

the center of the group of objects (the centroid) is used to determine the average distance between clusters of objects

A

Centroid method

49
Q

Adding 1 to all the counts when an attribute value doesn't occur with every class value

A

the zero frequency problem

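A small Python sketch of that add-one (Laplace) correction, assuming the per-class counts sit in a plain dict (the helper name is mine, not from the cards):

    # Zero-frequency fix: add 1 to every attribute-value count within a class
    # so that no likelihood P(x|c) is ever exactly zero.
    def smoothed_likelihoods(counts, class_total, all_values):
        # counts: {attribute value: frequency within this class}
        return {v: (counts.get(v, 0) + 1) / (class_total + len(all_values))
                for v in all_values}

For instance, smoothed_likelihoods({"Sunny": 3, "Rainy": 2}, 5, ["Sunny", "Overcast", "Rainy"]) gives "Overcast" a likelihood of 1/8 instead of 0.
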
50
Q

Work out the entropy of the target: play golf yes = 9, play golf no = 5

A

Entropy(5,9) = Entropy(0.36, 0.64) = -(0.36 * log2 0.36) - (0.64 * log2 0.64) = 0.94

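The same arithmetic can be checked in one line (a sketch assuming Python's math.log2):

    from math import log2

    # Entropy with 9 "yes" and 5 "no": -(9/14)*log2(9/14) - (5/14)*log2(5/14)
    print(-(9/14) * log2(9/14) - (5/14) * log2(5/14))   # ~0.940
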
51
Q

Work out the likelihood of yes and no, and the probability

A

0.014109

52
Q

Predictor Prior Probability

A

P(x)

53
Q

K-means clustering algorithm

A

1. cluster the data into k groups, where k is predefined
2. randomly select k points as cluster centers
3. assign objects to their closest cluster center (Euclidean distance)
4. calculate the centroid (or mean) of each cluster
5. repeat until cluster assignments are unchanged between rounds

(Mnemonic: Clustering: randomly assigning centroids & repeating)

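A compact Python sketch of that loop for points stored as plain tuples, with random initial centers (the helper name and seeding are my own choices, not from the cards):

    import random

    # K-means sketch: pick k random points as centers, assign every point to
    # its nearest center (squared Euclidean distance), recompute each center
    # as the mean of its cluster, and repeat until assignments stop changing.
    def k_means(points, k, rng=random.Random(0)):
        centers = list(rng.sample(points, k))
        assignment = None
        while True:
            new_assignment = [
                min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(pt, centers[c])))
                for pt in points
            ]
            if new_assignment == assignment:
                return centers, assignment
            assignment = new_assignment
            for c in range(k):
                members = [pt for pt, a in zip(points, assignment) if a == c]
                if members:   # keep the old center if a cluster ends up empty
                    centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

For example, k_means([(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (7.9, 8.1)], 2) should separate the two obvious groups.
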
54
Q

Bayesian step 2. Transform the freq tables into _ and calculate the _

A

likelihood tables, posterior probability for each class

55
Q

A branch with what entropy needs no more splitting

A

zero

56
Q

Creating clusters that have a predetermined order from top to bottom

A

hierarchical clustering