Next Deck Flashcards

1
Q

a decision tree can be converted to a set of rules by mapping from

A

the root node to the leaf nodes one by one

2
Q

P(x)

A

Predictor Prior Probability

2
Q

in this method we assign each observation to its own cluster, then compute the similarity (distance) between each pair of clusters and join the two most similar clusters; repeat until only a single cluster is left

A

agglomerative method

3
Q

P(x|c)

A

Likelihood

3
Q

distance between two clusters is defined as the average distance between _ _ in one cluster and _ _ in the other cluster

A

average linkage, each point, every point

3
Q

The _ (SSE) between the 2 clusters over all of the variables is used as the distance

A

error sum of squares, Ward’s method

4
Q

Class prior probability

A

P(c)

4
Q

ID3. If the sample is completely homogeneous the _

A

entropy is zero

4
Q

in this method we assign all of the observations to a single cluster and then partition the cluster into the two least similar clusters; finally, we proceed recursively on each cluster until there is one cluster for each observation

A

divisive method

4
Q

to build a decision tree we need to calculate two types of entropy using freq tables as follows. What is the first, and write the formula.

A

(a) entropy of the target, using the frequency table of one attribute: E(S) = Σ -p(i) * log2 p(i), summed over the classes

5
Q

OneR Algorithm

A

For each predictor:
  • For each value of that predictor, make a rule as follows:
    • Count how often each class appears
    • Find the most frequent class
    • Make the rule assign that class to this value
  • Calculate the total error of the rules of each predictor

Choose the predictor with the smallest total error

Mnemonic: Fat friends can’t find many charitable characters
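
A minimal Python sketch of the procedure above, assuming the data is a small in-memory list of dictionaries; the one_r function name and the Outlook/Play columns are illustrative, not part of the deck:

```python
from collections import Counter, defaultdict

def one_r(rows, predictors, target):
    """Pick the single predictor whose one-level rules make the fewest errors."""
    best = None
    for predictor in predictors:
        # Count how often each class appears for every value of this predictor
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[predictor]][row[target]] += 1
        # For each value, the rule predicts that value's most frequent class
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Total error = rows not covered by the most frequent class of their value
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (predictor, rules, errors)
    return best  # (chosen predictor, its rules, total error)

# Illustrative rows: Outlook vs Play
rows = [
    {"Outlook": "Sunny", "Play": "No"}, {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Overcast", "Play": "Yes"}, {"Outlook": "Overcast", "Play": "Yes"},
    {"Outlook": "Rainy", "Play": "Yes"}, {"Outlook": "Rainy", "Play": "No"},
]
print(one_r(rows, ["Outlook"], "Play"))
```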

5
Q

constructing a decision tree is all about finding the attribute that _

A

returns the highest information gain (the most homogeneous branches)

6
Q

two types of hierarchical clustering

A

divisive, agglomerative

6
Q

decision tree: step 1. calculate the entropy of the target

step 2. split on the different attributes & calc the entropy for each branch

step 3. add proportionally to get the _

step 4. _

The result is the _

A

total entropy for the split

subtract this from the entropy before the split

information gain or decrease in entropy
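
A short Python sketch of steps 1 to 4, assuming the data arrives as (attribute value, class) pairs; the function names and the Outlook counts below are illustrative:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Step 1: entropy of a list of class labels, sum of -p * log2(p)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(pairs):
    """Steps 2-4: gain(T, X) = entropy(T) - entropy(T, X) for pairs of (value, class)."""
    target = [label for _, label in pairs]
    branches = defaultdict(list)
    for value, label in pairs:
        branches[value].append(label)
    # Steps 2-3: entropy of each branch, added proportionally to its size
    split_entropy = sum(len(b) / len(pairs) * entropy(b) for b in branches.values())
    # Step 4: subtract from the entropy before the split
    return entropy(target) - split_entropy

# Illustrative split: Outlook against Play Golf
pairs = ([("Sunny", "No")] * 3 + [("Sunny", "Yes")] * 2 +
         [("Overcast", "Yes")] * 4 + [("Rainy", "Yes")] * 3 + [("Rainy", "No")] * 2)
print(round(information_gain(pairs), 3))  # about 0.247 for these counts
```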

8
Q

P(c|x)

A

Posterior Probability

9
Q

K-means clustering: intends to partition n objects into k clusters in which _

A

each object belongs to the cluster with the nearest mean

10
Q

Bayesian: the _ can be calculated first by constructing _

A

posterior probability, frequency table

11
Q

Posterior Probability

A

P(c|x)

12
Q

The ID3 algorithm is run recursively on the _ until all data _.

A

non-leaf branches, is classified

13
Q

ID3. if the sample is equally divided it has:

A

entropy of one

15
Q

decision tree: topmost node

A

root node

15
Q

proximity matrix in clustering: the following methods differ in how the distance between clusters is measured:

A
  • Single Linkage
  • Complete Linkage
  • Average Linkage
  • Minimum Variance (Ward’s Method)
  • Centroid Method

Single clusters are most comforting
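
A brief sketch showing the proximity matrix and the same five linkage options by name, assuming SciPy is available; the toy points are made up:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D observations (values illustrative)
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])

# Proximity matrix: pairwise Euclidean distance between every pair of points
proximity = squareform(pdist(X, metric="euclidean"))
print(np.round(proximity, 2))

# The same data clustered under each linkage definition
for method in ["single", "complete", "average", "ward", "centroid"]:
    Z = linkage(X, method=method)                    # builds the hierarchy bottom-up
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, labels)
```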

16
Q

Decision tree: represents a classification or decision

A

Leaf Node

17
Q

ID3 uses _ & _ to construct a decision tree

A

entropy, information gain

19
Q

K-means clustering produces _ different clusters. The best number of clusters leading to the greatest separation is not known a priori and must be computed from the data.

A

k

19
Q

the objective of k-means clustering is to minimize the _

A

total intra-cluster variance or the squared error function

19
Q

decision tree: step 1. calculate the entropy of the target

step 2. split on the different attributes & _

A

calculate the entropy for each branch

21
Q

Decision tree: has two or more branches

A

decision node

22
Q

Likelihood

A

P(x|c)

22
Q

distance between two clusters is defined as the shortest distance between two points, one in each cluster

A

single linkage hierarchical clustering

23
Q

The information gain is based on the _ after a dataset is split on an attribute

A

decrease in entropy

24
Q

distance between two clusters is defined as the longest distance between two points, one in each cluster

A

complete linkage

26
Q

P(c|x) =

A

P(x|c) P(c) / P(x)
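
A minimal numeric sketch of this formula; the counts below assume an illustrative 14-row golf-style frequency table rather than figures taken from a card:

```python
# Bayes theorem for one predictor value x = "Sunny" and class c = "Yes"
n_total = 14
n_yes = 9                  # rows where class = Yes
n_sunny = 5                # rows where Outlook = Sunny
n_sunny_and_yes = 2        # rows where Outlook = Sunny AND class = Yes

p_c = n_yes / n_total                  # class prior probability  P(c)
p_x = n_sunny / n_total                # predictor prior probability  P(x)
p_x_given_c = n_sunny_and_yes / n_yes  # likelihood  P(x|c)

posterior = p_x_given_c * p_c / p_x    # posterior probability  P(c|x)
print(round(posterior, 2))             # 0.4
```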

28
Q

Draw the likelihood table and work out the posterior probability

A

answer.

29
Q

frequency tables for numerical variables, first option

A

binning

30
Q

the _ is based on Bayes theorem with _

A

naive Bayesian classifier, independence assumptions between predictors

31
Q

Gain(T,X) =

A

Entropy(T) - Entropy(T,X)

32
Q

Bayesian step (3)

A

the class with the highest posterior probability is the outcome of prediction

33
Q

before any clustering is performed it is required to determine the _

A

proximity matrix containing the distance between each point using a distance function

34
Q

One Rule classification algorithm

A

OneR

35
Q

the probability density function for the normal distribution is defined by two parameters

A

mean and standard deviation
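
The card does not write the density out, so here is a short Python version of it; the example value, mean, and standard deviation are illustrative:

```python
import math

def normal_pdf(x, mean, sd):
    """Normal density, defined entirely by its mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

print(round(normal_pdf(66, mean=73, sd=6.2), 3))  # 0.034
```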

36
Q

P(c)

A

Class Prior Probability

37
Q

decision tree core algorithm by JR Quinlan _, uses a _, with no _

A

ID3, greedy search, no backtracking

38
Q

OneR, a low total error means a

A

higher contribution to the predictability of the model

39
Q

OneR generates one rule for:

A

each predictor in the data, then selects the rule with the smallest total error as its ‘one rule’

40
Q

decision tree: step 1. calculate the entropy of the _

A

target

41
Q

One Rule, to create a rule, _

A

construct a frequency table for each predictor against the target

42
Q

Information gain =

A

gain(T,X) = entropy(T) - entropy(T,X)

44
Q

ID3 uses _ to calculate the _

A

entropy, homogeneity of a sample

46
Q

Draw a freq table for Outlook: Sunny/Overcast/Rainy,

Play golf: yes/no

A
47
Q

the center of the group of objects (the centroid) is used to determine the distance between clusters of objects

A

Centroid method

49
Q

Adding 1 to all the counts when an attribute value doesn’t occur with every class value

A

the zero frequency problem
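
A tiny illustration of that add-1 adjustment, with made-up counts for one class:

```python
# "Overcast" never occurs with class "No" in these made-up counts
counts_no = {"Sunny": 3, "Overcast": 0, "Rainy": 2}

smoothed = {value: n + 1 for value, n in counts_no.items()}  # add 1 to every count
total = sum(smoothed.values())
likelihoods = {value: n / total for value, n in smoothed.items()}
print(likelihoods)  # no value is left with a zero likelihood
```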

50
Q

Work out the entropy of the target:

  • play golf yes = 9
  • play golf no = 5
A

Entropy(5,9)

= Entropy(0.36,0.64)

= -(0.36 * log2 0.36) - (0.64 * log2 0.64)

= 0.94
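
A quick Python check of this calculation, using the exact 9/14 and 5/14 proportions rather than the rounded 0.36 and 0.64:

```python
import math

p_yes, p_no = 9 / 14, 5 / 14
entropy = -(p_yes * math.log2(p_yes)) - (p_no * math.log2(p_no))
print(round(entropy, 2))  # 0.94
```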

51
Q

work out the likelihood of yes and no, and the probability

A

0.014109

52
Q

Predictor Prior Probability

A

P(x)

53
Q

K-means clustering algorithm

A
  1. cluster the data into k groups, where k is predefined
  2. randomly select k points as the initial cluster centers
  3. assign each object to its closest cluster center (Euclidean distance)
  4. recalculate the centroid (mean) of each cluster
  5. repeat steps 3 and 4 until the cluster assignments are unchanged between rounds

Clustering: randomly assigning centroids & repeating
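
A compact NumPy sketch of these five steps; the k_means function and the toy points are illustrative, and a library implementation (e.g. scikit-learn) would normally be used instead:

```python
import numpy as np

def k_means(X, k, n_rounds=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 2: random initial centers
    labels = None
    for _ in range(n_rounds):
        # step 3: assign each object to its closest center (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                            # step 5: assignments unchanged
        labels = new_labels
        # step 4: recompute each center as the centroid (mean) of its assigned objects
        # (empty clusters are not handled in this sketch)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 8.1], [7.9, 8.3]])
print(k_means(X, k=2))
```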

54
Q

Bayesian step 2. Transform the freq tables into _ and calculate the _

A

likelihood tables, posterior probability for each class

55
Q

A branch with what entropy needs no more splitting

A

zero

56
Q

Creating clusters that have a predetermined order from top to bottom

A

hierarchical clustering