K-Means Flashcards

Question 1

Q

Radius: ____________ from any point of the cluster to its centroid

Answer

A

square root of the average squared distance

Question 2

Q

Diameter: _______________ between all pairs of points in the cluster

Answer

A

square root of the average squared distance
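
The radius and diameter cards can be checked on a toy cluster. This is a minimal sketch, assuming Euclidean distance and assuming that "all pairs" means distinct unordered pairs; the helper names and the sample points are illustrative only.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of the points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def radius(points):
    """Square root of the average squared distance from each point to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(sq_dist(p, c) for p in points) / len(points))

def diameter(points):
    """Square root of the average squared distance over distinct pairs (needs >= 2 points)."""
    pairs = list(combinations(points, 2))
    return math.sqrt(sum(sq_dist(a, b) for a, b in pairs) / len(pairs))

# Hypothetical cluster: the four corners of a 2 x 2 square.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
r = radius(cluster)
d = diameter(cluster)
```

For this square every point lies at squared distance 2 from the centroid (1, 1), so the radius is the square root of 2.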

Question 3

Q

What is the elbow method?

Answer

A

a heuristic for choosing k that plots the value of the cost function produced by different values of k.

Question 4

Q

The value of k at which improvement in distortion ___________ the most is called the elbow

Answer

A

declines

Question 5

Q

Cost Function: For each k, calculate the ______________

Answer

A

total within-cluster sum of squares (WSS).
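
The cost function and the elbow rule from the cards above can be sketched as follows. The code assumes the clusters for each k are already given (it does not run k-means itself), and it reads "improvement declines the most" as the largest drop between consecutive WSS gains; the WSS values in `wss_by_k` are made up for illustration.

```python
def wss(clusters):
    """Total within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(
            sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in points
        )
    return total

def elbow(wss_by_k):
    """Return the k after which the gain in WSS shrinks the most.

    wss_by_k maps consecutive values of k to their WSS.
    """
    ks = sorted(wss_by_k)
    best_k, best_drop = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        gain_before = wss_by_k[prev_k] - wss_by_k[k]
        gain_after = wss_by_k[k] - wss_by_k[next_k]
        drop = gain_before - gain_after
        if drop > best_drop:
            best_k, best_drop = k, drop
    return best_k

# Hypothetical WSS values for k = 1..5 (not measured from real data).
wss_by_k = {1: 100.0, 2: 60.0, 3: 20.0, 4: 18.0, 5: 17.0}
chosen_k = elbow(wss_by_k)
one_cluster_cost = wss([[(0.0, 0.0), (2.0, 0.0)]])
```

In practice the elbow is often read off the plot by eye; the second-difference rule here is only one way to make that choice mechanical.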

Question 6

Q

Support:

Answer

A

Freq(X, Y) / N, where N is the total number of transactions

Question 7

Q

Confidence:

Answer

A

Freq(X, Y) / Freq(X)

Question 8

Q

Lift:

Answer

A

Support(X, Y) / (Support(X) × Support(Y))

Question 9

Q

Conviction:

Answer

A

(1 − supp(Y)) / (1 − conf(X → Y))
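
The four rule measures (support, confidence, lift, conviction) can be computed directly from their definitions. This is a small sketch on a made-up basket dataset; the function names and the transactions are assumptions for illustration, and conviction is returned as infinity when confidence is exactly 1.

```python
def freq(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Freq(items) / N."""
    return freq(transactions, items) / len(transactions)

def confidence(transactions, x, y):
    """Freq(X, Y) / Freq(X)."""
    return freq(transactions, set(x) | set(y)) / freq(transactions, x)

def lift(transactions, x, y):
    """Support(X, Y) / (Support(X) * Support(Y))."""
    joint = support(transactions, set(x) | set(y))
    return joint / (support(transactions, x) * support(transactions, y))

def conviction(transactions, x, y):
    """(1 - supp(Y)) / (1 - conf(X -> Y)); infinite for a rule that never fails."""
    conf = confidence(transactions, x, y)
    if conf == 1.0:
        return float("inf")
    return (1 - support(transactions, y)) / (1 - conf)

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
x, y = {"bread"}, {"milk"}
measures = {
    "support": support(baskets, x | y),
    "confidence": confidence(baskets, x, y),
    "lift": lift(baskets, x, y),
    "conviction": conviction(baskets, x, y),
}
```

Here the lift of bread → milk is below 1, so on this toy data buying bread makes milk slightly less likely than its base rate.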

Question 10

Q

If an itemset is frequent, then all of its ______ must also be frequent

Answer

A

subsets

Question 11

Q

If an itemset is not frequent, then all of its _______ cannot be frequent

Answer

A

supersets

Question 12

Q

The ______ of an itemset never exceeds the _________ of its subsets

Answer

A

support; support

Question 13

Q

Mining Association Rules

Answer

A

Generate all itemsets whose support ≥ min_sup (the frequent itemsets)
Generate high-confidence rules from each frequent itemset

Question 14

Q

An association rule r is strong if

Answer

A

Support(r) ≥ min_sup
Confidence(r) ≥ min_conf
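
The two-step mining procedure and the strong-rule test can be sketched as a small level-wise (Apriori-style) search. This is an illustrative sketch, not a tuned implementation: the function names, the thresholds, and the basket data are assumptions, and transactions are assumed to be Python sets.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Step 1: level-wise search for itemsets with support >= min_sup."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        current = {}
        for itemset in level:
            sup = support(itemset)
            if sup >= min_sup:
                current[itemset] = sup
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets, then prune any candidate that has
        # an infrequent (k-1)-subset (its supersets cannot be frequent).
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent

def strong_rules(frequent, min_conf):
    """Step 2: keep rules X -> Y from frequent itemsets with confidence >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # lhs is a subset of a frequent itemset, so it is in `frequent`.
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules

# Hypothetical market baskets and thresholds.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
found = frequent_itemsets(baskets, min_sup=0.6)
rules = strong_rules(found, min_conf=0.7)
```

Every rule returned already satisfies the support threshold, because it is built from a frequent itemset; only the confidence check remains in step 2.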

Question 15

Q

Classification Accuracy

Answer

A

the number of correct predictions made as a ratio of all predictions made.

Question 16

Q

Log Loss

Answer

A

a performance metric for evaluating the predictions of probabilities of membership to a given class.
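
As a concrete reading of this card, the sketch below computes binary log loss as the negative mean log-likelihood of the true labels. It assumes labels coded 0/1 and clips probabilities with a small `eps` to avoid taking the log of zero; both the function name and the clipping value are choices made for this example.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: -(1/n) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# An uninformative prediction of 0.5 for every instance.
coin_flip = log_loss([1, 0], [0.5, 0.5])
# A confident correct prediction versus a confident wrong one.
confident_right = log_loss([1], [0.9])
confident_wrong = log_loss([1], [0.1])
```

Lower is better, and a confident wrong probability is penalized far more heavily than a confident right one is rewarded.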

Question 17

Q

Area Under ROC Curve

Answer

A

a performance metric for binary classification problems; it summarizes how well the model separates the positive class from the negative class.

Question 18

Q

Sensitivity

Answer

A

the true positive rate, also called recall. It is the proportion of instances from the positive (first) class that were predicted correctly.

Question 19

Q

Specificity

Answer

A

the true negative rate. It is the proportion of instances from the negative (second) class that were predicted correctly.
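
Classification accuracy, sensitivity, and specificity all come from the same four confusion-matrix counts. The sketch below assumes binary labels with 1 as the positive class; the function name and the label vectors are invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

# Hypothetical labels: 4 positives followed by 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)  # correct predictions / all predictions
sensitivity = tp / (tp + fn)                # true positive rate (recall)
specificity = tn / (tn + fp)                # true negative rate
```

On these labels the model finds 3 of the 4 positives and 4 of the 6 negatives.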

Question 20

Q

Gini Coefficient

Answer

A

2*AUC – 1

Question 21

Q

The ROC curve

Answer

A

the plot of sensitivity (true positive rate) against 1 − specificity (false positive rate) as the classification threshold varies

Question 22

Q

AUC

Answer

A

the ratio of the area under the ROC curve to the total area
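
The ROC curve, AUC, and Gini cards fit together in a few lines. This sketch assumes binary labels coded 0/1 with at least one of each, builds the curve by lowering the threshold through each distinct score, and integrates with the trapezoid rule; the function names, labels, and scores are made up for illustration.

```python
def roc_points(y_true, scores):
    """(1 - specificity, sensitivity) points, one per distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule (total area of the unit square is 1)."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

# Hypothetical labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
area = auc(curve)
gini = 2 * area - 1
```

Because the ROC plot lives in the unit square, the "ratio to the total area" is simply the area itself; here 8 of the 9 positive-negative pairs are ranked correctly, which matches the computed area.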

Question 23

Q

Lift charts

Answer

A

a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model.

Question 24

Q

Calculate the points of the lift curve by

Answer

A

determining the ratio between the result predicted by our model and the result using no model.
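
One common way to read this card is cumulative lift: rank the instances by model score, then compare the positive rate in the top fraction with the overall positive rate (the "no model" baseline). The sketch below follows that reading; the function name, bin count, labels, and scores are assumptions for illustration.

```python
def lift_curve(y_true, scores, n_bins=10):
    """Cumulative lift at n_bins evenly spaced cut-offs of the ranked population."""
    ranked = [
        label
        for _, label in sorted(
            zip(scores, y_true), key=lambda pair: pair[0], reverse=True
        )
    ]
    n = len(ranked)
    base_rate = sum(ranked) / n  # positive rate when selecting without a model
    points = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, (n * b) // n_bins)
        top = ranked[:cutoff]
        model_rate = sum(top) / len(top)  # positive rate among top-scored instances
        points.append((b / n_bins, model_rate / base_rate))
    return points

# Hypothetical labels, already listed in descending score order.
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

curve = lift_curve(y_true, scores, n_bins=5)
```

The curve always ends at a lift of 1, since selecting the whole population gives the same positive rate with or without the model.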

(24 cards)