K-Means Flashcards
Radius: ____________ from any point of the cluster to its centroid
square root of average squared distance
Diameter: _______________ between all pairs of points in the cluster
square root of average squared distance
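A minimal Python sketch of the two cluster statistics, assuming Euclidean distance and points given as equal-length tuples of numbers. The names `centroid`, `sq_dist`, `radius` and `diameter` are illustrative, not from any library. The diameter averages over each unordered pair once, which gives the same value as averaging over all n(n-1) ordered pairs.

```python
import math


def centroid(points):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric) between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def radius(points):
    """Square root of the average squared distance from each point to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(sq_dist(p, c) for p in points) / len(points))


def diameter(points):
    """Square root of the average squared distance over all pairs of distinct points."""
    n = len(points)
    if n < 2:
        return 0.0  # a single point has no pairs
    total = sum(sq_dist(points[i], points[j])
                for i in range(n) for j in range(i + 1, n))
    return math.sqrt(total / (n * (n - 1) / 2))


cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(radius(cluster))    # → 1.4142135623730951 (each point is sqrt(2) from the centroid (1, 1))
print(diameter(cluster))  # sqrt(32 / 6), about 2.309
```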
What is the elbow method?
plots the value of the cost function produced by different values of k.
The value of k at which improvement in distortion ___________ the most is called the elbow
declines
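One way to turn this card into code, sketched under the assumption that the distortions are already computed for k = 1, 2, 3, ...: the improvement at each step is the drop in distortion, and the elbow is taken as the k where that improvement shrinks the most (the largest second difference). `elbow_k` is a hypothetical helper, and other elbow heuristics exist.

```python
def elbow_k(distortions):
    """Return the k at which the improvement in distortion declines the most.

    distortions[i] is the cost (e.g. WSS) obtained with k = i + 1 clusters.
    """
    if len(distortions) < 3:
        raise ValueError("need distortions for at least three values of k")
    # improvement[i]: drop in distortion when moving from k = i + 1 to k = i + 2
    improvement = [distortions[i] - distortions[i + 1]
                   for i in range(len(distortions) - 1)]
    best_k, best_decline = None, float("-inf")
    for i in range(len(improvement) - 1):
        # How much smaller the next gain is than the gain that reached k = i + 2
        decline = improvement[i] - improvement[i + 1]
        if decline > best_decline:
            best_k, best_decline = i + 2, decline
    return best_k


# Made-up distortions for k = 1..5
print(elbow_k([100.0, 40.0, 30.0, 25.0, 22.0]))  # → 2
```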
Cost Function: For each k, calculate the ______________
total within-cluster sum of squares (WSS).
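A minimal sketch of the cost for a single k, assuming the cluster labels come from a k-means run that is not shown here; to draw the elbow plot one would evaluate `wss` once per value of k. The name `wss` is illustrative.

```python
def wss(points, labels):
    """Total within-cluster sum of squares for a given cluster assignment."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # Centroid of this cluster
        c = [sum(m[d] for m in members) / len(members) for d in range(dims)]
        # Add squared distances of every member to its centroid
        total += sum(sum((m[d] - c[d]) ** 2 for d in range(dims)) for m in members)
    return total


points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
print(wss(points, [0, 0, 1, 1]))  # → 4.0   (k = 2: two tight clusters)
print(wss(points, [0, 0, 0, 0]))  # → 104.0 (k = 1: one cluster around (5, 5))
```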
Support:
Freq (X,Y) / N
Confidence:
Freq (X,Y) / Freq (X)
Lift:
Support(X, Y) / (Support(X) * Support(Y))
Conviction:
(1 - supp(Y)) / (1 - conf(X -> Y))
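A small Python sketch of the four rule measures, assuming transactions are Python sets and support is expressed as a fraction of N; the basket data is made up for illustration. Conviction is returned as infinity when the rule always holds, since its denominator is then zero.

```python
def support(itemset, transactions):
    """Freq(itemset) / N: fraction of transactions containing every item."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Freq(X, Y) / Freq(X), written with supports."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Support(X, Y) / (Support(X) * Support(Y))."""
    return (support(set(x) | set(y), transactions)
            / (support(x, transactions) * support(y, transactions)))


def conviction(x, y, transactions):
    """(1 - supp(Y)) / (1 - conf(X -> Y)); infinite if X -> Y never fails."""
    conf = confidence(x, y, transactions)
    if conf >= 1.0:
        return float("inf")
    return (1 - support(y, transactions)) / (1 - conf)


# Made-up market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
x, y = {"diapers"}, {"beer"}
print(support(x | y, transactions))    # 3 of 5 baskets, about 0.6
print(confidence(x, y, transactions))  # 3 / 4, about 0.75
print(lift(x, y, transactions))        # about 1.25
print(conviction(x, y, transactions))  # about 1.6
```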
If an itemset is frequent, then all of its ______ must also be frequent
subsets
If an itemset is not frequent, then all of its _______ cannot be frequent
supersets
The ______ of an itemset never exceeds the _________ of its subsets
support
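These three cards describe the Apriori (anti-monotone) property of support. A brute-force check on a small made-up dataset, sketched in Python, compares each itemset with every subset one item smaller; by transitivity that covers all subsets. The helper names are illustrative.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support_is_anti_monotone(transactions):
    """True if no itemset has higher support than any of its subsets."""
    items = sorted(set().union(*transactions))
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            count = support_count(itemset, transactions)
            # Comparing against the (size - 1)-subsets suffices by transitivity
            for subset in combinations(itemset, size - 1):
                if count > support_count(subset, transactions):
                    return False
    return True


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(support_is_anti_monotone(transactions))  # → True
# {"eggs"} appears once, so no superset of it can appear more often
print(support_count({"eggs"}, transactions), support_count({"eggs", "milk"}, transactions))  # → 1 0
```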
Mining Association Rules
- Generate all itemsets whose support ≥ min_sup (the frequent itemsets)
- Generate high-confidence rules from each frequent itemset
An association rule r is strong if
Support(r) ≥ min_sup
Confidence(r) ≥ min_conf
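The two mining steps and the strong-rule test can be sketched as a plain-Python Apriori, assuming support and confidence are fractions and the basket data is made up. `apriori` and `strong_rules` are illustrative names, not a library API. The candidate pruning relies on the subset property from the earlier cards, and every rule produced already meets min_sup because it comes from a frequent itemset.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Step 1: return {itemset: support} for every itemset with support >= min_sup."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_sup]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = sup(itemset)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets that yield exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent k-itemset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if sup(c) >= min_sup]
    return frequent


def strong_rules(frequent, min_conf):
    """Step 2: rules X -> Y from frequent itemsets with confidence >= min_conf."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                conf = s / frequent[x]  # x is frequent because itemset is
                if conf >= min_conf:
                    rules.append((x, itemset - x, s, conf))
    return rules


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_sup=0.6)
for x, y, s, c in strong_rules(frequent, min_conf=0.8):
    print(sorted(x), "->", sorted(y), round(s, 2), round(c, 2))
# → ['beer'] -> ['diapers'] 0.6 1.0
```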
Classification Accuracy
the number of correct predictions made as a ratio of all predictions made.
Log Loss
a performance metric for evaluating the predictions of probabilities of membership to a given class.
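A minimal sketch of log loss for the binary case, assuming 0/1 labels and that each prediction is the probability of class 1; the clipping constant `eps` is an assumption added to avoid evaluating log(0). A confident wrong prediction drives the loss up sharply, as the second call shows.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: -(1/N) * sum(y*log(p) + (1 - y)*log(1 - p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # about 0.145
print(log_loss([1, 0, 1], [0.1, 0.9, 0.2]))  # about 2.07, confident and wrong
```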
Area Under ROC Curve
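a performance metric for binary classification problems: the probability that the model ranks a randomly chosen positive instance above a randomly chosen negative one (0.5 is random guessing, 1.0 is perfect).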
Sensitivity
the true positive rate, also called the recall: the proportion of instances from the positive (first) class that were predicted correctly, TP / (TP + FN).
Specificity
the true negative rate: the proportion of instances from the negative (second) class that were predicted correctly, TN / (TN + FP).
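The accuracy, sensitivity and specificity cards all come from the same four confusion-matrix counts. A minimal Python sketch, assuming labels are encoded as 1 for the positive (first) class and 0 for the negative (second) class; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP), treating `positive` as the first class."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def accuracy(tp, fn, tn, fp):
    """Correct predictions as a ratio of all predictions."""
    return (tp + tn) / (tp + fn + tn + fp)


def sensitivity(tp, fn, tn, fp):
    """True positive rate (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp, fn, tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)                # → (3, 1, 4, 2)
print(accuracy(*counts))     # → 0.7
print(sensitivity(*counts))  # → 0.75
print(specificity(*counts))  # 4 / 6, about 0.667
```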
Gini Coefficient
2*AUC – 1
The ROC curve
the plot of sensitivity against (1 - specificity) as the classification threshold varies
AUC
the ratio of the area under the ROC curve to the total area of the plot
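A sketch of how the ROC, AUC and Gini cards connect, assuming 0/1 labels and scores where higher means more likely positive. Sweeping the threshold from high to low yields (1 - specificity, sensitivity) points, the trapezoid rule gives the area, and because the plot is the unit square its total area is 1. Tied scores are treated as a single threshold. All names are illustrative.

```python
def roc_points(y_true, scores):
    """(1 - specificity, sensitivity) pairs, sweeping the threshold from high to low."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative instances")
    pairs = sorted(zip(scores, y_true), key=lambda sy: sy[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every instance tied at this score before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve (the unit square has area 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def gini(auc_value):
    """Gini coefficient = 2 * AUC - 1."""
    return 2 * auc_value - 1


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(y_true, scores)
print(auc(pts))        # about 0.889: 8 of the 9 positive/negative pairs are ranked correctly
print(gini(auc(pts)))  # about 0.778
```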
Lift charts
a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model.
Calculate the points of the lift curve by
determining the ratio between the result predicted by our model and the result using no model.
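A sketch of the lift-curve calculation, under the assumption that the "result" is the rate of positives among the top-scored fraction of cases and the no-model result is the overall positive rate (what random selection would give). Cases are ranked by model score and cut into `n_bins` equal slices; `lift_curve` and its parameters are hypothetical.

```python
def lift_curve(y_true, scores, n_bins=10):
    """Cumulative lift at each depth: positive rate in the top-scored cases
    divided by the overall positive rate (the no-model baseline)."""
    # Rank labels by model score, best first
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda sy: sy[0], reverse=True)]
    n = len(ranked)
    base_rate = sum(ranked) / n
    if base_rate == 0:
        raise ValueError("need at least one positive case")
    points = []
    for b in range(1, n_bins + 1):
        depth = max(1, n * b // n_bins)  # number of top-ranked cases selected
        model_rate = sum(ranked[:depth]) / depth
        points.append((depth / n, model_rate / base_rate))
    return points


y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.85, 0.3, 0.2, 0.1, 0.7, 0.4, 0.5, 0.05]
for fraction, lift in lift_curve(y_true, scores, n_bins=5):
    print(f"top {fraction:.0%}: lift {lift:.2f}")
# → top 20%: lift 1.67, top 40%: lift 2.50, top 60%: lift 1.67,
#   top 80%: lift 1.25, top 100%: lift 1.00 (one per line)
```

At 100% depth the model selects everyone, so the lift always returns to 1.0.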