Data Science Flashcards
Recall
True positives / (True positives + False negatives) = TP / (TP + FN)
Precision
TP / (TP + FP)
F1
2(recall x precision) / (recall + precision)
Accuracy
(TP + TN) / (TP + TN + FP + FN)
Error
1 - Accuracy = (FP + FN) / (TP + TN + FP + FN)
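The five metrics above can be computed directly from confusion-matrix counts. This is a minimal Python sketch; the function name `classification_metrics` is hypothetical, and it assumes every denominator is non-zero (e.g. at least one predicted positive for precision).

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Recall, precision, F1, accuracy and error from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * (recall * precision) / (recall + precision)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "error": 1 - accuracy,
    }


print(classification_metrics(tp=6, fp=2, tn=90, fn=2))
```

With these counts recall and precision are both 6/8 = 0.75, so F1 is also 0.75, while accuracy is 96/100 = 0.96 because the many true negatives dominate.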
WCSS (K-Means)
Within-cluster sum of squares: the sum, over all clusters, of the squared distances from each point to its cluster's centroid
Sum(Sum(Distance(point, centroid)^2, Points in cluster), Clusters)
Centroid
Sum of points / number of points
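The centroid and WCSS cards fit together: each cluster's centroid is the component-wise mean of its points, and WCSS adds up squared distances to those centroids. A minimal Python sketch, assuming Euclidean distance and clusters given as lists of coordinate tuples; `centroid` and `wcss` are hypothetical names.

```python
import math

Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Component-wise mean: sum of points / number of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters: list[list[Point]]) -> float:
    """Sum over clusters of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(math.dist(p, c) ** 2 for p in cluster)
    return total


clusters = [[(0, 0), (2, 0)], [(10, 10), (10, 12)]]
print(wcss(clusters))  # centroids (1, 0) and (10, 11); four squared distances of 1 -> 4.0
```

In k-means, a lower WCSS means tighter clusters; it always shrinks as k grows, which is why it is usually read with an "elbow" plot rather than minimised outright.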
Silhouette coefficient meaning
Close to 1 means good clustering
Close to 0 means the object lies near the boundary and could belong to either cluster
Close to -1 means bad clustering: the object is probably assigned to the wrong cluster
Silhouette coefficient calculation (for one point)
a = sum(dist(point to all other points in its own cluster)) / number of other points in its own cluster
M = sum(dist(point to all points in another cluster C2)) / number of points in C2
N = sum(dist(point to all points in a third cluster C3)) / number of points in C3
b = min(M, N) (in general, the minimum over all other clusters)
SC = (b - a) / max(a, b)
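The calculation above generalises min(M, N) to the minimum over every other cluster. A minimal Python sketch, assuming Euclidean distance and at least two clusters; `mean_dist` and `silhouette` are hypothetical names, and returning 0 for a point alone in its cluster is a common convention assumed here rather than something the card states.

```python
import math

Point = tuple[float, ...]


def mean_dist(p: Point, others: list[Point]) -> float:
    """Average Euclidean distance from p to every point in `others`."""
    return sum(math.dist(p, q) for q in others) / len(others)


def silhouette(clusters: list[list[Point]], ci: int, pi: int) -> float:
    """Silhouette coefficient of the point clusters[ci][pi]."""
    point = clusters[ci][pi]
    own_rest = [q for j, q in enumerate(clusters[ci]) if j != pi]
    if not own_rest:
        return 0.0  # assumed convention for a point alone in its cluster
    a = mean_dist(point, own_rest)
    # b generalises min(M, N): the smallest average distance to any other cluster
    b = min(mean_dist(point, c) for k, c in enumerate(clusters) if k != ci)
    return (b - a) / max(a, b)


clusters = [[(0, 0), (0, 2)], [(0, 6), (0, 8)], [(100, 0)]]
print(silhouette(clusters, 0, 0))  # a = 2, M = 7, N = 100, b = 7 -> 5/7, about 0.714
```

The result is well above 0, matching the "close to 1 means good clustering" card: the point is much nearer its own cluster than the nearest other one.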
Support
X -> Y
Number of transactions containing both X and Y / total transactions
Confidence
X -> Y
Number of transactions containing both X and Y / number of transactions containing X, i.e. support(X ∪ Y) / support(X)
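Support and confidence for a rule X -> Y can be counted over a list of transactions. A minimal Python sketch, assuming each transaction is a set of item names; `support` and `confidence` are hypothetical helper names, and confidence is undefined (division by zero) if X never occurs.

```python
def support(transactions: list[set[str]], itemset: set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: list[set[str]], x: set[str], y: set[str]) -> float:
    """confidence(X -> Y) = support(X ∪ Y) / support(X)."""
    return support(transactions, x | y) / support(transactions, x)


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))          # 2 of 4 transactions -> 0.5
print(confidence(transactions, {"bread"}, {"milk"}))     # 2 of the 3 bread transactions, about 0.667
```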
Agglomerative
Start with each object as its own cluster
Merge the two closest clusters. The distance between two clusters C1 and C2 is the minimum distance between an object O1 in C1 and an object O2 in C2 (single linkage)
Repeat the merge step until a user-specified condition is met (e.g. k clusters remain)
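The three steps above can be sketched as a naive single-linkage loop. This Python sketch assumes Euclidean distance and uses "k clusters remain" as the stopping condition; the function names are hypothetical, and rescanning every pair each round makes it far slower than library implementations, so it is only suitable for small inputs.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def single_link(c1: list[Point], c2: list[Point]) -> float:
    """Cluster distance = min distance between any O1 in C1 and O2 in C2."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points: list[Point], k: int) -> list[list[Point]]:
    """Naive single-linkage clustering that stops when k clusters remain."""
    clusters = [[p] for p in points]  # each object starts as its own cluster
    while len(clusters) > k:
        # find the pair of clusters (i < j) with the smallest single-link distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]  # j > i, so index i stays valid
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(points, 3))  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(20, 20)]]
```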
True Pos Rate
TP / (TP+FN) = TP / P
True Neg Rate
TN / (TN + FP) = TN / N
False Pos Rate
FP / (TN + FP) = FP / N
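These three rates can be computed from raw labels by first counting the confusion-matrix cells. A minimal Python sketch, assuming binary labels encoded as 1 (positive) and 0 (negative) and at least one positive and one negative in `y_true`; `rates` is a hypothetical name.

```python
def rates(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """TPR, TNR and FPR from binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    positives, negatives = tp + fn, tn + fp  # P and N
    return {
        "tpr": tp / positives,
        "tnr": tn / negatives,
        "fpr": fp / negatives,
    }


print(rates([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
# {'tpr': 0.75, 'tnr': 0.75, 'fpr': 0.25}
```

Because TNR and FPR share the denominator N, FPR = 1 - TNR, and TPR is the same quantity as recall.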