Data Science Flashcards
Recall
TP / (TP + FN), where TP = true positives and FN = false negatives
Precision
TP / (TP + FP)
F1
2(recall x precision) / (recall + precision)
Accuracy
(TP + TN) / (TP + TN + FP + FN)
Error
1 - Accuracy
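The five formulas above can be checked on one set of confusion-matrix counts. This is a minimal sketch; the function name classification_metrics and the example counts are made up for illustration, and it assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall, precision, F1, accuracy and error from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * (recall * precision) / (recall + precision)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "error": 1 - accuracy,
    }


# Hypothetical counts: 40 TP, 10 FP, 45 TN, 5 FN
for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```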
WCSS F (K Means)
Within-cluster sum of squares: the sum, over all clusters, of the squared distance from each point to its cluster centroid
Sum(Sum(Distance(point, centroid)^2, Points in cluster), Clusters)
Centroid
Sum of the points in the cluster / number of points in the cluster (the coordinate-wise mean)
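A small sketch of the centroid and WCSS cards together. It assumes points are coordinate tuples and that WCSS uses the squared Euclidean distance to the cluster centroid, which is the usual definition; the helper names centroid and wcss are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points (tuples of equal length)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters):
    """Sum over clusters of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += sum((x - m) ** 2 for x, m in zip(p, c))
    return total


# Two hypothetical clusters of two points each
clusters = [[(0, 0), (0, 2)], [(4, 4), (6, 4)]]
print(centroid(clusters[0]))  # (0.0, 1.0)
print(wcss(clusters))         # 4.0
```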
Silhouette coefficient meaning
Close to 1 means good clustering
Close to 0 means that object can belong to either cluster
Close to -1 means bad clustering
Silhouette coef calc
a = sum(dist(point to all other points in its own cluster)) / number of other points in its own cluster
m = sum(dist(point to all points in one other cluster)) / number of points in that cluster
n = sum(dist(point to all points in a second other cluster)) / number of points in that cluster
b = min(m, n), i.e. the smallest mean distance to any other cluster
SC = (b - a) / max(a, b)
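The calculation above for a single point might look like the sketch below. It assumes Euclidean distance (math.dist, Python 3.8+), at least one other point in the point's own cluster, and any number of other clusters; the function names are illustrative.

```python
import math


def mean_distance(point, others):
    """Average Euclidean distance from point to every point in others."""
    return sum(math.dist(point, o) for o in others) / len(others)


def silhouette(point, own_cluster, other_clusters):
    """Silhouette coefficient of one point: (b - a) / max(a, b)."""
    own_others = list(own_cluster)
    own_others.remove(point)  # exclude the point itself (first occurrence)
    a = mean_distance(point, own_others)
    # b is the smallest mean distance to any other cluster
    b = min(mean_distance(point, c) for c in other_clusters)
    return (b - a) / max(a, b)


# Hypothetical data: a = 1, mean distances to the other clusters are 4 and 10
sc = silhouette((0, 0), [(0, 0), (0, 1)], [[(3, 0), (5, 0)], [(0, 10)]])
print(sc)  # 0.75
```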
Support
X -> Y
Number of transactions containing both X and Y / total number of transactions
Confidence
X -> Y
Number of transactions containing both X and Y / number of transactions containing X
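A rough sketch of both measures on a toy transaction list. It assumes the common convention that the support of a rule X -> Y is the fraction of transactions containing every item of X and Y, and that confidence divides that by the support of X alone; the basket data and function names are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, x, y):
    """Confidence of the rule x -> y: support(x and y) / support(x)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```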
Agglomerative
Start with each object as its own cluster
Merge the two clusters that are closest. The distance between two clusters C1 and C2 is defined as the minimum distance between any object O1 in C1 and any object O2 in C2 (single linkage)
Repeat the merge step until a user-specified condition is met (e.g. k clusters remain)
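The steps above can be sketched as a brute-force loop. This assumes Euclidean distance, the minimum-distance (single-linkage) rule from the card, and "k clusters remain" as the stopping condition; it is meant for tiny inputs, not as an efficient implementation, and the function names are illustrative.

```python
import math


def single_link(c1, c2):
    """Minimum distance between any object in c1 and any object in c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # each object starts as its own cluster
    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


# Hypothetical points forming three obvious groups
print(agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], k=3))
```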
True Pos Rate
TP / (TP+FN) = TP / P
True Neg Rate
TN / (TN + FP) = TN / N
False Pos Rate
FP / (TN + FP) = FP / N
False Neg Rate
FN / (TP + FN) = FN / P
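A quick sketch of the four rates, mainly to show that they pair up: TPR + FNR = 1 and TNR + FPR = 1. The function name rates and the counts are illustrative, and it assumes there is at least one actual positive and one actual negative.

```python
def rates(tp, fp, tn, fn):
    """True/false positive/negative rates from confusion-matrix counts."""
    p = tp + fn  # all actual positives
    n = tn + fp  # all actual negatives
    return {"tpr": tp / p, "tnr": tn / n, "fpr": fp / n, "fnr": fn / p}


r = rates(tp=40, fp=10, tn=45, fn=5)
print(r)
print(r["tpr"] + r["fnr"], r["tnr"] + r["fpr"])  # both sums are 1 up to rounding
```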
KMeans
Cluster objects based on cluster centers
Object belongs to cluster with nearest centroid
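One way to turn the nearest-centroid rule into a full procedure is the usual alternating loop: assign each object to its nearest centroid, recompute the centroids, repeat until nothing changes. The sketch below assumes Euclidean distance and caller-supplied starting centroids (a list of tuples); the function names and data are illustrative.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members."""
    new = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)  # keep an empty cluster's centroid where it was
    return new


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids


labels, centers = kmeans([(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 0), (10, 0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
```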
Purity
Each cluster is assigned the most frequent class among its objects
Purity is the number of correctly assigned objects / total number of objects
1 is perfect; values close to 0 are bad
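A short sketch of the purity calculation, assuming two parallel lists: the cluster each object was placed in and its true class. The function name purity and the labels are made up for illustration.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of objects whose class matches their cluster's majority class."""
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, []).append(cls)
    # For each cluster, count the objects that belong to its most frequent class
    correct = sum(Counter(classes).most_common(1)[0][1] for classes in by_cluster.values())
    return correct / len(class_labels)


# Cluster 0 holds a, a, b and cluster 1 holds b, b, a: 4 of 6 objects match
print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"]))
```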
SVM
Supervised machine learning algorithm
Find the line (a hyperplane in higher dimensions) that separates the classes of data points with the largest margin
Apriori Algo
Given a minimum support and/or minimum confidence, find all association rules
Any subset of a frequent itemset must be frequent
Any superset of a non-frequent itemset must not be frequent
Find sets of items that have minimum support, starting from 1-itemsets and expanding to k-itemsets; if a j-itemset is not frequent, do not consider any superset of it
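The level-by-level search described above might be sketched as follows. This covers only the frequent-itemset step (rule generation from minimum confidence is left out), assumes support is a fraction of all transactions, and uses a simple pairwise union to build candidates; the function name apriori and the basket data are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent 1-itemsets
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop any candidate that has a non-frequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, s in sorted(apriori(baskets, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy data {milk, butter} falls below the threshold, so {bread, milk, butter} is pruned without its support ever being counted.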