Data Science Flashcards

1
Q

Recall

A

TP / (TP + FN), i.e. true positives / (true positives + false negatives)

2
Q

Precision

A

TP / (TP + FP)

3
Q

F1

A

2(recall x precision) / (recall + precision)
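
As a quick check on the recall, precision and F1 cards, here is a minimal Python sketch. The function names and the counts are made up for illustration, and the sketch assumes the denominators are non-zero.

```python
def precision(tp, fp):
    # TP / (TP + FP)
    return tp / (tp + fp)

def recall(tp, fn):
    # TP / (TP + FN)
    return tp / (tp + fn)

def f1(tp, fp, fn):
    # 2 * (recall * precision) / (recall + precision)
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (r * p) / (r + p)

# Hypothetical confusion-matrix counts.
tp, fp, fn = 8, 2, 4
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # 8/12, about 0.667
print(f1(tp, fp, fn))     # 16/22, about 0.727
```

F1 is the harmonic mean of the two, so it sits closer to the smaller of precision and recall.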

4
Q

Accuracy

A

(TP + TN) / (TP + TN + FP + FN)

5
Q

Error

A

1 - Accuracy = (FP + FN) / (TP + TN + FP + FN)
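
Accuracy and error can be sketched the same way from the four confusion-matrix counts. The counts below are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    # (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

def error(tp, tn, fp, fn):
    # 1 - accuracy, which equals (FP + FN) / total
    return 1 - accuracy(tp, tn, fp, fn)

# Hypothetical counts: 20 examples in total, 14 classified correctly.
tp, tn, fp, fn = 8, 6, 2, 4
acc = accuracy(tp, tn, fp, fn)   # 14/20
err = error(tp, tn, fp, fn)      # 6/20
```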

6
Q

WCSS F (K Means)

A

The sum, over all clusters, of the squared distances from each point in the cluster to that cluster's centroid

Sum(Sum(Distance(point, centroid of cluster)^2, points in cluster), clusters)
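
A minimal sketch of the WCSS (within-cluster sum of squares) computation, assuming the usual convention of squared Euclidean distance from each point to its own cluster's centroid. The helper names and the sample clusters are illustrative only.

```python
def centroid(points):
    # Coordinate-wise mean of the points.
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def wcss(clusters):
    # Outer sum over clusters, inner sum over the points in each cluster.
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in pts)
    return total

# Two hypothetical clusters of 2-D points.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],   # centroid (1, 0): squared distances 1 + 1
    [(5.0, 5.0), (5.0, 7.0)],   # centroid (5, 6): squared distances 1 + 1
]
print(wcss(clusters))  # 4.0
```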

7
Q

Centroid

A

Sum of the points in the cluster / number of points in the cluster (the coordinate-wise mean)

8
Q

Silhouette coefficient meaning

A

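Close to 1 means good clustering (the object is well inside its own cluster)
Close to 0 means the object lies between two clusters and could belong to either
Close to -1 means bad clustering (the object is probably in the wrong cluster)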

9
Q

Silhouette coefficient calculation

A

A = sum(dist(point to each other point in its own cluster)) / number of other points in its own cluster

M = sum(dist(point to each point in a second cluster)) / number of points in that cluster

N = sum(dist(point to each point in a third cluster)) / number of points in that cluster

B = min(M, N), the smallest mean distance to any other cluster

SC = (B - A) / max(A, B)
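
The calculation for a single point can be sketched as follows, assuming Euclidean distance. The function names and sample points are hypothetical, and the point's own cluster is passed without the point itself.

```python
import math

def mean_dist(point, others):
    # Average Euclidean distance from point to every point in others.
    return sum(math.dist(point, q) for q in others) / len(others)

def silhouette(point, own_cluster_others, other_clusters):
    a = mean_dist(point, own_cluster_others)                 # A
    b = min(mean_dist(point, c) for c in other_clusters)     # B = min(M, N, ...)
    return (b - a) / max(a, b)

# Hypothetical data: one neighbour in the same cluster, two other clusters.
point = (0.0, 0.0)
own_others = [(1.0, 0.0)]                        # A = 1
others = [[(4.0, 0.0), (6.0, 0.0)],              # M = 5
          [(0.0, 10.0)]]                         # N = 10
sc = silhouette(point, own_others, others)       # (5 - 1) / 5 = 0.8
```

The silhouette score of a whole clustering is usually reported as the mean of this value over all points.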

10
Q

Support
X -> Y

A

Number of transactions containing both X and Y / total transactions

11
Q

Confidence
X -> Y

A

Number of transactions containing both X and Y / number of transactions containing X
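
A small sketch of support and confidence for a rule X -> Y. It assumes the standard definitions, where support counts transactions containing both X and Y, and confidence divides that count by the number of transactions containing X. The transaction data is made up.

```python
def support(transactions, x, y):
    # Fraction of all transactions that contain every item of X and Y.
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, x, y):
    # Among transactions containing X, the fraction that also contain Y.
    with_x = [t for t in transactions if x <= t]
    return sum(1 for t in with_x if y <= t) / len(with_x)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
x, y = {"bread"}, {"milk"}
s = support(transactions, x, y)      # 2 of 4 transactions
c = confidence(transactions, x, y)   # 2 of the 3 transactions with bread
```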

12
Q

Agglomerative

A

Start with each object as its own cluster

Merge the two clusters that are closest. The distance between two clusters C1 and C2 is defined as the minimum distance between an object O1 in C1 and an object O2 in C2 (single linkage)

Repeat the merge step until a user-specified condition is met (e.g. k clusters remain)
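
The steps above can be sketched as a naive loop, assuming Euclidean distance and "k clusters remain" as the stopping condition. This is an illustrative, unoptimized version with hypothetical names, not a production implementation.

```python
import math

def single_link(c1, c2):
    # Minimum distance between any object in c1 and any object in c2.
    return min(math.dist(o1, o2) for o1 in c1 for o2 in c2)

def agglomerative(points, k):
    # Step 1: every object starts as its own cluster.
    clusters = [[p] for p in points]
    # Step 3: repeat until only k clusters remain.
    while len(clusters) > k:
        # Step 2: find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical 1-D points.
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
result = agglomerative(points, 2)
```

With these points the outlier at 20 ends up alone, because single linkage chains the four nearby points together first.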

13
Q

True Pos Rate

A

TP / (TP+FN) = TP / P

14
Q

True Neg Rate

A

TN / (TN + FP) = TN / N

15
Q

False Pos Rate

A

FP / (TN + FP) = FP / N

16
Q

False Neg Rate

A

FN / (TP + FN) = FN / P
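
The four rates share two denominators, P = TP + FN (actual positives) and N = TN + FP (actual negatives). A short sketch with hypothetical counts:

```python
def rates(tp, tn, fp, fn):
    p = tp + fn  # all actual positives
    n = tn + fp  # all actual negatives
    return {
        "TPR": tp / p,  # true positive rate (same as recall)
        "TNR": tn / n,  # true negative rate
        "FPR": fp / n,  # false positive rate
        "FNR": fn / p,  # false negative rate
    }

# Hypothetical counts.
r = rates(tp=8, tn=6, fp=2, fn=4)
```

Because of the shared denominators, TPR + FNR = 1 and TNR + FPR = 1.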

17
Q

KMeans

A

Cluster objects based on cluster centers

Object belongs to cluster with nearest centroid
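
A minimal sketch of the usual k-means loop (assign each object to the nearest centroid, then recompute the centroids), assuming Euclidean distance, a fixed number of iterations, and caller-supplied starting centroids. The names and data are illustrative.

```python
import math

def assign(points, centroids):
    # Each object belongs to the cluster with the nearest centroid.
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]

def update(points, labels, centroids):
    # New centroid = mean of the points assigned to it.
    new = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)  # assumption: keep an empty cluster's old centroid
    return new

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return labels, centroids

# Hypothetical data and starting centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
```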

18
Q

Purity

A

Each cluster is assigned the class that is most frequent in that cluster

Purity is the number of correctly assigned objects / total objects

1 is perfect; values near 0 are bad
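
Purity can be sketched in a few lines: group the true classes by cluster, count the most frequent class in each cluster, and divide by the total number of objects. The labels below are made up.

```python
from collections import Counter

def purity(cluster_labels, true_classes):
    # Collect the true classes of the objects in each cluster.
    clusters = {}
    for c, y in zip(cluster_labels, true_classes):
        clusters.setdefault(c, []).append(y)
    # Objects matching their cluster's majority class count as correct.
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in clusters.values())
    return correct / len(true_classes)

# Hypothetical clustering of six objects into two clusters.
cluster_labels = [0, 0, 0, 1, 1, 1]
true_classes = ["a", "a", "b", "b", "b", "a"]
pur = purity(cluster_labels, true_classes)  # (2 + 2) / 6
```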

19
Q

SVM

A

Supervised machine learning algorithm

Finds the line (in general, a hyperplane) that separates the classes of data points with the largest margin

20
Q

Apriori Algo

A

Given a minimum support and/or minimum confidence, find all association rules

Any subset of a frequent itemset must be frequent

Any superset of a non-frequent itemset must not be frequent

Find the sets of items that have minimum support, starting from 1-itemsets and expanding to k-itemsets; if a j-itemset is not frequent, do not consider any superset of it
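
The frequent-itemset search can be sketched as below. This is an unoptimized illustration with hypothetical names and data; it covers only the itemset phase (generating rules from the frequent itemsets and filtering them by confidence is left out).

```python
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)

    def sup(itemset):
        # Fraction of transactions containing every item of itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Start from the frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = sup(s)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop any candidate with a non-frequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if sup(c) >= min_support}
    return frequent

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(transactions, min_support=0.5)
```

Here {milk, butter} appears in only one of four transactions, so it is dropped, and the 3-itemset {bread, milk, butter} is pruned without ever being counted.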