Exam Flashcards

1
Q

Model Precision Formula

A

TP/(TP+FP)

2
Q

Model Accuracy Formula

A

(TP+TN)/All = (TP+TN)/(TP+TN+FP+FN)

3
Q

Model Recall Formula

A

TP/(TP+FN)
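A minimal Python sketch of the three confusion-matrix formulas (precision, accuracy, recall). The function names and the counts are hypothetical, chosen only for illustration; note that each denominator is summed before dividing.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are really positive: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real positives that the model found: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 40, 45, 10, 5
print(precision(tp, fp))         # 40 / 50 = 0.8
print(recall(tp, fn))            # 40 / 45 ≈ 0.889
print(accuracy(tp, tn, fp, fn))  # 85 / 100 = 0.85
```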

4
Q

Sup Formula

A

Sup(AB) = P(AB), the fraction of transactions that contain both A and B

5
Q

Conf Formula

A

Conf(A->B) = Sup(AB)/Sup(A)

6
Q

Lift Formula

A

Lift(A->B) = Sup(AB)/(Sup(A)*Sup(B))
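The support, confidence and lift formulas can be checked on a toy basket dataset. This is a sketch under assumptions: the transactions and item names are made up, Sup(AB) is read as the fraction of transactions containing both A and B, and `Fraction` is used only so the ratios print exactly.

```python
from fractions import Fraction
from typing import FrozenSet, List, Set

# Hypothetical toy transactions: each one is the set of items bought together.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]


def sup(itemset: Set[str]) -> Fraction:
    """Support: fraction of transactions containing every item of `itemset`."""
    count = sum(1 for t in TRANSACTIONS if itemset <= t)
    return Fraction(count, len(TRANSACTIONS))


def conf(a: Set[str], b: Set[str]) -> Fraction:
    """Confidence of A -> B: Sup(AB) / Sup(A)."""
    return sup(a | b) / sup(a)


def lift(a: Set[str], b: Set[str]) -> Fraction:
    """Lift of A -> B: Sup(AB) / (Sup(A) * Sup(B))."""
    return sup(a | b) / (sup(a) * sup(b))


A, B = {"bread"}, {"milk"}
print(sup(A | B))      # 3/5
print(conf(A, B))      # 3/4
print(lift(A, B))      # 15/16
print(lift(A, B) > 1)  # False
```

Here lift is 15/16, below 1.0: bread and milk appear together slightly less often than they would if they were independent.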

7
Q

When is a rule interesting?

A

When its lift is greater than 1.0, i.e. A and B occur together more often than expected if they were independent

8
Q

When is a variable frequent?

A

When its support is greater than or equal to the minimum support threshold (min sup)

9
Q

If AB is infrequent, can ABC be frequent?

A

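Never: adding items can only keep support the same or lower it, so every superset of an infrequent itemset is also infrequent (the Apriori property)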
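A brute-force check of this property on a small, made-up dataset (the transactions and the 0.4 minimum support are hypothetical). It enumerates every itemset and confirms that no infrequent itemset has a frequent superset; real miners such as Apriori use the same fact to prune candidates instead of enumerating them all.

```python
from itertools import combinations

# Hypothetical transactions and threshold, for illustration only.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "C"},
    {"A", "B", "C", "D"},
]
MIN_SUP = 0.4  # minimum support, as a fraction of transactions


def support(itemset: frozenset) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def is_frequent(itemset: frozenset) -> bool:
    """An itemset is frequent when its support reaches the minimum support."""
    return support(itemset) >= MIN_SUP


items = sorted(set().union(*TRANSACTIONS))
all_itemsets = [frozenset(c) for k in range(1, len(items) + 1)
                for c in combinations(items, k)]

# Look for (infrequent subset, frequent superset) pairs; the property says none exist.
violations = [
    (small, big)
    for small in all_itemsets if not is_frequent(small)
    for big in all_itemsets if small < big and is_frequent(big)
]
print(violations)                                                   # []
print(is_frequent(frozenset("BD")), is_frequent(frozenset("BCD")))  # False False
```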

10
Q

How does DBSCAN work?

A

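Creates clusters based on local density rather than only on the distance between points: core points have at least MinPts neighbors within a radius ε, clusters grow outward from core points through their neighbors, and points in low-density regions are labeled as noise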
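A minimal, brute-force DBSCAN sketch in Python. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point, counting the point itself) follow common usage but are assumptions here, as are Euclidean distance and the 2-D sample points. It runs in O(n²) time and only aims to show the density-based expansion.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Cluster label per point; NOISE (-1) marks points outside every dense region."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: start a new cluster and grow it from its neighbors.
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```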

11
Q

What is cohesion based on?

A

How compact the clusters are: the denser (more tightly packed) each cluster is, the higher the cohesion

12
Q

What is separation based on?

A

How far apart the clusters are: the more distinct the clusters are from each other, the higher the separation
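One common way to put numbers on cohesion and separation (an assumption, since these cards define them only informally) is the within-cluster sum of squared errors (SSE: lower means more cohesive) and the between-cluster sum of squares (SSB: higher means more separated). A small sketch with hypothetical 2-D clusters:

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def cohesion_sse(clusters: Dict[int, List[Point]]) -> float:
    """Within-cluster SSE: the smaller it is, the more cohesive the clusters."""
    total = 0.0
    for pts in clusters.values():
        c = centroid(pts)
        total += sum(sq_dist(p, c) for p in pts)
    return total


def separation_ssb(clusters: Dict[int, List[Point]]) -> float:
    """Between-cluster SS: size-weighted squared distance of each centroid to the
    overall mean. The larger it is, the better separated the clusters."""
    overall = centroid([p for pts in clusters.values() for p in pts])
    return sum(len(pts) * sq_dist(centroid(pts), overall) for pts in clusters.values())


clusters = {
    0: [(0.0, 0.0), (0.0, 2.0)],
    1: [(10.0, 0.0), (10.0, 2.0)],
}
print(cohesion_sse(clusters))    # 4.0
print(separation_ssb(clusters))  # 100.0
```

For a fixed dataset, SSE + SSB equals the total sum of squares (104.0 here), so lowering SSE necessarily raises SSB.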

13
Q

What is PCA?

A

Applies a linear transformation to the original variables, generating new ones.
- Finds the set of variables that best summarizes the data (and allows for its reconstruction), among all possible linear combinations of the original ones.
- Generates new variables as linear combinations of the original ones, creating a new space where the new variables are orthogonal (uncorrelated).
- The importance of each component is given by its variance.
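A sketch of PCA restricted to two variables, so the eigen-decomposition of the covariance matrix has a closed form and no libraries are needed. The function name and sample data are hypothetical; with more variables one would normally call a linear-algebra routine such as `numpy.linalg.eigh` or an SVD instead.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Component = Tuple[float, Tuple[float, float]]  # (variance, unit direction)


def pca_2d(points: List[Point]) -> List[Component]:
    """Principal components of 2-D data, sorted by decreasing variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data.
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    if b == 0:
        # Already uncorrelated: the original axes are the principal directions.
        return sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))],
                      key=lambda comp: comp[0], reverse=True)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2
    r = math.hypot((a - c) / 2, b)
    components = []
    for lam in (half_trace + r, half_trace - r):
        v = (b, lam - a)  # solves (Cov - lam * I) v = 0 when b != 0
        norm = math.hypot(*v)
        components.append((lam, (v[0] / norm, v[1] / norm)))
    return components


data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
for variance, direction in pca_2d(data):
    print(variance, direction)
# First component: variance 2.0 along ≈ (0.707, 0.707); second: variance 0.0.
```

The points lie on the line y = x, so the first component captures all of the variance and the second none, which is why the second could be dropped without losing information.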

14
Q

How is LOF (Local Outlier Factor) used for anomaly detection?

A

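The anomaly score of a point is the ratio of the average local density of its k nearest neighbors to its own local density; a score well above 1 means the point sits in a sparser region than its neighbors, so it is likely an outlier.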
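A minimal LOF sketch following this definition: the local reachability density (lrd) of a point is the inverse of its mean reachability distance to its k nearest neighbors, and the score is the neighbors' mean lrd divided by the point's own lrd. Assumptions: `k` is the neighborhood size, ties at the k-th distance are broken by index (the original definition keeps all tied points), and there are no duplicate points (which would make a density infinite).

```python
import math
from typing import List, Sequence


def lof_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Local Outlier Factor of every point (about 1 = inlier, well above 1 = outlier)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point, excluding the point itself.
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])  # stable sort: ties keep index order
        neighbors.append(others[:k])

    # k-distance: distance from a point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(i: int, j: int) -> float:
        """Reachability distance of point i from neighbor j."""
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [1.0 / (sum(reach_dist(i, j) for j in neighbors[i]) / k) for i in range(n)]

    # LOF: average lrd of the neighbors divided by the point's own lrd.
    return [(sum(lrd[j] for j in neighbors[i]) / k) / lrd[i] for i in range(n)]


scores = lof_scores([(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)], k=2)
print(scores)  # ≈ [1.0, 1.0, 1.0, 1.0, 5.0]: only the isolated point at 10 stands out
```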

15
Q

MAE (Mean of Absolute value of Errors) Formula

A

Sum(|real value - predicted value|)/number of values

16
Q

MSE (Mean of the Square of Errors) Formula

A

Sum((real value - predicted value)^2)/number of values

17
Q

RMSE (Root of the Mean of the Square of Errors) Formula

A

Sqrt(Sum((real value - predicted value)^2)/number of values)

18
Q

R2 formula

A

1 - Sum((real value - predicted value)^2)/Sum((real value - mean of the real values)^2)

19
Q

What does the R2 value mean?

A

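Smaller than 0: the model is worse than just predicting the mean of the real values for every point
0: the model is exactly as good as predicting the mean
Bigger than 0: the model explains part of the variation in the data (1 is a perfect fit)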
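The four regression metrics and the R2 interpretation can be checked together. In this sketch the cards' "observed value" is read as the model's predicted value (an assumption); the function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |real - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of (real - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the MSE, in the units of the data."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mae(y_true, y_pred))    # 0.5
print(mse(y_true, y_pred))    # 0.5
print(rmse(y_true, y_pred))   # ≈ 0.707
print(r2(y_true, y_pred))     # 0.9
print(r2(y_true, [6.0] * 4))  # 0.0: predicting the mean everywhere
print(r2(y_true, [0.0] * 4))  # negative: worse than predicting the mean
```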

20
Q

When is ARIMA a good option as a regression model?

A

Stationary data, data with trends (removed by differencing, the "I" in ARIMA), and short-term forecasting

21
Q

When are LSTMs the best approach in forecasting?

A

Multivariate forecasting, long sequences with long-term dependencies, and non-stationary data

22
Q

What’s a closed pattern?

A

A frequent itemset that has no proper superset with the same support.
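A brute-force sketch that finds the closed patterns in a tiny, made-up dataset (the transactions and the minimum support count are hypothetical). Comparing each frequent itemset only with the other frequent itemsets is enough, because a superset with the same support as a frequent itemset is itself frequent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical transactions and threshold, for illustration only.
TRANSACTIONS: List[Set[str]] = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"C"},
]
MIN_SUP_COUNT = 2


def frequent_itemsets(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Support count of every itemset contained in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_count:
                result[s] = count
    return result


def closed_itemsets(freq: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep a frequent itemset only if no proper superset has the same support."""
    return {
        s: c for s, c in freq.items()
        if not any(s < other and c == freq[other] for other in freq)
    }


closed = closed_itemsets(frequent_itemsets(TRANSACTIONS, MIN_SUP_COUNT))
print(sorted("".join(sorted(s)) for s in closed))  # ['AB', 'ABC', 'C']
```

A and B are not closed because AB has the same support (3); AC and BC are not closed because ABC has the same support (2).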

23
Q

What’s the EM algorithm?

A

Iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
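To make the E and M steps concrete, here is a sketch of EM on one classic model, a 1-D mixture of two Gaussians, where the latent variable is which component generated each point. The model choice, the fixed iteration count, the starting means, and the small variance floor (which stops a component collapsing onto a single point) are all assumptions for illustration.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data: List[float], mu: Tuple[float, float], iters: int = 50):
    """Fit a two-component 1-D Gaussian mixture with EM, starting from the given means."""
    mu1, mu2 = mu
    var1 = var2 = 1.0
    w1 = 0.5  # mixing weight of component 1
    for _ in range(iters):
        # E-step: responsibility r = P(component 1 | x) under the current parameters.
        r = []
        for x in data:
            p1 = w1 * normal_pdf(x, mu1, var1)
            p2 = (1 - w1) * normal_pdf(x, mu2, var2)
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters as responsibility-weighted averages.
        n1 = sum(r)
        n2 = len(data) - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        var1 = sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1 + 1e-6
        var2 = sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2 + 1e-6
        w1 = n1 / len(data)
    return (mu1, var1), (mu2, var2), w1


(m1, v1), (m2, v2), w1 = em_gmm_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], mu=(0.0, 6.0))
print(round(m1, 3), round(m2, 3), round(w1, 3))  # 1.0 5.0 0.5
```

Different starting means can lead to a different (local) maximum, which is why EM is often restarted from several initializations.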

24
Q

When is a time series stationary?

A

Roughly horizontal (constant mean, no trend), constant variance, and no patterns that are predictable in the long term (e.g. no seasonality)

25
Q

Accumulated Cost Matrix formula (table with numbers)

A

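D(i, j) = (x_i - y_j)^2 + min(D(i-1, j), D(i, j-1), D(i-1, j-1)), i.e. the squared difference of the two values plus the smallest of the three previous cells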
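Assuming this card refers to the accumulated cost matrix of dynamic time warping (DTW), a short sketch that fills it with the squared local cost plus the minimum over the three predecessor cells; the final cell holds the DTW distance. The function name and the sequences are hypothetical.

```python
from typing import List, Sequence


def accumulated_cost(x: Sequence[float], y: Sequence[float]) -> List[List[float]]:
    """DTW accumulated cost matrix with local cost (x_i - y_j)^2 and
    D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = (x[i] - y[j]) ** 2
            if i == 0 and j == 0:
                D[i][j] = cost  # start of the warping path
                continue
            best_prev = min(
                D[i - 1][j] if i > 0 else INF,                # advance in x only
                D[i][j - 1] if j > 0 else INF,                # advance in y only
                D[i - 1][j - 1] if i > 0 and j > 0 else INF,  # diagonal step
            )
            D[i][j] = cost + best_prev
    return D


D = accumulated_cost([1, 2, 3], [1, 2, 2, 3])
for row in D:
    print(row)
# [0, 1, 2, 6]
# [1, 0, 0, 1]
# [5, 1, 1, 0]
```

Here y only repeats one value of x, so the warping path absorbs the repetition and the DTW distance D[-1][-1] is 0.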