Exam Flashcards
Model Precision Formula
TP/(TP+FP)
Model Accuracy Formula
(TP+TN)/All = (TP+TN)/(TP+TN+FP+FN)
Model Recall Formula
TP/(TP+FN)
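The three formulas above can be checked with a small Python sketch. The function names are illustrative and the confusion-matrix counts are made-up numbers.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are really positive: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of real positives that the model found: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion-matrix counts (100 predictions in total)
tp, tn, fp, fn = 40, 45, 10, 5
print(precision(tp, fp))         # 0.8
print(recall(tp, fn))            # 40/45, about 0.889
print(accuracy(tp, tn, fp, fn))  # 0.85
```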
Sup Formula
Sup(AB) = P(AB), the fraction of transactions that contain both A and B
Conf Formula
Conf(A->B) = Sup(AB)/Sup(A)
Lift Formula
Lift(A->B) = Sup(AB)/(Sup(A) * Sup(B))
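A minimal Python sketch of Sup, Conf and Lift computed over a list of transactions. The five baskets and the function names are invented for illustration. With this data the rule bread -> milk has a lift below 1.0, so under the lift criterion it would not count as interesting.

```python
from typing import FrozenSet, List, Set

# Hypothetical transactions (market baskets), for illustration only
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]


def sup(items: Set[str]) -> float:
    """Sup(X): fraction of transactions that contain every item of X."""
    return sum(1 for t in transactions if items <= t) / len(transactions)


def conf(a: Set[str], b: Set[str]) -> float:
    """Conf(A->B) = Sup(AB) / Sup(A)."""
    return sup(a | b) / sup(a)


def lift(a: Set[str], b: Set[str]) -> float:
    """Lift(A->B) = Sup(AB) / (Sup(A) * Sup(B))."""
    return sup(a | b) / (sup(a) * sup(b))


a, b = {"bread"}, {"milk"}
print(round(sup(a | b), 4))  # 0.6    (3 of 5 baskets)
print(round(conf(a, b), 4))  # 0.75   (0.6 / 0.8)
print(round(lift(a, b), 4))  # 0.9375 (0.6 / (0.8 * 0.8)), below 1.0
```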
When is a rule interesting?
When its lift is greater than 1.0 (A and B occur together more often than expected if they were independent)
When is an itemset frequent?
When its sup value is at or above the minimum support (min support)
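If AB is not frequent, can ABC be frequent?
Never. Support is anti-monotone (the Apriori property): Sup(ABC) <= Sup(AB), so every superset of an infrequent itemset is also infrequent.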
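The anti-monotone property is what lets an Apriori-style search skip candidates. The Python sketch below is a simplified level-wise search (the function name and toy data are hypothetical): a k-itemset is only counted if all of its (k-1)-subsets are frequent, so once {milk, butter} is infrequent, {bread, milk, butter} is never counted at all.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def frequent_itemsets(transactions: List[FrozenSet[str]], min_sup: float) -> Set[FrozenSet[str]]:
    """Simplified Apriori-style search: all itemsets with support >= min_sup."""
    n = len(transactions)

    def sup(items: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if items <= t) / n

    all_items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in all_items if sup(frozenset([i])) >= min_sup}
    frequent = set(level)
    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {x | y for x in level for y in level if len(x | y) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset,
        # because its support can never exceed that subset's support
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if sup(c) >= min_sup}
        frequent |= level
        k += 1
    return frequent


# Hypothetical transactions, for illustration only
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
for itemset in sorted(frequent_itemsets(baskets, min_sup=0.4), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```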
How does DBSCAN work?
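Creates clusters based on local density rather than only on the distance between points: a point with at least MinPts neighbors within a radius eps is a core point, clusters grow from core points that lie within eps of each other, and points not reachable from any core point are labeled as noise.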
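A compact pure-Python sketch of DBSCAN following that description. `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a core point) mirror the usual DBSCAN parameters; the function name and toy points are only illustrative. It uses a brute-force O(n²) neighbor search, which suits small examples but not large datasets.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
UNVISITED, NOISE = -2, -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""

    def neighbors(i: int) -> List[int]:
        # Brute-force search: all points within eps of point i, including i itself
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # only core points keep expanding the cluster
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```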
What is cohesion based on?
How close the points inside each cluster are to each other: the denser (more compact) the clusters, the more cohesion there is
What is separation based on?
How distinct the clusters are from each other: the farther apart the clusters are, the more separation there is
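One common way to put numbers on these two ideas is the within-cluster sum of squares (WSS, lower means more cohesion) and the between-cluster sum of squares (BSS, higher means more separation). This Python sketch assumes Euclidean distance and clusters given as lists of 2-D points; the function names and data are hypothetical. For a fixed dataset, WSS + BSS equals the total sum of squares around the overall mean, so lowering one raises the other.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centroid(pts: List[Point]) -> Point:
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def cohesion_wss(clusters: Dict[int, List[Point]]) -> float:
    """Within-cluster sum of squares: each point's squared distance to its own centroid."""
    return sum(sq_dist(p, centroid(pts)) for pts in clusters.values() for p in pts)


def separation_bss(clusters: Dict[int, List[Point]]) -> float:
    """Between-cluster sum of squares: cluster size times squared distance of its centroid to the overall mean."""
    overall = centroid([p for pts in clusters.values() for p in pts])
    return sum(len(pts) * sq_dist(centroid(pts), overall) for pts in clusters.values())


clusters = {0: [(0.0, 0.0), (0.0, 2.0)], 1: [(10.0, 0.0), (10.0, 2.0)]}
print(cohesion_wss(clusters))    # 4.0
print(separation_bss(clusters))  # 100.0
```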
What is PCA?
Applies transformations to the original variables, generating new ones.
o Finds the set of variables that best summarizes the data (and allows for its reconstruction), among all possible linear combinations of the original ones.
o Generates new variables as linear combinations of the original ones, creating a new space where the new variables are uncorrelated / orthogonal.
o The importance of each component is given by its variance.
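For two variables, PCA can be worked out by hand from the 2x2 covariance matrix, which this pure-Python sketch does (the function names and data are hypothetical; in practice a linear-algebra library would normally be used). The eigenvectors are the new orthogonal directions, and each eigenvalue is the variance captured by that component, which is what gives each component its importance.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def pca_2d(data: List[Vec]) -> Tuple[Vec, Tuple[Vec, Vec], Vec]:
    """Return (variances, components, mean) for 2-D data, largest variance first."""
    n = len(data)
    mean = (sum(x for x, _ in data) / n, sum(y for _, y in data) / n)
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mean[0]) ** 2 for x, _ in data) / (n - 1)
    c = sum((y - mean[1]) ** 2 for _, y in data) / (n - 1)
    b = sum((x - mean[0]) * (y - mean[1]) for x, y in data) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    r = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    l1, l2 = half_trace + r, half_trace - r
    # Eigenvector of l1; the second component is the orthogonal direction
    v = (l1 - c, b) if b != 0 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(v[0], v[1])
    v1 = (v[0] / norm, v[1] / norm)
    v2 = (-v1[1], v1[0])
    return (l1, l2), (v1, v2), mean


def project(p: Vec, component: Vec, mean: Vec) -> float:
    """Value of the new variable: a linear combination of the centered original variables."""
    return (p[0] - mean[0]) * component[0] + (p[1] - mean[1]) * component[1]


data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # points on the line y = x
(l1, l2), (v1, v2), mean = pca_2d(data)
print(l1, l2)  # 2.0 0.0: all the variance is on the first component
print([round(project(p, v1, mean), 3) for p in data])  # [-1.414, 0.0, 1.414]
```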
How is LOF (Local Outlier Factor) used for anomaly detection?
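The anomaly score of a point is the ratio between the average local density of its k nearest neighbors and its own local density: a score close to 1 means the point is as dense as its neighborhood, while a score well above 1 marks an outlier.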
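A pure-Python sketch of LOF, assuming the standard definition: reach-dist(p, o) = max(k-distance(o), d(p, o)), and the local reachability density (lrd) is the inverse of the mean reachability distance to the k nearest neighbors. The function name, `k` value and points are illustrative; ties among neighbors are broken by index, and exact duplicate points are not handled (they could make a density infinite).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def lof_scores(points: List[Point], k: int) -> List[float]:
    """Local Outlier Factor of each point: about 1 = inlier, well above 1 = outlier."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    def k_nearest(i: int) -> List[int]:
        others = [j for j in range(n) if j != i]
        return sorted(others, key=lambda j: d[i][j])[:k]

    knn = [k_nearest(i) for i in range(n)]
    k_dist = [d[i][knn[i][-1]] for i in range(n)]  # distance to the k-th nearest neighbor

    def reach_dist(p: int, o: int) -> float:
        return max(k_dist[o], d[p][o])

    # Local reachability density: inverse of the mean reachability distance to the k neighbors
    lrd = [1.0 / (sum(reach_dist(i, o) for o in knn[i]) / k) for i in range(n)]
    # LOF: average density of the neighbors divided by the point's own density
    return [(sum(lrd[o] for o in knn[i]) / k) / lrd[i] for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for p, score in zip(points, lof_scores(points, k=2)):
    print(p, round(score, 2))  # the four corners score 1.0; (5, 5) scores about 6.03
```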
MAE (Mean of Absolute value of Errors) Formula
Sum(|real value - predicted value|)/(number of values)
MSE (Mean of the Square of Errors) Formula
Sum((real value - predicted value)^2)/(number of values)
RMSE (Root of the Mean of the Square of Errors) Formula
Root(Sum((real value - predicted value)^2)/(number of values))
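The three error formulas as a Python sketch, using made-up real and predicted values and illustrative function names. The absolute value in MAE matters: without it, positive and negative errors would cancel out.

```python
import math
from typing import Sequence


def mae(real: Sequence[float], pred: Sequence[float]) -> float:
    """Mean of the absolute errors."""
    return sum(abs(r - p) for r, p in zip(real, pred)) / len(real)


def mse(real: Sequence[float], pred: Sequence[float]) -> float:
    """Mean of the squared errors."""
    return sum((r - p) ** 2 for r, p in zip(real, pred)) / len(real)


def rmse(real: Sequence[float], pred: Sequence[float]) -> float:
    """Square root of the MSE, in the same units as the data."""
    return math.sqrt(mse(real, pred))


real = [3.0, 5.0, 2.0, 7.0]
pred = [2.0, 5.0, 4.0, 6.0]  # errors: 1, 0, -2, 1
print(mae(real, pred))   # 1.0 = (1 + 0 + 2 + 1) / 4
print(mse(real, pred))   # 1.5 = (1 + 0 + 4 + 1) / 4
print(rmse(real, pred))  # sqrt(1.5), about 1.2247
```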
R2 Formula
1 - Sum((real value - predicted value)^2)/Sum((real value - mean of the real values)^2)
What does the R2 value mean?
Smaller than 0: the model is worse than just drawing a horizontal line at the mean of the real values
0: the model is equivalent to predicting the mean
Bigger than 0: the model explains part of the variation in the data (1 is a perfect fit)
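A Python sketch of R2 with hypothetical numbers, covering the cases above: a perfect model gives 1, predicting the mean gives 0, and a model worse than the mean gives a negative value.

```python
from typing import Sequence


def r2(real: Sequence[float], pred: Sequence[float]) -> float:
    """R2 = 1 - (sum of squared residuals) / (sum of squared deviations from the mean)."""
    mean = sum(real) / len(real)
    ss_res = sum((r - p) ** 2 for r, p in zip(real, pred))
    ss_tot = sum((r - mean) ** 2 for r in real)
    return 1 - ss_res / ss_tot


real = [3.0, 5.0, 2.0, 7.0]  # mean = 4.25
print(r2(real, real))                             # 1.0   (perfect fit)
print(r2(real, [4.25] * 4))                       # 0.0   (same as predicting the mean)
print(round(r2(real, [2.0, 5.0, 4.0, 6.0]), 3))   # 0.593 (better than the mean)
print(round(r2(real, [7.0, 2.0, 5.0, 3.0]), 3))   # -2.39 (worse than the mean)
```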
When is ARIMA a good option as a regression model?
Stationary data, data with trends (which ARIMA removes by differencing), and short-term forecasting
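The "I" (integrated) part of ARIMA is what handles trends: the series is differenced d times before the autoregressive and moving-average terms are fitted. This Python sketch shows only that differencing step (the function name is illustrative), not a full ARIMA fit.

```python
from typing import List


def difference(series: List[float], d: int = 1) -> List[float]:
    """Apply d rounds of first differencing: y'_t = y_t - y_(t-1)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


linear = [10.0 + 2.0 * t for t in range(6)]  # 10, 12, ..., 20: a linear trend
print(difference(linear))        # [2.0, 2.0, 2.0, 2.0, 2.0]: trend removed with d = 1

quadratic = [float(t * t) for t in range(6)]  # 0, 1, 4, 9, 16, 25
print(difference(quadratic, d=2))  # [2.0, 2.0, 2.0, 2.0]: a quadratic trend needs d = 2
```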
When are LSTMs the best approach in forecasting?
Multivariate forecasting, long-term data, and non-stationary data
What’s a closed pattern?
A frequent itemset that has no proper superset with the same support.
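A small Python sketch of the definition, assuming the support counts of all frequent itemsets are already known (the counts and the function name are hypothetical). A superset with the same support as a frequent itemset is itself frequent, so checking only the frequent itemsets is enough.

```python
from typing import Dict, FrozenSet, Set


def closed_itemsets(support: Dict[FrozenSet[str], int]) -> Set[FrozenSet[str]]:
    """Frequent itemsets with no proper superset of the same support."""
    return {
        x for x, s in support.items()
        if not any(x < y and support[y] == s for y in support)  # x < y: proper subset
    }


# Hypothetical support counts of all frequent itemsets (min support count = 2)
support = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
    frozenset({"c"}): 2,
    frozenset({"a", "c"}): 2,
}
print(sorted(sorted(x) for x in closed_itemsets(support)))  # [['a'], ['a', 'b'], ['a', 'c']]
```

Here {b} is not closed because {a, b} has the same support (3), and likewise {c} is absorbed by {a, c}.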
What’s the EM algorithm?
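An iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved latent variables. It alternates an E-step (compute the expected values of the latent variables under the current parameters) and an M-step (re-estimate the parameters that maximize the resulting expected log-likelihood).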
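A common textbook use of EM is fitting a mixture of Gaussians, where the latent variable is which component generated each point. This pure-Python sketch fits a two-component 1-D mixture; the function names, starting values, iteration count and data are illustrative assumptions, and the small variance floor is a practical safeguard rather than part of EM itself.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(
    xs: Sequence[float],
    mu0: Tuple[float, float] = (0.0, 1.0),
    var0: Tuple[float, float] = (1.0, 1.0),
    w0: Tuple[float, float] = (0.5, 0.5),
    iters: int = 50,
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture with EM; returns (means, variances, weights)."""
    mu, var, w = list(mu0), list(var0), list(w0)
    for _ in range(iters):
        # E-step: responsibility of each component for each point,
        # i.e. the posterior probability of the hidden component label
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters using the responsibilities as soft weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(var[k], 1e-6)  # variance floor: a safeguard, not part of EM itself
    return mu, var, w


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(xs, mu0=(0.0, 6.0))
print([round(m, 2) for m in means])  # roughly [1.0, 5.0]
```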
When is a time series stationary?
When its statistical properties do not change over time: roughly horizontal (constant mean), constant variance, and no predictable patterns in the long term
Accumulated Cost Matrix formula for DTW (table with numbers)
D(i,j) = (value of X at i - value of Y at j)^2 + min(D(i-1,j-1), D(i-1,j), D(i,j-1))
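A Python sketch of the accumulated cost matrix used by Dynamic Time Warping, assuming the squared difference as the local cost and the usual recurrence that adds the cheapest of the three neighboring cells (diagonal, above, left). The function name and series are invented; the last cell of the matrix is the DTW distance between the two series.

```python
import math
from typing import List, Sequence


def accumulated_cost(x: Sequence[float], y: Sequence[float]) -> List[List[float]]:
    """DTW accumulated cost matrix: D[i][j] is the cheapest cost of aligning x[:i+1] with y[:j+1]."""
    n, m = len(x), len(y)
    # Extra row/column of infinity (0 in the corner) so the first row and column need no special case
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # local cost: squared difference
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return [row[1:] for row in D[1:]]  # drop the padding


x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 2.0, 3.0]
for row in accumulated_cost(x, y):
    print(row)
# [0.0, 1.0, 2.0, 6.0]
# [1.0, 0.0, 0.0, 1.0]
# [5.0, 1.0, 1.0, 0.0]  <- last cell (0.0) is the DTW distance
```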