notions Flashcards
actionable insight
an operational insight that can be implemented
bad data
garbage in, garbage out: an analysis is only as good as the data it is built on
re-creating an analysis
quite difficult, but important
bias
factors that can push a decision in the wrong direction
analytics workflow
modularity: which tools and approaches are used
Directed acyclic graph (DAG)
A directed graph with no directed cycles: it consists of vertices and edges, each edge directed from one vertex to another, such that following those directions never forms a closed loop.
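A minimal Python sketch of why acyclicity matters in practice: a DAG always has a topological order (for every edge u -> v, u comes before v), which is what lets a workflow decide which task runs first. It uses Kahn's algorithm; the function name `topological_order` and the task names are hypothetical, chosen only for illustration.

```python
from collections import deque


def topological_order(edges):
    """Return the vertices so that every edge u -> v has u before v.

    Uses Kahn's algorithm. Raises ValueError if the graph has a directed
    cycle, i.e. if it is not a DAG.
    """
    vertices = set()
    for u, v in edges:
        vertices.add(u)
        vertices.add(v)
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    # Start from vertices with no incoming edges (sorted for a deterministic result).
    queue = deque(sorted(v for v in vertices if indegree[v] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # If some vertices were never freed, they sit on a cycle.
    if len(order) != len(vertices):
        raise ValueError("graph has a directed cycle, so it is not a DAG")
    return order


# Hypothetical pipeline: each edge means "must finish before".
pipeline = [("extract", "clean"), ("clean", "train"), ("clean", "report"), ("train", "report")]
print(topological_order(pipeline))  # ['extract', 'clean', 'train', 'report']
```

Adding an edge such as ("report", "extract") would close a loop, and the function would raise instead of returning an order.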
Airflow
workflow manager that schedules and monitors pipelines defined as DAGs, used instead of plain crontab
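Sum of squares: total / regression / error
SST/TSS (total sum of squares): $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, squared differences between the observed values and their mean
SSR/ESS (sum of squares due to regression, or explained sum of squares): $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, squared differences between the predicted values and the mean
SSE/RSS (sum of squared errors, or residual sum of squares): $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, squared differences between the observed and predicted values
$SST = SSR + SSE$ (holds for an OLS fit with an intercept)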
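A small worked check of the decomposition, as a Python sketch on made-up data: x = 1..5 and y = [2, 4, 5, 4, 5]. For this data the OLS line is y_hat = 2.2 + 0.6x (slope 6/10, intercept 4 - 0.6 * 3), so the predictions are computed from those coefficients. The helper name `sums_of_squares` is hypothetical.

```python
import math


def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed values y and model predictions y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variability around the mean
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # variability explained by the model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual (unexplained) variability
    return sst, ssr, sse


# Illustrative data; 2.2 and 0.6 are the OLS intercept and slope for exactly this data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

sst, ssr, sse = sums_of_squares(y, y_hat)
print(f"SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f}")  # SST=6.000 SSR=3.600 SSE=2.400
# The identity SST = SSR + SSE holds up to floating-point rounding.
assert math.isclose(sst, ssr + sse)
```

With arbitrary (non-OLS) predictions the identity generally fails; it relies on the OLS residuals being uncorrelated with the fitted values.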
Dependent variable
The variable we are trying to predict (usually denoted y)
OLS
Ordinary Least Squares: estimates the coefficients by minimizing SSE
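For a single predictor, OLS has a closed-form solution. The Python sketch below fits y = b0 + b1 * x on illustrative data and checks that nudging either coefficient raises SSE, which is what "minimizing SSE" means. The names `ols_fit` and `sse` are hypothetical helpers; multiple regression would need the matrix form instead.

```python
def ols_fit(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (single predictor).

    Closed-form solution that minimizes SSE:
      b1 = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
      b0 = y_bar - b1 * x_bar
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors of the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
print(round(b0, 3), round(b1, 3))  # 2.2 0.6

# Moving either coefficient away from the OLS solution increases SSE.
best = sse(x, y, b0, b1)
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sse(x, y, b0 + d0, b1 + d1) > best
```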
R-squared
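$R^2 = SSR/SST = 1 - SSE/SST$; 1 is best (the model explains all the variability), 0 is worst (it explains none)
R-squared measures the proportion of the total variability in the dependent variable that the model explains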
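A Python sketch of R-squared using the 1 - SSE/SST form, which matches SSR/SST for an OLS fit with an intercept. The data and the helper name `r_squared` are illustrative assumptions; the three predictions show the middle case and both extremes.

```python
def r_squared(y, y_hat):
    """Proportion of the total variability in y explained by the predictions y_hat.

    Computed as 1 - SSE/SST, which equals SSR/SST for an OLS fit with an intercept.
    """
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - sse / sst


y = [2, 4, 5, 4, 5]
y_hat_ols = [2.8, 3.4, 4.0, 4.6, 5.2]  # OLS line 2.2 + 0.6x for x = 1..5
y_hat_mean = [4.0] * 5                 # always predicting the mean explains nothing

print(round(r_squared(y, y_hat_ols), 3))  # 0.6
print(r_squared(y, y_hat_mean))           # 0.0
print(r_squared(y, y))                    # 1.0
```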
adjusted R-squared
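measures how well the model fits the data, but penalizes the use of variables that are meaningless for the regression: $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where n is the number of observations and p the number of predictors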
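A Python sketch of the adjusted R-squared penalty, with n observations and p predictors (intercept not counted). The R-squared values are made-up numbers: adding a second predictor nudges R-squared from 0.60 to 0.61, yet the adjusted value drops because the gain does not pay for the lost degree of freedom. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted).

    Formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Illustrative numbers with n = 5 observations.
print(round(adjusted_r_squared(0.60, n=5, p=1), 3))  # 0.467
# A second predictor that barely raises R^2 lowers the adjusted value.
print(round(adjusted_r_squared(0.61, n=5, p=2), 3))  # 0.22
```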
F-statistic
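tests the overall significance of the regression; the null hypothesis is that all slope coefficients are zero: $F = \frac{SSR/p}{SSE/(n - p - 1)}$. A large F (small p-value) suggests at least one predictor helps explain the dependent variable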
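A Python sketch of the overall F-statistic for a regression with p predictors plus an intercept, computed from SSR and SSE on illustrative data (the OLS line 2.2 + 0.6x for x = 1..5). The function name is hypothetical; turning F into a p-value would additionally need the F distribution with (p, n - p - 1) degrees of freedom, which this sketch does not compute.

```python
def f_statistic(y, y_hat, p):
    """Overall F-statistic of a regression with p predictors plus an intercept.

    F = (SSR / p) / (SSE / (n - p - 1))
    """
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained, p degrees of freedom
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual, n - p - 1 degrees of freedom
    return (ssr / p) / (sse / (n - p - 1))


y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in [1, 2, 3, 4, 5]]  # OLS fit for this illustrative data
print(round(f_statistic(y, y_hat, p=1), 3))  # 4.5
```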