General Flashcards
What is Pointwise Mutual Information?
It is a measure of the discrepancy between the joint probability of two outcomes x and y of random variables X and Y and the product of their marginal probabilities, i.e. the joint probability they would have if X and Y were independent.
PMI(x, y) = log[ P(x, y) / (P(x) P(y)) ]
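A minimal sketch in Python of estimating PMI from data; the document-level co-occurrence counts (n_docs, n_x, n_y, n_xy) are made up for illustration:

```python
# Minimal sketch: PMI of two word events estimated from toy co-occurrence
# counts (all counts below are hypothetical).
import math

n_docs = 1000  # total documents (made up)
n_x = 120      # documents containing word x
n_y = 200      # documents containing word y
n_xy = 60      # documents containing both x and y

p_x = n_x / n_docs
p_y = n_y / n_docs
p_xy = n_xy / n_docs

# PMI(x, y) = log[ P(x, y) / (P(x) P(y)) ]
pmi = math.log(p_xy / (p_x * p_y))
print(f"PMI(x, y) = {pmi:.3f}")  # > 0: x and y co-occur more often than chance
```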
Sensitivity
TPR = TP / (TP + FN)
- recall, probability of detection, true positive rate
- avoiding false negatives
Specificity
TNR = TN / (TN + FP)
- true negative rate
- avoiding false positives
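A minimal sketch computing both sensitivity and specificity from raw confusion-matrix counts (the counts are made up for illustration):

```python
# Minimal sketch: sensitivity (TPR) and specificity (TNR) from
# confusion-matrix counts (counts below are made up).
tp, fn, tn, fp = 80, 20, 90, 10

sensitivity = tp / (tp + fn)  # TPR: share of actual positives detected
specificity = tn / (tn + fp)  # TNR: share of actual negatives rejected

print(f"sensitivity (TPR) = {sensitivity:.2f}")  # 0.80
print(f"specificity (TNR) = {specificity:.2f}")  # 0.90
```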
ROC
- curve obtained by plotting TPR (y-axis) against FPR = 1 - TNR (x-axis) as the classifier's decision threshold varies
- 1 - TNR = FPR, the false positive rate (probability of a false alarm)
- TPR, the true positive rate (probability of detection)
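A minimal sketch of an ROC curve with scikit-learn's roc_curve; the labels and scores below are made up for illustration:

```python
# Minimal sketch: plot TPR against FPR = 1 - TNR as the decision
# threshold sweeps over the scores (toy labels/scores).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3, 0.9, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance diagonal
plt.xlabel("FPR = 1 - TNR (probability of false alarm)")
plt.ylabel("TPR (probability of detection)")
plt.show()
```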
Skewness
- 3rd standardized moment; indicates that the distribution is asymmetric about the mean
- ‘positive’ - the tail is longer (or fatter) on the ‘positive’ side of the x-axis
Kurtosis
- 4th standardized moment; often described as “peakedness”, but it really measures tailedness: “higher kurtosis is the result of infrequent extreme deviations (or outliers), as opposed to frequent modestly sized deviations”
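A minimal sketch comparing skewness and kurtosis of a symmetric versus a right-skewed synthetic sample with scipy.stats; note scipy's kurtosis() reports excess kurtosis (normal ≈ 0) by default:

```python
# Minimal sketch: sample skewness and (excess) kurtosis via scipy.stats.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
sym = rng.normal(size=10_000)         # symmetric: skewness ~ 0
right = rng.exponential(size=10_000)  # right tail: positive skewness

print(f"normal:      skew={skew(sym):+.2f}  excess kurtosis={kurtosis(sym):+.2f}")
print(f"exponential: skew={skew(right):+.2f}  excess kurtosis={kurtosis(right):+.2f}")
```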
‘skewed to normal’ transformation
rule of thumb: if skewness falls outside the range −0.8 to 0.8, or kurtosis outside the range −3.0 to 3.0, use a log or Box-Cox transformation to bring the distribution closer to normal (symmetric)
- make sure to transform the results back, e.g. using exp() in case of a log transformation
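A minimal sketch of both transformations on a synthetic log-normal sample, including the back-transform via exp():

```python
# Minimal sketch: log and Box-Cox transforms of a right-skewed sample.
import numpy as np
from scipy.stats import skew, boxcox

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # skewed, positive

x_log = np.log(x)      # log transform (requires x > 0)
x_bc, lam = boxcox(x)  # Box-Cox picks lambda by maximum likelihood

print(f"skewness: raw={skew(x):+.2f}  log={skew(x_log):+.2f}  box-cox={skew(x_bc):+.2f}")

# Back-transform from log space to the original scale:
print(np.exp(x_log[:3]), x[:3])  # identical up to floating point
```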
multicollinearity
- a situation in which 2 or more explanatory variables in a multiple regression are highly linearly correlated (nearly linearly dependent)
- this undermines the assumption that the columns of the design matrix X are linearly independent, making X^T X nearly singular and its inverse in the OLS solution numerically unstable
detect multicollinearity
- perturbing the data: multicollinearity can be detected by adding random noise to the data, re-running the regression many times, and seeing how much the coefficients change (see Wikipedia); a sketch follows below
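A minimal sketch of this perturbation check on a synthetic, nearly collinear design (all data below is made up):

```python
# Minimal sketch: fit OLS on many noise-perturbed copies of a nearly
# collinear design and watch the coefficients swing.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # x2 is almost a copy of x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

coefs = []
for _ in range(100):
    X = np.column_stack([x1, x2]) + 0.01 * rng.normal(size=(n, 2))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(beta)

# A large spread across reruns of a tiny perturbation signals multicollinearity.
print(np.std(coefs, axis=0))
```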
transform a random var to a uniform distribution
Using the probability integral transform: if X is any random variable and F is the cumulative distribution function of X, then as long as F is continuous (in particular, strictly increasing and hence invertible), the random variable U = F(X) follows a uniform distribution on the unit interval [0, 1].
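A minimal sketch using an exponential sample: pushing it through its own CDF should yield values consistent with Uniform[0, 1]:

```python
# Minimal sketch of the probability integral transform: U = F(X).
import numpy as np
from scipy.stats import expon, kstest

rng = np.random.default_rng(3)
x = expon.rvs(size=10_000, random_state=rng)

u = expon.cdf(x)             # push samples through their own CDF
print(kstest(u, "uniform"))  # large p-value: consistent with uniform
```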
coefficient of determination, R-squared
- indicates the proportion of the variance of the dependent variable that can be explained by the independent variables, i.e. how much variation in the data the model accounts for
- 1: fully explained, 0: not explained at all
- in linear regression fitted with an intercept, it equals the square of the Pearson sample correlation coefficient between the observed and fitted values
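A minimal sketch on synthetic data, checking that R² = 1 - SS_res / SS_tot from an OLS fit with intercept matches the squared Pearson correlation:

```python
# Minimal sketch: R^2 for a simple linear fit with intercept,
# compared against the squared Pearson correlation.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 2.0 * x + 1.0 + rng.normal(size=500)

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]
print(f"R^2 = {r2:.4f}, r^2 = {r**2:.4f}")  # equal for OLS with intercept
```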