General Flashcards

1
Q

What is Pointwise Mutual Information?

A

A measure of the discrepancy between the joint probability of two RVs, X and Y, and the product of their marginal probabilities, i.e. what the joint probability would be if X and Y were independent.
PMI(X, Y) = log[ P(X, Y) / (P(X) P(Y)) ]
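A minimal Python sketch of the formula, with made-up co-occurrence counts for two words:

```python
import math

# Hypothetical co-occurrence counts (all numbers made up for illustration)
n_total = 10_000   # total observed pairs
n_x = 500          # occurrences of x
n_y = 400          # occurrences of y
n_xy = 100         # co-occurrences of x and y

p_x, p_y, p_xy = n_x / n_total, n_y / n_total, n_xy / n_total

# PMI(X, Y) = log[ P(X, Y) / (P(X) P(Y)) ]
pmi = math.log(p_xy / (p_x * p_y))
# pmi > 0: x and y co-occur more often than independence would predict
```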

2
Q

Sensitivity

A

TPR = TP / (TP + FN)

  • recall, probability of detection, true positive rate
  • avoiding false negatives
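A minimal Python sketch computing sensitivity from made-up labels and predictions:

```python
# Made-up ground truth and classifier output
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
sensitivity = tp / (tp + fn)   # TPR = TP / (TP + FN)
```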
3
Q

Specificity

A

TNR = TN / (TN + FP)

  • true negative rate
  • avoiding false positives
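A minimal Python sketch computing specificity from made-up labels and predictions:

```python
# Made-up ground truth and classifier output
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
specificity = tn / (tn + fp)   # TNR = TN / (TN + FP)
```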
4
Q

ROC

A
  • plot of TPR (dependent variable, y-axis) against 1 - TNR (independent variable, x-axis), traced out as the classification threshold varies
    1 - TNR = FPR, the false positive rate (probability of a false alarm)
    TPR - the true positive rate (probability of detection)
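A minimal Python sketch tracing out the ROC points by sweeping a decision threshold over classifier scores; the scores and labels are made up:

```python
# Made-up classifier scores and true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

P = sum(labels)        # number of positives
N = len(labels) - P    # number of negatives

roc = []  # (FPR, TPR) points, one per threshold
for thr in sorted(set(scores), reverse=True):
    tp = sum(l == 1 and s >= thr for s, l in zip(scores, labels))
    fp = sum(l == 0 and s >= thr for s, l in zip(scores, labels))
    roc.append((fp / N, tp / P))   # x = FPR = 1 - TNR, y = TPR
```

Lowering the threshold can only add detections and false alarms, so the curve moves monotonically from (0, 0)-ish toward (1, 1).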
5
Q

Skewness

A
  • 3rd standardized moment; indicates that the distribution is asymmetric about the mean
  • ‘positive’ - the tail is longer (or fatter) on the ‘positive’ side of the x-axis
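A minimal Python sketch of the sample skewness (3rd standardized moment) on made-up data with a long positive tail:

```python
# Made-up right-skewed sample
data = [1, 1, 2, 2, 2, 3, 3, 4, 7, 15]

n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / n
std = var ** 0.5

# 3rd standardized moment
skewness = sum(((x - mean) / std) ** 3 for x in data) / n
# skewness > 0: the tail is longer on the positive side of the mean
```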
6
Q

Kurtosis

A
  • 4th standardized moment that indicates the tailedness (often loosely described as peakedness) of the distribution - meaning “higher kurtosis is the result of infrequent extreme deviations (or outliers), as opposed to frequent modestly sized deviations”
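A minimal Python sketch of excess kurtosis (4th standardized moment minus 3) on made-up data whose spread comes from two rare extreme deviations:

```python
# Made-up sample: mostly zeros plus two rare extreme outliers
data = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n

# Excess kurtosis: 4th standardized moment minus 3 (the normal's value)
excess_kurtosis = m4 / var ** 2 - 3
# > 0: heavier tails / more outlier-driven than a normal distribution
```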
7
Q

‘skewed to normal’ transformation

A

rule of thumb: aim for skewness within the range −0.8 to 0.8 and (excess) kurtosis within the range −3.0 to 3.0; when the data fall outside these ranges, apply a log or Box-Cox transformation to bring the distribution closer to normal (symmetric)
- make sure to transform results back, i.e. using exp() in case of a log transformation
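A minimal Python sketch of a log transform taming a positive tail, and the inverse exp() transform back; the values are made up:

```python
import math

# Made-up positive-valued data with a long right tail
skewed = [1.0, 2.0, 3.0, 5.0, 50.0, 400.0]

transformed = [math.log(x) for x in skewed]      # compresses the tail
recovered = [math.exp(t) for t in transformed]   # transform back with exp()
```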

8
Q

multicollinearity

A
  • a situation in which 2 or more explanatory variables in a multiple regression are highly linearly correlated
  • this violates the independence assumption for the columns of the input matrix X, making XᵀX nearly singular and its inverse in OLS numerically unstable
9
Q

detect multicollinearity

A

Perturbing the data: multicollinearity can be detected by adding random noise to the data, re-running the regression many times, and seeing how much the coefficients change (see Wikipedia)
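A sketch of this perturbation check in numpy, with made-up data: one design has a column that is nearly a copy of x1, the other an independent control, and the coefficient spread across noise-perturbed re-fits is compared:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_collinear = x1 + 0.01 * rng.normal(size=n)   # almost identical to x1
x2_indep = rng.normal(size=n)                   # independent control
y = 3 * x1 + rng.normal(size=n)

def coef_spread(x2):
    """Max std of OLS coefficients across noise-perturbed re-fits."""
    betas = []
    for _ in range(200):
        # add small random noise to the design and re-fit
        X = np.column_stack([x1, x2]) + 0.01 * rng.normal(size=(n, 2))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        betas.append(beta)
    return np.std(betas, axis=0).max()

unstable = coef_spread(x2_collinear)   # coefficients swing wildly
stable = coef_spread(x2_indep)         # coefficients barely move
```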

10
Q

transform a random variable to a uniform distribution

A

Using the probability integral transform, if X is any random variable, and F is the cumulative distribution function of X, then as long as F is invertible, the random variable U = F(X) follows a uniform distribution on the unit interval [0,1].
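A minimal Python sketch of the probability integral transform: X ~ Exponential(1) has the invertible CDF F(x) = 1 - exp(-x), so U = F(X) should follow Uniform(0, 1):

```python
import math
import random

random.seed(0)
# Sample X ~ Exponential(rate=1)
xs = [random.expovariate(1.0) for _ in range(100_000)]
# Apply the CDF: U = F(X) = 1 - exp(-X)
us = [1.0 - math.exp(-x) for x in xs]

mean_u = sum(us) / len(us)                                # ≈ 0.5
frac_below_quarter = sum(u < 0.25 for u in us) / len(us)  # ≈ 0.25
```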

11
Q

coefficient of determination, R-squared

A
  • indicates how much variation in the data can be explained by the model
  • indicates the proportion of the variance of dependent variable that can be explained by the independent variables
  • 1: fully explained, 0: not at all explained
  • e.g. in linear regression: the square of the Pearson sample correlation coefficient (when the model also includes an intercept)
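A minimal Python sketch on made-up data, checking that for simple linear regression with an intercept, R² equals the squared Pearson correlation:

```python
# Made-up nearly-linear data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
syy = sum((y - my) ** 2 for y in ys)

# OLS fit with intercept
slope = sxy / sxx
intercept = my - slope * mx
preds = [intercept + slope * x for x in xs]

# R² = 1 - SS_res / SS_tot: proportion of variance explained
ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
r_squared = 1 - ss_res / syy

pearson_r = sxy / (sxx * syy) ** 0.5   # sample correlation coefficient
```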