Statistics Flashcards
Accuracy
(TP + TN) / (TP + TN + FP + FN)
Number of correct predictions /
Number of all predictions
Good general measure of model performance on BALANCED data sets
Why is accuracy alone not enough to evaluate classification models?
Consider benign versus malignant tumors. A typical set of random people would include more than 90% benign (0) because benign tumors are much more common than malignant ones. If a model predicts 0 for every example without making any calculation, its accuracy is more than 90%, which is useless. We need other measures, called precision and recall
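A minimal sketch of that point, assuming a toy set of 100 labels (95 benign, 5 malignant); the numbers are illustrative only:

```python
# Accuracy of a "predict everything benign" model on an imbalanced data set.
y_true = [0] * 95 + [1] * 5      # 95 benign (0), 5 malignant (1)
y_pred = [0] * 100               # the model predicts benign for everyone

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 -- looks great, yet the model catches zero malignant cases
```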
Precision (PPV)
TP / (TP + FP)
Correct positives /
Positive tests
From all PREDICTED positives, how many are ACTUAL positives?
Focus on precision when you want to be confident in the YES's the model gives you, i.e. that what your model pings is the real deal. It will miss some YES's, but what it does ping as YES you can be confident in.
Example: applicant screening. Some viable applicants will slip through, but when the model pings a viable applicant, you can be confident about it
Recall (Sensitivity/TPR)
TP / (TP + FN)
Correct positives /
Actual positives
From all ACTUAL positives, how many did we PREDICT correctly?
Increasing precision _______ recall
Decreases
F1 score
2 * (precision * recall) /
(precision + recall)
The harmonic mean of precision and recall, combining the two numbers into a single score
Use when working with IMBALANCED data sets
Example: classifying tweets by sentiment (positive, negative, neutral) with an imbalanced data set containing far more neutral tweets. The F1 score describes overall model performance (caring equally about all three classes)
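A quick sketch of the three formulas above, using made-up confusion-matrix counts:

```python
# Precision, recall, and F1 computed directly from TP/FP/FN counts.
TP, FP, FN = 30, 10, 20                             # hypothetical counts

precision = TP / (TP + FP)                          # 0.75
recall = TP / (TP + FN)                             # 0.60
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean ~= 0.667
print(precision, recall, round(f1, 3))
```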
Sensitivity (Recall / TPR)
TP / (TP + FN)
1 - FNR
Correct positives /
Actual positives
How good is the model at catching YES’s?
A sensitive test helps rule out a disease when the test is negative.
Highly SeNsitive = SNout = rule out
Use sens/spec when every instance of what you're looking for is too precious to let slip by (illnesses, fraud, terrorist attacks). A sensitivity-focused model will catch ALL REAL terrorist attacks, ALL TRUE cases of heart disease, etc.
CAVEAT: there will be some false positives: innocent travelers identified as terrorists, some healthy people labeled as diseased
Specificity (TNR)
TN / (TN + FP)
1 - FPR
Correct negatives /
Actual negatives
How good is the model at catching NO’s?
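A sketch of sensitivity and specificity side by side, with hypothetical screening counts:

```python
# Sensitivity: how many real YES's are caught; specificity: how many real NO's.
TP, FN = 90, 10      # 100 people who truly have the disease
TN, FP = 800, 200    # 1000 people who truly do not

sensitivity = TP / (TP + FN)   # 0.90
specificity = TN / (TN + FP)   # 0.80
print(sensitivity, specificity)
```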
Prevalence
The proportion of a defined population that has the condition at a single point in time. Expressed as a decimal or percentage
Positive predictive value (PPV) (Precision)
TP / (TP + FP)
Actual positive /
Tested positive
The probability that, following a positive test result, the individual TRULY has the disease. Also thought of as the clinical relevance of a test.
Related to prevalence, whereas sensitivity and specificity are independent of prevalence.
As prevalence decreases, PPV decreases because there will be more false positives for every true positive
These enable you to rule in/out conditions but not definitively diagnose a condition
Negative predictive value (NPV)
TN / (TN + FN)
Actual Negative /
Tested Negative
The probability that, following a NEGATIVE test result, the individual TRULY does NOT have the disease. Also thought of as the clinical relevance of a test.
Related to prevalence, whereas sensitivity and specificity are independent of prevalence.
As prevalence decreases, NPV increases because there will be more true negatives for every false negative
These enable you to rule in/out conditions but not definitively diagnose a condition
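A sketch of the prevalence dependence described above, using Bayes' rule with hypothetical sensitivity and specificity of 0.90 each:

```python
# PPV and NPV as functions of prevalence, with sensitivity/specificity fixed.
sens, spec = 0.90, 0.90

def ppv(prev):
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(prev):
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

for prev in (0.50, 0.10, 0.01):
    print(f"prevalence={prev:.2f}  PPV={ppv(prev):.3f}  NPV={npv(prev):.3f}")
# As the condition gets rarer, PPV falls (0.900 -> 0.500 -> 0.083)
# while NPV climbs toward 1.
```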
Type I error
False Positive
REJECTING the NULL when it is TRUE
Saying there is an effect when there is none
Alpha level
(significance level)
Probability of REJECTING the NULL when it is TRUE (type I error)
Beta level
Probability that you’ll fail to reject the null when it’s false (type II error)
i.e. ACCEPT the NULL when it’s FALSE
Type II error
False Negative
ACCEPTING the NULL when it’s FALSE
Saying there is NO effect when there is one
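A sketch (assuming NumPy and SciPy) that simulates t-tests where the null is true, showing the Type I error rate landing near the alpha level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, false_positives = 0.05, 5000, 0

for _ in range(n_sims):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)      # same mean: the null is true, no real effect
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1      # rejected a true null = Type I error

print(false_positives / n_sims)   # close to 0.05
```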
AUC ROC Curve
Tells how much the model is capable of distinguishing between classes.
X axis is FPR (1 - specificity)
Y axis is TPR/Sens
AUC - area under the curve. The higher the value, the better the model is at predicting TPs and TNs, i.e. the better it is at distinguishing between patients with the disease and patients without the disease
ROC - receiver operating characteristic: a curve of TPR against FPR across classification thresholds
FPR
1 - Specificity
FP / (FP + TN)
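A sketch of computing the ROC curve and AUC, assuming scikit-learn is available; the labels and scores are a tiny made-up example:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)     # x = FPR, y = TPR
print(roc_auc_score(y_true, y_score))   # 0.8125 here; 1.0 = perfect, 0.5 = random
```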
Bessel’s correction
The use of n − 1 instead of n in the formulas for the sample variance and sample standard deviation, where n is the number of observations in a sample.
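A quick sketch of Bessel's correction in NumPy: ddof=0 divides by n, ddof=1 divides by n − 1.

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(np.var(sample, ddof=0))   # divides by n   -> 4.0 (biased downward)
print(np.var(sample, ddof=1))   # divides by n-1 -> ~4.571 (Bessel-corrected)
```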
Bias (or bias function) of an estimator
- The difference between the estimator's expected value and the true value of the parameter being estimated.
- An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator.
- An unbiased estimator is generally preferable to a biased estimator, although in practice biased estimators (with generally small bias) are frequently used. When a biased estimator is used, bounds on the bias are calculated. A biased estimator may be used for various reasons:
- because an unbiased estimator does not exist without further assumptions about a population;
- because an estimator is difficult to compute (as in unbiased estimation of standard deviation);
- because an estimator is median-unbiased but not mean-unbiased (or the reverse);
- because a biased estimator gives a lower value of some loss function (particularly mean squared error) compared with unbiased estimators (notably in shrinkage estimators); or
- because in some cases being unbiased is too strong a condition, and the only unbiased estimators are not useful.
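A sketch that estimates the bias of the divide-by-n variance estimator by Monte Carlo (true variance is 1, so the bias should come out near −σ²/n = −0.2 for n = 5):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 100_000

# Bias = E[estimator] - true value; here the true variance is 1.
estimates = [np.var(rng.normal(0, 1, n), ddof=0) for _ in range(trials)]
print(np.mean(estimates) - 1.0)   # roughly -0.2
```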
Nominal Data
data that is used for naming or labelling variables, without any quantitative value.
“named” data
no intrinsic ordering to nominal data
Examples: country, gender, race, hair color
Analysis is done by grouping input variables into categories and calculating the percentage or mode of the distribution.
Analyzed with non-parametric tests
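A sketch of the nominal-data summaries mentioned above (counts, percentages, mode); the hair-color values are made up:

```python
from collections import Counter

hair_color = ["brown", "black", "brown", "blond", "red", "brown", "black"]
counts = Counter(hair_color)

total = len(hair_color)
for category, count in counts.most_common():
    print(f"{category}: {count / total:.0%}")     # percentage per category
print("mode:", counts.most_common(1)[0][0])       # "brown"
```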
Ordinal Data
type of categorical data with an order. The variables in ordinal data are listed in an ordered manner.
The ordinal variables are usually numbered, so as to indicate the order of the list. However,
the numbers are not mathematically measured or determined but are merely assigned as labels for opinions.
Example: Good, Neutral, Bad
Analyzed by computing the mode, median, and other positional measures like quartiles, percentiles, etc.
Usually analyzed with non-parametric tests. Although discouraged, ordinal data is sometimes analyzed using parametric statistics.
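A sketch of those ordinal-data summaries: code the ordered labels as ranks, then use positional measures (median, quartiles) rather than a mean:

```python
import numpy as np

order = {"Bad": 1, "Neutral": 2, "Good": 3}       # labels -> rank codes
ratings = ["Good", "Neutral", "Good", "Bad", "Good", "Neutral", "Good"]
codes = np.array([order[r] for r in ratings])

print(np.median(codes))                 # 3.0 -> the median rating is "Good"
print(np.percentile(codes, [25, 75]))   # quartiles on the rank scale
```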
When to use Parametric Tests
- Interval or Ratio
- Normally distributed
- No outliers
- Equal variances
- Large samples (>30)
When to use non-parametric tests
- Nominal or ordinal data
- Not normally distributed
- Outliers present
- Unequal variances
- Small samples
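A sketch of how that choice might look in code when comparing two groups, assuming SciPy: roughly-normal data gets a t-test, otherwise a Mann-Whitney U test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(10, 2, 40)
group_b = rng.normal(11, 2, 40)

# Shapiro-Wilk as a rough normality check on each group
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group_a, group_b))

if normal:
    result = stats.ttest_ind(group_a, group_b)      # parametric
else:
    result = stats.mannwhitneyu(group_a, group_b)   # non-parametric
print(result)
```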
Sufficient Statistic
“no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter”
Given a set of independent, identically distributed data conditioned on an unknown parameter θ, a sufficient statistic is a function of the data whose value contains all the information needed to compute any estimate of the parameter (e.g. a maximum likelihood estimate)
A statistic t = T(X) is sufficient for underlying parameter θ precisely if the conditional probability distribution of the data X, given the statistic t = T(X), does not depend on the parameter θ.
For example, the sample mean is sufficient for the mean (μ) of a normal distribution with known variance: once the sample mean is known, no further information about μ can be obtained from the sample itself. On the other hand, for an arbitrary distribution the median is not sufficient for the mean: even if the median of the sample is known, knowing the sample itself would provide further information about the population mean. For example, if the observations that are less than the median are only slightly less, but observations exceeding the median exceed it by a large amount, then this would have a bearing on one's inference about the population mean.
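A short standard derivation (not from the cards) of the sample-mean example: for N(μ, σ²) with σ² known, the likelihood factors into a piece that involves the data only through x̄ and a piece free of μ (Fisher-Neyman factorization), so x̄ is sufficient for μ.

```latex
L(\mu; x_1,\dots,x_n)
  = (2\pi\sigma^2)^{-n/2}\exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2\Big)
  = \underbrace{(2\pi\sigma^2)^{-n/2}\exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\bar{x})^2\Big)}_{h(x):\ \text{no }\mu}
    \cdot
    \underbrace{\exp\!\Big(-\tfrac{n}{2\sigma^2}(\bar{x}-\mu)^2\Big)}_{g(\bar{x},\,\mu)}
```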