Mathematics Flashcards
Canberra distance
a numerical measure of the distance between pairs of points in a vector space
d(P, Q) = Sum(|Pi - Qi| / (|Pi| + |Qi|))
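A minimal Python sketch of the formula above, assuming P and Q are equal-length lists of numbers. The name canberra_distance is illustrative. A term where both Pi and Qi are 0 would be 0/0, so this sketch follows the common convention of letting such terms contribute 0.

```python
def canberra_distance(p, q):
    """Sum(|Pi - Qi| / (|Pi| + |Qi|)) over all coordinates of p and q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        denom = abs(pi) + abs(qi)
        # Assumed convention: a 0/0 term (both coordinates are 0) contributes 0.
        if denom != 0:
            total += abs(pi - qi) / denom
    return total

# Terms: 1/3 + 0/4 + 2/8
print(canberra_distance([1, 2, 3], [2, 2, 5]))
```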
Euclidean distance
d(P, Q) = sqrt(Sum((Pi - Qi) ** 2))
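A minimal Python sketch, assuming P and Q are equal-length lists of numbers; euclidean_distance is an illustrative name. The standard library's math.dist (Python 3.8+) computes the same quantity.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance: sqrt(Sum((Pi - Qi) ** 2))."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```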
Manhattan distance
d(P, Q) = Sum(|Pi - Qi|)
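A minimal Python sketch under the same assumptions (equal-length numeric lists; manhattan_distance is an illustrative name). For the points (0, 0) and (3, 4) it gives 7, the length of a path that moves only along the axes, whereas the Euclidean distance between the same points is 5.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences: Sum(|Pi - Qi|)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan_distance([0, 0], [3, 4]))  # 7
```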
Empirical distribution function
The empirical distribution function is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations that are less than or equal to that value.
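A minimal Python sketch, assuming a non-empty list of numbers; make_ecdf is an illustrative name. It sorts the sample once and uses bisect.bisect_right to count the observations less than or equal to t. Tied observations stack: in the example, the two 2s produce a single jump of 2/4.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return F_n, where F_n(t) = (number of observations <= t) / n."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(t):
        # bisect_right gives the count of sorted observations that are <= t.
        return bisect_right(data, t) / n

    return ecdf

F = make_ecdf([3, 1, 2, 2])
print(F(0), F(1), F(2), F(2.5), F(3))  # 0.0 0.25 0.75 0.75 1.0
```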
Feature scaling
a method used to standardize the range of independent variables (features)
Rescaling (min-max normalization)
(x - min(x)) / (max(x) - min(x))
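A minimal Python sketch, assuming a list of numbers for a single feature; rescale is an illustrative name. The range of x is max(x) - min(x), so the results lie in [0, 1]. A constant feature has a zero range and is rejected rather than divided by zero.

```python
def rescale(xs):
    """Min-max rescaling of one feature to [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    if span == 0:
        raise ValueError("all values are equal; range is zero")
    return [(x - lo) / span for x in xs]

print(rescale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```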
Mean normalization
(x - mean(x)) / (max(x) - min(x))
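A minimal Python sketch under the same assumptions (one feature as a list of numbers; mean_normalize is an illustrative name). The output is centered on 0, and because the mean lies between min(x) and max(x), every result falls within [-1, 1].

```python
def mean_normalize(xs):
    """(x - mean(x)) / (max(x) - min(x)) for one feature."""
    mu = sum(xs) / len(xs)
    span = max(xs) - min(xs)
    if span == 0:
        raise ValueError("all values are equal; range is zero")
    return [(x - mu) / span for x in xs]

print(mean_normalize([10, 20, 30, 50]))  # [-0.4375, -0.1875, 0.0625, 0.5625]
```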
Standardization
(x - mean(x)) / standard_deviation(x)
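A minimal Python sketch, assuming one feature as a list of numbers; standardize is an illustrative name. It uses the population standard deviation (statistics.pstdev); the sample standard deviation (statistics.stdev) is also commonly used and gives slightly different values. The output has mean 0 and population standard deviation 1.

```python
import statistics

def standardize(xs):
    """Z-scores: (x - mean(x)) / standard_deviation(x) for one feature."""
    mu = statistics.mean(xs)
    # Assumption: population standard deviation (divides by n, not n - 1).
    sigma = statistics.pstdev(xs)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(x - mu) / sigma for x in xs]

print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```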
Scaling to unit length
x / euclidean_length(x)
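A minimal Python sketch, assuming x is one vector (for example, one sample's feature values) given as a list of numbers; to_unit_length is an illustrative name. Unlike rescaling, mean normalization, and standardization, which use statistics of a whole feature, this divides a single vector by its own Euclidean length, so the result has length 1.

```python
import math

def to_unit_length(x):
    """x / euclidean_length(x); the result points the same way with length 1."""
    norm = math.sqrt(sum(xi ** 2 for xi in x))
    if norm == 0:
        raise ValueError("the zero vector cannot be scaled to unit length")
    return [xi / norm for xi in x]

print(to_unit_length([3, 4]))  # [0.6, 0.8]
```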
Euclidean length
Also called the magnitude (or norm) of a vector; it measures the length of the vector.
||x|| = sqrt(x1 ** 2 + x2 ** 2 + ... + xn ** 2)
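A minimal Python sketch of the formula, with euclidean_length as an illustrative name. The length of x equals its Euclidean distance from the origin, and the standard library's math.hypot accepts any number of coordinates from Python 3.8 onward.

```python
import math

def euclidean_length(x):
    """sqrt(x1 ** 2 + x2 ** 2 + ... + xn ** 2)."""
    return math.sqrt(sum(xi ** 2 for xi in x))

v = [1, 2, 2]
print(euclidean_length(v))  # 3.0
print(math.hypot(*v))       # same quantity from the standard library
```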
Binomial distribution
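Models the number of successes in n independent trials, each with only two possible outcomes (true or false, i.e. success or failure) and the same success probability p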
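A minimal Python sketch of the binomial probability mass function, P(X = k) = C(n, k) * p ** k * (1 - p) ** (n - k), which gives the probability of exactly k successes in n trials. binomial_pmf is an illustrative name, and math.comb requires Python 3.8+.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    # C(n, k) orderings, each with probability p^k * (1 - p)^(n - k).
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 2 "true" outcomes in 4 fair trials
print(binomial_pmf(2, 4, 0.5))  # 0.375
```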
Poisson process
usually used in scenarios where we are counting the occurrences of certain events that appear to happen at a certain rate, but completely at random
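The Poisson distribution (the count of events in a Poisson process) can be derived from the binomial distribution by letting the number of trials n go to infinity while the expected number of events n * p stays fixed at the rate.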
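A minimal Python sketch of that limit, assuming a rate lam = n * p held fixed. It compares the binomial probability of k events with the Poisson probability lam ** k * exp(-lam) / k! as n grows; both helper names are illustrative. The gap between the two columns shrinks as n increases.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam ** k * exp(-lam) / factorial(k)

lam, k = 3.0, 2
for n in (10, 100, 10_000):
    # Keep the expected count n * p equal to lam while the number of trials grows.
    print(n, binomial_pmf(k, n, lam / n), poisson_pmf(k, lam))
```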
Entropy
A measure of disorder in the data
E = -Sum(Pi * log2(Pi))
Pi: the probability (relative frequency) of each category in the data
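A minimal Python sketch, assuming the data is a list of category labels and each Pi is estimated by that category's relative frequency; entropy is an illustrative name. Two equally frequent categories give 1 bit, four give 2 bits, and a single category gives 0.

```python
import math
from collections import Counter

def entropy(labels):
    """E = -Sum(Pi * log2(Pi)), where Pi is the fraction of labels in category i."""
    n = len(labels)
    # Only categories that actually occur are counted, so Pi > 0 and log2 is defined.
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))    # 1.0
print(entropy(["a", "b", "c", "d"]))          # 2.0
```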
Information gain
A measure of the reduction in the data's disorder (entropy) as a result of partitioning it
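A minimal Python sketch using the usual definition: the parent's entropy minus the entropy of the partitions weighted by their share of the rows, IG = E(parent) - Sum((nj / n) * E(child_j)). It assumes the children together contain exactly the parent's labels; information_gain and entropy are illustrative names.

```python
import math
from collections import Counter

def entropy(labels):
    """E = -Sum(Pi * log2(Pi)) over the categories present in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
# A perfect split removes all disorder, so the gain equals the parent's entropy.
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
# A split whose children are as mixed as the parent gains nothing.
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```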