Chapter 10 Probability Density Estimation Flashcards
We rarely know the true distribution, because we don't have access to all possible outcomes of a random variable. In fact, all we have access to is a sample of observations. So we need probability density estimation, or simply density estimation: using the observations in a random sample to estimate the general density of probabilities beyond just the sample of data we have available. What are 2 ways to estimate the density? P 92
1/The first step is to review the density of observations in the random sample with a simple histogram. From the histogram, we might be able to identify a common and well-understood probability distribution that can be used, such as a normal distribution. (Parametric)
2/If not, we may have to fit a model to estimate the distribution. (Non-parametric)
Reviewing a histogram of a data sample with a range of different numbers of bins will help to identify whether the density looks like a common probability distribution or not. True/False
True
What is parametric density estimation? P 96
Parametric density estimation assumes the data follows a known distribution and estimates that distribution's parameters from the sample. For example, the normal distribution has two parameters: the mean and the standard deviation. Given these two parameters, we know the probability distribution function. These parameters can be estimated from data by calculating the sample mean and sample standard deviation. We refer to this process as parametric density estimation.
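A minimal sketch of this process, assuming NumPy and SciPy are available (the sample's mean, scale, and size are made-up values for illustration):
# parametric density estimation: estimate the two parameters of a
# normal distribution from a data sample
from numpy.random import normal
from scipy.stats import norm
# an illustrative sample (assumed mean 50, standard deviation 5)
sample = normal(loc=50, scale=5, size=1000)
# estimate the parameters from the data
sample_mean = sample.mean()
sample_std = sample.std()
# the fitted distribution can now return a density for any value
dist = norm(sample_mean, sample_std)
print(dist.pdf(50))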
How can we check whether it's a good fit, after estimating the density from data (parametric density estimation)? 3 ways P 96
Plotting the density function and comparing the shape to the histogram.
Sampling the density function and comparing the generated sample to the real sample.
Using a statistical test to confirm the data fits the distribution (a sketch of this check follows the list).
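For the third check, a minimal sketch using D'Agostino's K^2 normality test from SciPy (the 0.05 significance threshold is a conventional choice, not from the book):
# statistical test of whether the sample fits a normal distribution
from numpy.random import normal
from scipy.stats import normaltest
sample = normal(loc=50, scale=5, size=1000)
# null hypothesis: the sample was drawn from a normal distribution
stat, p = normaltest(sample)
if p > 0.05:
    print('sample looks normal (fail to reject H0)')
else:
    print('sample does not look normal (reject H0)')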
It is possible that the data does match a common probability distribution, but requires a transformation before parametric density estimation. True/False
True. For example, you may have outlier values that are far from the mean or center of mass of the distribution. These may give incorrect estimates of the distribution parameters and, in turn, cause a poor fit to the data. Such outliers should be removed prior to estimating the distribution parameters.
Until when do we continue transforming the data before parametric density estimation? P 98 (3-step loop)
Loop until the fit of the distribution to the data is good enough (one pass is sketched below the list):
1. Estimating distribution parameters
2. Reviewing the resulting PDF against the data
3. Transforming the data to better fit the distribution
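A minimal sketch of one pass through this loop, assuming outliers are defined as values more than three standard deviations from the mean (that cutoff is an illustrative assumption, not from the book):
# one pass of the estimate/review/transform loop
from numpy import abs, hstack
from numpy.random import normal
# a sample contaminated with two artificial outliers
sample = hstack((normal(loc=50, scale=5, size=1000), [5.0, 95.0]))
# 1. estimate distribution parameters
mu, sigma = sample.mean(), sample.std()
# 2. review: outliers inflate the estimated standard deviation
print('before:', mu, sigma)
# 3. transform: remove values far from the center, then re-estimate
trimmed = sample[abs(sample - mu) < 3 * sigma]
print('after:', trimmed.mean(), trimmed.std())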
What is Non-parametric Density Estimation? P 99
In some cases, a data sample may not resemble a common probability distribution, or cannot easily be made to fit one. This is often the case when the data has two peaks (bimodal distribution) or many peaks (multimodal distribution). In this case, parametric density estimation is not feasible. Instead, an algorithm is used to approximate the probability distribution of the data without a pre-defined distribution; this is referred to as a non-parametric method.
The most common non-parametric approach for estimating the probability density function of a continuous random variable is called…, …, or, … for short. P 99
Kernel smoothing, kernel density estimation, KDE
What is Kernel Density Estimation? P 99
Non-parametric method for using a dataset to estimate probabilities for new points. A kernel is a mathematical function that returns a probability for a given value of a random variable.
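In symbols (standard textbook form, not from this page): given n observations x_i, a kernel function K, and bandwidth h, the kernel density estimate at a point x is
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right)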
What is the smoothing parameter aka, bandwidth, aka Parzen Window? P 99
Parameter that controls the number of samples or window of samples used to estimate the probability for a new point
What is basis function in Kernel Density Estimation? P 99
The contribution of samples within the window can be shaped using different functions, sometimes referred to as basis functions, e.g. uniform, normal, etc., with different effects on the smoothness of the resulting density function.
Basis Function (kernel): The function chosen to control the contribution of samples in the dataset toward estimating the probability of a new point.
Why do we experiment with different window sizes and different contribution functions for Kernel Density Estimation? P 99
Window size and basis function both affect the shape of the KDE; by experimenting with them, we can evaluate the results against histograms of the data to see which combination fits best (an example experiment is sketched below).
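A hedged sketch of such an experiment with scikit-learn's KernelDensity (the candidate bandwidths and kernels are arbitrary illustrative choices):
# compare several bandwidth/kernel combinations against the histogram
from numpy import asarray, exp
from numpy.random import normal
from matplotlib import pyplot
from sklearn.neighbors import KernelDensity
sample = normal(loc=20, scale=5, size=1000).reshape(-1, 1)
values = asarray(range(1, 40)).reshape(-1, 1)
pyplot.hist(sample, bins=50, density=True)
for bandwidth in [1, 2, 4]:
    for kernel in ['gaussian', 'tophat']:
        model = KernelDensity(bandwidth=bandwidth, kernel=kernel).fit(sample)
        pyplot.plot(values, exp(model.score_samples(values)),
                    label='bandwidth=%d, %s' % (bandwidth, kernel))
pyplot.legend()
pyplot.show()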
How can we create a bimodal distribution in python? P 100
We can construct a bimodal distribution by combining samples from two different normal distributions.
# example of a bimodal data sample
from matplotlib import pyplot
from numpy.random import normal
from numpy import hstack
# generate a sample
sample1 = normal(loc=20, scale=5, size=300)
sample2 = normal(loc=40, scale=5, size=700)
sample = hstack((sample1, sample2))
# plot the histogram
pyplot.hist(sample, bins=50)
pyplot.show()
Why are bimodal/multimodal data distributions, good cases for using kernel density estimation method? P 100
Data with this distribution does not nicely fit into a common probability distribution, by design. It is a good case for using a non-parametric kernel density estimation method.
The KernelDensity class supports estimating the PDF for multidimensional data. True/False P 103
True
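A minimal sketch of a two-dimensional fit (the 2D sample here is made up for illustration):
# kernel density estimation on two-dimensional data
from numpy import asarray, exp
from numpy.random import normal
from sklearn.neighbors import KernelDensity
# 500 observations, each with two features
sample = normal(loc=20, scale=5, size=(500, 2))
model = KernelDensity(bandwidth=2, kernel='gaussian')
model.fit(sample)
# estimated density at a single 2D point
point = asarray([[20.0, 20.0]])
print(exp(model.score_samples(point)))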
What are the steps to create a bimodal distribution, estimating a Kernel Density for it and checking its fit on data using python? P 101
1- Create two normal distributions and use hstack to merge them, producing a bimodal sample.
from matplotlib import pyplot
from numpy.random import normal
from numpy import hstack
# generate a sample
sample1 = normal(loc=20, scale=5, size=300)
sample2 = normal(loc=40, scale=5, size=700)
sample = hstack((sample1, sample2))
# plot the histogram
pyplot.hist(sample, bins=50)
pyplot.show()
2- The scikit-learn machine learning library provides the KernelDensity class, which implements kernel density estimation. We need to tweak the hyperparameters of this class (bandwidth and kernel) to find a good estimate, and we need to reshape the sample into a column vector because the fit method requires a 2D array.
from sklearn.neighbors import KernelDensity
# fit the kernel density model to the (column-vector) sample
model = KernelDensity(bandwidth=2, kernel='gaussian')
sample = sample.reshape((len(sample), 1))
model.fit(sample)
3- Now that we have the estimate, we need to check the fit. We use range(1, 60) to create discrete values from 1 to 59, because that spans the range of our original sample, and we plot the estimated PDF using the model obtained in the previous step.
🧨 Note: we need to convert it to a column vector
🧨 Note: returned values are log values, so we need to use exp() to get the actual values
🧨 Note: .score_samples returns the log of the estimated probability density at each point in X; exponentiating with exp() recovers the density itself. The density is normalized so that the area under the curve is 1, but the individual exp(probabilities) values are densities, not probabilities, and need not sum to 1.
from numpy import asarray, exp
# evaluate the estimated density over the range of the sample
values = asarray([value for value in range(1, 60)])
values = values.reshape((len(values), 1))
probabilities = model.score_samples(values)
probabilities = exp(probabilities)
4- Finally, we plot the KDE and histogram overlapping to visualize the fit.
pyplot.hist(sample, bins=50, density=True)
pyplot.plot(values, probabilities)
pyplot.show()
What is the output of a Probability Density Function and a Probability Mass Function (the PMF is the PDF's analogue for discrete data)? External
The output of a probability mass function is a probability, whereas the area under the curve produced by a probability density function represents a probability.
Probability Density Functions are a statistical measure used to gauge the likely outcome of a continuous random variable (e.g., the price of a stock or ETF). PDFs are plotted on a graph, typically resembling a bell curve, with the probability of the outcomes represented by areas below the curve.
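A short numeric sketch of the difference, using SciPy (the distributions are arbitrary examples):
# PMF outputs are probabilities; PDF outputs are densities
from scipy.stats import binom, norm
# PMF: the probabilities of all outcomes sum to 1
print(sum(binom.pmf(k, 10, 0.5) for k in range(11)))  # 1.0
# PDF: the value at a point is a density and can exceed 1;
# only the area under the curve is a probability
print(norm.pdf(0, loc=0, scale=0.1))  # about 3.99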
What’s the difference between Probability density function and Probability distribution function? External
The probability distribution function is the integral of the probability density function.
For discrete variables, the analogue of the probability density function is called the probability mass function.
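In symbols, with density f and distribution function (CDF) F:
F(x) = \int_{-\infty}^{x} f(t) \, dt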