Chapter 10 Probability Density Estimation Flashcards
We rarely know the true distribution of a random variable because we do not have access to all of its possible outcomes; all we have access to is a sample of observations. So we need probability density estimation, or simply density estimation: using the observations in a random sample to estimate the underlying probability density beyond just the sample of data we have available. What are 2 ways we can use to estimate the density? P 92
1/The first step is to review the density of observations in the random sample with a simple histogram. From the histogram, we might be able to identify a common and well-understood probability distribution that can be used, such as a normal distribution. (Parametric)
2/If not, we may have to fit a model to estimate the distribution. (Non-parametric)
Reviewing a histogram of a data sample with a range of different numbers of bins will help to identify whether the density looks like a common probability distribution or not. True/False
True
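A minimal sketch of this idea (the sample parameters and bin counts here are arbitrary choices): bin the same Gaussian sample at several resolutions and inspect how the counts change.

```python
# Sketch: the same sample binned at several resolutions
# (loc/scale/size and the bin counts are arbitrary choices).
from numpy import histogram
from numpy.random import normal, seed

seed(1)
sample = normal(loc=50, scale=5, size=1000)
for bins in (10, 25, 50):
    counts, edges = histogram(sample, bins=bins)
    print('bins=%d, max count=%d' % (bins, counts.max()))
```

With too few bins the shape is coarse; with too many, it becomes noisy. Plotting each with `pyplot.hist(sample, bins=bins)` gives the visual review described above.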
What is parametric density estimation? P 96
Estimating the parameters of a common, well-understood distribution directly from the data sample. For example, the normal distribution has two parameters: the mean and the standard deviation. Given these two parameters, we know the probability density function. These parameters can be estimated from data by calculating the sample mean and sample standard deviation. We refer to this process as parametric density estimation.
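This can be sketched in a few lines, assuming the sample really is Gaussian (the loc/scale values are arbitrary):

```python
# Sketch of parametric density estimation for a Gaussian sample
# (loc/scale/size are arbitrary illustration values).
from numpy import mean, std
from numpy.random import normal, seed
from scipy.stats import norm

seed(1)
sample = normal(loc=50, scale=5, size=1000)
mu = mean(sample)        # estimate parameter 1: the mean
sigma = std(sample)      # estimate parameter 2: the standard deviation
dist = norm(mu, sigma)   # the fitted (parametric) distribution
print('mu=%.2f sigma=%.2f' % (mu, sigma))
```

Once `dist` is defined, `dist.pdf(x)` gives the estimated density for any new value `x`.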
How can we check whether it is a good fit, after estimating the density from data (parametric density estimation)? 3 ways P 96
Plotting the density function and comparing the shape to the histogram.
Sampling the density function and comparing the generated sample to the real sample.
Using a statistical test to confirm the data fits the distribution.
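A minimal sketch of the first and third checks above, assuming a Gaussian sample; the Kolmogorov-Smirnov test is used here as one possible statistical test.

```python
# Sketch: checking the fit of a parametric density estimate
# (sample parameters are arbitrary; KS is one choice of test).
from numpy import mean, std, linspace
from numpy.random import normal, seed
from scipy.stats import norm, kstest

seed(1)
sample = normal(loc=50, scale=5, size=1000)
dist = norm(mean(sample), std(sample))
# check 1: evaluate the fitted PDF across the sample range
# (this curve can be plotted over a histogram of the sample)
values = linspace(sample.min(), sample.max(), 100)
pdf = dist.pdf(values)
# check 3: statistical test against the fitted CDF; a large p-value
# means no evidence that the data deviates from the fitted distribution
stat, p = kstest(sample, dist.cdf)
print('KS statistic=%.3f, p=%.3f' % (stat, p))
```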
It is possible that the data does match a common probability distribution, but requires a transformation before parametric density estimation. True/False
True. For example, you may have outlier values that are far from the mean or center of mass of the distribution. These can give incorrect estimates of the distribution parameters and, in turn, cause a poor fit to the data. Such outliers should be removed prior to estimating the distribution parameters.
Until when do we continue transforming the data before parametric density estimation? P 98 (3-step loop)
Loop Until Fit of Distribution to Data is Good Enough:
1. Estimating distribution parameters
2. Reviewing the resulting PDF against the data
3. Transforming the data to better fit the distribution
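One pass through the loop above can be sketched as follows, using trimming of extreme values as the transform (the injected outlier values and the 1st/99th percentile cutoffs are arbitrary choices for illustration):

```python
# Sketch of one loop iteration: estimate, review, transform (trim), re-estimate.
from numpy import mean, std, percentile, append
from numpy.random import normal, seed

seed(1)
sample = normal(loc=50, scale=5, size=1000)
sample = append(sample, [150.0, 160.0])   # a few far-off outliers (arbitrary)
# step 1: estimate distribution parameters
mu, sigma = mean(sample), std(sample)     # sigma is inflated by the outliers
# step 2: reviewing the resulting PDF against the data would reveal the poor fit
# step 3: transform the data - trim values outside the 1st-99th percentiles
lo, hi = percentile(sample, [1, 99])
trimmed = sample[(sample >= lo) & (sample <= hi)]
mu2, sigma2 = mean(trimmed), std(trimmed)
print('before: sigma=%.2f, after: sigma=%.2f' % (sigma, sigma2))
```

After the transform, the re-estimated standard deviation is much closer to the true value of 5; the loop would stop once a review of the PDF against the data looks good enough.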
What is Non-parametric Density Estimation? P 99
In some cases, a data sample may not resemble a common probability distribution or cannot be easily made to fit the distribution. This is often the case when the data has two peaks (bimodal distribution) or many peaks (multimodal distribution). In this case, parametric density estimation is not feasible. Instead, an algorithm is used to approximate the probability distribution of the data without a pre-defined distribution, referred to as a non-parametric method.
The most common non-parametric approach for estimating the probability density function of a continuous random variable is called…, …, or, … for short. P 99
Kernel smoothing, kernel density estimation, KDE
What is Kernel Density Estimation? P 99
Non-parametric method for using a dataset to estimate probabilities for new points. A kernel is a mathematical function that returns a probability for a given value of a random variable.
What is the smoothing parameter aka, bandwidth, aka Parzen Window? P 99
Parameter that controls the number of samples or window of samples used to estimate the probability for a new point
What is basis function in Kernel Density Estimation? P 99
The contribution of samples within the window can be shaped using different functions, sometimes referred to as basis functions, e.g. uniform, normal, etc., with different effects on the smoothness of the resulting density function.
Basis Function (kernel): The function chosen to control the contribution of samples in the dataset toward estimating the probability of a new point.
Why do we experiment with different window sizes and different contribution functions for Kernel Density Estimation? P 99
Window size and basis function both affect the shape of the resulting KDE. By trying different combinations and evaluating the results against histograms of the data, we can see which combination fits best.
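This experiment can be sketched with scikit-learn's KernelDensity, fit to a bimodal sample at several bandwidths (the bandwidth values here are arbitrary guesses to compare):

```python
# Sketch: the effect of bandwidth on a KDE of a bimodal sample
# (the bandwidth values tried are arbitrary choices).
from numpy import hstack, exp, linspace
from numpy.random import normal, seed
from sklearn.neighbors import KernelDensity

seed(1)
sample = hstack((normal(20, 5, 300), normal(40, 5, 700))).reshape(-1, 1)
values = linspace(0, 60, 121).reshape(-1, 1)
for bandwidth in (1.0, 2.0, 5.0):
    model = KernelDensity(bandwidth=bandwidth, kernel='gaussian')
    model.fit(sample)
    density = exp(model.score_samples(values))  # score_samples returns log density
    print('bandwidth=%.1f, peak density=%.4f' % (bandwidth, density.max()))
```

Plotting each `density` curve over a histogram of `sample` shows the trade-off: small bandwidths produce a spiky, overfit curve, large ones smooth away the two peaks.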
How can we create a bimodal distribution in python? P 100
we can construct a bimodal distribution by combining samples from two different normal distributions.
# example of a bimodal data sample
from matplotlib import pyplot
from numpy.random import normal
from numpy import hstack
# generate a sample
sample1 = normal(loc=20, scale=5, size=300)
sample2 = normal(loc=40, scale=5, size=700)
sample = hstack((sample1, sample2))
# plot the histogram
pyplot.hist(sample, bins=50)
pyplot.show()
Why are bimodal/multimodal data distributions, good cases for using kernel density estimation method? P 100
Data with this distribution does not nicely fit into a common probability distribution, by design. It is a good case for using a non-parametric kernel density estimation method.
The KernelDensity class supports estimating the PDF for multidimensional data. True/False P 103
True
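A minimal sketch of this on 2-D data (all values here are arbitrary illustration choices): the estimator accepts samples of any dimensionality.

```python
# Sketch: KernelDensity fit to 2-D data (values are arbitrary).
from numpy import column_stack, exp
from numpy.random import normal, seed
from sklearn.neighbors import KernelDensity

seed(1)
data = column_stack((normal(0, 1, 500), normal(0, 1, 500)))  # 500 points in 2-D
model = KernelDensity(bandwidth=0.5).fit(data)
# density near the centre of mass should exceed density far away
log_near, log_far = model.score_samples([[0.0, 0.0], [5.0, 5.0]])
print('density at origin=%.4f, at (5,5)=%.6f' % (exp(log_near), exp(log_far)))
```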