Chapter 10 Probability Density Estimation Flashcards

1
Q

We rarely know the distribution for a random variable because we don’t have access to all possible outcomes; all we have access to is a sample of observations. So we need probability density estimation, or simply density estimation: using the observations in a random sample to estimate the general density of probabilities beyond just the sample of data we have available. What are two ways to estimate the density? P 92

A

1/The first step is to review the density of observations in the random sample with a simple histogram. From the histogram, we might be able to identify a common and well-understood probability distribution that can be used, such as a normal distribution. (Parametric)
2/If not, we may have to fit a model to estimate the distribution. (Non-parametric)

2
Q

Reviewing a histogram of a data sample with a range of different numbers of bins will help to identify whether the density looks like a common probability distribution or not. True/False

A

True
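
A minimal sketch of such a review, assuming a made-up normally distributed sample; the bin counts tried are arbitrary choices:

# review the same sample with several different bin counts
from matplotlib import pyplot
from numpy.random import normal
sample = normal(loc=50, scale=5, size=1000)
for bins in [10, 25, 50, 100]:
    pyplot.hist(sample, bins=bins)
    pyplot.title('bins=%d' % bins)
    pyplot.show()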

3
Q

What is parametric density estimation? P 96

A

Estimating the parameters of an assumed common distribution from the data. For example, the normal distribution has two parameters: the mean and the standard deviation. Given these two parameters, we know the probability density function. These parameters can be estimated from data by calculating the sample mean and sample standard deviation. This process is referred to as parametric density estimation.
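
A minimal sketch of this process, assuming a made-up normal sample and using scipy.stats.norm to hold the estimated distribution:

# parametric density estimation: fit a normal distribution to a sample
from numpy.random import normal
from scipy.stats import norm
sample = normal(loc=50, scale=5, size=1000)
sample_mean = sample.mean()
sample_std = sample.std()
dist = norm(sample_mean, sample_std)  # the estimated distribution
print(sample_mean, sample_std, dist.pdf(50))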

4
Q

After estimating the density from the data (parametric density estimation), how can we check whether it is a good fit? 3 ways P 96

A

- Plotting the density function and comparing the shape to the histogram.
- Sampling the density function and comparing the generated sample to the real sample.
- Using a statistical test to confirm the data fits the distribution.
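
A hedged sketch of all three checks, assuming the normal fit from the previous card; normaltest here is D’Agostino’s K^2 test from scipy.stats, one of several reasonable test choices:

# check the fit of an estimated normal distribution in three ways
from matplotlib import pyplot
from numpy import linspace
from numpy.random import normal
from scipy.stats import norm, normaltest
sample = normal(loc=50, scale=5, size=1000)
dist = norm(sample.mean(), sample.std())
# 1) plot the density function over the histogram
values = linspace(sample.min(), sample.max(), 100)
pyplot.hist(sample, bins=50, density=True)
pyplot.plot(values, dist.pdf(values))
pyplot.show()
# 2) sample the density function and compare to the real sample
generated = dist.rvs(size=1000)
pyplot.hist(sample, bins=50, alpha=0.5)
pyplot.hist(generated, bins=50, alpha=0.5)
pyplot.show()
# 3) statistical test: a large p-value gives no evidence against normality
stat, p = normaltest(sample)
print('p=%.3f' % p)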

5
Q

It is possible that the data does match a common probability distribution, but requires a transformation before parametric density estimation. True/False

A

True. For example, you may have outlier values that are far from the mean or center of mass of the distribution. These can produce incorrect estimates of the distribution parameters and, in turn, cause a poor fit to the data. Such outliers should be removed prior to estimating the distribution parameters.
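
A minimal sketch of the removal, assuming a simple 3-standard-deviation rule (one of several reasonable cutoffs):

# trim values more than 3 standard deviations from the mean
from numpy import hstack
from numpy.random import normal
sample = hstack((normal(loc=50, scale=5, size=1000), [5.0, 95.0]))  # two injected outliers
mean, std = sample.mean(), sample.std()
trimmed = sample[abs(sample - mean) < 3 * std]
print(len(sample), len(trimmed))
print(trimmed.mean(), trimmed.std())  # re-estimate on the cleaned sample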

6
Q

Until when do we keep transforming the data before parametric density estimation? P 98 (3-step loop)

A

Loop until the fit of the distribution to the data is good enough:
1. Estimate the distribution parameters.
2. Review the resulting PDF against the data.
3. Transform the data to better fit the distribution.
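
A hedged sketch of one pass through this loop, assuming a right-skewed sample and a Box-Cox power transform (scipy.stats.boxcox) as the transformation; other data may call for other transforms:

# one pass of the estimate/review/transform loop
from matplotlib import pyplot
from numpy import linspace
from numpy.random import lognormal
from scipy.stats import boxcox, norm
sample = lognormal(mean=0.0, sigma=0.5, size=1000)
transformed, lam = boxcox(sample)                   # transform the data
mean, std = transformed.mean(), transformed.std()   # estimate parameters
# review the resulting PDF against the transformed data
values = linspace(transformed.min(), transformed.max(), 100)
pyplot.hist(transformed, bins=50, density=True)
pyplot.plot(values, norm(mean, std).pdf(values))
pyplot.show()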

7
Q

What is Non-parametric Density Estimation? P 99

A

In some cases, a data sample may not resemble a common probability distribution, or cannot easily be made to fit one. This is often the case when the data has two peaks (bimodal distribution) or many peaks (multimodal distribution). In this case, parametric density estimation is not feasible; instead, an algorithm is used to approximate the probability distribution of the data without a pre-defined distribution, referred to as a non-parametric method.

8
Q

The most common non-parametric approach for estimating the probability density function of a continuous random variable is called …, …, or … for short. P 99

A

Kernel smoothing, kernel density estimation, KDE

9
Q

What is Kernel Density Estimation? P 99

A

A non-parametric method for using a dataset to estimate probabilities for new points. A kernel is a mathematical function that returns a probability for a given value of a random variable.
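
A minimal sketch of the idea by hand, assuming a Gaussian kernel: the estimated density at a query point is the average kernel contribution of all samples, scaled by the bandwidth:

# kernel density estimation by hand with a Gaussian kernel
from numpy import exp, pi, sqrt
from numpy.random import normal

def gaussian_kernel(u):
    # standard normal density: contribution of a sample at scaled distance u
    return exp(-0.5 * u ** 2) / sqrt(2 * pi)

def kde(x, sample, bandwidth):
    # average the contributions of every sample point at query point x
    u = (x - sample) / bandwidth
    return gaussian_kernel(u).sum() / (len(sample) * bandwidth)

sample = normal(loc=50, scale=5, size=1000)
for x in [40, 50, 60]:
    print(x, kde(x, sample, bandwidth=2))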

10
Q

What is the smoothing parameter, aka bandwidth, aka Parzen window? P 99

A

The parameter that controls the number of samples, or window of samples, used to estimate the probability for a new point.

11
Q

What is a basis function in kernel density estimation? P 99

A

The contribution of samples within the window can be shaped using different functions, sometimes referred to as basis functions, e.g. uniform, normal, etc., with different effects on the smoothness of the resulting density function.
Basis function (kernel): the function chosen to control the contribution of samples in the dataset toward estimating the probability of a new point.

12
Q

Why do we experiment with different window sizes and different contribution functions for Kernel Density Estimation? P 99

A

Window size and basis function both affect the shape of the KDE. By trying different combinations and evaluating the results against histograms of the data, we can see which combination fits best.
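
A sketch of such an experiment, assuming the bimodal sample used in the later cards and scikit-learn’s KernelDensity; the bandwidths and kernels tried here are arbitrary:

# compare several bandwidths and kernels against the histogram
from matplotlib import pyplot
from numpy import asarray, exp, hstack
from numpy.random import normal
from sklearn.neighbors import KernelDensity
sample = hstack((normal(20, 5, 300), normal(40, 5, 700))).reshape(-1, 1)
values = asarray(range(1, 60)).reshape(-1, 1)
pyplot.hist(sample[:, 0], bins=50, density=True)
for bandwidth in [1, 2, 5]:
    for kernel in ['gaussian', 'tophat']:
        model = KernelDensity(bandwidth=bandwidth, kernel=kernel).fit(sample)
        pyplot.plot(values[:, 0], exp(model.score_samples(values)),
                    label='%s, bw=%d' % (kernel, bandwidth))
pyplot.legend()
pyplot.show()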

13
Q

How can we create a bimodal distribution in Python? P 100

A

We can construct a bimodal distribution by combining samples from two different normal distributions:
# example of a bimodal data sample
from matplotlib import pyplot
from numpy.random import normal
from numpy import hstack
# generate a sample
sample1 = normal(loc=20, scale=5, size=300)
sample2 = normal(loc=40, scale=5, size=700)
sample = hstack((sample1, sample2))
# plot the histogram
pyplot.hist(sample, bins=50)
pyplot.show()

14
Q

Why are bimodal/multimodal data distributions good cases for using the kernel density estimation method? P 100

A

Data with this distribution does not nicely fit into a common probability distribution, by design. It is a good case for using a non-parametric kernel density estimation method.

15
Q

The KernelDensity class supports estimating the PDF for multidimensional data. True/False P 103

A

True
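
A minimal sketch of the two-dimensional case, assuming two independent normal coordinates; fit expects one row per observation:

# kernel density estimation on 2D data
from numpy import column_stack, exp
from numpy.random import normal
from sklearn.neighbors import KernelDensity
data = column_stack((normal(20, 5, 1000), normal(40, 5, 1000)))  # rows are (x, y) points
model = KernelDensity(bandwidth=2, kernel='gaussian')
model.fit(data)
queries = [[20, 40], [0, 0]]  # evaluate the estimated density at two 2D points
print(exp(model.score_samples(queries)))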

16
Q

What are the steps to create a bimodal distribution, estimate a kernel density for it, and check its fit on the data using Python? P 101

A

1- Create two normal distributions and merge them with hstack, creating a bimodal sample:

from matplotlib import pyplot
from numpy.random import normal
from numpy import hstack
# generate a sample
sample1 = normal(loc=20, scale=5, size=300)
sample2 = normal(loc=40, scale=5, size=700)
sample = hstack((sample1, sample2))
# plot the histogram
pyplot.hist(sample, bins=50)
pyplot.show()

2- The scikit-learn machine learning library provides the KernelDensity class, which implements kernel density estimation. We tune the hyperparameters of this class (bandwidth, kernel) to find a good estimate, then reshape the sample into a column vector because the fit function requires 2D input:

# fit the kernel density model (sample must be a 2D column vector)
from sklearn.neighbors import KernelDensity
model = KernelDensity(bandwidth=2, kernel='gaussian')
sample = sample.reshape((len(sample), 1))
model.fit(sample)

3- Now that we have the estimate, we check the fit. We use range(1, 60) to create discrete values from 1 to 59, spanning the range of our original sample, and evaluate the estimated PDF at those values using the model obtained in the previous step.
🧨 Note: the values must also be converted to a column vector.
🧨 Note: score_samples returns the log of the density for each sample in X, so we apply exp() to recover the actual density values.
🧨 Note: the returned values form a probability density, not a set of probabilities: the density integrates to 1 over the whole domain, so the values at our discrete evaluation points are not guaranteed to sum to 1.

from numpy import asarray
from numpy import exp
values = asarray([value for value in range(1, 60)])
values = values.reshape((len(values), 1))
probabilities = model.score_samples(values)
probabilities = exp(probabilities)

4- Finally, we plot the KDE and the histogram on the same axes to visualize the fit (density=True normalizes the histogram so the two scales match).

pyplot.hist(sample, bins=50, density=True)
pyplot.plot(values[:], probabilities)
pyplot.show()

17
Q

What is the output of a Probability Density Function and a Probability Mass Function (the PMF is the discrete analogue of the PDF)? External

A

The output of a probability mass function is a probability, whereas the area under the curve produced by a probability density function represents a probability.
Probability density functions are a statistical measure used to gauge the likely outcome of a continuous random variable (e.g., the price of a stock or ETF). PDFs are plotted on a graph, typically resembling a bell curve, with the probability of the outcomes lying below the curve.

18
Q

What’s the difference between Probability density function and Probability distribution function? External

A

The probability distribution function (the cumulative distribution function) is the integral of the probability density function.
For discrete variables, the analogue of the density function is the probability mass function, and the distribution function is the running sum of the mass function.
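
A quick numerical check of this relationship, using the standard normal as an example; a simple Riemann sum of the PDF up to 1.0 should match the distribution function there:

# the distribution function is the integral of the density function
from numpy import linspace
from scipy.stats import norm
x = linspace(-10, 1.0, 10001)               # effectively integrating from -inf to 1.0
area = (norm.pdf(x) * (x[1] - x[0])).sum()  # Riemann-sum approximation
print(area)           # approximately 0.841
print(norm.cdf(1.0))  # the distribution function at 1.0 agrees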