How to Change Numerical Data Distributions Flashcards

1
Q

WHAT ARE THE CAUSES OF HIGHLY SKEWED OR NON-STANDARD DISTRIBUTIONS? P288

A

Outliers
Multi-modal distributions
Highly exponential distributions, etc.

2
Q

DOES A STANDARD DISTRIBUTION OF THE TARGET VALUES HELP PERFORMANCE? P288

A

Yes. Many ML algorithms prefer, or perform better, when numerical input variables (and, in the case of regression, even the output variable) have a standard probability distribution, such as a Gaussian (normal) or a uniform distribution.
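One way to give a regression target a more standard distribution in scikit-learn is to wrap the model in TransformedTargetRegressor with a QuantileTransformer. This is a minimal sketch under assumptions: the synthetic skewed target, the LinearRegression model, and n_quantiles=500 are illustrative choices, not taken from the source.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Hypothetical right-skewed target: exponential of a linear signal plus noise
y = np.exp(X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=500))

# y is quantile-transformed to a normal distribution before fitting;
# predictions are inverse-transformed back to the original scale of y
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=QuantileTransformer(n_quantiles=500, output_distribution="normal"),
)
model.fit(X, y)
pred = model.predict(X)
```

The wrapper keeps the transform and its inverse tied to the model, so the target is never transformed by hand.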

3
Q

WHAT DOES A QUANTILE TRANSFORM DO? P289

A

It maps a variable's probability distribution to another probability distribution (for example, from a skewed distribution to a Gaussian or uniform one).
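To make the idea concrete, here is a minimal from-scratch sketch, not scikit-learn's implementation: estimate each value's CDF from its rank, then pass that probability through the normal distribution's inverse CDF. The function name quantile_to_normal and the rank / (n + 1) estimate are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm, rankdata, skew

def quantile_to_normal(x):
    """Hypothetical helper: map a 1-D sample to an approximately normal one."""
    x = np.asarray(x, dtype=float)
    # Empirical CDF of each observation from its rank, kept strictly inside (0, 1)
    u = rankdata(x) / (len(x) + 1)
    # Inverse CDF (percent-point function) of the standard normal distribution
    return norm.ppf(u)

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=2.0, size=1000)  # strongly right-skewed input
z = quantile_to_normal(skewed)
print(skew(skewed), skew(z))  # skewness drops to roughly zero after the transform
```

Because the mapping is monotonic, the ordering of observations is preserved; only their spacing changes.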

4
Q

WHAT IS A CUMULATIVE DISTRIBUTION FUNCTION? P289

A

The cumulative distribution function (CDF) of a random variable X gives, for any value x, the probability that X takes a value less than or equal to x: F(x) = P(X <= x).
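A small sketch contrasting a theoretical CDF (SciPy's standard normal) with an empirical CDF estimated from a sample; the helper empirical_cdf and the sample size are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Theoretical CDF of a standard normal: P(X <= x)
print(norm.cdf(0.0))   # 0.5: half the mass lies at or below the mean
print(norm.cdf(1.96))  # about 0.975

def empirical_cdf(sample, x):
    """Hypothetical helper: fraction of observations less than or equal to x."""
    return np.mean(np.asarray(sample) <= x)

rng = np.random.default_rng(0)
sample = rng.normal(size=10_000)
print(empirical_cdf(sample, 1.96))  # close to norm.cdf(1.96)
```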

5
Q

WHAT IS A PERCENT-POINT FUNCTION (PPF)? P289

A

Also called the quantile function, the PPF is the inverse of the cumulative distribution function (CDF): given a probability p, it returns the value x such that P(X <= x) = p, i.e. the value at or below which that share of the probability lies.
WWW: Think of a KDE plot. The CDF answers: what is the probability that a value from the distribution falls at or below a given value? So for the CDF, x is a value of the variable and y is the area under the KDE curve to its left (a probability). The PPF answers the inverse question: given a probability, up to what value must we take the area under the KDE curve for that area to equal the probability? So for the PPF, x is the area under the curve (a probability) and y is the corresponding value of the variable.
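A quick check of the inverse relationship using SciPy's standard normal distribution; the probability 0.975 is just an illustrative choice.

```python
from scipy.stats import norm

p = 0.975
x = norm.ppf(p)     # value at or below which 97.5% of the probability mass lies
print(x)            # about 1.96
print(norm.cdf(x))  # about 0.975 again: the CDF undoes the PPF
```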

6
Q

WHAT DOES THE QUANTILETRANSFORMER CLASS IN SCIKIT-LEARN DO? P289

A

This method transforms each feature independently to follow a uniform or a normal distribution.
First, an estimate of the feature's cumulative distribution function is used to map the original values to a uniform distribution. The obtained values are then mapped to the desired output distribution using the associated quantile function.
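A short usage sketch of scikit-learn's QuantileTransformer; the exponential data and the parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(1)
# Hypothetical skewed feature, shaped (n_samples, n_features)
X = rng.exponential(scale=3.0, size=(1000, 1))

qt = QuantileTransformer(n_quantiles=1000, output_distribution="normal")
X_normal = qt.fit_transform(X)  # estimated CDF -> uniform -> normal quantile function

print(X_normal.mean(), X_normal.std())  # roughly 0 and roughly 1
X_back = qt.inverse_transform(X_normal)  # approximately recovers the original values
```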

7
Q

WHAT IS THE MEANING OF THE "N_QUANTILES" PARAMETER IN THE QUANTILE TRANSFORMER? WHAT RANGE OF VALUES CAN IT HAVE? P289

A

It sets the resolution of the mapping, i.e. how many quantiles are used to rank the observations in the dataset. It is a positive integer that should not exceed the number of observations in the dataset, and it defaults to 1000.
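A sketch of what happens when n_quantiles is larger than the number of observations. This assumes a recent scikit-learn version, which warns and falls back to the number of samples (exposed as the fitted attribute n_quantiles_); older versions may behave differently.

```python
import warnings
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(2)
X = rng.exponential(size=(100, 1))  # only 100 observations

# Ask for more quantiles than there are samples and record any warnings
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    qt = QuantileTransformer(n_quantiles=1000).fit(X)

print(qt.n_quantiles_)      # 100: clipped to the number of samples
print(qt.quantiles_.shape)  # (100, 1): one reference value per quantile per feature
for w in caught:
    print(w.message)
```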

8
Q

WHEN CAN IT BE BENEFICIAL TO USE OUTPUT_DISTRIBUTION='UNIFORM' FOR THE QUANTILE TRANSFORM? P296

A

Sometimes it can be beneficial to transform a highly exponential or multi-modal distribution to have a uniform distribution. This is especially useful for data with a large and sparse range of values, e.g. outliers that are common rather than rare.
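A minimal sketch of a uniform quantile transform on heavy-tailed data. Pareto-distributed samples stand in for "data with a large and sparse range of values"; that choice and the parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(3)
# Hypothetical heavy-tailed feature where extreme values are common
X = rng.pareto(a=1.5, size=(2000, 1))

qt = QuantileTransformer(n_quantiles=1000, output_distribution="uniform")
X_uniform = qt.fit_transform(X)

# Values now lie in [0, 1] and are spread evenly, so extremes no longer dominate the scale
print(X_uniform.min(), X_uniform.max())
print(np.histogram(X_uniform, bins=5, range=(0, 1))[0])  # roughly equal counts per bin
```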
