Building Features from Nominal and Numeric Data in Microsoft Azure Flashcards
A distribution of data refers to which of the following?
The skewness of that data
The density of the data
The shape it is in when you graph it
The mean value of that dataset
The shape it is in when you graph it
Much of the field of statistics is predicated on understanding what distribution?
Gaussian
Bernoulli
Exponential
Poisson
Gaussian
The measure of the thickness of the tails of a distribution is known as what?
Mesokurtic
Platykurtic
Kurtosis
Leptokurtic
Kurtosis
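As a rough illustration of the card above, the sketch below computes excess kurtosis (the fourth standardized moment minus 3) with plain Python. The function name and both sample lists are made up for this example. A positive result points to heavier tails than a Gaussian (leptokurtic), a negative result to lighter tails (platykurtic), and a value near zero to mesokurtic.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    # Second and fourth central moments
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / (m2 ** 2) - 3.0

# Hypothetical data: evenly spread values vs. mostly-zero values with two extremes
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

flat_kurtosis = excess_kurtosis(flat)    # negative: thin tails
heavy_kurtosis = excess_kurtosis(heavy)  # positive: thick tails
```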
There are four core steps in the machine learning process. What are they and what is the order of that process?
Model, Source, Wrangle, Production
Source, Model, Wrangle, Production
Wrangle, Source, Model, Production
Source, Wrangle, Model, Production
Source, Wrangle, Model, Production
A value that occurs only rarely within a dataset is often referred to as what?
Imputation
Outlier
Kurtosis
Platykurtic
Outlier
In Python, missing values are often represented by which entry?
EMPTY
SAN
NaN
NULL
NaN
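A minimal sketch of how a missing value tends to look in plain Python, assuming the gaps arrive as floating-point NaN ("not a number"). The `readings` list is invented for illustration, and filling gaps with the mean is only one simple imputation choice among many.

```python
import math

# Hypothetical sensor readings; float("nan") marks a missing entry
readings = [21.5, float("nan"), 19.0, float("nan"), 22.5]

# NaN never compares equal to anything, including itself,
# so use math.isnan() rather than == to detect it
missing_count = sum(1 for r in readings if math.isnan(r))

# Mean of the values that are actually present
present = [r for r in readings if not math.isnan(r)]
mean_present = sum(present) / len(present)

# Simple mean imputation: replace each NaN with the mean of the present values
filled = [mean_present if math.isnan(r) else r for r in readings]
```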
If the distribution is not Gaussian or the standard deviation is very small, which common scaler might be your best option?
Robust Scaler
Normalizer
Standard Scaler
Min-Max Scaler
Min-Max Scaler
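To make the idea behind min-max scaling concrete, here is a small sketch that rescales values linearly into a target range using x' = (x - min) / (max - min). The function name, its parameters, and the sample data are assumptions for this example and do not reproduce any particular library's scaler.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero and map everything to new_min
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

# Hypothetical, clearly non-Gaussian feature
incomes = [20, 30, 40, 120]
scaled = min_max_scale(incomes)
```

Because the minimum and maximum define the range, a single extreme value compresses everything else toward one end, which is worth checking before relying on this scaler.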
The process of binning data manually, using your own insight into the data to set the ranges for each bin, is referred to as what?
Fixed-Width Binning
Detailed Binning
Binning by Instinct
Quantile Binning
Binning by Instinct
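A sketch of what hand-picked bins might look like in code. The cut points and labels below are hypothetical choices of the kind a person might make from their own knowledge of the data. They are not derived from the data itself, which is what separates this approach from fixed-width or quantile binning.

```python
from bisect import bisect_right

# Hypothetical, manually chosen cut points and the label for each resulting range
edges = [18, 35, 60]
labels = ["minor", "young adult", "middle-aged", "senior"]

def bin_age(age):
    """Return the label of the bin that age falls into (lower edges are inclusive)."""
    return labels[bisect_right(edges, age)]

ages = [5, 18, 34, 35, 59, 60, 90]
binned = [bin_age(a) for a in ages]
```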
Values that are whole numbers and can’t be subdivided are known as what?
Discrete
Diminished
Qualitative
Continuous
Discrete
Which type of scale is used for labeling data?
Ratio
Nominal
Numeric
Discrete
Nominal
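Since nominal values are labels with no inherent order, one common way to turn them into features is one-hot encoding. The sketch below is a plain-Python illustration under that assumption. The `colors` list is invented, and sorting the categories only fixes the column order without implying any ranking.

```python
# Hypothetical nominal feature: labels with no meaningful order
colors = ["red", "green", "blue", "green"]

# Sorted only to get a stable column order: ['blue', 'green', 'red']
categories = sorted(set(colors))

# One row per value, with a 1 in the column of its category and 0 elsewhere
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]
```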