How to Mark and Remove Missing Data Flashcards
WHAT IS THE INDICATOR FOR MISSING VALUES WHEN USING “.DESCRIBE()”? WHAT COMMAND CAN WE USE TO FIX IT? P85,86
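In the output of .describe(), missing values are often hidden behind out-of-range entries. For example, a minimum of 0 for a variable that cannot be 0 (such as blood pressure) indicates that 0 was used to mark missing values.
We can mark them as missing with dataset[cols] = dataset[cols].replace(0, nan), where nan is imported from NumPy. Because replace() returns a new object, the result must be assigned back.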
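A minimal sketch of spotting and marking zeros as missing with pandas. The column names and values are hypothetical, chosen so that a 0 is clearly out of range for two columns but valid for a third:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a blood pressure or BMI of 0 is impossible, so 0 means "missing".
dataset = pd.DataFrame({
    "blood_pressure": [72, 0, 64, 80],
    "bmi": [33.6, 26.6, 0.0, 28.1],
    "pregnancies": [6, 0, 8, 1],  # 0 is a valid value here, so it is left alone
})

# A minimum of 0 in describe() hints at values that are really missing.
print(dataset.describe().loc["min"])

cols = ["blood_pressure", "bmi"]
# replace() returns a new object, so assign the result back to the frame.
dataset[cols] = dataset[cols].replace(0, np.nan)

# Count missing values per column after marking them.
print(dataset.isnull().sum())
```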
CAN WE COUNT MISSING VALUES AS A CATEGORY IN DISCRETE (CATEGORICAL) FEATURES? P86
Yes. Missing values in a categorical feature can be treated as a category of their own.
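A minimal sketch, assuming a hypothetical categorical column and the label "Missing" for the new category:

```python
import numpy as np
import pandas as pd

# Hypothetical categorical feature with gaps.
color = pd.Series(["red", np.nan, "blue", "red", np.nan])

# Treat missing entries as just another category instead of imputing them.
color_filled = color.fillna("Missing")
print(color_filled.value_counts())
```

After this step the column can be encoded (for example, one-hot encoded) like any other categorical feature.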
WHAT IS STATISTICAL IMPUTATION? P92
Calculating a statistic for each column (such as the mean, median, or most frequent value) and replacing all missing values in that column with that statistic.
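A minimal sketch of mean imputation with scikit-learn's SimpleImputer; the small array is hypothetical:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric data with missing entries marked as NaN.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# strategy can also be "median", "most_frequent", or "constant".
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(imputer.statistics_)  # per-column means used for filling: [2. 15.]
print(X_imputed)
```

In practice the imputer should be fit on the training data only and then applied to the test data, so that test-set statistics do not leak into training.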
FOR IMPUTATION USING KNN, WHAT PARAMETERS SHOULD WE SELECT? P104
1-Distance measure
2-Number of contributing neighbors for each prediction (k parameter)
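For the distance measure, scikit-learn's KNNImputer defaults to a NaN-aware Euclidean distance: coordinates missing in either row are skipped, and the squared distance is scaled up by (total coordinates / present coordinates). A small sketch comparing a hand-written version (the helper name manual_nan_euclidean is hypothetical) with scikit-learn's nan_euclidean_distances:

```python
import math

import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances


def manual_nan_euclidean(x, y):
    """Euclidean distance over coordinates present in both rows,
    rescaled by (total coordinates / present coordinates)."""
    present = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    squared = sum((a - b) ** 2 for a, b in present)
    return math.sqrt(len(x) / len(present) * squared)


x = [3.0, np.nan, 5.0]
y = [1.0, 0.0, 0.0]

print(manual_nan_euclidean(x, y))               # sqrt(3/2 * 29) ≈ 6.60
print(nan_euclidean_distances([x], [y])[0, 0])  # scikit-learn's version
```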
WHAT IS THE “NA_VALUES” PARAMETER IN PD.READ_CSV? P105
It lets us specify the value(s) that represent missing data in the file (for example, “?”); pandas reads those entries as NaN.
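A minimal sketch, assuming a hypothetical CSV in which "?" marks a missing value (an in-memory string stands in for a file):

```python
import io

import pandas as pd

# Hypothetical CSV content where "?" marks a missing value.
csv_text = "age,income\n25,50000\n?,62000\n40,?\n"

df = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(df.isnull().sum())  # one missing value in each column
print(df.dtypes)          # both columns parse as float64 because of the NaN
```

na_values also accepts a list of markers, or a dict mapping column names to their own markers.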
WHICH CLASS IN SCIKIT-LEARN SUPPORTS KNN IMPUTATION? P106
KNNImputer (from sklearn.impute)
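A minimal sketch of KNNImputer on hypothetical data, showing where the two choices from the KNN card (distance measure and k) are set:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical data: the last row is missing its second feature.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [10.0, 20.0],
    [2.5, np.nan],
])

# n_neighbors is k; metric="nan_euclidean" is the default distance measure.
imputer = KNNImputer(n_neighbors=2, weights="uniform", metric="nan_euclidean")
X_imputed = imputer.fit_transform(X)

# The two nearest rows (by the first feature) are [2, 4] and [3, 6],
# so the gap is filled with the mean of 4 and 6.
print(X_imputed[-1])
```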
HOW DOES ITERATIVE IMPUTATION WORK? P115
Iterative imputation models each feature with missing values as a function of the other features, e.g. as a regression problem in which the missing values are predicted. Features are imputed sequentially, one after another, so values imputed earlier can be used as inputs when predicting later features. The process is iterative because this whole pass is repeated several times, and each round produces improved estimates of the missing values across all features.
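The loop described above can be sketched from scratch with NumPy. This is an illustrative simplification, not scikit-learn's implementation: it assumes ordinary least squares as the model, a mean fill as the starting point, a fixed number of rounds with no convergence check, and a hypothetical helper name iterative_impute:

```python
import numpy as np


def iterative_impute(X, n_iter=10):
    """Chained-equations sketch: each column with gaps is regressed (OLS)
    on all other columns, and the whole pass is repeated n_iter times."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from a simple mean fill so every column has a value.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    # Visit columns from fewest to most missing values.
    order = np.argsort(missing.sum(axis=0), kind="stable")
    for _ in range(n_iter):
        for j in order:
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(filled, j, axis=1)
            # Design matrix with an intercept column.
            A = np.column_stack([np.ones(len(filled)), others])
            # Fit on rows where column j was observed, then predict its gaps.
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = A[rows] @ coef
    return filled


# Hypothetical data where column 1 = 2 * column 0 + 1.
X = np.array([
    [1.0, 3.0],
    [2.0, 5.0],
    [3.0, np.nan],
    [4.0, 9.0],
    [np.nan, 11.0],
])
imputed = iterative_impute(X, n_iter=10)
print(imputed)
```

Because the hypothetical data follow an exact linear relationship, the gaps are recovered as 7 and 5.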
WHAT ARE THE OTHER NAMES FOR THE ITERATIVE IMPUTATION APPROACH? P115
Fully Conditional Specification (FCS)
Multivariate Imputation by Chained Equations (MICE)
HOW MANY ITERATIONS ARE USUALLY ENOUGH FOR ITERATIVE IMPUTATION? P115
A small number, typically 10 to 20.
WHAT KIND OF ALGORITHMS ARE OFTEN USED FOR ITERATIVE IMPUTATION? P115
Linear models, for their simplicity.
IN WHAT ORDER CAN WE PROCESS THE FEATURES WITH MISSING VALUES? P115
The order in which features are processed is a choice we can make; for example, from the feature with the fewest missing values to the feature with the most.
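A minimal sketch of computing that "fewest missing first" order with pandas; the frame and its column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with different amounts of missing data per column.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, 3.0, np.nan],
    "c": [np.nan, np.nan, np.nan, 4.0],
})

missing_counts = df.isnull().sum()
# Order the columns from fewest to most missing values.
ascending_order = missing_counts.sort_values().index.tolist()
print(ascending_order)  # ['b', 'a', 'c']
```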
WHAT SHOULD WE IMPORT BEFORE USING ITERATIVEIMPUTER? P118
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
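IterativeImputer is still experimental in scikit-learn, so the enable_iterative_imputer import must come first. A minimal usage sketch on hypothetical data, passing BayesianRidge (a linear model, and also the default estimator) explicitly:

```python
import numpy as np
# Must be imported before IterativeImputer; the name itself is unused.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Hypothetical data with a few gaps.
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, np.nan],
    [np.nan, 5.0, 4.0],
    [2.0, 3.0, 5.0],
])

imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```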
WHAT ARE THE DIFFERENT IMPUTATION ORDERS IN ITERATIVE IMPUTATION? P121
Ascending, descending, right to left (Arabic), left to right (Roman), and random; in IterativeImputer these are set with the imputation_order parameter.
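A minimal sketch that runs IterativeImputer once per imputation order on the same hypothetical data, so the filled-in values can be compared:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data with a few gaps.
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, np.nan],
    [np.nan, 5.0, 4.0],
    [2.0, 3.0, 5.0],
])

results = {}
# ascending/descending: by amount of missing data; roman: left to right;
# arabic: right to left; random: a new random order each round.
for order in ["ascending", "descending", "roman", "arabic", "random"]:
    imputer = IterativeImputer(imputation_order=order, random_state=1)
    results[order] = imputer.fit_transform(X)
    print(order, results[order][1, 1].round(3))
```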
WHICH PARAMETER OF ITERATIVEIMPUTER SETS THE NUMBER OF ITERATIONS? P122
max_iter
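A minimal sketch on hypothetical data: max_iter caps the number of imputation rounds, and the fitted n_iter_ attribute reports how many rounds actually ran (fewer than max_iter if the tol-based early stopping criterion was met):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data with a few gaps.
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, np.nan],
    [np.nan, 5.0, 4.0],
    [2.0, 3.0, 5.0],
])

# Up to 20 rounds; stop early if changes fall below the tolerance.
imputer = IterativeImputer(max_iter=20, tol=1e-3, random_state=0)
X_imputed = imputer.fit_transform(X)
print(imputer.n_iter_)  # rounds actually run, at most 20
```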