Data Transformation Flashcards
What are the 6 ways of dealing with missing values?
1) Ignore the tuple
2) Fill in the missing value manually
3) Use a global constant (e.g., "Unknown")
4) Use a measure of central tendency for the attribute (mean or median)
5) Use the attribute mean or median of all samples belonging to the same class as the tuple
6) Use the most probable value to fill in the missing value
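As a rough illustration of strategies 1, 3, 4 and 5, here is a minimal sketch in plain Python. The toy `rows` data, the `None` marker for a missing value, and all function names are assumptions made for this example, not part of the original cards.

```python
from statistics import mean

# Hypothetical toy data: (class_label, income); None marks a missing value.
rows = [("low", 20.0), ("low", None), ("low", 30.0),
        ("high", 90.0), ("high", 110.0), ("high", None)]

def drop_missing(rows):
    """Strategy 1: ignore (drop) every tuple that has a missing value."""
    return [(c, v) for c, v in rows if v is not None]

def fill_global_constant(rows, constant=-1.0):
    """Strategy 3: replace every missing value with one fixed constant."""
    return [(c, constant if v is None else v) for c, v in rows]

def fill_central_tendency(rows, stat=mean):
    """Strategy 4: fill with the attribute's overall mean (or pass median)."""
    fill = stat([v for _, v in rows if v is not None])
    return [(c, fill if v is None else v) for c, v in rows]

def fill_class_central_tendency(rows, stat=mean):
    """Strategy 5: fill with the mean/median of tuples in the same class.

    Assumes every class has at least one known value.
    """
    known = {}
    for c, v in rows:
        if v is not None:
            known.setdefault(c, []).append(v)
    fills = {c: stat(vs) for c, vs in known.items()}
    return [(c, fills[c] if v is None else v) for c, v in rows]
```

Passing `stat=median` (from the `statistics` module) instead of the default `mean` is usually the safer choice when the attribute is skewed.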
What is a nominal feature?
A categorical variable whose values are names or labels; each value represents a category, and the categories have no meaningful order
(T/F) Nominal data can be represented with numbers
True (the numbers act only as labels, so arithmetic on them is not meaningful)
What is a binary feature?
A feature with only two states, 0 and 1. It is symmetric if both states are equally valuable and carry the same weight, and asymmetric if the outcomes are not equally important
What is an ordinal feature?
A qualitative variable whose possible values have a meaningful order or ranking, but the magnitude between successive values is not known
What are examples of ordinal features?
drink size, grade, professional rank
How can the central tendency of an ordinal feature be described?
by the mode or median
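A small sketch of how the mode and median of an ordinal feature might be computed. The drink sizes and their ordering are hypothetical; `median_low` is used so the result is always an actual category rather than an average of two ranks.

```python
from statistics import median_low, mode

# Hypothetical ordering of the categories, lowest to highest.
order = ["small", "medium", "large"]
sizes = ["small", "large", "medium", "medium", "large", "medium", "small"]

# The mode needs no ordering at all.
most_common = mode(sizes)

# The median needs the ranking: map to ranks, take the median rank, map back.
ranks = [order.index(s) for s in sizes]
median_size = order[median_low(ranks)]
```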
How can numeric data be represented for a classifier?
as-is or normalized
How can binary data be represented for a classifier?
0s and 1s
How can ordinal data be represented for a classifier?
as ordered numeric values (integer ranks)
How can nominal data be represented for a classifier?
one-hot encoding or a numeric proxy
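The encodings above can be sketched in a few lines of plain Python. The category lists and helper names below are made up for illustration.

```python
def one_hot(value, categories):
    """Nominal: one 0/1 indicator per category, with no implied order."""
    return [1 if value == c else 0 for c in categories]

def ordinal_encode(value, order):
    """Ordinal: the rank of the value in a known lowest-to-highest order."""
    return order.index(value)

# Hypothetical category lists.
colors = ["red", "green", "blue"]        # nominal
sizes = ["small", "medium", "large"]     # ordinal

green_vec = one_hot("green", colors)
large_rank = ordinal_encode("large", sizes)
```

One-hot encoding avoids suggesting an order among nominal categories, which a single integer code would imply to many classifiers.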
What are the 6 strategies for data transformation?
1) Smoothing
2) Attribute construction (feature construction)
3) Aggregation
4) Normalization
5) Discretization
6) Concept hierarchy generation
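To make strategy 1 concrete, here is a minimal sketch of smoothing by bin means, one common smoothing technique (the cards do not say which technique is intended). The function name and the price data are assumptions for this example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of
    `bin_size`, and replace each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

# Hypothetical prices, smoothed in bins of three.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)
```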
What are the four normalization techniques?
1) min-max
2) z-score
3) z-score using mean absolute deviation
4) decimal scaling
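Techniques 3 and 4 have no card of their own, so here is a hedged sketch of both. Decimal scaling divides by the smallest power of ten that brings every absolute value below 1; the mean-absolute-deviation variant replaces the standard deviation in the z-score with the average absolute distance from the mean. Function names are assumptions for this example.

```python
def decimal_scaling(values):
    """Return (scaled_values, j), where v' = v / 10**j and j is the
    smallest non-negative integer such that max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

def mad_zscore(values):
    """v' = (v - mean) / s, where s is the mean absolute deviation.

    Assumes the values are not all identical (otherwise s is 0).
    """
    mu = sum(values) / len(values)
    s = sum(abs(v - mu) for v in values) / len(values)
    return [(v - mu) / s for v in values]

scaled, j = decimal_scaling([-986, 917])
```

The mean absolute deviation is less affected by outliers than the standard deviation, because the deviations are not squared.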
What is min-max normalization?
Performs a linear transformation on the original data, mapping each value into a new range [new_min, new_max] (often [0, 1]) while preserving the relationships among the original values
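A minimal sketch of min-max normalization, using the usual formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The function name and default target range are assumptions for this example.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]

unit_range = min_max([10, 20, 30])
symmetric_range = min_max([10, 20, 30], -1.0, 1.0)
```

A value that later falls outside the original [min, max] will map outside the target range, so the minimum and maximum should come from data that is representative.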
What is z-score normalization?
Also called zero-mean normalization: the values of attribute A are normalized using the mean and standard deviation of A, so that v' = (v - mean of A) / (standard deviation of A)
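A minimal sketch of z-score normalization. It assumes the population standard deviation (`statistics.pstdev`); swapping in the sample standard deviation (`statistics.stdev`) is an equally reasonable choice that the card does not settle.

```python
from statistics import mean, pstdev

def z_score(values):
    """v' = (v - mean) / std, using the population standard deviation.

    Assumes the values are not all identical (otherwise std is 0).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Toy data with mean 5 and population standard deviation 2.
scores = z_score([2, 4, 4, 4, 5, 5, 7, 9])
```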