Data Transformation Flashcards

1
Q

What are the 6 ways of dealing with missing values?

A

1) ignore the tuple
2) fill in the missing value manually
3) use a global constant
4) use a measure of central tendency of the attribute (mean or median)
5) use the mean/median of all samples belonging to the same class as the tuple
6) use the most probable value to fill in the missing value
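A few of these strategies can be sketched in plain Python. This is only an illustration on made-up data: the `rows` list, the `-1.0` constant, and all function names are assumptions, not part of the original cards.

```python
from statistics import mean

# Hypothetical toy data: (class_label, income); None marks a missing value.
rows = [("low", 20.0), ("low", None), ("high", 80.0), ("high", 100.0), ("low", 30.0)]

def drop_missing(rows):
    """Strategy 1: ignore tuples that contain a missing value."""
    return [r for r in rows if r[1] is not None]

def fill_constant(rows, constant=-1.0):
    """Strategy 3: replace every missing value with one global constant."""
    return [(c, constant if v is None else v) for c, v in rows]

def fill_global_mean(rows):
    """Strategy 4: replace missing values with the attribute's overall mean."""
    m = mean(v for _, v in rows if v is not None)
    return [(c, m if v is None else v) for c, v in rows]

def fill_class_mean(rows):
    """Strategy 5: replace missing values with the mean of the same class.

    Assumes every class has at least one observed value.
    """
    by_class = {}
    for c, v in rows:
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return [(c, class_means[c] if v is None else v) for c, v in rows]
```

On this toy data the global mean fills the gap with 57.5, while the class-conditional mean fills it with 25.0, which shows how strategy 5 can stay closer to the tuple's own group.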

2
Q

What is a nominal feature?

A

A categorical variable whose values are names or labels for categories that have no meaningful order

3
Q

(T/F) Nominal data can be represented with numbers

A

True

4
Q

What is a binary feature?

A

A feature with exactly two states, coded as 0 and 1. It is symmetric if both states are equally valuable and carry the same weight, and asymmetric if the outcomes are not equally important

5
Q

What is an ordinal feature?

A

A qualitative variable whose possible values have a meaningful order or ranking, but the magnitude between successive values is not known

6
Q

What are examples of ordinal features?

A

drink size, grade, professional rank

7
Q

How can the central tendency of an ordinal feature be described?

A

by the mode or median

8
Q

How can numeric data be represented for a classifier?

A

as-is or normalized

9
Q

How can binary data be represented for a classifier?

A

0s and 1s

10
Q

How can ordinal data be represented for a classifier?

A

ordered numeric

11
Q

How can nominal data be represented for a classifier?

A

one-hot, numeric proxy
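As a rough sketch of two of these representations, the helpers below encode an ordinal feature as ordered integers and a nominal feature as one-hot vectors. The function names and the example categories are hypothetical and chosen only for illustration.

```python
def ordinal_encode(values, order):
    """Map each ordinal level to its rank in `order` (lowest level -> 0)."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """Return (sorted categories, one 0/1 vector per value).

    Each vector has exactly one 1, so no ordering is implied
    between the nominal categories.
    """
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors

# Usage with made-up data
sizes = ordinal_encode(["small", "large", "medium"], ["small", "medium", "large"])
cats, colors = one_hot_encode(["red", "blue", "red"])
```

The ordinal encoding keeps the ranking but, as with any ordered-integer proxy, it implicitly treats the gaps between levels as equal, which the data itself does not guarantee.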

12
Q

What are the 6 strategies for data transformation?

A

1) Smoothing
2) Attribute construction (feature construction)
3) Aggregation
4) Normalization
5) Discretization
6) Concept hierarchy generation

13
Q

What are the four different types of normalization techniques?

A

1) min-max
2) z-score
3) z-score using mean absolute deviation
4) decimal-scaling

14
Q

What is min-max normalization?

A

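Performs a linear transformation on the original data, mapping each value from the range [min, max] of the attribute to a new range [new_min, new_max], while preserving the relationships among the original values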
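A minimal sketch of min-max normalization, assuming the usual formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The function name and the default target range [0, 1] are illustrative choices.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from [min, max] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        raise ValueError("min-max scaling is undefined when all values are equal")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

scaled = min_max([10, 20, 30])
```

Because the mapping is linear, the ordering and relative spacing of the values are unchanged; only the scale differs.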

15
Q

What is z-score normalization?

A

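Also called zero-mean normalization: the values of attribute A are normalized based on the mean and standard deviation of A, so that each value becomes (value - mean of A) / (standard deviation of A)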
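A short sketch of z-score normalization, v' = (v - mean) / std. It assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead would be an equally reasonable choice. The function name is hypothetical.

```python
from statistics import mean, pstdev

def z_score(values):
    """Center values on the mean and scale by the population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("z-score is undefined when the standard deviation is zero")
    return [(v - m) / s for v in values]

# Toy data with mean 5 and population standard deviation 2
z = z_score([2, 4, 4, 4, 5, 5, 7, 9])
```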

16
Q

What is the mean absolute deviation?

A

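The average of the absolute deviations from the mean. It can replace the standard deviation in z-score normalization because it is more robust to outliers: deviations are not squared, so extreme values are exaggerated less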
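As a sketch, the mean absolute deviation of an attribute is s = (1/n) * sum(|v - mean|), and the z-score variant divides by s rather than by the standard deviation. The function names below are illustrative only.

```python
from statistics import mean

def mean_abs_deviation(values):
    """Average absolute distance of the values from their mean."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

def z_score_mad(values):
    """z-score variant that scales by the mean absolute deviation."""
    m = mean(values)
    s = mean_abs_deviation(values)
    if s == 0:
        raise ValueError("undefined when all values are equal")
    return [(v - m) / s for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]
mad = mean_abs_deviation(data)   # absolute deviations sum to 12 over 8 values
z_mad = z_score_mad(data)
```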

17
Q

What is decimal-scaling normalization?

A

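Moves the decimal point of the values of attribute A, where the number of places moved depends on the maximum absolute value of A: each value is divided by 10^j, with j the smallest integer such that every scaled absolute value is less than 1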
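A small sketch of decimal scaling, assuming the usual rule v' = v / 10^j with j the smallest integer such that max(|v'|) < 1. The function name and the sample values are illustrative.

```python
def decimal_scale(values):
    """Return (j, scaled values), where each value is divided by 10**j."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest absolute value drops below 1.
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]

j, scaled = decimal_scale([-986, 917])
```

Here the largest absolute value is 986, so the decimal point moves three places and the values become -0.986 and 0.917.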

18
Q

What are three methods of discretization?

A

1) Binning
2) Histogram analysis
3) Cluster, decision tree, and correlation analyses

19
Q

What is discretization?

A

The raw values of numeric attributes are replaced by interval labels or conceptual labels. It can also be a form of data reduction.
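One simple way to produce interval labels is equal-width binning. The sketch below is one possible approach, not the only one: it splits the observed range into `k` intervals of equal width and returns a bin index per value. The function name and the sample ages are hypothetical.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index in 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot bin values that are all equal")
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

ages = [0, 5, 10, 15, 20, 30]
bins = equal_width_bins(ages, 3)   # intervals [0, 10), [10, 20), [20, 30]
```

The bin indices could then be mapped to conceptual labels such as "young", "middle", and "senior" if a more readable representation is wanted.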

20
Q

What are three forms of feature engineering?

A

1) Summarization: replacing multiple instances with column value averages
2) Kernelization: taking existing features and exploding them into a high-dimensional space, or passing a set of features through a function that produces higher-order features
3) Representation learning: letting a neural network learn the feature space in the form of a vector
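To illustrate the idea of passing features through a function that produces higher-order features, the sketch below builds an explicit degree-2 polynomial expansion. This is just one hypothetical choice of mapping, and the function name is made up for the example.

```python
from itertools import combinations_with_replacement

def poly2_features(x):
    """Return the original features followed by all degree-2 products.

    For x = [a, b] the result is [a, b, a*a, a*b, b*b].
    """
    return list(x) + [a * b for a, b in combinations_with_replacement(x, 2)]

expanded = poly2_features([2, 3])
```

Two input features become five, which shows how quickly the dimensionality grows when higher-order terms are added.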

21
Q

Where does the target variable come from?

A

1) an existing feature that may be missing from some instances
2) a hand-labeled feature
3) a feature value that will be known in the future