Lecture 7 - Features Flashcards
What are the 4 stages of data pre-processing
- Data cleaning
- Data integration
- Data reduction
- Data transformation
What are features
Features, also called attributes, are defined as mappings from the instance space to the feature domain.
What are the three main categories of feature statistics
- Statistics of central tendency
- Statistics of dispersion
- Shape statistics
What are the 3 main statistics of central tendency
- mean
- median
- mode
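As a quick illustration, the three statistics can be computed with Python's standard `statistics` module. The feature values below are made up for the example.

```python
import statistics

# Hypothetical feature values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # arithmetic average of all values
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```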
What are the 2 statistics of dispersion
- Variance sigma^2
- Standard deviation sigma
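A minimal sketch of both statistics, assuming the population versions (dividing by n rather than n - 1). The data is made up for the example.

```python
import statistics

# Hypothetical feature values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: the mean squared deviation from the mean.
variance = statistics.pvariance(values)

# Population standard deviation: the square root of the variance,
# expressed in the same units as the feature itself.
std_dev = statistics.pstdev(values)

print(variance, std_dev)
```

For the sample versions (dividing by n - 1), `statistics.variance` and `statistics.stdev` can be used instead.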
What are the other statistics of dispersion
- range
- midrange point
- quantiles
- interquartile range
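The statistics listed above can be sketched as follows. Quantile conventions differ between textbooks and libraries; the "inclusive" method used here is one choice among several, and the data is made up.

```python
import statistics

# Hypothetical feature values, chosen only for illustration.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: difference between the largest and smallest value.
value_range = max(values) - min(values)

# Midrange point: the point halfway between the two extremes.
midrange = (max(values) + min(values)) / 2

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

print(value_range, midrange, (q1, q2, q3), iqr)
```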
The ____ is more sensitive to outliers than the ____
median or mean
mean
median
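A small made-up example shows the effect: replacing a single value with an extreme one shifts the mean a lot and leaves the median unchanged.

```python
import statistics

# Hypothetical data: the second list replaces one value with an outlier.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)

median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_outlier)
print("median:", median_clean, "->", median_outlier)
```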
What is skewness
Skewness is defined as m_3/sigma^3, where m_3 is the third central moment and sigma is the standard deviation. A positive value of skewness means that the distribution is right-skewed, which means that the right tail is longer than the left tail. Negative skewness indicates the opposite.
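A minimal sketch of this definition, assuming population moments (dividing by n). The helper names `central_moment` and `skewness` and the data sets are hypothetical.

```python
import math

def central_moment(values, k):
    """k-th central moment: the mean of (x - mean)**k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3

# Made-up data sets, chosen only for illustration.
right_skewed = [1, 2, 2, 3, 10]      # long right tail
symmetric = [1, 2, 3, 4, 5]          # no skew
left_skewed = [-10, -3, -2, -2, -1]  # long left tail

for data in (right_skewed, symmetric, left_skewed):
    print(data, skewness(data))
```

The sign of the result follows the longer tail: positive for the first list, zero for the second, negative for the third.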
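What is kurtosis
Kurtosis is defined as m_4/sigma^4, where m_4 is the fourth central moment and sigma is the standard deviation. People often use excess kurtosis, m_4/sigma^4 - 3, which is 0 for a normal distribution. Positive excess kurtosis means that the distribution is more sharply peaked than the normal distribution.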
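A minimal sketch of excess kurtosis, assuming population moments (dividing by n). The helper names and the data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean)**k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def excess_kurtosis(values):
    """Fourth central moment over the squared variance, minus 3."""
    variance = central_moment(values, 2)  # variance equals sigma squared
    return central_moment(values, 4) / variance ** 2 - 3

# Made-up data sets, chosen only for illustration.
flat = [1, 2, 3, 4, 5]                      # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # mass at the centre, heavy tails

print(excess_kurtosis(flat))    # negative: flatter than a normal distribution
print(excess_kurtosis(peaked))  # positive: more sharply peaked
```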
When can structured features be constructed
- prior to learning the model
- during learning the model
What is normalisation
From quantitative to quantitative
Adapt the scale of quantitative features.
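Two common ways to adapt the scale are min-max scaling to [0, 1] and z-score standardisation. The sketch below assumes those two techniques; the function names and the data are hypothetical.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the values are not all equal (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def z_score(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mean) / sigma for x in values]

# Hypothetical quantitative feature.
heights_cm = [150, 160, 170, 180, 190]

print(min_max_scale(heights_cm))
print(z_score(heights_cm))
```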
What is calibration
From ordinal, categorical and boolean to quantitative
Adds a scale to features that don’t have one
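One possible way to add such a scale, shown here as an assumption rather than the lecture's method, is to replace each categorical value with the fraction of positive examples observed for it. The function name `calibrate` and the data are hypothetical.

```python
from collections import defaultdict

def calibrate(categories, labels):
    """Map each category to the fraction of positive labels seen with it."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for value, label in zip(categories, labels):
        counts[value][0] += int(label)
        counts[value][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}

# Made-up categorical feature and boolean class labels.
colours = ["red", "red", "blue", "blue", "blue", "green"]
is_positive = [True, False, True, True, False, False]

scale = calibrate(colours, is_positive)
calibrated = [scale[c] for c in colours]  # the feature is now quantitative

print(scale)
print(calibrated)
```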
What is discretisation
- from quantitative to ordinal
- from quantitative to categorical
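A minimal sketch of discretisation by thresholds. The thresholds, bin labels and function name are assumptions made for the example.

```python
import bisect

# Hypothetical thresholds splitting an age feature into three bins.
thresholds = [18, 65]
labels = ["minor", "adult", "senior"]

def discretise(value):
    """Return the label of the bin the value falls into.

    bisect_right counts how many thresholds are <= value, which is
    the index of the bin (a value equal to a threshold goes up a bin).
    """
    return labels[bisect.bisect_right(thresholds, value)]

ages = [5, 17, 18, 40, 64, 65, 90]
print([discretise(age) for age in ages])
```

Because the bins keep their natural order the result is ordinal; ignoring that order would give a categorical feature.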
What is ordering
- from ordinal to ordinal
- from categorical to ordinal
- from boolean to ordinal
What is unordering
from ordinal to categorical