Lecture 7 - Features Flashcards by Alexander Bazba

What are the 4 stages of data pre-processing?

Data cleaning
Data integration
Data reduction
Data transformation

What are features?

Features, also called attributes, are defined as mappings from the instance space to the feature domain.

What are the three main categories of feature statistics?

Statistics of central tendency
Statistics of dispersion
Shape statistics

What are the 3 main statistics of central tendency?

mean
median
mode
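
A quick way to see the three statistics side by side is Python's standard `statistics` module. The sample values below are made up for illustration.

```python
import statistics

# Illustrative sample of a quantitative feature (made-up values).
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # arithmetic average: sum / count
median = statistics.median(values)  # middle value; average of the two middle values when n is even
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```

Here the sample has an even number of values, so the median is the average of the two middle values (3 and 5).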

What are the 2 most common statistics of dispersion?

Variance σ^2
Standard deviation σ
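
A minimal sketch of both statistics, assuming the population definition (divide by n), where σ^2 is the mean squared deviation from the mean. The function names are hypothetical; the sample version divides by n - 1 instead (`statistics.variance` vs `statistics.pvariance`).

```python
import math

def variance(xs):
    """Population variance sigma^2: mean squared deviation from the mean (divides by n)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Standard deviation sigma: square root of the variance, in the feature's own units."""
    return math.sqrt(variance(xs))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
print(variance(values), std_dev(values))  # 4.0 2.0
```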

What other statistics of dispersion are there?

range
midrange point
quantiles
interquartile range
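
The four statistics can be computed as follows on a made-up sample. Note that quantile conventions differ between tools; this sketch uses `statistics.quantiles`, whose default is the 'exclusive' method, so other software may report slightly different quartiles.

```python
import statistics

values = [1, 3, 4, 6, 8, 9, 12, 15]  # illustrative sample

value_range = max(values) - min(values)         # spread between the two extremes
midrange = (max(values) + min(values)) / 2      # point halfway between the extremes
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                                   # spread of the middle 50% of the data

print(value_range, midrange, (q1, q2, q3), iqr)
```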

The ____ is more sensitive to outliers than the ____.

median or mean

mean
median
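
To see why, add one extreme value to a small, made-up sample and compare how far each statistic moves:

```python
import statistics

values = [10, 11, 12, 13, 14]
with_outlier = values + [100]  # the same sample plus one extreme value

# How far each statistic moves when the outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(mean_shift, median_shift)
```

The mean is dragged up by more than 14, while the median only moves from 12 to 12.5.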

What is skewness?

Skewness is defined as m_3/σ^3, where m_3 = (1/n) Σ (x_i - μ)^3 is the third central moment and σ is the standard deviation. A positive skewness means that the distribution is right-skewed, i.e. the right tail is longer than the left tail. Negative skewness indicates the opposite.
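
A minimal sketch of the formula, assuming σ is the population standard deviation (so σ^2 = m_2). The helper names are hypothetical and the samples are made up.

```python
def central_moment(xs, k):
    """k-th central moment m_k: the average of (x - mean)^k."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)

def skewness(xs):
    """Skewness m_3 / sigma^3, with sigma^2 = m_2 (population variance)."""
    sigma = central_moment(xs, 2) ** 0.5
    return central_moment(xs, 3) / sigma ** 3

right_tail = [1, 1, 2, 2, 3, 10]       # long right tail: positive skewness
left_tail = [-x for x in right_tail]   # mirror image: negative skewness
print(skewness(right_tail), skewness(left_tail))
```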

What is kurtosis?

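Kurtosis is defined as m_4/σ^4, where m_4 is the fourth central moment. Because the normal distribution has kurtosis 3, people often use the excess kurtosis m_4/σ^4 - 3. Positive excess kurtosis means that the distribution is more sharply peaked than the normal distribution.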
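
A minimal sketch of excess kurtosis, again assuming population moments (σ^2 = m_2). The function names are hypothetical and both samples are made up: one spread evenly, one with most of its mass at the centre plus a few extreme values.

```python
def central_moment(xs, k):
    """k-th central moment m_k: the average of (x - mean)^k."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)

def excess_kurtosis(xs):
    """m_4 / sigma^4 - 3, so a normal distribution scores about 0."""
    m2 = central_moment(xs, 2)
    return central_moment(xs, 4) / m2 ** 2 - 3

flat = list(range(1, 11))      # evenly spread values: negative excess kurtosis
peaked = [0] * 8 + [-5, 5]     # sharp central peak with rare extremes: positive
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```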

When can structured features be constructed?

prior to learning the model
while learning the model

What is normalisation?

From quantitative to quantitative
Adapt the scale of quantitative features.

What is calibration?

From ordinal, categorical and boolean to quantitative
Adds a scale to features that don’t have one
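
One simple way to add a scale to a categorical feature in a two-class problem is to replace each value by an estimate of the probability of the positive class among training instances with that value. The sketch below assumes this approach with a Laplace correction ((positives + 1) / (total + 2)); it is an illustration, not necessarily the exact method from the lecture, and the function name and data are hypothetical.

```python
from collections import Counter

def calibrate_categorical(values, labels):
    """Map each categorical value to a Laplace-corrected estimate of P(positive | value).

    values: categorical feature values; labels: booleans (True = positive class).
    """
    totals = Counter(values)
    positives = Counter(v for v, y in zip(values, labels) if y)
    # Adding 1 and 2 keeps rarely seen values away from the extremes 0 and 1.
    return {v: (positives[v] + 1) / (totals[v] + 2) for v in totals}

colour = ["red", "red", "blue", "blue", "blue", "green"]
is_positive = [True, True, False, False, True, False]
print(calibrate_categorical(colour, is_positive))
```

The resulting numbers form a quantitative feature, so, for example, "red" can now be compared with "blue" on a meaningful scale.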

What is discretisation?

from quantitative to ordinal
from quantitative to categorical

What is ordering?

from ordinal to ordinal
from categorical to ordinal
from boolean to ordinal

What is unordering?

from ordinal to categorical

What is grouping?

from categorical to categorical

What is thresholding?

from quantitative to boolean
from ordinal to boolean

What is binarisation?

from categorical to boolean
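
A common form of binarisation is one-hot encoding: one boolean feature per category. A minimal sketch, with a hypothetical function name and made-up values:

```python
def binarise(values):
    """Turn one categorical feature into one boolean feature per category (one-hot)."""
    categories = sorted(set(values))  # fixed order for the new boolean features
    return categories, [[v == c for c in categories] for v in values]

categories, rows = binarise(["red", "blue", "red", "green"])
print(categories)  # ['blue', 'green', 'red']
print(rows[0])     # [False, False, True]
```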

Define thresholding in words (not as a table).

Thresholding transforms a quantitative or an ordinal feature into a boolean feature by finding a feature value to split on.
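
Once the split value is chosen, applying it is a single comparison per instance. A minimal sketch with a hypothetical function name and made-up ages:

```python
def threshold(values, t):
    """Turn a quantitative (or ordinal) feature into a boolean: True where value > t."""
    return [v > t for v in values]

ages = [15, 22, 37, 41, 60]
print(threshold(ages, 30))  # [False, False, True, True, True]
```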

How do we set the threshold for thresholding?

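Supervised thresholding: use the class labels to choose the threshold, e.g. the split value that maximises a performance measure such as accuracy or information gain
Unsupervised thresholding: use a statistic of central tendency such as the mean or median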
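
The two approaches can be contrasted in a short sketch. Assumptions: the unsupervised version uses the median, and the supervised version scores candidate thresholds (midpoints between consecutive distinct values) by the accuracy of the rule "positive if value > t"; information gain is a common alternative criterion. Function names and data are hypothetical.

```python
import statistics

def unsupervised_threshold(values):
    """Unsupervised: use a central-tendency statistic (here the median)."""
    return statistics.median(values)

def supervised_threshold(values, labels):
    """Supervised: try midpoints between consecutive distinct values and keep the one
    whose rule 'positive if value > t' classifies the most instances correctly.
    Assumes at least two distinct values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(t):
        return sum((v > t) == y for v, y in zip(values, labels)) / len(values)

    return max(candidates, key=accuracy)

x = [1, 2, 3, 4, 10, 11, 12]
y = [False, False, False, False, True, True, True]
print(unsupervised_threshold(x), supervised_threshold(x, y))
```

The median ignores the labels entirely, whereas the supervised threshold lands in the gap that separates the two classes.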

Describe discretisation.

Discretisation transforms a quantitative feature into an ordinal feature by creating bins, where each bin is an interval.

Name and explain 2 types of discretisation.

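supervised: top-down (divisive) methods work by progressively splitting bins; bottom-up (agglomerative) methods work by progressively merging bins
unsupervised: e.g. equal-width discretisation (bins of equal width) or equal-frequency discretisation (bins containing equal numbers of instances)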
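
The two unsupervised schemes can be sketched as follows; each returns an ordinal bin index per value. Function names and data are hypothetical, equal-width binning assumes the values are not all identical, and tied values may fall into different equal-frequency bins in this simple version.

```python
def equal_width_bins(values, k):
    """Split [min, max] into k intervals of equal width; return a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes at least two distinct values
    # min(..., k - 1) puts the maximum value into the last bin rather than a bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Sort the values and give each of the k bins (roughly) the same number of instances."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

data = [1, 2, 3, 4, 5, 6, 7, 100]
print(equal_width_bins(data, 2))      # the outlier stretches the bins
print(equal_frequency_bins(data, 2))
```

With one outlier, equal-width binning puts almost everything into the first bin, while equal-frequency binning splits the data evenly.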

Define normalisation.

Feature normalisation neutralises the effect of different quantitative features being measured on different scales.

Give two formulas with which we can normalise data.

min-max: x' = (x - min) / (max - min)
z-scores: z = (x - μ) / σ
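
Both normalisations can be written in a few lines. This is a sketch with hypothetical function names and made-up heights, assuming σ is the population standard deviation (`statistics.pstdev`).

```python
import statistics

def min_max(xs):
    """Min-max scaling: (x - min) / (max - min), mapping the feature onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_scores(xs):
    """Standardisation: (x - mean) / std, giving mean 0 and standard deviation 1."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation (divides by n)
    return [(x - mu) / sigma for x in xs]

heights_cm = [150, 160, 170, 180, 190]
print(min_max(heights_cm))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_scores(heights_cm))
```

Min-max scaling is sensitive to outliers because it depends on the extremes; z-scores depend on the mean and standard deviation instead.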

What is PCA?

Principal component analysis is a feature-construction technique. It works by computing the principal components and using them to perform a change of basis on the data.

Can PCA be performed on quantitative features?

Yes

What is the idea of PCA?

The idea of PCA is to find correlations between features and to create new features, each represented as a linear combination of the original features, that capture these correlations.

In PCA, the sums of squared distances of the projected points from the origin are called ____.

eigenvalues

What are principal components?

Principal components are new features constructed as linear combinations of the original features.

Give 2 approaches to extracting principal components.

1. Singular value decomposition
2. Eigendecomposition
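
The eigendecomposition route can be sketched with NumPy on made-up, strongly correlated 2-D data. It also checks the eigenvalue card: for centred data, the sum of squared distances of the points projected onto each principal component equals the corresponding eigenvalue of the scatter matrix Xc^T Xc (which is n times the covariance matrix). The data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: the second feature is roughly twice the first plus noise.
x1 = rng.normal(size=200)
X = np.column_stack([x1, 2 * x1 + 0.3 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                     # centre each feature at the origin
scatter = Xc.T @ Xc                         # scatter matrix (n times the covariance)
eigvals, eigvecs = np.linalg.eigh(scatter)  # eigh: symmetric matrices, ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # sort components by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

projected = Xc @ eigvecs                    # coordinates along each principal component
# Per component: sum of squared distances of projected points from the origin.
print(eigvals)
print((projected ** 2).sum(axis=0))         # equal to eigvals up to rounding
```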

How does singular value decomposition work?

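SVD factorises the centred data matrix X (one row per instance, one column per feature) as X = U Σ V^T. The columns of V are the principal components, and the squared singular values on the diagonal of Σ are the eigenvalues of X^T X.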
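
A sketch of the SVD route with NumPy, assuming the same illustrative data layout as above (instances as rows). It never forms X^T X explicitly: the principal directions come from the right singular vectors, and the eigenvalues from the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, 2 * x1 + 0.3 * rng.normal(size=200)])  # illustrative correlated data
Xc = X - X.mean(axis=0)  # centre the data first, as PCA assumes

# Thin SVD: Xc = U @ diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt.T         # columns: principal directions (eigenvectors of Xc.T @ Xc, up to sign)
eigenvalues = s ** 2      # squared singular values = eigenvalues of the scatter matrix Xc.T @ Xc
scores = Xc @ components  # data expressed in the principal-component basis (equals U * s)

print(eigenvalues)
```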

What is imputation?

Imputation is the process of filling in missing data

Name 3 imputation techniques.

1. Mean imputation
2. Regression imputation
3. Expectation maximisation

What is mean imputation?

Calculate the per-class mean, median or mode of the feature and use it to fill in the missing values.

What is regression imputation?

A regression model is estimated to predict the observed values of a variable based on other variables; its predictions are then used to fill in the missing values of that variable.
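
A minimal sketch with a single predictor, assuming missing values are marked with `None` and a least-squares line y ≈ a + b*x fitted only on the rows where y is observed. Names and data are hypothetical; in practice the model would usually use several predictor features.

```python
def regression_impute(x, y):
    """Fill missing y values (None) with predictions from a least-squares line
    y = a + b*x, fitted only on the rows where y is observed."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mx = sum(xi for xi, _ in pairs) / n
    my = sum(yi for _, yi in pairs) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in pairs) / sum((xi - mx) ** 2 for xi, _ in pairs)
    a = my - b * mx
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]

height = [150, 160, 170, 180, 190]
weight = [50, 60, None, 80, 90]
print(regression_impute(height, weight))
```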

What is expectation maximisation?

Assuming a multivariate model over all features, use the observed values for maximum-likelihood estimation of the model parameters, then derive expectations for the unobserved feature values, and iterate.

(36 cards)