Pre-processing Flashcards

1
Q

general steps

A

FeFeSPoT
* transformations (centering/scaling, skewness corrections such as Box-Cox)
* feature extraction
* feature engineering
* predictor selection
* supervised vs. unsupervised: supervised methods take the outcome variable into account (e.g., PLS)
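
The transformation step can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the data is made up, and lambda is assumed to be given (in practice it is usually estimated from the data).

```python
import math
import statistics

def center_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def box_cox(values, lam):
    """Box-Cox transform with a fixed, user-supplied lambda (an assumption here)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # lambda = 0 is defined as the natural log
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

skewed = [1.0, 2.0, 4.0, 8.0, 16.0]  # made-up right-skewed predictor
unskewed = box_cox(skewed, 0)        # reduce skewness first
scaled = center_scale(unskewed)      # then center and scale
```

After both steps the predictor has mean 0 and standard deviation 1, which puts it on a common footing with other predictors.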

2
Q

feature extraction

A
  • aka signal extraction
  • identifying and extracting features relevant for a particular problem
  • e.g., PCA (which amounts to dimensionality reduction)
3
Q

one-hot encoding / indicator variables

A

One-hot encoding usually means giving every factor level its own 0/1 column, while indicator (dummy) variables typically leave one level out as a reference, to avoid collinearity (the full set of columns always sums to 1).
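
A small sketch of the difference, using hypothetical helper names and a made-up factor. Which level is dropped as the reference is a convention; here it is assumed to be the first level in sorted order.

```python
def one_hot(values):
    """One 0/1 column per factor level."""
    levels = sorted(set(values))
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows

def dummy(values):
    """Same encoding, but the first level is dropped as the reference."""
    kept = sorted(set(values))[1:]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

colors = ["red", "green", "blue", "green"]  # made-up factor
oh_levels, oh_rows = one_hot(colors)        # 3 columns: blue, green, red
dm_levels, dm_rows = dummy(colors)          # 2 columns: green, red
```

Every one-hot row sums to 1, which is the source of the collinearity. In the dummy version the reference level ("blue" here) is encoded as all zeros.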

4
Q

maximum dissimilarity sampling

A
  • a way to split data into train/test sets by choosing instances that are maximally separated from one another in predictor space
  • can also be conditioned on a per-class basis for classification problems
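
One common greedy variant can be sketched like this. Several details are assumptions: Euclidean distance, a single seed point, and "dissimilarity to the selected set" taken as the distance to the nearest selected point. The function name and data are hypothetical.

```python
import math

def max_dissimilarity_sample(points, n_select, start=0):
    """Greedily pick indices of points that are far from those already picked."""
    selected = [start]
    remaining = [i for i in range(len(points)) if i != start]
    while len(selected) < n_select and remaining:
        # Score each candidate by its distance to the nearest selected point,
        # then take the candidate with the largest score.
        best = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[j]) for j in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Made-up 2-D predictors: two tight pairs and one isolated point
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 4.0), (5.1, 5.0)]
picked = max_dissimilarity_sample(points, 3)
```

The selection skips near-duplicates of points it already holds, so the chosen subset spreads across predictor space.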
5
Q

resampling types

A
  • LOOCV (leave-one-out cross-validation): for n samples, one is held out for testing and the other n-1 are used for training, repeated so that each sample is held out once
  • LGOCV (leave-group-out cross-validation), aka Monte Carlo cross-validation: fix a train/test ratio, then randomly resplit the dataset on the fly over some number of repetitions
  • bootstrap
    • random train/test split produced by drawing k samples out of n total instances, with replacement
    • some samples will likely not be picked (because of repeats); these are the “out of bag” samples and are used for testing
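
The three schemes can be sketched as index generators. The function names are hypothetical, and the bootstrap is assumed to default to k = n draws, which is the usual choice.

```python
import random

def loocv_splits(n):
    """Yield (train, test) index lists; each sample is the test set once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def lgocv_splits(n, train_fraction, repeats, rng):
    """Monte Carlo CV: a fresh random split at a fixed ratio on each repeat."""
    n_train = int(round(train_fraction * n))
    for _ in range(repeats):
        shuffled = rng.sample(range(n), n)  # random permutation of the indices
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])

def bootstrap_split(n, rng, k=None):
    """Draw k indices with replacement; unpicked indices are out of bag."""
    k = n if k is None else k
    train = [rng.randrange(n) for _ in range(k)]
    chosen = set(train)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return train, out_of_bag

rng = random.Random(0)  # seeded only to make the sketch repeatable
loo = list(loocv_splits(5))
lgo = list(lgocv_splits(10, 0.8, 3, rng))
boot_train, boot_oob = bootstrap_split(10, rng)
```

Note that the bootstrap training list can contain repeated indices, while the LOOCV and LGOCV training sets cannot.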
6
Q

principal component analysis (PCA)

A
  • works in the per-sample space
  • finds mutually orthogonal linear combinations of predictors, each accounting for as much of the remaining variance as possible
  • the components are the eigenvectors of the predictors’ covariance matrix (which is inherently symmetric); the covariances are computed over all samples
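
A minimal sketch of the idea for the first component only. To stay within the standard library it uses power iteration rather than a full eigendecomposition, which is a substitution for illustration. The function names and data are hypothetical.

```python
import math

def covariance_matrix(rows):
    """p x p sample covariance of the predictors, computed over all samples."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]

def leading_component(cov, iterations=200):
    """Power iteration: converges to the eigenvector with the largest eigenvalue.

    Assumes the starting vector is not orthogonal to that eigenvector.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v
    cov_v = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(p))
    return eigenvalue, v

# Made-up data: two strongly correlated predictors, five samples
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.9)]
cov = covariance_matrix(rows)
eigenvalue, pc1 = leading_component(cov)
explained = eigenvalue / (cov[0][0] + cov[1][1])  # share of total variance
```

Because the two predictors are nearly collinear, the first component captures almost all of the total variance, which is the sense in which PCA reduces dimensionality.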