Xijia Flashcards
What is a kernel function and its role in an algorithm?
A function that’s used when non-linear data is projected into higher dimensions ( x => Ø(x) ). The kernel function performs this computation implicitly.
Given x1 and x2 it returns < Ø(x1) , Ø(x2) > without calculating Ø(x).
One is therefore able to transform the data without the need for excessive computing power.
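A minimal sketch of the kernel trick (my own illustration, not from the course material): for the degree-2 polynomial kernel the explicit feature map Ø is known, so we can check that the kernel value equals < Ø(x1), Ø(x2) > without the kernel ever forming Ø(x).

```r
# Explicit feature map for 2-D input and the degree-2 polynomial kernel
phi <- function(x) {
  c(1, sqrt(2) * x[1], sqrt(2) * x[2],
    x[1]^2, x[2]^2, sqrt(2) * x[1] * x[2])
}

poly_kernel <- function(x, z, d = 2) (sum(x * z) + 1)^d

x1 <- c(1, 2)
x2 <- c(3, -1)

sum(phi(x1) * phi(x2))   # inner product computed in the feature space
poly_kernel(x1, x2)      # same value, computed without ever building phi(x)
```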
What is the basic property of a kernel function? (Gram matrix)
The kernel function has one central component, the Gram matrix, which is made up from the:
- kernel
- training set
The Gram matrix contains the evaluation of the kernel function on all pairs of data points. All the information passed to the algorithm must go through it.
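A minimal sketch (toy data; the degree-2 polynomial kernel from the previous card is assumed) of building the Gram matrix K[i, j] = k(xi, xj) over all pairs of training points.

```r
poly_kernel <- function(x, z, d = 2) (sum(x * z) + 1)^d

set.seed(1)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)   # 10 training points in 2-D
n <- nrow(X)

K <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    K[i, j] <- poly_kernel(X[i, ], X[j, ])    # kernel on every pair
  }
}

dim(K)          # n x n: grows with the number of data points, not dimensions
isSymmetric(K)  # a valid Gram matrix is symmetric (and positive semi-definite)
```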
Polynomial Kernel
F(x, xj) = (x · xj + 1)^d
A generalized version of the linear kernel; often not the preferred choice.
+ more powerful than linear kernel
+ the degree d gives direct control over the model's complexity
- more hyperparameters
- a high polynomial degree easily leads to overfitting
RBF kernel
F(x, xj) = exp( -||x - xj||^2 / (2σ^2) )
σ => controls the complexity of the model
It is one of the most preferred and widely used kernel functions. It is usually chosen for non-linear data and gives a proper separation even when there is no prior knowledge of the data.
+ only one hyperparameter
+ less computational effort
- so powerful and flexible that it can easily overfit
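A minimal sketch of the RBF kernel as a plain R function (σ is the single hyperparameter); the calls illustrate how σ controls how local, and therefore how complex, the resulting model is.

```r
rbf_kernel <- function(x, z, sigma = 1) {
  exp(-sum((x - z)^2) / (2 * sigma^2))
}

x  <- c(0, 0)
xj <- c(1, 1)

# Small sigma: the kernel value drops off quickly with distance (very local,
# flexible model). Large sigma: almost constant (smooth, simple model).
rbf_kernel(x, xj, sigma = 0.5)
rbf_kernel(x, xj, sigma = 1)
rbf_kernel(x, xj, sigma = 5)
```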
What is the basic idea of kernel methods?
Functions that, given x1 and x2, return < Ø(x1), Ø(x2) > without calculating Ø(x).
Transform feature vectors into (even infinite-dimensional) spaces without extra computational burden.
Why is the kernel method viewed as the memory-based method?
Memory-based methods keep the training samples and use them during the prediction phase.
Kernel functions store everything in the Gram matrix (the kernel evaluated on the training set).
Hence they are memory-based.
What is the basic idea of ensemble methods?
Use multiple weak learners together to get better predictive performance.
ex:
{bagging => random forest}
{boosting => AdaBoost}
What is the difference between kernel methods and Ensemble methods?
The main goal of kernel methods is to help classification by adding dimensions to the data.
Ensemble methods use multiple methods and combine them to one classification.
Why can bagging (bootstrap aggregation) help us to improve the predictions?
Bagging is when we draw many random bootstrap samples (with replacement) from the training data, train a model on each, and take the average:
+ Raises stability of the model
+ Reduces overfitting
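A minimal bagging sketch (assuming the rpart package and the built-in iris data, both my choice): bootstrap samples, one tree per sample, majority vote over the trees.

```r
library(rpart)

set.seed(1)
n_models <- 25
trees <- vector("list", n_models)

for (m in 1:n_models) {
  idx <- sample(nrow(iris), replace = TRUE)            # bootstrap sample
  trees[[m]] <- rpart(Species ~ ., data = iris[idx, ]) # one model per sample
}

# Majority vote over the individual trees
votes <- sapply(trees, function(tr) as.character(predict(tr, iris, type = "class")))
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(bagged_pred == iris$Species)   # accuracy of the bagged ensemble
```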
What is the difference between bagging and adaboost?
Both are ensemble methods that use random sampling from the training data, but boosting redistributes its sample weights after each training step.
Understand how the Perceptron algorithm works
- Needs linearly separable data
- Input: w = (w0,w1,w2)^T {Dimension +1} {w0 = bias}
sign( w0*x0 + w1*x1 + w2*x2 ) = (+) or (-)
If misclassification:
wi = wi + N*d*xi
(new weight = old weight + learning-rate step)
N = learning rate ( how fast we are going to step towards the line ), d = {+1 if the missed point should be above the line} {-1 if the missed point should be below the line}
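A minimal perceptron sketch (my own illustration on assumed linearly separable 2-D toy data; eta plays the role of the learning rate N above).

```r
set.seed(1)
X <- rbind(matrix(rnorm(40, mean =  2), ncol = 2),
           matrix(rnorm(40, mean = -2), ncol = 2))
y <- c(rep(1, 20), rep(-1, 20))          # class labels d = +1 / -1

X1  <- cbind(1, X)                       # prepend x0 = 1 so w0 acts as bias
w   <- rep(0, 3)                         # w = (w0, w1, w2)
eta <- 0.1                               # learning rate

for (epoch in 1:100) {
  for (i in 1:nrow(X1)) {
    pred <- sign(sum(w * X1[i, ]))
    if (pred != y[i]) {                  # misclassification:
      w <- w + eta * y[i] * X1[i, ]      # wi = wi + N * d * xi
    }
  }
}

mean(sign(X1 %*% w) == y)                # should reach 1 on separable data
```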
Features of Perceptron algorithm?
- {supervised learning}
- {optimal weight coefficients are automatically learned}
Understand how to kernelize the linear PCA method.
- The first axis is the most important one (x1 is not on the same scale as x2)
- PCA’s covariance matrix scales with the number of input dimensions
- Kernel PCA’s kernel matrix scales with the number of datapoints.
Be able to implement kernel PCA from scratch in R
- Pick kernel function
- Calculate the kernel matrix
- Center the kernel matrix
- Solve the Eigenproblem
- Project the data onto the principal components (eigenvectors)
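A sketch of these steps from scratch in R (assumed RBF kernel with sigma = 1 and the built-in iris data; the eigenvector normalization shown is one common convention).

```r
set.seed(1)
X <- as.matrix(iris[, 1:4])
n <- nrow(X)

# 1. Pick a kernel function
rbf <- function(x, z, sigma = 1) exp(-sum((x - z)^2) / (2 * sigma^2))

# 2. Calculate the kernel (Gram) matrix
K <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) K[i, j] <- rbf(X[i, ], X[j, ])

# 3. Center the kernel matrix
one_n <- matrix(1 / n, n, n)
K_c <- K - one_n %*% K - K %*% one_n + one_n %*% K %*% one_n

# 4. Solve the eigenproblem
eig <- eigen(K_c, symmetric = TRUE)

# 5. Project the data onto the first two kernel principal components
alphas <- eig$vectors[, 1:2] %*% diag(1 / sqrt(eig$values[1:2]))
proj   <- K_c %*% alphas
plot(proj, col = iris$Species, xlab = "KPC1", ylab = "KPC2")
```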
Understand how to kernelize the ridge regression method.
Ridge regression is regression where the cost function is altered by adding a penalty term on the magnitude of the coefficients.
With the kernel alteration xi –> Ø(xi) we can work in (even infinitely) high-dimensional feature spaces.
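A minimal kernel ridge regression sketch (assumed RBF kernel on 1-D toy data; lambda is the regularization constant and sigma the kernel parameter, i.e. the two hyperparameters of the next card).

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.2)

rbf <- function(a, b, sigma = 1) exp(-(a - b)^2 / (2 * sigma^2))

lambda <- 0.1
K <- outer(x, x, rbf)                             # Gram matrix on training inputs
alpha <- solve(K + lambda * diag(length(x)), y)   # dual coefficients

# Predict at new points: f(x*) = sum_i alpha_i * k(x_i, x*)
x_new <- seq(0, 10, length.out = 200)
y_hat <- outer(x_new, x, rbf) %*% alpha

plot(x, y)
lines(x_new, y_hat, col = "blue")
```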
Understand the meaning of the hyperparameters in KRR.
- Kernel Parameter
- Regularization constant
formulation of maximum margin classifier (MMC)
Maximum margin classifier
min {b,w} 1/2 w^T*w
s.t. yi(w^T*xi + b) >= 1 for all i = 1, …, N
formulation of soft margin classifier (SMC)
- Give up some high noise cases
- Introduce slack variables ξi
min {b,w,ξ} 1/2 w^T*w + C * sum of ξi, i = 1 to N
s.t. yi(w^T*xi + b) >= 1 - ξi and ξi >= 0 for every i
Hyperparameter: large C => less noise tolerance, high cost for slack
Understand the difference and connections between MMC, SMC, and SVM
(SVM is referred to as a kernelized SMC)
SVM = SMC + kernel trick
SMC = MMC + penalty on slackness parameter
formulation of SVM
- Works like logistic regression or the perceptron
- among the multiple hyperplanes that separate the data, the maximum-margin one is chosen
Why SVM is called a sparse kernel method?
Only the “outliers” (the support vectors) matter for our model. We can add new data points behind the margin and the model won’t change. That’s why it is called a sparse model.
What are the hyper-parameters in RBF-kernel SVM? What is the meaning of them?
- Gamma defines how far the influence of a single training example reaches.
- C - trades correct classification of training examples against maximization of the decision margin.
Understand the outputs of the function ’ksvm’ from ’kernlab’ package
- alpha - the resulting support vectors (alpha vector)
- alphaindex - index of the resulting support vectors
- coef - the corresponding coefficients
- b - the negative intercept
- nSV - the number of support vectors
- obj - the value of the objective function
- error - training error
- cross - cross-validation error
- prob.model - width of the Laplacian fitted
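A minimal usage sketch (assuming the built-in iris data and an RBF kernel) of fitting ksvm and reading those outputs through kernlab's accessor functions.

```r
library(kernlab)

fit <- ksvm(Species ~ ., data = iris,
            kernel = "rbfdot", kpar = list(sigma = 0.1),
            C = 1, cross = 5)

nSV(fit)          # number of support vectors
b(fit)            # negative intercept(s)
alphaindex(fit)   # indices of the support vectors in the training data
coef(fit)         # their corresponding coefficients
error(fit)        # training error
cross(fit)        # cross-validation error
```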
Know how to train a random forest
- Random sampling with replacement when choosing data, n times (bootstrap)
- Randomly select features and build a decision tree for each dataset (n)
- Use majority voting over the decision trees
- bootstrapping + aggregation = bagging
Look this one up so that you understand it
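A minimal training sketch (assuming the randomForest package and the built-in iris data) using the hyperparameters from the card further down.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,    # number of trees to grow
                   mtry = 2,       # variables tried at each node split
                   nodesize = 1)   # minimum number of observations in a leaf

print(rf)   # shows the OOB error estimate and the confusion matrix
```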
Know how to train a decision tree
- Look this up in more detail so that you can describe it
Connection between decision tree and RF?
A random forest is made up of decision trees that use voting to classify the sample.
What are the hyper-parameters in RF? What is the meaning for them?
mtry - the number of variables tried at each node split (only two variables?)
nodesize - the minimum number of observations in the leaves
ntree - the number of trees to grow.
Why Random forest is self-validated?
Random forest is built on bagging. In each round of bootstrap roughly 2/3 of the samples will be included for training, so each decision tree can be evaluated on the remaining 1/3.
What is OOB error? (Random forest)
Out-of-bag error - a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging).
How to apply Random forest for features selection?
Random forests are good for feature selection since the features are naturally ranked by how well they improve the purity of the nodes. This is called mean decrease in impurity. Nodes with the greatest decrease in impurity are at the top of the tree.
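A minimal sketch (assuming the randomForest package and the built-in iris data) of ranking features by importance for feature selection.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)

importance(rf)    # per-feature mean decrease in accuracy / Gini impurity
varImpPlot(rf)    # plot the ranking; keep the top-ranked features
```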
Understand the outputs from ’randomForest’ package
The fitted object contains, among other things:
- predicted - the OOB-predicted values for the training data
- err.rate - OOB error rate, one row per tree (classification)
- confusion - OOB confusion matrix (classification)
- votes - proportion of OOB votes per class for each observation
- oob.times - number of times each observation was out-of-bag
- importance - variable importance measures
- mtry, ntree - the hyperparameters used
- forest - the list of fitted trees
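A minimal inspection sketch (assuming the same iris fit as in the training card above) showing how to read these components off a randomForest object.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris)

names(rf)                 # all components of the fitted object
rf$confusion              # OOB confusion matrix
head(rf$votes)            # OOB class-vote proportions per observation
rf$err.rate[rf$ntree, ]   # final OOB error rate (overall and per class)
```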
Understand the basic idea of the Adaboost algorithm.
- Combines a lot of weak learners
- Some stumps/trees get more say in the final classification
- Each new stump is made by taking the previous stump's mistakes into account
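A minimal hand-rolled AdaBoost sketch (my own illustration with 1-D decision stumps on toy data, not from the course material) showing the two mechanisms above: the amount of say (alpha) and the reweighting of the previous stump's mistakes.

```r
set.seed(1)
n <- 100
x <- runif(n)
y <- ifelse(x > 0.5, 1, -1)
flip <- sample(n, 10); y[flip] <- -y[flip]   # add some label noise

w <- rep(1 / n, n)                           # equal sample weights to start
stumps <- list()

for (m in 1:10) {
  # Weak learner: the threshold stump with the lowest weighted error
  thresholds <- sort(unique(x))
  errs <- sapply(thresholds, function(t) sum(w * (ifelse(x > t, 1, -1) != y)))
  t_best <- thresholds[which.min(errs)]
  pred   <- ifelse(x > t_best, 1, -1)

  err   <- sum(w * (pred != y)) / sum(w)
  alpha <- 0.5 * log((1 - err) / max(err, 1e-10))   # low error => more say

  w <- w * exp(-alpha * y * pred)              # up-weight the mistakes
  w <- w / sum(w)

  stumps[[m]] <- list(threshold = t_best, alpha = alpha)
}

# Final prediction: weighted vote of all stumps
F_x <- rowSums(sapply(stumps, function(s) s$alpha * ifelse(x > s$threshold, 1, -1)))
mean(sign(F_x) == y)
```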