Xijia Flashcards

1
Q

What is a kernel function and what is its role in an algorithm?

A

A function used when non-linear data is projected into higher dimensions ( x => φ(x) ). The kernel function performs this computation implicitly, in one step.

Given x1 and x2 it returns < φ(x1), φ(x2) > without calculating φ(x).

One is therefore able to transform the data without the need for excessive computing power.
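A tiny numeric sketch of this idea (my own illustrative example, assuming a degree-2 polynomial kernel): the kernel value equals the inner product of the explicit feature maps, but never builds φ(x).

```r
# Explicit degree-2 feature map phi for 2-D inputs (illustrative)
phi <- function(x) c(x[1]^2, x[2]^2, sqrt(2) * x[1] * x[2],
                     sqrt(2) * x[1], sqrt(2) * x[2], 1)
# The matching kernel computes <phi(x1), phi(x2)> directly
k <- function(x1, x2) (sum(x1 * x2) + 1)^2

x1 <- c(1, 2); x2 <- c(3, -1)
sum(phi(x1) * phi(x2))  # 4: inner product in the 6-D feature space
k(x1, x2)               # 4: same value, no phi required
```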

2
Q

What is the basic property of a kernel function? (Gram matrix)

A

The kernel function has one central component: the Gram matrix, which is built from the:

  • kernel
  • training set

The Gram matrix contains the evaluations of the kernel function on all pairs of data points. All the information available to the algorithm must pass through it.
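A minimal sketch of building a Gram matrix in R (the RBF kernel with sigma = 1 and a tiny iris subset are my own arbitrary choices):

```r
# Gram matrix: the kernel evaluated on all pairs of training points
X <- as.matrix(iris[1:5, 1:4])        # tiny training set, n = 5
K <- exp(-as.matrix(dist(X))^2 / 2)   # RBF kernel, sigma = 1
dim(K)   # 5 x 5: one entry per pair of data points
```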

3
Q

Polynomial Kernel

A

F(x, xj) = (x · xj + 1)^d

A generalized version of the linear kernel; usually not the preferred choice.

+ more powerful than the linear kernel
+ strong control over the model via the degree d
- more hyperparameters
- a high polynomial degree can overfit and become numerically unstable
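As a one-line R function (my own sketch; d is the degree hyperparameter):

```r
# Polynomial kernel: (x . xj + 1)^d
poly_kernel <- function(x, xj, d = 2) (sum(x * xj) + 1)^d
poly_kernel(c(1, 2), c(3, -1), d = 3)   # (1 + 1)^3 = 8
```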

4
Q

RBF kernel

A

F(x, xj) = exp( -||x - xj||^2 / (2σ^2) )
σ => controls the complexity of the model

It is one of the most preferred and widely used kernel functions. It is usually chosen for non-linear data and helps to find a proper separation when there is no prior knowledge of the data.

+ only one hyperparameter
+ less computation
- can be too powerful and flexible (risk of overfitting)
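A matching R sketch (my own example; sigma is the single hyperparameter the card mentions):

```r
# RBF kernel: exp(-||x - xj||^2 / (2 * sigma^2))
rbf_kernel <- function(x, xj, sigma = 1) exp(-sum((x - xj)^2) / (2 * sigma^2))
rbf_kernel(c(1, 2), c(1.1, 2.1))   # near 1 for close points
rbf_kernel(c(1, 2), c(9, -7))      # near 0 for distant points
```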

5
Q

What is the basic idea of kernel methods?

A

Functions that, given x1 and x2, return < φ(x1), φ(x2) > without calculating φ(x).

Transform feature vectors into (possibly infinite) dimensions without extra computational burden.

6
Q

Why is the kernel method viewed as a memory-based method?

A

Memory-based methods keep the training samples and use them during the prediction phase.

Kernel methods store everything in the Gram matrix (kernel information and training set).

Hence they are memory-based.

7
Q

What is the basic idea of ensemble methods?

A

Use multiple weak learners together to achieve better predictive performance.

ex:
{bagging & random forest}

{boosting & AdaBoost}

8
Q

What is the difference between kernel methods and ensemble methods?

A

Kernel methods' main goal is to help classification by (implicitly) adding dimensions to the data.

Ensemble methods use multiple models and combine them into one classification.

9
Q

Why can bagging (bootstrap aggregation) help us to improve the predictions?

A

Bagging is when we draw many random bootstrap samples (with replacement) from the training set, train a model on each, and take the average:

+ Raises stability of the model
+ Reduces overfitting
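A minimal bagging sketch in R (my own example using linear models on the built-in mtcars data, purely for illustration):

```r
set.seed(42)
preds <- replicate(100, {
  idx <- sample(nrow(mtcars), replace = TRUE)   # bootstrap sample
  fit <- lm(mpg ~ wt + hp, data = mtcars[idx, ])
  predict(fit, newdata = mtcars)                # each model predicts on all data
})
bagged <- rowMeans(preds)   # aggregate: average the 100 models' predictions
```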

10
Q

What is the difference between bagging and AdaBoost?

A

Both are ensemble methods that sample randomly from the training data, but boosting redistributes its sample weights after each training step.

11
Q

Understand how the perceptron algorithm works.

A
  • Needs linearly separable data
  • Input: w = (w0,w1,w2)^T {Dimension +1} {w0 = bias}

sign( w0*x0 + w1*x1 + w2*x2 ) = (+) or (-)

If misclassification:
wi = wi + η*d*xi
(new weight = old weight + a learning-rate step)

η = learning rate (how fast we step towards the line)
d = {+1 if the misclassified point should be above the line} {-1 if it should be below the line}

(see the sketch after the features list)

Features?

  • {supervised learning}
  • {optimal weight coefficients are automatically learned}
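A minimal perceptron sketch in R (my own implementation of the update rule above; labels d must be -1/+1):

```r
perceptron <- function(X, d, eta = 0.1, epochs = 100) {
  X <- cbind(1, X)            # prepend x0 = 1 so that w0 acts as the bias
  w <- rep(0, ncol(X))
  for (e in 1:epochs) {
    for (i in 1:nrow(X)) {
      if (sign(sum(w * X[i, ])) != d[i]) {   # misclassified?
        w <- w + eta * d[i] * X[i, ]         # wi = wi + eta * d * xi
      }
    }
  }
  w   # learned weights (w0, w1, w2, ...)
}

# Usage on a linearly separable toy set
X <- rbind(c(2, 3), c(3, 4), c(-1, -2), c(-2, -1))
d <- c(1, 1, -1, -1)
perceptron(X, d)
```
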
12
Q

Features of Perceptron algorithm?

A
  • {supervised learning}
  • {optimal weight coefficients are automatically learned}

13
Q

Understand how to kernelize the linear PCA method.

A
  • The first axis is of highest importance (x1 is not on the same scale as x2)
  • PCA’s covariance matrix scales with the number of input dimensions
  • Kernel PCA’s kernel matrix scales with the number of datapoints.
14
Q

Be able to implement kernel PCA from scratch in R

A
  1. Pick kernel function
  2. Calculate the kernel matrix
  3. Center the kernel matrix
  4. Solve the Eigenproblem
  5. Project the data onto the top eigenvectors
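A from-scratch sketch of these five steps in R (RBF kernel assumed; sigma and the iris data are my own arbitrary choices):

```r
kernel_pca <- function(X, sigma = 1, n_comp = 2) {
  n <- nrow(X)
  # 1-2. Pick a kernel and compute the kernel matrix
  K <- exp(-as.matrix(dist(X))^2 / (2 * sigma^2))
  # 3. Center the kernel matrix in feature space
  one_n <- matrix(1 / n, n, n)
  Kc <- K - one_n %*% K - K %*% one_n + one_n %*% K %*% one_n
  # 4. Solve the eigenproblem
  eig <- eigen(Kc, symmetric = TRUE)
  # 5. Project the data onto the normalized top eigenvectors
  sweep(Kc %*% eig$vectors[, 1:n_comp], 2, sqrt(eig$values[1:n_comp]), "/")
}

proj <- kernel_pca(as.matrix(iris[, 1:4]), sigma = 2)   # two components
```
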
15
Q

Understand how to kernelize the ridge regression method.

A

Ridge regression is regression where the cost function is altered by adding a penalty term on the magnitude of the coefficients.

With the kernel substitution xi -> φ(xi) we can analyze infinitely higher dimensions.
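A minimal kernel ridge regression sketch in R (RBF kernel; lambda is the regularization constant; the function names and data are my own illustrative choices):

```r
krr_fit <- function(X, y, sigma = 1, lambda = 0.1) {
  K <- exp(-as.matrix(dist(X))^2 / (2 * sigma^2))
  # Dual solution: alpha = (K + lambda * I)^-1 y
  alpha <- solve(K + lambda * diag(nrow(X)), y)
  list(alpha = alpha, X = X, sigma = sigma)
}

krr_predict <- function(m, Xnew) {
  # Squared distances between new and training points
  d2 <- outer(rowSums(Xnew^2), rowSums(m$X^2), "+") - 2 * Xnew %*% t(m$X)
  drop(exp(-d2 / (2 * m$sigma^2)) %*% m$alpha)
}

m <- krr_fit(as.matrix(mtcars[, c("wt", "hp")]), mtcars$mpg)
head(krr_predict(m, as.matrix(mtcars[, c("wt", "hp")])))
```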

16
Q

Understand the meaning of the hyperparameters in KRR.

A
  • Kernel parameter
  • Regularization constant

17
Q

formulation of maximum margin classifier (MMC)

A

Maximum margin classifier
min_{b,w} 1/2 w^T*w

s.t. yi(w^T*xi + b) >= 1 for all i = 1,…,N

18
Q

formulation of soft margin classifier (SMC)

A
  • Give up some high-noise cases
  • Introduce slack variables

min_{b,w,ξ} 1/2 w^T*w + C * sum_{i=1}^{N} ξi

s.t. yi(w^T*xi + b) >= 1 - ξi and ξi >= 0 for all i

Hyperparameter: a large C means less noise tolerance (high cost for slack).

19
Q

Understand the difference and connections between MMC, SMC, and SVM

(SVM is referred to as a kernelized SMC)

A

SVM = SMC + kernel trick

SMC = MMC + penalty on slackness parameter

20
Q

formulation of SVM

A
  • Works like logistic regression or the perceptron
  • Among the multiple hyperplanes that could separate the data, SVM picks the one with the maximum margin

21
Q

Why is SVM called a sparse kernel method?

A

Only the “outliers” (the support vectors on or inside the margin) matter for our model. We can add new data points beyond the margin and the model won’t change. That’s why it’s called a sparse model.

22
Q

What are the hyper-parameters in RBF-kernel SVM? What is the meaning of them?

A
  • Gamma defines how far the influence of a single training example reaches.
  • C trades correct classification of training examples against maximization of the decision margin.
23
Q

Understand the outputs of the function ’ksvm’ from the ’kernlab’ package

A
alpha - the resulting support vectors
alphaindex - the index of the resulting support vectors
coef - the corresponding coefficients
b - the negative intercept
nSV - the number of support vectors
obj - the value of the objective function
error - the training error
cross - the cross-validation error
prob.model - the width of the fitted Laplacian
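A usage sketch (the accessor names follow the kernlab documentation; the data and parameter values are my own choices):

```r
library(kernlab)
model <- ksvm(Species ~ ., data = iris, kernel = "rbfdot",
              kpar = list(sigma = 0.1), C = 1, cross = 5)
nSV(model)     # number of support vectors
error(model)   # training error
cross(model)   # cross-validation error
b(model)       # negative intercept
```
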
24
Q

Know how to train a random forest

A
  • Randomly sample the data with replacement n times.
  • Randomly select features and build a decision tree on each dataset (n trees).
  • Use majority voting over the decision trees
  • bootstrapping + aggregation = bagging

Look this one up so that you understand it.
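A training sketch with the randomForest package (the parameter values are my own arbitrary choices):

```r
library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,    # number of trees to grow
                   mtry = 2,       # variables tried at each split
                   nodesize = 1)   # minimum size of terminal nodes
rf                                 # prints the OOB error estimate
```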

25
Q

Know how to train a decision tree

A
  • Recursively split the data on the feature and threshold that give the largest decrease in impurity (e.g. Gini or entropy), stopping at a criterion such as minimum node size or maximum depth.
  • (Look this up in more detail so that you can describe it.)
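A minimal training sketch with rpart (my own example; the package, data, and control values are arbitrary choices):

```r
library(rpart)
# Grow a classification tree: splits are chosen to maximize impurity decrease
tree <- rpart(Species ~ ., data = iris, method = "class",
              control = rpart.control(minsplit = 20, maxdepth = 5))
plot(tree); text(tree)   # inspect the learned splits
```
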
26
Q

Connection between decision tree and RF?

A

A random forest is made up of decision trees that use voting to classify the sample.

27
Q

What are the hyper-parameters in RF? What is the meaning of them?

A

mtry - the number of variables tried at each node split (only two variables?)
nodesize - the minimum number of observations in the leaves
ntree - the number of trees to grow.

28
Q

Why is random forest self-validated?

A

Random forest is built on bagging: in each bootstrap round about 2/3 of the samples are included for training, so each decision tree can be evaluated on the remaining 1/3.

29
Q

What is OOB error? (Random forest)

A

Out-of-bag (OOB) error is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that utilize bootstrap aggregating (bagging).

30
Q

How to apply random forest for feature selection?

A

Random forest works well for feature selection since features are ranked naturally by how well they improve the purity of the nodes. This is called mean decrease in impurity. The features with the greatest decrease in impurity sit at the top of the trees.
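A feature-selection sketch with randomForest (my own example; importance = TRUE adds the impurity-based rankings):

```r
library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(rf)   # mean decrease in accuracy / Gini per feature
varImpPlot(rf)   # visual ranking; top features decrease impurity most
```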

31
Q

Understand the outputs from the ’randomForest’ package

A

Key components of the fitted randomForest object (per the package documentation):

predicted - the OOB predictions for the training data
err.rate - OOB (and per-class) error rates as trees are added
confusion - the confusion matrix (classification)
votes - the class vote proportions per observation
importance - the variable-importance measures
oob.times - how often each observation was out-of-bag
ntree / mtry - the hyperparameter values used

32
Q

Understand the basic idea of the Adaboost algorithm.

A
  • Combines a lot of weak learners
  • Some stumps/trees get more say in the classification
  • Each stump is made by taking the previous stump's mistakes into account (see the sketch below)
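A minimal AdaBoost sketch in R using rpart stumps (my own illustration of the re-weighting idea; labels y must be in {-1, +1}):

```r
library(rpart)

adaboost <- function(X, y, n_rounds = 10) {
  n <- nrow(X)
  w <- rep(1 / n, n)                    # start with uniform sample weights
  stumps <- list(); alphas <- numeric(n_rounds)
  df <- data.frame(X, y = factor(y))
  for (m in 1:n_rounds) {
    fit <- rpart(y ~ ., data = df, weights = w,
                 control = rpart.control(maxdepth = 1))   # a stump
    pred <- ifelse(predict(fit, df)[, "1"] > 0.5, 1, -1)
    err <- max(sum(w * (pred != y)) / sum(w), 1e-10)  # avoid divide-by-zero
    alphas[m] <- 0.5 * log((1 - err) / err)           # the stump's "say"
    w <- w * exp(-alphas[m] * y * pred)               # up-weight the mistakes
    w <- w / sum(w)
    stumps[[m]] <- fit
  }
  list(stumps = stumps, alphas = alphas)
}
```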