Xijia Flashcards
What is a kernel function and its role in an algorithm?
A function that is used when non-linear data is projected into higher dimensions ( x => Ø(x) ). The kernel function performs this computation implicitly.
Given x1 and x2, it returns < Ø(x1) , Ø(x2) > without ever calculating Ø(x).
One is therefore able to transform the data without the need for excessive computational power.
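A minimal sketch of this idea (an assumed illustration, not course code): for the degree-2 polynomial kernel, (x · z + 1)^2 equals the inner product < Ø(x), Ø(z) > for an explicit feature map Ø, so the kernel gives the feature-space inner product without building Ø(x).

```python
import math

def poly_kernel(x, z):
    # (x . z + 1)^2 computed directly in input space
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2D:
    # phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)
    x1, x2 = x
    s = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0]

x, z = [1.0, 2.0], [3.0, 0.5]
lhs = poly_kernel(x, z)                           # kernel in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))  # inner product in feature space
print(lhs, rhs)  # the two values agree
```

The left side touches only the 2-dimensional inputs; the right side needs the 6-dimensional Ø(x), which is exactly the work the kernel trick avoids.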
What is the basic property of a kernel function? (Gram matrix)
The kernel function holds one central part: the Gram matrix, which is made up from the:
- kernel
- training set
The Gram matrix contains the evaluation of the kernel function on all pairs of data points. All the information available to the algorithm must pass through this matrix.
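The Gram matrix can be sketched in a few lines (assumed toy data, linear kernel for readability):

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def gram_matrix(X, kernel):
    # Evaluate the kernel on every pair of training points.
    return [[kernel(xi, xj) for xj in X] for xi in X]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy training set
K = gram_matrix(X, linear_kernel)
print(K)  # symmetric: K[i][j] == K[j][i]
```

Note the matrix is built from exactly the two ingredients on the card: the kernel and the training set.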
Polynomial Kernel
F(x, xj) = (x · xj + 1)^d
A generalized version of the linear kernel; usually not the preferred choice.
+ more powerful than the linear kernel
+ the degree d gives direct control over the model
- more hyperparameters
- high polynomial degrees tend to overfit
RBF kernel
F(x, xj) = exp( -||x - xj||^2 / (2σ^2) )
σ => controls the complexity of the model
It is one of the most preferred and widely used kernel functions, usually chosen for non-linear data. It helps to make a proper separation when there is no prior knowledge of the data.
+ only one hyperparameter
+ computationally cheap
- so powerful and flexible that it can overfit without careful tuning of σ
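A short sketch of the RBF kernel above (sigma is the bandwidth hyperparameter from the card; the data points are assumed):

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    # k(x, z) = exp( -||x - z||^2 / (2 sigma^2) )
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, z = [0.0, 0.0], [1.0, 1.0]
print(rbf_kernel(x, x))              # 1.0: a point is maximally similar to itself
print(rbf_kernel(x, z, sigma=0.5))   # small sigma -> similarity decays fast
print(rbf_kernel(x, z, sigma=5.0))   # large sigma -> smoother, similarity stays high
```

This shows why σ controls complexity: a small σ makes the kernel sharply local (flexible, overfit-prone), a large σ makes it smooth.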
What is the basic idea of kernel methods?
Functions that, given x1 and x2, return < Ø(x1) , Ø(x2) > without calculating Ø(x).
Transform feature vectors into (possibly infinite-dimensional) spaces without extra computational burden.
Why is the kernel method viewed as a memory-based method?
Memory-based methods keep the training samples and use them during the prediction phase.
Kernel methods store everything in the Gram matrix (kernel evaluations on the training set).
Hence they are memory-based.
What is the basic idea of ensemble methods?
Use multiple weak learners together to achieve better predictive performance.
ex:
{random forest => bagging}
{AdaBoost => boosting}
What is the difference between kernel methods and Ensemble methods?
Kernel methods' main goal is to help classification by adding dimensions to the data.
Ensemble methods use multiple methods and combine them into one classification.
Why can bagging (bootstrap aggregation) help us to improve the predictions?
Bagging is when we draw many random bootstrap samples (with replacement) from the training set, train a model on each, and average their predictions:
+ Raises stability of the model
+ Reduces overfitting
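A hedged sketch of bagging with a deliberately weak base model (the helper names and toy data are assumptions for illustration): each model predicts the mean of its bootstrap sample, and the ensemble averages them.

```python
import random

def bootstrap_sample(data, rng):
    # Draw len(data) points with replacement (the random "subparts").
    return [rng.choice(data) for _ in data]

def fit_mean_model(sample):
    # Weak base learner: constant prediction = sample mean.
    mean = sum(sample) / len(sample)
    return lambda: mean

def bagged_predict(data, n_models=200, seed=0):
    rng = random.Random(seed)
    models = [fit_mean_model(bootstrap_sample(data, rng)) for _ in range(n_models)]
    # Averaging the models lowers variance -> a more stable prediction.
    return sum(m() for m in models) / n_models

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(bagged_predict(data))  # close to the true mean 3.0
```

Each individual bootstrap mean is noisy; averaging 200 of them is what raises stability and reduces overfitting.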
What is the difference between bagging and adaboost?
Both are ensemble methods that resample from the training data, but boosting (AdaBoost) redistributes its sample weights after each training step, focusing on previously misclassified points.
Understand how the Perceptron algorithm works.
- Needs linearly separable data
- Weight vector: w = (w0, w1, w2)^T {dimension + 1} {w0 = bias}
sign( w0·x0 + w1·x1 + w2·x2 ) = (+) or (-)
If misclassification:
wi = wi + N·d·xi
new weight = old weight + learning rate × direction × input
N = learning rate ( how fast we step towards the line )
d = {1 if the misclassified point should be above the line} {-1 if it should be below the line}
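The update rule above can be sketched as follows (toy linearly separable data assumed; `eta` plays the role of N):

```python
def perceptron(X, y, eta=0.1, epochs=200):
    w = [0.0] * (len(X[0]) + 1)            # dimension + 1 (w[0] is the bias)
    for _ in range(epochs):
        errors = 0
        for x, d in zip(X, y):             # d in {+1, -1}
            xb = [1.0] + x                 # prepend bias input x_0 = 1
            s = sum(wi * xi for wi, xi in zip(w, xb))
            pred = 1 if s > 0 else -1
            if pred != d:                  # misclassification -> update
                w = [wi + eta * d * xi for wi, xi in zip(w, xb)]
                errors += 1
        if errors == 0:                    # converged (data is separable)
            break
    return w

# Toy linearly separable problem: class +1 iff x1 + x2 > 1.5
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [-1, -1, -1, 1, 1]
w = perceptron(X, y)
preds = [1 if sum(wi * xi for wi, xi in zip(w, [1.0] + x)) > 0 else -1 for x in X]
print(preds)  # matches y after convergence
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop terminates with zero errors.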
Features of Perceptron algorithm?
- {supervised learning}
- {optimal weight coefficients are automatically learned}
Understand how to kernelize the linear PCA method.
- The first principal axis is of highest importance (x1 is not on the same scale as x2)
- PCA’s covariance matrix scales with the number of input dimensions
- Kernel PCA’s kernel matrix scales with the number of datapoints.
Be able to implement kernel PCA from scratch in R
- Pick kernel function
- Calculate the kernel matrix
- Center the kernel matrix
- Solve the Eigenproblem
- Project the data onto the top eigenvectors
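A hedged pure-Python sketch of the recipe above (the card asks for R; this follows the same steps in Python). Power iteration stands in for a full eigensolver, and a linear kernel on tiny collinear data keeps the result checkable.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def center_kernel_matrix(K):
    # Double-centering: Kc = K - 1K - K1 + 1K1
    n = len(K)
    row = [sum(r) / n for r in K]
    total = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def top_eigenvector(M, iters=200):
    # Power iteration: repeatedly apply M and renormalize.
    v = [float(i + 1) for i in range(len(M))]   # non-symmetric start vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

X = [[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]]          # toy data on a line
K = [[linear_kernel(a, b) for b in X] for a in X]    # 1.-2. kernel + kernel matrix
Kc = center_kernel_matrix(K)                         # 3. center the kernel matrix
alpha = top_eigenvector(Kc)                          # 4. solve the eigenproblem
proj = [sum(Kc[i][j] * alpha[j] for j in range(3))   # 5. project the data
        for i in range(3)]
print(proj)  # middle point projects to ~0; the end points are symmetric
```

Note the eigenproblem is solved on the (centered) kernel matrix, which is why kernel PCA scales with the number of data points rather than input dimensions.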
Understand how to kernelize the ridge regression method.
Ridge regression is regression where the cost function is altered by adding a penalty term on the magnitude of the coefficients.
With the kernel alteration xi -> Ø(xi) we can implicitly work in much higher (even infinite-dimensional) spaces.
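A sketch of kernel ridge regression under the standard dual formulation (toy 1-D data assumed; `solve` is a tiny Gaussian-elimination helper written just for this example): the dual weights are alpha = (K + λI)^(-1) y, and predictions are f(x) = Σ alpha_i k(xi, x).

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def k(x, z):
    return x * z  # linear kernel on scalars, for checkability

X, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data: y = 2x
lam = 0.01                                 # ridge penalty strength
K = [[k(a, b) for b in X] for a in X]
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
alpha = solve(A, y)                        # alpha = (K + lam*I)^(-1) y
f4 = sum(a * k(xi, 4.0) for a, xi in zip(alpha, X))
print(f4)  # close to 8 (= 2 * 4) for small lambda
```

Swapping `k` for an RBF kernel changes nothing else in the code, which is the point of the kernelization: only kernel evaluations of the data appear, never Ø(xi) itself.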