Knowledge Discovery Flashcards

1
Q

Regression

A

Predict a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency

based on a set of training data of tuples (x, y), where x is a vector and y is a continuous-valued variable, find a function f such that f(x) = y
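A minimal sketch of finding such an f by least squares (assuming a linear model; NumPy and the toy data below are assumptions, not part of the card):

```python
# Fit f(x) = w0 + w1*x to training tuples (x, y) with ordinary least squares.
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])       # feature vectors x
y = np.array([2.1, 3.9, 6.2, 8.1])               # continuous-valued targets y

X_b = np.hstack([np.ones((len(X), 1)), X])       # add a bias column
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)      # minimize ||X_b w - y||^2

f = lambda x: np.array([1.0, x]) @ w             # learned function f(x) ≈ y
print(f(2.5))
```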

2
Q

Classification

A

Predict a given discrete-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency

based on a set of training data of tuples (x, y), where x is a vector and y is a discrete-valued variable, find a function f such that f(x) = y
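A minimal sketch with a linear classifier (scikit-learn and the toy data are assumptions):

```python
# Learn f(x) = y for a discrete-valued y with logistic regression.
from sklearn.linear_model import LogisticRegression

X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]   # feature vectors x
y = [0, 0, 0, 1, 1, 1]                           # discrete-valued labels y

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.8]]))                      # predicted class for a new x
```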

3
Q

Clustering

A

Goal is to find previously unknown patterns/structures in the data

does not use labeled data

based on a set of training data x, find a function such that homogeneous objects are aggregated into the same groups

  • high homogeneity within clusters
  • high heterogeneity between clusters
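A minimal sketch with k-means (scikit-learn, the toy data and the choice k = 2 are assumptions):

```python
# Group unlabeled data into homogeneous clusters.
from sklearn.cluster import KMeans

X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # unlabeled training data x
     [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]]

km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                          # group assignment per object
```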
4
Q

Methods against underfitting

A

underfitting means high bias

Feature selection:
get additional features
add polynomial features …

Regularization:
decrease lambda

NN:
larger NN, increase complexity
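A minimal sketch of one of these remedies, adding polynomial features so a high-bias linear model can fit a curved relationship (scikit-learn and the toy data are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(0, 3, 20).reshape(-1, 1)
y = (X.ravel() - 1.5) ** 2                  # clearly non-linear target

X_poly = PolynomialFeatures(degree=2).fit_transform(X)
print(LinearRegression().fit(X, y).score(X, y))            # plain line: underfits
print(LinearRegression().fit(X_poly, y).score(X_poly, y))  # with x^2 feature: fits
```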

5
Q

Methods against overfitting

A

overfitting means high variance

get more data!

feature selection
* smaller sets of features

Regularization
* increase lambda

NN:

  • dropout
  • smaller NN
  • early stopping
  • increase weight decay
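A minimal sketch of one remedy, increasing the regularization strength lambda (called alpha in scikit-learn's Ridge; the toy data is an assumption):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = X[:, 0] + 0.1 * rng.normal(size=20)     # only one informative feature

weak = Ridge(alpha=0.01).fit(X, y)          # small lambda: large, noisy weights
strong = Ridge(alpha=10.0).fit(X, y)        # large lambda: shrunken weights
print(abs(weak.coef_).sum(), abs(strong.coef_).sum())
```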
6
Q

Evaluation methods

A

Hold-out
* split into two sets (e.g. 80% train, 20% test)
+ evaluation on data which is not used for training
+ easy to implement
- not all data is used for training
- evaluation results strongly depend on choice of training and test set

k-fold cross-validation
* split data into k folds, obtain average performance
+ approximation of actual model performance
+ all data used for training
- expensive, k models learned
- choice of k

leave-one-out
* cross-validation with k = N
+ makes best use of the data
- very expensive (N models learned)
- high variance of the individual models
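A minimal sketch of all three evaluation methods (scikit-learn, the iris data and the classifier are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# hold-out: e.g. 80% train / 20% test
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print(clf.fit(X_tr, y_tr).score(X_te, y_te))

# k-fold cross-validation: average over k = 5 models
print(cross_val_score(clf, X, y, cv=5).mean())

# leave-one-out: cross-validation with k = N (expensive, N models)
print(cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())
```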
7
Q

Gradient Descent vs Normal Equation

A

Gradient Descent:

  • need to choose alpha
  • needs many iterations
  • Works well even when M is large

Normal Equation:

  • no need to choose alpha
  • don’t need to iterate
  • need to compute (X^T X)^-1, which is computationally expensive (roughly O(M^3) for M features)
  • slow, if M is very large

Use GD if

  • X^T X is not invertible
  • the number of features M is large (≥ 10,000)
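A minimal sketch contrasting both solutions for linear regression (toy data, alpha and the iteration count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])  # bias + features
y = X @ np.array([1.0, 2.0, -3.0])

# normal equation: w = (X^T X)^-1 X^T y  (no alpha, no iterations)
w_ne = np.linalg.inv(X.T @ X) @ X.T @ y

# gradient descent: needs alpha and many iterations, but scales to large M
alpha, w_gd = 0.1, np.zeros(3)
for _ in range(1000):
    w_gd -= alpha * X.T @ (X @ w_gd - y) / len(y)

print(w_ne, w_gd)                            # both close to [1, 2, -3]
```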
8
Q
Multiclass methods
one vs rest / one vs one
A

one vs rest :

  • easier to interpret
  • possibility to gain knowledge about the class by inspecting its classifier
  • less computationally expensive O(C)
  • recommended as default choice

one vs one :

  • less sensitive to imbalanced data sets
  • advantageous for algorithms, such as kernel algorithms, that don't scale well with M
  • much more computationally expensive: O(C^2)
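A minimal sketch showing how many binary classifiers each strategy trains (the scikit-learn wrappers, the digits data and an SVC base classifier are assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                  # C = 10 classes

ovr = OneVsRestClassifier(SVC()).fit(X, y)           # C binary classifiers
ovo = OneVsOneClassifier(SVC()).fit(X, y)            # C*(C-1)/2 binary classifiers
print(len(ovr.estimators_), len(ovo.estimators_))    # 10 vs 45
```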
9
Q

batch gradient descent pros and cons

A

pro:
+ fewer updates
+ computationally more efficient than SGD
+ more stable convergence

con:
- entire training data must fit into RAM
- updates and training speed may become slow for large data sets
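A minimal sketch: batch gradient descent for linear regression performs one update per pass over the entire training set (toy data, alpha and the iteration count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w, alpha = np.zeros(3), 0.1
for _ in range(500):                       # one update per full pass
    grad = X.T @ (X @ w - y) / len(y)      # gradient over all training data
    w -= alpha * grad
print(w)                                   # close to [1, -2, 0.5]
```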

10
Q

stochastic gradient descent pros and cons

A

pro:
+ the frequent updates immediately give an insight into the performance of the model
+ the increased model update frequency can result in faster learning
+ the noisy update process can allow the model to avoid local minima

con:

  • updating the model so frequently is more computationally expensive, taking significantly longer to train models on large datasets
  • the frequent updates can result in a noisy gradient signal
11
Q

mini-batch GD pros and cons

A

pro:
+ the model update frequency is higher than with batch gradient descent, which allows for a more robust convergence, avoiding local minima
+ the batched updates provide a computationally more efficient process than stochastic gradient descent
+ the batching allows both the efficiency of not having all training data in memory and efficient algorithm implementations

con:
- requires an additional hyperparameter b (the batch size)
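A minimal sketch with batch size b; b = 1 would reduce it to SGD and b = N to batch gradient descent (toy data, b, alpha and the epoch count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w, alpha, b = np.zeros(3), 0.1, 32
for _ in range(50):                              # epochs
    idx = rng.permutation(len(y))
    for start in range(0, len(y), b):            # one update per mini-batch
        batch = idx[start:start + b]
        grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w -= alpha * grad
print(w)                                         # close to [1, -2, 0.5]
```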

12
Q

kernel trick

A

transform the data into a higher-dimensional space and separate it linearly there
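A minimal numeric sketch: 1-D data that no single threshold can separate becomes linearly separable after mapping it to a higher-dimensional space (the mapping phi(x) = (x, x^2) and the toy data are assumptions):

```python
import numpy as np

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = (np.abs(x) > 1).astype(int)       # class 1 lies on both sides of class 0

phi = np.column_stack([x, x ** 2])    # map to 2-D: (x, x^2)
# in the mapped space the linear boundary x^2 = 1 separates the classes
print(((phi[:, 1] > 1).astype(int) == y).all())
```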

13
Q

Kernel Methods

  • linear model
  • representer theorem
  • dual representation
  • kernel function
A

Linear Model:

  • classification = side of the hyperplane
  • training = estimation of “good” parameters w and b

Representer Theorem:
optimal weight vector w is a linear combination of training instances

Dual Representation:
During learning and classification, it is sufficient to access only dot products of data points

Kernel Function:
K(x, z) = φ(x) · φ(z)
* similarity function with an interpretation as a dot product after mapping to a vector representation
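A minimal numeric check of the kernel-function interpretation, using the polynomial kernel K(x, z) = (x · z)^2 and its explicit mapping phi (the concrete kernel and vectors are assumptions):

```python
import numpy as np

def phi(v):
    # explicit feature map for K(x, z) = (x . z)^2 on 2-D inputs
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print((x @ z) ** 2)        # kernel evaluated in the original space: 16.0
print(phi(x) @ phi(z))     # same value as a dot product after mapping: 16.0
```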

14
Q

“Kernel-Trick”

A

Rewrite the learning algorithm such that any reference to the input data happens only through inner products

Replace any such inner product by the kernel function of your choice

work with the (linear) algorithm as usual

* non-linearity enters learning through kernels, while the training algorithm may remain linear
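A minimal sketch of the recipe applied to ridge regression: the dual solution accesses the data only through the Gram matrix of inner products, so the dot product can be swapped for an RBF kernel (lambda, gamma and the toy data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel()                                 # non-linear target

def rbf(A, B, gamma=0.5):
    # kernel function replacing the plain dot product
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 0.1
K = rbf(X, X)                                         # Gram matrix of "inner products"
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)  # dual solution
f = lambda X_new: rbf(X_new, X) @ alpha               # prediction uses only kernels
print(f(np.array([[1.0]])), np.sin(1.0))              # roughly equal
```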

15
Q

SVMs

A

Generalization through margin maximization

  • intuitive motivation
  • unique optimal solution

maximal margin = minimal norm

dual formulation

  • makes it possible to use kernel functions
  • makes the problem a quadratic programming problem, for which fast algorithms exist

hard margin vs soft margin

many well-known implementations available
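A minimal sketch showing the kernel choice and the soft-margin parameter C (scikit-learn's SVC, the dataset and the parameter values are assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

harder = SVC(kernel="rbf", C=100.0).fit(X, y)       # large C: few margin violations
softer = SVC(kernel="rbf", C=0.1).fit(X, y)         # small C: wider, softer margin
print(len(harder.support_), len(softer.support_))   # support vectors per model
```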
