lecture 5 - supervised learning in predictive modeling Flashcards
what do we mean when we say ‘machines can learn’ (Mitchell)
- a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
- i.e., we say that something learns as long as it gets better at performing the task with new experiences
- E could be, for example, a single instance (one observed data point)
what do we need for a machine to learn
- task
- historical data
- improvement on task performance
supervised learning: functional relationship
- learning a functional relationship f: X → Y
- we map an observation x ∈ X to the target y ∈ Y
- X represents the space of all inputs, Y is the space of all possible targets
- this relationship is not likely to be deterministic, due to measurement error and noisy targets
supervised learning: unknown conditional target distribution p(y|x)
- instead of a deterministic target y = f(x), we assume the observed target is f(x) plus noise drawn from a noise distribution (noisy targets)
- this makes it more difficult to learn the target function
- by accounting for noisy targets, we replace the deterministic target function with the conditional target distribution p(y|x) (see the sketch below)
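A minimal sketch of this idea, assuming a made-up target function, input distribution, and Gaussian noise (all illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical "true" target function f(x) -- unknown in practice
def f(x):
    return 2.0 * x + 1.0

# sample inputs from p(x) and noisy targets from p(y|x): y = f(x) + noise
x = rng.uniform(-1, 1, size=100)
y = f(x) + rng.normal(loc=0.0, scale=0.3, size=x.shape)
```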
supervised learning: observed data O
- O = {(x_j, y_j)}_{j=1}^{N}
- generated by p(x) (which inputs we observe) and p(y|x) (which targets we observe for them)
- separate the data into a training, validation, and test set
- learn a function f-hat(x) that fits the observed data in the training set
- evaluate the generalisability of f-hat(x) on the test set
- stopping the learning process is based on the validation set; in a small dataset we use cross-validation (see the split sketch below)
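A minimal sketch of such a split, assuming scikit-learn is available; the 60/20/20 proportions and the dummy arrays are arbitrary illustrations:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(100).reshape(-1, 1)   # dummy inputs
y = np.arange(100)                  # dummy targets

# first split off the test set, then split the remainder into train/validation
x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=0.25, random_state=0)  # 0.25 of 80% = 20% of all data
```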
supervised learning: unbalanced data
in case of unbalanced data (e.g., classification with rare classes), we should use stratified splits so that the class proportions are preserved in each set (see the sketch below)
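A minimal sketch, assuming scikit-learn; the 90/10 class ratio is a made-up example of unbalanced data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)   # unbalanced labels: 90% class 0, 10% class 1

# stratify=y keeps the 90/10 class ratio in both parts of the split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0)
```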
supervised learning: hypothesis set H
- all possible functions you can construct using the features you have
- you want to identify the right hypothesis from those options
- the selected hypothesis f-hat tries to approximate the unknown target function f
supervised learning: error measure E()
- assume we have a hypothesis h for the target function
- loss (e): the error (risk) per data point – e(f(x), h(x))
- risk (E): the overall difference between h and f across the input space – E(f, h)
- from e to E: E(f, h) = ∫ e(f(x), h(x)) p(x) dx, i.e., the loss at each point weighted by the pdf of the input data p(x)
- since we don't have f, we can't compute E exactly
- E is therefore approximated by the average loss over the observed data points: (1/N) Σ_j e(y_j, h(x_j))
definitions for e
- classification: classification rate, F1, AUC, etc.
- regression: MSE, MAE, etc. (see the sketch below)
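A minimal sketch of a few of these error measures, assuming scikit-learn's metrics module; the label and prediction arrays are made up for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error, mean_absolute_error

# classification: made-up labels and predictions
y_true_cls = np.array([1, 0, 1, 1, 0])
y_pred_cls = np.array([1, 0, 0, 1, 0])
print(accuracy_score(y_true_cls, y_pred_cls))  # classification rate
print(f1_score(y_true_cls, y_pred_cls))        # F1

# regression: made-up targets and predictions
y_true_reg = np.array([1.0, 2.0, 3.0])
y_pred_reg = np.array([1.1, 1.9, 3.4])
print(mean_squared_error(y_true_reg, y_pred_reg))   # MSE
print(mean_absolute_error(y_true_reg, y_pred_reg))  # MAE
```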
in-sample error
- the error of the hypothesis f-hat on the training data
- E_{in}(h) = (1/N) Σ_j e(y_j, h(x_j))
- use x and y from our training set
- we select the hypothesis that minimizes the in-sample error
- i.e., machines learn by minimizing the in-sample error
- easy to compute, since it only requires the training data (see the sketch below)
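A minimal sketch of "learning = minimizing the in-sample error", assuming squared loss and a tiny hand-made hypothesis set of three linear functions (all made up for illustration):

```python
import numpy as np

# made-up training data
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.1, 2.9, 5.2, 6.8])

# a tiny hypothesis set H: three candidate linear functions
hypotheses = {
    "h1": lambda x: 1.0 * x + 1.0,
    "h2": lambda x: 2.0 * x + 1.0,
    "h3": lambda x: 3.0 * x + 0.0,
}

def in_sample_error(h, x, y):
    # E_in(h) = (1/N) * sum of squared losses e(y_j, h(x_j))
    return np.mean((y - h(x)) ** 2)

# "learning": select the hypothesis with the lowest in-sample error
errors = {name: in_sample_error(h, x_train, y_train) for name, h in hypotheses.items()}
print(errors, "-> selected:", min(errors, key=errors.get))
```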
out-sample error
- the error of the hypothesis f-hat on unseen data
- E_{out}(h) = ∫ e(f(x), h(x)) p(x) dx, integrated over the input space X
- the input space X for the out-sample error is not restricted to the training set (unseen data)
- used to assess the generalizability of the model
- hard to compute, since p(x) and f are unknown (see the sketch below)
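A minimal sketch of why E_out is hard: it is an expectation over p(x), which can only be computed here because the data-generating process is simulated (the target, hypothesis, and input distribution below are illustrative assumptions); in practice it is approximated by the error on a held-out test set:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):           # hypothetical true target function (unknown in practice)
    return 2.0 * x + 1.0

def h(x):           # some learned hypothesis f-hat
    return 1.8 * x + 1.2

# Monte Carlo estimate of E_out = integral of e(f(x), h(x)) * p(x) dx,
# with p(x) uniform on [-1, 1] and squared loss
x_fresh = rng.uniform(-1, 1, size=100_000)
print(np.mean((f(x_fresh) - h(x_fresh)) ** 2))
```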
supervised learning: model selection
- we select the hypothesis (model) that has the lowest error on the validation set
- we should be careful with overfitting and not use too many features (see the sketch below)
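A minimal sketch of model selection via cross-validation, assuming scikit-learn; the candidates (ridge regression with different regularization strengths) and the synthetic data are arbitrary illustrations:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=200)

# pick the candidate model with the best (cross-)validation score
candidates = [Ridge(alpha=a) for a in (0.01, 0.1, 1.0, 10.0)]
scores = [cross_val_score(m, X, y, cv=5, scoring="neg_mean_squared_error").mean()
          for m in candidates]
print(candidates[int(np.argmax(scores))])
```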
learning theory
- though we can’t learn the perfect model, learning is possible if the in-sample error is a good estimator of the out-sample error
- i.e., the generalization gap |E_{out}(f-hat) - E_{in}(f-hat)| should be small
- with a larger N, this difference is most likely very small
PAC learnable
- a hypothesis set is PAC learnable if a learning algorithm exists that fulfills the following condition: for every error margin ε > 0 and confidence level δ ∈ (0, 1), there is a sample size m, so that for a random training sample of size larger than m, the following inequality holds:
- P(|E_{out}(f-hat) - E_{in}(f-hat)| ≤ ε) ≥ 1 - δ, i.e., the absolute difference between the in- and out-sample error of f-hat is smaller than ε with probability at least 1 - δ
- this inequality states that the generalization error is close to the empirical error (within ε), with high probability (at least 1 - δ)
- this is about understanding the conditions under which a learning algorithm can generalize well from training data to unseen data
PAC learnable: δ
- δ indicates how certain we want to be that this difference is smaller than ε.
- δ close to 0 = (almost) completely certain
- lower values = stricter guarantees that the difference is smaller than ε
every finite set of hypotheses is PAC learnable
- this states that as long as we have a finite number of possible hypotheses, the hypothesis set is PAC learnable
- this means that with enough training samples, we can ensure that the model's performance on new data will be close to its performance on the training data, with high probability (see the bound sketched below)
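A minimal sketch of the sample-size side of this claim, based on the standard Hoeffding + union-bound argument; the concrete bound N ≥ ln(2|H|/δ) / (2ε²) and the bounded-loss assumption are my illustration, not a formula quoted from the slides:

```python
import math

def sample_complexity(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Training-set size needed so that |E_in - E_out| <= epsilon holds for every
    hypothesis in a finite hypothesis set, with probability at least 1 - delta
    (Hoeffding inequality + union bound, loss assumed bounded in [0, 1])."""
    return math.ceil(math.log(2 * num_hypotheses / delta) / (2 * epsilon ** 2))

print(sample_complexity(num_hypotheses=1000, epsilon=0.05, delta=0.05))
```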
VC dimension: shatter
- in a binary classification problem with N data points, we have 2^N ways to label the data
- a hypothesis set H (e.g., all lines in the plane) shatters X when it can represent every possible labeling (i.e., all 2^N of them); see the sketch below
- VC dimension d_{vc} of a hypothesis set is defined as: the maximum number of input vectors that can be shattered.
- i.e., it’s the largest set of points for which you can find a hypothesis in H to perfectly classify every possible labeling of these points.
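A minimal brute-force sketch of the shattering check for lines (linear classifiers) in 2D, assuming scikit-learn; it enumerates all 2^N labelings, so it is only sensible for tiny N. Expected outcome: 3 points in general position can be shattered, 4 points cannot (the VC dimension of lines in 2D is 3):

```python
import itertools
import numpy as np
from sklearn.svm import SVC

def can_shatter(points: np.ndarray) -> bool:
    """Check whether linear classifiers can realize every labeling (dichotomy)
    of the given points, i.e. whether the point set is shattered."""
    for labels in itertools.product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:
            continue  # a single-class labeling is always realizable
        y = np.array(labels)
        clf = SVC(kernel="linear", C=1e6).fit(points, y)
        if clf.score(points, y) < 1.0:
            return False  # found a labeling no line can reproduce
    return True

three = np.array([[0, 0], [1, 0], [0, 1]])
four = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
print(can_shatter(three), can_shatter(four))  # expected: True False
```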
what does the VC dimension value represent
- it tells us the largest number of points that can be shattered (i.e., classified in all possible ways) by the hypothesis set
- the VC dimension is infinite if there are arbitrarily large sets of input vectors that can be shattered
- result: all hypothesis sets with a finite VC dimension are PAC learnable
VC dimension: PAC learnable
- Despite having potentially infinite hypotheses, if the hypothesis set has a finite VC dimension, it means the complexity of the model is bounded.
- This bounded complexity implies that the set is PAC learnable, meaning with enough data, the algorithm will probably (with high probability) and approximately (within some error threshold) learn a hypothesis that generalizes well on unseen data.
VC dimension: m_H(N)
- the growth function m_H(N) measures the growth of the complexity of the hypothesis set: the maximum number of distinct labelings (dichotomies) H can produce on N data points
- i.e., how many effectively different hypotheses you have as the dataset gets bigger
- if this growth is limited (it grows polynomially rather than like 2^N), then you're able to provide guarantees on the difference between the in- and out-sample error
implications of PAC formula
- with few training examples, it is easy to obtain a low in-sample error, but the out-sample error will be high; as N increases, the two converge (the in-sample error rises towards, and the out-sample error falls towards, the same level)
- with a fixed N, more complex hypothesis sets lead to a better representation of the data in the training set (i.e., a lower in-sample error)
- however, we can make it too complex and overfit, leading to an increase of the out-sample error; we need to find the sweet spot of complexity
- low complexity stabilizes at a higher error rate compared to high complexity
- so, the more data you have, the more beneficial a high-complexity hypothesis set will be (see the sketch below)
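A minimal sketch of searching for the complexity sweet spot, assuming scikit-learn and using the polynomial degree as the complexity knob; the data-generating function is a made-up example:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=100)

# compare hypothesis sets of increasing complexity (polynomial degree)
for degree in (1, 3, 5, 10, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(degree, round(mse, 3))  # error typically drops, then rises again when overfitting
```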
training and test splits: models for individual level
- take the time component into account and split train and test data at a specific point in time within a qs
- ignore the time component and use random sampling over the instances of a qs (see the sketch below)
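A minimal sketch of the two options, assuming the data of one qs is in a pandas DataFrame with a 'timestamp' column; the column names and the cut-off point are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# hypothetical data of a single qs, one row per instance
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="h"),
    "feature": np.arange(100),
    "target": np.arange(100) % 2,
})

# option 1: respect the time component -- everything before the cut-off is train
cutoff = pd.Timestamp("2024-01-03")
train_time, test_time = df[df["timestamp"] < cutoff], df[df["timestamp"] >= cutoff]

# option 2: ignore the time component -- random sampling over the instances
train_rand, test_rand = train_test_split(df, test_size=0.2, random_state=0)
```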