Session 4 Flashcards
what is information gain?
it is the most common splitting criterion and is based on entropy
-> it measures how much a split reduces entropy (the change in entropy from before to after the split)
what does disorder correspond to?
how mixed the segment is with respect to the values of the attribute of interest
what is entropy?
it is a measure of disorder in the data
how can you calculate entropy?
entropy = -p1 * log2(p1) - p2 * log2(p2) - ... , where p_i is the proportion of examples in the segment that belong to class i (a pure segment has entropy 0)
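A minimal Python sketch of this calculation (the function name and the example labels are my own illustration, not from the course):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

# A pure segment has entropy 0; maximum disorder for two classes is 1 bit.
print(entropy(["yes", "yes", "no", "no"]))   # 1.0
print(entropy(["yes", "yes", "yes", "no"]))  # ~0.81
```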
what is a parent set?
original set of examples (data points before splitting)
what are children sets?
an attribute (e.g., age) can segment the parent set into k children sets (subsets)
When is an attribute chosen for splitting?
The attribute that reduces entropy the most (= has the highest information gain) is chosen for the split
what is the formula for information gain?
IG(parent, children) = entropy(parent) - [ p(c1) * entropy(c1) + ... + p(ck) * entropy(ck) ], where p(c_i) is the proportion of the parent's examples that end up in child set c_i
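A sketch of this formula in Python, reusing the entropy function from above (function names and the toy data are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, children_labels):
    """Entropy of the parent minus the weighted average entropy of its child sets."""
    total = len(parent_labels)
    weighted = sum((len(child) / total) * entropy(child) for child in children_labels)
    return entropy(parent_labels) - weighted

# A split into two pure children removes all disorder: gain = 1 bit.
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(information_gain(parent, children))  # 1.0
```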
what are disadvantages of ID3 decision trees?
- tends to prefer splits that result in a large number of partitions, each being pure but small (we get a very wide decision tree)
- overfitting with less generalization capability (it will try to fit every outlier -> it would make a separate segment for Musk in a ranking of CEO salaries, even if he is the only one so high up)
- cannot handle missing values
what are the application possibilities of ANN (artificial neural networks)?
- spam detection
- time series prediction
- pattern recognition (e.g., how does Van Gogh paint?)
- computer games
how does an ANN function?
it functions like human neurons -> learning happens by making inter-neuron connections
what is a single-perceptron ANN?
it uses no hidden layer and mimics a biological neuron
how does an ANN work?
inputs go into a propagation function that calculates the net input, then a transfer (activation) function calculates an activation level, and then we receive an output
what is the propagation function?
it combines the weighted inputs into the net input; the inputs are independent variables, such as # of amenities
what is the activation function?
the function that determines whether a neuron produces an output or not (i.e., whether the whole process starts)
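A minimal sketch of a single neuron, assuming a weighted sum as the propagation function and a simple step function as the activation function (all names and numbers are illustrative):

```python
def neuron_output(inputs, weights, bias, threshold=0.0):
    # Propagation function: weighted sum of the inputs = the net input.
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation (transfer) function: a step function that decides
    # whether the neuron "fires" (1) or not (0).
    return 1 if net_input >= threshold else 0

# Inputs are independent variables, e.g. [# of amenities, size in m^2].
print(neuron_output(inputs=[3, 120], weights=[0.5, 0.01], bias=-2.0))  # 1
```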
how does learning work in ANNs?
- comparing computed (predicted) outputs to desired outputs (the true target values) of historical cases
- learning is defined as a change of the weights between the units
what are the three tasks in the process of learning in ANNs?
- compute temporary outputs
- compare the outputs with the desired targets
- adjust the weights and repeat the process
when is a data set linearly separable?
if there exists a straight line (in 2D) or a hyperplane (in higher dimensions) that can perfectly separate all data points of one class from those of another class without any errors
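A sketch of the three learning tasks for a single perceptron, trained on logical AND (a linearly separable problem); the learning rate, number of epochs and step activation are my own illustrative choices:

```python
# Logical AND: linearly separable, so a single perceptron can learn it.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias, learning_rate = [0.0, 0.0], 0.0, 0.1

for epoch in range(20):
    for inputs, target in data:
        # 1) compute the temporary output
        net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
        output = 1 if net_input >= 0 else 0
        # 2) compare the output with the desired target
        error = target - output
        # 3) adjust the weights and repeat the process
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

print(weights, bias)  # a weight/bias combination that separates AND correctly
```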
when do we need a multilayer perceptron?
when the data is not linearly separable (e.g., XOR), a single perceptron cannot solve the problem, so we need one or more hidden layers
what are the three layers in multilayer perceptrons?
- input layer: includes the individual input attributes
- hidden layers: the middle layer(s) of an ANN that has three or more layers - each additional layer increases the training effort exponentially
- output layer: the layer containing the solution of the problem
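A minimal sketch of these three layers as a forward pass (the layer sizes, random weights and sigmoid activation are arbitrary illustrations, not the course's example):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, 0.7, 1.0])        # input layer: 3 single attributes
W_hidden = rng.normal(size=(4, 3))   # weights from input to a 4-unit hidden layer
W_output = rng.normal(size=(1, 4))   # weights from hidden to the output layer

hidden = sigmoid(W_hidden @ x)       # hidden layer activations
output = sigmoid(W_output @ hidden)  # output layer: the "solution" of the problem
print(output)
```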
what does the development process of an ANN look like?
what is the activation function in MLPs?
- relation between the internal activation level and the output
- can be linear or non-linear
- differentiability means whether we can take derivatives of the function (needed for gradient-based learning)
- there are different types
what are the different types of activation functions?
common types include: threshold (step), linear, sigmoid (logistic), hyperbolic tangent (tanh), and ReLU
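A sketch of these common activation functions in Python (this reflects general knowledge; the exact list covered in the session may differ):

```python
import math

def step(z, threshold=0.0):
    """Threshold (step) function: the neuron either fires or it doesn't."""
    return 1.0 if z >= threshold else 0.0

def linear(z):
    """Linear function: the output equals the net input."""
    return z

def sigmoid(z):
    """Logistic sigmoid: squashes the net input into (0, 1); differentiable."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes the net input into (-1, 1); differentiable."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: 0 for negative net input, identity otherwise."""
    return max(0.0, z)
```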
what are the four types of learning?
- supervised learning
- unsupervised learning
- reinforcement learning (you don't provide the correct output, you only indicate whether the produced output was correct or incorrect)
- direct design methods
what are the two modes of training?
- incremental training (you adapt the model step by step, by adding new data incrementally)
- batch training (you train the model on a whole batch of data at a time, rather than adapting it after every single example)
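A sketch of the difference, using gradient updates for a simple linear model y = w0 + w1 * x (the model, toy data and learning rate are illustrative assumptions):

```python
def gradient(weights, batch):
    """Gradient of the sum of squared errors for y_hat = w0 + w1 * x."""
    g0 = g1 = 0.0
    for x, y in batch:
        error = (weights[0] + weights[1] * x) - y
        g0 += 2 * error
        g1 += 2 * error * x
    return [g0, g1]

def incremental_training(data, weights, lr=0.01, epochs=1000):
    for _ in range(epochs):
        for example in data:        # weights are adapted after every single example
            g = gradient(weights, [example])
            weights = [w - lr * gi for w, gi in zip(weights, g)]
    return weights

def batch_training(data, weights, lr=0.01, epochs=1000):
    for _ in range(epochs):
        g = gradient(weights, data)  # weights are adapted once per pass over the batch
        weights = [w - lr * gi for w, gi in zip(weights, g)]
    return weights

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
print(incremental_training(data, [0.0, 0.0]))  # both approach the least-squares fit
print(batch_training(data, [0.0, 0.0]))        # (w1 close to 2)
```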
what are the 5 learning rules in ANN?
- delta rule
- gradient descent
- back propagation
- hebbian rule
- competitive learning
what is back propagation?
- similar to delta rule, but also calculates weight changes for hidden layers
what is gradient descent?
- finding combinations of all weights so that the sum of the squared errors F is minimized
- but requires high computational complexity in high-dimensional spaces
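A minimal NumPy sketch combining the two ideas: gradient descent on the sum of squared errors F, with backpropagation supplying the weight changes for the hidden layer (the network size, learning rate and XOR data are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input  -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)     # hidden activations
    out = sigmoid(h @ W2 + b2)   # network output
    F = np.sum((out - y) ** 2)   # sum of squared errors: the quantity being minimized

    # backward pass: propagate the error back to get gradients for every weight,
    # including the weights feeding the hidden layer
    d_out = 2 * (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * h * (1 - h)

    # gradient descent step: move each weight against its gradient
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

print(out.round(2).ravel())  # typically approaches [0, 1, 1, 0]
```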