Session 4 Flashcards
what is information gain?
it is the most common splitting criterion and is based on entropy
-> it measures how much a split reduces entropy (measures the change in entropy before and after splitting)
what does disorder correspond to?
how mixed the segment is with respect to the values of the attribute of interest
what is entropy?
it is a measure of disorder in the data
how can you calculate entropy?
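assuming the standard Shannon definition: entropy = -(p1 * log2(p1) + p2 * log2(p2) + ... + pk * log2(pk)), where pi is the share of class i in the segment. A minimal Python sketch of that calculation follows; the function name `entropy` and the example labels are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Add up -p * log2(p) over the classes that actually occur in the segment.
    return sum(-(n / total) * log2(n / total) for n in counts.values())

pure = ["yes", "yes", "yes", "yes"]   # only one class: no disorder
mixed = ["yes", "yes", "no", "no"]    # 50/50: maximally mixed for two classes

print(entropy(pure))
print(entropy(mixed))
```

the more evenly the classes are mixed in a segment, the higher the value; a pure segment gives zero.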
what is a parent set?
original set of examples (data points before splitting)
what is a children set?
a subset of the parent set: an attribute (e.g., age) can segment the parent set into k children sets (subsets)
When is an attribute chosen for splitting?
The attribute that reduces entropy the most (= has the highest information gain) is chosen for the split
what is the formula for information gain?
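assuming the usual entropy-based definition: information gain = entropy(parent) - [p(c1) * entropy(c1) + ... + p(ck) * entropy(ck)], where p(ci) is the share of the parent's examples that fall into child ci. A minimal Python sketch under that assumption; the function names and the example splits are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5

# Hypothetical split that separates the classes fairly well.
good_split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
# Hypothetical split whose children are just as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes"] * 3 + ["no"] * 3]

print(information_gain(parent, good_split))
print(information_gain(parent, useless_split))
```

the split with the higher value reduces entropy more, so its attribute would be the one chosen.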
what are disadvantages of ID3 decision trees?
- tends to prefer splits that result in a large number of partitions, each being pure but small (we get a very wide decision tree)
- overfitting with less generalization capability (it will try to fit every outlier -> it will make a segment for Musk in a ranking of CEO salaries, even if he is the only one that high up)
- cannot handle missing values
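the first disadvantage can be illustrated with a small Python sketch: an attribute with a unique value per example (here a hypothetical `ceo_id` column) produces many pure one-element partitions and therefore gets the highest information gain, even though it cannot generalize. The data and all column names are made up for illustration.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Information gain of splitting `rows` (a list of dicts) on `attribute`."""
    children = defaultdict(list)
    for row in rows:
        children[row[attribute]].append(row[target])
    parent = [row[target] for row in rows]
    weighted = sum(len(c) / len(rows) * entropy(c) for c in children.values())
    return entropy(parent) - weighted

# Made-up data: ceo_id is unique per row, sector has only two values.
rows = [
    {"ceo_id": 1, "sector": "tech", "high_salary": "yes"},
    {"ceo_id": 2, "sector": "tech", "high_salary": "yes"},
    {"ceo_id": 3, "sector": "tech", "high_salary": "no"},
    {"ceo_id": 4, "sector": "retail", "high_salary": "no"},
    {"ceo_id": 5, "sector": "retail", "high_salary": "no"},
    {"ceo_id": 6, "sector": "retail", "high_salary": "yes"},
]

gain_id = information_gain(rows, "ceo_id", "high_salary")
gain_sector = information_gain(rows, "sector", "high_salary")
print(gain_id, gain_sector)
```

splitting on `ceo_id` wins because every child is pure, but the resulting tree has one branch per example.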
what are the application possibilities of ANN (artificial neural networks)?
- spam detection
- time series prediction
- pattern recognition (e.g., how Van Gogh paints)
- computer games
how does an ANN function?
it functions like human neurons -> learning by making interneuron connections
what is a single perceptron algorithm ANN?
uses no hidden layer and mimics biology
how does an ANN work?
inputs go into a propagation function that calculates the net input, then a transformation (transfer) function calculates an activation level, then we receive an output
what is the propagation function?
it calculates the net input from the inputs (typically as a weighted sum), where the inputs are independent variables, such as the number of amenities
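a common choice, assumed here, is the weighted sum: net input = w1 * x1 + w2 * x2 + ... + wn * xn (often plus a bias term). A minimal Python sketch; the function name `net_input`, the weights, and the example listing are hypothetical.

```python
def net_input(inputs, weights, bias=0.0):
    """Propagation function: weighted sum of the inputs plus an optional bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical listing with two independent variables:
# number of amenities = 3, number of bedrooms = 2.
inputs = [3, 2]
weights = [0.5, 1.0]   # made-up weights; in a real ANN these are learned

print(net_input(inputs, weights, bias=-1.0))
```

the weights express how strongly each input influences the net input.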
what is the activation function?
the function/level that determines whether a neuron produces an output or not (i.e., whether the whole process starts)
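a minimal Python sketch of a single perceptron (no hidden layer) that chains both functions. The step threshold and the sigmoid are two common activation choices assumed here for illustration; all names and numbers are hypothetical.

```python
from math import exp

def step(net, threshold=0.0):
    """Fire (1) only if the net input reaches the threshold, otherwise stay silent (0)."""
    return 1 if net >= threshold else 0

def sigmoid(net):
    """Smooth alternative: maps the net input to an activation level between 0 and 1."""
    return 1.0 / (1.0 + exp(-net))

def perceptron_output(inputs, weights, bias, activation=step):
    """Single perceptron: propagation function followed by the activation function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)

weights = [0.5, 1.0]   # made-up weights
print(perceptron_output([3, 2], weights, bias=-1.0))                      # net input is 2.5
print(perceptron_output([1, 0], weights, bias=-1.0))                      # net input is -0.5
print(perceptron_output([3, 2], weights, bias=-1.0, activation=sigmoid))  # graded level
```

with the step function the neuron either fires or not; with the sigmoid it returns a graded activation level.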