decision trees Flashcards
what is the information gain
information gain is the reduction in entropy achieved by splitting the data on an attribute. When building the tree, we prefer the attribute whose split yields the highest gain
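A minimal sketch of that idea in Python, assuming each example is a dict and the class label is stored under a key. The function names, the toy rows, and the "windy"/"outlook"/"play" keys are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(rows, attribute, label):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[label])
    before = entropy([row[label] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

# Hypothetical toy data: "windy" separates the classes, "outlook" does not.
rows = [
    {"windy": "yes", "outlook": "sunny", "play": "no"},
    {"windy": "yes", "outlook": "rain", "play": "no"},
    {"windy": "no", "outlook": "sunny", "play": "yes"},
    {"windy": "no", "outlook": "rain", "play": "yes"},
]
gain_windy = information_gain(rows, "windy", "play")      # all uncertainty removed
gain_outlook = information_gain(rows, "outlook", "play")  # no uncertainty removed
```

On this toy data a tree builder that maximizes gain would split on "windy" first, since that split leaves pure groups while "outlook" leaves each group as mixed as before.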
what is the downside of the UID
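attributes with many values (for example, a unique identifier per example) get a high gain but produce a useless decision tree that does not generalize. We overcome this by relying on the gain ratio: GainRatio(X, S) = Gain(X, S) / SplitInfo(X, S), where SplitInfo(X, S) is the entropy of S computed with X treated as the label attribute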
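A small sketch of how the gain ratio penalizes a many-valued attribute, assuming the usual definition in which the denominator is the entropy of the attribute's own values. The helper names and the toy rows (with a hypothetical "id" column) are illustrative only.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy, in bits, of a list of values."""
    total = len(values)
    return sum((c / total) * math.log2(total / c) for c in Counter(values).values())

def gain(rows, attribute, label):
    """Information gain of splitting `rows` on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[label])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - after

def gain_ratio(rows, attribute, label):
    """Gain divided by the entropy of the attribute's values (split information)."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0  # the attribute has a single value, so it cannot split anything
    return gain(rows, attribute, label) / split_info

# Hypothetical toy data: "id" is unique per row, "windy" has two values.
rows = [
    {"id": 1, "windy": "yes", "play": "no"},
    {"id": 2, "windy": "yes", "play": "no"},
    {"id": 3, "windy": "no", "play": "yes"},
    {"id": 4, "windy": "no", "play": "yes"},
]
ratio_id = gain_ratio(rows, "id", "play")
ratio_windy = gain_ratio(rows, "windy", "play")
```

Both attributes have the same plain gain here, because each leaves pure groups. The "id" split is spread over four values, so its split information is larger and its gain ratio is smaller than that of "windy".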
what is the entropy
the entropy reflects the uncertainty about the message, and therefore how much information we can extract from a particular attribute
if there are n equally likely messages, the receiver needs to ask log2(n) yes/no questions to know the message. In other words, the logarithm is base 2 because each answer is binary
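As a quick check of the log2(n) claim, here is a hedged sketch that computes Shannon entropy in bits from a list of observed messages. The `entropy` helper and the sample inputs are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(messages):
    """Shannon entropy, in bits, of a list of observed messages."""
    total = len(messages)
    return sum((c / total) * math.log2(total / c) for c in Counter(messages).values())

# Eight distinct, equally likely messages: log2(8) yes/no questions are needed.
uniform_bits = entropy(list(range(8)))

# Only one possible message: nothing to ask, so no uncertainty.
certain_bits = entropy(["a"] * 8)
```

With eight equally likely messages the entropy equals log2(8) = 3 bits, and with a single possible message it is 0 bits.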