Week 3: Regression & Classification (Tree-Based Models) Flashcards
Algorithm 1R
A method for computing classification rules based on the frequencies of individual attribute values.
- Loop through all attributes
1.1 Loop through each value v of the attribute
1.1.1 Build a rule that predicts the most frequent class for that value: c_freq(v) = most frequent class for attribute value v
1.2 Calculate the error rate of the attribute's rule set
- Choose the attribute whose rules have the smallest error rate (see the sketch below).
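A minimal Python sketch of 1R on a hypothetical toy dataset (the attribute values and labels are made up for illustration):

```python
from collections import Counter, defaultdict

def one_r(X, y):
    """1R: for each attribute, map every value to its most frequent class;
    keep the attribute whose rule set makes the fewest errors."""
    best_attr, best_rules, best_errors = None, None, float("inf")
    for a in range(len(X[0])):
        # Count classes per attribute value
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[a]][label] += 1
        # Rule: attribute value -> most frequent class for that value
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors = instances not belonging to the most frequent class
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if errors < best_errors:
            best_attr, best_rules, best_errors = a, rules, errors
    return best_attr, best_rules

# Hypothetical toy data: attributes = (outlook, windy), class = play
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "yes")]
y = ["yes", "yes", "yes", "no"]
print(one_r(X, y))
```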
Accuracy for Classification Rules
accuracy = number of correctly predicted instances / number of all instances covered
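For example, with hypothetical numbers: a rule that covers 20 instances and predicts 17 of them correctly has accuracy = 17/20 = 0.85.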
Gini Coefficient/Impurity Measure
G = \sum_{k=1}^{C} Pr(k)^2, where Pr(k) is the proportion of instances of class k in the node (this is the convention used in this module; the standard Gini impurity is 1 - \sum_{k=1}^{C} Pr(k)^2). The higher the Gini coefficient, the purer the node, so splits should maximise it.
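A small sketch computing this quantity from raw class counts, using the sum-of-squares convention above (the counts are hypothetical):

```python
def gini(counts):
    """Sum of squared class proportions; 1.0 means a pure node."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))  # pure node -> 1.0
print(gini([5, 5]))   # maximally mixed two-class node -> 0.5
```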
Entropy
E = -\sum_{k=1}^{C} Pr(k) \cdot \log_2 (Pr(k))
The lower the entropy, the purer the split. Splits should minimise entropy. Higher entropy means higher disorder.
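A small sketch computing entropy from raw class counts (the example counts are hypothetical):

```python
import math

def entropy(counts):
    """Shannon entropy in bits; 0 means a pure node."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))  # mixed node -> ~0.940
print(entropy([5, 5]))  # maximally mixed two-class node -> 1.0
```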
Information Gain
IG = E(v) - [f_1 E(v_1) + … + f_\ell E(v_{\ell})]
v = parent node
v_1,…,v_{\ell} = child nodes
f_k = proportion of the parent's instances that fall into child node v_k. Splits should maximise information gain (see the sketch below).
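A minimal sketch combining the two formulas above; the parent and child class counts are hypothetical:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """IG = E(parent) minus the weighted average of the child entropies."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted

# Hypothetical split: parent [9 yes, 5 no] -> children [6, 2] and [3, 3]
print(information_gain([9, 5], [[6, 2], [3, 3]]))  # ~0.048
```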
Chi-square
\chi^2 = \sum_{k=1}^C \frac{(O_k - E_k)^2}{E_k}
Observed Frequency = observed cell count in the contingency table
Expected Frequency = (row total × column total) / grand total
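A small sketch computing the statistic from a contingency table of attribute values against classes (the table values are hypothetical):

```python
def chi_square(table):
    """Chi-square statistic for a 2D contingency table (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical table: rows = attribute values, columns = classes
print(chi_square([[30, 10], [15, 45]]))  # ~24.2
```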
Classification and Regression Trees (CART)
Input: Numerical and Categorical
Split Selection: Gini Index
Pruning: Adjusted Error Rate
This method grows binary trees as long as it can find new splits that increase purity. It then prunes, replacing any subtree whose leaves make the same classification with their common parent, exploiting the tradeoff between the number of leaves and the error rate (see the sketch below).
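A minimal sketch using scikit-learn's DecisionTreeClassifier, which grows CART-style binary trees; criterion="gini" selects splits by Gini index and ccp_alpha controls cost-complexity pruning, i.e. the leaves-versus-error tradeoff (the dataset and the alpha value are illustrative choices, not from the notes):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Gini-based splitting; ccp_alpha > 0 enables cost-complexity (weakest-link) pruning.
tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)
tree.fit(X, y)
print(tree.get_n_leaves(), tree.score(X, y))
```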
Iterative Dichotomiser (ID3)
Input: categorical only
Split Selection: Information Gain
Pruning: No
This method randomly chooses a subset of the training set and builds a decision tree using this subset. It then evaluates the tree on the remaining instances. If the error rate is sufficiently low, it terminates; otherwise, it adds the misclassified instances to the subset and repeats (see the sketch below).
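A sketch of this windowing idea, using scikit-learn's DecisionTreeClassifier with the entropy criterion as a stand-in for ID3 (ID3 proper takes only categorical inputs; the dataset, window size, and stopping threshold are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Start from a small random window of the training data.
window = rng.choice(len(X), size=20, replace=False).tolist()

while True:
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
    tree.fit(X[window], y[window])
    outside = [i for i in range(len(X)) if i not in window]
    missed = [i for i in outside if tree.predict(X[i:i + 1])[0] != y[i]]
    # Stop once (almost) all remaining instances are classified correctly.
    if len(missed) <= 2:
        break
    window += missed  # add the misclassified instances and rebuild

print(len(window), tree.get_n_leaves())
```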
C5.0
Input: numerical and categorical
Split Selection: Information Gain
Pruning: Error rate computation with statistical confidence intervals
This method is similar to CART, but can make multiway splits on categorical variables.
Decision Trees
Pros:
1. Can handle mixed data types.
2. Can deal with missing values.
Cons:
1. Decision boundaries are typically parallel to input axes.
2. Overfitting might occur if not properly designed.
Leave-One-Out Cross-Validation
A special case of N-fold cross-validation in which N equals the number of instances: each instance in turn is held out as the validation set while the model is trained on the remaining N - 1 instances. Because no random sampling is involved, there is no sampling bias.
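A minimal sketch using scikit-learn's LeaveOneOut splitter (the dataset and model are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One fold per instance: train on N-1 instances, validate on the held-out one.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=LeaveOneOut())
print(len(scores), scores.mean())
```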
Bootstrapping
Unlike cross-validation, which uses sampling without replacement, bootstrapping uses sampling with replacement. In each trial, n samples are drawn with replacement from the n instances; the drawn samples form the training set and the instances never drawn form the test set. Each draw picks a given instance with probability 1/n, so the probability of never being picked across n draws is (1 - 1/n)^n ≈ e^{-1} ≈ 36.8%. As n approaches infinity, the training set therefore contains approximately 63.2% of the distinct instances.
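A small simulation of one bootstrap trial, showing that roughly 63.2% of distinct instances end up in the training set (n is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# One bootstrap trial: draw n indices with replacement.
chosen = rng.integers(0, n, size=n)
train_fraction = len(np.unique(chosen)) / n        # distinct instances in training set
print(train_fraction)                              # ~0.632 for large n
test_indices = np.setdiff1d(np.arange(n), chosen)  # never-drawn (out-of-bag) instances
print(len(test_indices) / n)                       # ~0.368
```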
True Positive Rate
(True Positives) / (True Positives + False Negatives)
False Positive Rate
(False Positives) / (False Positives + True Negatives)
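A small sketch computing both rates from hypothetical confusion-matrix counts:

```python
def rates(tp, fp, tn, fn):
    """True positive rate (recall) and false positive rate."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr

# Hypothetical confusion-matrix counts
print(rates(tp=40, fp=10, tn=45, fn=5))  # (~0.889, ~0.182)
```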
Cohen’s Kappa
This metric measures the level of agreement between two classifiers, corrected for the agreement expected by chance.
\kappa = \frac{Pr(a) - Pr(e)}{1 - Pr(e)}
If there are two classifiers, C and C’, and two classes, A and B:
aa: C and C’ agree the instance is A
bb: C and C’ agree that the instance is B
ab, ba: when C and C’ disagree about the classification.
Assuming there are n instances covered by C and C’
Pr(a) = \frac{aa + bb}{n}
Pr(e) = (\frac{aa + ba}{n}) \times (\frac{aa + ab}{n}) + (\frac{bb + ab}{n}) \times (\frac{bb + ba}{n})
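A small sketch computing kappa from the agreement/disagreement counts defined above (the counts are hypothetical):

```python
def cohens_kappa(aa, bb, ab, ba):
    """Kappa from the pairwise agreement/disagreement counts defined above."""
    n = aa + bb + ab + ba
    pr_a = (aa + bb) / n
    pr_e = ((aa + ba) / n) * ((aa + ab) / n) + ((bb + ab) / n) * ((bb + ba) / n)
    return (pr_a - pr_e) / (1 - pr_e)

# Hypothetical counts: Pr(a) = 0.7, Pr(e) = 0.5, kappa = 0.4
print(cohens_kappa(aa=40, bb=30, ab=20, ba=10))
```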