Classification Flashcards
When is classification in modeling useful? (read: used)
When the response (outcome variable) is nominal or categorical.
Must there be an order of the levels in the categorical response in order to use classification?
No. The levels of the categorical response must be unordered; if they are ordered, other techniques may and should be applied.
In classification, what is the vocabulary?
The vocabulary is the set of classes C that the response value can be divided into, for instance {yes, no} or {bus, tram, bike, train}.
What does a classifier do?
It takes the features (independent variables, IVs) as input and predicts which class in C the categorical response Y most likely belongs to.
What is Multi-Label Classification?
When the instances we test may be associated with more than one class.
For instance, an image containing a house and a dog may be classified as BOTH a house and a dog.
What is Hierarchical or Multilevel Classification?
When the classes in set C can be divided into subclasses, so the class set is hierarchical.
What is Structured Classification?
If there is structure in the input, there must also be a structure in the output.
For instance, each word of a sentence can be classed as {noun, verb, etc.}, but the sentence as a whole has to follow a certain structure to be comprehensible.
In Binary or Multiclass Classification, how do we calculate Accuracy?
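Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where TP/TN means True Positive/Negative and FP/FN means False Positive/Negative. In the multiclass case this is the number of correct predictions divided by the total number of predictions.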

Is accuracy a good metric for classification?
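Only if all classes are well represented. With imbalanced classes accuracy can be misleading, because a classifier that always predicts the most frequent class still scores high.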
In classification, what is imbalance?
If one response class appears more frequently than the others in our data, there is a natural skew in the classifier towards predicting that class.
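To make the effect of imbalance on accuracy concrete, here is a minimal sketch. The data (95 negatives, 5 positives) and the always-predict-the-majority "classifier" are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: 95 negatives (0) and 5 positives (1), an imbalanced response.
y_true = [0] * 95 + [1] * 5

# A "classifier" that ignores the features and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


acc = accuracy(y_true, y_pred)

# Number of positive instances the classifier actually found.
found_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(acc)              # 0.95
print(found_positives)  # 0
```

The accuracy looks good even though not a single positive instance is detected, which is why accuracy alone is a weak metric on imbalanced data.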
In classification, what is recall and how do we calculate it?
The fraction of actual positives that are correctly predicted.
Recall = TP / (TP + FN) = true positives / (true positives + false negatives)

In classification, what is precision and how do we calculate it?
The fraction of predicted positives that are correct.
Precision = TP / (TP + FP) = true positives / (true positives + false positives)

In classification, what is the F-score?
And what does its value represent?
The F-score is the harmonic mean of Precision (P) and Recall (R): F = 2 * P * R / (P + R).
A high F-score means both P and R are high, which is generally a good thing.
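As a worked example, precision, recall and the F-score can be computed from confusion-matrix counts. This is a minimal sketch: the counts are hypothetical, and it assumes the denominators are non-zero (at least one predicted positive and one actual positive).

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)


def f_score(p, r):
    """Harmonic mean of precision p and recall r (the F1 score)."""
    return 2 * p * r / (p + r)


# Hypothetical counts: true/false positives and negatives.
tp, fp, fn, tn = 40, 10, 20, 30

p = precision(tp, fp)   # 40 / 50
r = recall(tp, fn)      # 40 / 60
f1 = f_score(p, r)      # equals 2*TP / (2*TP + FP + FN) = 80 / 110

print(round(p, 3), round(r, 3), round(f1, 3))
```

Because the harmonic mean is dominated by the smaller of its two inputs, the F-score is only high when precision and recall are both high.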

For what types of classification can we use classification as regression?
Only binary classification. However, with logistic regression, responses with multiple levels can be separated as well.
In classification as regression, what type of regression is useful?
Logistic Regression, as the logistic function maps any value in the link space (the linear predictor) to a probability between 0 and 1 in the response space, so the prediction can always be compared against a classification threshold.
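The mapping from the linear predictor to a probability can be sketched as follows. This is an illustrative sketch for a single feature: the weight `w`, intercept `b` and the 0.5 threshold are hypothetical values, not fitted parameters.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1).

    Note: for very large negative z, math.exp(-z) can overflow;
    this sketch assumes moderate values of z.
    """
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, w, b, threshold=0.5):
    """Return (predicted class, probability) for one feature value x."""
    z = w * x + b        # linear predictor (link space)
    prob = sigmoid(z)    # probability of class 1 (response space)
    return (1 if prob >= threshold else 0), prob


# Hypothetical parameters, for illustration only.
w, b = 1.5, -1.0

print(predict_class(2.0, w, b))  # z = 2.0, probability above 0.5, class 1
print(predict_class(0.0, w, b))  # z = -1.0, probability below 0.5, class 0
```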

In tree modelling using regions, explain top down greedy approach.
Top-down: from root to leaves
Greedy: make the best split at each point, without looking back or forward
In tree modelling, what is the ideal region accuracy using the Gini Index?
When all instances in a region belong to a single class (and, ideally, no other region contains instances of that class). The region is then pure, so its Gini index, the sum over classes k of p_k * (1 - p_k), is 0.
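The Gini index and a single greedy split can be sketched as below. This is a minimal illustration for one numeric feature, not a full tree learner; the helper names `gini_index` and `best_split` and the toy data are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a region: sum over classes k of p_k * (1 - p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def best_split(xs, ys):
    """One greedy step: try every threshold on a single feature and keep
    the one with the lowest size-weighted Gini index of the two regions."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must produce two non-empty regions
        score = (len(left) * gini_index(left)
                 + len(right) * gini_index(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: one feature and a two-class response.
xs = [1, 2, 3, 10, 11, 12]
ys = ["bike", "bike", "bike", "train", "train", "train"]

print(gini_index(["bus", "bus", "bus", "bus"]))    # 0.0 (pure region)
print(gini_index(["bus", "bus", "tram", "tram"]))  # 0.5 (evenly mixed)
print(best_split(xs, ys))                          # (3, 0.0)
```

A full top-down tree would apply `best_split` recursively to each resulting region, never revisiting an earlier split.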
In tree modelling, what is pruning?
Pruning is the reduction of an existing tree in order to reduce variance and overfitting.
What are hyperparameters?
They are not parameters of the model, but of the training process.
What process is used to determine how much to prune a tree?
The same as for other models that tend to overfit: cross-validation on tree size.
In terms of bias and variance, how do the values change when pruning deep trees?
Pruning reduces variance but increases bias.
In terms of bias and variance, what values are expected in deep (read: large) trees?
Low bias but high variance.
What is bagging in modelling and how does it work?
Bagging (bootstrap aggregating) is a technique to reduce variance without increasing bias (compromising accuracy). It is often used with deep trees as an alternative to pruning.
In bagging you create multiple models (trees) from multiple training sets and average their predictions. This reduces variance.
Where do the training sets come from when using bagging?
You sample (or bootstrap) the (sub) training sets from the full training set, drawing instances with replacement.
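The bootstrap-and-average idea can be sketched as follows. This is a simplified illustration, not a production implementation: the base learner `fit_stump` is a hypothetical one-threshold model standing in for a tree, the data are made up, and the sketch assumes class 1 has the larger feature values.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement from the full training set."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Hypothetical base learner: a threshold halfway between the class means.
    Assumes class 1 has larger feature values than class 0."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate bootstrap sample containing one class: always predict it.
        only = 0 if xs0 else 1
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0


def bagged_predict(models, x):
    """Average the 0/1 predictions of all models; 0.5 or more means class 1."""
    avg = sum(m(x) for m in models) / len(models)
    return 1 if avg >= 0.5 else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical training set of (feature, class) pairs.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
         (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]

# One bootstrap sample and one model per bag.
samples = [bootstrap_sample(train, rng) for _ in range(25)]
models = [fit_stump(s) for s in samples]

print(bagged_predict(models, 1.2), bagged_predict(models, 7.2))
```

Each model sees a slightly different training set, so their individual errors differ; averaging over the models smooths those differences out, which is where the variance reduction comes from.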
