Classification and regression Flashcards
classification
predicts discrete class labels
example of classification
labelling emails as spam or ham
decision tree classifier
flowchart-like structure in which each internal node represents a test on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents an outcome of that test, and each leaf node represents a class label; in other words, a tree-like model that makes decisions by splitting data into subsets based on feature values, creating branches that lead to outcomes (class labels)
decision tree makes a sequence
of partitions of training data one attribute at a time
probability in classification
probability helps determine the likelihood of each class label given a set of features
relates to confidence in predictions
ordering in classification
attributes are selected and split based on a measure like information gain creating an order of importance for features
entropy
entropy is a measure of uncertainty or disorder in a system
info entropy in classification
entropy measures how hard it is to guess the label of a sample drawn at random from the dataset
choose level with ___ entropy as ___
lowest
as the data labels are more uniform, so it's easier to guess
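The idea above can be sketched in a few lines of Python (an illustrative helper, not from the cards): a uniform label set has zero entropy, a 50/50 mix has one bit.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

# Uniform labels -> 0 bits (easy to guess); 50/50 mix -> 1 bit (hardest).
print(entropy(["spam"] * 4))         # 0.0
print(entropy(["spam", "ham"] * 2))  # 1.0
```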
how is entropy used in data splits for decision trees?
decision trees use information gain based on entropy to decide best feature to split the data at each node
entropy is calculated before and after a split to determine how well a feature divides the data into pure subsets
3 steps of entropy and data splits
1) partition examples recursively by choosing one attribute at a time
2) choose the attribute that best separates the classes of the training examples
3) choose a goodness function (info gain, gain ratio, gini index)
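The three steps above can be sketched with information gain as the goodness function (a minimal illustration; the data and helper names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Toy data: 'outlook' separates the classes perfectly, 'windy' does not.
rows = [{"outlook": "sun", "windy": True}, {"outlook": "sun", "windy": False},
        {"outlook": "rain", "windy": True}, {"outlook": "rain", "windy": False}]
labels = ["play", "play", "stay", "stay"]
best = max(["outlook", "windy"], key=lambda a: info_gain(rows, labels, a))
print(best)  # outlook
```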
3 attribute types
nominal (categorical values with no order like animal, food)
ordinal (categorical values that have order like hot, warm, cold)
numerical
how do you handle a numerical attribute in a decision tree, and what are 3 ways to do it?
convert to a nominal attribute
1) assign categories to the numerical values and keep trying until you find a good split
2) evaluate candidate thresholds by entropy until you find the best split
3) frequency binning
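Approach 2 can be sketched as scanning candidate thresholds between adjacent sorted values and keeping the split with the lowest weighted entropy (illustrative code; names and data are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try a threshold between each pair of adjacent sorted values and
    keep the one whose split has the lowest weighted entropy."""
    pairs = sorted(zip(values, labels))
    best, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        th = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= th]
        right = [y for v, y in pairs if v > th]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best, best_score = th, score
    return best

# Ages below ~30 are "no", above are "yes": the best cut lands between 25 and 40.
print(best_threshold([20, 25, 40, 45], ["no", "no", "yes", "yes"]))  # 32.5
```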
attribute resulting in ____ info gain is selected for split
highest
process of splitting the decision tree by attributes is continued recursively ____
building tree by splitting data using features that minimise uncertainty at each step
Th is the
entropy threshold
What is the purpose of Th
criterion for deciding when to stop splitting the data at a node or to continue
When entropy of a node is below Th?
If the entropy of a node is below a certain threshold, it means that the data at that node is sufficiently pure (i.e., it mostly contains examples of one class). As a result, the decision tree can stop splitting further at that node, and the node is labeled with the majority class
When entropy of a node is above Th?
If the entropy is above the threshold, it indicates that the data at the node is still impure, meaning there’s a mix of different class labels. In this case, the decision tree continues splitting by choosing the attribute that reduces entropy the most (maximizing information gain)
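The two cases above can be sketched as a recursive tree builder with an entropy threshold Th (a minimal sketch; the tree representation and helper names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def build(rows, labels, attrs, th=0.3):
    """Stop and emit the majority class when entropy <= Th, else split
    on the attribute that minimises the children's weighted entropy."""
    if entropy(labels) <= th or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    def weighted(attr):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    attr = min(attrs, key=weighted)
    node = {}
    for value in {row[attr] for row in rows}:
        sub = [(r, y) for r, y in zip(rows, labels) if r[attr] == value]
        node[(attr, value)] = build([r for r, _ in sub], [y for _, y in sub],
                                    [a for a in attrs if a != attr], th)
    return node

rows = [{"outlook": "sun"}, {"outlook": "sun"}, {"outlook": "rain"}]
tree = build(rows, ["play", "play", "stay"], ["outlook"])
print(tree)  # one pure leaf per outlook value
```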
only use Th=0 when
example is really simple
Th=0, Th>0
=0 requires perfect order (pure nodes)
>0 can tolerate some mixed labels
avoid overfitting by using 1) and 2) and 3)
entropy threshold
pruning
limit depth of tree
gain ratio formula
information gain of A / split information of A (the entropy of the attribute's own value distribution)
want big or small gain ratio and why?
big: the attribute with the highest gain ratio is selected; dividing by split information penalises attributes that would overfit the model via many small, specific splits
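The normalisation can be sketched as follows (illustrative code; the 'id'-style attribute is my own example): a unique-value-per-row attribute has maximal information gain, but its gain ratio is cut down by a large split information.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain divided by split information (the entropy of
    the attribute's value distribution), penalising many-valued attributes."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - after
    split_info = sum(-(len(g) / n) * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info else 0.0

# 'id' is unique per row: its gain (1.0) is halved by split info 2.0,
# so the two-valued 'outlook' attribute wins instead.
rows = [{"id": i, "outlook": "sun" if i < 2 else "rain"} for i in range(4)]
labels = ["play", "play", "stay", "stay"]
print(gain_ratio(rows, labels, "id"))       # 0.5
print(gain_ratio(rows, labels, "outlook"))  # 1.0
```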
gini index doesn’t rely on
entropy, only on class proportions
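The gini index computes impurity from class proportions alone, with no logarithms (a minimal sketch):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.
    No logarithms involved, which makes it cheap to evaluate."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a"] * 4))       # 0.0  (pure node)
print(gini(["a", "b"] * 2))  # 0.5  (maximally mixed binary node)
```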
when you would use info gain as goodness function?
imbalanced dataset
when you would use gain ratio as goodness function?
imbalanced dataset
high-branching attributes (many distinct values)
when you would use gini index as goodness function?
binary classification
rank fastest to slowest for goodness function evaluation?
fastest is gini, middle is info gain, slowest is gain ratio
perceptron is an
artificial neuron
fundamental unit in neural networks modelled after biological neuron
its activity is the weighted sum of its inputs plus a bias term, passed through an activation function to produce an output
adjusting weights allows neuron to learn
choice of activation function determines type of computation the neuron performs
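The card's definition can be sketched directly: weighted sum plus bias, passed through an activation function (a sigmoid here; the weights are illustrative, hand-set values, not learned):

```python
import math

def perceptron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid
    activation to produce the neuron's output."""
    activity = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activity))  # sigmoid activation

# With these hand-picked weights the neuron fires (output > 0.5)
# only when x1 + x2 > 1.5, i.e. it behaves like a soft AND gate.
print(perceptron([1, 1], [1.0, 1.0], -1.5))  # > 0.5
print(perceptron([1, 0], [1.0, 1.0], -1.5))  # < 0.5
```

Adjusting the weights and bias changes which inputs push the activity past zero, which is what "learning" means for this unit.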
single neuron vs multiple computation ability wise
a single neuron can only do simple computations, but many connected in a large network can approximate any function mapping
what is the activation function symbol
like a hook
activation function
determines the output of a neuron based on the weighted sum of its inputs
introduces non-linearity to make the network capable of learning more complex patterns
e.g. sigmoid, ReLU, softmax
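The three examples can be sketched as plain functions (illustrative implementations):

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive values through, zeroes out negatives."""
    return max(0.0, z)

def softmax(zs):
    """Turns a vector of scores into probabilities that sum to 1."""
    exps = [math.exp(z - max(zs)) for z in zs]  # shift for numerical stability
    return [e / sum(exps) for e in exps]

print(sigmoid(0.0))            # 0.5
print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(softmax([1.0, 1.0]))     # [0.5, 0.5]
```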
weights
coefficients that adjust the influence of certain input attributes on the output
bias
threshold value added to the sum of weighted inputs to shift the activation function’s output
why is the bias helpful?
shifts the decision boundary away from the origin, making the model more flexible; without the bias, the decision boundary would always pass through the origin
half space
one of the two regions a hyperplane divides the space into; data points are classified based on which side they fall on
one hot encoding
converts categorical data variables into a numerical format that machine learning models can use
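A minimal sketch of the conversion (illustrative helper, not a library call): each category becomes its own binary column, so no artificial order is implied.

```python
def one_hot(values):
    """Map each categorical value to a binary vector with a single 1."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# 'cat' and 'dog' become two binary columns with no implied order.
print(one_hot(["cat", "dog", "cat"]))  # [[1, 0], [0, 1], [1, 0]]
```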
binary classification vs multi class classification
binary classifies data into 1 of 2 classes
multi classifies data into 1 of many classes
how does multi class classification work?
uses K neurons and trains each one to separate one class from all the others (one-vs-rest)
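The one-vs-rest scheme can be sketched as follows (illustrative code; the class names and hand-set weights are my own, not learned):

```python
def one_vs_rest_predict(x, neurons):
    """K neurons, each scoring 'my class' vs 'everything else';
    the class whose neuron produces the highest activity wins."""
    def activity(weights_bias):
        weights, bias = weights_bias
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    scores = {label: activity(wb) for label, wb in neurons.items()}
    return max(scores, key=scores.get)

# One (weights, bias) pair per class; values chosen by hand for illustration.
neurons = {"setosa": ([1.0, -1.0], 0.0),
           "versicolor": ([-1.0, 1.0], 0.0),
           "virginica": ([0.5, 0.5], -2.0)}
print(one_vs_rest_predict([2.0, 0.5], neurons))  # setosa
```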