3. Classification Flashcards
What is Classification?
Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated, and understood. Different objects are put into the same class if they share some fundamental traits.
What is Sentiment Analysis?
The process of computationally identifying and categorising opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.
What is the goal of the Naive Bayes Classifier? What are the issues that come with it?
Goal:
To predict the most likely label given the data.
Issues:
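- All features are treated as equally important
- Conditional independence assumption: features are assumed to be independent of each other given the class
- Context (e.g. word order) is not taken into account
- Unknown words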
In the Naive Bayes Classifier, describe the formulas for the evidence p(x), prior p(y) and likelihood p(x_i | y)
Evidence:
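p(x) does not depend on the class y (x is fixed), so it is the same for every class and can be dropped when computing argmax_y p(y | x) = argmax_y p(x | y) p(y)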
Prior:
p(y) = N_y / N_x
(i.e. the number of training examples of class y divided by the total number of training examples, so the relative frequency of the class in the data)
Likelihood:
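p(x_i | y) = (count(x_i, y) + 1) / (Sum_x count(x, y) + size(vocab))
(This uses add-one/Laplace smoothing, so a word never seen with class y during training still gets a non-zero probability for that class.)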
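A minimal Python sketch of these formulas for a multinomial Naive Bayes classifier over bags of words. The function names and the toy training data are hypothetical; the sketch works in log space to avoid numerical underflow, drops p(x) as described above, and (one common choice, assumed here) simply skips test words that are not in the vocabulary.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing.

    docs: list of token lists; labels: class label for each doc.
    Returns (log_prior, log_likelihood, vocab).
    """
    vocab = {w for doc in docs for w in doc}
    n_docs = len(docs)
    class_counts = Counter(labels)              # N_y for each class y
    word_counts = defaultdict(Counter)          # count(x_i, y)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)

    # Prior: p(y) = N_y / N_x
    log_prior = {y: math.log(n / n_docs) for y, n in class_counts.items()}

    # Likelihood: (count(x_i, y) + 1) / (Sum_x count(x, y) + size(vocab))
    log_likelihood = {}
    for y in class_counts:
        total = sum(word_counts[y].values())
        log_likelihood[y] = {
            w: math.log((word_counts[y][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood, vocab

def predict_naive_bayes(doc, log_prior, log_likelihood, vocab):
    """Return argmax_y [log p(y) + Sum_i log p(x_i | y)]; p(x) is ignored."""
    scores = {}
    for y in log_prior:
        score = log_prior[y]
        for w in doc:
            if w in vocab:                      # unknown words are skipped
                score += log_likelihood[y][w]
        scores[y] = score
    return max(scores, key=scores.get)

# Toy sentiment data, made up for illustration
docs = [["great", "fun"], ["great", "film"], ["boring", "film"], ["awful", "boring"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(["great", "boring", "great"], *model))  # pos
```

Here p(great | pos) = (2 + 1) / (4 + 5) = 3/9, while p(great | neg) = (0 + 1) / (4 + 5) = 1/9, which is exactly where the smoothing term shows up.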
Describe Logistic Regression and give its issues.
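Logistic regression is a discriminative classifier: instead of modelling p(x | y) and p(y) like Naive Bayes, it models P(y | x) directly. Each feature x_i gets a weight w_i; the weighted sum plus a bias b is passed through a sigmoid (a softmax for more than two classes) to give P(y | x), and the prediction is y = argmax_y P(y | x). The weights are learned, typically by gradient descent on the cross-entropy loss, so that the predictions move closer to the target outputs.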
Issues:
- Context not taken into account
- Unknown words
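A minimal NumPy sketch of binary logistic regression trained by batch gradient descent on the mean cross-entropy loss. The function names, learning rate, number of epochs and the two toy features (think of them as counts of positive and negative words in a review) are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean binary cross-entropy loss.

    X: (n_examples, n_features) feature matrix; y: (n_examples,) labels in {0, 1}.
    Returns the weight vector w and the bias b.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # P(y = 1 | x) for every example
        error = p - y                   # derivative of cross-entropy w.r.t. the logit
        w -= lr * (X.T @ error) / n
        b -= lr * error.mean()
    return w, b

def predict_logistic(X, w, b):
    """argmax_y P(y | x): class 1 if P(y = 1 | x) >= 0.5, else class 0."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Toy features: [count of positive words, count of negative words]
X = np.array([[2, 0], [3, 1], [0, 2], [1, 3]], dtype=float)
y = np.array([1, 1, 0, 0])
w, b = train_logistic_regression(X, y)
print(predict_logistic(X, w, b))  # [1 1 0 0]
```

The learned weights end up positive for the first feature and negative for the second, i.e. each weight w_i says how much its feature pushes the decision towards the positive class.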
Give the equation of the cross-entropy loss and explain its intuition.
H(P, Q) = - Sum_classes P(y_i) * log Q(y_i)
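where P is the true distribution and Q is the predicted one. It measures how close the predicted distribution is to the true distribution: the lower the loss, the closer they are. When P is one-hot (a single correct class), it reduces to -log Q(correct class).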
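A small plain-Python sketch of the formula, assuming P and Q are given as lists of class probabilities in the same class order. The function name and the numbers are made up for illustration; it shows that the loss is lower when the prediction puts more probability on the correct class.

```python
import math

def cross_entropy(p_true, q_pred):
    """H(P, Q) = -Sum_i P(y_i) * log Q(y_i) over the classes.

    Terms with P(y_i) = 0 contribute nothing, so they are skipped.
    """
    return -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)

true = [0.0, 1.0, 0.0]             # one-hot: the correct class is index 1
good = [0.1, 0.8, 0.1]             # confident and correct
bad = [0.6, 0.2, 0.2]              # most mass on a wrong class
print(cross_entropy(true, good))   # -log 0.8, about 0.223
print(cross_entropy(true, bad))    # -log 0.2, about 1.609
```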
Why would we use Neural Networks for NLP? What could be an issue?
- Automatically learn features
- Non-linearity
- Many parameters and stacked layers give the flexibility to fit highly complex data
Issue:
Requires more data for proper training.
You have a sentence:
‘I like to eat big trees’
Length = 6, Embed_size = 2.
If we perform convolution with a window size of 2, what would the resulting matrix’s dimensions be?
m = Length - window_size + 1 = 6 - 2 + 1 = 5 (number of rows, one per window position)
Each row concatenates the embeddings of the window_size words inside the window, so the number of columns is window_size x Embed_size = 2 x 2 = 4.
New Matrix Size: 5 x 4
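The arithmetic can be checked with a short NumPy sketch that builds this window matrix explicitly, assuming stride 1 and no padding. The embeddings are random placeholders and `window_matrix` is a hypothetical helper; each row is the concatenation of the embeddings of two consecutive words.

```python
import numpy as np

def window_matrix(embeddings, window_size):
    """Stack the concatenated embeddings of every window of consecutive words.

    embeddings: (length, embed_size) array, one row per word.
    Returns an array of shape (length - window_size + 1, window_size * embed_size).
    """
    length, _ = embeddings.shape
    rows = [embeddings[i:i + window_size].reshape(-1)  # concatenate the window's word vectors
            for i in range(length - window_size + 1)]
    return np.stack(rows)

sentence = "I like to eat big trees".split()
rng = np.random.default_rng(0)
E = rng.normal(size=(len(sentence), 2))   # random 2-d embeddings, for illustration only
M = window_matrix(E, window_size=2)
print(M.shape)  # (5, 4)
```

A convolutional filter spanning the window would then be a weight vector of length 4 applied to every row, giving one value per window; with k such filters the output would be 5 x k.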
How can we train CNNs to work with unknown words?
We train at the character level or sub-word level: words are fed to the network as sequences of characters or subwords instead of as whole-word tokens. The network can then build a representation for a word it has never seen from the characters or subwords it is made of.
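One common way to obtain subword units is to split words into character n-grams (the idea used by fastText). The sketch below assumes trigrams with `<` and `>` as word-boundary markers and a hypothetical `char_ngrams` helper, and only shows the splitting step: an unseen word such as "treehouse" still shares n-grams with words seen in training, so a model that learns a vector per n-gram has something to compose it from.

```python
def char_ngrams(word, n=3):
    """Split a word into character n-grams, with '<' and '>' marking the word boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# n-grams seen during training (toy vocabulary, for illustration)
known = set(char_ngrams("trees")) | set(char_ngrams("treetop"))

grams = char_ngrams("treehouse")          # a word never seen in training
print(grams)   # ['<tr', 'tre', 'ree', 'eeh', 'eho', 'hou', 'ous', 'use', 'se>']
print([g for g in grams if g in known])   # ['<tr', 'tre', 'ree']
```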
Compare FFNNs vs CNNs vs RNNs
FFNNs:
Powerful classifiers for complex problems
CNNs:
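Capture context hierarchically: each convolution looks at a local window, and stacking layers combines small windows into larger ones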
RNNs:
Capture context sequentially; because each step depends on the hidden state from the previous step, the computation cannot be parallelised across time steps.
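A minimal NumPy sketch of a simple (Elman) RNN forward pass makes the sequential dependency concrete. The `rnn_forward` name, the weight shapes and the random inputs are illustrative assumptions.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Simple (Elman) RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    inputs: (seq_len, input_size); returns all hidden states, shape (seq_len, hidden_size).
    """
    h = np.zeros(W_h.shape[0])            # h_0 = 0
    states = []
    for x_t in inputs:
        # h_t needs h_{t-1}, so the time steps must be computed one after another
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq = rng.normal(size=(6, 2))             # 6 words with 2-d embeddings (random, for illustration)
W_x = rng.normal(size=(3, 2))             # input-to-hidden weights, hidden size 3
W_h = rng.normal(size=(3, 3))             # hidden-to-hidden weights
b = np.zeros(3)
H = rnn_forward(seq, W_x, W_h, b)
print(H.shape)  # (6, 3)
```

By contrast, the rows of a convolution's window matrix do not depend on each other, so all windows can be processed at once.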
Give the equations and interpretations for classification rate/accuracy, recall, precision and f1-score.
Classification Rate: number of correctly classified examples divided by the total number of examples
(TP + TN) / (TP + TN + FP + FN)
Recall: number of correctly classified positive examples divided by the total number of actual positive examples
TP / (TP + FN)
Precision: number of correctly classified positive examples divided by the total number of examples predicted as positive
TP / (TP + FP)
F1: the harmonic mean of precision and recall
2 * Precision * Recall / (Precision + Recall)
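A plain-Python sketch that computes the four metrics from the confusion-matrix counts of a binary task. The helper name, the choice of `1` as the positive class, the toy labels, and returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
# TP = 2, TN = 3, FP = 1, FN = 2:
# accuracy 0.625, precision 2/3, recall 0.5, F1 = 4/7 (about 0.571)
print(acc, prec, rec, f1)
```

Because F1 is a harmonic mean, it stays closer to the lower of precision and recall than an arithmetic mean would.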