Classification Flashcards

1
Q

What is supervised learning

A

Supervised learning is an approach to machine learning where a computer algorithm is trained on labelled input data to predict an output or target variable. The algorithm learns from the labelled data to identify underlying patterns and relationships between the input and output variables

2
Q

What is unsupervised learning

A

Unsupervised learning is a machine learning approach that involves analysing and clustering unlabelled datasets. The algorithms used in unsupervised learning attempt to identify patterns and relationships in the data without the need for human intervention. Unsupervised learning can help to reveal hidden data groupings or patterns

3
Q

What is reinforcement learning

A

Reinforcement learning is a feedback-based machine learning technique in which an agent learns to make decisions by interacting with its environment. The agent receives feedback in the form of rewards or penalties for its actions and adjusts its behaviour to maximise the rewards it receives. Through trial and error, the agent learns to make better decisions and achieve better outcomes over time

4
Q

What are some common challenges associated with unsupervised learning

A

One of the main challenges associated with unsupervised learning is that the absence of labelled data makes it difficult to evaluate the performance of the algorithm. Another challenge is the potential for the algorithm to identify spurious patterns (which occur when two factors appear causally related to one another but are not), which can lead to incorrect conclusions. Researchers attempt to overcome these challenges by using techniques such as clustering validation metrics and visualisation to evaluate the algorithm’s performance, and by identifying and removing outliers in the data

5
Q

What are the two types of supervised learning

A

Regression and Classification

6
Q

What is a classification problem in machine learning

A

A classification problem is a type of supervised learning problem in which the goal is to predict a categorical label for the output variable based on input variables

7
Q

What are some characteristics of a classification problem

A

The output variable is categorical, meaning it can take on a limited number of possible values. The input variables do not need to be categorical and can be a combination of text and numeric features

8
Q

What are some common algorithms used in classification problems

A

Some common algorithms used in classification problems include decision trees, logistic regression, support vector machines (SVMs), and neural networks
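
A minimal sketch of fitting each of these algorithm families with scikit-learn; the synthetic dataset and settings below are invented for illustration, not taken from the notes.

```python
# Fit several common classifiers on a small synthetic dataset (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, "training accuracy:", model.score(X, y))
```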

9
Q

How do you evaluate the performance of a classification model

A

The performance of a classification model can be evaluated using metrics such as accuracy, precision, recall, and F1 score. These metrics measure how well the model correctly predicts the different classes in the dataset
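
A minimal sketch of computing these metrics with scikit-learn for a binary problem; the example labels are invented for illustration.

```python
# Standard classification metrics on a toy set of true vs. predicted labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1 score: ", f1_score(y_true, y_pred))
```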

10
Q

What are some challenges associated with classification problems

A

Some common challenges with classification problems include imbalanced datasets, where one class has significantly more data than the other, and overfitting, where the model fits the training data too closely and performs poorly on new, unseen data

11
Q

What is the objective of a linear regressor, and how does it identify the intercept and slope of a linear equation

A

The objective of a linear regressor is to find the best-fitting line that can represent the relationship between an input variable and an output variable. It does so by minimizing the average squared error between the predicted and actual output values. The intercept and slope of the linear equation can be derived from the parameters that minimize the error
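
A minimal sketch of this idea with NumPy: fit the slope m and intercept c by least squares and check the average squared error; the data points are invented for illustration.

```python
# Least-squares fit of y ≈ m*x + c (assumes NumPy; x and y are made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# np.polyfit returns the least-squares solution [m, c] for a degree-1 polynomial
m, c = np.polyfit(x, y, deg=1)
predictions = m * x + c
mse = np.mean((predictions - y) ** 2)
print(f"m = {m:.3f}, c = {c:.3f}, mean squared error = {mse:.4f}")
```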

12
Q

How does a perceptron or a single neuron work, and how is it related to a linear model

A

A perceptron or a single neuron is a basic building block of a neural network that takes one or more inputs, applies weights to them, and produces an output based on the weighted sum. In a binary classification problem like the one described in the notes, the output of the perceptron can be thresholded to produce a binary decision. A perceptron is a linear model in the sense that the output is a linear function of the inputs and weights, although it can be combined with non-linear activation functions to learn more complex patterns
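
A minimal sketch of a single perceptron as a thresholded weighted sum, assuming NumPy; the weights, bias, and inputs are invented for illustration.

```python
# A single perceptron: weighted sum of inputs plus a bias, thresholded to 0/1.
import numpy as np

def perceptron(x, weights, bias, threshold=0.0):
    weighted_sum = np.dot(weights, x) + bias       # linear part, like m*x + c
    return 1 if weighted_sum > threshold else 0    # thresholding step

x = np.array([0.5, 1.5])         # two input features
weights = np.array([0.8, -0.4])  # one weight per input
bias = 0.1

print(perceptron(x, weights, bias))  # -> 0 or 1
```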

13
Q

What are the steps to apply classification to a linear regressor model (using example from notes)

A

Step 1: We can fit a linear regressor with output labels being 0 and 1. This is a straightforward approach where we treat the classification problem as a regression problem. Here, the intercept is c = -0.74 and the slope is m = 0.33
Step 2: Use thresholding to classify the inputs into different categories. For example, if our prediction is below the green line (at 0.5), then our predicted label would be purple, i.e. lightweight. On the other hand, if the prediction is above this threshold, then we would label the associated output as yellow, or heavyweight
Step 3: We need to evaluate the performance of our model. We can use metrics like accuracy, precision, recall, F1-score, etc., to evaluate the performance of our model
Step 4: We can tune the model by adjusting the threshold value, changing the regularisation parameter, using different optimisation techniques, or selecting different features. This step is to improve the performance of the model
Step 5: We can use our trained model to predict the labels for new instances, using the same threshold value we used during the training phase to classify the new instances into different categories (see the sketch after these steps)
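
A minimal sketch of Steps 1, 2 and 5, assuming NumPy; the coefficients are the ones quoted above (c = -0.74, m = 0.33), while the new input values are invented, since the original dataset from the notes is not reproduced here.

```python
# Use a fitted linear regressor plus a 0.5 threshold as a binary classifier.
import numpy as np

c, m = -0.74, 0.33      # Step 1: parameters of the fitted linear regressor
threshold = 0.5         # Step 2: decision threshold on the regressor output

def classify(x):
    prediction = m * x + c                       # raw regression output
    return 1 if prediction >= threshold else 0   # 1 = heavyweight, 0 = lightweight

new_inputs = np.array([2.0, 4.0, 6.0])           # Step 5: label new instances
print([classify(x) for x in new_inputs])         # -> [0, 1, 1]
```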

14
Q

Why is it important to evaluate the performance of a machine learning model

A

It is important to evaluate the performance of a machine learning model to determine how well it is able to generalise to new, unseen data. This helps in identifying and addressing potential issues with the model, improving its accuracy and reliability

15
Q

What is overfitting in machine learning

A

Overfitting is a common problem in machine learning where a model is trained to fit the training data too closely, resulting in poor generalisation to new, unseen data. This can happen when the model is too complex or when there is insufficient data to train the model

16
Q

What are hyperparameters in machine learning

A

Hyperparameters are parameters that are set by the user before the model is trained, and they control the behaviour of the learning algorithm. Examples of hyperparameters include the learning rate, regularisation strength, number of hidden layers in a neural network, etc.

17
Q

What is cross-validation in machine learning

A

Cross-validation is a technique used to evaluate the performance of a machine learning model. It involves dividing the data into multiple folds, training the model on all but one fold and testing it on the held-out fold, and repeating this process so that each fold serves as the test set once. This helps in obtaining a more reliable estimate of the model’s performance on new, unseen data
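
A minimal sketch of k-fold cross-validation with scikit-learn; the synthetic dataset and the choice of 5 folds are illustrative.

```python
# 5-fold cross-validation of a logistic regression model (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # train/test once per fold
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```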

18
Q

What is the Sigmoid function and how is it useful in prediction models

A

The Sigmoid function is a mathematical function that takes any input and maps it to an output between 0 and 1. It is useful in prediction models because it can convert any input into a probability score, which can be interpreted as the likelihood of a particular outcome

19
Q

How does the Sigmoid function relate to logistic regression

A

Logistic regression is a type of statistical analysis that is used to predict the probability of a binary outcome (e.g. yes/no, pass/fail). It is based on the Sigmoid function, which maps any input value to a probability score between 0 and 1

20
Q

What is the Sigmoid function equation

A

f(t) = 1 / (1 + e^(-t)), where t is the input to the function (in logistic regression, t = mx + c), and e is the mathematical constant, approximately equal to 2.71828.
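
A minimal sketch of the Sigmoid applied to a linear input t = mx + c, assuming NumPy; the values of m, c, and x are invented for illustration.

```python
# The Sigmoid maps any input to a value strictly between 0 and 1.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

m, c = 0.33, -0.74
x = np.array([-10.0, 0.0, 2.5, 10.0])
print(sigmoid(m * x + c))   # outputs interpretable as probabilities
```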

21
Q

What happens when we replace t with a linear equation in the Sigmoid function

A

When we replace t with a linear equation in the Sigmoid function, we get a logistic function. The logistic function also has an S-shaped curve, but it has an upper and lower limit (1 and 0 respectively), which makes it more suitable for binary classification tasks

22
Q

What is Logistic Regression

A

Logistic regression is a statistical technique used to analyse and model the relationship between a categorical dependent variable and one or more independent variables

23
Q

What is the purpose of the transformation function in Logistic Regression

A

The transformation function in Logistic Regression is used to map the output of the linear regression to a probability value between 0 and 1

24
Q

How is the error calculated in Logistic Regression

A

The error in logistic regression is calculated using the difference between the predicted probability and the actual outcome of the dependent variable for each data point

25
Q

How is the error minimised in logistic regression

A

The error in logistic regression is minimised by adjusting the parameters of the regression model, such as the slope and intercept, to reduce the overall difference between the predicted probabilities and the actual outcomes for all data points

26
Q

What is the typical threshold used in Logistic Regression

A

The typical threshold used in Logistic Regression is 0.5, which has a probabilistic meaning and is used to determine the predicted class based on the predicted probability

27
Q

What is a multi-input linear equation in Logistic Regression

A

A multi-input linear equation in Logistic Regression refers to the use of more than one independent variable in the regression model to predict the probability of the dependent variable
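
A minimal sketch of a multi-input logistic regression with scikit-learn, where the model learns one coefficient per independent variable; the synthetic dataset is illustrative.

```python
# Logistic regression with three independent variables (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients (one per input):", model.coef_)
print("intercept:", model.intercept_)
print("predicted probabilities:", model.predict_proba(X[:3]))
```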

28
Q

What does the Logistic Regression Model look like

A

Check Notes

29
Q

What does a single perceptron classifier look like

A

Check notes

30
Q

What are Activation Functions

A

Activation Functions are mathematical functions applied to the output of a regression model, such as a neural network, to introduce non-linearity into the model and produce a desired output

31
Q

What is the purpose of Activation Functions

A

The purpose of Activation Functions is to introduce non-linearity into the model, which is necessary for the model to learn complex patterns and relationships in the data. Activation Functions also moderate the output from the regressor to produce a desired output

32
Q

What are some examples of Activation Functions?

A

Some examples of Activation Functions include Sigmoid, ReLU (Rectified Linear Unit), Tanh (Hyperbolic Tangent), Softmax, and Leaky ReLU.
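
A minimal sketch of these activation functions implemented with NumPy.

```python
# Common activation functions applied element-wise to a vector of inputs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))         # squashes to (0, 1)

def relu(z):
    return np.maximum(0.0, z)               # zero for negative inputs

def tanh(z):
    return np.tanh(z)                       # squashes to (-1, 1)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)    # small slope for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))               # subtract max for numerical stability
    return e / e.sum()                      # outputs sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), tanh(z), leaky_relu(z), softmax(z), sep="\n")
```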

33
Q

What does a Multi-Layer Perceptron look like

A

Check Notes

34
Q

What is a confusion matrix

A

A confusion matrix is a table used to evaluate the performance of a classification model by presenting the number of correct and incorrect predictions in a tabular form

35
Q

What is the purpose of a confusion matrix

A

The purpose of a confusion matrix is to provide a clear representation of the performance of a classification model on a given dataset by presenting the true positives, true negatives, false positives, and false negatives

36
Q

What are the components of a confusion matrix

A

The components of a confusion matrix include true positive, false positive, true negative, false negative

37
Q

How is a confusion matrix used to compute different performance metrics

A

A confusion matrix is used to calculate various performance metrics such as accuracy, precision, recall, F1 score, and AUC (Area Under the Curve) by analysing the different components of the matrix. These metrics help to evaluate the performance of a classification model accurately
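
A minimal sketch with scikit-learn: build the confusion matrix, unpack its four components, and derive accuracy, precision, recall, and F1 from them; the example labels are invented for illustration.

```python
# Confusion matrix and the metrics derived from its components.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * (precision * recall) / (precision + recall)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("accuracy, precision, recall, F1:", accuracy, precision, recall, f1)
```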

38
Q

What is a True Positive

A

It represents the number of correct positive predictions

39
Q

What is a False Positive

A

It represents the number of incorrect positive predictions

40
Q

What is a True Negative

A

It represents the number of correct negative predictions

41
Q

What is a False Negative

A

It represents the number of incorrect negative predictions

42
Q

What is a False Positive Rate (FPR)

A

FPR measures the proportion of actual negative instances that are predicted as positive. It is calculated as FP/(FP + TN)

43
Q

What is a False Negative Rate (FNR)

A

FNR measures the proportion of actual positive instances that are predicted as negative. It is calculated as FN/(FN + TP)

44
Q

What is Recall

A

Recall measures the proportion of positive instances that are correctly identified out of all positive instances. It is calculated as TP/(TP + FN)

45
Q

What is Precision

A

Precision measures the proportion of predicted positive instances that are actually positive. It is calculated as TP/(TP + FP)

46
Q

What is F1 score

A

F1 score is the harmonic mean of precision and recall, and it is a measure of the trade-off between precision and recall. It is calculated as 2 * [ (precision * recall)/(precision + recall) ]

47
Q

What is an imbalanced dataset

A

An imbalanced dataset is a type of dataset where the number of examples in each class is not equal or roughly equal. One or more classes have significantly fewer samples than other classes in the dataset

48
Q

How does an imbalanced dataset affect accuracy

A

When dealing with an imbalanced dataset, accuracy may not be an appropriate measure of model performance. For instance, if there are 200k examples of the purple class and only 20 examples of the yellow class, a model that predicts all examples as purple will have a high accuracy of 99.99%, but it will fail to identify any yellow examples. In other words, the model’s performance is highly biased towards the majority class

49
Q

What are the consequences of making large errors in the minority class in an imbalanced dataset

A

In an imbalanced dataset, making large errors in the minority class has little impact on the overall accuracy of the model because the minority class contributes very little to the total number of examples. However, it can have serious consequences in real-world scenarios. For example, in a medical diagnosis system, if the model fails to identify a rare disease, it can have life-threatening consequences for the patient

50
Q

What are some alternative metrics that can be used to evaluate model performance in an imbalanced dataset

A

To evaluate model performance in an imbalanced dataset, some alternative metrics that can be used include precision, recall, F1-score, AUC-ROC (Area Under the Receiver Operating Characteristic Curve), and PR-AUC (Precision-Recall Area Under the Curve). These metrics provide a better understanding of the model’s performance, especially in identifying examples of the minority class

51
Q

What are Performance Metrics in Machine Learning

A

Performance Metrics are quantitative measures used to evaluate the performance of a model in terms of its accuracy, precision, recall, and other metrics. These metrics are used to determine how well a model is able to predict outcomes based on the input data

52
Q

What is Accuracy as a Performance Metric

A

Accuracy is used to measure the overall correctness of a model’s predictions, regardless of whether the predictions are positive or negative. It is calculated by dividing the number of correct predictions by the total number of predictions

53
Q

What is Sensitivity as Performance Metric

A

Sensitivity is used to measure the proportion of positive cases that are correctly identified by the model. It is also known as the True Positive Rate (TPR) and is calculated by dividing the number of true positive predictions by the total number of actual positive cases

54
Q

What is Specificity as a Performance Metric

A

Specificity is used to measure the proportion of negative cases that are correctly identified by the model. It is also known as True Negative Rate (TNR) and is calculated by dividing the number of true negative predictions by the total number of actual negative cases

55
Q

What is False Positive Rate (FPR) as a Performance Metric

A

False Positive Rate is a performance metric used to measure the proportion of negative cases that are incorrectly identified as positive by the model. It is calculated by dividing the number of false positive predictions by the total number of actual negative cases, and is given by 1 - TNR

56
Q

What is Informedness as a Performance Metric

A

Informedness is used to evaluate a model’s performance on both balanced and imbalanced data. It is defined as the sum of the True Positive Rate (TPR) and True Negative Rate (TNR) minus 1, and ranges from -1 to 1, with a higher value indicating better performance: Informedness = TPR + TNR - 1
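
A minimal sketch of computing Informedness from confusion-matrix counts; the TP/FN/TN/FP numbers are invented for illustration.

```python
# Informedness = TPR + TNR - 1, computed from confusion-matrix counts.
tp, fn = 80, 20    # actual positive instances
tn, fp = 50, 50    # actual negative instances

tpr = tp / (tp + fn)            # sensitivity / recall
tnr = tn / (tn + fp)            # specificity
informedness = tpr + tnr - 1    # ranges from -1 to 1
print(informedness)             # 0.8 + 0.5 - 1 = 0.3
```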

57
Q

What happens if the threshold is varied in a Machine Learning model

A

Varying the threshold of a machine learning model can have an impact on its performance. The threshold determines the point at which the model makes a positive prediction, and changing it can affect the model’s accuracy, sensitivity, and specificity

58
Q

How does changing the threshold affect the model’s performance

A

Changing the threshold can affect the model’s performance in different ways depending on the task and the data. For example, if the threshold is lowered, the model may predict more positive cases, resulting in a higher sensitivity but lower specificity. Conversely, if the threshold is raised, the model may predict fewer positive cases, resulting in a higher specificity but lower sensitivity

59
Q

How can we track the impact of changing the threshold on a model’s performance

A

It can be tracked by plotting the TPR against the FPR for different threshold values. This is known as the Receiver Operating Characteristic (ROC) curve. The area under the ROC curve can be used as a performance metric, with a higher AUC indicating better performance. By analysing the ROC curve, we can determine the optimal threshold for the model based on the specific trade-offs between sensitivity and specificity

60
Q

What is the Receiver Operating Characteristic (ROC) curve

A

The Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classification model, which shows the trade-off between the TPR and FPR for different thresholds
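
A minimal sketch of plotting a ROC curve and computing its AUC with scikit-learn and matplotlib; the synthetic dataset is illustrative.

```python
# ROC curve: TPR against FPR as the decision threshold varies.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]   # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()
```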

61
Q

What is a Decision Tree

A

A Decision Tree is a tree-like model used for making decisions and predicting outcomes based on a series of conditions or features. It is a machine learning algorithm that recursively partitions the data based on the values of its input features, and makes a series of binary decisions until a prediction is made

62
Q

How does a Decision Tree work

A

A decision tree starts at the root node, which represents the entire dataset. It then recursively splits the data into smaller subsets based on the values of the input features, such that each split maximally separates the data into different classes. At each split, the algorithm selects the feature that best separates the data into different classes, based on a certain criterion (e.g. information gain, Gini impurity). This process continues until a stopping criterion is met, such as a minimum number of samples at a leaf node or the maximum depth of the tree
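
A minimal sketch of fitting and inspecting a Decision Tree with scikit-learn; the synthetic dataset and the depth limit are illustrative.

```python
# Fit a Gini-based decision tree and print its sequence of threshold decisions.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree))      # the splits chosen at each node
print(tree.predict(X[:5]))    # binary decisions for the first few instances
```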

63
Q

What are the advantages of using a Decision Tree

A

Some advantages include their simplicity, interpretability, and ability to handle both categorical and numerical data. Decision Trees can also handle missing values and outliers, and are relatively fast and scalable for larger datasets. Additionally, they can be used for both classification and regression tasks, and can be combined in ensemble methods such as Random Forests to improve accuracy. They require little data preparation, the cost of using a tree is logarithmic in the number of data points used to train it, and they are able to handle multi-output problems

64
Q

What are the limitations of using a Decision Tree

A

Some limitations of using Decision Trees include their tendency to overfit the data, especially when the tree is deep or the dataset is noisy. Decision Trees are also sensitive to the choice of hyperparameters and the order of the input features. Additionally, Decision Trees may not perform well on datasets with complex interdependencies among the input features or when the classes are imbalanced

65
Q

How to create a decision tree from a dataset

A

Step 1: For each feature, sort the feature values and compute the average of every consecutive pair. For each average, build a sub-tree using it as a threshold and compute the Gini impurity for each resulting leaf. Then compute the weight of each leaf and the total Gini impurity using a weighted sum. Finally, select the sub-tree with the minimum Gini impurity for that feature
Step 2: Select the sub-tree with the lowest Gini impurity as the head of the decision tree
Step 3: If there are impure leaves, replace them with one of the other sub-trees after updating the Gini impurity within the subset. Continue until all leaves are pure, or the maximum depth has been reached (see the sketch after these steps)
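
A minimal sketch of Step 1 for a single numeric feature, assuming NumPy: take the average of every consecutive pair of sorted values as a candidate threshold, and score each candidate by the weighted Gini impurity of the two resulting leaves; the feature values and labels are invented for illustration.

```python
# Choose the best split threshold for one feature by minimising weighted Gini impurity.
import numpy as np

def gini(labels):
    if len(labels) == 0:
        return 0.0
    p_pos = np.mean(labels)                  # proportion of +ve instances in the leaf
    return 1 - p_pos**2 - (1 - p_pos)**2     # I = 1 - P(+ve)^2 - P(-ve)^2

feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels  = np.array([0,   0,   0,   1,   1,   1  ])   # 1 = +ve, 0 = -ve

order = np.argsort(feature)
sorted_feature, sorted_labels = feature[order], labels[order]
candidates = (sorted_feature[:-1] + sorted_feature[1:]) / 2   # consecutive-pair averages

best_threshold, best_impurity = None, float("inf")
for t in candidates:
    left = sorted_labels[sorted_feature <= t]
    right = sorted_labels[sorted_feature > t]
    # weight of each leaf = its share of the instances; total = weighted sum of leaf Ginis
    weighted = (len(left) / len(labels)) * gini(left) + (len(right) / len(labels)) * gini(right)
    if weighted < best_impurity:
        best_threshold, best_impurity = t, weighted

print("best threshold:", best_threshold, "weighted Gini:", best_impurity)
```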

66
Q

What is the Gini impurity

A

The measure of the degree of impurity or randomness in a set of categorical outcomes or classifications. It is 0 when all the elements in a set belong to the same category, and reaches its maximum (0.5 for two categories) when the elements are evenly distributed across the categories. In decision tree algorithms, the Gini impurity is used to evaluate the quality of a split in the data by measuring the degree of homogeneity of the resulting subsets after the split. The lower the Gini impurity, the better the split

67
Q

What is the Gini impurity Formula

A

I = 1 - P(+ve)^2 - P(-ve)^2, where P(+ve) and P(-ve) are the proportions of positive and negative instances in the dataset, respectively

68
Q

What is the formula to compute the weight of a leaf

A

Weight of a leaf = (Number of instances belonging to the leaf) / (Total number of instances in the dataset)