Classification Flashcards

1
Q

What is supervised learning

A

Supervised learning is an approach to machine learning where a computer algorithm is trained on labelled input data to predict an output or target variable. The algorithm learns from the labelled data to identify underlying patterns and relationships between the input and output variables

2
Q

What is unsupervised learning

A

Unsupervised learning is a machine learning approach that involves analysing and clustering unlabelled datasets. The algorithms used in unsupervised learning attempt to identify patterns and relationships in the data without the need for human intervention. Unsupervised learning can help to reveal hidden data groupings or patterns

3
Q

What is reinforcement learning

A

Reinforcement learning is a feedback-based machine learning technique in which an agent learns to make decisions by interacting with its environment. The agent receives feedback in the form of rewards or penalties for its actions and adjusts its behaviour to maximise the rewards it receives. Through trial and error, the agent learns to make better decisions and achieve better outcomes over time

4
Q

What are some common challenges associated with unsupervised learning

A

One of the main challenges associated with unsupervised learning is that the absence of labelled data makes it difficult to evaluate the performance of the algorithm. Another challenge is the potential for the algorithm to identify spurious patterns (which occur when two factors appear causally related to one another but are not), which can lead to incorrect conclusions. Researchers attempt to overcome these challenges by using techniques such as clustering validation metrics and visualisation to evaluate the algorithm’s performance, and by identifying and removing outliers in the data

5
Q

What are the two types of supervised learning

A

Regression and Classification

6
Q

What is a classification problem in machine learning

A

A classification problem is a type of supervised learning problem in which the goal is to predict a categorical label for the output variable based on input variables

7
Q

What are some characteristics of a classification problem

A

The output variable is categorical, meaning it can take on a limited number of possible values. The input variables do not need to be categorical and can be a combination of text and numeric features

8
Q

What are some common algorithms used in classification problems

A

Some common algorithms used in classification problems include decision trees, logistic regression, support vector machines (SVMs), and neural networks
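
A minimal sketch of fitting each of these algorithm families with scikit-learn; the synthetic dataset and settings below are invented for illustration, not taken from the notes.

```python
# Fit several common classifiers on a small synthetic dataset (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, "training accuracy:", model.score(X, y))
```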

9
Q

How do you evaluate the performance of a classification model

A

The performance of a classification model can be evaluated using metrics such as accuracy, precision, recall, and F1 score. These metrics measure how well the model correctly predicts the different classes in the dataset
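
A minimal sketch of computing these metrics with scikit-learn for a binary problem; the example labels are invented for illustration.

```python
# Standard classification metrics on a toy set of true vs. predicted labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1 score: ", f1_score(y_true, y_pred))
```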

10
Q

What are some challenges associated with classification problems

A

Some common challenges with classification problems include imbalanced datasets, where one class has significantly more data than the other, and overfitting, where the model fits the training data too closely and performs poorly on new, unseen data

11
Q

What is the objective of a linear regressor, and how does it identify the intercept and slope of a linear equation

A

The objective of a linear regressor is to find the best-fitting line that can represent the relationship between an input variable and an output variable. It does so by minimizing the average squared error between the predicted and actual output values. The intercept and slope of the linear equation can be derived from the parameters that minimize the error
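
A minimal sketch of this idea with NumPy: fit the slope m and intercept c by least squares and check the average squared error; the data points are invented for illustration.

```python
# Least-squares fit of y ≈ m*x + c (assumes NumPy; x and y are made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# np.polyfit returns the least-squares solution [m, c] for a degree-1 polynomial
m, c = np.polyfit(x, y, deg=1)
predictions = m * x + c
mse = np.mean((predictions - y) ** 2)
print(f"m = {m:.3f}, c = {c:.3f}, mean squared error = {mse:.4f}")
```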

12
Q

How does a perceptron or a single neuron work, and how is it related to a linear model

A

A perceptron or a single neuron is a basic building block of a neural network that takes one or more inputs, applies weights to them, and produces an output based on the weighted sum. In a binary classification problem like the one described in the notes, the output of the perceptron can be thresholded to produce a binary decision. A perceptron is a linear model in the sense that the output is a linear function of the inputs and weights, although it can be combined with non-linear activation functions to learn more complex patterns
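
A minimal sketch of a single perceptron as a thresholded weighted sum, assuming NumPy; the weights, bias, and inputs are invented for illustration.

```python
# A single perceptron: weighted sum of inputs plus a bias, thresholded to 0/1.
import numpy as np

def perceptron(x, weights, bias, threshold=0.0):
    weighted_sum = np.dot(weights, x) + bias       # linear part, like m*x + c
    return 1 if weighted_sum > threshold else 0    # thresholding step

x = np.array([0.5, 1.5])         # two input features
weights = np.array([0.8, -0.4])  # one weight per input
bias = 0.1

print(perceptron(x, weights, bias))  # -> 0 or 1
```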

13
Q

What are the steps to apply classification to a linear regressor model (using example from notes)

A

Step 1: We can fit a linear regressor with output labels being 0 and 1. This is a straightforward approach where we treat the classification problem as a regression problem. Here, the intercept is c = -0.74 and the slope is m = 0.33
Step 2: Use thresholding to classify the inputs into different categories. For example, if our prediction is below the green line (at 0.5), then our predicted label would be purple, i.e. lightweight. On the other hand, if the prediction is above this threshold, then we would label the associated output as yellow, or heavyweight
Step 3: We need to evaluate the performance of our model. We can use metrics like accuracy, precision, recall, F1-score, etc., to evaluate the performance of our model
Step 4: We can tune the model by adjusting the threshold value, changing the regularisation parameter, using different optimisation techniques, or selecting different features. This step is to improve the performance of the model
Step 5: We can use our trained model to predict the labels for new instances, using the same threshold value we used during the training phase to classify the new instances into different categories (see the sketch after these steps)
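
A minimal sketch of Steps 1, 2 and 5, assuming NumPy; the coefficients are the ones quoted above (c = -0.74, m = 0.33), while the new input values are invented, since the original dataset from the notes is not reproduced here.

```python
# Use a fitted linear regressor plus a 0.5 threshold as a binary classifier.
import numpy as np

c, m = -0.74, 0.33      # Step 1: parameters of the fitted linear regressor
threshold = 0.5         # Step 2: decision threshold on the regressor output

def classify(x):
    prediction = m * x + c                       # raw regression output
    return 1 if prediction >= threshold else 0   # 1 = heavyweight, 0 = lightweight

new_inputs = np.array([2.0, 4.0, 6.0])           # Step 5: label new instances
print([classify(x) for x in new_inputs])         # -> [0, 1, 1]
```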

14
Q

Why is it important to evaluate the performance of a machine learning model

A

It is important to evaluate the performance of a machine learning model to determine how well it is able to generalise to new, unseen data. This helps in identifying and addressing potential issues with the model, improving its accuracy and reliability

15
Q

What is overfitting in machine learning

A

Overfitting is a common problem in machine learning where a model is trained to fit the training data too closely, resulting in poor generalisation to new, unseen data. This can happen when the model is too complex or when there is insufficient data to train the model

16
Q

What are hyperparameters in machine learning

A

Hyperparameters are parameters that are set by the user before the model is trained, and they control the behaviour of the learning algorithm. Examples of hyperparameters include the learning rate, regularisation strength, number of hidden layers in a neural network, etc.

17
Q

What is cross-validation in machine learning

A

Cross-validation is a technique used to evaluate the performance of a machine learning model. It involves dividing the data into multiple folds, training the model on all but one fold and testing it on the held-out fold, and repeating this process so that each fold serves as the test set once. This helps in obtaining a more reliable estimate of the model’s performance on new, unseen data
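
A minimal sketch of k-fold cross-validation with scikit-learn; the synthetic dataset and the choice of 5 folds are illustrative.

```python
# 5-fold cross-validation of a logistic regression model (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # train/test once per fold
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```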

18
Q

What is the Sigmoid function and how is it useful in prediction models

A

The Sigmoid function is a mathematical function that takes any input and maps it to an output between 0 and 1. It is useful in prediction models because it can convert any input into a probability score, which can be interpreted as the likelihood of a particular outcome

19
Q

How does the Sigmoid function relate to logistic regression

A

Logistic regression is a type of statistical analysis that is used to predict the probability of a binary outcome (e.g. yes/no, pass/fail). It is based on the Sigmoid function, which maps any input value to a probability score between 0 and 1

20
Q

What is the Sigmoid function equation

A

f(t) = 1 / (1 + e^(-t)), where t is the input to the function (in logistic regression, t = mx + c), and e is the mathematical constant, approximately equal to 2.71828.
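
A minimal sketch of the Sigmoid applied to a linear input t = mx + c, assuming NumPy; the values of m, c, and x are invented for illustration.

```python
# The Sigmoid maps any input to a value strictly between 0 and 1.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

m, c = 0.33, -0.74
x = np.array([-10.0, 0.0, 2.5, 10.0])
print(sigmoid(m * x + c))   # outputs interpretable as probabilities
```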

21
Q

What happens when we replace t with a linear equation in the Sigmoid function

A

When we replace t with a linear equation in the Sigmoid function, we get a logistic function. The logistic function also has an S-shaped curve, but it has an upper and lower limit (1 and 0 respectively), which makes it more suitable for binary classification tasks

22
Q

What is Logistic Regression

A

Logistic regression is a statistical technique used to analyse and model the relationship between a categorical dependent variable and one or more independent variables

23
Q

What is the purpose of the transformation function in Logistic Regression

A

The transformation function in Logistic Regression is used to map the output of the linear regression to a probability value between 0 and 1

24
Q

How is the error calculated in Logistic Regression

A

The error in logistic regression is calculated using the difference between the predicted probability and the actual outcome of the dependent variable for each data point

25
Q

How is the error minimised in logistic regression

A

The error in logistic regression is minimised by adjusting the parameters of the regression model, such as the slope and intercept, to reduce the overall difference between the predicted probabilities and the actual outcomes for all data points

26
Q

What is the typical threshold used in Logistic Regression

A

The typical threshold used in Logistic Regression is 0.5, which has a probabilistic meaning and is used to determine the predicted class based on the predicted probability

27
Q

What is a multi-input linear equation in Logistic Regression

A

A multi-input linear equation in Logistic Regression refers to the use of more than one independent variable in the regression model to predict the probability of the dependent variable
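
A minimal sketch of a multi-input logistic regression with scikit-learn, where the model learns one coefficient per independent variable; the synthetic dataset is illustrative.

```python
# Logistic regression with three independent variables (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients (one per input):", model.coef_)
print("intercept:", model.intercept_)
print("predicted probabilities:", model.predict_proba(X[:3]))
```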

28
Q

What does the Logistic Regression Model look like

A

Check Notes

29
Q

What does a single perceptron classifier look like

A

Check notes

30
Q

What are Activation Functions

A

Activation Functions are mathematical functions applied to the output of a regression model, such as a neural network, to introduce non-linearity into the model and produce a desired output

31
Q

What is the purpose of Activation Functions

A

The purpose of Activation Functions is to introduce non-linearity into the model, which is necessary for the model to learn complex patterns and relationships in the data. Activation Functions also moderate the output from the regressor to produce a desired output

32
Q

What are some examples of Activation Functions?

A

Some examples of Activation Functions include Sigmoid, ReLU (Rectified Linear Unit), Tanh (Hyperbolic Tangent), Softmax, and Leaky ReLU.
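
A minimal sketch of these activation functions implemented with NumPy.

```python
# Common activation functions applied element-wise to a vector of inputs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))         # squashes to (0, 1)

def relu(z):
    return np.maximum(0.0, z)               # zero for negative inputs

def tanh(z):
    return np.tanh(z)                       # squashes to (-1, 1)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)    # small slope for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))               # subtract max for numerical stability
    return e / e.sum()                      # outputs sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), tanh(z), leaky_relu(z), softmax(z), sep="\n")
```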

33
Q

What does a Multi-Layer Perceptron look like

A

Check Notes

34
Q

What is a confusion matrix

A

A confusion matrix is a table used to evaluate the performance of a classification model by presenting the number of correct and incorrect predictions in a tabular form

35
Q

What is the purpose of a confusion matrix

A

The purpose of a confusion matrix is to provide a clear representation of the performance of a classification model on a given dataset by presenting the true positives, true negatives, false positives, and false negatives

36
Q

What are the components of a confusion matrix

A

The components of a confusion matrix include true positive, false positive, true negative, false negative

37
Q

How is a confusion matrix used to compute different performance metrics

A

A confusion matrix is used to calculate various performance metrics such as accuracy, precision, recall, F1 score, and AUC (Area Under the Curve) by analysing the different components of the matrix. These metrics help to evaluate the performance of a classification model accurately
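
A minimal sketch with scikit-learn: build the confusion matrix, unpack its four components, and derive accuracy, precision, recall, and F1 from them; the example labels are invented for illustration.

```python
# Confusion matrix and the metrics derived from its components.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * (precision * recall) / (precision + recall)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("accuracy, precision, recall, F1:", accuracy, precision, recall, f1)
```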

38
Q

What is a True Positive

A

It represents the number of correct positive predictions

39
Q

What is a False Positive

A

It represents the number of incorrect positive predictions

40
Q

What is a True Negative

A

It represents the number of correct negative predictions

41
Q

What is a False Negative

A

It represents the number of incorrect negative predictions

42
Q

What is a False Positive Rate (FPR)

A

FPR measures the proportion of actual negative instances that are predicted as positive. It is calculated as FP/(FP + TN)

43
Q

What is a False Negative Rate (FNR)

A

FNR measures the proportion of actual positive instances that are predicted as negative. It is calculated as FN/(FN + TP)

44
Q

What is Recall

A

Recall measures the proportion of positive instances that are correctly identified out of all positive instances. It is calculated as TP/(TP + FN)

45
Q

What is Precision

A

Precision measures the proportion of predicted positive instances that are actually positive. It is calculated as TP/(TP + FP)

46
Q

What is F1 score

A

F1 score is the harmonic mean of precision and recall, and it is a measure of the trade-off between precision and recall. It is calculated as 2 * [ (precision * recall)/(precision + recall) ]

47
Q

What is an imbalanced dataset

A

An imbalanced dataset is a type of dataset where the number of examples in each class is not equal or roughly equal. One or more classes have significantly fewer samples than other classes in the dataset

48
Q

How does an imbalanced dataset affect accuracy

A

When dealing with an imbalanced dataset, accuracy may not be an appropriate measure of model performance. For instance, if there are 200k examples of the purple class and only 20 examples of the yellow class, a model that predicts all examples as purple will have a high accuracy of 99.99%, but it will fail to identify any yellow examples. In other words, the model’s performance is highly biased towards the majority class

49
Q

What are the consequences of making large errors in the minority class in an imbalanced dataset

A

In an imbalanced dataset, making large errors in the minority class has little impact on the overall accuracy of the model because the minority class contributes very little to the total number of examples. However, it can have serious consequences in real-world scenarios. For example, in a medical diagnosis system, if the model fails to identify a rare disease, it can have life-threatening consequences for the patient

50
Q

What are some alternative metrics that can be used to evaluate model performance in an imbalanced dataset

A

To evaluate model performance in an imbalanced dataset, some alternative metrics that can be used include precision, recall, F1-score, AUC-ROC (Area Under the Receiver Operating Characteristic Curve), and PR-AUC (Precision-Recall Area Under the Curve). These metrics provide a better understanding of the model’s performance, especially in identifying examples of the minority class

51
Q

What are Performance Metrics in Machine Learning

A

Performance Metrics are quantitative measures used to evaluate the performance of a model in terms of its accuracy, precision, recall, and other metrics. These metrics are used to determine how well a model is able to predict outcomes based on the input data

52
Q

What is Accuracy as a Performance Metric

A

Accuracy is used to measure the overall correctness of a model’s predictions, regardless of whether the predictions are positive or negative. It is calculated by dividing the number of correct predictions by the total number of predictions

53
Q

What is Sensitivity as Performance Metric

A

Sensitivity is used to measure the proportion of positive cases that are correctly identified by the model. It is also known as the True Positive Rate (TPR) and is calculated by dividing the number of true positive predictions by the total number of actual positive cases

54
Q

What is Specificity as a Performance Metric

A

Specificity is used to measure the proportion of negative cases that are correctly identified by the model. It is also known as True Negative Rate (TNR) and is calculated by dividing the number of true negative predictions by the total number of actual negative cases

55
Q

What is False Positive Rate (FPR) as a Performance Metric

A

False Positive Rate is a performance metric used to measure the proportion of negative cases that are incorrectly identified as positive by the model. It is calculated by dividing the number of false positive predictions by the total number of actual negative cases, and is given by 1 - TNR

56
Q

What is Informedness as a Performance Metric

A

Informedness is used to evaluate a model’s performance on both balanced and imbalanced data. It is defined as the sum of the True Positive Rate (TPR) and True Negative Rate (TNR) minus 1, and ranges from -1 to 1, with a higher value indicating better performance: Informedness = TPR + TNR - 1
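
A minimal sketch of computing Informedness from confusion-matrix counts; the TP/FN/TN/FP numbers are invented for illustration.

```python
# Informedness = TPR + TNR - 1, computed from confusion-matrix counts.
tp, fn = 80, 20    # actual positive instances
tn, fp = 50, 50    # actual negative instances

tpr = tp / (tp + fn)            # sensitivity / recall
tnr = tn / (tn + fp)            # specificity
informedness = tpr + tnr - 1    # ranges from -1 to 1
print(informedness)             # 0.8 + 0.5 - 1 = 0.3
```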

57
Q

What happens if the threshold is varied in a Machine Learning model

A

Varying the threshold of a machine learning model can have an impact on its performance. The threshold determines the point at which the model makes a positive prediction, and changing it can affect the model’s accuracy, sensitivity, and specificity

58
Q

How does changing the threshold affect the model’s performance

A

Changing the threshold can affect the model’s performance in different ways depending on the task and the data. For example, if the threshold is lowered, the model may predict more positive cases, resulting in a higher sensitivity but lower specificity. Conversely, if the threshold is raised, the model may predict fewer positive cases, resulting in a higher specificity but lower sensitivity

59
Q

How can we track the impact of changing the threshold on a model’s performance

A

It can be tracked by plotting the TPR against the FPR for different threshold values. This is known as the Receiver Operating Characteristic (ROC) curve. The area under the ROC curve can be used as a performance metric, with a higher AUC indicating better performance. By analysing the ROC curve, we can determine the optimal threshold for the model based on the specific trade-offs between sensitivity and specificity

60
Q

What is the Receiver Operating Characteristic (ROC) curve

A

The Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classification model, which shows the trade-off between the TPR and FPR for different thresholds
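
A minimal sketch of plotting a ROC curve and computing its AUC with scikit-learn and matplotlib; the synthetic dataset is illustrative.

```python
# ROC curve: TPR against FPR as the decision threshold varies.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]   # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()
```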

61
Q

What is a Decision Tree

A

A Decision Tree is a tree-like model used for making decisions and predicting outcomes based on a series of conditions or features. It is a machine learning algorithm that recursively partitions the data based on the values of its input features, and makes a series of binary decisions until a prediction is made

62
Q

How does a Decision Tree work

A

A decision tree starts at the root node, which represents the entire dataset. It then recursively splits the data into smaller subsets based on the values of the input features, such that each split maximally separates the data into different classes. At each split, the algorithm selects the feature that best separates the data into different classes, based on a certain criterion (e.g. information gain, Gini impurity). This process continues until a stopping criterion is met, such as a minimum number of samples at a leaf node or the maximum depth of the tree
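
A minimal sketch of fitting and inspecting a Decision Tree with scikit-learn; the synthetic dataset and the depth limit are illustrative.

```python
# Fit a Gini-based decision tree and print its sequence of threshold decisions.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree))      # the splits chosen at each node
print(tree.predict(X[:5]))    # binary decisions for the first few instances
```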

63
Q

What are the advantages of using a Decision Tree

A

Some advantages include their simplicity, interpretability, and ability to handle both categorical and numerical data. Decision Trees can also handle missing values and outliers, and are relatively fast and scalable for larger datasets. Additionally, they can be used for both classification and regression tasks, and can be combined in ensemble methods such as Random Forests to improve accuracy. They require little data preparation, the cost of using a tree is logarithmic in the number of data points used to train it, and they are able to handle multi-output problems

64
Q

What are the limitations of using a Decision Tree

A

Some limitations of using Decision Trees include their tendency to overfit the data, especially when the tree is deep or the dataset is noisy. Decision Trees are also sensitive to the choice of hyperparameters and the order of the input features. Additionally, Decision Trees may not perform well on datasets with complex interdependencies among the input features or when the classes are imbalanced

65
Q

How to create a decision tree from a dataset

A

Step 1: For each feature, sort the feature values and compute the average of every consecutive pair. For each average, build a sub-tree using it as a threshold and compute the Gini impurity for each resulting leaf. Then compute the weight of each leaf and the total Gini impurity using a weighted sum. Finally, select the sub-tree with the minimum Gini impurity for that feature
Step 2: Select the sub-tree with the lowest Gini impurity as the head of the decision tree
Step 3: If there are impure leaves, replace them with one of the other sub-trees after updating the Gini impurity within the subset. Continue until all leaves are pure, or the maximum depth has been reached (see the sketch after these steps)
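
A minimal sketch of Step 1 for a single numeric feature, assuming NumPy: take the average of every consecutive pair of sorted values as a candidate threshold, and score each candidate by the weighted Gini impurity of the two resulting leaves; the feature values and labels are invented for illustration.

```python
# Choose the best split threshold for one feature by minimising weighted Gini impurity.
import numpy as np

def gini(labels):
    if len(labels) == 0:
        return 0.0
    p_pos = np.mean(labels)                  # proportion of +ve instances in the leaf
    return 1 - p_pos**2 - (1 - p_pos)**2     # I = 1 - P(+ve)^2 - P(-ve)^2

feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels  = np.array([0,   0,   0,   1,   1,   1  ])   # 1 = +ve, 0 = -ve

order = np.argsort(feature)
sorted_feature, sorted_labels = feature[order], labels[order]
candidates = (sorted_feature[:-1] + sorted_feature[1:]) / 2   # consecutive-pair averages

best_threshold, best_impurity = None, float("inf")
for t in candidates:
    left = sorted_labels[sorted_feature <= t]
    right = sorted_labels[sorted_feature > t]
    # weight of each leaf = its share of the instances; total = weighted sum of leaf Ginis
    weighted = (len(left) / len(labels)) * gini(left) + (len(right) / len(labels)) * gini(right)
    if weighted < best_impurity:
        best_threshold, best_impurity = t, weighted

print("best threshold:", best_threshold, "weighted Gini:", best_impurity)
```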

66
Q

What is the Gini impurity

A

The measure of the degree of impurity or randomness in a set of categorical outcomes or classifications. It is 0 when all the elements in a set belong to the same category, and reaches its maximum (0.5 for two categories) when the elements are evenly distributed across the categories. In decision tree algorithms, the Gini impurity is used to evaluate the quality of a split in the data by measuring the degree of homogeneity of the resulting subsets after the split. The lower the Gini impurity, the better the split

67
Q

What is the Gini impurity Formula

A

I = 1 - P(+ve)^2 - P(-ve)^2, where P(+ve) and P(-ve) are the proportions of positive and negative instances in the dataset, respectively

68
Q

What is the formula to compute the weight of a leaf

A

Weight of a leaf = (Number of instances belonging to the leaf) / (Total number of instances in the dataset)