Classification Flashcards
What is supervised learning
Supervised learning is an approach to machine learning where a computer algorithm is trained on labelled input data to predict an output or target variable. The algorithm learns from the labelled data to identify underlying patterns and relationships between the input and output variables
What is unsupervised learning
Unsupervised learning is a machine learning approach that involves analysing and clustering unlabelled datasets. Algorithms used in unsupervised learning attempt to identify patterns and relationships in the data without the need for human intervention. Unsupervised learning can help to reveal hidden groupings or patterns in the data
What is reinforcement learning
Reinforcement learning is a feedback-based machine learning technique in which an agent learns to make decisions by interacting with its environment. The agent receives feedback in the form of rewards or penalties for its actions and adjusts its behaviour to maximise the rewards it receives. Through trial and error, the agent learns to make better decisions and achieve better outcomes over time
What are some common challenges associated with unsupervised learning
One of the main challenges associated with unsupervised learning is that the absence of labelled data makes it difficult to evaluate the performance of the algorithm. Another challenge is that the algorithm may identify spurious patterns (apparent structure that reflects noise or coincidence in the particular dataset rather than a real relationship), which can lead to incorrect conclusions. Researchers attempt to overcome these challenges by using clustering validation metrics and visualisation to evaluate the algorithm’s performance, and by identifying and removing outliers in the data
What are the two types of supervised learning
Regression and Classification
What is a classification problem in machine learning
A classification problem is a type of supervised learning problem in which the goal is to predict a categorical label for the output variable based on input variables
What are some characteristics of a classification problem
The output variable is categorical, meaning it can take on a limited number of possible values. The input variables do not need to be categorical and can be a combination of text and numeric features
What are some common algorithms used in classification problems
Common algorithms used in classification problems include decision trees, logistic regression, support vector machines (SVMs), and neural networks
How do you evaluate the performance of a classification model
The performance of a classification model can be evaluated using metrics such as accuracy, precision, recall, and F1 score. These metrics measure how well the model correctly predicts the different classes in the dataset
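As a minimal sketch, the four metrics can be computed from the counts of true positives, false positives, false negatives and true negatives for a binary problem. The function names and the labels below are hypothetical, chosen only for illustration; libraries such as scikit-learn provide equivalent, more complete implementations.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.625, precision 0.6, recall 0.75, f1 ~0.667
```

Precision asks "of the points predicted positive, how many were positive?", while recall asks "of the positive points, how many did we find?"; F1 is their harmonic mean.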
What are some challenges associated with classification problems
Some common challenges with classification problems include imbalanced datasets, where one class has significantly more examples than the others, and overfitting, where the model fits the training data too closely and performs poorly on new, unseen data
What is the objective of a linear regressor, and how does it identify the intercept and slope of a linear equation
The objective of a linear regressor is to find the best-fitting line that represents the relationship between an input variable and an output variable. It does so by minimising the mean squared error between the predicted and actual output values. The intercept and slope of the linear equation are the parameters being fitted: the values that minimise this error can be computed directly with the least squares formulas, or found iteratively, for example with gradient descent
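A minimal sketch of the closed-form least squares fit for one input variable, assuming the model y = m*x + c: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The data is hypothetical and lies exactly on y = 2x + 1, so the mean squared error of the fit is zero.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The best-fitting line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


def mean_squared_error(xs, ys, m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data lying exactly on y = 2x + 1
m, c = fit_line(xs, ys)
print(m, c, mean_squared_error(xs, ys, m, c))  # 2.0 1.0 0.0
```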
How does a perceptron or a single neuron work, and how is it related to a linear model
A perceptron, or single neuron, is the basic building block of a neural network. It takes one or more inputs, multiplies each by a weight, adds a bias, and produces an output based on this weighted sum. In a binary classification problem like the one described in the notes, the output can be thresholded to produce a binary decision. A single perceptron is a linear model: the weighted sum is a linear function of the inputs, so its decision boundary is a straight line (or, with more inputs, a flat hyperplane). Applying an activation function such as the sigmoid to the sum still leaves a linear decision boundary; learning more complex, non-linear patterns requires combining many neurons with non-linear activations in layers
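The sketch below shows a single perceptron with a step activation (threshold at 0) trained with the classic perceptron learning rule on the logical AND function, which is linearly separable. The step threshold, learning rate and training data are assumptions made for illustration, not taken from the notes.

```python
def predict(inputs, weights, bias):
    # Linear part: weighted sum of the inputs plus a bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step activation: threshold the linear output at 0.
    return 1 if z >= 0 else 0


def train_perceptron(samples, labels, lr=1, epochs=20):
    """Classic perceptron learning rule: nudge weights by the prediction error."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(x, weights, bias)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND: linearly separable, so a single perceptron can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print(w, b, [predict(x, w, b) for x in X])  # [2, 1] -3 [0, 0, 0, 1]
```

The learned boundary 2*x1 + 1*x2 - 3 = 0 is a straight line; a function such as XOR, which no single line can separate, is beyond a lone perceptron.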
What are the steps to apply classification to a linear regressor model (using example from notes)
Step 1: We can fit a linear regressor with the output labels encoded as 0 and 1. This is a straightforward approach in which we treat the classification problem as a regression problem. In the example from the notes, the intercept is c = -0.74 and the slope is m = 0.33
Step 2: Use thresholding to classify the inputs into categories. For example, if the prediction is below the green line (at 0.5), the predicted label is purple, i.e. lightweight; if the prediction is above this threshold, the label is yellow, i.e. heavyweight
Step 3: We need to evaluate the performance of our model, using metrics such as accuracy, precision, recall and F1 score
Step 4: We can tune the model by adjusting the threshold value, changing the regularisation parameter, using different optimisation techniques, or selecting different features, with the aim of improving its performance
Step 5: We can use our trained model to predict the labels of new instances, applying the same threshold value that we chose during training to classify them into categories
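A minimal sketch of Steps 2 and 5, using the slope m = 0.33 and intercept c = -0.74 quoted in Step 1 and the 0.5 threshold (the green line). The input values are hypothetical, and treating a prediction exactly equal to the threshold as heavyweight is an assumption, since the notes only describe "below" and "above".

```python
M, C = 0.33, -0.74   # slope and intercept quoted in the notes
THRESHOLD = 0.5      # the green line in the notes


def regress(x):
    """Linear regressor fitted to 0/1 labels."""
    return M * x + C


def classify(x, threshold=THRESHOLD):
    # Assumption: a prediction at or above the threshold counts as heavyweight.
    return "heavyweight" if regress(x) >= threshold else "lightweight"


# The decision boundary is where M*x + C equals the threshold.
boundary = (THRESHOLD - C) / M
print(round(boundary, 2))  # 3.76

for x in [2.0, 3.0, 4.0, 5.0]:  # hypothetical new inputs
    print(x, round(regress(x), 2), classify(x))
```

Because the threshold on the regression output corresponds to a single cut-off on x, changing the threshold in Step 4 simply moves this boundary.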
Why is it important to evaluate the performance of a machine learning model
It is important to evaluate the performance of a machine learning model to determine how well it is able to generalise to new, unseen data. This helps in identifying and addressing potential issues with the model, improving its accuracy and reliability
What is overfitting in machine learning
Overfitting is a common problem in machine learning where a model is trained to fit the training data too closely, resulting in poor generalisation to new, unseen data. This can happen when the model is too complex or when there is insufficient data to train the model
What are hyperparameters in machine learning
Hyperparameters are parameters that are set by the user before the model is trained, and they control the behaviour of the learning algorithm. Examples of hyperparameters include the learning rate, regularisation strength, number of hidden layers in a neural network, etc.
What is cross-validation in machine learning
Cross-validation is a technique used to evaluate the performance of a machine learning model. In k-fold cross-validation, the data is divided into k folds; the model is trained on k - 1 folds and tested on the remaining fold, and the process is repeated k times so that each fold is used as the test set exactly once. Averaging the k test scores gives a more reliable estimate of the model’s performance on new, unseen data than a single train/test split
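A minimal sketch of k-fold cross-validation written from scratch. The toy model (a threshold halfway between the two class means), the accuracy scorer and the data are all hypothetical; the point is the splitting and averaging logic. Real pipelines usually shuffle the data before splitting, which this sketch omits.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, k, train, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx],
                            [ys[i] for i in test_idx]))
    return sum(scores) / len(scores), scores


# Hypothetical toy model: threshold halfway between the two class means.
def train_threshold(xs, ys):
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(threshold, xs, ys):
    preds = [1 if x >= threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical data, interleaved so every training fold contains both classes.
xs = [1.0, 5.0, 2.0, 6.0, 1.5, 5.5, 2.5, 6.5, 3.0, 7.0]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mean_score, fold_scores = cross_validate(xs, ys, 5, train_threshold, accuracy)
print(mean_score, fold_scores)  # 1.0 [1.0, 1.0, 1.0, 1.0, 1.0]
```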
What is the Sigmoid function and how is it useful in prediction models
The Sigmoid function is a mathematical function that takes any real-valued input and squashes its output into the range between 0 and 1 (approaching, but never reaching, either limit). It is useful in prediction models because its output can be interpreted as a probability score, i.e. the likelihood of a particular outcome
How does the Sigmoid function relate to logistic regression
Logistic regression is a type of statistical analysis that is used to predict the probability of a binary outcome (e.g. yes/no, pass/fail). It is based on the Sigmoid function, which maps any input value to a probability score between 0 and 1
What is the Sigmoid function equation
f(t) = 1 / (1 + e^(-t)), where t is the input to the function (in logistic regression, t = mx + c), and e is the mathematical constant e (Euler's number), which is approximately equal to 2.71828.
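A minimal sketch of the Sigmoid function in Python. The two-branch form is a common numerical-stability choice (an implementation detail, not part of the formula): for large negative t, evaluating e^(-t) directly would overflow, so the equivalent form e^t / (1 + e^t) is used instead.

```python
import math


def sigmoid(t):
    """f(t) = 1 / (1 + e^(-t)), computed without overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t, e^t is small, so this rearranged form cannot overflow.
    e = math.exp(t)
    return e / (1.0 + e)


print(sigmoid(0))                # 0.5
print(sigmoid(4), sigmoid(-4))   # close to 1 and close to 0, symmetric about 0.5
print(sigmoid(-1000))            # no OverflowError; underflows to 0.0 in floating point
```

Mathematically the output is strictly between 0 and 1; in floating point, very large inputs round to exactly 0.0 or 1.0.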
What happens when we replace t with a linear equation in the Sigmoid function
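When we replace t with a linear equation (t = mx + c) in the Sigmoid function, we get the logistic function f(x) = 1 / (1 + e^(-(mx + c))). It keeps the S-shaped curve and the lower and upper limits of 0 and 1, which makes it suitable for binary classification tasks; the parameters m and c control how steep the curve is and where along the x-axis it crosses 0.5 (at x = -c/m)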
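The short sketch below illustrates how m and c reshape the curve, using hypothetical parameter values. The curve crosses 0.5 where mx + c = 0, and doubling both m and c keeps that crossing point but makes the S-shape steeper.

```python
import math


def logistic(x, m, c):
    """Sigmoid applied to the linear equation t = m*x + c."""
    t = m * x + c
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


# Hypothetical parameters: the curve crosses 0.5 where m*x + c = 0, i.e. x = -c/m.
m, c = 2.0, -6.0
midpoint = -c / m
print(midpoint, logistic(midpoint, m, c))  # 3.0 0.5

# Same midpoint, larger slope: the output moves away from 0.5 more quickly.
print(logistic(3.5, 2.0, -6.0), logistic(3.5, 4.0, -12.0))
```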
What is Logistic Regression
Logistic regression is a statistical technique used to analyse and model the relationship between a categorical dependent variable and one or more independent variables
What is the purpose of the transformation function in Logistic Regression
The transformation function (the Sigmoid) in Logistic Regression is used to map the output of the linear equation, which can be any real number, to a probability value between 0 and 1
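How is the error calculated in Logistic Regression
The error in logistic regression is calculated by comparing the predicted probability with the actual outcome (0 or 1) of the dependent variable for each data point. The standard measure is the log loss (binary cross-entropy), -[y log(p) + (1 - y) log(1 - p)], averaged over all data points, where y is the actual outcome and p is the predicted probability; it penalises confident but wrong predictions heavily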
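A minimal sketch of the mean log loss in plain Python. The clipping constant eps is an implementation assumption that keeps log(0) from ever being evaluated when a predicted probability is exactly 0 or 1.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)] over all points."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A correct, confident prediction costs little; a confident mistake costs a lot.
print(round(log_loss([1], [0.9]), 3), round(log_loss([1], [0.1]), 3))  # 0.105 2.303
```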
How is the error minimised in logistic regression
The error in logistic regression is minimised by adjusting the parameters of the model, such as the slope and intercept, to reduce the overall error between the predicted probabilities and the actual outcomes for all data points. Because there is no closed-form solution, this is usually done iteratively, for example with gradient descent
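A minimal sketch of batch gradient descent for a single-input logistic regression, p = sigmoid(m*x + c). For the mean log loss, the gradients are the averages of (p - y) * x for the slope and (p - y) for the intercept. The learning rate, number of epochs and data are hypothetical choices; the data deliberately overlaps (x = 3 is labelled 1, x = 4 is labelled 0) so that finite best-fit parameters exist.

```python
import math


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=20000):
    """Batch gradient descent on the mean log loss for p = sigmoid(m*x + c)."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_m = grad_c = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(m * x + c) - y  # predicted probability minus outcome
            grad_m += error * x
            grad_c += error
        # Step against the averaged gradient.
        m -= lr * grad_m / n
        c -= lr * grad_c / n
    return m, c


# Hypothetical, overlapping data (not taken from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]
m, c = fit_logistic(xs, ys)
print(m, c, -c / m)  # the curve's 0.5 crossing ends up near x = 3.5
```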
What is the typical threshold used in Logistic Regression
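The typical threshold used in Logistic Regression is 0.5. It has a probabilistic meaning: a predicted probability of 0.5 means both classes are considered equally likely, so a data point is assigned to the positive class when its predicted probability is at or above 0.5 (equivalently, when mx + c ≥ 0) and to the negative class otherwise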
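A minimal sketch of turning predicted probabilities into class labels. The probabilities are hypothetical model outputs, and assigning a probability of exactly 0.5 to class 1 is a convention assumed here. Raising the threshold makes the model more conservative about predicting the positive class.

```python
def predict_class(probabilities, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


probs = [0.12, 0.49, 0.5, 0.73, 0.95]        # hypothetical model outputs
print(predict_class(probs))                  # [0, 0, 1, 1, 1]
print(predict_class(probs, threshold=0.8))   # [0, 0, 0, 0, 1]
```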
What is a multi-input linear equation in Logistic Regression
A multi-input linear equation in Logistic Regression uses more than one independent variable to predict the probability of the dependent variable: t = m1x1 + m2x2 + ... + mnxn + c, where each input xi has its own weight mi, and t is then passed through the Sigmoid function
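A minimal sketch of the multi-input case: each input gets its own weight, the weighted sum plus the intercept gives t, and the Sigmoid turns t into a probability. The weights, bias and inputs are hypothetical values chosen for illustration.

```python
import math


def predict_proba(features, weights, bias):
    """p = sigmoid(m1*x1 + m2*x2 + ... + mn*xn + c) for one data point."""
    t = sum(m * x for m, x in zip(weights, features)) + bias
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


# Hypothetical two-input model.
weights = [0.8, -0.5]
bias = -1.0
p = predict_proba([2.0, 1.0], weights, bias)  # t = 0.8*2 - 0.5*1 - 1.0 = 0.1
print(p)  # about 0.525, just above the 0.5 threshold
```

With one input this reduces to sigmoid(mx + c); with two inputs the 0.5 decision boundary is the straight line m1x1 + m2x2 + c = 0.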