Supervised Non-Linear Models Flashcards

Some notes from the lectures covering Supervised Non-Linear Models that may help in the final exam.

1
Q

What do Non-Linear algorithms assume?

A

Non-Linear Algorithms assume a non-linear relationship between x and y.

2
Q

What are the 4 common Non-Linear models?

A
  • K-Nearest Neighbour
  • Kernel SVM
  • Decision Trees
  • Neural Networks
3
Q

How does K-Nearest Neighbour work?

A
  1. Choose the value of K, which determines how many neighbours are taken into consideration.
  2. Calculate the Euclidean distance between the query point you’re analysing and every point in the training set.
  3. Sort the training points by distance.
  4. Select the K nearest neighbours.
  5. Pick the majority class among those neighbours (see the sketch below).
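
A minimal NumPy sketch of these steps; the function name and the toy data are illustrative, not from the lectures:

  import numpy as np

  def knn_predict(X_train, y_train, x_query, k=3):
      # Step 2: Euclidean distance from the query point to every training point
      distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
      # Steps 3-4: sort by distance and keep the K nearest neighbours
      nearest = np.argsort(distances)[:k]
      # Step 5: majority vote among the neighbours' class labels
      return np.bincount(y_train[nearest]).argmax()

  X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 7.0]])
  y_train = np.array([0, 0, 1, 1])
  print(knn_predict(X_train, y_train, np.array([1.5, 2.5]), k=3))  # -> 0
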
4
Q

How do you calculate the Euclidean distance between two points?

A

distance = sqrt((x1 - x2)^2 + (y1 - y2)^2)
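
The same formula as a quick NumPy check (the two points are made up):

  import numpy as np

  p, q = np.array([1.0, 2.0]), np.array([4.0, 6.0])
  distance = np.sqrt(np.sum((p - q) ** 2))  # sqrt((1-4)^2 + (2-6)^2) = 5.0
  print(distance)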

5
Q

What are the advantages of using the K-nearest Neighbour algorithm?

A

Doesn’t require any prior training, just the storage of data
Can be used for both classification and regression

6
Q

What are some disadvantages of K Nearest Neighbour?

A
  • Computationally Expensive - Finding distances to all training points can be slow
  • Sensitive to the choice of K and of the distance metric
  • Curse of Dimensionality - Distance becomes less meaningful with higher dimensions
7
Q

What are the steps involved in using a Hard Margin SVM?

A
  1. Define the hyperplane with the equation w^Tx + b = 0, where w is the weight vector, b is the bias term, and x is the feature vector
  2. Maximise the margin, i.e. the distance between the hyperplane and the closest data points from either class
  3. Apply the hard-margin constraints: every training point must satisfy y_i(w^Tx_i + b) >= 1, with labels y_i in {-1, +1}
  4. Since the margin equals 2/||w||, maximise it by minimising the norm of the weight vector w (see the sketch below)
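
scikit-learn has no true hard-margin mode, but a linear SVC with a very large C behaves like one on linearly separable data; a minimal sketch with made-up points:

  import numpy as np
  from sklearn.svm import SVC

  # Two linearly separable classes
  X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
  y = np.array([0, 0, 0, 1, 1, 1])

  # A very large C approximates the hard-margin constraint y_i(w^T x_i + b) >= 1
  clf = SVC(kernel="linear", C=1e6).fit(X, y)
  print(clf.coef_, clf.intercept_)  # the learned w and b of the hyperplane
  print(clf.support_vectors_)       # the closest points, which define the margin
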
8
Q

What are the advantages of Hard Margin SVMs?

A

Theoretical Guarantee - Finds the hyperplane with the maximum margin, leading to good generalisation

Deterministic - There is always a unique solution for linearly separable data.

9
Q

What are some disadvantages of Hard Margin SVMs?

A

Assumes the data is perfectly linearly separable, otherwise the algorithm fails.

Sensitive to Outliers - A single outlier can drastically affect the hyperplane.

10
Q

What is the main difference between Soft Margin SVMs and Hard Margin SVMs?

A

Soft Margin SVMs introduce slack variables so that some margin violations are allowed, which lets them handle data that is not perfectly linearly separable; Hard Margin SVMs require perfect separability and fail when it does not hold.

11
Q

What are the effects of increasing the value of C within Soft Margin SVMs?

A

A higher C value penalises margin violations more heavily, leading to smaller margins and fewer misclassifications on the training data (at a greater risk of overfitting).
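
A small sketch of that effect, assuming scikit-learn and some overlapping toy data (none of this is from the lectures):

  from sklearn.datasets import make_classification
  from sklearn.svm import SVC

  X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                             n_redundant=0, class_sep=0.8, random_state=0)

  for C in (0.01, 100):
      clf = SVC(kernel="linear", C=C).fit(X, y)
      # Larger C -> heavier penalty on violations -> usually fewer support
      # vectors inside the margin and higher training accuracy
      print(C, clf.n_support_, clf.score(X, y))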

12
Q

How does a Kernel SVM differ from a Hard or Soft Margin SVM?

A

Kernel SVMs can solve problems where the data is not linearly separable in the original feature space, by implicitly mapping the data to a higher-dimensional space using a kernel function.

13
Q

What are the advantages of using a Soft Margin SVM over a Hard Margin SVM?

A

Soft Margin SVMs are more robust to outliers, and the regularisation parameter C lets them be tuned to the trade-off between margin width and training errors that best suits the problem being solved.

14
Q

What is the step-by-step operation of a Kernel SVM?

A
  1. Check whether the data is linearly separable in the original feature space
  2. If not, choose a kernel function that corresponds to a mapping into a higher-dimensional space
  3. Apply the ‘Kernel Trick’: the kernel function computes inner products in that higher-dimensional space directly, without ever constructing the mapping explicitly
  4. Solve the SVM optimisation problem using the kernel function
  5. Create the decision boundary, expressed in terms of the kernel and the support vectors (see the sketch below)
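
A minimal scikit-learn sketch of steps 2-5, using an RBF kernel on data that is not linearly separable (the dataset and hyperparameters are illustrative):

  from sklearn.datasets import make_moons
  from sklearn.svm import SVC

  # Two interleaving half-moons: not linearly separable in the original space
  X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

  # The RBF kernel implicitly maps the data to a higher-dimensional space;
  # the optimisation and the decision boundary are expressed via the kernel
  clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
  print(clf.score(X, y))  # training accuracy with the non-linear boundary
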
15
Q

What are the advantages of a Kernel SVM?

A

Able to handle non-linear data
Very powerful tool for high-dimensional and non-linear datasets
Multiple kernel options allow adaptation to different types of data.

16
Q

What are the disadvantages of a Kernel SVM algorithm?

A

Kernel computation can be slow for large datasets
It requires careful selection of the kernel function and its hyperparameters
Struggles with very large datasets due to quadratic complexity

17
Q

When would you use a Soft Margin SVM compared to standard Linear Regression?

A

Soft Margin SVM - Designed for Classification problems, especially when the data is not linearly separable
Linear Regression - Designed for Regression problems, predicting continuous target values

18
Q

How does a Decision Tree work?

A
  1. Select the feature that best splits the dataset, where the ‘best’ feature is chosen using a splitting criterion
  2. Divide the dataset into two or more subsets based on the selected feature’s values
  3. Repeat steps 1 and 2 at each new ‘node’ that is generated
  4. Stop the process when a stopping criterion is met, e.g. the maximum tree depth
  5. At each leaf node, predict the most frequent class among the samples in the node (classification) or their average value (regression) - see the sketch below
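
A minimal scikit-learn sketch of those steps (the dataset and settings are illustrative):

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier, export_text

  X, y = load_iris(return_X_y=True)

  # criterion = splitting criterion (step 1), max_depth = stopping criterion (step 4)
  tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
  print(export_text(tree))  # the learned sequence of splits, leaf by leaf
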
19
Q

What kind of Splitting Criterion is used to select the best features in a Classification Problem?

A

Gini Impurity
Entropy (Information Gain)
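
A small NumPy sketch of both criteria on a made-up class distribution:

  import numpy as np

  labels = np.array([0, 0, 0, 1, 1])       # class labels of the samples in one node
  p = np.bincount(labels) / len(labels)    # class probabilities: [0.6, 0.4]

  gini = 1 - np.sum(p ** 2)                # Gini impurity = 0.48
  entropy = -np.sum(p * np.log2(p))        # entropy ≈ 0.971 bits
  print(gini, entropy)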

20
Q

What kind of Splitting Criterion is used to select the best features in a Regression problem?

A

Mean Squared Error (MSE)
Variance Reduction
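
A small NumPy sketch of variance reduction for one candidate split (the target values are made up):

  import numpy as np

  parent = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])  # target values in a node
  left, right = parent[:3], parent[3:]                   # one candidate split

  weighted_child_var = (len(left) * left.var() + len(right) * right.var()) / len(parent)
  variance_reduction = parent.var() - weighted_child_var  # larger means a better split
  print(variance_reduction)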

21
Q

What are the advantages of using Decision Trees?

A
  • Robust to features with different scales - Doesn’t require feature normalisation
  • Embedded Feature Selection - Redundant features won’t be selected at a decision node
  • Scalable to large datasets
  • Can handle missing feature values by ignoring them
  • High interpretability
22
Q

What are the disadvantages of using Decision Trees?

A

Greedy search at each node, which is computationally expensive

Prone to overfitting as the tree grows deeper

23
Q

What is a Random Forest?

A

A Random Forest is an ensemble learning method that combines multiple decision trees to improve performance.

24
Q

What are the advantages of using Random Forest?

A

Reduces overfitting
Handles non-linear data well
Can be used to evaluate importance of features
Left-out (out-of-bag) data can be used to estimate the error without needing a separate validation set

25
Q

What are the disadvantages of using Random Forest?

A

Not easily interpretable
Computationally intensive
Making predictions requires aggregating the results of many trees, so it can be slower compared to individual models.

26
Q

What are the hyperparameters for Random Forest?

A

Number of trees in the forest
Maximum depth of each tree
Minimum number of samples required to split a node
Minimum number of samples required for a leaf node
Maximum number of features to consider when looking for the best split.
The impurity (splitting) criterion to use - see the sketch below
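
A minimal scikit-learn sketch wiring up those hyperparameters (the chosen values are arbitrary):

  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)

  forest = RandomForestClassifier(
      n_estimators=100,       # number of trees in the forest
      max_depth=5,            # maximum depth of each tree
      min_samples_split=2,    # minimum samples required to split a node
      min_samples_leaf=1,     # minimum samples required for a leaf node
      max_features="sqrt",    # features to consider when looking for the best split
      criterion="gini",       # the impurity function to use
      oob_score=True,         # estimate the error from the left-out (out-of-bag) data
      random_state=0,
  ).fit(X, y)

  print(forest.oob_score_)            # out-of-bag accuracy estimate
  print(forest.feature_importances_)  # per-feature importance scores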