Supervised Non-Linear Models Flashcards

Some notes from the lectures covering Supervised Non-Linear Models that may help in the final exam.

1
Q

What do Non-Linear algorithms assume?

A

Non-Linear Algorithms assume a non-linear relationship between x and y.

2
Q

What are the 4 common Non-Linear models?

A
  • K-Nearest Neighbour
  • Kernel SVM
  • Decision Trees
  • Neural Networks
3
Q

How does K-Nearest Neighbour work?

A
  1. Choose the value of K, which determines how many neighbours are taken into consideration.
  2. Calculate the Euclidean distance between the query point you’re analysing and every point in the training set.
  3. Sort the training points by distance.
  4. Select the K nearest neighbours.
  5. Pick the majority class among those neighbours (see the sketch below).
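
A minimal NumPy sketch of these steps; the function name and the toy data are illustrative, not from the lectures:

  import numpy as np

  def knn_predict(X_train, y_train, x_query, k=3):
      # Step 2: Euclidean distance from the query point to every training point
      distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
      # Steps 3-4: sort by distance and keep the K nearest neighbours
      nearest = np.argsort(distances)[:k]
      # Step 5: majority vote among the neighbours' class labels
      return np.bincount(y_train[nearest]).argmax()

  X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 7.0]])
  y_train = np.array([0, 0, 1, 1])
  print(knn_predict(X_train, y_train, np.array([1.5, 2.5]), k=3))  # -> 0
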
4
Q

How do you calculate the Euclidean distance between two points?

A

distance = sqrt((x1 - x2)^2 + (y1 - y2)^2)
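
The same formula as a quick NumPy check (the two points are made up):

  import numpy as np

  p, q = np.array([1.0, 2.0]), np.array([4.0, 6.0])
  distance = np.sqrt(np.sum((p - q) ** 2))  # sqrt((1-4)^2 + (2-6)^2) = 5.0
  print(distance)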

5
Q

What are the advantages of using the K-nearest Neighbour algorithm?

A

Doesn’t require any prior training, just the storage of data
Can be used for both classification and regression

6
Q

What are some disadvantages of K Nearest Neighbour?

A
  • Computationally Expensive - Finding distances to all training points can be slow
  • Sensitive to the choice of K and of the distance metric
  • Curse of Dimensionality - Distance becomes less meaningful with higher dimensions
7
Q

What are the steps involved in using a Hard Margin SVM?

A
  1. Define the hyperplane with the equation w^Tx + b = 0, where w is the weight vector, b is the bias term, and x is the feature vector
  2. Maximise the margin, i.e. the distance between the hyperplane and the closest data points from either class
  3. Apply the hard-margin constraints: every training point must satisfy y_i(w^Tx_i + b) >= 1, with labels y_i in {-1, +1}
  4. Since the margin equals 2/||w||, maximise it by minimising the norm of the weight vector w (see the sketch below)
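
scikit-learn has no true hard-margin mode, but a linear SVC with a very large C behaves like one on linearly separable data; a minimal sketch with made-up points:

  import numpy as np
  from sklearn.svm import SVC

  # Two linearly separable classes
  X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
  y = np.array([0, 0, 0, 1, 1, 1])

  # A very large C approximates the hard-margin constraint y_i(w^T x_i + b) >= 1
  clf = SVC(kernel="linear", C=1e6).fit(X, y)
  print(clf.coef_, clf.intercept_)  # the learned w and b of the hyperplane
  print(clf.support_vectors_)       # the closest points, which define the margin
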
8
Q

What are the advantages of Hard Margin SVMs?

A

Theoretical Guarantee - Finds the hyperplane with the maximum margin, leading to good generalisation

Deterministic - There is always a unique solution for linearly separable data.

9
Q

What are some disadvantages of Hard Margin SVMs?

A

Assumes the data is perfectly linearly separable, otherwise the algorithm fails.

Sensitive to Outliers - A single outlier can drastically affect the hyperplane.

10
Q

What is the main difference between Soft Margin SVMs and Hard Margin SVMs?

A

Soft Margin SVMs introduce slack variables so that some margin violations are allowed, which lets them handle data that is not perfectly linearly separable; Hard Margin SVMs require perfect separability and fail when it does not hold.

11
Q

What are the effects of increasing the value of C within Soft Margin SVMs?

A

A higher C value penalises margin violations more heavily, leading to smaller margins and fewer misclassifications on the training data (at a greater risk of overfitting).
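
A small sketch of that effect, assuming scikit-learn and some overlapping toy data (none of this is from the lectures):

  from sklearn.datasets import make_classification
  from sklearn.svm import SVC

  X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                             n_redundant=0, class_sep=0.8, random_state=0)

  for C in (0.01, 100):
      clf = SVC(kernel="linear", C=C).fit(X, y)
      # Larger C -> heavier penalty on violations -> usually fewer support
      # vectors inside the margin and higher training accuracy
      print(C, clf.n_support_, clf.score(X, y))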

12
Q

How does a Kernel SVM differ from a Hard or Soft Margin SVM?

A

Kernel SVMs can solve problems where the data is not linearly separable in the original feature space, by implicitly mapping the data to a higher-dimensional space using a kernel function.

13
Q

What are the advantages of using a Soft Margin SVM over a Hard Margin SVM?

A

Soft Margin SVMs are more robust to outliers, and the regularisation parameter C lets them be tuned to the trade-off between margin width and training errors that best suits the problem being solved.

14
Q

What is the step-by-step operation of a Kernel SVM?

A
  1. Check whether the data is linearly separable in the original feature space
  2. If not, choose a kernel function that corresponds to a mapping into a higher-dimensional space
  3. Apply the ‘Kernel Trick’: the kernel function computes inner products in that higher-dimensional space directly, without ever constructing the mapping explicitly
  4. Solve the SVM optimisation problem using the kernel function
  5. Create the decision boundary, expressed in terms of the kernel and the support vectors (see the sketch below)
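
A minimal scikit-learn sketch of steps 2-5, using an RBF kernel on data that is not linearly separable (the dataset and hyperparameters are illustrative):

  from sklearn.datasets import make_moons
  from sklearn.svm import SVC

  # Two interleaving half-moons: not linearly separable in the original space
  X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

  # The RBF kernel implicitly maps the data to a higher-dimensional space;
  # the optimisation and the decision boundary are expressed via the kernel
  clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
  print(clf.score(X, y))  # training accuracy with the non-linear boundary
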
15
Q

What are the advantages of a Kernel SVM?

A

Able to handle non-linear data
Very powerful tool for high-dimensional and non-linear datasets
Multiple kernel options allow adaptation to different types of data.

16
Q

What are the disadvantages of a Kernel SVM algorithm?

A

Kernel computation can be slow for large datasets
It requires careful selection of the kernel function and its hyperparameters
Struggles with very large datasets due to quadratic complexity

17
Q

When would you use a Soft Margin SVM compared to standard Linear Regression?

A

Soft Margin SVM - Designed for Classification problems, especially when the data is not linearly separable
Linear Regression - Designed for Regression problems, predicting continuous target values

18
Q

How does a Decision Tree work?

A
  1. Select the feature that best splits the dataset, where the ‘best’ feature is chosen using a splitting criterion
  2. Divide the dataset into two or more subsets based on the selected feature’s values
  3. Repeat steps 1 and 2 at each new ‘node’ that is generated
  4. Stop the process when a stopping criterion is met, e.g. the maximum tree depth
  5. At each leaf node, predict the most frequent class among the samples in the node (classification) or their average value (regression) - see the sketch below
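
A minimal scikit-learn sketch of those steps (the dataset and settings are illustrative):

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier, export_text

  X, y = load_iris(return_X_y=True)

  # criterion = splitting criterion (step 1), max_depth = stopping criterion (step 4)
  tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
  print(export_text(tree))  # the learned sequence of splits, leaf by leaf
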
19
Q

What kind of Splitting Criterion is used to select the best features in a Classification Problem?

A

Gini Impurity
Entropy (Information Gain)
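
A small NumPy sketch of both criteria on a made-up class distribution:

  import numpy as np

  labels = np.array([0, 0, 0, 1, 1])       # class labels of the samples in one node
  p = np.bincount(labels) / len(labels)    # class probabilities: [0.6, 0.4]

  gini = 1 - np.sum(p ** 2)                # Gini impurity = 0.48
  entropy = -np.sum(p * np.log2(p))        # entropy ≈ 0.971 bits
  print(gini, entropy)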

20
Q

What kind of Splitting Criterion is used to select the best features in a Regression problem?

A

Mean Squared Error (MSE)
Variance Reduction
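
A small NumPy sketch of variance reduction for one candidate split (the target values are made up):

  import numpy as np

  parent = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])  # target values in a node
  left, right = parent[:3], parent[3:]                   # one candidate split

  weighted_child_var = (len(left) * left.var() + len(right) * right.var()) / len(parent)
  variance_reduction = parent.var() - weighted_child_var  # larger means a better split
  print(variance_reduction)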

21
Q

What are the advantages of using Decision Trees?

A
  • Robust to features with different scales - Doesn’t require feature normalisation
  • Embedded Feature Selection - Redundant features won’t be selected at a decision node
  • Scalable to large datasets
  • Can handle missing feature values by ignoring them
  • High interpretability
22
Q

What are the disadvantages of using Decision Trees?

A

Greedy search at each node, which is computationally expensive

Prone to overfitting as the tree grows deeper

23
Q

What is a Random Forest?

A

A Random Forest is an ensemble learning method that combines multiple decision trees to improve performance.

24
Q

What are the advantages of using Random Forest?

A

Reduces overfitting
Handles non-linear data well
Can be used to evaluate importance of features
Left-out (out-of-bag) data can be used to estimate the error without needing a separate validation set

25
Q

What are the disadvantages of using Random Forest?

A

Not easily interpretable
Computationally intensive
Making predictions requires aggregating the results of many trees, so it can be slower compared to individual models.

26
Q

What are the hyperparameters for Random Forest?

A

Number of trees in the forest
Maximum depth of each tree
Minimum number of samples required to split a node
Minimum number of samples required for a leaf node
Maximum number of features to consider when looking for the best split.
The impurity (splitting) criterion to use - see the sketch below
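
A minimal scikit-learn sketch wiring up those hyperparameters (the chosen values are arbitrary):

  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)

  forest = RandomForestClassifier(
      n_estimators=100,       # number of trees in the forest
      max_depth=5,            # maximum depth of each tree
      min_samples_split=2,    # minimum samples required to split a node
      min_samples_leaf=1,     # minimum samples required for a leaf node
      max_features="sqrt",    # features to consider when looking for the best split
      criterion="gini",       # the impurity function to use
      oob_score=True,         # estimate the error from the left-out (out-of-bag) data
      random_state=0,
  ).fit(X, y)

  print(forest.oob_score_)            # out-of-bag accuracy estimate
  print(forest.feature_importances_)  # per-feature importance scores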