Supervised Linear Models Flashcards

Notes from Lecture 5, which may help with the exam

1
Q

What are the main comparison points between Supervised and Unsupervised Learning?

A
  • Both require a training dataset
  • Supervised Learning requires a label for each instance, e.g. two categories, y = 0 and y = 1
  • Unsupervised Learning does not use labels
2
Q

What defines a Linear Algorithm?

A

Linear models assume that the sample features (x) and the label outputs (y) are linearly related, described by f(x) = ⟨w, x⟩ + b, i.e. the inner product of a weight vector w with the features, plus a bias b.
It also often refers to a linear decision boundary in classification models.
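As a minimal sketch in Python, with made-up weights w and bias b standing in for fitted values:

```python
import numpy as np

# Sketch of a linear model f(x) = <w, x> + b.
# The weight vector and bias are illustrative, not fitted values.
w = np.array([2.0, -1.0])   # one weight per feature
b = 0.5                     # bias term

def f(x):
    """Inner product of weights and features, plus bias."""
    return np.dot(w, x) + b

print(f(np.array([1.0, 3.0])))  # 2*1 + (-1)*3 + 0.5 = -0.5
```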

3
Q

What are some examples of Linear Algorithms?

A

Linear Regression
Logistic Regression (specifically for Classification problems)
Naive Bayes
Support Vector Machines (SVM)

4
Q

What defines a Non-Linear Algorithm?

A

Non-linear algorithms assume a non-linear relationship between x and y. Thus, f(x) can be a function of arbitrary complexity

5
Q

What are some examples of Non-Linear Algorithms?

A

K-Nearest Neighbour
Kernel SVM
Decision Trees
Neural Networks

6
Q

What does the term ‘Linearly Separable’ mean?

A
  • Datasets whose classes can be separated by linear decision surfaces
  • Implies no class overlap
  • Classes can be divided by a line in 2D, a plane in 3D, or a hyperplane in higher dimensions
7
Q

What defines a Parametric Algorithm?

A

Parametric algorithms are model-driven algorithms that assume the data follows a specific distribution in its feature space, or a pre-defined relationship between features and outcome

8
Q

What are some examples of Parametric Algorithms?

A

Linear Regression
Gaussian Naive Bayes
Maximum Likelihood Classifier

9
Q

What defines a Non-Parametric Algorithm?

A

Non-Parametric algorithms are data-driven algorithms where approaches are not constrained to prior assumptions on the data distribution.

10
Q

What are some examples of Non-Parametric Algorithms?

A

Decision Trees
Neural Networks

11
Q

What is the primary difference between Classification and Regression?

A
  • Regression estimates values for a given dataset
  • Regression predicts numerical values
  • Classification assigns class labels to data
  • Classification predicts categorical values
12
Q

What is the definition of Overfitting?

A

A model learns to map the training data too well, which negatively impacts the performance of the model on new, unknown data. It results in the model having poor generalisability

13
Q

What is the definition of Underfitting?

A

A model underfits when it can model neither the training data nor the test data correctly. Underfitting is easier to detect than overfitting during the training phase using evaluation metrics, and it also results in poor performance metrics for the model.

14
Q

What is the definition of Variance?

A

Variance is the amount that the estimate of the target function will change, given different training data.

15
Q

What is the definition of Bias?

A

Bias indicates the error between the approximated model and the ideal model

16
Q

What are some key aspects of Intrinsic Parameters, and what are some examples?

A
  • Can be efficiently learned on the training set
  • Large in number
  • Examples: Weights in Linear Regression or Artificial Neural Network (ANN)
17
Q

What are some key aspects of Hyper-parameters, and what are some examples?

A
  • Must be learned by establishing generalisation error
  • No efficient search possible
  • Smaller in number
  • Examples: The number of nodes in an ANN or the weights of two terms in a loss function
18
Q

What is the brief definition of an Intrinsic Parameter?

A

An Intrinsic Parameter is a parameter internal to the model itself that is learned from the training data during training, rather than being set externally before the learning process.

19
Q

What is the brief definition of a Hyper-parameter?

A

A Hyper-parameter refers to a parameter that is set before the learning process begins, and is not learned from the data itself.

20
Q

What are some examples of an Intrinsic Parameter?

A

Linear Regression: Weights and Bias values that define the relationship between the input features and the target variable.

Neural Networks: The Weights and biases of each layer, which are adjusted during training to minimise the error.

21
Q

What are some examples of Hyper-parameters?

A

Learning Rate: Controls how much the model’s weights are adjusted during training.

Regularisation Strength: L1, L2 regularisation, etc…

Number of Hidden Layers and Neurons Per Layer in a Neural Network.

22
Q

What does Univariate Linear Regression mean?

A

Univariate Linear Regression determines the relationship between one independent variable (feature/predictor: x) and one dependent variable (outcome: y)
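As a sketch, the one-feature fit can be computed with the least-squares closed form; the data below is made up so that y = 2x + 1 holds exactly:

```python
import numpy as np

# Fit univariate linear regression y = w*x + b by ordinary least squares.
# The data is synthetic and satisfies y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Closed form: w = cov(x, y) / var(x), b = mean(y) - w * mean(x)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()
print(w, b)  # 2.0 1.0
```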

23
Q

How does Univariate Linear Regression work in the training process?

A

Given a model h with solution space S and a training set {X, Y} containing N samples, the learning algorithm searches S for the solution S′ that minimises the cost function J(S).
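One way such a learning algorithm can search for the minimising solution is gradient descent on a mean-squared-error cost; this sketch uses synthetic data, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

# Gradient descent on J(w, b) = mean((w*x + b - y)^2) for synthetic data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0            # targets generated from y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05    # start from zero; lr is an arbitrary choice

for _ in range(5000):
    err = w * x + b - y
    w -= lr * 2.0 * np.mean(err * x)  # dJ/dw
    b -= lr * 2.0 * np.mean(err)      # dJ/db

print(round(w, 3), round(b, 3))  # converges close to 2.0 and 1.0
```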

24
Q

What does Multivariate Linear Regression mean?

A

Multivariate Linear Regression models the output as the linear sum of each feature multiplied by its corresponding weight term, f(x) = w_0 + w_1*x_1 + w_2*x_2 + … + w_n*x_n; the terms can be first-order or higher-order polynomial.

25
Q

What is the equation for the Sigmoid Function?

A

P(x) = 1 / (1 + e^(-(w_0 + w_1*x)))
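In Python, with illustrative default parameter values for w_0 and w_1:

```python
import math

# Sigmoid from the card: P(x) = 1 / (1 + e^-(w0 + w1*x)).
# Default parameters w0 = 0, w1 = 1 are illustrative only.
def sigmoid(x, w0=0.0, w1=1.0):
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

print(sigmoid(0.0))   # 0.5 at the midpoint, where w0 + w1*x = 0
```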

26
Q

Explain the basics of Logistic Regression

A

Logistic Regression provides probabilities and classifies new samples using continuous and discrete measurements. The output is always between 0 and 1, which indicates the likelihood of the data point being in one class or the other. The divider between the classes is a custom threshold, typically set at 0.5.
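A sketch of the decision rule, with made-up weights and the typical 0.5 threshold:

```python
import math

# Logistic-regression decision rule: sigmoid output vs. a threshold.
# The weights w0 and w1 are made up for illustration.
def predict(x, w0=-1.0, w1=2.0, threshold=0.5):
    p = 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))  # probability in (0, 1)
    return 1 if p >= threshold else 0

print(predict(1.0))  # w0 + w1*1 = 1 > 0, so p > 0.5 -> class 1
print(predict(0.0))  # w0 + w1*0 = -1 < 0, so p < 0.5 -> class 0
```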

27
Q

What is the equation for the loss function called Binary Cross-Entropy Loss?

A

L = -(1/N) * Σ_{j=1}^{N} [t_j * log(p_j) + (1 - t_j) * log(1 - p_j)]
Where:
N = number of data points
t_j = truth value for the jth data point, taking a value 0 or 1
p_j = predicted probability for the jth data point
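The loss can be computed directly from the formula; the targets and probabilities below are made up:

```python
import math

# Binary cross-entropy as on the card, averaged over the data points.
def bce(t, p):
    n = len(t)
    return -sum(tj * math.log(pj) + (1 - tj) * math.log(1 - pj)
                for tj, pj in zip(t, p)) / n

# Made-up example: confident, correct predictions give a small loss.
print(round(bce([1, 0], [0.9, 0.1]), 4))  # -log(0.9) ~= 0.1054
```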

28
Q

What is Naive Bayes?

A

It is based on Bayes’ Theorem, which gives the probability of A happening given that B has occurred. The ‘naive’ assumption is that the features are independent and of equal importance.
Naive Bayes will lead to a linear decision boundary.

29
Q

What different variations of Naive Bayes are there?

A

Multinomial Naive Bayes - Features are counts: number of occurrences
Gaussian Naive Bayes - Features are continuous values
Bernoulli Naive Bayes - Features are only binary values: yes or no

30
Q

What is the step-by-step for performing Multinomial Naive Bayes?

A
  1. Work out the total occurrences across everything e.g. 74 Dr, 26 Lecture = 100 total
  2. For each different ‘variable’, calculate the probability by dividing the amount by the total in step 1 e.g. 26/100 = 0.26
  3. Work out the probability of each class e.g. Spam = 70 total, not spam = 100 total, then probability of spam = 70/(70 + 100) = 0.412
  4. Take your key ‘variables’ e.g. lecture, money, and multiply their respective probabilities (found in step 2) with the overall probability of the class (found in step 3) for each class.
  5. Determine what class it is by taking the highest value found.
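The steps above can be sketched as follows; the word counts and class priors here are made up, not the numbers from the card:

```python
# Multinomial Naive Bayes by hand, following the steps above.
# Word counts per class and the equal class priors are made up.
counts = {
    "spam":     {"lecture": 2,  "money": 18},   # 20 word occurrences total
    "not_spam": {"lecture": 15, "money": 5},    # 20 word occurrences total
}
priors = {"spam": 0.5, "not_spam": 0.5}

def classify(words):
    scores = {}
    for cls, wc in counts.items():
        total = sum(wc.values())            # step 1: total occurrences
        score = priors[cls]                 # step 3: class probability
        for w in words:
            score *= wc[w] / total          # steps 2 and 4: multiply in word probs
        scores[cls] = score
    return max(scores, key=scores.get)      # step 5: highest value wins

print(classify(["money", "money"]))      # -> spam
print(classify(["lecture", "lecture"]))  # -> not_spam
```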
31
Q

What is Gaussian Naive Bayes?

A

Gaussian Naive Bayes is a classification algorithm that uses the Naive Bayes theorem, but assumes that the features follow a Gaussian Distribution.
It is primarily used when your features are continuous and can be modelled as a bell-shaped curve.
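A sketch with a single continuous feature; the per-class means, standard deviations, and priors are made up:

```python
import math

# Gaussian Naive Bayes for one continuous feature: score each class by
# its Gaussian likelihood times the class prior. Parameters are made up.
classes = {
    "short": {"mean": 150.0, "std": 5.0, "prior": 0.5},
    "tall":  {"mean": 180.0, "std": 5.0, "prior": 0.5},
}

def gaussian_pdf(x, mean, std):
    """Bell-curve density for the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def classify(x):
    scores = {c: p["prior"] * gaussian_pdf(x, p["mean"], p["std"])
              for c, p in classes.items()}
    return max(scores, key=scores.get)

print(classify(178.0))  # nearer the "tall" mean -> tall
```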