Supervised Linear Models Flashcards

Notes from Lecture 5, which may help with the exam

1
Q

What are the main comparison points between Supervised and Unsupervised Learning?

A
  • Both require a training dataset
  • Supervised Learning requires a label for each instance, e.g. two categories, y = 0 and y = 1
  • Unsupervised Learning does not use labels
2
Q

What defines a Linear Algorithm?

A

Linear models assume that the sample features (x) and the label outputs (y) are linearly related, described by f(x) = ⟨w, x⟩ + b, i.e. the inner product of a weight vector w with the features, plus a bias b.
It also often refers to a linear decision boundary in classification models.
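As a minimal sketch in Python, with made-up weights w and bias b standing in for fitted values:

```python
import numpy as np

# Sketch of a linear model f(x) = <w, x> + b.
# The weight vector and bias are illustrative, not fitted values.
w = np.array([2.0, -1.0])   # one weight per feature
b = 0.5                     # bias term

def f(x):
    """Inner product of weights and features, plus bias."""
    return np.dot(w, x) + b

print(f(np.array([1.0, 3.0])))  # 2*1 + (-1)*3 + 0.5 = -0.5
```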

3
Q

What are some examples of Linear Algorithms?

A

Linear Regression
Logistic Regression (specifically for Classification problems)
Naive Bayes
Support Vector Machines (SVM)

4
Q

What defines a Non-Linear Algorithm?

A

Non-linear algorithms assume a non-linear relationship between x and y. Thus, f(x) can be a function of arbitrary complexity

5
Q

What are some examples of Non-Linear Algorithms?

A

K-Nearest Neighbour
Kernel SVM
Decision Trees
Neural Networks

6
Q

What does the term ‘Linearly Separable’ mean?

A
  • Datasets whose classes can be separated by linear decision surfaces
  • Implies no class overlap
  • Classes can be divided by a line in 2D, a plane in 3D, or a hyperplane in higher dimensions
7
Q

What defines a Parametric Algorithm?

A

Parametric algorithms are model-driven algorithms that assume the data follows a specific distribution in its feature space, or a pre-defined relationship between features and outcome

8
Q

What are some examples of Parametric Algorithms?

A

Linear Regression
Gaussian Naive Bayes
Maximum Likelihood Classifier

9
Q

What defines a Non-Parametric Algorithm?

A

Non-Parametric algorithms are data-driven algorithms where approaches are not constrained to prior assumptions on the data distribution.

10
Q

What are some examples of Non-Parametric Algorithms?

A

Decision Trees
Neural Networks

11
Q

What is the primary difference between Classification and Regression?

A
  • Regression estimates values for a given dataset
  • Regression predicts numerical values
  • Classification assigns class labels to data
  • Classification predicts categorical values
12
Q

What is the definition of Overfitting?

A

A model learns to map the training data too well, which negatively impacts the performance of the model on new, unknown data. It results in the model having poor generalisability

13
Q

What is the definition of Underfitting?

A

A model underfits when it can model neither the training data nor the test data correctly. Underfitting is easier to detect than overfitting during the training phase using evaluation metrics, and it also results in poor performance metrics for the model.

14
Q

What is the definition of Variance?

A

Variance is the amount that the estimate of the target function will change, given different training data.

15
Q

What is the definition of Bias?

A

Bias indicates the error between the approximated model and the ideal model

16
Q

What are some key aspects of Intrinsic Parameters, and what are some examples?

A
  • Can be efficiently learned on the training set
  • Large in number
  • Examples: Weights in Linear Regression or Artificial Neural Network (ANN)
17
Q

What are some key aspects of Hyper-parameters, and what are some examples?

A
  • Must be learned by establishing generalisation error
  • No efficient search possible
  • Smaller in number
  • Examples: The number of nodes in an ANN or the weights of two terms in a loss function
18
Q

What is the brief definition of an Intrinsic Parameter?

A

An Intrinsic Parameter is a parameter internal to the model itself that is learned from the training data during training, rather than being set externally before the learning process.

19
Q

What is the brief definition of a Hyper-parameter?

A

A Hyper-parameter refers to a parameter that is set before the learning process begins, and is not learned from the data itself.

20
Q

What are some examples of an Intrinsic Parameter?

A

Linear Regression: Weights and Bias values that define the relationship between the input features and the target variable.

Neural Networks: The Weights and biases of each layer, which are adjusted during training to minimise the error.

21
Q

What are some examples of Hyper-parameters?

A

Learning Rate: Controls how much the model’s weights are adjusted during training.

Regularisation Strength: L1, L2 regularisation, etc…

Number of Hidden Layers and Neurons Per Layer in a Neural Network.

22
Q

What does Univariate Linear Regression mean?

A

Univariate Linear Regression determines the relationship between one independent variable (feature/predictor: x) and one dependent variable (outcome: y)
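As a sketch, the one-feature fit can be computed with the least-squares closed form; the data below is made up so that y = 2x + 1 holds exactly:

```python
import numpy as np

# Fit univariate linear regression y = w*x + b by ordinary least squares.
# The data is synthetic and satisfies y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Closed form: w = cov(x, y) / var(x), b = mean(y) - w * mean(x)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()
print(w, b)  # 2.0 1.0
```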

23
Q

How does Univariate Linear Regression work in the training process?

A

Given a model h with solution space S and a training set {X, Y} containing N samples, the learning algorithm searches S for the solution S′ that minimises the cost function J(S).
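One way such a learning algorithm can search for the minimising solution is gradient descent on a mean-squared-error cost; this sketch uses synthetic data, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

# Gradient descent on J(w, b) = mean((w*x + b - y)^2) for synthetic data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0            # targets generated from y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05    # start from zero; lr is an arbitrary choice

for _ in range(5000):
    err = w * x + b - y
    w -= lr * 2.0 * np.mean(err * x)  # dJ/dw
    b -= lr * 2.0 * np.mean(err)      # dJ/db

print(round(w, 3), round(b, 3))  # converges close to 2.0 and 1.0
```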

24
Q

What does Multivariate Linear Regression mean?

A

Multivariate Linear Regression models the output as the linear sum of each feature multiplied by its corresponding weight term, f(x) = w_0 + w_1*x_1 + w_2*x_2 + … + w_n*x_n; the terms can be first-order or higher-order polynomial.

25
Q

What is the equation for the Sigmoid Function?

A

P(x) = 1 / (1 + e^(-(w_0 + w_1*x)))
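In Python, with illustrative default parameter values for w_0 and w_1:

```python
import math

# Sigmoid from the card: P(x) = 1 / (1 + e^-(w0 + w1*x)).
# Default parameters w0 = 0, w1 = 1 are illustrative only.
def sigmoid(x, w0=0.0, w1=1.0):
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

print(sigmoid(0.0))   # 0.5 at the midpoint, where w0 + w1*x = 0
```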

26
Q

Explain the basics of Logistic Regression

A

Logistic Regression provides probabilities and classifies new samples using continuous and discrete measurements. The output is always between 0 and 1, which indicates the likelihood of the data point being in one class or the other. The divider between the classes is a custom threshold, typically set at 0.5.
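A sketch of the decision rule, with made-up weights and the typical 0.5 threshold:

```python
import math

# Logistic-regression decision rule: sigmoid output vs. a threshold.
# The weights w0 and w1 are made up for illustration.
def predict(x, w0=-1.0, w1=2.0, threshold=0.5):
    p = 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))  # probability in (0, 1)
    return 1 if p >= threshold else 0

print(predict(1.0))  # w0 + w1*1 = 1 > 0, so p > 0.5 -> class 1
print(predict(0.0))  # w0 + w1*0 = -1 < 0, so p < 0.5 -> class 0
```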

27
Q

What is the equation for the loss function called Binary Cross-Entropy Loss?

A

L = -(1/N) * Σ_{j=1}^{N} [t_j * log(p_j) + (1 - t_j) * log(1 - p_j)]
Where:
N = number of data points
t_j = truth value for the jth data point, taking a value 0 or 1
p_j = predicted probability for the jth data point
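The loss can be computed directly from the formula; the targets and probabilities below are made up:

```python
import math

# Binary cross-entropy as on the card, averaged over the data points.
def bce(t, p):
    n = len(t)
    return -sum(tj * math.log(pj) + (1 - tj) * math.log(1 - pj)
                for tj, pj in zip(t, p)) / n

# Made-up example: confident, correct predictions give a small loss.
print(round(bce([1, 0], [0.9, 0.1]), 4))  # -log(0.9) ~= 0.1054
```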

28
Q

What is Naive Bayes?

A

It is based on Bayes’ Theorem, which gives the probability of A happening given that B has occurred. The ‘naive’ assumption is that the features are independent and of equal importance.
Naive Bayes will lead to a linear decision boundary.

29
Q

What different variations of Naive Bayes are there?

A

Multinomial Naive Bayes - Features are counts: number of occurrences
Gaussian Naive Bayes - Features are continuous values
Bernoulli Naive Bayes - Features are only binary values: yes or no

30
Q

What is the step-by-step for performing Multinomial Naive Bayes?

A
  1. Work out the total occurrences across everything e.g. 74 Dr, 26 Lecture = 100 total
  2. For each different ‘variable’, calculate the probability by dividing the amount by the total in step 1 e.g. 26/100 = 0.26
  3. Work out the probability of each class e.g. Spam = 70 total, not spam = 100 total, then probability of spam = 70/(70 + 100) = 0.412
  4. Take your key ‘variables’ e.g. lecture, money, and multiply their respective probabilities (found in step 2) with the overall probability of the class (found in step 3) for each class.
  5. Determine what class it is by taking the highest value found.
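The steps above can be sketched as follows; the word counts and class priors here are made up, not the numbers from the card:

```python
# Multinomial Naive Bayes by hand, following the steps above.
# Word counts per class and the equal class priors are made up.
counts = {
    "spam":     {"lecture": 2,  "money": 18},   # 20 word occurrences total
    "not_spam": {"lecture": 15, "money": 5},    # 20 word occurrences total
}
priors = {"spam": 0.5, "not_spam": 0.5}

def classify(words):
    scores = {}
    for cls, wc in counts.items():
        total = sum(wc.values())            # step 1: total occurrences
        score = priors[cls]                 # step 3: class probability
        for w in words:
            score *= wc[w] / total          # steps 2 and 4: multiply in word probs
        scores[cls] = score
    return max(scores, key=scores.get)      # step 5: highest value wins

print(classify(["money", "money"]))      # -> spam
print(classify(["lecture", "lecture"]))  # -> not_spam
```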
31
Q

What is Gaussian Naive Bayes?

A

Gaussian Naive Bayes is a classification algorithm that uses the Naive Bayes theorem, but assumes that the features follow a Gaussian Distribution.
It is primarily used when your features are continuous and can be modelled as a bell-shaped curve.
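A sketch with a single continuous feature; the per-class means, standard deviations, and priors are made up:

```python
import math

# Gaussian Naive Bayes for one continuous feature: score each class by
# its Gaussian likelihood times the class prior. Parameters are made up.
classes = {
    "short": {"mean": 150.0, "std": 5.0, "prior": 0.5},
    "tall":  {"mean": 180.0, "std": 5.0, "prior": 0.5},
}

def gaussian_pdf(x, mean, std):
    """Bell-curve density for the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def classify(x):
    scores = {c: p["prior"] * gaussian_pdf(x, p["mean"], p["std"])
              for c, p in classes.items()}
    return max(scores, key=scores.get)

print(classify(178.0))  # nearer the "tall" mean -> tall
```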