General ML Flashcards

Understand general ML algorithms to a proficient level.

1
Q

What are supervised ML algorithms?

A

Supervised ML algorithms learn from labeled data, i.e. examples with predefined inputs and known outputs.
The goal is to learn a mapping from inputs to outputs that can predict the outputs for new, unseen inputs.

2
Q

What are some examples of Supervised ML algorithms?

A

Anything that learns from a labeled dataset. Examples include regression algorithms such as Linear and Logistic Regression, as well as tree-based algorithms such as Decision Trees and Random Forests.

3
Q

What is Linear Regression?

A

Linear regression is a supervised ML algorithm that takes in labeled data (with one or more input features) and fits a “best fitting line” (a plane or hyperplane when there are multiple features) that describes the relationship between the input and output variables.

4
Q

What is the objective and equation of a linear regression algorithm? What does each variable mean?

A

The objective is to build the best-fitting linear equation describing the relationship between the numerical input and output variables.

There can be up to n input variables and one output variable.

With a single input variable, the equation is quite simple:
y = B0 + B1*x

Where
B0 is the y-intercept
B1 is the slope
x is the input variable (independent variable)
y is the target variable (dependent variable)

In other words, the main objective is to find the B0 and B1 that best fit the data.
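
As a rough illustration of "finding the best B0 and B1", the sketch below fits a single-variable line with the standard closed-form least-squares solution (slope = covariance of x and y divided by the variance of x). The function names and the toy data are assumptions made for this example, not part of the original card.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the squared error of y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the fitted line at x."""
    return b0 + b1 * x


# Toy data generated from y = 1 + 2x (made up for illustration).
b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

The closed form only covers the one-input case and assumes the x values are not all identical (otherwise the variance is zero).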

5
Q

What is Multiple Linear regression?

A

Linear regression with multiple (n) feature variables and one target.

y = b0 + b1*x_1 + b2*x_2 + b3*x_3 + … + bn*x_n

6
Q

How do we find the best fitting line? Mathematically?

A

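You find it by minimizing the residual sum of squares (RSS), the sum of squared differences between the actual values yi and the predicted values ŷi:
RSS = ∑ (yi - ŷi)^2

In practice we usually minimize the mean squared error (MSE), which is the RSS divided by the number of data points n, so the same line minimizes both. Our cost function is then defined as:
MSE = (1/n) * ∑ (yi - ŷi)^2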
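
A minimal sketch of the two quantities, assuming the actual and predicted values are given as equal-length lists (the function names are chosen for this example):

```python
def rss(y_true, y_pred):
    """Residual sum of squares: sum of (yi - y_hat_i)^2."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error: RSS divided by the number of data points."""
    return rss(y_true, y_pred) / len(y_true)


# Toy values made up for illustration.
print(rss([3, 5, 7], [2, 5, 9]), mse([3, 5, 7], [2, 5, 9]))
```

Because MSE is just RSS scaled by the constant 1/n, whichever parameters minimize one also minimize the other.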

7
Q

What is Gradient Descent?

A

It is an optimization algorithm used in ML to find the model parameters that minimize the loss function.

The algorithm iteratively adjusts the model's parameters, moving them a small step in the direction opposite to the gradient of the loss, until the loss stops decreasing.

8
Q

Explain the process of gradient descent.

A

In the context of 1D linear regression, let's assume there are only 2 parameters that need to be learned: b_0 and b_1

  1. Initialize the params (b_0 and b_1) to 0 or to random values
  2. Compute the target predictions using your model's equation (ŷ = b_0 + b_1*x)
  3. Compute the loss
    a. Calculate the squared error of each predicted data point and average them to get the MSE
    b. MSE = (1/n) * ∑ (yi - ŷi)^2
  4. Compute the gradients
    a. Say the partial derivatives of the MSE with respect to b_0 and b_1 are W and Z respectively
    b. W = (-2/n) * ∑ (yi - ŷi) and Z = (-2/n) * ∑ xi * (yi - ŷi)
  5. Update the params
    b_0 = b_0 - a*W
    b_1 = b_1 - a*Z
    where a is the step size (learning rate)
  6. Iterate until the loss is small enough or the max number of iterations is reached
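
The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a tuned implementation: the default step size, iteration cap, tolerance, and toy data are assumptions picked so that this small example converges.

```python
def gradient_descent(xs, ys, step_size=0.05, max_iters=5000, tol=1e-12):
    """Fit y = b0 + b1*x by gradient descent on the MSE."""
    n = len(xs)
    b0, b1 = 0.0, 0.0  # step 1: initialize the params
    for _ in range(max_iters):
        # step 2: predictions from the current params
        preds = [b0 + b1 * x for x in xs]
        errors = [y - p for y, p in zip(ys, preds)]
        # step 3: loss (MSE)
        loss = sum(e * e for e in errors) / n
        if loss < tol:  # step 6: stop when the loss is small
            break
        # step 4: partial derivatives of the MSE w.r.t. b0 and b1
        grad_b0 = (-2.0 / n) * sum(errors)
        grad_b1 = (-2.0 / n) * sum(x * e for x, e in zip(xs, errors))
        # step 5: move against the gradient
        b0 -= step_size * grad_b0
        b1 -= step_size * grad_b1
    return b0, b1


# Toy data generated from y = 1 + 2x (made up for illustration).
print(gradient_descent([0, 1, 2, 3], [1, 3, 5, 7]))
```

If the step size is too large the updates overshoot and the loss grows instead of shrinking, so in practice it is treated as a hyperparameter to tune.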
9
Q

What is Logistic Regression?

A

A supervised ML algorithm used for binary outputs, where the goal is to pick one of two possible outcomes.

It takes the input features and combines them into a linear equation (same as linear regression). However, instead of predicting the value of the target directly, it passes the output of the linear equation through a sigmoid function, which maps it to a probability between 0 and 1.

A decision threshold then maps the probability to one of the two classes.

i.e. prob <= 0.5 ==> 0 (false/no)
prob > 0.5 ==> 1 (true/yes)
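
A small sketch of the prediction side only (no training), assuming the weights and bias have already been learned. The function names and the example weights are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Linear combination of the features passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(weights, bias, features, threshold=0.5):
    """Apply the decision threshold: prob > threshold gives 1, otherwise 0."""
    return 1 if predict_proba(weights, bias, features) > threshold else 0


# Hypothetical learned parameters for a one-feature model.
print(predict_proba([2.0], -1.0, [1.0]), predict_class([2.0], -1.0, [1.0]))
```

The 0.5 threshold is a default, not a requirement; it can be moved when false positives and false negatives have different costs.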

10
Q

What loss functions does Logistic Regression use?

A

It uses cross-entropy loss (also called log loss).

The reason is that cross entropy measures the distance between the predicted probability distribution and the actual distribution, i.e. it quantifies how well the predicted probabilities match the true labels.

Cross entropy also produces more useful gradients. Combined with a sigmoid, MSE produces gradients that shrink toward zero when the model is confidently wrong, whereas the cross-entropy gradient stays proportional to the difference between the predicted probability and the true label.
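
For reference, binary cross entropy can be sketched as below. The clipping constant `eps` is an assumption added to avoid taking log(0); it is not part of the mathematical definition.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average of -[y*log(p) + (1 - y)*log(1 - p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A confident correct prediction is penalized far less than a confident wrong one.
print(binary_cross_entropy([1], [0.9]), binary_cross_entropy([1], [0.1]))
```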

11
Q

What are decision trees?

A

Decision Trees are a type of ML model that uses a tree-like structure to make predictions based on input features.

Each internal node represents a test on a feature or attribute, each branch represents a possible outcome of that test, and each leaf holds a prediction.

The algorithm works recursively, splitting the data into subsets based on the input features. Splitting continues until a stopping criterion (e.g. max depth, min instances per node) is met.
At each node, the algorithm chooses the feature that best separates the data, based on a measure such as Gini impurity or information gain.

Once the tree is built, it predicts the class of a new instance by traversing the tree from the root to a leaf.
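
To make the splitting criterion concrete, here is a sketch of Gini impurity and of choosing a threshold on a single numeric feature. It covers one split only, not the full recursive tree, and the function names and the midpoint-candidate strategy are assumptions for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Try midpoints between sorted unique values; return (threshold, weighted Gini)."""
    best = (None, float("inf"))
    unique = sorted(set(values))
    for lo, hi in zip(unique, unique[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        # Weight each side's impurity by its share of the data.
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Toy feature values and labels made up for illustration.
print(best_threshold([1, 2, 8, 9], [0, 0, 1, 1]))
```

A full tree builder would repeat this search over every feature at each node and recurse on the two resulting subsets until a stopping criterion is hit.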

12
Q

What are random forests?

A

Definition: A random forest is an ensemble learning method that combines multiple decision trees to improve accuracy and reduce overfitting in both classification and regression tasks.

How It Works:
Bagging: Multiple decision trees are trained on different random subsets of the training data (bootstrap samples, drawn with replacement).
Random Feature Selection: Each tree considers a random subset of features when making splits, ensuring diversity among the trees.
Voting/Averaging: For classification, the forest’s final prediction is the majority vote of all trees; for regression, it’s the average of all tree predictions.

13
Q

How does the random forest classifier work?

A

  1. Select a random sample of data from the dataset (a bootstrap sample). It is used to train a single decision tree.
  2. Randomly select a subset of features from the dataset. The number of features to select is a hyperparameter set by the user.
  3. Use the selected features to train the decision tree on the bootstrap sample.
  4. Repeat steps 1-3 to build more trees, each on a different bootstrap sample and feature subset.
  5. To obtain a prediction, run the data point through all trained trees and take the majority vote. For regression, take the average.
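
The randomization and aggregation pieces of this procedure can be sketched as small helpers. Training the individual trees is deliberately left out, and all function names here are hypothetical, so this shows the mechanics around the trees rather than a complete random forest.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (some rows repeat, some are left out)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Classification: the most common predicted class wins."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: the mean of the individual tree predictions."""
    return sum(predictions) / len(predictions)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(bootstrap_sample(list(range(10)), rng), random_feature_subset(5, 2, rng))
print(majority_vote([1, 0, 1]), average([1.0, 2.0, 3.0]))
```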

14
Q

What is Unsupervised Machine Learning?

A

The data has no labels. The model must learn structure from the input data alone, with no target information.

15
Q

What are some examples of unsupervised ML algorithms?

A

K-means Clustering and PCA (Principal Components Analysis)

16
Q

What is the K-means Clustering Algorithm?

A

A clustering algorithm (not a classification algorithm, since it uses no labels) used to group data into k clusters based on their similarity.

  1. Initialization: Select K random centroids as the initial cluster centers.
  2. Assignment Step: For each data point, assign it to the nearest centroid based on the Euclidean distance between the data point and the centroid.
  3. Update Step: Compute the mean of all data points assigned to each cluster. This mean becomes the new centroid for that cluster.
  4. Convergence Check: Repeat steps 2 and 3 until the centroids no longer change (i.e., they converge) or a predefined maximum number of iterations is reached.
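
A compact sketch of the algorithm in plain Python, assuming the points are tuples of floats. Picking the initial centroids from the data points, the fixed seed, and keeping the old centroid when a cluster ends up empty are choices made for this example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, clusters) after running K-means on the points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialization: k random data points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep old centroid
        # Convergence check: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Two well-separated toy groups, made up for illustration.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(pts, 2)[0])
```

K-means is sensitive to the initial centroids on less cleanly separated data, which is why it is commonly run several times with different seeds.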

17
Q

What is Principal Components Analysis?

A

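It combines the original features into a smaller set of new features (principal components) to decrease dimensionality. The new features are linear combinations of the originals, are uncorrelated with each other, and are ordered from most important to least important by the amount of variance they explain.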
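
As a rough sketch of the idea, the code below finds only the first principal component by power iteration on the covariance matrix. Power iteration is a technique swapped in here to keep the example dependency-free (libraries typically use an eigendecomposition or SVD), and the function name and toy data are assumptions.

```python
import math


def first_principal_component(rows, iters=200):
    """Return the unit vector along which the centered data varies the most."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the data is not constant and the start vector is not
    # orthogonal to the top component.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy points lying exactly on the line y = 2x, made up for illustration.
print(first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)]))
```

Projecting each centered point onto this vector gives a one-dimensional feature that retains the most variance a single dimension can.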

18
Q

Explain K-Fold Cross Validation

A

Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.

For k-fold cross validation, the general procedure is as follows:

  1. Shuffle the dataset randomly
  2. Split the dataset into k groups
  3. For each unique group
    a. Take the group as a hold out or test set
    b. Take the remaining groups as the training set
    c. Fit a model on the training set and evaluate it on the test set
    d. Retain the evaluation score and discard the model
  4. Summarize the skill of the model using the sample of model evaluation scores
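
The procedure can be sketched as an index generator plus a small driver. The `fit_and_score` callback is a hypothetical placeholder for "fit a model on the training indices and return its score on the test indices"; the names and the fixed seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle the dataset randomly
    folds = [indices[i::k] for i in range(k)]  # split into k groups
    for i in range(k):
        test_idx = folds[i]  # this group is the hold-out set
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, k, fit_and_score):
    """Return (mean score, list of per-fold scores)."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores


for train, test in k_fold_indices(10, 3):
    print(sorted(test), len(train))
```

Every sample lands in the test set exactly once, so each score is computed on data the corresponding model never saw during fitting.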

19
Q

Explain Hyperparameter tuning

A

The problem of choosing a set of optimal hyperparameters for a learning algorithm.

Hyperparameters → parameters that are not learned during training, but need to be set beforehand

Example: when training a neural network, the hyperparameters include the number of layers, the number of neurons per layer, the learning rate, etc.

20
Q

What are two methods of hyperparameter tuning?

A

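Grid Search: Tries out all possible combinations of hyperparameters in a predefined grid.

Guaranteed to find the best combination within the grid (values outside the grid are never tried)
Computationally expensive, since the number of combinations grows multiplicatively with each added hyperparameter

Random Search: Randomly samples combinations of hyperparameters from the hyperparameter space

Less computationally expensive because it explores only a random subset of the hyperparameter space
Often performs well in practice, especially when only a few hyperparameters strongly affect the result
Finds good solutions, but with no guarantee that they are optimal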
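
Both methods can be sketched over the same search space. Here `toy_score` is a made-up stand-in for "train a model with these hyperparameters and return its validation score", and the grid values are illustrative assumptions.

```python
import itertools
import random


def grid_search(grid, score_fn):
    """Score every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(grid, score_fn, n_trials, seed=0):
    """Score n_trials randomly sampled combinations; return (best_params, best_score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at learning_rate=0.1, depth=3."""
    return -(params["learning_rate"] - 0.1) ** 2 - (params["depth"] - 3) ** 2


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
print(grid_search(grid, toy_score))
print(random_search(grid, toy_score, n_trials=4))
```

Grid search evaluates all 9 combinations here, while random search evaluates only 4, which is the trade-off between cost and coverage described above.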