General ML Flashcards
Understand general ML algorithms to a proficient level.
What are supervised ML algorithms?
Supervised ML algorithms are ML algorithms that learn a mapping from inputs to outputs with aid, i.e. a labeled dataset.
They learn from labeled data, i.e. data with predefined input and output pairs, with the goal of predicting outputs for new, unseen inputs.
What are some examples of Supervised ML algorithms?
Anything that learns and trains from a labeled dataset. So regression algorithms such as Linear and Logistic Regression, as well as traditional ML algorithms such as Decision Trees and Random Forests.
What is Linear Regression?
Linear regression is a supervised ML algorithm that takes in labeled data (1D or multi-dimensional) and tries to build a “best fitting line” to describe the relationship between input and output variables.
What is the objective and equation of a linear regression algorithm? What does each variable mean?
The objective is to build the best-fitting linear equation to describe the relationship between the numerical input and output variables.
There can be up to n input variables, and one output variable.
The equation is quite simple:
y = B0 + B1*x
Where
B0 is the y intercept
B1 is the slope
x is the manipulated variable (independent variable)
y is your target variable (dependent variable)
In other words, the main objective is to find the B0 and B1 that best fit the data.
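For one input variable, the "best B0 and B1" have a closed-form least-squares solution. A minimal sketch with NumPy; the toy data and the known coefficients (2 and 3) are invented for illustration:

```python
import numpy as np

# Toy data with a known linear relationship: y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Closed-form least-squares estimates for the slope (B1) and intercept (B0)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)  # → 2.0 3.0
```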
What is Multiple Linear regression?
When there are multiple (n) feature variables mapping to one target.
y = b0 + b1*x_1 + b2*x_2 + b3*x_3 + … + bn*x_n
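The same fitting idea extends to n features. A sketch using NumPy's least-squares solver on synthetic, noise-free data; the coefficients (1, 2, -1, 0.5) are made up for the example:

```python
import numpy as np

# Three features with known coefficients: y = 1 + 2*x1 - 1*x2 + 0.5*x3
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5])

# Prepend a column of ones so b0 is estimated alongside b1..b3
X1 = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(np.round(coeffs, 2))  # → [ 1.   2.  -1.   0.5]
```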
How do we find the best fitting line? Mathematically?
You find it by minimizing the residual sum of squares.
The best fitting line minimizes the RSS. In mathematical terms, we minimize the MSE (the RSS averaged over the n points), so our cost function is defined as:
MSE = (1/n) * ∑ (y_i - ŷ_i)^2
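The cost function above takes only a line of code to compute; the toy numbers are chosen purely to illustrate:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# MSE = (1/n) * sum((y_i - yhat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0 + 1) / 3 ≈ 0.4167
```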
What is Gradient Descent?
It is an optimization algorithm used in ML to minimize the loss function by finding the best model parameters.
The algorithm iteratively adjusts the model's parameters to find the values that minimize the loss.
Explain the process of gradient descent.
In the context of 1D linear regression, assume there are only 2 parameters to learn: b_0 and b_1.
- Initialize the parameters (b_0 and b_1) to 0 or to random values
- Compute the target predictions using the model's objective function (y = b_0 + b_1*x)
- Compute the loss
a. For each predicted data point, calculate the squared error
b. MSE = (1/n) * ∑ (y_i - ŷ_i)^2
- Compute the gradients
a. Say the calculated partial derivatives for b_0 and b_1 are W and Z respectively
- Update the parameters
b_0 = b_0 - a*W
b_1 = b_1 - a*Z
where a is the step size (learning rate)
- Iterate until the loss is small, or the max number of iterations is reached
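The steps above can be sketched for 1D linear regression as follows; the toy data, learning rate, and iteration count are arbitrary choices for illustration, not prescribed values:

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=2000):
    """Fit y ≈ b0 + b1*x by gradient descent on the MSE loss."""
    b0, b1 = 0.0, 0.0                     # step 1: initialize parameters
    n = len(x)
    for _ in range(n_iters):
        y_hat = b0 + b1 * x               # step 2: predictions
        error = y - y_hat                 # used by the loss and the gradients
        # step 4: partial derivatives of the MSE w.r.t. b0 (W) and b1 (Z)
        W = (-2.0 / n) * np.sum(error)
        Z = (-2.0 / n) * np.sum(x * error)
        # step 5: update parameters with step size lr (the "a" above)
        b0 -= lr * W
        b1 -= lr * Z
    return b0, b1

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x                          # true parameters: b0=2, b1=3
b0, b1 = gradient_descent(x, y)
print(round(b0, 2), round(b1, 2))          # converges near 2.0 and 3.0
```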
What is Logistic Regression?
Supervised ML algorithm that is used for binary classification, where the goal is to pick one of two possible outcomes.
It takes input features and combines them into a linear equation (same as linear regression); however, instead of predicting the value of the target directly, it predicts the probability of the target by mapping the output of the linear equation through a sigmoid function, which squashes it to a probability between 0 and 1.
A decision boundary (threshold) maps the probability to a class label, e.g.:
prob <= 0.5 ==> 0 (false/no)
prob > 0.5 ==> 1 (true/yes)
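The linear-equation → sigmoid → threshold pipeline is short enough to sketch directly; the parameter values b0 and b1 here are hypothetical, standing in for learned weights:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued linear output to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, b0, b1, threshold=0.5):
    """Linear combination -> sigmoid -> probability -> class label."""
    prob = sigmoid(b0 + b1 * x)
    return int(prob > threshold), prob

# Hypothetical learned parameters, for illustration only
label, prob = predict(x=2.0, b0=-3.0, b1=2.0)
print(label, round(prob, 3))  # z = 1.0 → prob ≈ 0.731 → class 1
```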
What loss functions does Logistic Regression use?
It uses cross entropy loss
The reason for this is that cross entropy measures the distance between the predicted probability distribution and the actual distribution, i.e. it quantifies how well the predicted probabilities match the true labels.
Cross entropy also yields more effective gradients: a loss like MSE produces gradients that are less sensitive to the differences between predicted probabilities and true labels, which slows learning when predictions are confidently wrong.
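Binary cross entropy can be written in a few lines; the probabilities below are invented to show that confident, correct predictions incur a lower loss than poor ones:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood of true labels under predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1.0, 0.0, 1.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8]))  # close to labels
bad = binary_cross_entropy(y_true, np.array([0.4, 0.6, 0.3]))   # far from labels
print(good < bad)  # better-matched probabilities → lower loss
```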
What are decision trees?
Decision Trees are a type of ML model that uses a tree-like structure to make predictions based on input features.
Each internal node represents a test on a feature or attribute, each branch represents a possible value of that feature, and each leaf represents a prediction.
The algorithm works recursively, splitting the data into subsets based on the input features. Splitting continues until a stopping criterion is met (e.g. maximum depth, minimum number of instances per node).
At each node, the algorithm chooses the feature that best separates the data, based on Gini impurity or information gain.
Once the tree is built, it predicts the class of a new instance by traversing the tree from root to leaf.
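Gini impurity, one of the splitting criteria mentioned above, is simple to compute: 1 minus the sum of squared class proportions. A small sketch with made-up label lists:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))  # pure node → 0.0 (nothing to split)
print(gini_impurity([0, 0, 1, 1]))  # 50/50 mix → 0.5 (maximally impure for 2 classes)
```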
What are random forests?
Definition: A random forest is an ensemble learning method that combines multiple decision trees to improve accuracy and reduce overfitting in both classification and regression tasks.
How It Works:
Bagging: Multiple decision trees are trained on different random subsets of the training data.
Random Feature Selection: Each tree considers a random subset of features when making splits, ensuring diversity among the trees.
Voting/Averaging: For classification, the forest’s final prediction is the majority vote of all trees; for regression, it’s the average of all tree predictions.
How does the random forest classifier work?
1. Select a random sample of data from the dataset (a bootstrap sample). It is used to train a single decision tree.
2. Randomly select a subset of features from the dataset. The number of features to select is a hyperparameter set by the user.
3. Use the selected features to train the decision tree on the bootstrap sample.
4. Repeat steps 1-3, each time on a different bootstrap sample and feature subset.
5. To obtain a prediction, run the data point through all trained trees and take the majority vote. For regression, take the average.
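Two of the pieces above, bootstrap sampling and majority voting, can be sketched without a full tree implementation; the data and the "tree predictions" are invented stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X, y):
    """Step 1: draw rows with replacement to build a bootstrap sample."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

def majority_vote(predictions):
    """Final step: each tree casts a vote; the most common class wins."""
    values, counts = np.unique(predictions, return_counts=True)
    return values[np.argmax(counts)]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 1, 1, 1])
Xb, yb = bootstrap_sample(X, y)    # same size as X, rows drawn with replacement

# Pretend five already-trained trees predicted these classes for one point
tree_preds = np.array([1, 0, 1, 1, 0])
print(majority_vote(tree_preds))   # → 1
```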
What is Unsupervised Machine Learning?
There are no labels on the data. The model must learn structure from the input data only, with no target information.
What are some examples of unsupervised ML algorithms?
K-means Clustering and PCA (Principal Component Analysis)
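As a concrete example of learning with no labels, here is a minimal k-means sketch; the two well-separated point blobs and the choice of k=2 are invented for illustration:

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # initialize centroids as k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two well-separated blobs; the algorithm never sees any labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)  # the first four points share one label, the last four the other
```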