Machine Learning Concepts Flashcards by John Merwin

What is a Hidden Markov Model?

It is a statistical model for sequential data in which the underlying states are not directly observable (they are hidden) and can only be inferred from a sequence of observations.

What is emission probability?

It is the probability of emitting a particular observation given a hidden state.

What is transition probability?

It is the probability of transitioning from one hidden state to another.
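
A minimal sketch of how transition and emission probabilities work together, assuming a toy two-state weather model. The state names, observation names and every number below are hypothetical, chosen only for illustration. The function uses the forward algorithm to compute the probability of an observation sequence by summing over all hidden state paths.

```python
# Toy HMM: all states, observations and probabilities are hypothetical.
states = ["Rainy", "Sunny"]
start_prob = {"Rainy": 0.6, "Sunny": 0.4}
transition_prob = {  # P(next hidden state | current hidden state)
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission_prob = {  # P(observation | hidden state)
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def sequence_likelihood(observations):
    """Forward algorithm: P(observations), summed over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_prob[s] * emission_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emission_prob[s][obs]
            * sum(alpha[prev] * transition_prob[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


print(sequence_likelihood(["walk", "shop", "clean"]))
```

The hidden states never appear in the input: only the observations are passed in, and the states are marginalized out.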

What is the kernel trick?

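It is a way of implicitly mapping data that is not linearly separable into a higher-dimensional space: a kernel function computes the inner products in that space directly, so the transformation itself is never computed explicitly.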

Why do we use the kernel trick?

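To learn non-linear decision boundaries efficiently: the model fits a linear boundary in the higher-dimensional space without paying the cost of computing the mapping. It is most commonly used with SVMs.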

What are some of the kernels used with the kernel trick?

Linear
Polynomial
Radial Basis Function (RBF)
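
A small sketch of why the trick works, assuming 2-D inputs and a degree-2 polynomial kernel without a constant term (a simplification chosen for illustration). The kernel value computed in the original space equals the dot product of the explicitly mapped vectors, so the mapping never has to be built. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(x, z):
    return dot(x, z)


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (no constant term): K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def explicit_map(x):
    """The 3-D feature map that poly2_kernel corresponds to for 2-D inputs."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))                      # computed in 2-D
print(dot(explicit_map(x), explicit_map(z)))   # same value, computed in 3-D
```

For the RBF kernel the corresponding feature space is infinite-dimensional, which is why computing the kernel directly is the only practical option there.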

What is Stochastic Gradient Descent?

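It is an iterative optimization algorithm that updates the parameters (weights and biases) to minimize the cost function. At each step it estimates the gradient from a single randomly chosen training example (or a small mini-batch) rather than from the full dataset.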
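
A minimal sketch of SGD, assuming a one-feature linear model and squared error on noise-free toy data. The data, learning rate and epoch count are illustrative assumptions, not recommended settings.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by SGD on squared error, one example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradients of error**2 with respect to w and b for this one example
            w -= learning_rate * 2 * error * xs[i]
            b -= learning_rate * 2 * error
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x + 1 for x in xs]  # toy data generated from y = 2x + 1
w, b = sgd_linear_fit(xs, ys)
print(w, b)  # should be close to 2 and 1
```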

What is an RNN?

It is a type of neural network designed to process sequential data by maintaining a hidden state that captures information from previous time steps.

What are some of the variants of RNNs?

GRUs (Gated Recurrent Units)
LSTMs (Long Short-Term Memory)

What is a cost function?

It is also referred to as a loss function. It measures how far the model's predictions are from the actual values; the lower the cost, the better the model fits the data.

Can you name some of the cost functions?

For regression problems we use MSE or RMSE; for classification problems we use binary cross-entropy (two classes) or categorical cross-entropy (more than two classes).
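
The four cost functions named above can be sketched in plain Python as follows. The `eps` clipping is an assumption added here to avoid taking log(0); libraries handle this in their own ways.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true holds 0/1 labels, p_pred holds predicted P(label = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true_one_hot, p_pred, eps=1e-12):
    """Each row of y_true_one_hot is one-hot; each row of p_pred sums to 1."""
    total = 0.0
    for t_row, p_row in zip(y_true_one_hot, p_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true_one_hot)


print(mse([3, 5], [2, 7]), rmse([3, 5], [2, 7]))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))
```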

What is regularization?

It is a technique used to prevent overfitting a model to the data.

Can you name some regularization techniques?

Lasso or L1 regularization
Ridge or L2 regularization
Dropout regularization in neural networks.
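
A rough sketch of the three techniques, assuming the L1 and L2 terms are simply added to the training loss and that dropout uses the common "inverted" form, which rescales the surviving activations. The function names and the `lam` and `rate` parameters are illustrative.

```python
import random


def l1_penalty(weights, lam):
    """Lasso term: lam * sum of absolute weights (pushes weights to exactly 0)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge term: lam * sum of squared weights (shrinks weights toward 0)."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`,
    and rescale the survivors so the expected value is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0, 3.0]
print(l1_penalty(weights, lam=0.1), l2_penalty(weights, lam=0.1))
print(dropout([1.0] * 8, rate=0.5, rng=random.Random(0)))
```

Dropout is applied only during training; at inference time the activations are passed through unchanged.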

What is overfitting?

It occurs when the model fits the training data too closely, learning its noise as well as its underlying patterns, which results in near-perfect predictions on the training data.

When a model is overfitted, it may not generalize well to unseen or test data.

What is backpropagation?

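Also known as backpropagation of errors. It is an algorithm used in ANNs to calculate the gradients of the error with respect to the network’s parameters by applying the chain rule backward from the output layer.
An optimizer such as gradient descent then uses these gradients to adjust the parameter values (weights and biases) so as to minimize the error.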
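
A minimal sketch of the chain rule at work, assuming a single sigmoid neuron with squared error. The input, target and starting parameters are hypothetical. The analytic gradient is compared against a finite-difference estimate, and one gradient descent step is then taken.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron."""
    return (sigmoid(w * x + b) - y) ** 2


def gradients(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw (and likewise for b)."""
    a = sigmoid(w * x + b)
    dloss_da = 2 * (a - y)
    da_dz = a * (1 - a)  # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0
    return dloss_da * da_dz * dz_dw, dloss_da * da_dz * dz_db


w, b, x, y = 0.5, -0.2, 1.5, 1.0  # hypothetical values
grad_w, grad_b = gradients(w, b, x, y)

# Sanity check against a central finite difference
h = 1e-6
numeric_grad_w = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)

# One gradient descent step using the backpropagated gradients
new_w, new_b = w - 0.1 * grad_w, b - 0.1 * grad_b
print(loss(w, b, x, y), loss(new_w, new_b, x, y))
```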

What is Principal Component Analysis (PCA)?

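It is a dimensionality reduction technique that projects a high-dimensional dataset onto a smaller number of new axes (the principal components) while preserving as much of its variance as possible.
The first few components capture the most significant patterns in the data.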

What is recall?

It is the ratio of correctly predicted positive instances to all actual positive instances.

Recall = TP/(TP + FN)
It is also called sensitivity or the true positive rate (TPR).

What is specificity?

It is the ratio of correctly predicted negative instances to all actual negative instances.

Specificity = TN/(TN + FP)

What is False Positive Rate?

FPR = 1 - Specificity = FP/(FP + TN)

It measures the rate at which negative instances are incorrectly classified as positive.

What is precision?

It is the ratio of correctly predicted positive instances to all instances predicted as positive.

Precision = TP/(TP + FP)
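
The recall, specificity, FPR and precision formulas can be checked with a few lines of Python. The counts below are hypothetical.

```python
def recall(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def false_positive_rate(tn, fp):
    return fp / (fp + tn)


def precision(tp, fp):
    return tp / (tp + fp)


# Hypothetical counts for a binary classifier
tp, fn, tn, fp = 40, 10, 45, 5

print("recall     ", recall(tp, fn))
print("specificity", specificity(tn, fp))
print("FPR        ", false_positive_rate(tn, fp))
print("precision  ", precision(tp, fp))
```

Recall and precision share the same numerator (TP) but answer different questions: recall is relative to the actual positives, precision to the predicted positives.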

What is ROC (Receiver Operating Characteristic) and its significance?

It shows the performance of a classification model at all classification thresholds.

The ROC curve is plotted with the TPR on the y-axis and the FPR on the x-axis.

What is AUC and its significance?

AUC is the area under the ROC curve. It is a commonly used metric to quantify the performance of a binary classification model.
The larger the AUC, the better the model performs: 1.0 corresponds to a perfect classifier and 0.5 to random guessing.
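
A small sketch of how an ROC curve and its AUC can be computed by hand, assuming binary 0/1 labels and one score per example. Each distinct score is used as a threshold, and the area is estimated with the trapezoidal rule. The labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, starting from (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores)
print(points)
print(auc(points))  # 0.75
```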

What is a CNN?

It is a type of deep learning neural network designed specifically for processing and analyzing visual data, like images or videos.

The heart of a CNN is the filter, also called a kernel, which does the work of feature detection.

What is a Confusion Matrix?

It is a table of predicted versus actual classes that shows where the model is making mistakes and how well it’s performing.

It is used to calculate recall, precision, accuracy, and F1-score.
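
A minimal sketch of building the four cells of a binary confusion matrix from label lists and deriving accuracy and F1-score from them. The labels are hypothetical, with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("            predicted 1  predicted 0")
print(f"actual 1    {tp:^11}  {fn:^11}")
print(f"actual 0    {fp:^11}  {tn:^11}")
print(accuracy(tp, fp, tn, fn), f1_score(tp, fp, fn))
```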

What are the key layers in a CNN?

Convolutional layer - filters convolve across the image to detect features
Pooling layer - reduces dimensions, improves computational speed
Fully connected layer - combines the high-level features and outputs class scores

What are the different types of pooling in a CNN?

Max pooling
Average pooling
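
A plain-Python sketch of what the convolutional and pooling layers compute, assuming a single-channel image stored as a list of lists, no padding and a stride of 1 for the convolution. As in most deep learning libraries, the "convolution" here is technically a cross-correlation (the kernel is not flipped). The image and kernel values are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def pool2d(feature_map, size=2, reduce=max):
    """Non-overlapping pooling; pass reduce=max or a mean function."""
    return [
        [
            reduce([feature_map[i + a][j + b] for a in range(size) for b in range(size)])
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
diagonal_edge_kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter

print(convolve2d(image, diagonal_edge_kernel))  # 3x3 feature map
print(pool2d(image, size=2, reduce=max))        # max pooling
print(pool2d(image, size=2, reduce=mean))       # average pooling
```

Pooling halves each spatial dimension here (4x4 to 2x2), which is where the reduction in computation comes from.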

What are the different types of layers in a neural network?
Hint: Not input, hidden and output layers

Dense
Recurrent
Convolutional

What is the function of weights in a neural network?

They represent the strength of connections between neurons in different layers of the network. They capture the features and patterns in the data.

What is the function of biases in a neural network?

They provide neurons with some flexibility by shifting the input to the activation function. With a bias, a neuron can still be activated even when the weighted sum of its inputs is 0.

What is an activation function?

It is a function that transforms the output of a neuron and in doing so it introduces non-linearity in the model.

Can you name a few activation functions?

Sigmoid (0 to 1)
ReLU max(0, x)
Leaky ReLU
Tanh (-1 to +1)
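
The four activation functions can be written directly from their definitions. The leaky ReLU slope of 0.01 is a common default, assumed here for illustration.

```python
import math


def sigmoid(x):
    """Squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """max(0, x): passes positives through, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but negatives keep a small slope instead of becoming 0."""
    return x if x > 0 else negative_slope * x


def tanh(x):
    """Squashes any input into the range (-1, 1)."""
    return math.tanh(x)


for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), relu(value), leaky_relu(value), tanh(value))
```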

What is cross-validation in machine learning?

It is a technique used to assess the performance and generalization of a model. It divides the dataset into subsets (folds), repeatedly trains the model on some folds and tests it on the held-out fold, and aggregates the results for a more reliable performance estimate.
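
A minimal sketch of k-fold cross-validation, assuming contiguous folds with no shuffling (in practice the data is usually shuffled first). The `train_and_score` callback and the mean-predictor "model" are hypothetical stand-ins for a real training routine.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(values, k, train_and_score):
    """Average the score returned by train_and_score over all k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(values), k):
        train = [values[i] for i in train_idx]
        validation = [values[i] for i in val_idx]
        scores.append(train_and_score(train, validation))
    return sum(scores) / len(scores)


def mean_predictor_mse(train, validation):
    """Hypothetical 'model': predict the training mean, score by MSE."""
    prediction = sum(train) / len(train)
    return sum((v - prediction) ** 2 for v in validation) / len(validation)


data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(cross_validate(data, k=3, train_and_score=mean_predictor_mse))
```

Every example is used for validation exactly once, which is what makes the aggregated estimate more reliable than a single train/test split.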

What is scaling in ML?

It is the process of normalizing the feature values of the dataset so that they fall within a similar range or scale.

Why do we scale the data in ML?

Equal weight - all features contribute equally to training the model
Interpretability
Improved convergence - faster convergence, especially for gradient-based models
Regularization - applied uniformly across all features

Name some commonly used scaling classes in ML

MinMaxScaler
StandardScaler

What's the difference between a MinMaxScaler and a StandardScaler?

MinMaxScaler scales features to a specific range, typically [0, 1] or another user-defined range. StandardScaler scales features to have a mean of 0 and a standard deviation of 1.
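
The two transformations can be sketched for a single feature in plain Python. These are illustrative re-implementations of the arithmetic, not the scikit-learn classes themselves, and they assume the feature is not constant (otherwise the denominators are 0).

```python
import math


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so the smallest value maps to new_min, the largest to new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


feature = [1, 2, 3, 4, 5]
print(min_max_scale(feature))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_scale(feature))  # mean 0, standard deviation 1
```

Min-max scaling is bounded but sensitive to outliers, since a single extreme value sets the range; standardization is unbounded but less distorted by one extreme point.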

What is a tensor?

It is a mathematical object that generalizes the concepts of scalars, vectors and matrices to higher dimensions.

What does a Gradient Boosting Classifier do?

A GBC is a machine learning technique that builds an ensemble of decision trees sequentially, each correcting the errors of the previous ones, resulting in a highly accurate and powerful classification model. It's particularly effective for complex tasks and often outperforms single decision trees.

What does a Random Forest Classifier (RFC) do?

An ensemble machine learning model that combines the predictions of multiple decision trees, each trained on a random sample of the data and features, typically by majority vote, providing high accuracy and robustness for classification tasks.

What's the difference between a Decision Tree Classifier and RFC?

A DTC is a single tree-based model, while an RFC is an ensemble of multiple decision trees, which improves accuracy and reduces overfitting by combining their predictions.

What is entropy? What is the formula?

It is a measure of disorder or impurity in a dataset.

Entropy = -P(y)*log2(P(y)) - P(n)*log2(P(n))
P(y) = prob. of 'Yes'
P(n) = prob. of 'No'
log2 is the logarithm to base 2

What is the formula for Gini impurity?

Gini = 1 - (P(y)^2 + P(n)^2)
P(y) = prob. of 'Yes'
P(n) = prob. of 'No'
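
Both impurity measures can be evaluated for a two-class ('Yes'/'No') split with a few lines of Python. The convention that 0*log2(0) counts as 0 is assumed, which is standard for entropy.

```python
import math


def entropy(p_yes):
    """Binary entropy in bits; terms with probability 0 contribute 0."""
    p_no = 1.0 - p_yes
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)


def gini(p_yes):
    """Binary Gini impurity."""
    p_no = 1.0 - p_yes
    return 1.0 - (p_yes ** 2 + p_no ** 2)


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, entropy(p), gini(p))
```

Both measures are 0 for a pure node and peak at a 50/50 split, where entropy is 1 and Gini impurity is 0.5.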

What is bias in ML?

Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simplified model.

What are the implications of high bias?

High bias ==> the model is less flexible, less complex and prone to underfitting. It does not capture the underlying data patterns.

What is variance in ML?

Variance refers to the error introduced by the model's sensitivity to noise and fluctuations in the specific training data, such as outliers. A high-variance model changes substantially when it is trained on a different sample.

What are the implications of high variance?

High variance ==> the model is very flexible, complex and prone to overfitting. It fits the training data too well, capturing noise along with the underlying patterns.

What is a Decision Stump?

It is a decision tree with a depth of 1.

What is Forward Propagation?

It refers to the process of computing the output of a neural network for a given input.
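
A minimal sketch of a forward pass, assuming a tiny network with two inputs, one hidden layer of two ReLU neurons and a single sigmoid output. All weights and biases are hypothetical values picked for illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One dense layer: activation(weighted sum + bias) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: one row of weights per neuron
W1 = [[0.5, -0.5], [0.25, 0.75]]
B1 = [0.0, -1.0]
W2 = [[1.0, -1.0]]
B2 = [0.0]


def forward(inputs):
    """Input -> hidden layer (ReLU) -> output layer (sigmoid)."""
    hidden = dense(inputs, W1, B1, relu)
    return dense(hidden, W2, B2, sigmoid)


print(forward([1.0, 2.0]))
```

Each layer's output becomes the next layer's input; nothing is learned in this pass, which only produces the prediction that the cost function then scores.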

Machine Learning Concepts Flashcards

(49 cards)