Machine Learning Concepts Flashcards
What is a Hidden Markov Model?
It is a statistical model for sequential data in which the underlying states are not directly observable (they are hidden) and can only be inferred from a sequence of observations.
What is emission probability?
It is the probability of emitting a particular observation given a hidden state.
What is transition probability?
It is the probability of transitioning from one hidden state to another.
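To see how transition and emission probabilities work together, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence by summing over every possible hidden-state path. The two-state weather model, its state names, and all of its probabilities are made up for illustration.

```python
# Hypothetical two-state HMM; every number below is an illustrative assumption.
states = ["Rainy", "Sunny"]
start_prob = {"Rainy": 0.6, "Sunny": 0.4}
transition_prob = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emission_prob = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def forward(observations):
    """Return P(observations), summed over all hidden-state paths."""
    # alpha[s] = P(observations seen so far, current hidden state is s)
    alpha = {s: start_prob[s] * emission_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emission_prob[s][obs]
            * sum(alpha[prev] * transition_prob[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


likelihood = forward(["walk", "shop", "clean"])
print(likelihood)
```

The hidden states never appear in the input: only the observations do, and the model marginalizes over the states.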
What is the kernel trick?
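It is a way of working with non-linearly separable data as if it had been mapped to a higher-dimensional space: a kernel function computes the inner products in that space directly, so the mapping itself is never computed explicitly.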
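A small check can make the "without transforming it explicitly" part concrete. The sketch below assumes a degree-2 polynomial kernel with no constant term on 2-D inputs; the function names are illustrative. The kernel value computed in the original 2-D space matches the inner product computed after an explicit map to 3-D.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def explicit_feature_map(v):
    # Explicit 3-D feature map that corresponds to the kernel (x . y)^2 in 2-D.
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(a, b):
    # Same quantity, computed without ever building the 3-D vectors.
    return dot(a, b) ** 2


x = (1.0, 2.0)
y = (3.0, 0.5)
explicit = dot(explicit_feature_map(x), explicit_feature_map(y))
implicit = poly2_kernel(x, y)
print(explicit, implicit)
```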
Why do we use the kernel trick?
To let a linear model learn non-linear decision boundaries without paying the cost of explicitly computing high-dimensional features. It is most commonly used with SVMs.
What are some of the kernels used with the kernel trick?
Linear
Polynomial
Radial Basis Function (RBF)
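The three kernels listed above can be sketched as plain functions. The parameter names (degree, coef0, gamma) and their default values are assumptions chosen for illustration, not values prescribed by these cards.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, degree=3, coef0=1.0):
    return (dot(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=0.5):
    # Depends only on the squared Euclidean distance between the two points.
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


a = (1.0, 2.0)
b = (3.0, 4.0)
print(linear_kernel(a, b), polynomial_kernel(a, b, degree=2), rbf_kernel(a, b))
```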
What is Stochastic Gradient Descent?
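It is an optimization algorithm that updates the parameters (weights & biases) to minimize the cost function, using a gradient estimated from a single randomly chosen sample or a small mini-batch at each step rather than from the full dataset.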
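As a minimal sketch of the idea, the code below fits a line with one parameter update per sample. The toy data (generated from y = 2x + 1), the learning rate, the epoch count, and the function name are all assumptions made for this example.

```python
import random


def sgd_linear_regression(data, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w * x + b using one sample per update."""
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # the "stochastic" part: visit samples in random order
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of the per-sample loss 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Noise-free toy data from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = sgd_linear_regression(data)
print(w, b)
```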
What is an RNN?
It is a type of neural network designed to process sequential data by maintaining a hidden state that captures information from previous time steps.
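The hidden state can be illustrated with a toy recurrent cell that has a single scalar state. The weights w_x, w_h, and b are arbitrary assumed values, not trained parameters, and a real RNN would use vectors and matrices instead of scalars.

```python
import math


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Return the final hidden state after reading the whole sequence."""
    h = 0.0  # initial hidden state
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
    return h


state_a = run_rnn([1.0, 0.0])
state_b = run_rnn([0.0, 1.0])
print(state_a, state_b)
```

The two sequences contain the same values in a different order and end in different states, which is what lets the network use ordering information.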
What are some of the variants of RNNs?
GRUs (Gated Recurrent Units)
LSTMs (Long Short-Term Memory)
What is a cost function?
It is often also referred to as a loss function. It measures how far the model's predictions are from the true values, so a lower cost means the model fits the data better.
Can you name some of the cost functions?
For regression problems we use MSE (mean squared error) or RMSE (root mean squared error), and for classification problems we use binary cross-entropy (two classes) or categorical cross-entropy (more than two classes).
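The four cost functions named above can be sketched in a few lines each. The eps clipping value is an assumption added to avoid taking log(0); the function names are illustrative.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from 0 and 1
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true_onehot, p_pred, eps=1e-12):
    total = 0.0
    for target_row, prob_row in zip(y_true_onehot, p_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(target_row, prob_row))
    return total / len(y_true_onehot)


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```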
What is regularization?
It is a technique used to prevent a model from overfitting the training data, typically by constraining or penalizing model complexity.
Can you name some regularization techniques?
Lasso or L1 regularization
Ridge or L2 regularization
Dropout regularization in neural networks
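The three techniques above can be sketched as follows. The strength lam, the drop rate, the example weights, and the data_loss value are all hypothetical, and the dropout shown is the common "inverted" variant that rescales the surviving activations during training.

```python
import random


def l1_penalty(weights, lam):
    # Lasso: penalize the sum of absolute weight values.
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    # Ridge: penalize the sum of squared weight values.
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    # Zero each activation with probability `rate` (assumed < 1);
    # scale the survivors so the expected value is unchanged.
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0, 3.0]
data_loss = 0.25  # hypothetical unregularized loss
lasso_loss = data_loss + l1_penalty(weights, lam=0.1)
ridge_loss = data_loss + l2_penalty(weights, lam=0.1)
dropped = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0))
print(lasso_loss, ridge_loss, dropped)
```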
What is overfitting?
It occurs when the model fits the training data too closely, including its noise, and so produces near-perfect predictions on that training data.
An overfitted model may not generalize well to unseen or test data.
What is Back propagation?
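AKA - Backward propagation of errors. It is an algorithm used in ANNs to calculate the gradients of the error with respect to the network's parameters by applying the chain rule backward from the output layer.
An optimizer such as gradient descent then uses these gradients to adjust the parameter values (weights and biases) so as to minimize the error.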
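For a single sigmoid neuron with a squared-error loss, the gradient calculation reduces to one application of the chain rule. The sketch below assumes that tiny setup; the input, target, starting weights, and learning rate are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(w * x + b)


def loss(w, b, x, y):
    return 0.5 * (predict(w, b, x) - y) ** 2


def backward(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw (and likewise for b)."""
    a = predict(w, b, x)
    dloss_da = a - y
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    return dloss_da * da_dz * x, dloss_da * da_dz * 1.0


w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backward(w, b, x, y)

# The parameter update is a separate gradient descent step.
learning_rate = 0.1
new_w = w - learning_rate * grad_w
new_b = b - learning_rate * grad_b
print(loss(w, b, x, y), loss(new_w, new_b, x, y))
```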
What is Principal Component Analysis (PCA)?
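It is a dimensionality reduction technique that projects a high-dimensional dataset onto a smaller number of new axes (principal components) while preserving as much of its variance as possible.
The first components capture the directions along which the data varies the most, i.e. its most significant patterns.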
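A minimal 2-D sketch of the idea: center the data, build the covariance matrix, find its dominant direction, and project onto it. Power iteration is used here only to keep the example dependency-free, and it is an assumption of this sketch, as are the toy points and the starting vector. In practice a library eigendecomposition or SVD would be the usual choice.

```python
import math


def first_principal_component(points, iterations=100):
    """Return the top principal direction of 2-D points and the 1-D projections."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)

    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    # Assumes the data is not all one point and the start vector is not
    # exactly orthogonal to the top direction.
    vx, vy = 1.0, 0.5
    for _ in range(iterations):
        nx = cxx * vx + cxy * vy
        ny = cxy * vx + cyy * vy
        norm = math.hypot(nx, ny)
        vx, vy = nx / norm, ny / norm

    projections = [x * vx + y * vy for x, y in centered]
    return (vx, vy), projections


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
component, reduced = first_principal_component(points)
print(component, reduced)
```

Each 2-D point is reduced to a single coordinate along the direction of greatest variance.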
What is recall?
It is the ratio of correctly predicted positive instances to all actual positive instances.
Recall = TP/(TP + FN)
It is also called sensitivity or the true positive rate.
What is specificity?
It is the ratio of correctly predicted negative instances to all actual negative instances.
Specificity = TN/(TN + FP)
What is False Positive Rate?
FPR = FP/(FP + TN) = 1 - Specificity
It measures the rate at which negative instances are incorrectly classified as positive.
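The recall, specificity, and FPR formulas can be checked on a small example. The labels and predictions below are made up, and the helper assumes binary labels where 1 means positive and 0 means negative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fpr = fp / (fp + tn)
print(recall, specificity, fpr)
```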