Classification Flashcards
Learn the general concept of classification.
What should we do when the model is overfitting in cross-validation?
That usually means regularization is necessary. Either that takes the form of Lasso / Ridge regularization, or it simply means getting more data.
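As a minimal sketch of the first option (assuming scikit-learn and a synthetic dataset, neither of which comes from the flashcards), Ridge and Lasso can be compared against plain least squares under cross-validation:

    # Compare an unregularized linear model with Ridge and Lasso using 5-fold CV.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Ridge, Lasso
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

    for name, model in [("OLS", LinearRegression()),
                        ("Ridge", Ridge(alpha=1.0)),
                        ("Lasso", Lasso(alpha=1.0))]:
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(f"{name}: mean CV R^2 = {scores.mean():.3f}")

The alpha values here are illustrative; in practice they would be tuned, for example with cross-validation.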
Is it possible to predict more than two categories?
Yes, a classification task with more than two classes is called multi-class classification.
What is the definition of a metric?
Metrics are used to track and measure a model's performance (during training and testing). Loss functions, in contrast, are often differentiable in the model's parameters and are used to train the model (using some type of optimization such as gradient descent); a metric does not have to be differentiable.
How do we deal with missing data?
There are various ways to handle missing values. We can use the mean or median if it is a numerical column, and the mode if it is a categorical column. There are fancier methods like iterative imputers as well.
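A minimal sketch of simple statistical imputation, assuming scikit-learn and a toy DataFrame of my own invention:

    # Mean imputation for a numerical column, most-frequent (mode) imputation
    # for a categorical column.
    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                       "city": ["NY", "LA", np.nan, "NY"]})

    num_imputer = SimpleImputer(strategy="mean")          # or strategy="median"
    cat_imputer = SimpleImputer(strategy="most_frequent")

    df["age"] = num_imputer.fit_transform(df[["age"]]).ravel()
    df["city"] = cat_imputer.fit_transform(df[["city"]]).ravel()
    print(df)

Scikit-learn's IterativeImputer follows the same fit/transform pattern for the fancier approach mentioned above.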
How do you deal with unbalanced data in classification problems?
There are several ways to handle imbalance in the data. We can use resampling methods like oversampling or undersampling. We can also try methods like SMOTE or ADASYN.
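A minimal sketch of SMOTE oversampling, assuming the imbalanced-learn package and a synthetic imbalanced dataset:

    # Oversample the minority class with SMOTE and inspect class counts.
    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                               weights=[0.95, 0.05], random_state=0)
    print("before:", Counter(y))        # roughly 950 vs 50

    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("after: ", Counter(y_res))    # classes balanced with synthetic samples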
Does a categorical variable need normalization or standardization?
In general, a categorical variable will never have a normal distribution. Dichotomous variables are simply coded as 0/1 (or 1/2), so standardizing them would not make much sense.
How do we figure out the optimal threshold in a linear classifier?
The default threshold is 0.5; however, depending on the problem at hand, we can adjust it. For example, if correctly identifying one class is more important than correctly identifying the other, or if the two classes are imbalanced, we can adjust the threshold to meet those needs. When changing the threshold, there is a trade-off between precision and recall. The precision-recall curve can be used to determine an appropriate threshold.
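A minimal sketch of picking a threshold from the precision-recall curve, assuming scikit-learn, synthetic data, and an arbitrary target precision of 0.80:

    # Choose the lowest threshold that still reaches the target precision.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    probs = clf.predict_proba(X_te)[:, 1]

    precision, recall, thresholds = precision_recall_curve(y_te, probs)
    idx = np.argmax(precision[:-1] >= 0.80)      # first point meeting the target
    print("chosen threshold:", thresholds[idx])
    y_pred = (probs >= thresholds[idx]).astype(int)

Raising the target precision generally lowers recall, which is the trade-off described above.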
What is an ordinal variable?
An ordinal variable is a categorical variable with an ordered set of possible values. Ordinal variables fall somewhere between categorical and quantitative variables.
How do you adjust the threshold to reach the appropriate sensitivity if there are more than two categories?
We can employ a one-versus-all strategy, dividing the multi-class dataset into a set of binary classification problems and adjusting a threshold for each of them.
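A minimal sketch, assuming scikit-learn and the Iris dataset: wrapping a binary classifier in a one-vs-rest scheme exposes one score column per class, to which per-class thresholds could then be applied.

    # One-vs-rest wrapper around a binary classifier for a 3-class problem.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_iris(return_X_y=True)
    ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

    probs = ovr.predict_proba(X)     # one column of scores per class
    print(probs[:3].round(3))        # a per-class threshold can be applied per column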
What is gamma in machine learning?
Gamma is a hyperparameter (of kernel models such as RBF-kernel SVMs) that must be specified before training the model. Gamma determines the amount of curvature in the decision boundary: a higher gamma produces more curvature, while a low gamma produces less.
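A minimal sketch, assuming an RBF-kernel SVM from scikit-learn and a synthetic two-moons dataset, showing how different gamma values change model behaviour:

    # Low gamma -> smoother boundary; very high gamma -> boundary wraps tightly
    # around individual points and tends to overfit.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

    for gamma in [0.01, 1.0, 100.0]:
        scores = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5)
        print(f"gamma={gamma}: mean CV accuracy = {scores.mean():.3f}")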
Is the covariance matrix symmetric?
Any covariance matrix is symmetric and positive semi-definite, with the variances on the main diagonal (i.e., the covariance of each element with itself). For two-dimensional data, a single number is not enough: the full matrix is required to completely characterize the variation.
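A minimal sketch with NumPy (the two correlated variables are made up for illustration):

    # The sample covariance matrix is symmetric, with variances on the diagonal
    # and covariances off the diagonal.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = 0.5 * x + rng.normal(scale=0.3, size=500)

    cov = np.cov(np.vstack([x, y]))   # 2x2 covariance matrix
    print(cov)
    print("symmetric:", np.allclose(cov, cov.T))
    print("diagonal = variance of x:", np.isclose(cov[0, 0], x.var(ddof=1)))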
What is a maximum posterior hypothesis?
Maximum a Posteriori (MAP) is a Bayesian strategy for estimating the distribution and model parameters that best describe an observed dataset. MAP amounts to calculating the conditional probability of observing the data given a model, weighted by a prior probability or belief about the model.
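As a compact way to state this (standard notation, not taken from the flashcards), the MAP estimate maximizes the likelihood weighted by the prior:

    $\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \, P(D \mid \theta)\, P(\theta)$

where $P(D \mid \theta)$ is the likelihood of the observed data $D$ and $P(\theta)$ is the prior belief about the parameters $\theta$.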
What is the difference between the false positive and the false negative?
A false positive (also called a Type I error) occurs when a researcher concludes something is true when it is actually false; a false positive is sometimes called a "false alarm". A false negative (also called a Type II error) occurs when something is declared false when it is actually true.
What are one-vs-rest and one-vs-all?
One-vs-rest (OvR for short, also known as one-vs-all or OvA) is a heuristic technique for multi-class classification using binary classification methods. A binary classifier is trained on each binary classification task, and predictions are made using the most confident model.
What about the non-diagonal terms of the covariance?
In the covariance table, the off-diagonal values are different from zero. This indicates the presence of redundancy in the data; in other words, there is a certain amount of correlation between variables. This kind of matrix, with non-zero off-diagonal values, is called a "non-diagonal" matrix.
Why is 3.33% the misclassification rate?
If you were to simply predict "no default" for every observation in this dataset, you would only be wrong on the defaulters, who make up just 3.33% of the population, so the error rate would be only 3.33%.
Is there a method for fine-tuning the threshold?
We can experiment with different threshold values to see which one best separates the data. It varies from case to case, and precision and recall have an inverse relationship.
What is AUC-ROC?
The ROC is a probability curve (the true-positive rate plotted against the false-positive rate at different thresholds), and the AUC represents the degree or measure of separability. Together they indicate how well the model can distinguish between classes: the better the model predicts class 0 as 0 and class 1 as 1, the higher the AUC.
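A minimal sketch, assuming scikit-learn and synthetic data:

    # Compute points on the ROC curve and the area under it for a
    # probabilistic classifier evaluated on held-out data.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, probs)            # points on the ROC curve
    print("ROC-AUC:", roc_auc_score(y_te, probs))   # area under that curve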
How do you ensure that you are not overfitting a model?
Keep the model simpler so it fits less of the noise in the training data, use cross-validation techniques such as k-fold cross-validation, and use regularization techniques such as LASSO.
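A minimal sketch of the cross-validation point, assuming scikit-learn and synthetic data: a large gap between training accuracy and k-fold accuracy is the classic symptom of overfitting.

    # Compare training accuracy with 5-fold CV accuracy for an unconstrained tree.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print("training accuracy:", tree.score(X, y))                  # typically 1.0
    print("5-fold CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())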
Is it possible to use PCA?
We can use PCA, but we will lose the interpretability of the original variables.
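A minimal sketch, assuming scikit-learn and its breast-cancer dataset: the principal components are linear mixes of the original features, which is why per-feature interpretability is lost.

    # Standardize, then project 30 features onto 5 principal components.
    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)    # PCA is scale-sensitive

    pca = PCA(n_components=5).fit(X_scaled)
    print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
    X_reduced = pca.transform(X_scaled)             # shape: (n_samples, 5)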
What are Type I error and type II errors? When is a Type I error committed and how might you avoid committing a Type I error?
If your statistical test was significant even though the null hypothesis is actually true, you have committed a Type I error: you found a significant result merely due to chance. Choosing a smaller significance level (alpha) makes a Type I error less likely. The flip side is committing a Type II error: failing to reject a false null hypothesis.
How do you verify causation?
The best technique for finding causal relationships is to use randomized experiments. Once you have found a correlation, you can test for causation by running experiments in which you control the other variables and measure the difference. To determine causation with your product, you can apply hypothesis testing and A/B/n experiments.
What situation do you think where bootstrapping is not applicable?
There are several, mostly esoteric, conditions under which bootstrapping is not appropriate, such as when the population variance is infinite or when the population values are discontinuous at the median. There are also various conditions where tweaks to the bootstrapping process are necessary to adjust for bias.
How do you deal with high imbalanced data?
Approaches to dealing with an imbalanced dataset include: choosing proper evaluation metrics (accuracy, the total number of correct predictions divided by the total number of predictions, can be misleading here), resampling (oversampling and undersampling), SMOTE, BalancedBaggingClassifier, and threshold moving.
Is clustering an unsupervised learning method?
Clustering is an unsupervised method that works on datasets where neither the outcome (target) variable nor the relationship between the observations is known, i.e., unlabeled data.
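A minimal sketch, assuming scikit-learn and synthetic blobs: k-means groups unlabeled points without ever seeing a target variable.

    # The labels returned by k-means are discovered cluster indices, not classes.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # labels ignored
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])          # cluster index for the first ten points
    print(kmeans.cluster_centers_)      # coordinates of the three centroids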
What is regularization?
Regularization is a technique used to reduce error by penalizing model complexity when fitting the function to the given training set, which helps avoid overfitting. The commonly used regularization techniques are L1 regularization and L2 regularization.
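A minimal sketch of the difference in practice, assuming scikit-learn's logistic regression on synthetic data: L1 tends to zero out coefficients, while L2 only shrinks them.

    # Fit the same model with L1 and L2 penalties and count surviving coefficients.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                               random_state=0)

    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
    print("non-zero coefficients with L1:", np.count_nonzero(l1.coef_))
    print("non-zero coefficients with L2:", np.count_nonzero(l2.coef_))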
What is the best technique for dealing with heavily imbalanced datasets?
Resampling is a widely adopted technique for dealing with highly imbalanced datasets. It consists of removing samples from the majority class (under-sampling) and/or adding more examples of the minority class (over-sampling).
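A minimal sketch of both directions, assuming the imbalanced-learn package and synthetic data:

    # Random under-sampling of the majority class and random over-sampling of
    # the minority class; class counts before and after.
    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler
    from imblearn.under_sampling import RandomUnderSampler

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    print("original:     ", Counter(y))

    X_u, y_u = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print("under-sampled:", Counter(y_u))

    X_o, y_o = RandomOverSampler(random_state=0).fit_resample(X, y)
    print("over-sampled: ", Counter(y_o))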
How can we use statistical imputation for missing values in machine learning?
A straightforward and popular approach to data imputation is to use statistical methods to estimate a value for a column from the values that are present, and then replace all missing values in that column with the estimated statistic.
Does adding more features prevent overfitting?
No. Adding many new features gives us more expressive models that are better able to fit the training data, and if too many new features are added, this can lead to overfitting of the training set rather than preventing it.