M3 Flashcards
What are the two phases in ML?
Training phase and Inference phase
What is the purpose of EDA?
To maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, develop parsimonious models and determine optimal factor settings.
What are the attributes related to data quality?
Accuracy, completeness, consistency, and timeliness.
What is the difference between three data analysis approaches?
The difference is the sequence and focus of the intermediate steps.
What are the three main data analysis approaches? Explain each.
Classical analysis: data collection is followed by the imposition of a model (e.g., normality, linearity), and the analysis, estimation, and testing that follow are focused on the parameters of that model.
EDA: data collection is not followed by a model imposition. Rather, it is followed immediately by analysis, with the goal of inferring what model would be appropriate.
Bayesian analysis: the analyst attempts to answer research questions about unknown parameters using probability statements based on prior knowledge and the data.
Two types of supervised problems?
Regression and Classification
What type of loss function do regression and classification models usually use?
Regression models usually use mean squared error as their loss function, whereas classification models tend to use cross-entropy.
What are two techniques used to prevent overfitting during training?
Regularization and early stopping
what is a loss function?
It quantifies the error between your model’s predictions and the true target values. By minimizing this loss function during training, you are essentially fine-tuning your model to make accurate predictions, which is the primary objective of machine learning.
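The idea above can be sketched with mean squared error, a common regression loss. This is a minimal plain-Python illustration; the function name and sample data are assumptions of the example, not from the flashcards.

```python
# Minimal sketch of mean squared error (MSE), a common regression loss.
# The function name and sample data are illustrative only.
def mse(y_true, y_pred):
    """Average squared difference between true targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# A perfect model has zero loss; larger errors are penalized quadratically.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (3 - 5)^2 / 3 ≈ 1.33
```

Minimizing this quantity over the model parameters is what "training" does.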
what is regularization in machine learning?
regularization is a technique used to prevent overfitting and improve the generalization ability of a model. The regularization term is typically controlled by a hyperparameter called the regularization strength, denoted as “λ” (lambda). The higher the value of λ, the stronger the regularization effect, and the more the model’s parameters are constrained.
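As a sketch of how λ enters the objective, the snippet below adds an L2 (sum-of-squared-weights) penalty to a base loss. L2 is just one common choice of penalty, and the names and values are illustrative assumptions.

```python
# Illustrative sketch: an L2 penalty, scaled by lambda, added to a base loss.
def l2_regularized_loss(base_loss, weights, lam):
    """Base loss plus lam times the sum of squared model weights."""
    return base_loss + lam * sum(w ** 2 for w in weights)

# lam = 0 disables regularization; larger lam constrains the weights more.
print(l2_regularized_loss(1.0, [0.5, -2.0], lam=0.0))  # 1.0
print(l2_regularized_loss(1.0, [0.5, -2.0], lam=0.1))  # 1.0 + 0.1 * 4.25 = 1.425
```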
What is AutoML?
- No data science expertise is required; AutoML lets you create and train a model with minimal technical effort.
- Codeless.
- Must target one of AutoML's predefined objectives (e.g., classification, regression).
What is the R squared evaluation metric?
R squared is the square of the Pearson correlation coefficient between the observed and predicted values.
The R squared value ranges from zero to one, where a higher value indicates a higher-quality model.
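Following that definition (the square of the Pearson correlation between observed and predicted values), a plain-Python sketch might look like this; the function name and data are assumptions of the example.

```python
import math

# Sketch of R squared computed as the square of the Pearson correlation
# between observed and predicted values (the definition quoted above).
def r_squared(y_true, y_pred):
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return (cov / (st * sp)) ** 2

# Perfectly linearly related predictions give R squared close to 1.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
```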
Explain PR AUC
This is the area under the precision-recall (PR) curve. This value ranges from zero to one, where a higher value indicates a higher-quality model.
Explain ROC AUC
This is the area under the receiver operating characteristic (ROC) curve. The curve plots the true positive (TP) rate vs. the false positive (FP) rate at different classification thresholds.
Explain Log loss
This is the cross-entropy between the model predictions and the target values.
This ranges from zero to infinity, where a lower value indicates a higher-quality model.
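As an illustration of that definition, binary cross-entropy can be computed as below. The clipping epsilon and the names are assumptions of this sketch, not part of the flashcard.

```python
import math

# Sketch of binary log loss (cross-entropy) between labels and predicted
# probabilities. eps clips probabilities away from 0 and 1 to avoid log(0).
def log_loss(y_true, p_pred, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident correct predictions give a loss near zero;
# confident wrong predictions drive it toward infinity.
print(log_loss([1, 0], [0.99, 0.01]))
print(log_loss([1, 0], [0.01, 0.99]))
```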
What are the three steps in making a recommendation system with BigQuery ML?
prepare training data in BigQuery, train a recommendation system with BigQuery ML, and use the predicted recommendations in production.
Which of these BigQuery supported classification models is most relevant for predicting binary results, such as True/False?
DNN Classifier (TensorFlow)
AutoML Tables
Logistic Regression
XGBoost
Logistic Regression
What is the scaling parameter that sets the step size of gradient descent called?
learning_rate
Why could retraining the model output different results even if all the settings are the same?
The loss surface could have more than one minimum (a non-convex surface without a unique global minimum), so different runs can settle into different minima.
What are the three steps of a training loop?
Calculate derivative, take a step, check loss.
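Those three steps can be sketched on a one-parameter toy loss L(w) = (w - 3)^2, whose derivative is 2(w - 3). Everything here (names, learning rate, tolerance) is an illustrative assumption, not a real training setup.

```python
# Sketch of the training loop: calculate derivative, take a step, check loss.
# Toy loss L(w) = (w - 3)^2 with derivative 2 * (w - 3); minimum at w = 3.
def train(w=0.0, learning_rate=0.1, steps=100, tolerance=1e-6):
    for _ in range(steps):
        grad = 2 * (w - 3)          # 1. calculate derivative
        w -= learning_rate * grad   # 2. take a step scaled by the learning rate
        loss = (w - 3) ** 2         # 3. check loss
        if loss < tolerance:        # stop once the loss is small enough
            break
    return w

print(train())  # converges toward the minimum at w = 3
```

Note how learning_rate, the scaling parameter from the previous card, controls the step size in step 2.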
How can you speed up the training time of a model?
- Reduce the number of data points on which we compute the derivative. The derivative comes from our loss function, which composes the error of a number of predictions, so this method reduces the number of data points fed into the loss function at each iteration. It can still work because samples drawn from the training set with uniform probability tend, on average, to balance each other out. This is called mini-batch gradient descent.
- Reduce the frequency with which we check the loss.
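A minimal sketch of mini-batch gradient descent for a one-weight linear model y = w * x, sampling each batch uniformly from the training set. The function, data, and hyperparameters are illustrative assumptions.

```python
import random

# Sketch of mini-batch gradient descent for a 1-D linear model y = w * x.
# Each step computes the MSE gradient on a small uniform sample of the data
# instead of the full training set.
def minibatch_sgd(xs, ys, batch_size=2, learning_rate=0.01, steps=500):
    w = 0.0
    data = list(zip(xs, ys))
    for _ in range(steps):
        batch = random.sample(data, batch_size)  # uniform sampling
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= learning_rate * grad
    return w

random.seed(0)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with true w = 2
print(minibatch_sgd(xs, ys))  # approaches w = 2
```

Here batch_size is the "batch size" from the next card: the number of samples each gradient step sees.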
What is meant by batch size?
The number of samples used in each iteration of mini-batch gradient descent.
What are the two consequences of inappropriate minima?
Doesn’t reflect the relationship between features and label
won’t generalize well
What is the difference between a loss function and performance metrics?
The loss function is used during training, whereas performance metrics are evaluated after training.
Loss functions are harder to understand and interpret.
Performance metrics are directly connected to business goals.
What is a Type 1 error?
A false positive (FP).
What is a Type 2 error?
A false negative (FN).
Why can't we use the validation set alone to report the model's performance?
Because the validation set was used to choose when to stop training, so it is no longer independent.
What is bootstrapping or cross-validation?