M3 Flashcards
What are the two phases in ML?
Training phase and Inference phase
What is the purpose of EDA?
To maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, develop parsimonious models and determine optimal factor settings.
What are the attributes related to data quality?
Accuracy, completeness, consistency, and timeliness.
What is the difference between three data analysis approaches?
The difference is the sequence and focus of the intermediate steps.
What are the three main data analysis approaches? Explain each.
Classical analysis: data collection is followed by the imposition of a model (e.g., normality, linearity), and the analysis, estimation, and testing that follow are focused on the parameters of that model.
EDA: data collection is not followed by a model imposition. Rather, it is followed immediately by analysis, with the goal of inferring what model would be appropriate.
Bayesian analysis: the analyst attempts to answer research questions about unknown parameters using probability statements based on prior knowledge and the data.
Two types of supervised problems?
Regression and Classification
What type of loss function do regression and classification models usually use?
Regression models usually use mean squared error as their loss function, whereas classification models tend to use cross-entropy.
What are two techniques used to prevent overfitting during training?
Regularization and early stopping
what is a loss function?
It quantifies the error between your model’s predictions and the true target values. By minimizing this loss function during training, you are essentially fine-tuning your model to make accurate predictions, which is the primary objective of machine learning.
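The idea above can be sketched with mean squared error, a common regression loss. This is a minimal plain-Python illustration; the function name and sample data are assumptions of the example, not from the flashcards.

```python
# Minimal sketch of mean squared error (MSE), a common regression loss.
# The function name and sample data are illustrative only.
def mse(y_true, y_pred):
    """Average squared difference between true targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# A perfect model has zero loss; larger errors are penalized quadratically.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (3 - 5)^2 / 3 ≈ 1.33
```

Minimizing this quantity over the model parameters is what "training" does.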
what is regularization in machine learning?
regularization is a technique used to prevent overfitting and improve the generalization ability of a model. The regularization term is typically controlled by a hyperparameter called the regularization strength, denoted as “λ” (lambda). The higher the value of λ, the stronger the regularization effect, and the more the model’s parameters are constrained.
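As a sketch of how λ enters the objective, the snippet below adds an L2 (sum-of-squared-weights) penalty to a base loss. L2 is just one common choice of penalty, and the names and values are illustrative assumptions.

```python
# Illustrative sketch: an L2 penalty, scaled by lambda, added to a base loss.
def l2_regularized_loss(base_loss, weights, lam):
    """Base loss plus lam times the sum of squared model weights."""
    return base_loss + lam * sum(w ** 2 for w in weights)

# lam = 0 disables regularization; larger lam constrains the weights more.
print(l2_regularized_loss(1.0, [0.5, -2.0], lam=0.0))  # 1.0
print(l2_regularized_loss(1.0, [0.5, -2.0], lam=0.1))  # 1.0 + 0.1 * 4.25 = 1.425
```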
What is AutoML?
- No data science expertise is required; AutoML lets you create and train a model with minimal technical effort.
- Codeless.
- Must target one of AutoML's predefined objectives (e.g., classification, regression).
What is the R squared evaluation metric?
R squared is the square of the Pearson correlation coefficient between the observed and predicted values.
The R squared value ranges from zero to one, where a higher value indicates a higher-quality model.
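Following that definition (the square of the Pearson correlation between observed and predicted values), a plain-Python sketch might look like this; the function name and data are assumptions of the example.

```python
import math

# Sketch of R squared computed as the square of the Pearson correlation
# between observed and predicted values (the definition quoted above).
def r_squared(y_true, y_pred):
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return (cov / (st * sp)) ** 2

# Perfectly linearly related predictions give R squared close to 1.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
```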
Explain PR AUC
This is the area under the precision-recall (PR) curve. This value ranges from zero to one, where a higher value indicates a higher-quality model.
Explain ROC AUC
This is the area under the receiver operating characteristic (ROC) curve. The curve plots the true positive (TP) rate vs. the false positive (FP) rate at different classification thresholds.
Explain Log loss
This is the cross-entropy between the model predictions and the target values.
This ranges from zero to infinity, where a lower value indicates a higher-quality model.
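As an illustration of that definition, binary cross-entropy can be computed as below. The clipping epsilon and the names are assumptions of this sketch, not part of the flashcard.

```python
import math

# Sketch of binary log loss (cross-entropy) between labels and predicted
# probabilities. eps clips probabilities away from 0 and 1 to avoid log(0).
def log_loss(y_true, p_pred, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident correct predictions give a loss near zero;
# confident wrong predictions drive it toward infinity.
print(log_loss([1, 0], [0.99, 0.01]))
print(log_loss([1, 0], [0.01, 0.99]))
```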
What are the three steps in making a recommendation system with BigQuery ML?
prepare training data in BigQuery, train a recommendation system with BigQuery ML, and use the predicted recommendations in production.
Which of these BigQuery supported classification models is most relevant for predicting binary results, such as True/False?
DNN Classifier (TensorFlow)
AutoML Tables
Logistic Regression
XGBoost
Logistic Regression
What is the scaling parameter that sets the step size of gradient descent called?
learning_rate
Why could retraining the model output different results even if all the settings are the same?
The loss surface could have more than one minimum (a non-convex surface without a unique global minimum), so different runs can settle into different minima.
What are the three steps of a training loop?
Calculate derivative, take a step, check loss.
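Those three steps can be sketched on a one-parameter toy loss L(w) = (w - 3)^2, whose derivative is 2(w - 3). Everything here (names, learning rate, tolerance) is an illustrative assumption, not a real training setup.

```python
# Sketch of the training loop: calculate derivative, take a step, check loss.
# Toy loss L(w) = (w - 3)^2 with derivative 2 * (w - 3); minimum at w = 3.
def train(w=0.0, learning_rate=0.1, steps=100, tolerance=1e-6):
    for _ in range(steps):
        grad = 2 * (w - 3)          # 1. calculate derivative
        w -= learning_rate * grad   # 2. take a step scaled by the learning rate
        loss = (w - 3) ** 2         # 3. check loss
        if loss < tolerance:        # stop once the loss is small enough
            break
    return w

print(train())  # converges toward the minimum at w = 3
```

Note how learning_rate, the scaling parameter from the previous card, controls the step size in step 2.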
How can you speed up the training time of a model?
- Reduce the number of data points on which we compute the derivative. The derivative comes from our loss function, which composes the error of a number of predictions, so this method reduces the number of data points fed into the loss function at each iteration. It can still work because samples drawn from the training set with uniform probability tend, on average, to balance each other out. This is called mini-batch gradient descent.
- Reduce the frequency with which we check the loss.
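A minimal sketch of mini-batch gradient descent for a one-weight linear model y = w * x, sampling each batch uniformly from the training set. The function, data, and hyperparameters are illustrative assumptions.

```python
import random

# Sketch of mini-batch gradient descent for a 1-D linear model y = w * x.
# Each step computes the MSE gradient on a small uniform sample of the data
# instead of the full training set.
def minibatch_sgd(xs, ys, batch_size=2, learning_rate=0.01, steps=500):
    w = 0.0
    data = list(zip(xs, ys))
    for _ in range(steps):
        batch = random.sample(data, batch_size)  # uniform sampling
        grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
        w -= learning_rate * grad
    return w

random.seed(0)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with true w = 2
print(minibatch_sgd(xs, ys))  # approaches w = 2
```

Here batch_size is the "batch size" from the next card: the number of samples each gradient step sees.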
What is meant by batch size?
The number of samples used in each iteration of mini-batch gradient descent.
What are the two consequences of inappropriate minima?
Doesn’t reflect the relationship between features and label
won’t generalize well
What is the difference between a loss function and performance metrics?
The loss function is used during training, whereas performance metrics are evaluated after training.
Loss functions are harder to understand and interpret.
Performance metrics are directly connected to business goals.
What is a Type 1 error?
A false positive (FP).
What is a Type 2 error?
A false negative (FN).
Why can't we use the validation set alone to report the model's performance?
Because the validation set was used to choose when to stop training, so it is no longer independent.
What is bootstrapping or cross-validation?