Path3.Mod1.d - Automated Machine Learning - Prep & Run an AutoML Experiment Flashcards
During AutoML experimentation, scaling and normalization techniques are applied automatically (T/F)
True. It’s AUTO-ML. Multiple scaling and normalization techniques are applied automatically to numeric data, helping to prevent larger features from dominating training.
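The exact transforms AutoML picks aren’t shown on this card; as a minimal plain-Python sketch of two common techniques it may apply (min-max scaling and z-score standardization), assuming a single numeric column:

```python
import statistics

def min_max_scale(values):
    """Min-max scaling: rescale values into the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z-score standardization: rescale to mean 0, std dev 1."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

# A large-valued feature (e.g. income) that would otherwise dominate
# a small-valued one (e.g. age) during training:
income = [30_000, 60_000, 90_000]
print(min_max_scale(income))  # → [0.0, 0.5, 1.0]
print(z_score(income))        # mean 0, symmetric around it
```

After either transform, all numeric features sit on a comparable range, which is the point of the guardrail this card describes.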
Once experimentation completes, only the Scaling methods used are available to review in AutoML results (T/F)
False. You can review which scaling and normalization methods were applied during experimentation.
AutoML performs Featurization by default, which you can disable or customize further (T/F)
True
AutoML will not notify you if there are issues with data like missing values or class imbalance since it automatically applies all the transformations necessary to remediate those issues (T/F)
False. AutoML will notify you if data issues like missing values or class imbalances are detected, through Data Guardrails.
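A toy sketch of the class-imbalance check a guardrail performs; the 20% threshold here is an illustrative assumption, not Azure’s documented cutoff:

```python
from collections import Counter

def check_imbalance(labels, threshold=0.2):
    """Flag classes whose share of samples falls below a threshold,
    similar in spirit to a class-imbalance Data Guardrail.
    Returns {class: share} for each under-represented class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items() if n / total < threshold}

# 9 "a" samples vs 1 "b" sample → "b" is under-represented:
print(check_imbalance(["a"] * 9 + ["b"]))  # → {'b': 0.1}
```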
You can set AutoML to use Ensemble Models for training (T/F)
True if Featurization is enabled.
If Ensemble Models are enabled, AutoML will try both Voting and Stacking combinations (T/F)
False. You have to manually enable Stacking.
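Voting combines the base models’ predictions directly (AutoML’s voting ensemble uses weighted averages of predicted probabilities), while Stacking trains a meta-model on the base models’ outputs. A toy hard-voting sketch, not AutoML’s implementation, with three hypothetical base models:

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: each base model casts one vote per sample;
    the most common class label wins."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

# Class predictions from three hypothetical base models on four samples:
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "dog"]
model_c = ["dog", "dog", "dog", "cat"]
print(majority_vote([model_a, model_b, model_c]))  # → ['cat', 'dog', 'dog', 'cat']
```

A stacking ensemble would instead feed these per-model predictions into a second-stage model as features.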
MVImp CatE DH-CF FE
Four optional Featurizations you can configure for preprocessing transformation
- Missing Value Imputation (replace null values in the training set)
- Categorical Encoding (categories to numeric indicators)
- Dropping High-Cardinality Features (ex. ID fields)
- Feature Engineering (ex. breaking a DateTime or TimeSpan down to its parts)
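The first two featurizations above can be sketched in plain Python; these are illustrative stand-ins, not AutoML’s actual transformers:

```python
def impute_mean(column):
    """Missing Value Imputation: replace None with the column mean."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

def one_hot(column):
    """Categorical Encoding: map each category to a 0/1 indicator vector
    (one column per category, sorted for a stable order)."""
    categories = sorted(set(column))
    return [[1 if value == cat else 0 for cat in categories] for value in column]

print(impute_mean([10, None, 20]))      # → [10, 15.0, 20]
print(one_hot(["red", "blue", "red"]))  # → [[0, 1], [1, 0], [0, 1]]
```

Dropping high-cardinality features and DateTime decomposition follow the same pattern: purely mechanical column transforms applied before training.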
DT ERT GB KNN LGBM LR NB RF SGD XgB
Some Supported Classification Algorithms
- Decision Tree
- Extremely Randomized Trees
- Gradient Boosting
- K-Nearest Neighbors
- LightGBM
- Logistic Regression
- Naive Bayes
- Random Forest
- Stochastic Gradient Descent
- XGBoost
DT EN ERT GB KNN LGBM RF SGD XgB
Some Supported Regression Algorithms
- Decision Tree
- Elastic Net
- Extremely Randomized Trees
- Gradient Boosting
- K-Nearest Neighbors
- LightGBM
- Random Forest
- Stochastic Gradient Descent
- XGBoost
DT EN ES ERT GB KNN LGBM Na RF SA SNa TNCFo
Some Supported Time Series Forecasting Algorithms
- Decision Tree
- Elastic Net
- ExponentialSmoothing
- Extremely Randomized Trees
- Gradient Boosting
- K-Nearest Neighbors
- LightGBM
- Naive
- Random Forest
- SeasonalAverage
- SeasonalNaive
- TCNForecaster
Two reasons to restrict Algorithm selection
- Your data isn’t particularly suited for a type of algorithm
- Compliance with company policy restrictions on types of machine learning
AUCW Acc NMR APSW PSW
The default Primary Metric and the four options available beyond the default
Default: AUCWeighted
- Accuracy
- NormMacroRecall
- AveragePrecisionScoreWeighted
- PrecisionScoreWeighted
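Two of these metrics are easy to compute by hand. Accuracy is the fraction of correct predictions; NormMacroRecall is macro-averaged recall rescaled so random guessing scores 0 and perfect prediction scores 1, assuming the formula (macro_recall − R) / (1 − R) with R = 1/num_classes:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def norm_macro_recall(y_true, y_pred):
    """Macro-averaged recall, normalized so random performance is 0:
    (macro_recall - R) / (1 - R), where R = 1 / number_of_classes."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        preds_for_class = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(p == c for p in preds_for_class) / len(preds_for_class))
    macro = sum(recalls) / len(recalls)
    r = 1 / len(classes)
    return (macro - r) / (1 - r)

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))           # → 0.75
print(norm_macro_recall(y_true, y_pred))  # → 0.5
```

The weighted metrics (AUCWeighted, AveragePrecisionScoreWeighted, PrecisionScoreWeighted) follow the same idea but weight each class’s score by its sample count, which is why they are preferred when classes are imbalanced.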