Training ML Flashcards
Custom Training: Can manually optimize model performance with hyperparameter tuning?
Yes. You can tune the model during each training run for experimentation and comparison.
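A minimal, framework-free sketch of manual tuning inside your own training code. It trains once per combination in a hyperparameter grid and compares validation error across the runs. The toy linear model, data, grid values, and function names are hypothetical illustrations, not a Vertex AI API.

```python
import itertools


def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Run one training per hyperparameter combination; return (val_mse, params), best first."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        w, b = train_linear_model(*train, **params)
        results.append((mse(w, b, *valid), params))
    return sorted(results, key=lambda r: r[0])


# Hypothetical toy data drawn from y = 2x + 1.
train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
valid = ([4.0, 5.0], [9.0, 11.0])
grid = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [50, 500]}
for score, params in grid_search(train, valid, grid):
    print(params, f"validation MSE = {score:.6f}")
```

Grid search is used only because it is the simplest strategy to show; random search or Bayesian optimization follow the same train-and-compare loop.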
AutoML: Data science expertise needed?
No.
AutoML: Time to trained model?
Lower. Less data preparation is required, and no development is needed.
AutoML: Limits on machine learning objectives?
Yes, you must target one of AutoML’s predefined objectives.
BigQueryML: Can manually optimize model performance with hyperparameter tuning?
Yes. BigQuery ML supports hyperparameter tuning when training ML models using CREATE MODEL statements.
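A sketch of a CREATE MODEL statement with tuning enabled, built as a Python string so it can be generated and checked programmatically. The dataset, table, and column names are hypothetical. The option names (NUM_TRIALS, MAX_PARALLEL_TRIALS, HPARAM_RANGE, HPARAM_CANDIDATES) are assumed from the BigQuery ML CREATE MODEL reference; verify them against the current documentation before use.

```python
def build_tuned_create_model(model_name, source_table, label_col,
                             num_trials=20, max_parallel_trials=2):
    """Return a BigQuery ML CREATE MODEL statement with hyperparameter tuning options.

    All identifiers passed in are hypothetical; the statement is only built, not run.
    """
    if num_trials < 1 or max_parallel_trials < 1:
        raise ValueError("num_trials and max_parallel_trials must be positive")
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        "OPTIONS (\n"
        "  MODEL_TYPE = 'LINEAR_REG',\n"
        f"  INPUT_LABEL_COLS = ['{label_col}'],\n"
        # NUM_TRIALS sets how many tuning trials to run; some may run in parallel.
        f"  NUM_TRIALS = {num_trials},\n"
        f"  MAX_PARALLEL_TRIALS = {max_parallel_trials},\n"
        # Search a continuous range for L1 and a fixed candidate list for L2.
        "  L1_REG = HPARAM_RANGE(0, 10),\n"
        "  L2_REG = HPARAM_CANDIDATES([0, 0.1, 1, 10])\n"
        ") AS\n"
        f"SELECT * FROM `{source_table}`"
    )


sql = build_tuned_create_model("mydataset.tuned_model", "mydataset.training_data", "label")
print(sql)
```

The resulting string could be submitted with the google-cloud-bigquery client, for example `bigquery.Client().query(sql)`.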
AutoML: Limits on data size?
Yes. AutoML uses managed datasets; data size limitations vary depending on the type of dataset.
What are the possible solutions to working with a sparse dataset?
1. Use a model that supports training with sparse datasets (e.g. Wide & Deep, autoencoders).
2. Remove uninformative features: drop features with low variance, or use L1 (Lasso) regularization, which drives the weights of unhelpful features to zero.
3. Use a dimensionality reduction method (e.g. PCA) to make sparse datasets dense, keeping the principal components that capture the most variance.
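A minimal sketch of removing low-variance features from a sparse dataset. Each row is stored as a dict of its nonzero entries, a hypothetical representation chosen to keep the example dependency-free. It uses a plain variance threshold; L1/Lasso regularization is a different, supervised way to discard features and is not shown here.

```python
def feature_variances(rows, num_features):
    """Population variance of each feature; entries absent from a sparse row count as 0."""
    n = len(rows)
    sums = [0.0] * num_features
    sq_sums = [0.0] * num_features
    for row in rows:
        # This loop touches only the stored (nonzero) entries of each row.
        for j, value in row.items():
            sums[j] += value
            sq_sums[j] += value * value
    return [sq_sums[j] / n - (sums[j] / n) ** 2 for j in range(num_features)]


def drop_low_variance(rows, num_features, threshold):
    """Keep features whose variance exceeds threshold and renumber them 0..k-1."""
    variances = feature_variances(rows, num_features)
    keep = [j for j in range(num_features) if variances[j] > threshold]
    new_index = {old: new for new, old in enumerate(keep)}
    filtered = [{new_index[j]: v for j, v in row.items() if j in new_index} for row in rows]
    return filtered, keep


# Hypothetical sparse rows: {feature_index: value}, zeros omitted.
rows = [{0: 1.0, 2: 5.0}, {0: 1.0}, {0: 1.0, 3: 0.1}, {0: 1.0, 2: 3.0}]
filtered, kept = drop_low_variance(rows, num_features=4, threshold=0.01)
print(kept)      # [2]
print(filtered)  # [{0: 5.0}, {}, {}, {0: 3.0}]
```

Feature 0 is constant, feature 1 never appears, and feature 3 barely varies, so only feature 2 survives the threshold.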
Custom Training: Limits on data size?
For unmanaged datasets, no. For managed datasets, yes: they have the same limits as the datasets created in and hosted by Vertex AI that are used to train AutoML models.
Custom Training: Programming ability needed?
Yes, to develop the training application.
BigQueryML: Data science expertise needed?
No.
BigQueryML: Limits on data size?
Yes. BigQuery ML enforces appropriate quotas on a per-project basis.
What are the challenges with sparse datasets?
Overfitting, high memory usage, computational complexity, and inaccurate results.
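Back-of-the-envelope arithmetic for the memory problem: a dense matrix stores every zero, while a coordinate (COO) sparse format stores only the nonzeros plus their indices. The matrix shape, 0.1% density, and 8-byte values and indices are hypothetical assumptions.

```python
def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Dense storage keeps every cell, zero or not (8-byte floats assumed)."""
    return n_rows * n_cols * value_bytes


def coo_bytes(nnz, value_bytes=8, index_bytes=8):
    """COO storage keeps one value plus a row index and a column index per nonzero."""
    return nnz * (value_bytes + 2 * index_bytes)


# Hypothetical dataset: 1M rows, 10k features, 0.1% of cells nonzero.
n_rows, n_cols, density = 1_000_000, 10_000, 0.001
nnz = round(n_rows * n_cols * density)
print(f"dense: {dense_bytes(n_rows, n_cols) / 1e9:.2f} GB")  # dense: 80.00 GB
print(f"COO:   {coo_bytes(nnz) / 1e9:.2f} GB")               # COO:   0.24 GB
```

With these byte sizes, COO is smaller than dense storage only while density stays below 1/3 (24 bytes per nonzero versus 8 bytes per cell).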
BigQueryML: Programming ability needed?
Yes. SQL programming ability is required to build, evaluate, and use models in BigQuery ML.
AutoML: Can manually optimize model performance with hyperparameter tuning?
No. AutoML does some automated hyperparameter tuning, but you can’t modify the values used.
Custom Training: Can control aspects of the training environment?
Yes. You can specify aspects of the environment such as Compute Engine machine type, disk size, machine learning framework, and number of nodes.
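A sketch of how those environment choices could map onto a worker pool specification for a Vertex AI custom job, written as plain Python dicts. The field names (machine_spec, replica_count, disk_spec, container_spec) are assumed from the Vertex AI WorkerPoolSpec resource; the helper name and image URI are hypothetical. The framework is chosen through the container image, and extra nodes go in a second worker pool.

```python
def build_worker_pool_specs(image_uri, machine_type="n1-standard-4",
                            num_workers=0, boot_disk_size_gb=100):
    """Return a worker_pool_specs list: one primary replica plus optional extra workers."""
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")

    def pool(replica_count):
        return {
            # Compute Engine machine type used by every node in this pool.
            "machine_spec": {"machine_type": machine_type},
            # Number of nodes (replicas) in this pool.
            "replica_count": replica_count,
            # Boot disk type and size for each node.
            "disk_spec": {"boot_disk_type": "pd-ssd", "boot_disk_size_gb": boot_disk_size_gb},
            # The container image fixes the ML framework (e.g. a TensorFlow or PyTorch image).
            "container_spec": {"image_uri": image_uri},
        }

    # Pool 0 holds the single primary replica; additional nodes go in pool 1.
    specs = [pool(1)]
    if num_workers > 0:
        specs.append(pool(num_workers))
    return specs


# Hypothetical image URI; substitute your own training container.
specs = build_worker_pool_specs("us-docker.pkg.dev/my-project/my-repo/trainer:latest",
                                num_workers=2)
print(specs)
```

A list of this shape is assumed to be what the `worker_pool_specs` argument of `aiplatform.CustomJob` in the google-cloud-aiplatform SDK accepts; confirm the field names against the SDK version in use.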