Training ML Flashcards
Custom Training: Can manually optimize model performance with hyperparameter tuning?
Yes. You can tune the model during each training run for experimentation and comparison.
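As a generic illustration of manual tuning inside your own training code (not a Vertex AI API), the sketch below tries several regularization strengths for a toy one-feature ridge regression and keeps the value with the lowest validation error. The functions, data, and candidate values are all hypothetical.

```python
# Manual hyperparameter search sketch: try each candidate regularization
# strength, fit on the training split, score on the validation split.
# All data and candidate values are made up for illustration.

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, val, lambdas):
    """Return the best lambda and the validation MSE for every candidate."""
    results = {}
    for lam in lambdas:
        w = fit_ridge_1d(*train, lam)
        results[lam] = mse(w, *val)
    best = min(results, key=results.get)
    return best, results

train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
val = ([5.0, 6.0], [10.1, 11.9])
best_lam, scores = grid_search(train, val, [0.0, 0.1, 1.0, 10.0])
print(best_lam)  # 0.0
```

Each candidate corresponds to one training run, and comparing the recorded scores is the "experimentation and comparison" step.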
AutoML: Data science expertise needed?
No.
AutoML: Time to trained model?
Lower. Less data preparation is required, and no development is needed.
AutoML: Limits on machine learning objectives?
Yes, you must target one of AutoML’s predefined objectives.
BigQueryML: Can manually optimize model performance with hyperparameter tuning?
Yes. BigQuery ML supports hyperparameter tuning when training ML models using CREATE MODEL statements.
AutoML: Limits on data size?
Yes. AutoML uses managed datasets; data size limitations vary depending on the type of dataset.
What are possible solutions for working with a sparse dataset?
1. Use a model that supports training on sparse data (e.g. Wide & Deep, autoencoders).
2. Remove uninformative features (e.g. drop features with low variance, or use Lasso (L1) regularization to drive the weights of weak features to zero).
3. Use a dimensionality reduction method (e.g. PCA) to turn sparse data into a dense representation, keeping the principal components that capture the most variance.
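Removing low-variance features can be sketched with a simple variance threshold. This is a minimal standard-library example, assuming the data fits in memory as a list of rows; the helper names, the sample rows, and the 0.2 threshold are hypothetical. L1 regularization and PCA are not shown.

```python
from statistics import pvariance

# In a sparse dataset, a column that is almost always 0 has near-zero
# variance and carries little information, so it is a candidate to drop.

def low_variance_columns(rows, threshold):
    """Return indices of columns whose population variance is <= threshold."""
    n_cols = len(rows[0])
    return [
        j for j in range(n_cols)
        if pvariance([row[j] for row in rows]) <= threshold
    ]

def drop_columns(rows, cols):
    """Return a copy of rows without the given column indices."""
    dropped = set(cols)
    keep = [j for j in range(len(rows[0])) if j not in dropped]
    return [[row[j] for j in keep] for row in rows]

# Hypothetical data: column 1 is always 0, column 2 is nonzero only once.
rows = [
    [1, 0, 0, 3],
    [0, 0, 0, 1],
    [1, 0, 1, 2],
    [0, 0, 0, 4],
]
low = low_variance_columns(rows, threshold=0.2)
print(low)  # [1, 2]
print(drop_columns(rows, low))  # [[1, 3], [0, 1], [1, 2], [0, 4]]
```

The threshold is a judgment call: set it too high and you discard rare but predictive features.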
Custom Training: Limits on data size?
For unmanaged datasets, no. Managed datasets have the same limits as the managed dataset objects created in and hosted by Vertex AI that are used to train AutoML models.
Custom Training: Programming ability needed?
Yes, to develop the training application.
BigQueryML: Data science expertise needed?
No.
BigQueryML: Limits on data size?
Yes. BigQuery ML enforces appropriate quotas on a per-project basis.
What are the challenges with sparse datasets?
Overfitting, high memory usage, computational complexity, and inaccurate results.
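To make the memory point concrete, the sketch below counts how many values a mostly-zero matrix needs in dense form versus a dictionary-of-keys form that stores only nonzero entries. The matrix (1,000 rows by 100 columns, three nonzeros per row) is made-up data.

```python
# Dense storage keeps every entry; a dictionary-of-keys (DOK) representation
# keeps only the nonzero entries, keyed by (row, col).

def to_dok(matrix):
    """Map (row, col) -> value for every nonzero entry."""
    return {
        (i, j): v
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v != 0
    }

def density(matrix):
    """Fraction of entries that are nonzero."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0)
    return nonzero / total

# Made-up data: 1,000 rows x 100 columns, exactly 3 nonzeros per row.
n_rows, n_cols = 1000, 100
matrix = [[0] * n_cols for _ in range(n_rows)]
for i in range(n_rows):
    for k in range(3):
        matrix[i][(i + 7 * k) % n_cols] = 1

dok = to_dok(matrix)
print(n_rows * n_cols, "dense values vs", len(dok), "sparse values")
print(density(matrix))
```

Counting stored values is a simplification: each dictionary entry also stores its key and carries overhead, so real byte savings depend on the sparse format used (libraries commonly use compressed formats such as CSR).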
BigQueryML: Programming ability needed?
Yes. SQL programming ability is required to build, evaluate, and use models in BigQuery ML.
AutoML: Can manually optimize model performance with hyperparameter tuning?
No. AutoML does some automated hyperparameter tuning, but you can’t modify the values used.
Custom Training: Can control aspects of the training environment?
Yes. You can specify aspects of the environment such as Compute Engine machine type, disk size, machine learning framework, and number of nodes.
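As an illustration of those knobs, the sketch below writes a training environment as a plain Python dictionary shaped like a Vertex AI worker pool spec. The field names (`machine_spec`, `replica_count`, `disk_spec`, `container_spec`) follow our understanding of Vertex AI's `WorkerPoolSpec` and should be checked against the current API reference; the helper function, machine type, sizes, and image URI are hypothetical.

```python
# Hypothetical helper that builds one worker pool description. Nothing here
# calls Vertex AI; it only shows which aspects of the environment you choose.

def make_worker_pool_spec(machine_type, replica_count, disk_size_gb, image_uri):
    return {
        # Compute Engine machine type for each node.
        "machine_spec": {"machine_type": machine_type},
        # Number of nodes (replicas) in this pool.
        "replica_count": replica_count,
        # Boot disk type and size.
        "disk_spec": {
            "boot_disk_type": "pd-ssd",
            "boot_disk_size_gb": disk_size_gb,
        },
        # The container image bundles the ML framework (e.g. TensorFlow or PyTorch).
        "container_spec": {"image_uri": image_uri},
    }

worker_pool_specs = [
    make_worker_pool_spec(
        machine_type="n1-standard-8",
        replica_count=2,
        disk_size_gb=200,
        image_uri="us-docker.pkg.dev/my-project/my-repo/trainer:latest",  # hypothetical URI
    )
]
print(worker_pool_specs[0]["machine_spec"])
```

In the Vertex AI Python SDK, a list like this is what `aiplatform.CustomJob` takes as its `worker_pool_specs` argument; confirm the exact shape in the SDK documentation before relying on it.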
BigQueryML: Limits on machine learning objectives?
Yes. You are limited to the model types that BigQuery ML supports.
Custom Training: Limits on machine learning objectives?
No.
BigQueryML: Time to trained model?
Lower. Development is faster because you don’t need to build the infrastructure required for model training or batch predictions; BigQuery ML runs on the BigQuery computational engine.
AutoML: Programming ability needed?
No, AutoML is codeless.
Custom Training: Data science expertise needed?
Yes, to develop the training application and to perform some of the data preparation, such as feature engineering.
What is an example of a sparse dataset?
A Netflix-style recommendation system: a one-hot encoded dataset answering ‘Have you watched this movie before?’ for each of 100 movies.
Think why this is sparse.
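A quick way to check the intuition: one-hot encode a few watch histories over a 100-movie catalogue and count the zeros. The users, movie IDs, and helper names are invented for illustration.

```python
# One-hot encode "has this user watched movie m?" for a 100-movie catalogue
# and measure the fraction of zero entries. Watch histories are made up.

N_MOVIES = 100

def one_hot_watched(watched_ids, n_movies=N_MOVIES):
    """Return a 0/1 vector with a 1 at each watched movie index."""
    row = [0] * n_movies
    for m in watched_ids:
        row[m] = 1
    return row

def sparsity(rows):
    """Fraction of entries that are zero."""
    total = sum(len(r) for r in rows)
    zeros = sum(r.count(0) for r in rows)
    return zeros / total

histories = {
    "user_a": [3, 17, 42],
    "user_b": [17],
    "user_c": [5, 8, 13, 21, 34],
}
rows = [one_hot_watched(ids) for ids in histories.values()]
print(f"{sparsity(rows):.0%} of entries are zero")  # 97% of entries are zero
```

Each user has watched only a handful of the 100 movies, so almost every entry in each row is 0.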
BigQueryML: Can control aspects of the training environment?
No.
AutoML: Can control aspects of the training environment?
Limited. For image and tabular datasets, you can specify the number of node hours to train for, and whether to allow early stopping of training.
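Early stopping ends training once validation performance stops improving. The flashcard does not say how AutoML decides when to stop, so the sketch below is a generic patience-based rule, an assumption rather than AutoML's actual criterion; the function name and loss values are hypothetical.

```python
# Generic patience-based early stopping: stop once validation loss has not
# improved for `patience` consecutive epochs. Loss values are made up.

def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best loss: reset the patience counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never ran out of patience: training uses every epoch.
    return len(val_losses) - 1

losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.55]
print(early_stopping_epoch(losses, patience=2))  # 4
print(early_stopping_epoch(losses, patience=3))  # 5
```

The example shows the trade-off behind the "allow early stopping" switch: stopping early saves training time, but with short patience it can miss a later improvement (epoch 5 here).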
Custom Training: Time to trained model?
Higher. More data preparation is required, and training application development is needed.