10 - AutoRecSys Flashcards
What is AutoRecSys?
Automation of the recommender-systems development pipeline, from data pre-processing through model selection to post-processing of predictions
What is the goal of AutoRecSys?
The goal of AutoRecSys is to make the development of a recommender system more efficient and accessible
What is the motivation of AutoRecSys?
- Automation of tedious components
- Focus on complex development tasks rather than time-consuming tasks
- Making development of recommender systems more accessible to the general public
- Many decisions in development are arbitrary
- Promote academic integrity and research
What is AutoML?
Automated Machine Learning provides methods and processes to make Machine Learning accessible to non-experts, to increase efficiency, and to advance research in Machine Learning
What is the CASH problem?
The Combined Algorithm Selection and Hyperparameter optimization problem: jointly choosing the best-performing algorithm and its hyperparameter configuration
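A common formalisation (following the Auto-WEKA line of work; the notation below is an illustrative assumption, not taken from these cards):

$$
A^{*}_{\lambda^{*}} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}} \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda},\, D^{(i)}_{\mathrm{train}},\, D^{(i)}_{\mathrm{valid}}\right)
$$

where $\mathcal{A}$ is the set of candidate algorithms, $\Lambda^{(j)}$ is the hyperparameter space of algorithm $A^{(j)}$, and $\mathcal{L}$ is the validation loss averaged over $k$ cross-validation splits.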
What is the Algorithm Selection Problem?
From a set of existing algorithms, choose the algorithm that performs best for the current problem
Why is algorithm selection also called a meta-learning approach?
Algorithm selection is performed with ML methods on ML algorithms - therefore it is called a meta-learning approach
What hyperparameter optimization methods are available?
- Grid Search
- Random Search
- Bayesian Hyperparameter Optimization
How does the hyperparameter optimization Grid Search work?
- Tests all combinations of given values for different parameters
- Exhaustive Search -> Simple but inefficient
- The given parameter grid may not contain good values -> the search never reaches a good result
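A minimal sketch of grid search in Python; the search space, value ranges, and the toy `evaluate` function are illustrative assumptions, not part of these cards:

```python
from itertools import product

# Hypothetical search space (values are illustrative assumptions)
param_grid = {
    "n_factors": [16, 32, 64],
    "reg": [0.001, 0.01, 0.1],
}

def evaluate(params):
    # Stand-in for "train a model and return its validation error";
    # here a toy function whose optimum is n_factors=32, reg=0.01.
    return (params["n_factors"] - 32) ** 2 + (params["reg"] - 0.01) ** 2

best_score, best_params = float("inf"), None
for values in product(*param_grid.values()):  # every combination -> exhaustive
    params = dict(zip(param_grid.keys(), values))
    score = evaluate(params)
    if score < best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

Note how the caveat above shows up here: if `param_grid` misses the good values, no amount of searching recovers them.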
How does the hyperparameter optimization Random Search work?
- Tests parameter values that are randomly sampled from a given interval
- Very high probability of finding a result close to the optimum within few iterations, provided the parameter intervals cover sufficient parts of the optimal space
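A matching sketch for random search, under the same toy assumptions (the interval bounds and `evaluate` are illustrative):

```python
import random

random.seed(0)

# Hypothetical continuous intervals (illustrative assumptions)
intervals = {
    "n_factors": (8, 128),   # sampled as an integer
    "reg": (1e-4, 1e-1),     # sampled uniformly; log-uniform is also common
}

def evaluate(params):
    # Toy stand-in for a validation error
    return (params["n_factors"] - 32) ** 2 + (params["reg"] - 0.01) ** 2

best_score, best_params = float("inf"), None
for _ in range(50):  # few iterations often get close to the optimum
    params = {
        "n_factors": random.randint(*intervals["n_factors"]),
        "reg": random.uniform(*intervals["reg"]),
    }
    score = evaluate(params)
    if score < best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```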
How does the hyperparameter optimization Bayesian Hyperparameter Optimization work?
- Structured approach to optimization
- Principle of exploration versus exploitation
- Very efficient, but mostly not parallelizable
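A minimal sketch of the principle, assuming scikit-learn and SciPy; the 1-D toy objective and all names are illustrative. A Gaussian-process surrogate model proposes the next evaluation point via expected improvement, which trades off exploitation (low predicted error) against exploration (high uncertainty):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy stand-in for an expensive validation error (to be minimised)
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))  # a few random initial evaluations
y = objective(X).ravel()

candidates = np.linspace(0, 1, 200).reshape(-1, 1)
for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True, alpha=1e-6).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    # Expected improvement: high where mu is low (exploitation)
    # or sigma is high (exploration)
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best parameter:", X[np.argmin(y)][0], "best error:", y.min())
```

The loop is inherently sequential: each proposal depends on all previous evaluations, which is why this method is hard to parallelize.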
What is Cross-Validation?
- Cross-Validation is the standard for evaluating machine learning models
- The data is split into k groups (folds)
- In each round, a model with the given hyperparameters is trained on k-1 groups and tested on the remaining group
- The average of the test errors is reported as the final result
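A minimal sketch with scikit-learn; the dataset, model, and `alpha=1.0` are illustrative choices:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold CV: the data is split into 5 groups; each group serves once
# as the test set while a model with fixed hyperparameters is trained
# on the remaining 4 groups.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("average test error:", -scores.mean())
```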
What problems does Cross-Validation have?
- The best algorithm is the one that achieves the best performance on a test set
- Cross-validation has no single, uniform test set; therefore the candidates should additionally be evaluated on a separate hold-out test set that is never changed and is the same for all of them
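A sketch of this fix, again with scikit-learn (models and dataset are illustrative): model selection uses cross-validation on the training data only, while the final comparison uses one fixed, shared test set:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_diabetes(return_X_y=True)

# One fixed test set, held out once and identical for every candidate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

candidates = {"ridge": Ridge(alpha=1.0), "lasso": Lasso(alpha=0.1)}
for name, model in candidates.items():
    # Model selection via CV on the training portion only ...
    cv_mse = -cross_val_score(model, X_train, y_train, cv=5,
                              scoring="neg_mean_squared_error").mean()
    # ... final comparison on the untouched, shared test set
    test_mse = ((model.fit(X_train, y_train).predict(X_test)
                 - y_test) ** 2).mean()
    print(f"{name}: cv={cv_mse:.1f}, test={test_mse:.1f}")
```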
What are the advantages of Bayesian Hyperparameter Optimization?
- Extremely powerful
- Works for any learning task
- Automated
What are the disadvantages of Bayesian Hyperparameter Optimization?
Evaluation takes very long across many models, especially because the process is mostly sequential and cannot easily be parallelized
What is Ensembling?
- Tool to extend or replace hyperparameter optimization
- Ensembles can reach the same performance as hyperparameter optimization but in much less time
What is the idea behind ensembling methods?
Ensembling methods are based on the idea that the weighted average prediction of many different models beats the performance of a single (optimised) model
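A minimal illustration of the weighted-average idea; the predictions and weights below are made-up numbers:

```python
import numpy as np

# Hypothetical predictions of three different models for the same items
# (illustrative values, e.g. predicted ratings)
preds = np.array([
    [3.8, 2.1, 4.6],  # model A
    [4.1, 2.4, 4.2],  # model B
    [3.6, 1.9, 4.9],  # model C
])
weights = np.array([0.5, 0.3, 0.2])  # e.g. based on validation performance

# Weighted average prediction of the ensemble
ensemble = weights @ preds
print(ensemble)  # one combined prediction per item
```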
What are the ensembling methods?
- Bagging
- Boosting
- Stacking
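A brief scikit-learn sketch of all three methods (dataset and hyperparameters are illustrative choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

models = {
    # Bagging: many models on bootstrap samples, predictions averaged
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50),
    # Boosting: models trained sequentially on their predecessors' errors
    "boosting": GradientBoostingRegressor(n_estimators=100),
    # Stacking: a meta-model learns to combine base-model predictions
    "stacking": StackingRegressor(
        estimators=[("tree", DecisionTreeRegressor(max_depth=4)),
                    ("ridge", Ridge())],
        final_estimator=Ridge()),
}
for name, model in models.items():
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: {mse:.1f}")
```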