Machine Learning Lifecycle Flashcards
Problem Definition
The first stage, in which the specific problem to be solved with machine learning is identified and clearly defined. The problem could be a prediction task, a classification task, anomaly detection, etc.
Data Collection
In this stage, the necessary data for training the model is collected. This can involve various sources such as databases, text files, APIs, or even web scraping.
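As a minimal sketch of combining two kinds of sources, the example below parses a CSV export and a JSON payload of the kind an API might return, then merges them into one list of records. The field names (`id`, `temperature`) and both input strings are made up for illustration; a real pipeline would read from actual files or endpoints.

```python
import csv
import io
import json

# Hypothetical inputs: a CSV export and a JSON body such as an API might return.
csv_text = "id,temperature\n1,21.5\n2,19.0\n"
api_response = '[{"id": 3, "temperature": 22.1}]'


def normalize(record):
    """Convert one raw record to a consistent schema and consistent types."""
    return {"id": int(record["id"]), "temperature": float(record["temperature"])}


def rows_from_csv(text):
    """Parse CSV text with a header row into normalized records."""
    return [normalize(r) for r in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse a JSON array of objects into normalized records."""
    return [normalize(r) for r in json.loads(text)]


# Merge both sources into a single dataset.
dataset = rows_from_csv(csv_text) + rows_from_json(api_response)
```

Normalizing every source through the same function keeps types consistent no matter where a record came from.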
Data Preprocessing
The collected data needs to be cleaned and transformed into a format suitable for machine learning. This can involve handling missing values, outliers, and errors, normalizing or scaling numeric data, and encoding categorical variables.
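Three of the steps named above can be sketched in plain Python: filling missing values with the column mean, min-max scaling, and one-hot encoding. The helper names and the `ages` and `colors` columns are hypothetical, and mean imputation is only one of several possible strategies.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector, one column per distinct value."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


# Hypothetical raw columns.
ages = [20, None, 40, 30]
colors = ["red", "blue", "red", "green"]

ages_filled = impute_mean(ages)          # missing value becomes the mean, 30
ages_scaled = min_max_scale(ages_filled)
colors_encoded = one_hot(colors)         # columns ordered blue, green, red
```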
Feature Engineering
This stage involves the creation of new features from existing ones, or the selection of the most relevant features for the ML task at hand. This can improve model performance and efficiency.
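To illustrate both halves of this card, the sketch below first derives a new feature from two existing ones, then keeps only the features whose variance exceeds a threshold. The housing records and the function name `select_by_variance` are invented for the example, and variance thresholding is just one simple selection method.

```python
from statistics import pvariance

# Hypothetical housing records; "country" is constant and so uninformative.
rows = [
    {"price": 200000, "area_sqm": 100, "country": 1},
    {"price": 150000, "area_sqm": 60, "country": 1},
    {"price": 330000, "area_sqm": 110, "country": 1},
]

# Feature creation: derive a ratio feature from two existing ones.
for row in rows:
    row["price_per_sqm"] = row["price"] / row["area_sqm"]


def select_by_variance(rows, threshold=0.0):
    """Return the feature names whose population variance exceeds threshold."""
    names = list(rows[0].keys())
    return [n for n in names if pvariance([r[n] for r in rows]) > threshold]


# Feature selection: the constant "country" column is dropped.
selected = select_by_variance(rows)
```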
Model Training
Here, a machine learning model is chosen and trained on the preprocessed data. This involves picking an algorithm suited to the learning setting (such as supervised or unsupervised learning) and fitting its parameters to the data.
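As one small instance of supervised training, the sketch below fits a one-feature linear regression using the closed-form least-squares solution. The data points are synthetic (generated from y = 2x + 1) and the function name `fit_linear` is hypothetical; real projects would normally use a library implementation.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic training data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear(xs, ys)


def predict(x):
    """Apply the trained model to a new input."""
    return slope * x + intercept
```

"Training" here means nothing more than estimating `slope` and `intercept` from labeled examples; more complex models follow the same pattern with more parameters.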
Model Evaluation
Once the model has been trained, it needs to be evaluated to see how well it performs. The data is typically split into a training set and a test set before training, and the model’s performance is then measured on the held-out test set using metrics suited to the task, such as accuracy for classification or mean squared error for regression.
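A minimal sketch of the split-then-measure workflow follows. The dataset, the 70/30 split, and the one-parameter threshold "model" are all assumptions chosen to keep the example short; the point is that the threshold is learned from the training set only and scored on data it has not seen.

```python
import random
from statistics import mean


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical dataset: label is 1 when the feature is at least 5.
data = [(x, int(x >= 5)) for x in range(10)]
train, test = train_test_split(data)

# "Train" on the training set only: threshold at the midpoint of class means.
mean_0 = mean([x for x, y in train if y == 0])
mean_1 = mean([x for x, y in train if y == 1])
threshold = (mean_0 + mean_1) / 2

# Evaluate on the held-out test set.
predictions = [int(x >= threshold) for x, _ in test]
test_accuracy = accuracy([y for _, y in test], predictions)
```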
Model Optimization
Based on the evaluation, the model may need to be optimized. This can involve tuning hyperparameters, choosing a different model, or going back to the feature engineering stage.
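Hyperparameter tuning can be sketched as a small grid search: train with each candidate value, score it on a validation set, and keep the best. The example assumes a k-nearest-neighbors classifier on one-dimensional points; the data (including a deliberately noisy label at 4.0), the candidate grid, and the function names are hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def validation_accuracy(train, valid, k):
    """Score one hyperparameter value on the validation set."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


# Hypothetical data; the point at 4.0 carries a noisy label.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
valid = [(1.5, 0), (3.9, 0), (4.4, 0), (6.5, 1), (8.5, 1)]

# Grid search over the hyperparameter k.
grid = [1, 3, 5]
scores = {k: validation_accuracy(train, valid, k) for k in grid}
best_k = max(scores, key=scores.get)  # ties resolve to the earliest candidate
```

With k = 1 the model copies the noisy label, while larger k values smooth it out. Tuning against a separate validation set, rather than the test set, keeps the final test score an honest estimate.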
Model Deployment
After the model has been optimized and tested, it’s ready for deployment in a real-world environment. This involves integrating the model into existing systems and processes.
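One very simple way to picture integration is to save the trained parameters to a file, load them in the serving process, and wrap prediction in a function that validates its input. The sketch below assumes a linear model stored as JSON; the file name, the `handle_request` function, and the payload shape are all hypothetical, and a production system would add a web framework, logging, and versioning.

```python
import json
import os
import tempfile


def save_model(params, path):
    """Persist model parameters as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Load model parameters saved by save_model."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def handle_request(params, payload):
    """Validate one request and return a prediction as a plain dict."""
    x = payload.get("x")
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return {"ok": False, "error": "field 'x' must be a number"}
    return {"ok": True, "prediction": params["slope"] * x + params["intercept"]}


# Training side: write the (hypothetical) trained parameters to disk.
trained = {"slope": 2.0, "intercept": 1.0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model(trained, path)
    # Serving side: load the parameters back.
    served = load_model(path)

good = handle_request(served, {"x": 3})
bad = handle_request(served, {"x": "three"})
```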
Monitoring and Maintenance
After deployment, the model needs to be monitored to ensure it continues to perform well as new data comes in, since the data it sees in production can drift away from the data it was trained on. Maintenance can involve regular retraining and updating of the model.
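Monitoring can be sketched as tracking accuracy over a sliding window of recent predictions and flagging the model for retraining when it drops below a threshold. The class name `AccuracyMonitor`, the window size, and the 0.8 threshold are assumptions for illustration, and the approach presumes true labels eventually become available.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions."""

    def __init__(self, window=100, threshold=0.8):
        # deque with maxlen discards the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, actual):
        """Store whether one prediction matched its true label."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag only once the window is full, to avoid noisy early alarms."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=5, threshold=0.8)

# Five correct predictions: the model looks healthy.
for _ in range(5):
    monitor.record(prediction=1, actual=1)
healthy_flag = monitor.needs_retraining()

# Two recent mistakes push windowed accuracy to 3/5.
for _ in range(2):
    monitor.record(prediction=1, actual=0)
degraded_flag = monitor.needs_retraining()
```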
Model Retirement
Finally, when a model is no longer needed or is outperformed by newer models, it should be retired and its resources reallocated.