Use Case and Evaluation Flashcards
What is a Data Science Use Case (DSUC)?
A scenario or project that creates business value through data-driven insights.
Why is identifying DSUCs important?
It helps organizations increase gain, reduce risk, and decrease effort.
What are the key steps in identifying a DSUC?
Define the problem, collect ideas, structure the ideas, define success, and assess potential risks.
What are operational-related DSUCs?
Use cases focused on optimizing operations, predicting failures, and improving product quality.
What are fraud-related DSUCs?
Use cases detecting unauthorized access, fraudulent behavior, and security threats.
What are customer-related DSUCs?
Use cases focused on improving customer experience, predicting churn, and optimizing marketing strategies.
What are the two types of evaluation for DSUCs?
Model-centric evaluation and business-centric evaluation.
What is model-centric evaluation?
Evaluating the predictive model’s performance using metrics like accuracy, precision, and recall.
What is business-centric evaluation?
Evaluating the impact of a model on business KPIs such as revenue, customer retention, and operational efficiency.
What is the Machine Learning Canvas?
A structured framework used to define, plan, and evaluate machine learning projects.
What are the key components of the Machine Learning Canvas?
Prediction Task, Decisions, Value Proposition, Data Collection, Data Sources, Impact Simulation, Making Predictions, Building Models, Features, and Monitoring.
What is customer churn?
The rate at which customers stop doing business with a company over a certain period.
Why is customer churn important to businesses?
Reducing churn helps retain valuable customers and improves profitability.
What type of machine learning task is customer churn prediction?
A supervised learning binary classification problem.
What features are used in churn prediction models?
Customer demographics, purchase history, subscription details, engagement levels, and payment history.
What data sources are used for churn analysis?
CRM databases, payment records, and website analytics.
How is model performance evaluated in churn prediction?
Using metrics such as accuracy, precision, recall, and F1-score.
What are the key steps in the machine learning workflow?
Feature extraction, data splitting, model training, evaluation, and deployment.
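The data-splitting step above can be sketched in a few lines. A minimal sketch with stdlib Python only; the integer records and the 80/20 split ratio are hypothetical stand-ins for a real dataset:

```python
import random

# stand-in records; in practice these would be feature rows for each customer
data = list(range(20))

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(data)    # shuffle before splitting to avoid ordering bias

split = int(0.8 * len(data))          # 80% train, 20% held-out test
train, test = data[:split], data[split:]
```

Keeping the test set untouched until final evaluation is what makes its score an honest estimate of performance on unseen data.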
What is overfitting in machine learning?
When a model performs well on training data but poorly on unseen data due to memorization.
How can overfitting be prevented?
Using techniques like regularization, cross-validation, and reducing model complexity.
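Of the techniques above, cross-validation is easy to sketch: split the data into k folds and rotate which fold is held out for validation. A minimal sketch with hypothetical index data, ignoring shuffling and uneven fold sizes:

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        val = idx[i * fold : (i + 1) * fold]          # held-out fold
        train = idx[: i * fold] + idx[(i + 1) * fold :]  # everything else
        yield train, val

splits = list(k_fold_indices(10, 5))  # 5 folds over 10 samples
```

Every sample appears in exactly one validation fold, so averaging the k validation scores gives a more stable estimate of generalization than a single train/test split.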
What is accuracy in model evaluation?
The proportion of correctly classified instances out of all predictions.
What are the limitations of accuracy?
It does not account for class imbalance: in problems such as fraud detection, a model that always predicts the majority class can achieve high accuracy while missing every fraud case.
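The accuracy pitfall is easy to demonstrate. A minimal sketch with a made-up 95/5 class imbalance, assuming 1 marks a fraudulent transaction:

```python
# 95 legitimate (0) and 5 fraudulent (1) transactions: a 95/5 imbalance
y_true = [0] * 95 + [1] * 5

# a useless model that always predicts "legitimate"
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
print(accuracy, caught)  # 0.95 accuracy, yet zero fraud cases caught
```

Recall (and hence the F1-score) on the fraud class is 0 here, which is why imbalanced problems are evaluated with precision and recall rather than accuracy alone.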
What is a confusion matrix?
A table used to evaluate classification models by displaying true positives, false positives, true negatives, and false negatives.
What is a Type I error (False Positive)?
Incorrectly classifying a negative instance as positive.
What is a Type II error (False Negative)?
Incorrectly classifying a positive instance as negative.
What is precision in classification?
The proportion of true positives among all predicted positives (TP / (TP + FP)).
What is recall in classification?
The proportion of actual positives that were correctly predicted (TP / (TP + FN)).
What is the F1-score?
The harmonic mean of precision and recall, balancing both metrics.
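The confusion-matrix counts are all that is needed to compute the three metrics above. A minimal sketch on hypothetical predictions:

```python
# hypothetical true labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# confusion-matrix counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)                              # TP / (TP + FP)
recall = tp / (tp + fn)                                 # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)      # harmonic mean
print(precision, recall, f1)  # 0.75 0.75 0.75
```

The harmonic mean punishes imbalance between the two: if either precision or recall drops toward zero, the F1-score drops with it.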
What are techniques for improving model performance?
Dimensionality reduction, hyperparameter tuning, and ensemble methods.
What is dimensionality reduction?
Reducing the number of features in a dataset to remove redundant or irrelevant information.
What are common dimensionality reduction techniques?
Principal Component Analysis (PCA) and feature selection methods.
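PCA finds the directions of greatest variance by eigendecomposing the covariance matrix. A hand-rolled sketch for the 2D case on toy, perfectly correlated data, using the closed-form eigenvalues of a 2x2 symmetric matrix; real projects would use a library such as `sklearn.decomposition.PCA`:

```python
import math

# toy 2D data lying exactly on the line y = x
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
n = len(points)

# center the data
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
dev = [(x - mx, y - my) for x, y in points]

# sample covariance matrix [[a, b], [b, c]]
a = sum(dx * dx for dx, _ in dev) / (n - 1)
b = sum(dx * dy for dx, dy in dev) / (n - 1)
c = sum(dy * dy for _, dy in dev) / (n - 1)

# closed-form eigenvalues of a 2x2 symmetric matrix
disc = math.sqrt((a - c) ** 2 + 4 * b * b)
lam1 = (a + c + disc) / 2   # variance along the first principal component
lam2 = (a + c - disc) / 2   # variance along the second

explained = lam1 / (lam1 + lam2)
print(explained)  # 1.0: one component captures all the variance
```

Because the two features are perfectly correlated, reducing from 2D to 1D here loses no information, which is exactly the redundancy PCA is meant to remove.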
What is hyperparameter tuning?
Optimizing the configuration settings of a model to improve performance.
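The simplest form of hyperparameter tuning is a grid search: evaluate every candidate value on validation data and keep the best. A minimal sketch tuning a classification threshold, with hypothetical scores, labels, and grid values:

```python
# hypothetical model scores and true labels for a validation set
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

def accuracy_at(threshold):
    """Validation accuracy when scores >= threshold are predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

# grid search: try each candidate, keep the one with the best validation score
grid = [0.3, 0.5, 0.7]
best = max(grid, key=accuracy_at)
print(best, accuracy_at(best))  # 0.5 1.0
```

The same loop generalizes to real hyperparameters (tree depth, regularization strength, learning rate); libraries such as scikit-learn wrap it with cross-validation in `GridSearchCV`.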
What are ensemble methods?
Techniques that combine multiple models to improve predictive accuracy, such as bagging and boosting.
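The core idea behind ensembles can be shown with majority voting, the aggregation step used in bagging. A minimal sketch with hypothetical predictions from three base classifiers:

```python
from collections import Counter

# hypothetical predictions from three base classifiers on five samples
model_preds = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]

# majority vote: each sample gets the label most base models agree on
ensemble = [Counter(col).most_common(1)[0][0] for col in zip(*model_preds)]
print(ensemble)  # [1, 0, 1, 1, 0]
```

As long as the base models make somewhat independent errors, the vote can be right even when individual models are wrong, which is why bagging tends to reduce variance; boosting instead trains models sequentially, each focusing on the previous models' mistakes.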
What is live evaluation in machine learning?
Continuously tracking model performance on real-world data to detect drift and degradation.
What is Return on Investment (ROI) in data science?
The financial benefit gained from implementing a data science solution relative to its cost.
Why is monitoring machine learning models important?
To ensure that model predictions remain accurate and aligned with business objectives.