General Machine Learning & Ethics Flashcards
What is the difference between supervised and unsupervised learning?
Supervised learning involves training a model on labeled data, while unsupervised learning deals with unlabeled data and seeks to find patterns or structures within it.
What is Machine Learning and what does it do?
Machine Learning is the development of algorithms and statistical models that enable computers to learn from data and make predictions or decisions without being explicitly programmed.
Supervised ML
Uses labeled datasets to train algorithms to classify or predict outcomes.
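Here is a minimal sketch of the supervised workflow. It assumes a tiny, hypothetical labeled dataset of (feature vector, label) pairs. The labels are what the algorithm learns from. A 1-nearest-neighbor rule, chosen here only for brevity, predicts the label of a new point from the closest labeled example.

```python
import math

# Hypothetical labeled training data: (features, label) pairs.
# Features are (height_cm, weight_kg); labels are categories.
training_data = [
    ((150.0, 50.0), "small"),
    ((160.0, 55.0), "small"),
    ((180.0, 80.0), "large"),
    ((190.0, 90.0), "large"),
]


def predict(features, labeled_data):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, closest_label = min(
        labeled_data,
        key=lambda pair: math.dist(features, pair[0]),  # Euclidean distance
    )
    return closest_label


print(predict((155.0, 52.0), training_data))  # small
print(predict((185.0, 85.0), training_data))  # large
```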
What is required for Supervised ML?
You need labeled data: each training example must come with the correct output (its label).
Unsupervised ML
Uses algorithms to analyze and cluster unlabeled datasets.
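An unsupervised algorithm, by contrast, receives only the inputs. The sketch below is a plain-Python k-means on hypothetical one-dimensional values. It groups the points into k clusters without ever seeing a label. The data, the choice of k, and the naive initialization (the first k points) are illustrative assumptions.

```python
def kmeans_1d(values, k, iterations=10):
    """Cluster 1-D values into k groups with Lloyd's k-means algorithm."""
    centroids = list(values[:k])  # naive initialization: first k points
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two natural groups.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans_1d(data, k=2)
print(sorted(centroids))
print(clusters)
```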
Once an algorithm is deployed, ____ learning will manage data as it comes in and classify or analyze it.
Unsupervised
When is Linear Regression used?
Linear regression models are used when the result must be a continuous variable.
Ex. predicting rainfall amounts in inches
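The following is a minimal least-squares sketch for a single predictor. The humidity and rainfall numbers are invented for illustration. The slope and intercept formulas are the standard closed-form ordinary-least-squares solution for one feature. The prediction is a continuous value in inches, not a category.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: relative humidity (%) vs. rainfall (inches).
humidity = [40, 50, 60, 70, 80]
rainfall = [0.0, 0.5, 1.0, 1.5, 2.0]

slope, intercept = fit_simple_linear_regression(humidity, rainfall)
predicted = slope * 65 + intercept  # a continuous prediction, in inches
print(round(predicted, 2))  # 1.25
```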
Supervised ML - Classification
Classification models deliver results as a categorical variable, which can take only a finite set of values. Ex. a rain forecast with two possible results: Will Rain or Won’t Rain.
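A classification model outputs one of a fixed set of labels. The sketch below uses hypothetical, scaled humidity readings labeled with the two categories from this card. It trains a one-feature logistic regression with plain batch gradient descent. A 0.5 threshold then maps the predicted probability to "Will Rain" or "Won't Rain". The learning rate, epoch count, feature scaling, and threshold are illustrative choices, not tuned values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def classify(x, w, b, threshold=0.5):
    """Turn the predicted probability into one of two categorical labels."""
    return "Will Rain" if sigmoid(w * x + b) >= threshold else "Won't Rain"


# Hypothetical data: humidity scaled to [0, 1]; 1 = it rained, 0 = it did not.
humidity = [0.20, 0.30, 0.35, 0.60, 0.75, 0.90]
rained = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(humidity, rained)
print(classify(0.25, w, b))  # Won't Rain
print(classify(0.85, w, b))  # Will Rain
```

Logistic regression is only one way to build such a classifier. It is used here because its output is a probability, which makes the step from a continuous score to a categorical result explicit.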
What are the 3 main goals of the analyze stage in machine learning?
- Understand the response variable and how it is structured: continuous or categorical?
- Explore the predictor variables.
- Perform feature engineering.
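The sketch below shows the three goals on a small, hypothetical weather table stored as a list of dicts. Treating text or boolean values as categorical and numbers as continuous is a rough heuristic, not a standard. The engineered feature, a temperature-humidity product, is just one illustrative example.

```python
def response_type(values):
    """Heuristic: text or boolean labels are categorical, numbers are continuous."""
    if all(isinstance(v, (str, bool)) for v in values):
        return "categorical"
    if all(isinstance(v, (int, float)) for v in values):
        return "continuous"
    return "mixed (inspect manually)"


def add_interaction_feature(rows):
    """Feature engineering: derive a new predictor from two existing ones."""
    for row in rows:
        row["temp_x_humidity"] = row["temp_c"] * row["humidity"]
    return rows


# Hypothetical weather records.
rows = [
    {"temp_c": 20.0, "humidity": 0.8, "rain_in": 0.4, "outcome": "Will Rain"},
    {"temp_c": 30.0, "humidity": 0.3, "rain_in": 0.0, "outcome": "Won't Rain"},
]

# Goal 1: how is each candidate response variable structured?
print(response_type([r["rain_in"] for r in rows]))  # continuous
print(response_type([r["outcome"] for r in rows]))  # categorical

# Goal 2: explore a predictor, here just its observed range.
temps = [r["temp_c"] for r in rows]
print(min(temps), max(temps))  # 20.0 30.0

# Goal 3: engineer a new feature.
rows = add_interaction_feature(rows)
```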
Will my machine learning model change over time?
For a model to predict accurately, the data it makes predictions on must have a distribution similar to that of the data the model was trained on.
Because data distributions can be expected to drift over time, deploying a model is not a one-time exercise but rather a continuous process.
Continuous monitoring of incoming data can help retrain your model on newer data if the data distribution has deviated significantly from the original training data distribution.
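The sketch below shows one way to implement this monitoring. It compares a feature's training-time values with recent incoming values using the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs. The model is flagged for retraining when that gap exceeds a threshold. The 0.3 threshold, the function names, and the sample data are hypothetical. In practice, the threshold, the monitored features, and the choice of drift test are all design decisions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def needs_retraining(training_values, incoming_values, threshold=0.3):
    """Hypothetical rule: retrain when the distributions drift past the threshold."""
    return ks_statistic(training_values, incoming_values) > threshold


training = [10, 12, 11, 13, 12, 11, 10, 12]
similar = [11, 12, 10, 13, 11, 12]
drifted = [18, 20, 19, 21, 20, 19]

print(needs_retraining(training, similar))  # False
print(needs_retraining(training, drifted))  # True
```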
How can I determine that machine learning is the right solution?
- Requires complex logic
- Requires scalability
- Requires personalization
- Requires responsiveness
What are the reasons to NOT use machine learning?
- Can be solved with traditional algorithms
- Does not require adapting to new data
- Requires 100% accuracy
- Requires full interpretability
Is my data ready for a machine learning solution?
- Is it easily accessible?
- Does it respect privacy?
- Is it relevant?
What is popularity bias in the context of machine learning?
Popularity bias refers to the phenomenon where more popular items are recommended more frequently by a system, often overlooking other items that could be just as pleasing to users.
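The toy example below shows how this bias can arise, using a hypothetical interaction log. The recommender simply ranks items by how often they have been consumed. As a result, it keeps surfacing the same few popular items, and niche items are never recommended, however well they might suit a given user.

```python
from collections import Counter

# Hypothetical log of (user, item) interactions.
interactions = [
    ("u1", "hit_song"), ("u2", "hit_song"), ("u3", "hit_song"),
    ("u4", "hit_song"), ("u1", "pop_song"), ("u2", "pop_song"),
    ("u3", "pop_song"), ("u5", "indie_song"), ("u6", "jazz_song"),
]


def recommend_most_popular(log, user, k=2):
    """Recommend the k most-consumed items the user has not seen yet."""
    counts = Counter(item for _, item in log)
    seen = {item for u, item in log if u == user}
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:k]


# Every user gets the same popular items; in this data, 'indie_song' and
# 'jazz_song' never appear in anyone's recommendations.
for user in ["u5", "u6", "u7"]:
    print(user, recommend_most_popular(interactions, user))
```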
Why is it important for data professionals to prioritize fairness in their data?
- It reduces the potential for unintended consequences of machine learning applications, including the perpetuation of human biases.
- It is part of responsible data stewardship.
What are some ethical considerations when building a model?
- Ensuring informed consent for the use of personal data
- Considering who is affected by the model and the potential for harm
- Ensuring the data is appropriate and representative
- Considering the explainability of the model’s predictions
- Regularly reviewing and monitoring the model’s performance
What is a black box model?
A black box model is a model whose internal workings are difficult to interpret, so it’s hard to understand how it arrived at its predictions.
What is one way to evaluate model fairness?
One way to evaluate model fairness is to check how the model’s errors are distributed across the population. If the model’s errors are concentrated in specific, similar cases (for example, one group of people), it could carry a higher ethical risk.
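Here is a minimal sketch of that check. It assumes hypothetical predictions tagged with a group attribute, computes the error rate separately for each group, and compares the results. A large gap between groups suggests that the model's mistakes are concentrated in specific cases and deserve closer review. How large a gap is acceptable is a judgment call. The data below is invented.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: fraction of records where prediction != actual}."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, actual, predicted in records:
        totals[group] += 1
        if actual != predicted:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}


# Hypothetical (group, actual label, predicted label) records.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 1),
]

print(error_rate_by_group(records))  # {'A': 0.0, 'B': 0.75}
```

Overall accuracy here is 62.5%, a figure that hides the fact that every error falls in group B. This is why the per-group breakdown matters.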
What is a potential risk in decision-making in machine learning?
A potential risk is exposing a business and the people it serves to negative consequences.
Where does bias in machine learning originate from?
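From human bias, which can enter through the data used to train the model and the choices people make when building it.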
Why can bias in machine learning be deceptive?
It can be deceptive because even though the bias stems from humans, the computer making the prediction can give the result an appearance of objectivity.
What are some questions that help you consider the fairness of your model?
- If your model uses personal information, have these people given their consent for you to collect and use this data?
- Is there a way for them to withdraw their consent?
- Are they aware of what you’re doing with their information?
Explain the bias-variance trade-off in machine learning.
The trade-off between a model’s ability to fit the training data (low bias) and its ability to generalize to new, unseen data (low variance).
- High bias can result in underfitting.
- High variance can lead to overfitting.
Balancing these aspects is essential for optimal model performance.
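For squared-error loss, the trade-off can be stated precisely. Assume the data are generated as y = f(x) + ε, where ε is zero-mean noise with variance σ² and is independent of the training set. Let \hat{f} be the model fitted on a random training set. Taking expectations over training sets and noise, the expected test error at a point x decomposes as:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A very simple model keeps the variance term small but may leave a large bias term (underfitting). A very flexible model does the reverse (overfitting). No model can reduce the σ² term.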