9 - Ethics Flashcards by Kaman Hung

What was the main topic of the calendar invitation Kamala received?

Data Ethics

The invitation was titled “Urgent: Data Ethics” and was sent after a competitor was found to be using a discriminatory actuarial model.

What incident triggered the urgent meeting about data ethics?

A competitor implemented a discriminatory actuarial model leading to higher premium increases for Black patients

This incident was discovered when a hacker accessed the competitor’s data systems.

What key considerations are necessary for ensuring data integrity?

Data must be collected, stored, and used with integrity to minimize harm.

What did the CEO emphasize regarding the use of data?

No use of data that exacerbates or creates undue harm to patients will be condoned.

What does the data science code of ethics relate to?

It includes principles for data scientists to follow to prevent misuse of data.

What are the four key concepts in data ethics mentioned?

Fairness
Privacy and security
Transparency and reproducibility
Social impact of data

What is a misconception about machine learning models and bias?

That using data to make decisions eliminates human bias.

How can bias in machine learning models occur?

If the underlying data or decisions made when collecting data were biased.

What famous example illustrated the consequences of training data bias?

Google Photos algorithm mislabeling images of people with dark skin as gorillas.

What is one measure of fairness in machine learning models?

Group fairness, which measures whether subjects in each group have equal probabilities of being assigned to a certain outcome class.
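
One way to make group fairness concrete is to compare the share of positive predictions each group receives. The sketch below is illustrative only: the function name `positive_rate_by_group`, the group labels, and the toy predictions are all assumptions, not part of the original material.

```python
from collections import defaultdict


def positive_rate_by_group(groups, predictions):
    """Return the share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical toy data: a group label and a binary model output per subject.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 0, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, predictions)
# Gap between the most and least favored group; 0 would mean equal rates.
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

A gap near zero suggests the groups are assigned to the positive class at similar rates; how large a gap is acceptable is a judgment call that this sketch does not settle.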

What is another measure of model fairness?

Accuracy comparability across groups of interest.
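
Accuracy comparability can be checked by computing accuracy separately for each group of interest. The following is a minimal sketch with made-up labels and predictions; `accuracy_by_group` is a hypothetical helper, not a library function.

```python
def accuracy_by_group(groups, y_true, y_pred):
    """Return the fraction of correct predictions within each group."""
    correct = {}
    total = {}
    for group, truth, pred in zip(groups, y_true, y_pred):
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data: true outcomes and model predictions for two groups.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

accuracy = accuracy_by_group(groups, y_true, y_pred)
print(accuracy)
```

In this toy example the model is much more accurate for group A than for group B, which is the kind of disparity a fairness analysis is meant to surface.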

What can be done if bias is found in a model’s performance?

Identify biased data sources, remove them, and retrain the model.

What is adversarial debiasing?

A method to address model bias by incorporating fairness into model development and training.

Why is there a debate about including race in prediction models?

Including race may recapitulate existing racial biases and inequities.

What are some ethical principles proposed for data scientists?

Respect privacy of data subjects
Acknowledge limitations of one’s knowledge
Recognize data represents real people and situations
Avoid causing societal harm

What does the data science oath emphasize regarding data subjects?

Respect for their privacy and security.

What should be ensured before model deployment?

Conduct fairness analysis to check for biases.

What is the potential risk of using a biased model in healthcare?

It may exacerbate existing inequities in health outcomes.

What approach can be taken to ensure equitable model performance?

Measure model performance across different demographic groups.

What is the significance of testing for fairness in data models?

To ensure that no group is unfairly disadvantaged by the model’s predictions.

What did Kamala compare the machine learning bias issue to?

The lack of diverse images in medical textbooks affecting doctor training.

What is the outcome of deploying a model with known bias?

It can lead to harmful consequences for affected populations.

What is the debate regarding the inclusion of race in prediction models?

The debate centers on whether including race perpetuates racial biases or whether its value as a strong predictor of outcomes justifies its use.

What are some best practices for dealing with race in predictive modeling?

Check model performance across racial groups
Assess if race is a proxy variable
Ensure data quality for the race variable

What is an example of low-quality racial data in healthcare?

Grouping diverse racial groups, such as all Asians, into a single category, which masks important health differences.

What industries may have legal restrictions on modeling sensitive variables?

Credit scoring
Actuarial modeling

What is data drift?

Data drift refers to changes in the underlying distribution of data over time.

What is concept drift?

Concept drift is a change in the outcome variable or the relationship between outcomes and predictors over time.

What is feature drift?

Feature drift occurs when the distribution of predictor variables changes over time.

What is the first step in dealing with data drift?

Recognizing that data drift is happening.

How can you monitor data drift?

By comparing new input data with original training data on a recurring basis.
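
A recurring comparison of new inputs against the training data could look like the sketch below. It uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions) as one possible drift measure; the feature, the sample values, and `DRIFT_THRESHOLD` are hypothetical choices, not something the source prescribes.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


# Hypothetical feature values: patient ages at training time vs. in new data.
training_ages = [30, 35, 40, 45, 50, 55, 60, 65]
new_ages = [50, 55, 60, 65, 70, 75, 80, 85]
DRIFT_THRESHOLD = 0.3  # assumed cut-off; tune per feature in practice

drift = ks_statistic(training_ages, new_ages)
drift_detected = drift > DRIFT_THRESHOLD
print(drift, drift_detected)
```

Running a check like this on a schedule for each predictor is one way to recognize feature drift early; it says nothing by itself about concept drift, which requires tracking outcomes as well.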

What is the ethical imperative related to data drift?

To recognize and address any decline in model performance to prevent patient harm.

What is HIPAA?

The Health Insurance Portability and Accountability Act, which governs the handling of protected health information.

What is a need-to-know basis in data handling?

Accessing only the minimum information necessary for an individual's role.
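
In code, a need-to-know policy can be approximated by filtering each record down to the fields a role is allowed to see. This is only a sketch: the roles, field names, and the `minimum_necessary` helper are invented for illustration and do not reflect any particular system or legal standard.

```python
# Hypothetical mapping from role to the fields that role needs for its work.
ROLE_FIELDS = {
    "billing": {"patient_id", "insurance_plan"},
    "analyst": {"age", "diagnosis_code"},
}


def minimum_necessary(record, role):
    """Return only the fields of `record` that `role` is allowed to access."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in record.items() if key in allowed}


# Hypothetical patient record.
record = {
    "patient_id": 17,
    "name": "Jane Doe",
    "age": 54,
    "diagnosis_code": "E11",
    "insurance_plan": "Gold",
}

analyst_view = minimum_necessary(record, "analyst")
print(analyst_view)
```

Defaulting unknown roles to an empty view is a deliberate choice here: access has to be granted explicitly rather than assumed.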

What is the importance of informed consent in data collection?

It is critical for ensuring ethical and legal compliance, especially in healthcare.

What does reproducibility in science mean?

That two scientists should be able to design independent experiments and achieve concordant results.

What is the difference between repeatability and replicability?

Repeatability: Same scientist obtains nearly identical results
Replicability: Different scientists achieve the same results using the same methodology

What can a lack of reproducibility indicate?

Genuine scientific uncertainty
Human or measurement error
Potential misconduct or bad science

What is an example of a retracted study due to misconduct?

The study linking the MMR vaccine to autism, which was found to have fabricated data.

Why can reproducibility be challenging in medicine?

Because experiments are complex and have many variables, making standardization difficult.

What role does sharing data and code play in reproducibility?

It allows for error checking, cross-verification of methodology, and enhances overall transparency.

What tools facilitate the sharing of code in data science?

GitHub and similar collaborative platforms.

What is the significance of sharing code and data in research?

It allows for cross-checking methodologies and improving reproducibility.

What is GitHub used for in the context of code sharing?

It allows people to collaboratively write, review, and share code.

What is the Harvard Dataverse?

A repository where researchers can store and share their data.

What are some reasons for not sharing code or data?

Intellectual property and privacy concerns.

What challenges do researchers face when replicating experiments from written descriptions?

Key steps and details may be omitted, leading to different outcomes.

True or False: Documenting analysis methods is unnecessary for good science.

False.

How can inadequate documentation affect team resources?

It can waste resources as team members struggle to understand past analyses.

What is a 'deep fake'?

A computer-generated video that looks extremely realistic.

What concerns do AI developers have regarding their technology?

The potential for the technology to be used to harm others.

What is an example of a proactive approach to monitor potential harms in model development?

Setting up systems to monitor adverse events.

What is cherry-picking in data analysis?

Manipulating data to support a preconceived conclusion.

What should be done to prevent data manipulation in analysis?

Prespecify the research question and analysis plan.

What can happen if different analysts interpret the same data set?

They may arrive at different conclusions.

What is the consequence of changing the analysis approach to fit initial hypotheses?

It can lead to biased and unreliable results.

What is the role of the data science team in maintaining analysis integrity?

Ensure objectivity and document the analysis plan.

What is the importance of ethical data stewardship?

It helps prevent bias, misinterpretation, and breaches in security.

Fill in the blank: A system for ethical data analysis and interpretation should be adopted by all _______.

stakeholders.

What key question addresses the potential malicious use of a project?

Could this project be used in malicious ways?

What should be monitored to ensure the model performs fairly?

Changes in data and model performance over time.

What variables should be avoided to prevent bias in model output?

Variables that may bias the output, such as sensitive attributes like race or variables that act as proxies for them.

Which laws and regulations might be applicable to our project?

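Relevant laws and regulations, such as HIPAA for protected health information or legal restrictions on modeling sensitive variables in credit scoring and actuarial work.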

How do we ensure future reproducibility of our analysis?

By saving the code and data.
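
Saving code and data is more useful when the saved analysis can be checked later. The sketch below records a checksum of the input data and the random seed alongside the result, so a rerun can confirm it used the same inputs. The CSV content, the seed value, and the helper names `fingerprint` and `run_analysis` are assumptions made for illustration.

```python
import hashlib
import random


def fingerprint(data_bytes):
    """Return a SHA-256 checksum identifying the exact input data."""
    return hashlib.sha256(data_bytes).hexdigest()


def run_analysis(values, seed):
    """Toy analysis: mean of a random sample, made repeatable by a fixed seed."""
    rng = random.Random(seed)
    sample = rng.sample(values, k=3)
    return sum(sample) / len(sample)


# Hypothetical raw data as it would be stored next to the code.
raw = b"patient_id,age\n1,34\n2,51\n3,47\n4,62\n"
values = [int(line.split(",")[1]) for line in raw.decode().splitlines()[1:]]
SEED = 2024  # assumed seed; any fixed value works

# Record everything needed to rerun and verify the analysis later.
record = {
    "data_sha256": fingerprint(raw),
    "seed": SEED,
    "result": run_analysis(values, SEED),
}
print(record)
```

If a later rerun produces a different checksum or result, the data or the code has changed since the original analysis.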

What does it mean to prespecify an analysis approach?

To define the analysis plan before conducting the analysis.

True or False: Ethical data analysis includes documenting results interpretation.

True.