FACT Flashcards
What is FACT? (generic)
Fairness, accountability, confidentiality and transparency
What is F in FACT?
Fairness
- Examples:
Amazon hiring model
Biased: it disfavoured women
US COMPAS score for likelihood of recidivism for criminals
Black defendants were more likely to be wrongly labelled as high risk
White defendants were more likely to be wrongly labelled as low risk
More Issues:
Correlated proxy variables cannot always be identified as such
Some subgroups may be unexpectedly discriminated against
- Solutions:
Leave data about protected categories out
Check for common biases before deploying
Create fair training sets
Create diverse data science teams
Discuss difficult questions
- Caveats:
There is no bias-free learning
What we call ‘neutral’ is always a cultural choice
There is no silver bullet to solve the issue
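One way to "check for common biases before deploying" is to compare error rates per group, which is the kind of gap reported for COMPAS. The sketch below is only an illustration: the function name, the record layout and the toy data are assumptions, not part of the course material.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """records: iterable of (group, predicted_high_risk, actually_reoffended).

    Returns, per group, the share of people who did NOT reoffend
    but were still predicted to be high risk.
    """
    false_pos = defaultdict(int)  # predicted high risk, did not reoffend
    negatives = defaultdict(int)  # everyone who did not reoffend
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_pos[group] += 1
    return {g: false_pos[g] / n for g, n in negatives.items() if n > 0}

# Made-up toy data, for illustration only.
toy = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, True),
]
rates = false_positive_rate_by_group(toy)
print(rates)
```

On this toy data group A has a false positive rate of about 0.67 and group B about 0.33. A gap like that is a signal to investigate, not proof of discrimination on its own.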
What is A in FACT?
Accountability
- Problem is the responsibility gap
- For a person to be held responsible, they must have control over their behaviour and the resulting consequences
- If a machine malfunctions, the manufacturer is responsible
- Manufacturers/operators of ML models are not always in control of the outcome, so are not morally responsible in the same sense
- New concepts of responsibility are needed to be able to have proper accountability
What is C in FACT?
Confidentiality of information and the privacy of people, both enabled by security
What is T in FACT?
Transparency
- How to clarify automated decisions such that they become indisputable?
- How to ensure that transparency will not become an overflow of information?
- How to explain such a complex mathematical system?
What is K-anonymization?
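A mathematical model which ensures that every record is indistinguishable from at least k-1 other records on its quasi-identifiers (attributes such as postcode or birth year that can identify someone in combination)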
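A minimal sketch of how k could be measured for a small table: group the rows by their quasi-identifier values and take the size of the smallest group. The function name `k_of`, the column names and the rows are assumptions made up for this example.

```python
from collections import Counter

def k_of(rows, quasi_identifiers):
    """Return the largest k for which the rows are k-anonymous.

    Rows sharing the same quasi-identifier values form one group;
    k is the size of the smallest such group.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

# Made-up, already generalised records (zip codes and ages are coarsened).
rows = [
    {"zip": "10**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "10**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "11**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "11**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "11**", "age": "30-39", "diagnosis": "flu"},
]
k = k_of(rows, ["zip", "age"])
print(k)
```

Here the smallest group has two rows, so the table is 2-anonymous with respect to zip and age.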
What is a linkage attack?
- Combines anonymized data with other data sources; the links between them can lead back to a person who was anonymized
- How to prevent:
1. Store and process data locally;
2. Anonymization when presenting results;
3. Data minimisation when collecting and when deleting data;
4. Make sure to overwrite the disk.
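To see why these precautions matter, here is a rough sketch of a linkage attack: a de-identified table is joined with a public register on shared attributes. All names, fields and records are invented for illustration; the `link` helper is hypothetical.

```python
def link(deidentified, public, keys):
    """Re-identify records whose key values match exactly one public entry."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for record in deidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the person is re-identified
            matches.append((names[0], record))
    return matches

# Made-up "anonymized" medical data: names removed, other attributes kept.
deidentified = [
    {"zip": "1012", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1013", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
]
# Made-up public register containing names and the same attributes.
public = [
    {"name": "Alice Example", "zip": "1012", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "1013", "birth_year": 1990, "sex": "M"},
    {"name": "Carl Example", "zip": "1013", "birth_year": 1990, "sex": "M"},
]
found = link(deidentified, public, ["zip", "birth_year", "sex"])
print(found)
```

Only the first record is re-identified, because its attribute combination is unique in the register. The second record matches two people and stays ambiguous.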
What is differential privacy?
- Mathematical definition of privacy
- Methods:
1. Adds noise to the individual entries in a way that can still be analysed;
2. Adds noise that can be accounted for in the statistics;
- Allows for research with private enough data;
- Several existing algorithms;
- Trade-off between privacy and accuracy
- Technique meant for large data sets;
- The more queries, the more of the data the results cover and the more privacy is lost;
- Privacy budgets limit the number of queries.
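The noise and the privacy budget can be sketched together. The example below assumes the Laplace mechanism for a counting query (sensitivity 1, so the noise scale is 1/epsilon), which is one common choice and not necessarily the one meant in the course. The class `PrivateCounter` and its parameter names are hypothetical.

```python
import random

class PrivateCounter:
    """Answers counting queries with Laplace noise until the budget is spent."""

    def __init__(self, data, total_epsilon, rng=None):
        self._data = list(data)
        self._remaining = total_epsilon       # privacy budget still available
        self._rng = rng or random.Random()

    def noisy_count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon            # every query costs privacy
        true_count = sum(1 for x in self._data if predicate(x))
        scale = 1.0 / epsilon                 # a count has sensitivity 1
        # The difference of two exponential samples is Laplace(0, scale) noise.
        noise = (self._rng.expovariate(1 / scale)
                 - self._rng.expovariate(1 / scale))
        return true_count + noise

ages = [23, 35, 41, 29, 52, 38, 61, 19]       # made-up data
counter = PrivateCounter(ages, total_epsilon=1.0)
print(counter.noisy_count(lambda age: age >= 30, epsilon=0.5))
print(counter.noisy_count(lambda age: age >= 30, epsilon=0.5))
# A third query would raise RuntimeError: the budget of 1.0 is used up.
```

A smaller epsilon gives more noise (more privacy, less accuracy), which is the trade-off named above. On a data set this small the noise dominates the answer, which is why the technique is meant for large data sets.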
What is Privacy?
Privacy:
- Anonymization is an intuitive technique (de-identification):
- Means removing the identifying marks from the data
- De-identifying data does not mean that services are protecting your personal details; it only means that they remove “your” from the details when sharing
- Not really anonymous
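A small sketch of what de-identification typically amounts to: dropping the direct identifiers and sharing the rest. The field names and the list of identifiers are assumptions chosen for this example.

```python
# Assumed list of direct identifiers; real data sets will differ.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def deidentify(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct identifier fields."""
    return {k: v for k, v in record.items() if k not in identifiers}

# Made-up record.
record = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "zip": "1012",
    "birth_year": 1985,
    "diagnosis": "asthma",
}
shared = deidentify(record)
print(shared)
```

The shared record no longer carries a name, but zip code and birth year are still there. Such remaining attributes are what make the data "not really anonymous".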