CHAP 11 : AI and societal issues Flashcards
What are the 7 rights that individuals have over their personal data under the EU's General Data Protection Regulation (GDPR)?
- Right of access: data subjects can access the personal data that the company processes about them.
- Right to rectification: data subjects can have the data they provided to the company corrected or updated when they believe it is inaccurate or out of date.
- Right to erasure: data subjects can have their data removed from the database, e.g. when they withdraw consent or the data is no longer needed.
- Right to restrict processing: data subjects can request that processing of their data be restricted, e.g. if they contest the accuracy of the data or object to unlawful processing.
- Right to data portability: data subjects can receive the personal data held by the company controlling it in a commonly used format, and send that data to another company for their own purposes.
- Right to object: data subjects can object to the processing of their data, including profiling, on relevant grounds.
- Right not to be subject to a decision based solely on automated processing (including profiling) that has legal or similarly significant effects on them.
What is anonymization?
It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous
How can anonymisation be carried out on hospitals' medical data?
Why does anonymisation not always work well?
- Patient records contain fields such as Name, Zipcode, Birthday and Sex. Anonymisation can be carried out by replacing the name with an anonymous ID.
- Anonymisation in this case may not work well, because an individual's data may still be recognisable from the remaining fields. E.g. in Cambridge, 6 people shared the governor's birthday, only 3 of them were male, and only 1 of those (the governor) lived in his zipcode. Anyone who buys another dataset containing these fields together with names (e.g. a voter list) can link the two and identify the governor's personal (medical) details.
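The linkage attack described above can be sketched in Python. Everything here is hypothetical: the records, the `anonymise` and `link` helpers, and the `voters` list standing in for a purchasable dataset with names. The sketch replaces names with IDs, then re-identifies any record whose (Zipcode, Birthday, Sex) combination is unique in the public list.

```python
from collections import defaultdict

# Fields that are not names but can still single a person out.
QUASI_IDENTIFIERS = ("zipcode", "birthday", "sex")


def anonymise(records):
    """Replace each name with an anonymous ID; all other fields are kept."""
    out = []
    for i, rec in enumerate(records, start=1):
        clean = {k: v for k, v in rec.items() if k != "name"}
        clean["id"] = f"P{i:03d}"
        out.append(clean)
    return out


def link(anonymised, public):
    """Match anonymised rows to a public list with names, using only the
    quasi-identifiers. Returns {anonymous id: name} for unique matches."""
    index = defaultdict(list)
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index[key].append(person["name"])
    matches = {}
    for rec in anonymised:
        key = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # unique combination -> re-identified
            matches[rec["id"]] = names[0]
    return matches


# Hypothetical hospital records and public voter-style list (made-up data).
hospital = [
    {"name": "Person A", "zipcode": "02138", "birthday": "1950-01-01", "sex": "M", "diagnosis": "X"},
    {"name": "Person B", "zipcode": "02139", "birthday": "1950-01-01", "sex": "F", "diagnosis": "Y"},
]
voters = [
    {"name": "Person A", "zipcode": "02138", "birthday": "1950-01-01", "sex": "M"},
    {"name": "Person C", "zipcode": "02139", "birthday": "1950-01-01", "sex": "M"},
    {"name": "Person D", "zipcode": "02139", "birthday": "1950-01-01", "sex": "F"},
    {"name": "Person B", "zipcode": "02139", "birthday": "1950-01-01", "sex": "F"},
]

print(link(anonymise(hospital), voters))
```

Here P001 is re-identified as Person A (a unique combination, so the diagnosis leaks), while P002 is not, because two voters share that combination.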
What is differential privacy?
A system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
It prevents individual records from being identified by adding noise to the data in a controlled way, while still allowing valuable insights to be extracted from the data.
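One common way to add noise "in a controlled way" is the Laplace mechanism; the card does not name a mechanism, so treat this as an assumed illustration. The hypothetical `private_count` answers a counting query: adding or removing one record changes a count by at most 1, so Laplace noise with scale 1/ε is enough for ε-differential privacy on that single query. The `ages` data is made up.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two i.i.d.
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of records satisfying `predicate`.

    A count has sensitivity 1 (one record changes it by at most 1),
    so Laplace noise with scale 1/epsilon gives epsilon-DP for this query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 61, 47]  # made-up data; true count of > 40 is 4
print(private_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng))
```

Each individual answer is noisy, but the noise has mean 0, so aggregate patterns (e.g. an average over many queries) remain close to the truth.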
When is a mechanism considered differentially private?
If the probability of any outcome is nearly the same (within a factor of e^ε, where ε is the privacy level) for any two datasets that differ in only one record.
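The standard formal definition of ε-differential privacy makes "nearly the same" precise. A randomised mechanism M is ε-differentially private if, for every pair of datasets D and D' that differ in one record and every set of outcomes S:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

A smaller ε forces the two probabilities closer together, i.e. stronger privacy.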
What are the 2 advantages of differential privacy?
- Robustness to post-processing (the protection holds against post-processing): it ensures that predictions do not heavily depend on any single data point, and any further processing of the output cannot weaken this guarantee.
- Composability:
- if A1(D) satisfies some privacy definition with level ε1 on dataset D,
- and A2(D) satisfies the same privacy definition with level ε2 on the same dataset D,
- then releasing both A1(D) and A2(D) satisfies the same privacy definition with parameter f(ε1, ε2), where f(ε1, ε2) is at least as large as both ε1 and ε2 (releasing more outputs cannot improve privacy).
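For pure ε-differential privacy, the two properties are usually stated as follows (the card leaves f unspecified; the sum is the standard basic sequential composition bound, and g is any function that does not access D directly):

```latex
\begin{aligned}
&\text{Post-processing: } \mathcal{M} \text{ is } \varepsilon\text{-DP} \;\Rightarrow\; g \circ \mathcal{M} \text{ is } \varepsilon\text{-DP} \\
&\text{Composition: } A_1 \text{ is } \varepsilon_1\text{-DP},\ A_2 \text{ is } \varepsilon_2\text{-DP} \;\Rightarrow\; \big(A_1(D), A_2(D)\big) \text{ is } (\varepsilon_1 + \varepsilon_2)\text{-DP},
\ \text{i.e. } f(\varepsilon_1, \varepsilon_2) = \varepsilon_1 + \varepsilon_2
\end{aligned}
```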
What is the disadvantage of differential privacy?
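It gives poorer (less accurate) estimates, because noise is deliberately added to protect individuals; the stronger the privacy guarantee, the more noise is needed.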
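To see why, assume the Laplace mechanism (a common choice; the card does not name one). It adds noise drawn from Lap(Δ/ε), where Δ is the most one record can change the query answer, and the noise variance grows as ε shrinks:

```latex
\operatorname{Var}\!\left[\mathrm{Lap}(\Delta/\varepsilon)\right] = \frac{2\Delta^2}{\varepsilon^2},
\qquad \text{e.g. a count } (\Delta = 1),\ \varepsilon = 0.1:\quad \operatorname{Var} = 200,\ \text{std} \approx 14.1
```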
Randomised response and perturbation: individuals are asked to respond to a “yes” or “no” question (typically one where the true answer may be embarrassing, such as “Do you cheat in exams?”).
How can this be done? (using a simple coin-flipping mechanism)
- Respondents choose their answer, “Yes” or “No”.
- Before the real answer is sent to the server, the differential privacy algorithm flips a coin.
- If heads, it sends the real answer. If tails, it flips the coin again: if the second toss lands heads, it sends the real answer; if tails, it sends the opposite answer.
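The two-coin scheme above can be simulated as a sketch; `randomised_response` is a hypothetical name and Python's `random` stands in for the coin. A student who cheated answers “Yes” about 75% of the time and one who did not about 25%, so a single “Yes” is only 3 times more likely to come from a cheater. Under the formal definition of differential privacy, this ratio corresponds to ε = ln 3.

```python
import random


def randomised_response(truth, rng):
    """Answer a yes/no question with the two-coin scheme.

    truth: the respondent's real answer (True means "Yes").
    """
    if rng.random() < 0.5:   # first toss heads: send the real answer
        return truth
    if rng.random() < 0.5:   # second toss heads: send the real answer
        return truth
    return not truth         # second toss tails: send the opposite answer


rng = random.Random(42)
trials = 100_000
yes_if_cheated = sum(randomised_response(True, rng) for _ in range(trials)) / trials
yes_if_honest = sum(randomised_response(False, rng) for _ in range(trials)) / trials
print(yes_if_cheated, yes_if_honest)  # close to 0.75 and 0.25
```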
Differential privacy: coin-flipping differential privacy algorithm
Suppose a fraction x of students cheated in exams, and the measured fraction of “Yes” answers is y. How can we estimate the actual fraction of students who cheated?
y ≈ 0.5x + 0.25
[ 0.5x = P(heads on first toss) × P(student actually cheated) = 0.5 × x,
0.25 = P(tails on first toss) × P(answer is “Yes” after a first tails) = 0.5 × 0.5, since after a first tails the answer is “Yes” with probability 1/2 whatever the truth ]
x ≈ 2(y - 0.25)
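A sketch of the estimate on a simulated survey. The population size, the true fraction 0.3 and the helper `estimate_cheating_fraction` are hypothetical. Each simulated student says “Yes” with probability 0.75 if they cheated and 0.25 otherwise, which is what the two-coin scheme produces (0.5 + 0.25 and 0.25 respectively).

```python
import random


def estimate_cheating_fraction(answers):
    """Invert y ≈ 0.5x + 0.25 to estimate x from the observed "Yes" rate y."""
    y = sum(answers) / len(answers)
    return 2 * (y - 0.25)


# Hypothetical survey: 10,000 students, 30% of whom actually cheated.
rng = random.Random(7)
true_x = 0.3
answers = []
for _ in range(10_000):
    cheated = rng.random() < true_x
    # Two-coin scheme: "Yes" with prob 0.75 if cheated, 0.25 otherwise.
    p_yes = 0.75 if cheated else 0.25
    answers.append(rng.random() < p_yes)

print(round(estimate_cheating_fraction(answers), 3))  # near 0.3
```

No single answer reveals whether that student cheated, yet the aggregate estimate lands close to the true fraction; with small samples it can even fall outside [0, 1], which reflects the poorer estimates noted earlier.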
Which 2 companies use differential privacy to collect usage statistics while protecting users' privacy?
- Apple (iPhone)
- Google (Chrome)
Amazon created a tool to review resumes to hire top talent. However, its AI tool was shown to be biased and discriminated against women.
How is it possible that this tool was biased?
- The tool’s algorithm predicts whether an application will be successful by learning from the resumes and outcomes of PAST applicants.
- The outcomes for past applicants may have been biased, as more men were hired in the past.
- To predict these past outcomes well, the algorithm learns to reproduce the same bias, which leads to a biased algorithm.
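A toy illustration of how learning from biased past outcomes yields a biased model. The data, the feature name `womens_club` (standing in for any resume feature correlated with being a woman) and the count-based "model" are all hypothetical; the real tool's design is not described on this card.

```python
from collections import defaultdict

# Hypothetical historical hiring data: (resume feature, was hired?).
# "womens_club" stands in for any feature correlated with being a woman.
history = [
    ("womens_club", False), ("womens_club", False), ("womens_club", True),
    ("none", True), ("none", True), ("none", False), ("none", True),
]


def learn_hire_rates(data):
    """'Train' the simplest possible model: the past hire rate per feature."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for feature, outcome in data:
        total[feature] += 1
        hired[feature] += outcome  # True counts as 1, False as 0
    return {f: hired[f] / total[f] for f in total}


model = learn_hire_rates(history)
print(model)
```

The model scores resumes with the feature lower only because past decisions did, so fitting the history well means copying its bias.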
What does statistical parity measure ?
Statistical Parity measures the difference in probabilities of a positive outcome across two groups.
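A minimal sketch of the measure, assuming binary outcomes (1 = positive) and two groups; the function names and the loan data are hypothetical. A value of 0 means statistical parity holds.

```python
def positive_rate(outcomes):
    """Fraction of positive outcomes (1s) in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(outcomes_a, outcomes_b):
    """P(positive | group A) - P(positive | group B); 0 means parity."""
    return positive_rate(outcomes_a) - positive_rate(outcomes_b)


# Hypothetical loan decisions (1 = approved) for two groups.
group_a = [1, 1, 0, 1, 0, 1, 1, 0]  # 5/8 = 0.625 approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # 3/8 = 0.375 approved
print(statistical_parity_difference(group_a, group_b))  # 0.25
```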
What does equality of opportunity mean?
It means that the same proportion of each population/group receives the “good” outcome.
If there is equal opportunity, statistical parity is achieved. However, the outcome may still not be fair in real life. What are some examples where this may be the case?
- Employment discrimination: An employer may have an equal number of male and female employees, but if women are disproportionately assigned lower-paying or less desirable positions or denied promotions or training opportunities, there is still unfairness and discrimination.
- Loan approvals: A lender may approve loans at equal rates for different racial groups, but if the criteria for approval systematically disadvantage certain groups, such as requiring higher credit scores or more collateral, it can still be unfair.
- Access to healthcare: Equal access to healthcare may be achieved on paper, but if certain groups face more barriers to obtaining healthcare, such as lack of transportation, inadequate insurance coverage, or discrimination from healthcare providers, it can still be unfair.
- Educational opportunities: Statistical parity in enrollment or graduation rates may exist between different groups, but if certain groups are systematically disadvantaged in terms of access to quality education, such as through underfunding or inadequate resources, it can still be unfair.