Midterm Flashcards
Describe ethics vs law
Law: something put forward by a governing body, with the right to enforce it by punishment
Ethics: moral principles that distinguish between ‘right’ and ‘wrong’. Typically there is no external penalty for violating ethics (assuming you don’t break the law).
Ethics and law may overlap, they may be opposed, or they may be distinct.
Describe the four types of US law
Civil: nation or state level; relationships b/t organizational entities and people; recorded in volumes of legal ‘code’
Criminal: violations harmful to society, enforced by the state
Private: Relationships between individuals and organizations; family law, commercial, and labor law
Public: regulates structure/administration of government agencies and relationships with citizens, employees, and other governments. Encompasses criminal, administrative, and constitutional law
What is discrimination?
Actions taken on the basis of a bias.
Bias: “I don’t feel good about employing people from Georgia Tech”
Discrimination: “I won’t employ people from Georgia Tech”
Describe Equality of Opportunity vs Equality of Outcome
Opportunity: Decision making treats similar people similarly, based on relevant features
- Two students at Georgia Tech are taking a CS course. They both have equal opportunity to succeed.
Outcome: notion of equality of opportunity that forces decision making to treat seemingly dissimilar people similarly, assuming the dissimilarity is based on past injustices.
- Two students are taking a CS course and the prof says you can use any programming language. [Helps account for differences in background]
What are the four main ethical considerations for data?
- Privacy: Right to control individual information
- Accuracy: Who is responsible for the authenticity, fidelity, and accuracy of information?
- Property: Who owns and controls the information?
- Accessibility: What info does org have right to collect, and under what safeguards? What can they do with it after?
What is consequence based ethics?
- Priority given to choices that lead to a ‘good’ outcome (consequence)
- Outcome outweighs the method
Considers Utilitarian and Individualism views
What is the Utilitarian View?
The ‘right’ choice delivers the greatest good to the most people
What is the Individualism View?
The ‘right’ choice is the best for long-term self interest
What are Rule based ethics?
Priority given to rules without regard to outcome.
Considers the Moral-Rights view and Rule-based/Justice View
What is the Moral-Rights view?
Right choice is one that respects fundamental rights of all humans (never tell a lie)
What is the Rule-Based/Justice View?
Right choice is impartial, fair and equitable in treatment of people. Exists for benefit of society and should be followed.
Summarize the FB Cambridge Analytica Scandal
- Kogan created an app that was like a personality quiz and 270k people downloaded it. Users weren’t told that their data would be given away. He gave it to Cambridge Analytica
- FB APIs let you access info about people and their friends
- This gave them access to upwards of 50M people
- They could look at your likes to try and help determine how to target you, politically
What’s the Financial Services Modernization Act (1999)
Requires notice by financial organizations to customers so they can request their info not be shared with third parties
What is the purpose of the Federal Privacy Act?
Regulates government in protection of individual privacy
What’s the Electronic Communications Privacy Act?
Regulates interception of wire, electronic, and oral communications. In general makes sure the gov requires a subpoena, search warrant, etc.
What’s the Privacy of Customer Information Section of the common carrier regulation?
- Cell phone providers can’t do anything with proprietary information unrelated to providing their service (eg, no marketing)
- Carriers cannot disclose information except when necessary to provide their services
What is GDPR?
- Must be opt-in based for data collected and purposes described
- data subject has right to request erasure
- you can transfer your personal data from one system to another
- It is an EU Regulation
- Applies to any org that handles the data of people in the EU (extraterritorial)
Describe Bias
A predisposition, partiality, prejudice, preference, or predilection.
Generally has to do with stereotypes, prejudice, and discrimination
How might bias become obscured?
Algorithms are black boxes and there are no regulations saying you have to know why the decisions were made. Companies call it a ‘trade secret’ or the ‘secret sauce’ so they don’t have to divulge what’s going on.
Where can bias be introduced?
- Data Inputs: incomplete, incorrect, outdated. Poorly selected features, promoting historical bias
- Collection of Data: demographic, geographic, behavioral, temporal biases
- Measurement of Data: What and how do we measure?
- Pre-existing biases in data: gender roles in text/images, racial stereotypes
What other types of biases exist?
Unintentional:
- Limited and Coarse Features
- Sample Size Disparity (less data about a minority population)
- Skewed Sample (feedback loops)
- Tainted Examples
- Features that act as proxies
What is the connection between bias and privacy?
To see if there was age bias, since you can’t technically use age, you’d have to use some stand-in feature to try to reconstruct the age.
We don’t want algorithms to be able to reconstruct our profiles.
Not sure I totally get the connection they were drawing.
What are sensitive attributes?
TODO: Look more into this. Not certain it was well explained but they are using it a lot in the Fairness lecture.
Seems like one that could be highly decisive toward the outcome.
Why would you remove the sensitive attribute?
So it doesn’t base its decision solely on that component. But that may not be enough if other attributes are highly correlated with the sensitive attribute.
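A minimal sketch of why removing the sensitive attribute may not be enough. The data is invented: `age` stands for the removed sensitive attribute and `years_experience` is a hypothetical feature that was kept. A correlation close to 1 suggests the kept feature can act as a proxy.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented toy data (assumption, not from the lecture).
age = [22, 30, 38, 45, 53, 61]             # sensitive attribute, removed
years_experience = [1, 7, 15, 20, 30, 37]  # feature that was kept

r = pearson(age, years_experience)
print(f"correlation between age and years_experience: {r:.2f}")
```

If the correlation is high, a model trained without `age` can still end up treating older and younger applicants differently through the kept feature.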
What is group fairness?
Requiring the same percentage in group A and B to receive the same treatment (eg, they get the bank loan).
The question becomes: if the outcomes are skewed toward one side (paying back the loan), should they still be required to give the same treatment to both sides (since they may be taking a loss just to ‘be fair’)?
What is individual fairness (consistent)?
Similar individuals experience similar outcomes.
What are two ways to measure individual fairness?
- Risk Difference: p1 - p2
- Risk Ratio/Relative Risk: p1 / p2
Where p1 and p2 are probability of being denied the benefit for each group.
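A small sketch of the two measures above, using invented decisions for two groups. The function name `denial_rate` and the counts are assumptions made for illustration.

```python
def denial_rate(decisions):
    """Fraction of applicants denied; decisions is a list of 'approve'/'deny'."""
    return sum(1 for d in decisions if d == "deny") / len(decisions)

# Invented decisions for two groups (assumption).
group_1 = ["deny"] * 30 + ["approve"] * 70
group_2 = ["deny"] * 20 + ["approve"] * 80

p1 = denial_rate(group_1)
p2 = denial_rate(group_2)

risk_difference = p1 - p2  # 0 means both groups are denied at the same rate
risk_ratio = p1 / p2       # 1 means both groups are denied at the same rate

print(f"p1 = {p1:.2f}, p2 = {p2:.2f}")
print(f"risk difference = {risk_difference:.2f}")
print(f"risk ratio = {risk_ratio:.2f}")
```

Here group 1 is denied 10 percentage points more often (difference) and 1.5 times as often (ratio) as group 2.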
What is biased sampling?
Sampling only a small, unrepresentative segment of a population.
How can you mislead through poor analysis?
- By presenting different parts of the data graphically
- eg, limiting the y axis so it makes it look like trends are steeper, or more pronounced
- miscalculating trends which leads to misrepresenting the trends.
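A rough worked example of the y-axis trick, with invented values. The helper `apparent_ratio` is hypothetical: it computes how much taller the second bar is drawn than the first when the y axis starts at `axis_min`.

```python
def apparent_ratio(a, b, axis_min):
    """Ratio of drawn bar heights for values a and b when the y axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

# Invented values: a 4% increase from one year to the next.
last_year, this_year = 50, 52

honest = apparent_ratio(last_year, this_year, axis_min=0)
truncated = apparent_ratio(last_year, this_year, axis_min=49)

print(f"axis starts at 0:  second bar looks {honest:.2f}x as tall")
print(f"axis starts at 49: second bar looks {truncated:.2f}x as tall")
```

The data is identical in both cases; only the axis limit changes how pronounced the increase looks.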
How can you mislead through interpretation?
Modifying graphs to show what you want the audience to believe. Eg, axis limits, not including relevant information, manipulating the scale.
Describe descriptive vs inferential analytics
Descriptive: methods of organizing, summarizing, and presenting data in an informative way. (frequency table, histogram, mean, variance)
Inferential: methods used to determine something about a population on the basis of a sample. (estimate the average salary of a GT student)
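A short sketch contrasting the two, on an invented sample of salaries. The interval at the end assumes a simple random sample and uses a normal approximation (1.96 for roughly 95%), which is only a rough guide for a sample this small.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev, variance

# Invented sample of salaries, in thousands of dollars (assumption).
sample = [52, 55, 55, 60, 61, 61, 61, 70, 72, 83]

# Descriptive: organize and summarize the sample itself.
frequency_table = Counter(sample)
sample_mean = mean(sample)
sample_variance = variance(sample)  # sample variance (n - 1 denominator)

# Inferential: use the sample to estimate the population mean.
standard_error = stdev(sample) / sqrt(len(sample))
low = sample_mean - 1.96 * standard_error
high = sample_mean + 1.96 * standard_error

print("frequency table:", dict(frequency_table))
print(f"sample mean = {sample_mean}, sample variance = {sample_variance:.1f}")
print(f"rough 95% interval for the population mean: {low:.1f} to {high:.1f}")
```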
What is cross-sectional data?
Cross Sectional data are collected at the same or approximately the same point in time.
What is cherry picking?
Only choosing to display specific sections of data that support what you want to show.
What is sampling error?
The difference between the sample statistic and the population parameter.
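A minimal sketch of sampling error using the mean as the statistic. The population is invented and the seed is fixed only so the run is repeatable.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed (assumption, for repeatability)

# Invented population of 1,000 values.
population = [rng.gauss(100, 15) for _ in range(1000)]
population_mean = mean(population)  # population parameter

sample = rng.sample(population, 50)
sample_mean = mean(sample)          # sample statistic

sampling_error = sample_mean - population_mean
print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

A different sample gives a different sampling error, even though the population parameter stays fixed.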
What measures are there for the average?
mean, median, mode, geometric mean, etc
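For reference, the four measures listed above can be computed with Python's standard `statistics` module. The data is invented.

```python
from statistics import geometric_mean, mean, median, mode

data = [1, 2, 2, 4, 8]  # invented values

print("mean:          ", mean(data))
print("median:        ", median(data))
print("mode:          ", mode(data))
print("geometric mean:", round(geometric_mean(data), 3))
```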
When would you use the median?
When data is not normally distributed. It’s less sensitive to outliers.
When is the mean best?
For symmetric distributions
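A quick illustration, with invented numbers, of why the median is preferred when there are outliers and the mean works well for symmetric data.

```python
from statistics import mean, median

symmetric = [48, 49, 50, 51, 52]      # invented, symmetric around 50
with_outlier = [48, 49, 50, 51, 500]  # same data with one extreme value

print("symmetric:    mean =", mean(symmetric), " median =", median(symmetric))
print("with outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

The single outlier pulls the mean far away from the typical value, while the median does not move.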
What is Simpson’s paradox?
A trend appears in several groups of data, but disappears or reverses when the groups are combined.
Occurs when aggregated data hides a conditional variable - some significant factor that influences the results.
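A small worked example with invented counts. Method A has the higher success rate inside each group, yet method B looks better once the groups are combined, because A was mostly tried in the harder group.

```python
# Invented counts: (successes, trials) for two methods in two groups.
results = {
    "group X": {"A": (8, 10), "B": (70, 100)},
    "group Y": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials

# Within each group, compare the two methods.
for group, methods in results.items():
    print(group, {m: round(rate(*counts), 2) for m, counts in methods.items()})

# Combine the groups by adding up successes and trials per method.
combined = {}
for method in ("A", "B"):
    successes = sum(results[g][method][0] for g in results)
    trials = sum(results[g][method][1] for g in results)
    combined[method] = rate(successes, trials)

print("combined", {m: round(r, 2) for m, r in combined.items()})
```

The group is the hidden variable here: ignoring it reverses the comparison.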
What is the mean of the means?
The expected value of the estimator.
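A minimal sketch, assuming the estimator is the sample mean: for an invented population of five values, averaging the means of every possible sample of size 2 gives back the population mean.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # invented population
population_mean = mean(population)

# Every possible sample of size 2, drawn without replacement.
sample_means = [mean(s) for s in combinations(population, 2)]

mean_of_means = mean(sample_means)
print("sample means: ", sample_means)
print("mean of means:", mean_of_means)
print("population mean:", population_mean)
```

Individual sample means differ from the population mean, but their average does not, which is what it means for the sample mean to be an unbiased estimator.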
What are examples of sampling bias?
- area bias: specific area that doesn’t include representative sample of population
- selection bias: proper randomization not achieved
- self-selection bias: individuals select themselves into a group. Participants’ decision to participate may be correlated with traits that affect the study.
- Leading Question bias: leaves a clue to the desired answer, leading respondents to tend to agree with the intended direction
- Social Desirability Bias: respondents tend to respond in a manner that will be viewed favorably by others.
What methods can be used to minimize sampling bias?
- Randomized Sampling: choose random samples
- Systematic Sampling: All data is sequentially numbered, choose every nth piece of data
- Stratified Random Sample: the population is divided into subgroups, then random sampling is applied to each group.
- Cluster Random Sampling: split into clusters. Each cluster is representative of the population. Select one or a few random clusters and do a simple random selection from each cluster.
- Non Probability Sampling: participants are chosen or choose themselves so chance of being selected isn’t known.
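The probability-based methods above can be sketched as follows. The population (items numbered 1 to 100), the strata, the cluster size, and the sample sizes are all assumptions chosen for illustration.

```python
import random

rng = random.Random(42)           # fixed seed (assumption, for repeatability)
population = list(range(1, 101))  # invented: items numbered 1..100

# Randomized sampling: every item has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every nth item.
n = 10
start = rng.randrange(n)
systematic = population[start::n]

# Stratified random sampling: split into subgroups, sample within each.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in rng.sample(group, 5)]

# Cluster random sampling: split into clusters, pick clusters at random,
# then take a simple random sample inside each chosen cluster.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [x for c in chosen_clusters for x in rng.sample(c, 5)]

print("simple:    ", sorted(simple))
print("systematic:", systematic)
print("stratified:", sorted(stratified))
print("cluster:   ", sorted(cluster_sample))
```

Consecutive number ranges are used as clusters only to keep the sketch short; in practice each cluster should resemble the population as a whole.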