Lecture 10 Flashcards
The coded gaze: algorithmic bias
Creates exclusionary experiences and discriminatory practices-> and can spread at scale
Video:
A white person's face is recognized but a Black person's face is not
Aspire Mirror: her face is only detected when she wears a white mask-> issue
Bias travels: a social robot couldn't detect her face
A robot playing peek-a-boo also couldn't detect her face
Algorithmic bias
Need facial recognition-> we can teach a computer to recognize faces-> but if the training set isn't diverse, the computer can't recognize faces that differ from it
Algorithmic bias leads to discriminatory practices: police use facial recognition to fight crime-> 1 in 2 US adults have their faces in facial recognition networks-> these are unregulated and the algorithms haven't been audited for accuracy-> misidentifying suspects and breaching civil liberties
Video 2:
Beyond computer vision-> WMDs are used to make decisions that impact more and more aspects of our lives
Law enforcement uses this for predictive policing-> algorithms also help decide how long a person stays in prison-> but are they fair? - Algorithmic decisions don't automatically lead to fairness-> we need to change how we code
- Inclusive coding:
Who codes matters
How we code matters
Why we code matters
Make a change: identify bias by collecting people's experiences and auditing software-> build more inclusive training sets-> think more consciously about social impact
Identifying bias
Curating inclusively
Developing conscientiously
Incoding movement: the Algorithmic Justice League
Technology for all of us, not just some of us
Weapons of Math Destruction
Math is seen as truth-> more trusted than other fields
Disillusioned by the triple-A ratings on mortgages before the financial crisis, which were 'based on math'-> trust in math was abused
Data scientists: build models to predict what people will do
Food example: she defines what counts as food and what counts as a successful dinner (in her case the kids eating veggies; her son would define success as eating Nutella)-> that becomes a model-> an algorithm to plan meals 'set up for success'-> whoever builds the algorithm imposes an agenda on it, and this always happens
The person who makes the algorithm imposes their version of success on it, which differs from person to person but stays invisible behind the mathematical data (see the sketch below)
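A minimal sketch (the meals and scores are invented for illustration) of how the definition of 'success' is an opinion baked into the model: the same data ranked under two different success functions gives two different 'best' meals.

```python
# Minimal sketch: the same data, scored under two different definitions of
# "success", produces different "optimal" meals. Meals/scores are invented.
meals = [
    {"name": "pasta with broccoli", "veggies": 2, "nutella": 0},
    {"name": "pancakes with Nutella", "veggies": 0, "nutella": 3},
    {"name": "stir-fry", "veggies": 3, "nutella": 0},
]

def success_mom(meal):   # mother's agenda: kids eat vegetables
    return meal["veggies"]

def success_son(meal):   # son's agenda: maximize Nutella
    return meal["nutella"]

# The "algorithm" just optimizes whichever opinion was encoded as success.
print(max(meals, key=success_mom)["name"])   # stir-fry
print(max(meals, key=success_son)["name"])   # pancakes with Nutella
```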
Humans build the algorithms that turn data into decisions
If people's life options are determined by algorithms, then those algorithms matter
Widespread, Mysterious, Destructive: WMD
Mysterious: people don't know they are being scored-> not okay for the scores to be secret, because they function like laws
Destructive: they ruin people's lives unfairly
Example: 'getting rid of bad teachers'
Poor kids do worse on standardized tests-> judging teachers who teach these kids by those tests is unfair
Value-added teacher model: predicts an expected score for each student, and the teacher is held accountable for the difference between expected and actual scores-> based on small numbers of students-> a bad model with a lot of uncertainty: what a student actually scores is determined by many other factors too-> the error term is very noisy (see the sketch below)
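A toy simulation sketch (the numbers are invented, not from the lecture) of why a value-added score based on roughly thirty students is so noisy: even a perfectly average teacher gets scores that swing from year to year.

```python
# Toy simulation: value-added = average (actual - expected) score over ~30
# students. The noise level is an assumption chosen only for illustration.
import random
random.seed(0)

def value_added(n_students=30, true_teacher_effect=0.0, noise_sd=10.0):
    # Most of each student's gap comes from factors outside the teacher's
    # control -- the noisy "error term" mentioned in the notes.
    gaps = [true_teacher_effect + random.gauss(0, noise_sd) for _ in range(n_students)]
    return sum(gaps) / len(gaps)

# The same "average" teacher, five simulated years -> very different scores.
print([round(value_added(), 1) for _ in range(5)])  # values jump around zero
```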
The source code is proprietary-> people didn't know what they were being fired over-> nobody could inspect the model
Someone found teachers who had two scores in the same year-> only ~24% correlation-> scores are all over the place-> not reliable enough to hold people accountable
Erasure rates-> the previous year's teachers had cheated on the tests (Sarah's case)-> she was effectively punished for a previous teacher's cheating. Urban school districts-> labelled as failures
Widespread, secret and destructive at individual and systemic level
Personality tests
Kyle-> took a personality test for a job application-> failed it-> 'red-lighted' by the algorithm-> his father is a lawyer-> the test was based on the five-factor model and was effectively a mental health exam, which is illegal in hiring-> he applied to different grocery stores and failed at all of them-> his father is suing them all
Other employers use the same tests-> destructive and widespread-> systematically denying people employment-> and not just for minimum-wage jobs
Fox News: Ailes fired for sexually harassing women and keeping them from promotion-> imagine a hiring algorithm trained on who 'succeeded' there in the past: it would filter out women (not hired, or systematically kept from promotion)-> algorithms are not automatically objective or fair, they repeat the past
If we had a perfect hiring process we would want to code it, but until then we are just coding past practices
- Criminal justice
Predictive policing:
Over-policing and uneven policing in poor Black communities-> people there get arrested far more often than white people-> up to 10 times more often: the data is biased against Black people and depends on how the local police are told to act
Nonviolent crime is much more predictable-> concentrated in poor sections-> goes into arrest records-> the algorithm sends police where crime was recorded in the past: poor neighbourhoods
Violent crime is hard to predict; what is easy to predict are crimes of poverty-> feedback loop where you get a pseudoscientific basis for sending police back to already over-policed areas (see the toy simulation below)
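A toy simulation (entirely invented numbers) of the feedback loop described above: two neighbourhoods with identical underlying crime rates, but patrols follow past arrest data, and arrests follow patrols.

```python
# Toy feedback-loop sketch: arrests only happen where police are sent, and
# police are sent where past arrests were recorded. "A" starts over-policed.
true_crime_rate = {"A": 0.1, "B": 0.1}   # identical underlying crime
arrests = {"A": 5, "B": 1}               # historical bias in the records

for year in range(5):
    total = arrests["A"] + arrests["B"]
    patrols = {n: 100 * arrests[n] / total for n in arrests}  # allocate by past data
    for n in arrests:
        # New arrests depend on patrol presence, not only on actual crime.
        arrests[n] += int(patrols[n] * true_crime_rate[n])
    print(year, {n: round(patrols[n]) for n in patrols})
# Neighbourhood A keeps "justifying" more patrols although crime is the same.
```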
Recidivism risk: a score (risk of returning to jail) given to judges-> built from arrest records (biased) and a questionnaire (LSI-R)-> higher risk means longer in jail-> the questionnaire is unfavourable to poor people and to Black people-> it acts as a proxy for race
Asking 'is a family member a criminal?' in court would be unconstitutional-> but wrapped in 'math' it becomes acceptable (see the sketch below)
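A synthetic illustration (all probabilities invented) of how a questionnaire item like 'has a family member been arrested?' can act as a proxy: the model never sees race, but the item is distributed unevenly across groups because of past over-policing.

```python
# Synthetic sketch: a feature can proxy for a protected attribute even when
# the attribute itself is never used. All numbers here are made up.
import random
random.seed(1)

people = []
for _ in range(10_000):
    group = random.choice(["over-policed group", "other group"])
    # Past over-policing makes "family member arrested" more common in one
    # group -- an effect of policing history, not of the individual.
    p_yes = 0.5 if group == "over-policed group" else 0.2
    people.append((group, random.random() < p_yes))

for g in ["over-policed group", "other group"]:
    rate = sum(ans for grp, ans in people if grp == g) / sum(grp == g for grp, _ in people)
    print(g, round(rate, 2))
# Any score that weights this item will systematically rate one group as riskier.
```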
The score creates its own reality-> longer in prison, no benefit from it, no resources, no connections, no wealth, a felony record-> ends up back in prison, 'confirming' the high risk score
We don't yet know what safe algorithms look like-> ethical responsibility - Right to scrutinize your score
How do we see an algorithm's limitations?-> you cannot answer that until you know what the algorithm is for-> the same algorithm can have a positive or negative effect depending on its use
Scale of the problem-> vulnerable people are targeted and don't have lawyers to protect them
An algorithm can be built on anonymous data and still target people-> the problem is not about anonymity
Current algorithms in law are bad, but a good one could do better than judges, less racist-> still, why do we make people go to jail longer just because of a higher risk score?
Even though family background makes someone statistically more likely to become a criminal, punishing them for it isn't fair, even if it 'prevents' crime-> it is only an (inconsistently defined) association
If 'success' is not well defined, you get unfairness-> are we optimizing away from the success conditions we actually care about?
Parable: 'the more data the better' is not true, even though it creeps into our brains-> there is excess information we should ignore-> build the model around a definition of success that is fair, or audit it for fairness: check for biases-> this is complicated
Big data is not a silver bullet, just a tool; it won't solve all our problems
Need control/baseline data, otherwise a score is just a number that doesn't say what it means
There is a limit to what open-sourcing can do-> we also need auditing, fairness auditing
Sometimes we sacrifice accuracy for fairness-> our concept of what is fair as technologists is not enough-> if a model is interpretable it is easier for other people to say it's not fair (e.g. restricting ourselves to decision trees)
- Collect data from experiments as an audit (see the sketch below)
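A minimal sketch of what such a fairness audit could look like (hypothetical records and group labels): instead of trusting overall accuracy, compare error rates, for example false positive rates, across groups.

```python
# Minimal audit sketch: compare false positive rates ("flagged high risk but
# did not reoffend") across groups. The records below are invented.
def false_positive_rate(records, group):
    negatives = [r for r in records if r["group"] == group and not r["actual"]]
    return sum(r["predicted"] for r in negatives) / len(negatives) if negatives else None

records = [
    {"group": "A", "predicted": True,  "actual": False},
    {"group": "A", "predicted": True,  "actual": False},
    {"group": "A", "predicted": False, "actual": False},
    {"group": "B", "predicted": False, "actual": False},
    {"group": "B", "predicted": True,  "actual": False},
    {"group": "B", "predicted": False, "actual": False},
]

for g in ["A", "B"]:
    print(g, round(false_positive_rate(records, g), 2))  # A: 0.67 vs B: 0.33
```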
Article: Diversity in big data
Economic inequality can be translated into technology
Diversity: ensuring that different kinds of objects are represented in the output of an algorithmic process-> the output being a collection of items
Index of diversity: based on the degree of concentration achieved when the individuals of a population are classified into groups (low concentration = high diversity; see the sketch below)
Reasons for diversity: ethical and utilitarian (diverse results are more powerful, accurate and engaging)
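A small sketch of a classic index of diversity consistent with this definition (Simpson-style; the exact formula is an assumption here): diversity as one minus the concentration of the population across groups.

```python
# Sketch of an index of diversity: 1 - sum(p_i^2), where p_i is the share of
# group i. Low concentration across groups means high diversity.
from collections import Counter

def diversity_index(labels):
    counts = Counter(labels)
    n = len(labels)
    concentration = sum((c / n) ** 2 for c in counts.values())
    return 1 - concentration   # 0 = a single group, approaches 1 = evenly spread

print(diversity_index(["a"] * 10))              # 0.0 (no diversity)
print(diversity_index(["a"] * 5 + ["b"] * 5))   # 0.5
print(diversity_index(list("abcde") * 2))       # 0.8
```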
Lack of diversity in algorithms
Hiring: statistical models used to select job candidates out of a pool of résumés
Crowdsourcing: works because each participant brings their own opinion, private information and interpretation to problem-solving tasks
Matchmaking: top rankings in dating apps don't represent diversity or users' preferences
Search and content recommendation: bubble effects-> polarize public opinion, increase ideological segregation and undermine democratic processes
Survey:
Scope: diversity is the composition of S with regard to the variety of its constituents-> S can be a set of people, items, etc., and structured, complex or unstructured
Focus: the selection task-> selecting a bundle of elements that meet quality and relevance criteria and are also diverse
Models and algorithms that enforce diversity in the output of an algorithmic task
Diversity-based and novelty-based ranking of text documents and Web search results in IR-> used to resolve query ambiguity and to provide novelty, serendipity and user satisfaction
Diversity is a socio-technical concept-> many interpretations, depending on context
Diversity in hiring-> need to see how people are exposed to and interact with online information
People connect to like-minded people: information bubbles or echo chambers-> however, people still share information across ideological lines-> somewhat open to alternative views-> on Facebook, individual choices limit exposure to attitude-changing content-> more study is needed on how this works
People remain in contact with only ~10% of their friends-> searching for information on Facebook leads into information bubbles more easily than search engines do
Low collective diversity doesn't imply low individual diversity-> information sharing and consumption on social media give rise to both collective and individual bubbles
Audience diversity makes it more likely that a hashtag reaches more people-> but a specific topic gains more followers-> highly popular users don't contribute much to diversity
Models: formal models of diversity-> the diversification problem for the selection task: select a subset of elements from a larger set so that the subset is as diverse as possible (while still meeting relevance criteria)
Utility and ranking: besides diversity, also serve the user's needs-> a recommendation system also needs to be accurate, for instance by requiring a minimum utility or by using a unified measure of utility and diversity
Aggregate diversity: distinguish the diversity of item recommendations to a single user from the diversity of the items recommended across all users
Distance-based measures: use a pairwise distance measure-> items need to be a certain distance apart to count as diverse-> flexible-> the more diverse the explanations on which the recommendation of an item is based, the more diverse the item (see the sketch below)
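A hedged sketch of a common greedy heuristic for the selection task with a distance-based diversity measure (not a specific algorithm from the article): each step picks the item that best trades off relevance against distance to what has already been selected. The items, relevance function and trade-off weight are invented.

```python
# Greedy selection balancing relevance (utility) with pairwise distance to
# already-selected items. Everything below is illustrative only.
def select_diverse(items, relevance, distance, k, trade_off=0.5):
    selected, candidates = [], list(items)
    while candidates and len(selected) < k:
        def score(x):
            div = min((distance(x, s) for s in selected), default=1.0)
            return trade_off * relevance(x) + (1 - trade_off) * div
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Example: 1-D "items", relevance prefers small values, distance is |a - b|.
items = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
print(select_diverse(items, relevance=lambda x: 1 / (1 + x),
                     distance=lambda a, b: abs(a - b), k=3))
# Picks are spread out instead of clustering near 0.
```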
Coverage-based measures: rely on a predefined set of aspects (topics, interpretations or opinions), possibly via a probabilistic model-> (a) which aspects are being covered, (b) how coverage is measured
Hybrid distance-based and coverage-based measures: the two can be combined, e.g. as a (weighted) sum
Novelty-based measures: diversity with respect to elements seen in the past-> reduce redundancy
Diversity must be enforced through individual, independent choices rather than as a constraint on the final result set-> incremental maintenance of the diversity properties of a result: reasoning about the stability of the selected set in response to incremental changes in the input
Chapter: Introduction, C. O’Neil
Math combined with technology can multiply chaos and misfortune-> it adds efficiency and scale to systems that are already flawed
The 2008 crisis-> yet we moved further with these methods: scoring people, marketed as 'fair and objective'-> but the models are based on choices made by humans-> prejudice, misunderstanding and bias get encoded in them-> they increase inequality-> weapons of math destruction (WMDs)
Example: in 2007 the new mayor of Washington D.C. wanted to turn around underperforming schools-> evaluate teachers with a teacher-assessment tool
The tool didn't take positive reviews into account, so good teachers were also fired because they scored badly in the algorithm-> reviews could make 'bad' teachers look 'good', so firing was based on the 'hard' data
How was this model built, if it also fired the good teachers?-> it is complex because it tries to take context into account-> but how much of the gap in kids' test results is actually because of the teacher?
The impact of one person on one school year is hard to calculate-> analysing it based on thirty kids is statistically unsound-> a model needs feedback when it is off
Instead of searching for the truth about failing schools, the model becomes the truth-> teachers become more likely to cheat so they don't get fired as a 'bad' teacher
Many WMDs define their own reality and use it to justify their results-> self-perpetuating, destructive and common
WMDs rest on poisonous assumptions camouflaged by math and go untested and unquestioned-> they punish the poor-> the elite are processed by people, the masses are processed by machines
The model itself is a black box-> if people are kept in the dark they are more likely to accept its results-> the models don't bend, even when they are being gamed
Paradox: an algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower or a terrorist-> that probability is distilled into a score which can turn a person's life upside down-> yet when the person fights back, 'suggestive' countervailing evidence won't cut it-> the case must be ironclad-> victims are held to higher standards of evidence than the algorithms themselves
Models in business make money; as long as they make money they are treated as the truth-> the people running the WMDs don't dwell on the numbers as long as money comes in-> the victims are other people-> imperfections get ignored
Baseball
Uses historical data to analyse the current situation and calculate the positioning associated with the highest probability of success-> mathematical models-> these are relatively fair: transparent, statistically rigorous (an immense and relevant data set at hand) and they don't use proxies-> new data comes in all the time
Model: an abstract representation of some process-> predicts responses to different situations-> we also carry models in our heads-> updates and adjustments make a model dynamic-> explaining informal models to others is a step towards making them mathematical-> decide what data to put in over time-> there will be mistakes because of the complexity and nuance of human communication-> this creates blind spots, which sometimes don't matter (see the toy example below)
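A toy example (invented data, loosely inspired by the baseball positioning idea above) of a model as an abstract representation that is updated as new observations arrive, which is what makes it dynamic.

```python
# Toy dynamic model: keep counts of where a batter's hits land and position
# fielders towards the most likely spot. Data is invented for illustration.
from collections import Counter

class FieldingModel:
    def __init__(self):
        self.hits = Counter()            # observed hit locations so far

    def update(self, location):          # new data arriving over time
        self.hits[location] += 1

    def predict_positioning(self):       # most probable hit location
        return self.hits.most_common(1)[0][0] if self.hits else "default"

model = FieldingModel()
for loc in ["left", "left", "right", "left", "center", "left"]:
    model.update(loc)
print(model.predict_positioning())       # "left" -> shift fielders that way
```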
Models are opinions embedded in mathematics-> built around a definition of success
Good models can be primitive, some using only one variable-> problems arise when the focus is on humans: racism or sexism can cause people to be 'calculated away'
Duane Buck: the question was which sentence he should get for killing a person-> he was sentenced to death partly because of testimony that Black people are more likely to kill again