Session 7 Flashcards

1
Q

Sensitive Characteristics (or protected attributes)

A

are characteristics that cannot legally be used to differentiate individuals with respect to the target variable in predictive models

2
Q

By defining some characteristics as “sensitive”, we are assuming that the algorithms can end up differentiating individuals based on these characteristics

A
  • This may be because the data reveals existing injustices (e.g., some groups of individuals may already be discriminated against, and that shows in the data)
  • It can be because of differences in tastes and behaviors, and may not represent discrimination (e.g., Sinterklaas is more popular in The Netherlands than in Portugal)
3
Q

Formal non-discrimination criteria

A

Many fairness criteria have been proposed over the years, each aiming to formalize different requirements

4
Q

Most proposed fairness criteria are properties of the joint distribution of:

A

A - the sensitive attribute
Y - the target variable
R - the classifier or score

5
Q

Most criteria fall into one of three categories regarding how these variables are related with each other:

A
  1. Independence (R ⊥ A)
  2. Separation (R ⊥ A | Y)
  3. Sufficiency (Y ⊥ A | R)
6
Q

Independence

A

goes by many equivalent names or variants, including demographic parity, statistical parity, and group fairness

Main idea: “Everybody gets treated the same”

7
Q

A classifier R is independent from an attribute A if

A

the probability of the classifier predicting an observation to be positive (R = 1) does not change with a change in the attribute A:

Pr(R = 1 | A = a) = Pr(R = 1 | A = b) for all groups a and b

Example: The probability that a person is predicted to default on their loan does not depend on their race
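
The independence condition can be checked empirically by comparing the positive-prediction rate per group. The following is a minimal sketch on made-up data; the function name `positive_rate_by_group` and the toy lists `R` and `A` are assumptions for illustration only.

```python
from collections import defaultdict

def positive_rate_by_group(predictions, groups):
    """Estimate Pr(R = 1 | A = a) for each group a from binary predictions."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r, a in zip(predictions, groups):
        totals[a] += 1
        positives[a] += r
    return {a: positives[a] / totals[a] for a in totals}

# Hypothetical toy data: R = 1 means "predicted to default", A is the group.
R = [1, 0, 1, 0, 1, 1, 0, 0]
A = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(R, A)
print(rates)  # {'a': 0.5, 'b': 0.5}
```

Both groups receive positive predictions at the same rate here, so this toy classifier satisfies independence on this sample.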

8
Q

Separation

R ⊥ A | Y

A

requires the score (R) to be independent from the sensitive attribute (A) given the outcome (Y). In other words, it allows correlation between the score and the sensitive attribute to the extent that it is justified by the target variable

9
Q

Separation (R ⊥ A | Y)

Main idea:

A

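Given an actual outcome (e.g., defaulting on a loan), the percentage of individuals predicted positive (and negative) is similar across groups of a sensitive attribute (e.g., black, white). For a binary classifier, this means the true positive rate and the false positive rate are equal across groups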
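
Separation can be checked by estimating Pr(R = 1 | Y = y, A = a) for every outcome and group. The sketch below uses made-up data in which the groups have different base rates, so the overall positive rate differs by group while the rates conditional on the outcome match. The function name and the toy lists are hypothetical.

```python
from collections import defaultdict

def positive_rate_by_outcome_and_group(predictions, outcomes, groups):
    """Estimate Pr(R = 1 | Y = y, A = a) for every (y, a) pair."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r, y, a in zip(predictions, outcomes, groups):
        totals[(y, a)] += 1
        positives[(y, a)] += r
    return {key: positives[key] / totals[key] for key in totals}

# Hypothetical toy data: group "a" has 6 people, group "b" has 8.
R = [1, 0, 1, 0, 0, 0,   1, 0, 1, 0, 1, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0,   1, 1, 1, 1, 0, 0, 0, 0]
A = ["a"] * 6 + ["b"] * 8

rates = positive_rate_by_outcome_and_group(R, Y, A)
# Key (1, group) is the true positive rate, key (0, group) the false positive rate.
for key in sorted(rates):
    print(key, rates[key])
```

In this sample the true positive rate is 0.5 and the false positive rate is 0.25 in both groups, so separation holds even though group "b" is predicted positive more often overall (3/8 versus 2/6).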

10
Q

Sufficiency (Y ⊥ A | R)

A

requires the outcome (Y) to be independent from the sensitive attribute (A) given the score (R). In other words, it allows correlation between the outcome and the sensitive attribute to the extent that it is justified by the score.

11
Q

Sufficiency (Y ⊥ A | R)

Main idea:

A

Given a prediction, the percentage of individuals who are actually positive is similar across groups of a sensitive attribute (e.g., black, white)
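
Sufficiency conditions on the prediction instead of the outcome, so the quantity to compare is Pr(Y = 1 | R = r, A = a). A minimal sketch on made-up data follows; the function name and the toy lists are assumptions for illustration.

```python
from collections import defaultdict

def outcome_rate_by_score_and_group(predictions, outcomes, groups):
    """Estimate Pr(Y = 1 | R = r, A = a) for every (r, a) pair."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r, y, a in zip(predictions, outcomes, groups):
        totals[(r, a)] += 1
        positives[(r, a)] += y
    return {key: positives[key] / totals[key] for key in totals}

# Hypothetical toy data: group "a" has 8 people, group "b" has 6.
R = [1, 1, 1, 1, 0, 0, 0, 0,   1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 1, 0, 0, 0,   1, 0, 1, 0, 0, 0]
A = ["a"] * 8 + ["b"] * 6

rates = outcome_rate_by_score_and_group(R, Y, A)
for key in sorted(rates):
    print(key, rates[key])
```

Among those predicted positive, half are actually positive in both groups, and among those predicted negative, a quarter are actually positive in both groups, so sufficiency holds on this sample.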

12
Q

PPV / NPV

A

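Positive Predictive Value: Pr(Y = 1 | R = 1), the share of predicted positives that are actually positive

Negative Predictive Value: Pr(Y = 0 | R = 0), the share of predicted negatives that are actually negative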
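
Both values can be computed from confusion-matrix counts. This is a small sketch; the helper name `ppv_npv` and the counts are made up for illustration.

```python
def ppv_npv(tp, fp, tn, fn):
    """Return (PPV, NPV) from true/false positive and negative counts."""
    ppv = tp / (tp + fp)  # correct positives among all predicted positives
    npv = tn / (tn + fn)  # correct negatives among all predicted negatives
    return ppv, npv

# Hypothetical counts for one group.
ppv, npv = ppv_npv(tp=30, fp=10, tn=50, fn=10)
print(ppv, npv)
```

Computing these two numbers separately for each group of the sensitive attribute and comparing them is one way to inspect sufficiency for a binary classifier.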

13
Q

Relationships between criteria

In general

A

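In general, any two of these fairness criteria are mutually exclusive. Outside degenerate cases (e.g., when the target variable Y is itself independent of the sensitive attribute A), a classifier can satisfy only one of them at a time.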
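
One of the pairwise conflicts, independence versus sufficiency, can be sketched directly from the definitions. The derivation below assumes a discrete score R and uses the law of total probability; it is an illustration of the argument, not a full treatment of all three pairs.

```latex
% Assume independence (R \perp A) and sufficiency (Y \perp A \mid R) both hold.
\begin{align*}
\Pr(Y = y \mid A = a)
  &= \sum_{r} \Pr(Y = y \mid R = r, A = a)\,\Pr(R = r \mid A = a) \\
  &= \sum_{r} \Pr(Y = y \mid R = r)\,\Pr(R = r)
     && \text{(sufficiency, then independence)} \\
  &= \Pr(Y = y).
\end{align*}
% So Y \perp A. Contrapositive: if the base rate of Y differs across groups,
% independence and sufficiency cannot both hold.
```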

14
Q

How is Google able to find cats in my photos?

Approach #1: Predictive modeling

A
  1. Define a target variable
    -> cat vs no cat
  2. Gather a large set of photos
    -> label the photos
  3. Create a set of features (or predictors)
    -> 2 eyes, pointy ears, spots
  4. Run a tree induction model
  5. Use the model to classify my photos
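
The five steps above can be sketched end to end with a tiny hand-written decision tree. Everything here is an assumption for illustration: the six labeled "photos" are made up, the features are pre-computed 0/1 flags, and the Gini-based splitting is one common tree induction choice, not necessarily the one the course has in mind.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """Recursively split on the binary feature with the lowest weighted Gini."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: predict the majority class

    def split_cost(feature):
        cost = 0.0
        for value in (0, 1):
            subset = [y for x, y in zip(rows, labels) if x[feature] == value]
            if subset:
                cost += len(subset) / len(labels) * gini(subset)
        return cost

    best = min(features, key=split_cost)
    remaining = [f for f in features if f != best]
    node = {"feature": best}
    for value in (0, 1):
        pairs = [(x, y) for x, y in zip(rows, labels) if x[best] == value]
        if pairs:
            xs, ys = zip(*pairs)
            node[value] = build_tree(list(xs), list(ys), remaining)
        else:
            node[value] = majority  # no training rows on this branch
    return node

def classify(tree, row):
    """Follow the branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree[row[tree["feature"]]]
    return tree

# Steps 1-3: target (cat vs no cat), labeled photos, hand-made features.
features = ["two_eyes", "pointy_ears", "spots"]
photos = [
    {"two_eyes": 1, "pointy_ears": 1, "spots": 1},
    {"two_eyes": 1, "pointy_ears": 1, "spots": 0},
    {"two_eyes": 1, "pointy_ears": 0, "spots": 1},
    {"two_eyes": 1, "pointy_ears": 0, "spots": 0},
    {"two_eyes": 0, "pointy_ears": 0, "spots": 0},
    {"two_eyes": 0, "pointy_ears": 1, "spots": 0},
]
labels = ["cat", "cat", "no cat", "no cat", "no cat", "no cat"]

# Step 4: tree induction. Step 5: classify a new photo.
tree = build_tree(photos, labels, features)
new_photo = {"two_eyes": 1, "pointy_ears": 1, "spots": 0}
print(classify(tree, new_photo))  # cat
```

The weak point of this approach is step 3: someone has to decide which features matter and extract them from each photo by hand.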
15
Q

Deep learning

A

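is an area of machine learning that uses artificial neural networks with many layers to learn patterns directly from raw data, without hand-crafted features; it is used for both supervised and unsupervised pattern recognition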

16
Q

Deep Learning is being used across different fields:

A
  • Object Recognition (cats and self-driving cars)
  • Speech Recognition (Google Assistant and Siri)
  • Drug discovery

17
Q

The perceptron

A
  • Is an algorithm for supervised learning of binary classifiers
  • Similar to logistic regression
  • Can be used for online learning, i.e., it can adjust to new observations
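
The properties above can be illustrated with a minimal perceptron that is updated one observation at a time. This is a sketch under assumptions: the step activation, the learning rate of 1.0, the zero initialization, and the logical-AND toy data are all choices made for illustration.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0

def update(weights, bias, x, y, lr=1.0):
    """One online step: the weights change only when the prediction is wrong."""
    error = y - predict(weights, bias, x)
    weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    bias = bias + lr * error
    return weights, bias

# Toy data: logical AND, which is linearly separable.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = [0.0, 0.0], 0.0
for _ in range(20):          # several passes over the stream of observations
    for x, y in data:
        weights, bias = update(weights, bias, x, y)

predictions = [predict(weights, bias, x) for x, _ in data]
print(predictions)  # [0, 0, 0, 1]
```

Because `update` needs only the current weights and a single observation, new data can be folded in as it arrives, which is what makes the perceptron suitable for online learning.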
18
Q

Neural networks can…

A

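approximate any continuous function on a bounded input range to arbitrary accuracy, given enough hidden units (the universal approximation theorem)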

19
Q

Weights are updated using an algorithm called

A

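backpropagation (it computes the gradient of the loss with respect to each weight; gradient descent then uses those gradients to update the weights)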
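
A minimal sketch of backpropagation on the smallest possible network: one input, one hidden sigmoid unit, one output sigmoid unit, and a squared-error loss. The network shape, the starting weights, and the learning rate are assumptions for illustration; real networks apply the same chain-rule steps to whole layers of weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)      # hidden activation
    y_hat = sigmoid(w2 * h)  # output activation
    return h, y_hat

def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(w1, w2, x, y):
    """Gradients of the loss w.r.t. w1 and w2, computed output-to-input."""
    h, y_hat = forward(w1, w2, x)
    delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)  # dLoss / d(w2 * h)
    grad_w2 = delta_out * h
    delta_hidden = delta_out * w2 * h * (1.0 - h)    # dLoss / d(w1 * x)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2

# Hypothetical starting point and a single training example.
w1, w2, x, y, lr = 0.5, -0.3, 1.0, 1.0, 0.1
loss_before = loss(w1, w2, x, y)
for _ in range(100):
    g1, g2 = backprop(w1, w2, x, y)
    w1, w2 = w1 - lr * g1, w2 - lr * g2  # gradient-descent update
loss_after = loss(w1, w2, x, y)
print(loss_before, loss_after)
```

The error signal `delta_out` is computed at the output first and then reused to obtain `delta_hidden`, which is the "backward" pass that gives the algorithm its name.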