Week 9: Time Series, Imbalanced Data & Fairness Flashcards

Question

What are 5 options for handling imbalanced data?

Answer 1

- Collect more data (difficult in many domains)
- Delete data from the majority class
- Create synthetic data
- Adapt your learning algorithm (cost-sensitive classification)
- Random over/under-sampling

Answer 2

Maintain a count of the number of distinct elements seen so far

Answer 3

The long term growth or decline of the series

Answer 4

Generative adversarial networks (GANs)

- A system of two neural networks (generator and discriminator) competing against each other in a zero-sum framework: improvement in one model comes at the cost of the other model's performance
- Can learn to draw samples from a model that is similar to the original data

Answer 5

EO states that instances from protected and unprotected groups should have equal true positive rate (TPR) and false positive rate (FPR)

- P1 = P[Y\*(x) = 1 | S(x) = G’, Y(x) = 1]
- P2 = P[Y\*(x) = 1 | S(x) = G, Y(x) = 1]
- P3 = P[Y\*(x) = 1 | S(x) = G’, Y(x) = 0]
- P4 = P[Y\*(x) = 1 | S(x) = G, Y(x) = 0]
- For a classifier to be fair: P1 = P2 and P3 = P4

Answer 6

Naive solution: generate a random integer in [0..9] for each query. Store the query if the integer is 0, otherwise discard it.

Problem: as the stream grows, the sample size also grows.

Answer 7

1. Determine the number of samples n
2. Allocate the midpoint in time and replace the time points by their corresponding x values, increasing and decreasing by one unit from the midpoint
3. The dependent variable is "y"
4. Compute $\sum x_i^2$ and $\sum x_i y_i$, where $\sum x_i = 0$
5. Find $y = a + bx$, where $b = \sum x_i y_i / \sum x_i^2$ and $a = \sum y_i / n$
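The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `fit_trend` and the sample values are made up, and it assumes the coded time values are centred as `i - (n - 1) / 2`, which gives whole units for an odd n and half units for an even n (both sum to 0).

```python
def fit_trend(y):
    """Least-squares trend y = a + b*x with time coded so that sum(x) == 0."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    mid = (n - 1) / 2                      # midpoint of the time index
    x = [i - mid for i in range(n)]        # n = 5 gives [-2, -1, 0, 1, 2]
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    b = sum_xy / sum_xx                    # slope
    a = sum(y) / n                         # intercept (valid because sum(x) == 0)
    return a, b, x


# Hypothetical yearly values, for illustration only.
a, b, x = fit_trend([10, 12, 14, 16, 18])
print(a, b)          # 14.0 2.0
print(a + b * 3)     # forecast one period past the last x value: 20.0
```

Because the coded x values sum to zero, the intercept reduces to the mean of y, which is why step 5 needs no separate normal equation for a.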

Answer 8

- Semi-average
- Moving average
- Least squares
- Exponential smoothing

Answer 9

- Root mean squared error (RMSE)
- Mean absolute error (MAE)
- Tracking signal = $\sum_t (y_t - y^*_t) / \mathrm{MAE}$
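A small sketch of the three measures, assuming two equal-length lists of actual and predicted values. The function name `forecast_errors` and the numbers are hypothetical.

```python
import math


def forecast_errors(actual, predicted):
    """Return (rmse, mae, tracking_signal) for paired actual/predicted values."""
    errors = [y - p for y, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Tracking signal: running sum of signed errors divided by MAE.
    # Assumption: report 0.0 when every forecast is exact (MAE == 0).
    tracking_signal = sum(errors) / mae if mae else 0.0
    return rmse, mae, tracking_signal


rmse, mae, ts = forecast_errors([10, 12, 14], [9, 12, 16])
print(mae, ts)   # 1.0 -1.0
```

A tracking signal far from zero suggests the forecasts are consistently too high or too low, since signed errors would otherwise cancel.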

Answer 10

- Also called rolling window
- Next period's forecast = simple average of the last k periods
- $Y_{t+1} = (Y_{t-k+1} + Y_{t-k+2} + \dots + Y_t) / k$
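As a rough sketch, the forecast is the mean of the last k observations. The function name `moving_average_forecast` and the data are illustrative only.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the simple average of the last k values."""
    if k <= 0 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k


history = [10, 20, 30, 40]                   # hypothetical observations
print(moving_average_forecast(history, 2))   # 35.0
print(moving_average_forecast(history, 4))   # 25.0
```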

Answer 11

- An individual fairness metric that measures how similar the labels are for similar instances in a dataset, based on the k nearest neighbours of each instance
- Takes values between 0 and 1, where 1 is optimal

Answer 12

- Based on the premise that if values in a time series are averaged over a sufficient period, the effect of short-term variations will be reduced
- The degree of smoothing can be controlled by selecting the number of cases to be included in the average
- Example, a 5-year moving average: for a given year, take the average of the two previous years, the current year and the two following years; this is the moving average for that year. Compute it for each year and plot.
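The 5-year example can be sketched as a centred moving average. This is an illustrative version: the name `centered_moving_average` is made up, and it assumes an odd window and simply omits the first and last years, which have no full window.

```python
def centered_moving_average(series, window=5):
    """Average each point with (window // 2) neighbours on either side."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window")
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]


yearly = [1, 2, 3, 4, 5, 6, 7]              # hypothetical yearly values
print(centered_moving_average(yearly, 5))   # [3.0, 4.0, 5.0]
```

A wider window smooths more but leaves fewer smoothed points at the ends of the series.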

Answer 13

A high degree of irregularity in the original or seasonally adjusted series, or an abrupt change in the time series characteristics of the original data

Answer 14

D = {X, S, Y} is a dataset

- X: the set of attributes that do not contain sensitive information regarding individuals
- S: the set of sensitive attributes containing sensitive information
- Y/Y\*: the original/predicted class label of individuals (either 0 or 1), which indicates the decision outcome
- G/G’: the values of the unprivileged/privileged group

Answer 15

series which are measures of activities to specific dates e.g. retail, balance of payments

Answer 16

Next period's forecast = weighted average of the last k periods:

$Y_{t+1} = c_1 Y_{t-k+1} + \dots + c_k Y_t$, with $c_1 + c_2 + \dots + c_k = 1$
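A minimal sketch of the weighted version, assuming the weights are listed oldest first so that the last weight applies to the most recent observation. The function name and the example weights are hypothetical.

```python
def weighted_moving_average_forecast(series, weights):
    """Forecast the next period as a weighted average of the last k values."""
    k = len(weights)
    if k == 0 or k > len(series):
        raise ValueError("need between 1 and len(series) weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    window = series[-k:]                    # oldest ... most recent
    return sum(c * y for c, y in zip(weights, window))


# Heavier weight on the most recent period (illustrative values).
forecast = weighted_moving_average_forecast([10, 20, 30], [0.2, 0.3, 0.5])
print(round(forecast, 6))   # 23.0
```

Setting every weight to 1/k recovers the simple moving average.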

Answer 17

might introduce artificial minority class examples too deeply in the majority class space

Answer 18

Random under-sampling: randomly delete data points from the majority class - problem with loss of information

Answer 19

- Store the first s elements of the stream in S
- Suppose we have seen n-1 elements and the nth element arrives (n > s)
- With probability s/n, keep the nth element; otherwise discard it
- If the nth element is kept, it replaces one of the s elements in the sample S, picked uniformly at random
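The procedure above can be sketched as follows. The function name `reservoir_sample` and the optional `rng` parameter (used only to make runs repeatable) are illustrative choices, not part of the original description.

```python
import random


def reservoir_sample(stream, s, rng=None):
    """Keep a fixed-size uniform sample of s items from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            sample.append(item)              # store the first s elements
        elif rng.random() < s / n:           # keep the nth element with probability s/n
            sample[rng.randrange(s)] = item  # evict a uniformly chosen element
    return sample


picked = reservoir_sample(range(1000), 10, random.Random(0))
print(len(picked))   # 10
```

Unlike the naive 1-in-10 approach, the memory used here stays at s items no matter how long the stream runs.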

Answer 20

Classifiers try to reduce the overall error (increase the accuracy) so they can be biased towards the majority class

Answer 21

bias in the training datasets

Answer 22

- Trend
- Seasonal variation
- Cyclical variation
- Irregular variation

Answer 23

- Sensitive attributes should not affect the outcome labels
- Identify “proxy” attributes that are related to the protected attributes

Answer 24

F1 = (2\*R\*P)/(R+P)

Answer 25

- Huge volumes of continuous data, possibly infinite
- Fast changing and requires a fast, real-time response
- Random access is expensive, so single-scan algorithms are needed

Answer 26

estimate the counts in an unbiased way. Accept that the count may have a little error, but limit the probability that the error is large

Answer 27

Balanced accuracy = (sensitivity + specificity)/2
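A short sketch of why balanced accuracy is useful on imbalanced data, computed from confusion-matrix counts. The function names and the counts are hypothetical.

```python
def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return (sensitivity + specificity) / 2


# A classifier that always predicts the majority (negative) class
# on 990 negatives and 10 positives.
print(accuracy(tp=0, fn=10, tn=990, fp=0))            # 0.99
print(balanced_accuracy(tp=0, fn=10, tn=990, fp=0))   # 0.5
```

Plain accuracy rewards ignoring the minority class, while balanced accuracy drops to chance level.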

Answer 28

- Synthetic minority over-sampling technique (SMOTE)
- Creates new data points from the minority class

Answer 29

Random oversampling: randomly duplicate data points from the minority class - problem with overfitting and fixed boundaries
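Random over-sampling and its counterpart, random under-sampling, can be sketched side by side. Both functions are illustrative and assume the two classes are given as separate lists, with the majority list at least as long as the minority list.

```python
import random


def random_undersample(majority, minority, rng=None):
    """Randomly drop majority points until both classes have the same size."""
    rng = rng or random.Random()
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng=None):
    """Randomly duplicate minority points until both classes have the same size."""
    rng = rng or random.Random()
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


maj = list(range(100))     # hypothetical majority-class points
mino = [1000, 1001, 1002]  # hypothetical minority-class points
under_maj, under_min = random_undersample(maj, mino, random.Random(0))
over_maj, over_min = random_oversample(maj, mino, random.Random(0))
print(len(under_maj), len(under_min))   # 3 3
print(len(over_maj), len(over_min))     # 100 100
```

Over-sampling only repeats existing minority points, which is where the overfitting risk comes from.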

Answer 30

PP and EO need both the original labels and the model's predictions. DP, DI and consistency can be computed from either the original labels or the model's predictions.

Answer 31

$\mathrm{MSE} = \sum_{t=T_1}^{T} (y_t - y^*_t)^2 / (T - T_1 + 1)$

- $T$ = total number of samples in the time series
- $T_1$ = index of the first value to be forecasted
- $y_t$ = actual value
- $y^*_t$ = predicted value

Answer 32

- Simple average method
- Take the average for each period (period mean) over at least 3 years
- Express each value as an index by comparing it to the average of all periods over the same period of time (divide actual value by period mean to get index)

Answer 33

1. Take the difference between a sample point and one of its nearest neighbours
2. Multiply the difference by a random number between 0 and 1 and add it to the feature vector
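The two steps can be sketched for a single synthetic point. This is a simplified illustration, not the full algorithm: the nearest neighbour is assumed to have been chosen already, and the name `smote_point` is made up.

```python
import random


def smote_point(sample, neighbour, rng=None):
    """Create one synthetic point on the line segment between sample and neighbour."""
    rng = rng or random.Random()
    gap = rng.random()                          # random number in [0, 1)
    return [s + gap * (nb - s)                  # sample + gap * difference
            for s, nb in zip(sample, neighbour)]


new_point = smote_point([0.0, 0.0], [2.0, 4.0], random.Random(1))
print(new_point)   # a point between (0, 0) and (2, 4)
```

Every synthetic point lies between two existing minority points, so the method interpolates rather than duplicates.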

Answer 34

- The networks have difficulty converging
- The goal is for the generator and discriminator to reach some desired equilibrium, but this is rare
- GANs are yet to converge on large problems

Answer 35

- The ratio between the probabilities of the protected and unprotected groups getting positive or desired outcomes
- DI(D) = P[Y(x) = 1 | S(x) = G] / P[Y(x) = 1 | S(x) = G’]
- A dataset or a classifier is considered fair (by law) if its DI ratio is between 0.8 and 1.25 (1 is optimal)

Answer 36

AEO(diff) = [(P1-P2) + (P3-P4)] /2
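A sketch of this difference, using the equalized-odds definitions of P1 to P4 (predicted-positive rates given the actual label, with G’ privileged and G unprivileged). The function names, the group labels passed as strings and the toy data are all hypothetical, and each group is assumed to contain both actual classes.

```python
def predicted_positive_rate(y_true, y_pred, groups, group, actual):
    """P[Y* = 1 | S = group, Y = actual], estimated from the data."""
    preds = [p for t, p, g in zip(y_true, y_pred, groups)
             if g == group and t == actual]
    return sum(preds) / len(preds)


def average_equalized_odds_diff(y_true, y_pred, groups, unprivileged, privileged):
    p1 = predicted_positive_rate(y_true, y_pred, groups, privileged, 1)    # TPR, G'
    p2 = predicted_positive_rate(y_true, y_pred, groups, unprivileged, 1)  # TPR, G
    p3 = predicted_positive_rate(y_true, y_pred, groups, privileged, 0)    # FPR, G'
    p4 = predicted_positive_rate(y_true, y_pred, groups, unprivileged, 0)  # FPR, G
    return ((p1 - p2) + (p3 - p4)) / 2


groups = ["G"] * 4 + ["G'"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(average_equalized_odds_diff(y_true, y_pred, groups, "G", "G'"))   # 0.5
```

A value of 0 means the two groups have matching true and false positive rates on average.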

Answer 37

- Cost is the penalty associated with an incorrect prediction; the goal is to minimise the cost
- Based on the classifier's predicted probabilities
- Traditional binary case: predict positive if the probability is \> 0.5
- The probability threshold can be changed using a cost matrix
- Classify as positive if: probability of positive \> FP/(FP+FN), where FP and FN are the costs of a false positive and a false negative
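A minimal sketch of the threshold rule, reading FP and FN as the costs of a false positive and a false negative and assuming correct predictions cost nothing. The function names and cost values are illustrative.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold implied by the two misclassification costs."""
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_positive, cost_fp, cost_fn):
    """Predict positive when the probability exceeds the cost-based threshold."""
    return prob_positive > cost_sensitive_threshold(cost_fp, cost_fn)


print(cost_sensitive_threshold(1, 1))   # 0.5, the traditional case
print(cost_sensitive_threshold(1, 9))   # 0.1, missing a positive is 9x worse
print(classify(0.3, 1, 9))              # True
print(classify(0.3, 1, 1))              # False
```

Raising the cost of a false negative lowers the threshold, so more minority-class cases are predicted positive.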

Answer 38

- The instances in both the protected (unprivileged) and unprotected (privileged) groups should have equal probability of being predicted as the positive outcome
- DP(diff) = P[Y(x) = 1 | S(x) = G’] - P[Y(x) = 1 | S(x) = G] ≈ 0
- In absolute value, this metric takes values between 0 and 1, where 0 is optimal
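The demographic parity difference and the disparate impact ratio both come from the same two group-wise positive rates, so they can be sketched together. The function names, the string group labels and the toy data are hypothetical, and the sketch assumes both groups are non-empty and the privileged group has at least one positive outcome.

```python
def positive_rate(labels, groups, group):
    """P[Y = 1 | S = group], estimated from the data."""
    selected = [y for y, g in zip(labels, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_diff(labels, groups, unprivileged, privileged):
    return (positive_rate(labels, groups, privileged)
            - positive_rate(labels, groups, unprivileged))


def disparate_impact(labels, groups, unprivileged, privileged):
    return (positive_rate(labels, groups, unprivileged)
            / positive_rate(labels, groups, privileged))


groups = ["G"] * 4 + ["G'"] * 4
labels = [1, 0, 0, 0, 1, 1, 1, 0]   # 25% positive in G, 75% in G'
print(demographic_parity_diff(labels, groups, "G", "G'"))      # 0.5
print(round(disparate_impact(labels, groups, "G", "G'"), 3))   # 0.333
```

Here the ratio falls well below 0.8, so this toy dataset would not pass the DI check.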

Answer 39

Suffers from propagation error

Answer 40

Pre-process the dataset only - try to transform the data so the underlying discrimination is removed

Answer 41

- A smaller k makes the forecast more responsive
- A larger k makes the forecast more stable

Answer 42

majority of the data coming from one class

Answer 43

- Individuals with similar features except the sensitive (protected) attributes must have the same/similar outcomes
- A similarity/distance measure is needed
- Requires strong assumptions regarding the relationship between features and the decision label

Answer 44

- A classifier is fair in terms of predictive parity if the probability that an example is positive in the original dataset, given that it is predicted positive, is the same for the protected and unprotected groups
- P[Y(x) = 1 | Y\*(x) = 1, S(x) = G] = P[Y(x) = 1 | Y\*(x) = 1, S(x) = G’]

Answer 45

- Fairness through unawareness: deletes the sensitive attributes in a dataset
- Preferential sampling (re-sampling): data objects are sampled with replacement
- Massaging (relabeling): changes the actual class labels of some of the instances in the training set
- Reweighing: assigns weights to each instance in the training set

Answer 46

Fb = (1+B^2)(R\*P)/(B^2\*P + R)
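A small sketch of the formula, with B written as `beta`. The function name `f_beta` is illustrative, and returning 0.0 when precision and recall are both zero is an assumption made to avoid dividing by zero.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0                       # assumption: define the score as 0 here
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_beta(0.5, 0.5))                     # 0.5 (beta = 1 gives F1)
print(round(f_beta(1.0, 0.5, beta=2), 4))   # 0.5556
```

With beta = 1 the expression reduces to F1 = 2RP/(R+P).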

Answer 47

- Define multiple subgroups in a dataset, check parity between these subgroups
- A statistical constraint is needed

Answer 48

- Adjust/tune the classification algorithm
- Applied during the model training

Answer 49

- An irregular (or random) variation in a time series occurs over varying (usually short) periods
- It follows no pattern and is by nature unpredictable
- Irregular variation cannot be explained mathematically

Answer 50

regularly spaced peaks and troughs

Answer 51

- Past history is used to flatten out short-term fluctuations
- $S_x = a y + (1 - a) S_{x-1}$
- $S_x$ = the smoothed value for observation x
- $y$ = the actual observation at time x
- $S_{x-1}$ = the smoothed value previously calculated for the observation at time x-1
- $a$ = the smoothing constant, where $0 \le a \le 1$
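The recurrence can be sketched as below. The starting value is not specified above, so this version assumes the first smoothed value equals the first observation. The function name is illustrative.

```python
def exponential_smoothing(observations, a):
    """Apply S_x = a*y + (1 - a)*S_(x-1) across a series."""
    if not 0 <= a <= 1:
        raise ValueError("smoothing constant a must be between 0 and 1")
    smoothed = [observations[0]]         # assumption: seed with the first observation
    for y in observations[1:]:
        smoothed.append(a * y + (1 - a) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10, 20, 30], 0.5))   # [10, 15.0, 22.5]
```

A value of a close to 1 tracks the latest observation closely, while a value close to 0 leans on the smoothed history.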

Answer 52

a pattern of change that recurs regularly over time

Answer 53

Next period's forecast = previous period's actual: $Y_{t+1} = Y_t$