Week 9: Time Series, Imbalanced Data & Fairness Flashcards

1
Q

What is a stock series?

A

measures of activity at a point in time e.g. employment

2
Q

What is sampling a fixed sample size?

A
  • Maintain a sample S of exactly s items
  • Suppose at time n we have seen n items
  • Each item seen so far is in the sample S with equal probability s/n
3
Q

What are 5 methods for forecasting the trend?

A
  • Naive forecasting
  • Simple mean
  • Moving average
  • Weighted moving average
  • Exponential smoothing
4
Q

What are 3 of the ways that GANs can fail?

A
  • The discriminator becomes too strong too quickly and the generator ends up not learning anything
  • The generator only learns very specific weaknesses of the discriminator
  • The generator learns only a very small subset of the true data distribution
5
Q

What makes demand forecasts more accurate?

A
  • Forecasts are more accurate for aggregated data than for individual items
  • Forecasts are more accurate for shorter than for longer time periods
6
Q

What do a high and low alpha represent in exponential smoothing?

A
  • a is small -> more weight on the past history (earlier observations)
  • a is large -> more weight on the most recent observation
7
Q

What is post-processing for mitigation?

A
  • Eliminate the discrimination from the final predictions
  • Change the predicted outcomes of classifiers using a hold-out set that was not involved in the training of the model
8
Q

What is a time series?

A

a set of observations measured at specified, usually equal time intervals

9
Q

What is cyclical variation?

A

Cyclical variations have recurring patterns but with a longer and more erratic time scale compared to seasonal variations

10
Q

What is the sliding window model for data streams?

A
  • Keep the most recent k items
  • Upon the arrival of a new item from the stream, discard the oldest item
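A minimal Python sketch of the sliding window, assuming the stream is any iterable; the function name `sliding_window` is illustrative and not from the course material. A `deque` with `maxlen=k` discards its oldest item automatically when a new one arrives.

```python
from collections import deque

def sliding_window(stream, k):
    """Yield the window contents after each arrival, keeping only the k most recent items."""
    window = deque(maxlen=k)  # a full deque drops its oldest item on append
    for item in stream:
        window.append(item)
        yield list(window)

windows = list(sliding_window([1, 2, 3, 4, 5], k=3))
print(windows[-1])
```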
11
Q

What is a data streams model?

A
  • Data arrives at a high rate
  • The system cannot store the entire stream, only a small fraction of it
12
Q

What are the roles of the generator and discriminator in GANs?

A
  • The generator tries to mimic examples from a training dataset, which is sampled from the true data distribution. It does this by transforming a random source of noise received as input into a synthetic sample. The objective of the generative network is to increase the error rate of the discriminative network
  • The discriminator receives a sample, but it is not told where the sample comes from. Its job is to predict whether it is a data sample or a synthetic sample. The objective of the discriminative network is to decrease the binary classification loss
13
Q

What do you have to do when performing SMOTE?

A

SMOTE is a preprocessing step: split the data into train/test sets first, then apply it to the training data only

14
Q

How does the semi average method work for finding the trend?

A
  • Divide the data into two equal time ranges
  • Calculate the average of the observations in each of the two time ranges. Plot each average at the mid-point of its time range.
  • Draw a straight line between the two points
15
Q

How do you remove the seasonal effect?

A
  • Adjust the time series
  • Seasonally adjusted data = (actual value / seasonal index) * 100
16
Q

How do you prove that each element is picked with equal probability in reservoir sampling using mathematical induction?

A

Inductive hypothesis: after n elements, the sample S contains each element seen so far with probability s/n
Inductive step: when element n+1 arrives, the probability that the algorithm keeps an element already in S is:…n/(n+1)

So, at time n the tuples in S were there with probability s/n, and at time n+1 each tuple stayed in S with probability n/(n+1), so the probability that a tuple is in S at time n+1 is (s/n)*(n/(n+1)) = s/(n+1)
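
The step elided in the card can be filled in along the following lines. This is a sketch of the standard argument, not necessarily the course's own working. When element n+1 arrives, an element already in S survives either because the new element is discarded, or because the new element is kept but replaces a different slot:

```latex
\begin{align*}
P(\text{stays in } S)
  &= \underbrace{\left(1 - \frac{s}{n+1}\right)}_{\text{new element discarded}}
   + \underbrace{\frac{s}{n+1}\cdot\frac{s-1}{s}}_{\text{kept, but another slot is replaced}} \\
  &= \frac{n+1-s}{n+1} + \frac{s-1}{n+1} = \frac{n}{n+1}, \\
P(\text{in } S \text{ at time } n+1)
  &= \frac{s}{n}\cdot\frac{n}{n+1} = \frac{s}{n+1}.
\end{align*}
```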

17
Q

What is statistical fairness?

A
  • A single sensitive (protected) attribute defining demographic groups
  • Find privileged and unprivileged groups based on the sensitive attributes and the decision label
  • Checking parity between demographic groups
  • Cannot always identify hidden unfairness
18
Q

What are reasons for biased data? (4)

A

historical bias in the decision variable
less informative features
biased data collection
imbalanced representation of different demographic groups

19
Q

What is simple mean forecasting?

A

Next period’s forecast = average of previously observed data
Y_{t+1} = (Y_1 + Y_2 + … + Y_t) / t

20
Q

How does least squares linear regression work for finding the trend?

A

Given a set of points (x_i, y_i), find the best fitting line f(x_i) = a + b*x_i such that SSE = sum_i (y_i - f(x_i))^2 is minimised

21
Q

What are 3 problems with a data stream?

A
  • Sampling data from a stream
  • Queries over sliding windows
  • Counting distinct elements
22
Q

What is the Flajolet-Martin approach?

A
  • Pick a hash function h that maps each of the N elements to at least log2(N) bits
  • For each stream element a, let r(a) be the number of trailing 0s in h(a)
  • r(a) = position of the first 1 counting from the right, with positions numbered from 0
  • Record R = the maximum r(a) seen
  • Estimated number of distinct elements = 2^R
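The steps above can be sketched in Python as follows. This is a minimal single-hash illustration, assuming a 32-bit hash taken from the first four bytes of SHA-256; the names `trailing_zeros` and `fm_estimate` are illustrative. A single hash function gives a rough, power-of-two estimate only.

```python
import hashlib

HASH_BITS = 32  # assumed hash width for this sketch

def trailing_zeros(x: int) -> int:
    """Number of trailing 0 bits in x (taken as HASH_BITS when x == 0)."""
    if x == 0:
        return HASH_BITS
    return (x & -x).bit_length() - 1  # isolate the lowest set bit

def fm_estimate(stream) -> int:
    """Estimate the number of distinct elements as 2**R."""
    R = 0
    for a in stream:
        digest = hashlib.sha256(str(a).encode()).digest()
        h = int.from_bytes(digest[:4], "big")  # 32-bit hash value h(a)
        R = max(R, trailing_zeros(h))          # record the maximum r(a) seen
    return 2 ** R

estimate = fm_estimate(["a", "b", "a", "c", "b"])
```

Because only the maximum R is stored, repeated elements never change the estimate, which is why the method needs so little memory.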
23
Q

What do different values of a do for an exponential smoothing forecast?

A
  • A smaller a makes the forecast more stable
  • A larger a makes the forecast more responsive
24
Q

What is exponential smoothing for trend forecasting?

A

Next period's forecast = weighted average of the previous reading and the history
Y_{t+1} = a*Y_t + (1-a)*Y*_t
Y*_t is the forecast for period t produced by exponential smoothing
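
A short Python sketch of this recursion. It assumes the first forecast is initialised to the first observation, which the card does not specify; the function name `exp_smoothing_forecast` is illustrative.

```python
def exp_smoothing_forecast(y, a, initial=None):
    """Return the one-step-ahead forecast after the last observation in y."""
    if not 0 <= a <= 1:
        raise ValueError("smoothing constant a must lie in [0, 1]")
    # Assumption: with no prior forecast, start from the first observation.
    forecast = y[0] if initial is None else initial
    for obs in y:
        # new forecast = a * latest actual + (1 - a) * previous forecast
        forecast = a * obs + (1 - a) * forecast
    return forecast

next_value = exp_smoothing_forecast([10, 20, 30], a=0.5)
```

With a = 1 the result reduces to the naive forecast (the last actual value), and with a = 0 the forecast never moves from its initial value.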

25
Q

What are 5 options for handling imbalanced data?

A
  • Collect more data - difficult in many domains
  • Delete data from the majority class
  • Create synthetic data
  • Adapt your learning algorithm (cost sensitive classification)
  • Random over/under sampling
26
Q

What is counting distinct elements?

A

Maintain a count of the number of distinct elements seen so far

27
Q

What is trend?

A

The long term growth or decline of the series

28
Q

What are GANs?

A

Generative adversarial networks (GANs)
-System of two neural networks (generator and discriminator) competing against each other in a zero-sum framework: an improvement in one model comes at a cost to the performance of the other model
-Can learn to draw samples from a model that is similar to the original data

29
Q

What is equalised odds difference?

A

EO states that instances from protected and unprotected groups should have equal true positive rate (TPR) and false positive rate (FPR)

  • P1 = P[Y*(x) = 1 | S(x) = G’, Y(x) = 1]
  • P2 = P[Y*(x) = 1 | S(x) = G, Y(x) = 1]
  • P3 = P[Y*(x) = 1 | S(x) = G’, Y(x) = 0]
  • P4 = P[Y*(x) = 1 | S(x) = G, Y(x) = 0]
  • For a classifier to be fair: P1=P2 and P3=P4
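A small Python sketch of how P1 to P4 could be estimated from parallel lists of true labels, predicted labels and group membership. The function names and the toy data are illustrative, and the sketch assumes every group has at least one example of each true label (otherwise the division fails).

```python
def rate(y_true, y_pred, group, g, label):
    """Estimate P[Y* = 1 | S = g, Y = label] from parallel lists."""
    preds = [p for t, p, s in zip(y_true, y_pred, group) if s == g and t == label]
    return sum(preds) / len(preds)

def equalised_odds_gaps(y_true, y_pred, group, privileged, unprivileged):
    """Return (P1 - P2, P3 - P4): the TPR gap and the FPR gap between groups."""
    p1 = rate(y_true, y_pred, group, privileged, 1)    # TPR, privileged
    p2 = rate(y_true, y_pred, group, unprivileged, 1)  # TPR, unprivileged
    p3 = rate(y_true, y_pred, group, privileged, 0)    # FPR, privileged
    p4 = rate(y_true, y_pred, group, unprivileged, 0)  # FPR, unprivileged
    return p1 - p2, p3 - p4

# Toy data: "P" = privileged group, "U" = unprivileged group
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["P", "P", "P", "P", "U", "U", "U", "U"]
gaps = equalised_odds_gaps(y_true, y_pred, group, "P", "U")
```

Both gaps being zero corresponds to the fairness condition P1 = P2 and P3 = P4.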
30
Q

What is sampling a fixed proportion? What is the problem with it?

A

Naive solution (e.g. to keep 1/10 of the queries): generate a random integer in [0..9] for each query. Store the query if the integer is 0, otherwise discard it
Problem: as the stream grows, the sample size will also grow

31
Q

What are the steps for finding the values of a and b for least squares linear regression?

A
  1. Determine the number of samples n
  2. Take the mid point in time as x = 0 and replace the other time points by x values that decrease or increase by one unit per step away from the mid point
  3. The dependent variable is “y”
  4. Compute sum(xi^2) and sum(xi*yi), where sum(xi) is 0
  5. Find y = a+bx where b = sum(xi*yi)/sum(xi^2) and a = sum(yi)/n
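The steps above can be sketched in Python as follows; the function name `coded_trend_line` is illustrative. For an even number of samples this sketch uses half-unit codes (for example -1.5, -0.5, 0.5, 1.5) so that the x values still sum to 0; that choice is an assumption, and some textbooks rescale the codes to odd integers instead.

```python
def coded_trend_line(y):
    """Fit y = a + b*x with time coded so that sum(x) == 0."""
    n = len(y)                              # step 1: number of samples
    mid = (n - 1) / 2                       # step 2: mid point in time
    x = [i - mid for i in range(n)]         # coded time values, one unit apart
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)       # step 4
    b = sum_xy / sum_xx                     # step 5: slope
    a = sum(y) / n                          # step 5: intercept (mean of y)
    return a, b, x

a, b, x = coded_trend_line([2, 4, 6, 8, 10])
```

Coding the time axis this way is what makes the simple formulas for a and b valid: with sum(x) = 0 the normal equations decouple.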
32
Q

What are 4 common methods for measuring the trend?

A
  • Semi-average
  • Moving average
  • Least squares
  • Exponential smoothing
33
Q

What are 3 other measures for testing forecast accuracy?

A
  • root mean squared error
  • mean absolute error (MAE)
  • tracking signal = sum(yt - y*t)/MAE
34
Q

What is moving average forecasting? What is another name for it?

A

-Also called rolling window
-Next period's forecast = simple average of the last k periods
Y_{t+1} = (Y_{t-k+1} + Y_{t-k+2} + … + Y_t) / k
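
A minimal Python sketch of the moving average forecast, assuming the history is a plain list of numbers; the function name `moving_average_forecast` is illustrative.

```python
def moving_average_forecast(y, k):
    """Forecast the next period as the simple average of the last k observations."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and the number of observations")
    return sum(y[-k:]) / k

history = [10, 20, 30, 40]
forecast_k2 = moving_average_forecast(history, k=2)
forecast_k4 = moving_average_forecast(history, k=4)
```

With k = 1 this reduces to naive forecasting, and with k equal to the full history length it reduces to simple mean forecasting.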

35
Q

What is consistency?

A

An individual fairness metric that measures how similar the labels are for similar instances in a dataset, based on the k nearest neighbours of each instance
-Takes values between 0 and 1, where 1 is the optimal

36
Q

How does the moving average method work for finding the trend?

A
  • Based on the premise that if values in a time series are averaged over a sufficient period, the effect of short term variations will be reduced
  • The degree of smoothing can be controlled by selecting the number of cases to be included in the average
  • A 5-year moving average: for each year, average the 2 previous years, the current year and the 2 following years; this is the smoothed value for that year. Compute it for each year and plot.
37
Q

What can cause the usefulness of trend estimates to decline?

A

A high degree of irregularity in the original or seasonally adjusted series, or an abrupt change in the time series characteristics of the original data

38
Q

What are the symbols used for defining fairness metrics?

A

D = {X,S,Y} is a dataset
* X: the set of attributes that do not contain sensitive information regarding individuals
* S: the set of sensitive attributes containing sensitive information
* Y/Y*: the original/predicted class label of an individual (either 0 or 1), which indicates the decision outcome
* G/G’: the values of the unprivileged/privileged group

39
Q

What is a flow series?

A

Series that measure activity over a period, up to specific dates, e.g. retail, balance of payments

40
Q

What is weighted moving average forecasting?

A

Next period's forecast = weighted average of the last k periods
Y_{t+1} = c_1*Y_{t-k+1} + … + c_k*Y_t
with c_1 + c_2 + … + c_k = 1

41
Q

What is a problem with SMOTE?

A

might introduce artificial minority class examples too deeply in the majority class space

42
Q

What is random undersampling and its problem?

A

Random under-sampling: randomly delete data points from the majority class - problem with loss of information

43
Q

What is reservoir sampling?

A
  • Store all the first s elements of the stream to S
  • We have seen n-1 elements, now the nth element arrives
  • With probability s/n, keep the nth element, otherwise discard it
  • If we picked the nth element, then it replaces one of the s elements in the sample S, picked uniformly at random
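The steps above can be sketched in Python as follows, assuming the stream is any iterable; the function name `reservoir_sample` and the optional `rng` parameter (used only to make runs reproducible) are illustrative.

```python
import random

def reservoir_sample(stream, s, rng=None):
    """Keep a uniform random sample of s items from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            sample.append(item)                # store the first s elements
        elif rng.random() < s / n:             # keep the nth element with probability s/n
            sample[rng.randrange(s)] = item    # replace a uniformly chosen slot
    return sample

picked = reservoir_sample(range(100), s=5, rng=random.Random(0))
```

The sample never holds more than s items, so memory use stays fixed no matter how long the stream runs.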
44
Q

What is the imbalanced data problem?

A

Classifiers try to reduce the overall error (increase the accuracy) so they can be biased towards the majority class

45
Q

Where does bias in algorithms come from?

A

bias in the training datasets

46
Q

What are the 4 components of time series?

A
  • Trend
  • Seasonal variation
  • Cyclical variation
  • Irregular variation
47
Q

What is causal fairness?

A
  • Sensitive attributes should not affect the outcome labels
  • Identify “proxy” attributes that are related to the protected attributes
48
Q

What is the F1 measure?

A

F1 = (2*R*P)/(R+P)

49
Q

What are characteristics of a data streams model?

A
  • Huge volumes of continuous data, possibly infinite
  • Fast changing and requires fast, real-time response
  • Random access is expensive - single scan algorithms
50
Q

What if you do not have the space to maintain the set of elements?

A

Estimate the count in an unbiased way. Accept that the count may have a little error, but limit the probability that the error is large

51
Q

What is the balanced accuracy measure?

A

Balanced accuracy = (sensitivity + specificity)/2

52
Q

What is SMOTE?

A
  • Synthetic minority over-sampling technique (SMOTE)
  • Creates new data points from the minority class
53
Q

What is random oversampling and its problem?

A

Random oversampling: randomly duplicate data points from the minority class - problem with overfitting and fixed boundaries

54
Q

Which fairness metrics need the original dataset and the model?

A

PP and EO need original and model

DP, DI and consistency can be computed from either the original dataset or the model's predictions

55
Q

What is the formula for the MSE for testing forecast accuracy?

A

MSE = sum_{t=T1..T} (y_t - y*_t)^2 / (T - T1 + 1)

T=total number of samples in time series
T1 = index of first value to be forecasted
yt = actual value
y*t = predicted value
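
A short Python sketch of this calculation, assuming 1-indexed periods as in the definitions above and two parallel lists of actual and predicted values; the function name `forecast_mse` is illustrative. Entries before the first forecasted period are ignored, so they can hold a placeholder.

```python
def forecast_mse(actual, predicted, t1):
    """Mean squared error over periods t1..T (1-indexed), T = len(actual)."""
    T = len(actual)
    squared_errors = [
        (actual[t - 1] - predicted[t - 1]) ** 2   # shift to 0-indexed lists
        for t in range(t1, T + 1)
    ]
    return sum(squared_errors) / (T - t1 + 1)

actual = [10, 12, 14, 16]
predicted = [None, 10, 13, 14]   # no forecast exists for period 1
mse = forecast_mse(actual, predicted, t1=2)
```

Here the squared errors are 4, 1 and 4 over three forecasted periods, giving an MSE of 3.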

56
Q

How do you calculate the seasonal index?

A
  • Simple average method
  • Take the average for each period (period mean) over at least 3 years
  • Express each value as an index by comparing it to the average of all periods over the same period of time (divide actual value by period mean to get index)
57
Q

What are the steps of creating data with SMOTE?

A
  1. Take the difference between a sample point and one of its nearest neighbours
  2. Multiply the difference by a random number between 0 and 1 and add it to the feature vector
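The two steps above can be sketched in Python for a single synthetic point. This assumes the neighbour has already been chosen from the sample's nearest minority-class neighbours; the function name `smote_point` is illustrative, and this is not the API of any particular library.

```python
import random

def smote_point(x, neighbour, rng=None):
    """Create one synthetic point on the line segment between x and its neighbour."""
    rng = rng or random.Random()
    gap = rng.random()  # random number in [0, 1)
    # step 1: difference (neighbour - x); step 2: scale it and add to x
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]

synthetic = smote_point([0.0, 0.0], [2.0, 4.0], rng=random.Random(1))
```

Every synthetic point lies between the original sample and its neighbour, which is why a neighbour sitting inside the majority region can pull synthetic points into that region.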
58
Q

What are 3 problems with GANs?

A
  • Networks are difficult to converge
  • The goal is for generator and discriminator to reach some desired equilibrium but this is rare
  • GANs are yet to converge on large problems
59
Q

What is the Disparate Impact (DI) ratio?

A

-The ratio between the probability of protected and unprotected groups getting positive or desired outcomes
DI(D) = P[Y(x) = 1 | S(x) = G] / P[Y(x) = 1 | S(x) = G’]
-A dataset or a classifier is considered fair (by law) if its DI-ratio is between 0.8 and 1.25 (1 is the optimal)
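
A minimal Python sketch of the DI ratio, assuming parallel lists of labels (0/1) and group membership; the function names and the toy data are illustrative. Passing the original labels gives the DI of the dataset, and passing predicted labels gives the DI of a classifier.

```python
def positive_rate(labels, group, g):
    """Estimate P[Y = 1 | S = g] from parallel lists."""
    selected = [y for y, s in zip(labels, group) if s == g]
    return sum(selected) / len(selected)

def disparate_impact(labels, group, unprivileged, privileged):
    """Ratio of positive-outcome rates: unprivileged group over privileged group."""
    return positive_rate(labels, group, unprivileged) / positive_rate(labels, group, privileged)

# Toy data: "U" = unprivileged (G), "P" = privileged (G')
labels = [1, 1, 0, 0, 1, 1, 1, 0]
group = ["U", "U", "U", "U", "P", "P", "P", "P"]
di = disparate_impact(labels, group, "U", "P")
is_fair = 0.8 <= di <= 1.25
```

In this toy data the positive rates are 0.5 and 0.75, so the ratio falls below 0.8 and the data would not count as fair under the rule above.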

60
Q

What is AEO(diff)?

A

AEO(diff) = [(P1-P2) + (P3-P4)] /2

61
Q

What is cost sensitive classification?

A
  • Cost is the penalty associated with an incorrect prediction, goal is to minimise the cost
  • Based on the classifier predicted probabilities
  • Binary traditional case: predict positive if probability is > 0.5
  • Probability threshold can be changed using a cost matrix
  • Classify as positive if: probability of positive > FP/(FP+FN), where FP and FN are the costs of a false positive and a false negative
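A small Python sketch of the threshold rule, assuming FP and FN in the formula denote the costs of a false positive and a false negative taken from the cost matrix; the function names are illustrative.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold implied by the misclassification costs."""
    return cost_fp / (cost_fp + cost_fn)

def classify(p_positive, cost_fp, cost_fn):
    """Return 1 (positive) if the predicted probability exceeds the cost-based threshold."""
    return int(p_positive > cost_threshold(cost_fp, cost_fn))

equal_costs = cost_threshold(1, 1)      # the traditional 0.5 threshold
costly_misses = cost_threshold(1, 4)    # false negatives cost 4x more
```

When a false negative is more expensive than a false positive, the threshold drops below 0.5, so the classifier predicts the positive (often minority) class more readily.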
62
Q

What is Demographic Parity (DP) Difference?

A

-The instances in both protected(unprivileged) and unprotected(privileged) groups should have equal probability of being predicted as positive outcome
DP(diff) = P[Y(x) = 1 | S(x) = G’] - P[Y(x) = 1 | S(x) = G] ≈ 0
-This metric takes values between 0 and 1 where 0 is the optimal

63
Q

What is a problem with exponential smoothing?

A

Suffers from propagation error

64
Q

What is pre-processing for mitigation?

A

Pre-process the dataset only - try to transform the data so the underlying discrimination is removed

65
Q

What do different values of k do for a moving average forecast?

A
  • A smaller k makes the forecast more responsive
  • A larger k makes the forecast more stable
66
Q

What is imbalanced data?

A

majority of the data coming from one class

67
Q

What is individual fairness?

A
  • Individuals with similar features except the sensitive (protected) attributes must have the same/similar outcomes
  • A similarity/distance measure is needed
  • Requires strong assumptions regarding the relationship between features and the decision label
68
Q

What is predictive parity?

A

* A classifier satisfies predictive parity if, among examples predicted positive, the probability of being positive in the original dataset is the same for the protected and unprotected groups
* P[Y(x) = 1 | Y*(x) = 1, S(x) = G] = P[Y(x) = 1 | Y*(x) = 1, S(x) = G’]

69
Q

What are 4 examples of pre-processing for mitigation?

A
  • Fairness through unawareness: deletes the sensitive attributes in a dataset
  • Preferential sampling (re-sampling): data objects are sampled with replacement
  • Massaging (relabeling): changes the actual class labels of some of the instances in the training set
  • Reweighing: assigns weights to each instance in the training set
70
Q

What is the Fb measure?

A

Fb = (1+B^2)(R*P)/(B^2*P + R)
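
A minimal Python sketch of the formula, with P as precision and R as recall; the function name `f_beta` is illustrative. Setting beta to 1 recovers the F1 measure.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_beta(0.5, 0.5)             # equals 2PR / (P + R)
f2 = f_beta(0.5, 1.0, beta=2.0)   # recall counts more than precision
```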

71
Q

What is group fairness?

A
  • Define multiple subgroups in a dataset, check parity between these subgroups
  • A statistical constraint is needed
72
Q

What is in-processing for mitigation?

A
  • Adjust/tune the classification algorithm
  • Applied during the model training
73
Q

What is irregular variation?

A
  • An irregular (or random) variation in a time series occurs over varying (usually short) periods
  • It follows no pattern and is by nature unpredictable
  • Irregular variation cannot be explained mathematically
74
Q

How can you identify seasonality in a time series?

A

regularly spaced peaks and troughs

75
Q

What is the formula for exponential smoothing for finding the trend?

A
  • Past history is used to flatten out short term fluctuations
  • S_x = a*y + (1-a)*S_{x-1}
  • S_x = the smoothed value for observation x
  • y = the actual observation at time x
  • S_{x-1} = the smoothed value previously calculated for the observation at time x-1
  • a = the smoothing constant where 0 <= a <= 1
76
Q

What is seasonal variation?

A

a pattern of change that recurs regularly over time

77
Q

What is naive forecasting?

A

Next period's forecast = previous period’s actual
Y_{t+1} = Y_t