Week 9: Time Series, Imbalanced Data & Fairness Flashcards
What is a stock series?
Measures of activity at a particular point in time, e.g. employment
What is sampling with a fixed sample size?
- Maintain a sample S of exactly s items
- Suppose that at time n we have seen n items
- Each item seen so far is in the sample S with equal probability s/n
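A common way to maintain such a sample is reservoir sampling, the algorithm analysed in the induction card further down. Here is a minimal Python sketch. It assumes the stream is any iterable and that a seeded `random.Random` is passed in. `reservoir_sample` is a hypothetical helper name.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], s: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of exactly s items from a stream of unknown length."""
    sample: List[T] = []
    for n, item in enumerate(stream, start=1):  # n = number of items seen so far
        if n <= s:
            sample.append(item)  # the first s items fill the sample
        elif rng.random() < s / n:
            # admit the n-th item with probability s/n, evicting a uniformly random member
            sample[rng.randrange(s)] = item
    return sample


rng = random.Random(42)
print(reservoir_sample(range(1_000), 5, rng))  # 5 items drawn from 0..999
```

Only s items are ever stored, however long the stream gets.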
What are 5 methods for forecasting the trend?
- Naive forecasting
- Simple mean
- Moving average
- Weighted moving average
- Exponential smoothing
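Three of these methods fit in a few lines each. Simple mean and exponential smoothing have their own cards. The function names in this Python sketch are hypothetical. The weighted moving average assumes the weights are listed oldest-to-newest and that there are at least as many observations as weights.

```python
from typing import Sequence


def naive_forecast(y: Sequence[float]) -> float:
    # next period = last observed value
    return y[-1]


def moving_average_forecast(y: Sequence[float], k: int) -> float:
    # next period = mean of the last k observations
    window = y[-k:]
    return sum(window) / len(window)


def weighted_moving_average_forecast(y: Sequence[float], weights: Sequence[float]) -> float:
    # weights[0] applies to the oldest of the last len(weights) observations,
    # weights[-1] to the most recent; dividing by sum(weights) normalises them
    window = y[-len(weights):]
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)


sales = [10.0, 12.0, 11.0, 13.0, 15.0]
print(naive_forecast(sales))                                 # 15.0
print(moving_average_forecast(sales, 3))                     # (11 + 13 + 15) / 3 = 13.0
print(weighted_moving_average_forecast(sales, [1, 2, 3]))    # (1*11 + 2*13 + 3*15) / 6 ≈ 13.67
```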
What are 3 of the ways that GANs can fail?
- The discriminator becomes too strong too quickly and the generator ends up not learning anything
- The generator only learns very specific weaknesses of the discriminator
- The generator learns only a very small subset of the true data distribution (mode collapse)
What makes demand forecasts more accurate?
- Forecasts are more accurate for aggregated data than for individual items
- Forecasts are more accurate for shorter time periods than for longer ones
What do a high and a low alpha (α) represent in exponential smoothing?
- α is small -> more weight on past observations (the history)
- α is large -> more weight on the most recent observation
What is post-processing for mitigation?
- Eliminate the discrimination from the final predictions
- Change the classifier's predicted outcomes using a hold-out set that was not used to train the model
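One simple post-processing approach is to choose a separate decision threshold per group on the hold-out set. Each threshold is picked so that the group's true positive rate is as close as possible to a target. This is an illustration rather than a standard method. It equalises only the TPR (equal opportunity), not the FPR as well. All function names and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence


def true_positive_rate(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def fit_group_thresholds(
    scores: Sequence[float], labels: Sequence[int], groups: Sequence[str], target_tpr: float
) -> Dict[str, float]:
    """On the hold-out set, pick per-group thresholds whose TPR is closest to target_tpr."""
    thresholds: Dict[str, float] = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, gg in zip(scores, groups) if gg == g]
        g_labels = [y for y, gg in zip(labels, groups) if gg == g]
        # candidates = observed scores, highest first, so ties go to the stricter threshold
        candidates = sorted(set(g_scores), reverse=True)
        thresholds[g] = min(
            candidates, key=lambda t: abs(true_positive_rate(g_scores, g_labels, t) - target_tpr)
        )
    return thresholds


def predict(scores: Sequence[float], groups: Sequence[str], thresholds: Dict[str, float]) -> List[int]:
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Hold-out data: scores from an already-trained model, never used to train it
hold_out_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.5, 0.2, 0.1]
hold_out_labels = [1, 1, 0, 0, 1, 1, 0, 0]
hold_out_groups = ["G", "G", "G", "G", "G'", "G'", "G'", "G'"]
thresholds = fit_group_thresholds(hold_out_scores, hold_out_labels, hold_out_groups, target_tpr=1.0)
print(thresholds)  # G gets 0.8, G' gets 0.5
```

The model itself is untouched. Only its outputs are turned into decisions differently per group.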
What is a time series?
A set of observations measured at specified, usually equal, time intervals
What is cyclical variation?
Cyclical variations have recurring patterns but with a longer and more erratic time scale compared to seasonal variations
What is the sliding window model for data streams?
- Keep the most recent k items
- Upon the arrival of a new item from the stream, discard the oldest item
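Here is a minimal sketch of the sliding window model. Integer items and a running-sum query over the window are illustrative choices. `collections.deque` makes discarding the oldest item O(1), and `SlidingWindow` is a hypothetical class name.

```python
from collections import deque
from typing import Deque, List


class SlidingWindow:
    """Keep only the k most recent stream items and maintain their sum."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.items: Deque[int] = deque()
        self.total = 0

    def add(self, x: int) -> None:
        self.items.append(x)
        self.total += x
        if len(self.items) > self.k:
            self.total -= self.items.popleft()  # discard the oldest item

    def window(self) -> List[int]:
        return list(self.items)


w = SlidingWindow(k=3)
for x in [5, 1, 4, 2, 8]:
    w.add(x)
print(w.window(), w.total)  # [4, 2, 8] 14
```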
What is the data stream model?
- Data arrives at a high rate
- The system cannot store the entire stream, only a small fraction of it
What are the roles of the generator and discriminator in GANs?
- The generator tries to mimic examples from a training dataset, which is sampled from the true data distribution. It does this by transforming a random noise input into a synthetic sample. The objective of the generative network is to increase the error rate of the discriminative network
- The discriminator receives a sample but is not told where it comes from. Its job is to predict whether it is a real data sample or a synthetic one. The objective of the discriminative network is to decrease its binary classification loss
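For reference, the two objectives are usually written as one minimax game. The notation here is assumed and not defined elsewhere in these notes:
- D(x) is the discriminator's probability that x is real
- G(z) is the generator's output for noise z
- p_data and p_z are the data and noise distributions

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Maximising V over D corresponds to minimising the discriminator's binary cross-entropy loss. The generator minimises V by pushing D(G(z)) towards 1, which increases the discriminator's error on synthetic samples.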
What must you do when performing SMOTE?
Split the data into train/test sets first, then apply SMOTE (and any other preprocessing) to the training data only, so that no information from the test set leaks into the synthetic samples
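This from-scratch NumPy sketch shows the correct order of operations. First split the data, stratified by class. Then create synthetic minority rows from the training set only. The `smote` and `stratified_split` helpers, the toy data and k = 5 neighbours are assumptions for illustration. Libraries such as imbalanced-learn provide a full SMOTE implementation.

```python
import numpy as np


def stratified_split(y: np.ndarray, test_frac: float, rng: np.random.Generator):
    """Boolean train/test masks with the same class mix in both parts."""
    test_mask = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        test_mask[idx[: int(round(test_frac * len(idx)))]] = True
    return ~test_mask, test_mask


def smote(X_min: np.ndarray, n_new: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Interpolate between a minority row and one of its k nearest minority neighbours."""
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))              # random minority row
        nb = X_min[rng.choice(neighbours[i])]     # one of its k nearest neighbours
        synthetic[j] = X_min[i] + rng.random() * (nb - X_min[i])  # random point on the segment
    return synthetic


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([1] * 10 + [0] * 90)                 # class 1 is the minority

# 1) Split first, so test rows never influence the synthetic samples.
train, test = stratified_split(y, test_frac=0.2, rng=rng)
X_train, y_train = X[train], y[train]

# 2) Oversample the minority class in the training set only.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote(X_min, n_new, k=min(5, len(X_min) - 1), rng=rng)
X_train_bal = np.vstack([X_train, X_syn])
y_train_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])
print(np.bincount(y_train_bal), int(test.sum()))  # balanced training classes; untouched test rows
```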
How does the semi-average method work for finding the trend?
- Divide the data into two equal time ranges
- Calculate the average of the observations in each time range and plot each average at the mid-point of its time range
- Draw a straight line between the two points
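Here is a Python sketch of the semi-average trend line, with time indexed t = 0, 1, ..., n-1 (an assumed convention). This version requires an even number of observations. With an odd count, one option is to drop the middle value, which is not handled here. `semi_average_trend` is a hypothetical name.

```python
from typing import List, Tuple


def semi_average_trend(y: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the semi-average trend line."""
    if len(y) % 2:
        raise ValueError("this sketch needs an even number of observations")
    half = len(y) // 2
    avg1 = sum(y[:half]) / half          # average of the first time range
    avg2 = sum(y[half:]) / half          # average of the second time range
    t1 = (half - 1) / 2                  # mid-point of the first range
    t2 = half + (half - 1) / 2           # mid-point of the second range
    slope = (avg2 - avg1) / (t2 - t1)    # straight line through the two points
    intercept = avg1 - slope * t1
    return intercept, slope


print(semi_average_trend([2, 4, 6, 8, 10, 12]))  # (2.0, 2.0), i.e. trend = 2 + 2t
```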
How do you remove the seasonal effect?
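- Seasonally adjust the time series: divide each actual value by the seasonal index for its season
- Seasonally adjusted value = (actual value / seasonal index) × 100, with the seasonal index expressed as a percentage (100 = an average season)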
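Here is a small sketch of the adjustment. It assumes the seasonal indices are percentages (averaging 100), listed in season order, and that the series starts in the first season. The figures are made up, and `seasonally_adjust` is a hypothetical name.

```python
from typing import List, Sequence


def seasonally_adjust(actual: Sequence[float], seasonal_index: Sequence[float]) -> List[float]:
    m = len(seasonal_index)  # number of seasons, e.g. 4 for quarterly data
    # divide each value by the index of its season, then scale by 100
    return [y / seasonal_index[i % m] * 100 for i, y in enumerate(actual)]


quarterly_sales = [120, 80, 100, 140]
index = [120, 80, 90, 110]  # Q1 is 20% above an average quarter, Q2 20% below, ...
print(seasonally_adjust(quarterly_sales, index))  # roughly [100, 100, 111.1, 127.3]
```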
How do you prove that each element is picked with equal probability in reservoir sampling using mathematical induction?
Inductive hypothesis: after n elements, the sample S contains each element seen so far with probability s/n
Inductive step: for an element already in S, the probability that the algorithm keeps it in S when element n+1 arrives is: … $\frac{n}{n+1}$
So, at time n the tuples in S were there with probability s/n, and each stays in S from time n to n+1 with probability $\frac{n}{n+1}$, so the probability that a tuple is in S at time n+1 is $\frac{s}{n} \cdot \frac{n}{n+1} = \frac{s}{n+1}$
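The claim can also be checked empirically. This sketch uses an equivalent formulation of reservoir sampling: draw j uniformly from 0..n-1 and replace slot j if j < s. It runs the sampler many times over a stream of 20 items with s = 5 and measures how often each item ends up in the sample. The helper names and trial count are arbitrary choices.

```python
import random
from collections import Counter
from typing import Iterable, List


def reservoir_sample(stream: Iterable[int], s: int, rng: random.Random) -> List[int]:
    sample: List[int] = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            sample.append(item)
        else:
            j = rng.randrange(n)   # uniform in 0..n-1
            if j < s:              # happens with probability s/n
                sample[j] = item   # the evicted slot is uniform over the s slots
    return sample


def inclusion_frequencies(n: int, s: int, trials: int, seed: int = 0) -> List[float]:
    """Fraction of trials in which each of the items 0..n-1 ends up in the sample."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(n), s, rng))
    return [counts[i] / trials for i in range(n)]


freqs = inclusion_frequencies(n=20, s=5, trials=20_000)
print(min(freqs), max(freqs))  # both should be close to s/n = 5/20 = 0.25
```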
What is statistical fairness?
- A single sensitive (protected) attribute defining demographic groups
- Find privileged and unprivileged groups based on the sensitive attributes and the decision label
- Checking parity between demographic groups
- Cannot always identify hidden unfairness
What are reasons for biased data? (4)
historical bias in the decision variable
less informative features
biased data collection
imbalanced representation of different demographic groups
What is simple mean forecasting?
Next period’s forecast = average of previously observed data
$Y^*_{t+1} = (Y_1 + Y_2 + \dots + Y_t)/t$
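This sketch produces the simple-mean forecast for every period by keeping a running total, so each forecast costs O(1). `simple_mean_forecasts` is a hypothetical name.

```python
from typing import List, Sequence


def simple_mean_forecasts(y: Sequence[float]) -> List[float]:
    """Element t-1 is the forecast for period t+1, i.e. the mean of Y_1..Y_t."""
    forecasts: List[float] = []
    running_total = 0.0
    for t, value in enumerate(y, start=1):
        running_total += value
        forecasts.append(running_total / t)
    return forecasts


print(simple_mean_forecasts([10, 12, 11, 13]))  # [10.0, 11.0, 11.0, 11.5]
```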
How does least squares linear regression work for finding the trend?
Given a set of points $(x_i, y_i)$, find the best-fitting line $f(x_i) = a + bx_i$ such that $SSE = \sum_i (y_i - f(x_i))^2$ is minimised
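Setting the derivatives of SSE with respect to a and b to zero gives the usual closed form, $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. Here is a Python sketch with made-up data. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) minimising SSE = sum (y_i - (a + b x_i))^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sse(x: Sequence[float], y: Sequence[float], a: float, b: float) -> float:
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


t = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
a, b = least_squares_line(t, sales)
print(a, b, sse(t, sales, a, b))  # 1.0 2.0 0.0
```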
What are 3 classic problems in data stream processing?
- Sampling data from a stream
- Queries over sliding windows
- Counting distinct elements
What is the Flajolet-Martin approach?
- Pick a hash function h that maps each of the N elements to at least log2(N) bits
- For each stream element a, let r(a) be the number of trailing 0s in h(a)
- r(a) = position of the first 1 bit counting from the right, starting at position 0 (e.g. if h(a) ends in ...1100, r(a) = 2)
- Record R = the maximum r(a) seen
- Estimated number of distinct elements = 2^R
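Here is a single-hash sketch of Flajolet-Martin in Python. Two choices are assumptions: the hash (the first 4 bytes of SHA-256, giving 32 bits) and returning 32 when the hash is all zeros. A single hash gives only a rough power-of-two estimate, which is why practical versions combine the estimates of many hash functions.

```python
import hashlib
from typing import Iterable

BITS = 32  # hash length; must be at least log2(N) for N distinct elements


def h(item: str) -> int:
    """Assumed hash: first 4 bytes of SHA-256, read as a 32-bit integer."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:4], "big")


def r(value: int) -> int:
    """Number of trailing 0 bits = position of the lowest 1 bit, counting from 0 on the right."""
    if value == 0:
        return BITS  # assumed convention for an all-zero hash
    return (value & -value).bit_length() - 1


def flajolet_martin(stream: Iterable[str]) -> int:
    R = 0
    for a in stream:
        R = max(R, r(h(a)))  # record the maximum r(a) seen
    return 2 ** R            # estimated number of distinct elements


stream = [f"user{i % 1000}" for i in range(10_000)]  # 1,000 distinct values, each repeated
print(flajolet_martin(stream))  # a power of two; only a rough estimate of 1,000
```

Repeated elements hash to the same value, so duplicates never change R.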
What do different values of α do for an exponential smoothing forecast?
- A smaller α makes the forecast more stable (smoother)
- A larger α makes the forecast more responsive to recent changes
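What is exponential smoothing for trend forecasting?
Next period's forecast = weighted average of the most recent observation and the previous forecast (which summarises the history)
$Y^*_{t+1} = \alpha Y_t + (1 - \alpha) Y^*_t$
where $Y^*_t$ is the exponential smoothing forecast for period t and α (between 0 and 1) is the smoothing constant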
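This sketch of the recurrence assumes the first forecast is initialised to the first observation; other initialisations are possible. Running it with a small and a large α on a series with a sudden jump shows the stable-versus-responsive trade-off. `exponential_smoothing` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def exponential_smoothing(
    y: Sequence[float], alpha: float, initial: Optional[float] = None
) -> List[float]:
    """Return forecasts [Y*_1, ..., Y*_{T+1}] using Y*_{t+1} = alpha*Y_t + (1 - alpha)*Y*_t."""
    forecast = float(y[0] if initial is None else initial)  # assumption: Y*_1 = Y_1
    forecasts = [forecast]
    for observed in y:
        forecast = alpha * observed + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


demand = [10.0, 10.0, 10.0, 20.0]  # a sudden jump in the last period
print(exponential_smoothing(demand, alpha=0.2)[-1])  # about 12: stable, reacts slowly
print(exponential_smoothing(demand, alpha=0.8)[-1])  # about 18: responsive, follows the jump
```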
What are 5 options for handling imbalanced data?
- Collect more data - difficult in many domains
- Delete data from the majority class
- Create synthetic data
- Adapt your learning algorithm (cost sensitive classification)
- Random over/under sampling
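Random over- and undersampling can be sketched in plain Python. Undersampling is the "delete data from the majority class" option. The helpers are hypothetical and, as with SMOTE, should only be applied to the training split.

```python
import random
from collections import Counter
from typing import Any, List, Sequence, Tuple


def random_oversample(X: Sequence[Any], y: Sequence[Any], rng: random.Random) -> Tuple[List[Any], List[Any]]:
    """Duplicate random rows of the smaller classes until every class matches the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def random_undersample(X: Sequence[Any], y: Sequence[Any], rng: random.Random) -> Tuple[List[Any], List[Any]]:
    """Keep a random subset of each class, as large as the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    keep: List[int] = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
rng = random.Random(0)
print(Counter(random_oversample(X, y, rng)[1]))   # 8 of each class
print(Counter(random_undersample(X, y, rng)[1]))  # 2 of each class
```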
What is counting distinct elements?
Maintain a count of the number of distinct elements seen so far
What is trend?
The long term growth or decline of the series
What are GANs?
Generative adversarial networks (GANs)
- A system of two neural networks (generator and discriminator) competing against each other in a zero-sum framework: an improvement in one model comes at the cost of the other model's performance
- Can learn to draw samples from a distribution similar to that of the original data
What is equalised odds difference?
EO states that instances from protected and unprotected groups should have equal true positive rate (TPR) and false positive rate (FPR)
- P1 = P[Y*(x) = 1 | S(x) = G’, Y(x) = 1]
- P2 = P[Y*(x) = 1 | S(x) = G, Y(x) = 1]
- P3 = P[Y*(x) = 1 | S(x) = G’, Y(x) = 0]
- P4 = P[Y*(x) = 1 | S(x) = G, Y(x) = 0]
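- For a classifier to be fair: P1 = P2 and P3 = P4; the equalised odds difference measures the violation, commonly as max(|P1 - P2|, |P3 - P4|), which is 0 for a perfectly fair classifier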
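This sketch estimates P1 to P4 from counts. It summarises the violation as max(|P1 - P2|, |P3 - P4|), which is one common way to define the equalised odds difference. A value of 0 means the fairness condition holds exactly. The function names and toy data are hypothetical.

```python
from typing import Sequence


def positive_rate(y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str],
                  group: str, label: int) -> float:
    """Estimate P[Y*(x) = 1 | S(x) = group, Y(x) = label] from counts."""
    preds = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == label]
    return sum(preds) / len(preds)


def equalised_odds_difference(y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str],
                              g: str, g_prime: str) -> float:
    p1 = positive_rate(y_true, y_pred, groups, g_prime, 1)  # TPR of G'
    p2 = positive_rate(y_true, y_pred, groups, g, 1)        # TPR of G
    p3 = positive_rate(y_true, y_pred, groups, g_prime, 0)  # FPR of G'
    p4 = positive_rate(y_true, y_pred, groups, g, 0)        # FPR of G
    return max(abs(p1 - p2), abs(p3 - p4))                  # 0 means P1 = P2 and P3 = P4


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["G", "G", "G", "G", "G'", "G'", "G'", "G'"]
print(equalised_odds_difference(y_true, y_pred, groups, "G", "G'"))  # 0.5: equal TPRs, FPRs 0.5 vs 0.0
```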
What is sampling a fixed proportion? What is the problem with it?
Naive solution (keeping 1/10 of the stream): for each query, generate a random integer in [0..9]; store the query if the integer is 0, otherwise discard it
Problem: as the stream grows, the sample size also grows, so the sample eventually no longer fits in memory
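A quick simulation of the naive scheme shows the problem: the number of stored queries keeps growing with the stream, roughly n/10. `sample_fixed_proportion` is a hypothetical name.

```python
import random
from typing import Iterable, List


def sample_fixed_proportion(stream: Iterable[int], rng: random.Random) -> List[int]:
    kept: List[int] = []
    for query in stream:
        if rng.randint(0, 9) == 0:  # random integer in [0..9]; keep about 1/10 of queries
            kept.append(query)
    return kept


rng = random.Random(1)
for n in (1_000, 10_000, 100_000):
    print(n, len(sample_fixed_proportion(range(n), rng)))  # sample size grows roughly as n/10
```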