General ML, Evaluation & GDPR Flashcards

1
Q

What does the statement ‘Machine Learning is an ill-posed problem’ mean?

A
  • An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available.
  • In terms of ML, the training set represents only a small sample of the possible instances in the domain.
  • A single consistent model cannot be determined from the training sample alone, because many different models can be consistent with it.
  • If a predictive model is to be useful, it must be able to make predictions for queries that are not present in the data.
  • A predictive model that makes the correct predictions for these queries captures the underlying relationship between the descriptive features and the target feature and is said to generalize well.
2
Q

ABT

A

Analytics Base Table

3
Q

Inductive bias

A
  • Necessary for learning to occur
  • The set of assumptions that defines the model selection criteria of a machine learning algorithm
  • Two types: restriction bias and preference bias
4
Q

Two types of inductive bias

A
  1. Restriction bias
  2. Preference bias

5
Q

Restriction Bias

A

Constrains the set of models that the algorithm will consider during the learning process

6
Q

Preference Bias

A

Guides the learning algorithm to prefer certain models over others
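To make the distinction concrete, here is a minimal illustrative sketch (not from the source), assuming scikit-learn: choosing a linear model family at all is a restriction bias, while the Ridge penalty encodes a preference bias for small-weight models within that family.

    # Illustrative only: restriction vs preference bias with scikit-learn.
    from sklearn.linear_model import LinearRegression, Ridge

    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.1, 0.9, 2.1, 2.9]

    # Restriction bias: the algorithm only ever considers linear models.
    restricted = LinearRegression().fit(X, y)

    # Preference bias: among the models considered, prefer those with
    # small weights (the alpha penalty encodes the preference).
    preferred = Ridge(alpha=1.0).fit(X, y)

    print(restricted.coef_, preferred.coef_)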

7
Q

No Free Lunch Theorem

A

There’s no single inductive bias that’s best across all problems; averaged over every possible problem, no learning algorithm outperforms any other.

8
Q

What is Predictive Data Analytics?

A

The art of building and using models that make predictions based on patterns extracted from historical data

9
Q

Applications of predictive data analytics

A
  • price prediction
  • dosage prediction
  • risk assessment
  • propensity modelling (likelihood of an individual or customer to take different actions)
  • diagnosis
  • document classification
10
Q

Consistency of a model?

A

~ memorizing the dataset (the model makes the correct prediction for every training instance)

  • consistency with noise in the data isn’t desirable
  • complete coverage through memorization is never possible in real problems, because queries will arrive that are not in the training data
11
Q

What is the goal of a predictive model?

A

A model that generalizes well beyond the dataset and that is invariant to the noise in the dataset

12
Q

What is under-fitting?

A

Occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target feature.

13
Q

What is over-fitting?

A

Occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data.
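A small illustrative sketch (assuming numpy; the data is made up) that makes both failure modes concrete: a degree-1 polynomial under-fits, a degree-12 polynomial over-fits, and comparing training against held-out error exposes the difference.

    # Under- vs over-fitting: fit polynomials of increasing degree to
    # noisy samples of a cubic and compare train / held-out error.
    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: x**3 - x                     # true relationship
    x_train = rng.uniform(-1, 1, 20)
    y_train = f(x_train) + rng.normal(0, 0.1, x_train.size)
    x_test = rng.uniform(-1, 1, 200)
    y_test = f(x_test)

    for degree in (1, 3, 12):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
    # degree 1 under-fits (both errors high); degree 12 over-fits
    # (train error near zero, test error high); degree 3 is about right.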

14
Q

Goldilocks model

A

Strikes a good balance between under-fitting and over-fitting
- found by using ML algorithms with appropriate inductive biases

15
Q

2 defining characteristics of ensembles

A
  1. Build multiple different models from the same dataset by inducing each model using a modified version of the dataset
  2. Make predictions by aggregating the predictions of the different models in the ensemble (see the bagging sketch below)
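A minimal bagging-style sketch (illustrative, assuming numpy and scikit-learn, with binary 0/1 labels) showing both characteristics: each tree sees a bootstrap-modified copy of the dataset, and the ensemble predicts by majority vote.

    # Bagging sketch: modified datasets (1) + aggregated predictions (2).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagged_predict(X_train, y_train, X_query, n_models=25, seed=0):
        # X_train, y_train, X_query: numpy arrays; y_train holds 0/1 labels.
        rng = np.random.default_rng(seed)
        votes = []
        for _ in range(n_models):
            # 1. Modify the dataset: sample rows with replacement (bootstrap).
            idx = rng.integers(0, len(X_train), len(X_train))
            model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
            votes.append(model.predict(X_query))
        # 2. Aggregate: majority vote across the ensemble's predictions.
        return (np.array(votes).mean(axis=0) > 0.5).astype(int)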
16
Q

What is an ensemble?

A

A prediction model that is composed of a set of models is called a model ensemble.
- Rather than creating a single model, they generate a set of models and then make predictions by aggregating the output of these models

17
Q

Motivation behind ensembles

A

The idea that a committee of experts working together on a problem is more likely to solve it successfully than a single expert working alone

18
Q

Bayes Optimal Ensemble

A
  • an ensemble of all the hypotheses in the hypothesis space
  • on average, no other ensemble can outperform it
  • not practical to implement a Bayes Optimal Classifier
  • no upper limit on ensemble size (Law of Large Numbers: as the number of models grows, the aggregated estimate gets better)
  • setting the number of models in the ensemble very high tends to give good performance (see the simulation below)
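The Law-of-Large-Numbers point can be seen in a toy simulation (illustrative, numpy only): with n independent classifiers that are each correct 60% of the time, majority-vote accuracy climbs toward 1.0 as n grows.

    # Majority vote of n independent classifiers, each correct with p = 0.6.
    import numpy as np

    rng = np.random.default_rng(0)
    p, trials = 0.6, 10_000
    for n in (1, 11, 101, 1001):
        correct_votes = rng.binomial(n, p, size=trials)  # correct votes per trial
        accuracy = np.mean(correct_votes > n / 2)        # was the majority right?
        print(f"{n:5d} models -> ensemble accuracy {accuracy:.3f}")
    # Accuracy rises toward 1.0 with n (assuming independent models).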
19
Q

2 properties of good ensembles

A
  1. Individual models should be strong
  2. Correlation between models should be weak

20
Q

What is the bias/variance trade-off?

A

TLDR: high bias = under-fitting, high variance = over-fitting

The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
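A rough sketch (illustrative assumptions throughout: numpy, a sine as the true function) of how bias and variance can be estimated empirically: refit the same model class on many fresh training samples and decompose its predictions at one query point.

    # Empirical bias^2 and variance of a model class at x0 = 0.25.
    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: np.sin(2 * np.pi * x)        # true function
    x0, n_repeats = 0.25, 500

    for degree in (1, 9):                      # simple vs complex model class
        preds = []
        for _ in range(n_repeats):
            x = rng.uniform(0, 1, 15)          # a fresh noisy training set
            y = f(x) + rng.normal(0, 0.3, x.size)
            preds.append(np.polyval(np.polyfit(x, y, degree), x0))
        preds = np.array(preds)
        bias_sq = (preds.mean() - f(x0)) ** 2  # systematic error
        variance = preds.var()                 # sensitivity to the sample
        print(f"degree {degree}: bias^2 {bias_sq:.4f}, variance {variance:.4f}")
    # The simple class shows high bias; the complex one high variance.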

21
Q

Why are local minima not so bad after all?

A

In the case of neural nets, local minima are not necessarily that much of a problem.

  1. Some of the local minima arise because you can get a functionally identical model by permuting the hidden-layer units, or by negating the input and output weights of a hidden unit, etc.
  2. Also, if a local minimum is only slightly non-optimal, then the difference in performance will be minimal, so it won’t really matter.
  3. Lastly, and this is an important point, the key problem in fitting a neural network is over-fitting, so aggressively searching for the global minimum of the cost function is likely to result in over-fitting and a model that performs poorly.

Regularization such as weight decay can help combat over-fitting (see the sketch below)
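As a concrete illustration of the weight-decay remark (a sketch, not a method from the source): the penalty adds lambda * ||w||^2 to the loss, so each gradient step also shrinks the weights slightly toward zero.

    # One gradient-descent step with L2 weight decay on a linear model.
    import numpy as np

    def sgd_step(w, X, y, lr=0.01, weight_decay=1e-3):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        grad += 2 * weight_decay * w           # gradient of lambda * ||w||^2
        return w - lr * grad                   # weights shrink a little each step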

22
Q

In practice local minima are rarely a problem with large networks. Discuss.

A

LeCun, Bengio & Hinton (2015) Nature

Regardless of initial conditions, the system nearly always reaches solutions of very similar quality.

Recent theoretical and empirical results suggest -> not a serious issue

Instead, the landscape is packed with a combinatorially large number of saddle points where the gradient is zero, and the surface curves up in most dimensions and down in the remainder.

Analysis seems to show that saddle points with only a few downward-curving directions are present in very large numbers, but almost all of them have very similar values of the objective function. So it doesn’t matter much if the algorithm gets stuck at these points.

23
Q

Recommendations of GDPR paper (Wachter et al. 2018)

A
  1. Add a right to explanation to legally binding Article 22
  2. Clarify ‘significance…envisaged consequences…logic involved’.
  3. Clarify ‘solely’ for automated processing
  4. Clarify ‘legal’ or ‘significant effect’ of automated processing
  5. Clarify ‘necessary for entering or performance of a contract’
  6. Clarify if a prohibition is meant by ‘right not to be subject to’
  7. Implement external auditing mechanism for automated decision-making (counterweight to trade secret)
  8. Support further research to alternative accountability mechanisms
24
Q

GDPR in a nutshell

A
  • in force from 25th May 2018
  • replaces the 1995 Data Protection Directive
  • transparency, security, accountability
  • standardizing and strengthening the rights of an individual to data privacy

Regulations surrounding profiling and automated decision-making

25
Q

Why might DL become illegal, according to the GDPR paper?

A
  1. General data protection
  2. Prohibition on profiling/automated decision-making
  3. Right to explanation
26
Q

GDPR - General data protection

A
  1. Direct personal data
  2. Indirect personal data

Onus on data controllers to be responsible

Data subjects can request to have info erased, object to direct marketing, have inaccuracies corrected, restrict automated processing, and exercise data portability

27
Q

Article 22 - Prohibition on profiling/automated DM

A

Allowed under 3 conditions

  1. necessary for contract
  2. allowed under member state law
  3. explicit consent
  • right not to be subject to… but what is a ‘legal effect’ or ‘similarly significant effect’?
28
Q

Right to explanation

A
  • system functionality
  • specific decision

29
Q

What’s the verdict - does GDPR mandate a right to explanation?

A

Consensus is no
Article 22 is vague (maybe intentionally)
Recital 71 - some hope - but just guidance and not legally binding

30
Q

Broadly speaking, what is the difference between evaluating ML methods in industry versus academia?

A

Industry - evaluate a model that we would like to deploy for a specific task

Academia - compare ML methods

31
Q

Reasons for evaluating an ML model in industry

A
  1. determine which model is most suitable for a task
  2. estimate performance after deployment
  3. convince users that model will meet their needs
32
Q

Reasons for evaluating an ML method in academia

A
  1. Evaluate the performance of a new method against existing baselines
  2. Determine best ML approach for a problem
  3. Perform benchmark experiment

All boils down to comparing multiple approaches on multiple datasets

33
Q

Key difference between evaluation in industry versus academia

A

Significance testing

34
Q

Performance measures for industry

A
  1. macro-averaging vs micro-averaging
  2. hold-out test
  3. k-fold CV (see the sketch below)
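A brief sketch of all three items (assuming scikit-learn and its bundled iris data; any model and dataset would do): macro- vs micro-averaged F1 on the same hold-out predictions, then k-fold cross-validation.

    # Macro vs micro averaging, hold-out test, and k-fold CV.
    from sklearn.datasets import load_iris
    from sklearn.metrics import f1_score
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Hold-out test: a single train/test split.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    pred = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
    print("macro F1:", f1_score(y_te, pred, average="macro"))  # classes weighted equally
    print("micro F1:", f1_score(y_te, pred, average="micro"))  # instances weighted equally

    # k-fold CV: average performance over k train/test rotations.
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    print("5-fold CV accuracy:", scores.mean())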
35
Q

Performance measures for academia

A

Two-fold process:

  1. Friedman Aligned Rank Test - test whether there is a significant difference between the performances of the algorithms across the datasets (p < 0.05)
  2. Nemenyi Test - if step 1 found a significant difference, find out where the difference lies between algorithm pairings

Nemenyi Test -> significance matrix and Critical Difference plot (see the sketch below)
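A sketch of the two-step procedure (illustrative: the scores are made up; scipy's standard Friedman test stands in for the aligned-rank variant, and the Nemenyi post-hoc assumes the third-party scikit-posthocs package):

    # Step 1: Friedman test across algorithms; step 2: Nemenyi post-hoc.
    import numpy as np
    from scipy.stats import friedmanchisquare
    import scikit_posthocs as sp  # assumed third-party package

    # rows = datasets, columns = algorithms (accuracies, made up here)
    scores = np.array([[0.85, 0.80, 0.70],
                       [0.90, 0.82, 0.74],
                       [0.78, 0.75, 0.65],
                       [0.88, 0.84, 0.71],
                       [0.83, 0.79, 0.68]])

    stat, p = friedmanchisquare(*scores.T)  # one sample per algorithm
    print(f"Friedman: p = {p:.4f}")
    if p < 0.05:
        # Pairwise p-values between algorithms -> the significance matrix.
        print(sp.posthoc_nemenyi_friedman(scores))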