General ML, Evaluation & GDPR Flashcards
What does the statement ‘Machine Learning is an ill-posed problem’ mean?
- An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available.
- In terms of ML, the training set represents only a small sample of the possible sets of instances in the domain.
- A single best model cannot be determined from the sample training dataset alone, because many different models are consistent with it.
- If a predictive model is to be useful, it must be able to make predictions for queries that are not present in the data.
- A predictive model that makes the correct predictions for these queries captures the underlying relationship between the descriptive features and target features and is said to generalize well.
ABT
Analytics Base Table
Inductive bias
- necessary for learning to occur
- the set of assumptions that defines the model selection criteria of a machine learning algorithm
- two types (restriction, preference)
Two types of inductive bias
- Restriction bias
- Preference bias
Restriction Bias
Constrains the set of models that the algorithm will consider during the learning process
Preference Bias
Guides the learning algorithm to prefer certain models over others
No Free Lunch Theorem
No single inductive bias is best across all problems
What is Predictive Data Analytics?
The art of building and using models that make predictions based on patterns extracted from historical data
Applications of predictive data analytics
- price prediction
- dosage prediction
- risk assessment
- propensity modelling (likelihood of an individual or customer to take different actions)
- diagnosis
- document classification
Consistency of a model?
~ memorizing the dataset
- consistency with noise in the data isn’t desirable
- coverage through memorization is never possible in real problems
What is the goal of a predictive model?
A model that generalizes well beyond the dataset and that is invariant to the noise in the dataset
What is under-fitting?
Occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target features.
What is over-fitting?
Occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data.
Goldilocks model
Strikes a good balance between under-fitting and over-fitting
- found by using ML algorithms with appropriate inductive biases
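The under-fitting, over-fitting and Goldilocks cards can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the ground truth (y = 2x plus Gaussian noise), the three candidate models, and the function names. The constant model is too simple, the nearest-point memorizer fits the training noise exactly, and the straight line sits in between.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_point(x):
    """Hypothetical ground truth: y = 2x plus Gaussian noise."""
    return x, 2.0 * x + random.gauss(0.0, 1.0)


# Training inputs on a regular grid (so every x is unique); test inputs random.
train = [make_point(i * 0.05) for i in range(200)]
test = [make_point(random.uniform(0.0, 10.0)) for _ in range(200)]


def fit_constant(data):
    """Under-fitting candidate: ignore x and always predict the mean target."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    """Balanced candidate: ordinary least-squares straight line."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(data):
    """Over-fitting candidate: return the target of the nearest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    print(f"{name:10s} train MSE = {results[name][0]:.3f}   test MSE = {results[name][1]:.3f}")
```

The memorizer reaches zero training error because it reproduces every training target, noise included, yet the line should do better on the unseen test points.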
2 defining characteristics of ensembles
- Build multiple different models from the same dataset by inducing each model using a modified version of the dataset
- Makes a prediction by aggregating the predictions of the different models in the ensemble
What is an ensemble?
A prediction model that is composed of a set of models is called a model ensemble.
- Rather than creating a single model, they generate a set of models and then make predictions by aggregating the output of these models
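A minimal sketch of both defining characteristics, assuming bootstrap sampling as the way to modify the dataset and majority voting as the aggregation step (the cards do not fix either choice). The toy dataset, the decision stump learner, and all function names are hypothetical.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical toy dataset: one descriptive feature x, target is 1 when x >= 10.
dataset = [(x, int(x >= 10)) for x in range(20)]


def majority(labels, default=0):
    """Most common label, or `default` when there are no labels."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def bootstrap_sample(data):
    """Modified version of the dataset: sample with replacement, same size."""
    return [random.choice(data) for _ in data]


def fit_stump(data):
    """Decision stump: pick the threshold on x with the fewest training errors."""
    best = None
    for threshold in sorted({x for x, _ in data}):
        left = majority([y for x, y in data if x < threshold])
        right = majority([y for x, y in data if x >= threshold])
        errors = sum((left if x < threshold else right) != y for x, y in data)
        if best is None or errors < best[0]:
            best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x < threshold else right


def fit_ensemble(data, n_models=25):
    """Characteristic 1: induce each model from its own modified dataset."""
    return [fit_stump(bootstrap_sample(data)) for _ in range(n_models)]


def ensemble_predict(models, x):
    """Characteristic 2: aggregate the individual predictions by majority vote."""
    return majority([model(x) for model in models])


models = fit_ensemble(dataset)
print(ensemble_predict(models, 0), ensemble_predict(models, 19))
```

Each stump sees a slightly different sample, so the learned thresholds differ, but the vote still recovers the label at both ends of the feature range.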
Motivation behind ensembles
The idea that a committee of experts working together on a problem are more likely to solve it successfully than a single expert working alone
Bayes Optimal Ensemble
- an ensemble of all the hypotheses in the hypothesis space
- on average, no other ensemble can outperform it
- not possible to practically implement a Bayes Optimal Classifier
- no upper limit on the number of models (Law of Large Numbers: as the number of samples gets bigger, your estimate will get better).
- Setting the number of models in the ensemble very high tends to give good performance.
2 properties of good ensembles
- Individual models should be strong
- Correlation between models should be weak
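A worked calculation shows why both properties matter. It assumes an idealized case that real ensembles only approximate: every model is correct with the same probability and the models' errors are completely independent. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models, p_correct):
    """P(majority of n independent models is right), each right with prob p.

    Assumes an odd n_models so there are no tied votes.
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

With individually strong models (p above 0.5) the vote improves as models are added; with weak models (p below 0.5) it gets worse. If the models were perfectly correlated, the vote would be no better than a single model.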
What is the bias/variance trade-off?
TLDR: High bias = underfitting, high variance = overfitting
The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
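For squared-error regression the trade-off can be written as the standard decomposition below. The notation is an assumption, not taken from the cards: the target is y = f(x) + noise with zero mean and variance \sigma^2, \hat{f} is the model learned from a random training set, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making the model more flexible usually shrinks the first term and grows the second, which is why the two cannot be minimized independently.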
Why is it that local minima aren’t so bad after all?
In the case of neural nets, local minima are not necessarily that much of a problem.
- Some of the local minima are due to the fact that you can get a functionally identical model by permuting the hidden layer units, or by negating the input and output weights of a hidden unit, etc.
- Also, if a local minimum is only slightly non-optimal, the difference in performance will be minimal, so it won’t really matter.
- Lastly, and this is an important point, the key problem in fitting a neural network is over-fitting, so aggressively searching for the global minimum of the cost function is likely to result in over-fitting and a model that performs poorly.
Regularization such as weight decay can help combat overfitting
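The symmetry argument can be checked directly. This is a minimal sketch under stated assumptions: a single hidden layer, tanh activations (an odd function, which the sign-flip symmetry relies on), no biases, and made-up weights. The function name `forward` is hypothetical.

```python
import math


def forward(x, w_in, w_out):
    """One-hidden-layer tanh network without biases, single output.

    w_in[j] holds the input weights of hidden unit j; w_out[j] is its output weight.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(unit, x))) for unit in w_in]
    return sum(h * w for h, w in zip(hidden, w_out))


# Hypothetical weights for 2 inputs and 3 hidden units.
w_in = [[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8]]
w_out = [0.9, -0.4, 1.1]
x = [0.25, -0.6]

original = forward(x, w_in, w_out)

# Symmetry 1: reorder the hidden units (and their output weights) together.
order = [2, 0, 1]
permuted = forward(x, [w_in[j] for j in order], [w_out[j] for j in order])

# Symmetry 2: negate one unit's input weights and its output weight.
# tanh is odd, so tanh(-z) * (-w) == tanh(z) * w.
flipped_in = [[-w for w in unit] if j == 0 else unit for j, unit in enumerate(w_in)]
flipped_out = [-w if j == 0 else w for j, w in enumerate(w_out)]
flipped = forward(x, flipped_in, flipped_out)

print(original, permuted, flipped)
```

All three weight settings are different points in weight space but compute the same function, so they have the same cost. Each minimum therefore comes with many equivalent copies.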
In practice local minima are rarely a problem with large networks. Discuss.
LeCun, Bengio & Hinton (2015) Nature
Regardless of initial conditions, the system nearly always reaches solutions of very similar quality.
Recent theoretical and empirical results suggest that local minima are not a serious issue in general
Instead, the landscape is packed with a combinatorially large number of saddle points where the gradient is zero, and the surface curves up in most dimensions and curves down in the remainder
Analysis seems to show that saddle points with only a few downward-curving directions are present in very large numbers, but almost all of them have very similar values of the objective function. So it doesn’t matter much which of these points the algorithm gets stuck at
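A saddle point is easiest to see in two dimensions. The sketch below uses a hypothetical objective f(x, y) = x^2 - y^2, chosen only for illustration. At the origin the gradient is zero, the surface curves up along x and curves down along y.

```python
def f(x, y):
    """Hypothetical two-parameter objective with a saddle point at the origin."""
    return x**2 - y**2


def gradient(x, y):
    """Analytic gradient of f."""
    return (2 * x, -2 * y)


def curvature(g, h=1e-3):
    """Central-difference second derivative of a 1-D function g at 0."""
    return (g(h) - 2 * g(0.0) + g(-h)) / h**2


grad_at_origin = gradient(0.0, 0.0)
curv_x = curvature(lambda t: f(t, 0.0))  # slice along x: curves up
curv_y = curvature(lambda t: f(0.0, t))  # slice along y: curves down

print(grad_at_origin, curv_x, curv_y)
```

In a network with many parameters the same picture holds with far more directions, most curving up and a few curving down.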
Recommendations of GDPR paper (Wachter et al. 2018)
- Add a right to explanation to legally binding Article 22
- Clarify ‘significance…envisaged consequences…logic involved’.
- Clarify ‘solely’ for automated processing
- Clarify ‘legal’ or ‘significant effect’ of automated processing
- Clarify ‘necessary for entering or performance of a contract’
- Clarify if a prohibition is meant by ‘right not to be subject to’
- Implement external auditing mechanism for automated decision-making (counterweight to trade secret)
- Support further research into alternative accountability mechanisms
GDPR in a nutshell
- in effect since 25th May 2018
- replaces the 1995 Data Protection Directive
- transparency, security, accountability
- standardizing and strengthening the rights of an individual to data privacy
- regulations surrounding profiling and automated decision-making
Why might DL become illegal according to GDPR paper?
- General data protection
- Prohibition on profiling/automated decision-making
- Right to explanation
GDPR - General Data protection
- Direct personal data
- Indirect personal data
Onus on data controllers to be responsible
Data subjects can request to have info erased or inaccuracies corrected, object to direct marketing, restrict automated processing, and request data portability
Article 22 - Prohibition on profiling/automated DM
Allowed under 3 conditions
- necessary for contract
- allowed under member state law
- explicit consent
Right not to be subject to… but what is a ‘legal effect’ or ‘similarly significant effect’?
Right to explanation
- system functionality
- specific decision
What’s the verdict - does GDPR mandate a right to explanation?
Consensus is no
Article 22 is vague (maybe intentionally)
Recital 71 - some hope - but just guidance and not legally binding
Broadly speaking, what is the difference between evaluating ML methods in industry versus academia?
Industry - evaluate a model that we would like to deploy for a specific task
Academia - compare ML methods
Reasons for evaluating ML models in industry
- determine which model is most suitable for a task
- estimate performance after deployment
- convince users that model will meet their needs
Reasons for evaluating ML models in academia
- Evaluate the performance of a new method against existing baselines
- Determine best ML approach for a problem
- Perform benchmark experiment
All boils down to comparing multiple approaches on multiple datasets
Key difference between evaluation in industry versus academia
Significance testing
Performance measures for Industry
- macro-averaging vs micro-averaging
- hold-out test
- k-fold CV
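The difference between the two averaging schemes, and the way k-fold CV splits the data, can be sketched as follows. Recall is used as the example measure; the labels, predictions and function names are made up for illustration. For single-label classification, micro-averaged recall equals overall accuracy.

```python
def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / instances of that class."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = correct / total
    return recalls


def macro_recall(y_true, y_pred):
    """Macro-average: mean of the per-class recalls (every class counts equally)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def micro_recall(y_true, y_pred):
    """Micro-average: pool all instances (every instance counts equally)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Hypothetical imbalanced example: 8 instances of "a", 2 of "b".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["b", "a"]

print(macro_recall(y_true, y_pred))  # (1.0 + 0.5) / 2
print(micro_recall(y_true, y_pred))  # 9 / 10
print([test for _, test in k_fold_indices(10, 3)])
```

On imbalanced data the macro-average exposes poor performance on the minority class that the micro-average hides. In practice the instances would normally be shuffled before being split into folds; a hold-out test is the special case of a single train/test split.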
Performance measures for academia
Two-fold process:
- Friedman Aligned Rank Test - test if there’s a significant difference between the performances of the algos across the datasets (p < 0.05)
- Nemenyi Test - If there was a significant difference in part 1, find out where the difference exists between algo-pairings
Nemenyi Test -> Significance matrix, and Critical Differences Plot
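The two-step procedure can be sketched in plain Python. Note the swap: this uses the ordinary Friedman test, not the aligned-ranks variant named on the card, because it is simpler to show. The accuracy table is invented for illustration, the p-value shortcut only works for three algorithms (2 degrees of freedom), the chi-square approximation is rough for so few datasets, and 2.343 is the tabulated Nemenyi critical value for three algorithms at 0.05 (Demsar, 2006).

```python
import math

# Hypothetical accuracies: rows are datasets, columns are algorithms A, B, C.
names = ["A", "B", "C"]
scores = [
    [0.91, 0.88, 0.85],
    [0.78, 0.75, 0.70],
    [0.83, 0.82, 0.79],
    [0.95, 0.93, 0.90],
    [0.67, 0.64, 0.61],
    [0.88, 0.86, 0.80],
]


def ranks_desc(row):
    """Rank one dataset's scores, 1 = best; ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


n, k = len(scores), len(names)
rank_rows = [ranks_desc(row) for row in scores]
avg_rank = [sum(r[j] for r in rank_rows) / n for j in range(k)]

# Step 1: Friedman statistic (chi-square with k - 1 degrees of freedom).
friedman = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_rank) - k * (k + 1) ** 2 / 4)
# With k = 3 there are 2 degrees of freedom, where the chi-square tail is exp(-x / 2).
p_value = math.exp(-friedman / 2)

# Step 2: Nemenyi critical difference, only consulted if step 1 was significant.
critical_difference = 2.343 * math.sqrt(k * (k + 1) / (6 * n))
significant_pairs = [
    (names[a], names[b])
    for a in range(k)
    for b in range(a + 1, k)
    if p_value < 0.05 and abs(avg_rank[a] - avg_rank[b]) > critical_difference
]

print(avg_rank, friedman, p_value, critical_difference, significant_pairs)
```

Here only A versus C differs by more than the critical difference in average rank. A significance matrix records this yes/no result for every pairing, and a critical differences plot draws the average ranks on an axis and joins the algorithms that are not significantly different.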