General ML, Evaluation & GDPR Flashcards by Nora Hora

What does the statement ‘Machine Learning is an ill-posed problem’ mean?

An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available.
In terms of ML, the training set represents only a small sample of the possible instances in the domain.
A single consistent model cannot be selected based on the sample training dataset alone, because many different models are consistent with it.
If a predictive model is to be useful, it must be able to make predictions for queries that are not present in the data.
A predictive model that makes the correct predictions for these queries captures the underlying relationship between the descriptive features and target features and is said to generalize well.

ABT

Analytics Base Table

Inductive bias

necessary for learning to occur
the set of assumptions that defines the model selection criteria of a machine learning algorithm
two types (restriction, preference)

Two types of inductive bias

1. Restriction bias
2. Preference bias

Restriction Bias

Constrains the set of models that the algorithm will consider during the learning process

Preference Bias

Guides the learning algorithm to prefer certain models over others

No Free Lunch Theorem

There is no single inductive bias that works best for every problem: averaged over all possible problems, no learning algorithm outperforms any other.

What is Predictive Data Analytics?

The art of building and using models that make predictions based on patterns extracted from historical data

Applications of predictive data analytics

price prediction
dosage prediction
risk assessment
propensity modelling (likelihood of an individual or customer to take different actions)
diagnosis
document classification

Consistency of a model?

Roughly equivalent to memorizing the dataset

consistency with noise in the data isn’t desirable
coverage through memorization is never possible in real problems

What is the goal of a predictive model?

A model that generalizes well beyond the dataset and that is invariant to the noise in the dataset

What is under-fitting?

Occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target features.

What is over-fitting?

Occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data.

Goldilocks model

Strikes a good balance between under-fitting and over-fitting
- found by using ML algorithms with appropriate inductive biases
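
The trade-off between under-fitting and over-fitting can be sketched with a k-nearest-neighbour regressor, where k acts as the knob on the inductive bias. This is only an illustration under stated assumptions: the sine-plus-noise data, the helper names and the chosen values of k are all hypothetical and not part of the original cards. With k = 1 the model memorizes the training set (zero training error, sensitive to noise); with k equal to the training set size it predicts a constant (too simplistic); a moderate k sits in between.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error on `data` of a k-NN model built from `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

random.seed(0)

def noisy_sample(x):
    # Hypothetical underlying relationship: sin(x) plus Gaussian noise.
    return math.sin(x) + random.gauss(0, 0.3)

# Training queries at x = 0.0, 0.2, ...; test queries fall between them.
train = [(i * 0.2, noisy_sample(i * 0.2)) for i in range(30)]
test = [(i * 0.2 + 0.1, noisy_sample(i * 0.2 + 0.1)) for i in range(30)]

for k in (1, 5, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  test MSE={mse(train, test, k):.3f}")
```

Comparing the training and test columns for each k is the point of the exercise: a large gap suggests over-fitting, while high error on both suggests under-fitting.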

2 defining characteristics of ensembles

Builds multiple different models from the same dataset by inducing each model using a modified version of the dataset
Makes a prediction by aggregating the predictions of the different models in the ensemble
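
The two characteristics above can be sketched as a small bagging ensemble. This is a minimal illustration, not a reference implementation: bootstrap sampling is just one way of producing "modified versions" of the dataset, and the decision-stump learner, the toy dataset and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def train_stump(data):
    """Fit a one-feature threshold classifier (decision stump) by exhaustive search."""
    best = None
    for threshold, _ in data:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagged_ensemble(data, n_models, rng):
    """Characteristic 1: induce each model from a bootstrap sample of the dataset."""
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def predict(ensemble, x):
    """Characteristic 2: aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in ensemble)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
# Hypothetical toy dataset: label is 1 when x > 5, with one noisy instance at x = 3.
data = [(x, int(x > 5)) for x in range(11)]
data[3] = (3, 1)

ensemble = bagged_ensemble(data, n_models=25, rng=rng)
print([predict(ensemble, x) for x in (0, 4, 6, 10)])
```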

What is an ensemble?

A prediction model that is composed of a set of models is called a model ensemble.
- Rather than creating a single model, they generate a set of models and then make predictions by aggregating the output of these models

Motivation behind ensembles

The idea that a committee of experts working together on a problem is more likely to solve it successfully than a single expert working alone

Bayes Optimal Ensemble

An ensemble of all the hypotheses in the hypothesis space
On average, no other ensemble can outperform it
Not possible to practically implement a Bayes Optimal Classifier
No upper limit on the number of ensemble members (Law of Large Numbers: as the number of samples gets bigger, the estimate gets better)
Setting the number of models in the ensemble very high tends to give good performance

2 properties of good ensembles

1. Individual models should be strong
2. Correlation between models should be weak
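
A short calculation shows why both properties matter. The sketch below assumes an idealized setting that real ensembles only approximate: a binary task, an odd number of models, each model correct with the same probability p, and errors that are fully independent (zero correlation). The function name is hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_models, p):
    """Probability that more than half of n independent models, each correct
    with probability p, are right on a binary task (n_models must be odd)."""
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(n_models // 2 + 1, n_models + 1)
    )

for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

When p is above 0.5 the vote becomes more accurate as models are added; when p is below 0.5 it becomes less accurate, which is why the individual models need to be reasonably strong. If the models' errors were perfectly correlated, the vote would be no better than a single model.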

What is the bias/variance trade-off?

TL;DR: high bias = underfitting, high variance = overfitting

The bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
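
For squared-error loss the two error sources above can be written as a standard decomposition. The notation is an assumption introduced here, not taken from the cards: the target is $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}(x; D)$ is the model learned from a randomly drawn training set $D$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}(x; D)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}(x; D)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}(x; D) - \mathbb{E}_D[\hat{f}(x; D)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible typically shrinks the bias term and grows the variance term, which is the trade-off.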

Why is it that local minima ain’t so bad after all?

In the case of neural nets, local minima are not necessarily that much of a problem.

Some of the local minima are due to the fact that you can get a functionally identical model by permuting the hidden layer units, or negating the input and output weights of the network, etc.
Also, if a local minimum is only slightly non-optimal, then the difference in performance will be minimal, so it won’t really matter.
Lastly, and this is an important point, the key problem in fitting a neural network is over-fitting, so aggressively searching for the global minimum of the cost function is likely to result in overfitting and a model that performs poorly.

Regularization such as weight decay can help combat overfitting

In practice local minima are rarely a problem with large networks. Discuss.

LeCun, Bengio & Hinton (2015) Nature

Regardless of initial conditions, the system nearly always reaches solutions of very similar quality.

Recent theoretical and empirical results suggest that local minima are not a serious issue in general

Instead, the landscape is packed with a combinatorially large number of saddle points where the gradient is zero, and the surface curves up in most dimensions and curves down in the remainder

Analysis seems to show that saddle points with only a few downward curving directions are present in very large numbers, but almost all of them have very similar values of the objective function. So it doesn’t matter much if the algorithm gets stuck at one of these points

Recommendations of GDPR paper (Wachter et al. 2018)

Add a right to explanation to legally binding Article 22
Clarify ‘significance…envisaged consequences…logic involved’.
Clarify ‘solely’ for automated processing
Clarify ‘legal’ or ‘significant effect’ of automated processing
Clarify ‘necessary for entering or performance of a contract’
Clarify if a prohibition is meant by ‘right not to be subject to’
Implement external auditing mechanism for automated decision-making (counterweight to trade secret)
Support further research into alternative accountability mechanisms

GDPR in a nutshell

25th May 2018
replaces 1995 Data Protection Directive
transparency, security, accountability
standardizing and strengthening the rights of an individual to data privacy

Regulations surrounding profiling, automated decision making

Why might DL become illegal according to the GDPR paper?

1. General data protection
2. Prohibition on profiling/automated decision-making
3. Right to explanation

GDPR - General Data protection

1. Direct personal data
2. Indirect personal data
Onus on data controllers to be responsible
Data subjects can request to have info erased, object to direct marketing, have inaccuracies corrected, restrict automated processing, and exercise data portability

Article 22 - Prohibition on profiling/automated DM

Allowed under 3 conditions:
1. necessary for contract
2. allowed under member state law
3. explicit consent
'Right not to be subject to...', but what is a 'legal effect' or 'similarly significant effect'?

Right to explanation

* system functionality
* specific decision

What's the verdict - does GDPR mandate a right to explanation?

Consensus is no
Article 22 is vague (maybe intentionally)
Recital 71 offers some hope, but it is just guidance and not legally binding

Broadly speaking, what is the difference between evaluating ML methods in industry versus academia?

Industry - evaluate a model that we would like to deploy for a specific task
Academia - compare ML methods

Reasons for evaluating ML models in industry

1. determine which model is most suitable for a task
2. estimate performance after deployment
3. convince users that the model will meet their needs

Reasons for evaluating ML models in academia

1. Evaluate the performance of a new method against existing baselines
2. Determine the best ML approach for a problem
3. Perform a benchmark experiment
It all boils down to comparing multiple approaches on multiple datasets

Key difference between evaluation in industry versus academia

Significance testing

Performance measures for industry

1. macro-averaging vs micro-averaging
2. hold-out test
3. k-fold CV
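
The difference between macro- and micro-averaging shows up most clearly on imbalanced data. The sketch below uses recall as the underlying measure; the function names and the toy spam/ham labels are hypothetical and chosen only for illustration. Macro-averaging computes recall per class and then averages the classes equally, while micro-averaging pools all instances before computing a single figure.

```python
def per_class_counts(y_true, y_pred):
    """For each class, count correctly predicted instances and total instances."""
    counts = {}
    for truth, pred in zip(y_true, y_pred):
        tp, total = counts.get(truth, (0, 0))
        counts[truth] = (tp + (truth == pred), total + 1)
    return counts

def macro_recall(y_true, y_pred):
    """Average of the per-class recalls: every class counts equally."""
    counts = per_class_counts(y_true, y_pred)
    return sum(tp / total for tp, total in counts.values()) / len(counts)

def micro_recall(y_true, y_pred):
    """Recall over the pooled instances: every instance counts equally."""
    counts = per_class_counts(y_true, y_pred)
    return sum(tp for tp, _ in counts.values()) / sum(t for _, t in counts.values())

# A model that always predicts the majority class on a 2-vs-8 imbalanced set.
y_true = ["spam"] * 2 + ["ham"] * 8
y_pred = ["ham"] * 10

print(macro_recall(y_true, y_pred))  # 0.5
print(micro_recall(y_true, y_pred))  # 0.8
```

The micro-averaged figure hides the fact that the minority class is never predicted correctly, which is one reason macro-averaging is often preferred when classes are imbalanced.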

Performance measures for academia

Two-fold process:
1. Friedman Aligned Rank Test - test whether there is a significant difference between the performances of the algorithms across the datasets (p < 0.05)
2. Nemenyi Test - if there was a significant difference in part 1, find out which pairs of algorithms differ
The Nemenyi Test yields a significance matrix and a Critical Differences plot
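
The two-step procedure can be sketched in plain Python, with several caveats. The code below implements the standard Friedman test, not the aligned-rank variant named on the card. The accuracy table is made up for illustration. The p-value uses the closed form exp(-x/2), which is the chi-square survival function for 2 degrees of freedom and therefore only applies with exactly 3 algorithms. The Nemenyi constant 2.343 is the k = 3, alpha = 0.05 value as commonly tabulated and should be treated as an assumption to verify against a statistical table.

```python
import math

def ranks(scores):
    """Rank scores within one dataset: 1 = best (highest); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result

def friedman(table):
    """table[d][a] = score of algorithm a on dataset d. Returns (statistic, mean ranks)."""
    n, k = len(table), len(table[0])
    rank_rows = [ranks(row) for row in table]
    mean_ranks = [sum(r[a] for r in rank_rows) / n for a in range(k)]
    stat = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return stat, mean_ranks

# Hypothetical accuracies of 3 algorithms (columns) on 6 datasets (rows).
table = [
    [0.85, 0.80, 0.70],
    [0.90, 0.86, 0.75],
    [0.78, 0.81, 0.70],
    [0.92, 0.88, 0.81],
    [0.88, 0.83, 0.79],
    [0.95, 0.91, 0.84],
]

stat, mean_ranks = friedman(table)
p_value = math.exp(-stat / 2)  # valid only for k = 3 (chi-square with 2 d.o.f.)

# Step 2 (Nemenyi): two algorithms differ if their mean ranks differ by more than CD.
n, k = len(table), len(table[0])
q_alpha = 2.343  # assumed tabulated value for k = 3, alpha = 0.05
critical_difference = q_alpha * math.sqrt(k * (k + 1) / (6 * n))

print(f"Friedman statistic={stat:.3f}, p={p_value:.4f}, mean ranks={mean_ranks}")
print(f"critical difference={critical_difference:.3f}")
```

With more than 3 algorithms the p-value would need a general chi-square survival function, for example from a statistics library, and with few datasets the chi-square approximation itself is rough.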

General ML, Evaluation & GDPR Flashcards

(35 cards)