General ML, Evaluation & GDPR Flashcards
What does the statement ‘Machine Learning is an ill-posed problem’ mean?
- An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available.
- In terms of ML, the training set represents only a small sample of possible sets of instances in the domain
- A consistent model cannot be found based on the sample training dataset alone.
- If a predictive model is to be useful, it must be able to make predictions for queries that are not present in the data.
- A predictive model that makes the correct predictions for these queries captures the underlying relationship between the descriptive features and target features and is said to generalize well.
ABT
Analytics Base Table
Inductive bias
- necessary for learning to occur
- the set of assumptions that defines the model selection criteria of a machine learning algorithm
- two types (restriction, preference)
Two types of inductive bias
- Restriction bias
2. Preference bias
Restriction Bias
Constrains the set of models that the algorithm will consider during the learning process
Preference Bias
Guides the learning algorithm to prefer certain models over others
No Free Lunch Theorem
There’s no single inductive bias that’s best to use
What is Predictive Data Analytics?
The art of building and using models that make predictions based on patterns extracted from historical data
Applications of predictive data analytics
- price prediction
- dosage prediction
- risk assessment
- propensity modelling (likelihood of an individual or customer to take different actions)
- diagnosis
- document classification
Consistency of a model?
~ memorizing the dataset
- consistency with noise in the data isn’t desirable
- coverage through memorization is never possible in real problems
What is the goal of a predictive model?
A model that generalizes well beyond the dataset and that is invariant to the noise in the datast
What is under-fitting?
Occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target features.
What is over-fitting?
Occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data.
Goldilocks model
Strikes a good balance between under-fitting and over–fitting
- found by using ML algorithms with appropriate inductive biases
2 defining characteristics of ensembles
- Build multiple different models from the same dataset by inducing each model using a modified version of the dataset
- Makes a prediction by aggregating the predictions of the different models in the ensemble