Lecture 7 Flashcards

Prescriptive analytics

1
Q

What is prescriptive analytics?

A

Prescriptive analytics is concerned with what foresight can be obtained after a predictive model has been built.

2
Q

What is the general framework for prescriptive analytics?

A

The general framework is:
1. Identify alternative decisions and objectives
2. Model and simulate alternative decisions
3. Select an optimal decision
4. Perform Analysis

Although a general framework can be built, the details of the specific methodology are domain- and technology-specific.

3
Q

How does prescriptive analytics vary in practice?

A

In practice, as highlighted by the literature review, there is no single approach to prescriptive analytics.
The developed approaches are often domain- and technique-specific.
Usually, a combination of multiple approaches is employed.
The focus of this course is improving the robustness of the model.

4
Q

What are some techniques for prescriptive analytics?

A

The techniques are:
* What-If analysis
* Simulation (and ML)

5
Q

What is a What-If analysis?

A

It’s a qualitative approach to determine:
* Exceptional situation
* Reaction
* Likelihood of happening
* Consequence of happening
* Recommendation
For each of these, the What-If questions have to be developed and answered.

6
Q

What is Simulation (and ML)?

A

Machine Learning is by definition an inductive process:
an ML algorithm defines the optimal hypothesis (i.e. predictive model) by learning from the existing data.
In the conventional modelling approach, on the other hand, a human defines a model, tunes its parameters and produces new data (i.e. simulation results).

7
Q

How do you integrate simulations and ML?

A

Several integrations of ML and simulation models can exist, based on the integration point.
The focus of this course is on the ML model as generator:
once an ML model has been fitted on the existing data and has shown good performance, it can also be used to make predictions on unseen data (i.e. counterfactual analysis).
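
A minimal sketch of this idea, assuming scikit-learn and synthetic toy data (all variable names are illustrative): the fitted model is queried on decision alternatives that were never observed.

```python
# Sketch: a fitted ML model used as a generator for counterfactual
# ("what-if") analysis. Toy data; scikit-learn assumed.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # existing data, 3 features
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)   # observed outcome

model = RandomForestRegressor(random_state=0).fit(X, y)

# Counterfactual query: hold features 2 and 3 at their means and sweep
# feature 1 over alternative values, asking the model what would happen.
sweep = np.linspace(-2, 2, 5)
X_cf = np.column_stack([sweep,
                        np.full_like(sweep, X[:, 1].mean()),
                        np.full_like(sweep, X[:, 2].mean())])
print(model.predict(X_cf))   # simulated outcomes of the alternative decisions
```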

8
Q

What is the machine learning pipeline in practice?

A

ML in practice goes through the following steps (sketched in code after the list):
1. Raw Data
* Collection
* Download
* Scraping
2. Data Preprocessing
* Data quality (cf. Diagnostic)
* Missing data
* Categorical variables
3. Train-test split
* Single validation
* Cross-validation
4. Model fit
* Fit on training data
* Test on testing data
5. Performance Evaluation
* Performance metric choice
* Evaluation on validation data
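
A minimal sketch of these five steps, assuming scikit-learn and a built-in toy dataset standing in for real raw data:

```python
# Sketch of the pipeline: raw data -> preprocessing -> split -> fit -> evaluate.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)          # 1. raw data (downloaded)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)           # 3. train-test split

scaler = StandardScaler().fit(X_train)              # 2. preprocessing (fit on train only)
model = LogisticRegression(max_iter=5000).fit(
    scaler.transform(X_train), y_train)             # 4. fit on training data

y_pred = model.predict(scaler.transform(X_test))    # 4. test on testing data
print(accuracy_score(y_test, y_pred))               # 5. performance evaluation
```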

9
Q

How do you make a model more robust in the ML pipeline?

A

You change the pipeline so that ML now goes through the following steps:
1. Raw Data
* Collection
* Download
* Scraping
2. Data Preprocessing
* Data quality (cf. Diagnostic)
* Missing data
* Categorical variables
* Dimensionality reduction
3. Train-test split
* Single validation
4. Model fit
* Fit on training data
* Test on testing data
* Ensembling
5. Performance Evaluation
* Performance metric choice
* Evaluation on validation data
* Cross-validation

10
Q

What is dimensionality reduction?

A

Dimensionality reduction is the process of transforming the original dataset:
From n columns/features

To k < n columns/features
through feature extraction or feature selection. The original features might or might not be preserved:

Feature extraction: the original variables are lost and replaced by projected/compressed equivalents.

Feature selection: a subset of the original variables is chosen for the model.
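
A minimal sketch of feature extraction, assuming scikit-learn and PCA as the projection technique:

```python
# Sketch: dimensionality reduction from n = 30 features to k = 2
# projected components via PCA; the original variables are lost.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)    # n = 30 original features
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)                     # k = 2 < n
X_reduced = pca.fit_transform(X_scaled)       # compressed equivalents
print(X_reduced.shape)                        # (569, 2)
print(pca.explained_variance_ratio_)          # variance retained per component
```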

11
Q

What is feature construction in dimensionality reduction?

A

Manual feature extraction.

12
Q

What is feature learning in dimensionality reduction?

A

Automatic feature extraction.

13
Q

What is feature transformation in dimensionality reduction?

A

Usually denotes less sophisticated transformations over the features, like re-scaling data, bucketing, etc.

14
Q

What is feature engineering in dimensionality reduction?

A

Sometimes it is used as a synonym for feature extraction, although, contrary to extraction, there seems to be a relatively universal consensus that engineering involves not only creative constructions but pre-processing tasks and naïve transformations as well.

15
Q

What is feature selection?

A

A subset of the original variables is chosen for the model, using different methods:
* Filter methods
* Wrapper methods
* Embedded methods

16
Q

What are Filter methods?

A

Filter feature selection methods apply a statistical measure to assign a score to each feature. The methods are often univariate and consider each feature independently, or with regard to the dependent variable.

Ex: Chi-squared test, information gain and correlation coefficient scores.
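
A minimal sketch of a filter method, assuming scikit-learn's SelectKBest with the chi-squared test as the scoring measure:

```python
# Sketch: each feature gets a chi-squared score; the k best are kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)    # non-negative features, as chi2 requires

selector = SelectKBest(score_func=chi2, k=5)  # keep the 5 best-scoring features
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))     # indices of the chosen features
```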

17
Q

What are embedded methods?

A

Embedded methods learn which features best contribute to the accuracy of the model while the model is being created. The most common type of embedded feature selection methods are regularization methods. Regularization methods, also called penalization methods, introduce additional constraints into the optimization of a predictive algorithm (such as a regression algorithm) that bias the model toward lower complexity (fewer coefficients).

Ex: Elastic Net and Ridge Regression.
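
A minimal sketch of an embedded method, assuming scikit-learn's ElasticNet: the L1 part of the penalty drives some coefficients to exactly zero, so selection happens while the model is fitted:

```python
# Sketch: regularization zeroes out weak coefficients during fitting.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print(np.flatnonzero(model.coef_))   # features that survived the penalty
```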

18
Q

What are Wrapper methods?

A

Wrapper methods consider the selection of a set of features as a search problem, where different combinations are prepared, evaluated and compared to other combinations. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy. The search process may be methodical, such as a best-first search; it may be stochastic, such as a random hill-climbing algorithm; or it may use heuristics, like forward and backward passes to add and remove features.

Ex: Recursive feature elimination algorithm
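
A minimal sketch of a wrapper method, assuming scikit-learn's RFE (recursive feature elimination) around a logistic regression:

```python
# Sketch: the search repeatedly drops the weakest feature until
# only the requested number remains.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
rfe.fit(X, y)
print(rfe.get_support(indices=True))   # the surviving feature indices
```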

19
Q

What is one of the benefits of dimensionality reduction ?

A

**Visualising/Clustering:**
Human perception is limited to three dimensions.
Dimensionality reduction/feature selection makes it possible to reduce the number of dimensions in the data to three or fewer.
Working with limited dimensions can allow:
* Visualization
* Clustering and interpretation of the results

20
Q

How can you assess the stability of the model?

A

In order to assess the stability of the model, it is necessary to test it multiple times; we can do this with cross-validation.
A k-fold cross-validation process splits the dataset into k parts:
* 1 part is used for testing
* k-1 parts are used for training
The process is repeated k times to ensure that the whole dataset is covered.
The performance of the model is the average across the k folds.
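
A minimal sketch of k-fold cross-validation, assuming scikit-learn, with k = 5:

```python
# Sketch: 5 folds, each used once for testing and 4 times for training;
# the reported performance is the average across folds.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(scores)          # one score per fold
print(scores.mean())   # model performance = average across the k folds
```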

21
Q

What can we do in case the data is unbalanced?

unbalanced here signifies that the classes are unevenly distributed

A

In case of unbalanced data, the sampling for cross-validation can be performed so as to maintain the original class balance in every fold. This cross-validation variant is called stratified cross-validation.
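
A minimal sketch, assuming scikit-learn's StratifiedKFold, which preserves the original class proportions in every fold:

```python
# Sketch: stratified 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(scores.mean())   # class balance maintained in each fold
```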

22
Q

What is ensemble modeling?

A

Ensemble modeling combines multiple models in a single predictor. Several parameters can be used for the model combination:
1. Combination can be
* Static
* Variable
2. Models can be
* Homogeneous
* Heterogeneous
3. Combination weights can be
* Fixed
* Time-varying
This is reflected in multiple methods:
1. Bagging
2. Stacking
3. Boosting

23
Q

What are the benefits of ensemble modeling?

A

Combining multiple models in a single predictor could help in:
* Reducing the variance/variability of the prediction
* Improving the predictive accuracy

24
Q

What is bagging?

A

Bagging is an ensemble method with static combination for homogeneous/heterogeneous models with fixed-weight combination.

Different models are trained on different subsets of the data. In the test phase, each weak model makes a prediction, and the predictions are combined by voting to obtain the final prediction.
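
A minimal sketch, assuming scikit-learn's BaggingClassifier: homogeneous decision trees trained on bootstrap subsets of the data and combined by voting:

```python
# Sketch: 50 trees, each trained on a different bootstrap sample;
# their votes produce the final prediction.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# (recent scikit-learn; older versions call this keyword base_estimator)
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=50, random_state=0)
print(cross_val_score(bagging, X, y, cv=5).mean())
```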

25
Q

What is stacking?

A

It is a static combination for homogeneous or heterogeneous models, with a fixed-weight combination.

It's a two-level combination: weak models are trained on one part of the dataset, and a meta-model is trained on the outputs the weak models produce for the second part of the dataset. At test time, the inputs are given to the weak models, their outputs are collected and fed to the meta-model, and the meta-model's output is the final prediction.
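
A minimal sketch, assuming scikit-learn's StackingClassifier: heterogeneous weak models whose outputs feed a logistic-regression meta-model:

```python
# Sketch: level 0 = weak models; level 1 = meta-model on their outputs.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("svm", SVC(random_state=0))],
    final_estimator=LogisticRegression(max_iter=5000))
print(cross_val_score(stack, X, y, cv=5).mean())
```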

26
Q

What is boosting?

A

It’s a dynamic combination of homogeneous models with a time-fixed combination.
The final model is built by incrementally fitting base models on the errors of the previous models.
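
A minimal sketch, assuming scikit-learn's GradientBoostingClassifier: each new base model is fitted on the errors of the ensemble built so far:

```python
# Sketch: 100 shallow trees added incrementally, each correcting
# the residual errors of the previous ones.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
boosting = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                      random_state=0)
print(cross_val_score(boosting, X, y, cv=5).mean())
```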

27
Q

How do you deal with imbalanced data?

A

A few techniques exist (a sketch combining techniques 2 and 6 follows the list):
1. Collect more data
2. Change the performance metric
3. Resample the dataset
4. Generate synthetic samples
5. Test different algorithms
6. Try penalized models
7. Approach the problem differently
* Anomaly detection
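
A minimal sketch combining techniques 2 and 6, assuming scikit-learn: a penalized model (class_weight) evaluated with a metric better suited to imbalance (F1) on a synthetic 9:1 dataset:

```python
# Sketch: penalize minority-class mistakes and score with F1 instead
# of accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)   # 9:1 class imbalance
model = LogisticRegression(class_weight="balanced", max_iter=5000)
print(cross_val_score(model, X, y, cv=5, scoring="f1").mean())
```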

28
Q

What are Machine Learning Operations (MLOps)?

A

MLOps is a core function of Machine Learning engineering, focused on streamlining the process of taking machine learning models to production, and then maintaining and monitoring them. MLOps is a collaborative function, often comprising data scientists, DevOps engineers, and IT.

29
Q

Paper definition of MLOps

A

MLOps (Machine Learning Operations) is a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment, and scalability of machine learning products. Most of all, it is an engineering practice that leverages three contributing disciplines: machine learning, software engineering (especially DevOps), and data engineering. MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops). Essentially, MLOps aims to facilitate the creation of machine learning products by leveraging these principles: CI/CD automation, workflow orchestration, reproducibility; versioning of data, model, and code; collaboration; continuous ML training and evaluation; ML metadata tracking and logging; continuous monitoring; and feedback loops.