Lecture 5 Flashcards
What is predictive analytics?
Predictive analytics is the process of extracting information from large data sets in order to determine trends and patterns that can be used to generate models and predict behaviors of interest.
Prescriptive analytics
Aims at suggesting (prescribing) the best decision options in order to take advantage of the predicted future utilizing large amounts of data (Šikšnys & Pedersen, 2016).
Incorporates the predictive analytics output and utilizes artificial intelligence, optimization algorithms and expert systems in a probabilistic context in order to provide adaptive, automated, constrained, time-dependent and optimal decisions.
Relation between Predictive and prescriptive (predictive-prescriptive split)
There is considerable overlap between the two areas.
Difference:
Prescriptive depends on predictive; in this course they are treated as two separate steps.
The Venn diagram on the slide shows that machine learning / data mining is mainly predictive analytics, but also falls partly into the prescriptive area.
Probabilistic models sit halfway between the two.
Predictive analytics:
* statistical analysis
Prescriptive analytics:
* mathematical programming
* simulation
* logic-based models
* evolutionary computation
What is AI?
No consensus on a single definition
Thinking Humanly:
Cognitive science/Cognitive modelling
Acting Humanly: Turing test
Thinking Rationally: Logic-based/Deductive Intelligence
Acting Rationally: rational agents (trying to achieve the best solution)
Is it more about actual intelligence or perceived intelligence?
Chinese room argument
Is it more about actual intelligence or perceived intelligence?
Does an AI actually understand, or does it simply execute an algorithm/set of rules with (super)human capacities?
Levels of AI
- narrow AI
- general AI
- super AI
What is narrow AI?
Dedicated to assisting with or taking over specific tasks
General AI
Takes knowledge from one domain and transfers it to other domains
Super AI
Machines that are an order of magnitude smarter than humans
differences between AI, machine learning, and deep learning
AI: computing systems capable of performing tasks that humans are very good at, for example recognising objects.
ML: the field of AI that applies statistical methods to enable computer systems to learn from data toward an end goal.
Deep learning: neural networks with several hidden layers.
Machine learning definition
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
When to use:
* classical ML
* Reinforcement learning
* ensembles
* neural networks and deep learning
classical ML
* simple data and clear features
Reinforcement learning
* no data, but we have an environment to interact with
ensembles
* when quality is a real problem
neural networks and deep learning
* complicated data, unclear features, belief in a miracle
Data requirements for Machine learning (taxonomy of machine learning)
- Supervised
- unsupervised
- semi-supervised
- reinforcement
With supervised learning, you feed labeled examples into the system as input (for instance, pictures of cats and dogs together with the answer that a picture of a dog is a dog and a cat is a cat) to train the model. This means that in supervised learning, the machine already knows the desired output before it starts working on or learning the task. A basic example of this concept would be a student learning a course from an instructor: the student knows what he/she is learning from the course.
With the output known, all the system needs to do is work out the steps or process needed to get from the input to the output. The algorithm is taught through a training data set that guides the machine.
The type of target variable is either:
* continuous, which results in regression analysis, or
* categorical, which results in classification (see the sketch below).
Examples of these categories formed through classification would include demographic data such as marital status, sex, or age
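A minimal sketch of the two target types, assuming scikit-learn is available (all data below is made up):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]                  # one feature per sample

# Continuous target -> regression
y_cont = [1.1, 1.9, 3.2, 3.9]
reg = LinearRegression().fit(X, y_cont)
print(reg.predict([[5]]))                 # a real-valued estimate

# Categorical target -> classification
y_cat = ["cat", "cat", "dog", "dog"]
clf = LogisticRegression().fit(X, y_cat)
print(clf.predict([[5]]))                 # a class label, e.g. 'dog'
```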
Even more information if needed
Supervised learning uses a training set to teach models to yield the desired output. This training dataset includes inputs and correct outputs, which allow the model to learn over time. The algorithm measures its accuracy through the loss function, adjusting until the error has been sufficiently minimized.
Uses labeled data.
examples:
* Image- and object-recognition: Supervised learning algorithms can be used to locate, isolate, and categorize objects out of videos or images, making them useful when applied to various computer vision techniques and imagery analysis.
* Predictive analytics
* Spam detection: Spam detection is another example of a supervised learning model. Using supervised classification algorithms, organizations can train databases to recognize patterns or anomalies in new data to organize spam and non-spam-related correspondences effectively
challenges of supervised learning
* Supervised learning models can require certain levels of expertise to structure accurately.
* Training supervised learning models can be very time intensive.
* Datasets can have a higher likelihood of human error, resulting in algorithms learning incorrectly.
* Unlike unsupervised learning models, supervised learning cannot cluster or classify data on its own.
(Source: IBM)
Difference between supervised vs. unsupervised learning vs. semi-supervised learning
Unlike supervised learning, unsupervised learning uses unlabeled data. From that data, it discovers patterns that help solve for clustering or association problems. This is particularly useful when subject matter experts are unsure of common properties within a data set. Common clustering algorithms are hierarchical, k-means, and Gaussian mixture models.
Semi-supervised learning occurs when only part of the given input data has been labeled. Unsupervised and semi-supervised learning can be more appealing alternatives, as it can be time-consuming and costly to rely on domain expertise to label data appropriately for supervised learning.
Unsupervised learning
- Does not use labels
- output is unknown
- far less used than supervised learning
- forms the future behind ML and its possibilities
- machines and computers developing the ability to “teach themselves” alludes to the process of unsupervised learning.
- no access to concrete datasets
- outcomes of problems are largely unknown
- no reference data at all
Is skippable
example to show difference between supervised and unsupervised learning
consider a digital image that has a variety of colored geometric shapes on it. These geometric shapes need to be matched into groups according to color and other classification features. For a system that follows supervised learning, this whole process is quite simple.
The procedure is straightforward: you just have to teach the computer all the details pertaining to the figures. You can let the system know that all shapes with four sides are known as squares, and others with eight sides are known as octagons, etc. We can also teach the system to interpret the colors and how they are classified.
However, in unsupervised learning, the whole process becomes a little trickier. The algorithm for an unsupervised learning system has the same input data as the one for its supervised counterpart (in our case, digital images showing shapes in different colors).
Once it has the input data, the system learns all it can from the information at hand. In fact, the system works by itself to recognize the problem of classification and also the difference in shapes and colors. With information related to the problem at hand, the unsupervised learning system will then recognize all similar objects, and group them together. The labels that it will give to these objects will be designed by the machine itself. Technically, there are bound to be wrong answers, since there is a certain degree of probability. However, just like how we humans work, the strength of machine learning lies in its ability to recognize mistakes, learn from them, and to eventually make better estimations next time around.
Reinforcement learning
Reinforcement learning builds on the concept of unsupervised learning and gives a high degree of control to software agents and machines to determine what the ideal behavior within a context can be. This feedback loop is formed to maximize the performance of the machine in a way that helps it grow. Simple feedback that informs the machine about its progress is all that is required to help the machine learn its behavior.
An agent decides the best action based on the current state of the results
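A toy sketch of tabular Q-learning, the classic reinforcement-learning update (the chain environment and all constants below are made up for illustration):

```python
import random

# Toy chain: 5 states, two actions (0 = left, 1 = right);
# reward 1 only when reaching the rightmost state.
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.1, 0.9, 0.2        # learning rate, discount, exploration
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(state, action):
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

state = 0
for _ in range(1000):
    # epsilon-greedy action selection based on the current state
    if random.random() < eps:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    nxt, reward = step(state, action)
    # Core update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = 0 if nxt == n_states - 1 else nxt  # restart the episode at the goal

print(Q)  # 'right' should end up scoring higher in every state
```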
Reinforcement learning vs. supervised learning and unsupervised learning
Reinforcement vs supervised learning
In supervised learning we have an external supervisor who has sufficient knowledge of the environment and shares it with the agent to complete the task. But in problems where the agent can perform many different kinds of subtasks by itself to achieve the overall objective, the presence of a supervisor is unnecessary and impractical. In reinforcement learning there is instead a reward function, unlike supervised learning, that lets the system know about its progress down the right path.
Reinforcement vs unsupervised learning
Reinforcement learning basically has a mapping structure that guides the machine from input to output. Unsupervised learning has no such feature: the machine focuses on the underlying task of locating the patterns rather than on the mapping for progressing towards the end goal.
For example, if the task for the machine is to suggest a good news update to a user, a Reinforcement Learning algorithm will look to get regular feedback from the user in question, and would then through the feedback build a reputable knowledge graph of all news related articles that the person may like. On the contrary, an Unsupervised Learning algorithm will try looking at many other articles that the person has read, similar to this one, and suggest something that matches the user’s preferences.
https://crayondata.ai/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/
Math representation (Taxonomy of Machine Learning)
divided in model-based and instance based
Instance-based: the machine learning technique simply compares new instances to the ones it was trained on.
So new data is compared to the training data and classified on that basis.
Model-based: tries to find a general representation of the relationships in the dataset.
The algorithm chooses a hypothesis, a mathematical representation, and then determines the parameters of this hypothesis from the available data. This is then used to make estimations on new data.
https://hermit-notebook.site/en/notebook/computer-sciences/artificial-intelligence/machine-learning/taxonomy-of-machine-learning/
Classification by Training behaviour (Taxonomy of Machine Learning)
Many ML techniques do not keep a memory of the entire dataset they were trained on; iterative adjustments are based only on the data currently provided. Such techniques cannot adjust an already trained representation to new data while keeping it consistent with their previous training (because there is no memory of the previous data).
batch learning: Learning techniques that require the entire data set for their training.
All the examples must be provided during the training phase. The “predictor” resulting from the training is then used in production and no more learning occurs. In this setting, if we obtain new examples, we need to train a new model from scratch on the complete, enriched data set.
online learning: this learning algorithm can actually adjust an already trained representation to new data. Unlike batch learning, an online learning technique can be provided with new training examples progressively and changes its representation accordingly, even while being used in production. For many underlying representations, true online learning is not possible. However, depending on the formulation, we can often find a pseudo-online algorithm based on recursive updates, where the new predictor depends on the current best predictor and all the previously learnt examples (see the sketch below).
https://hermit-notebook.site/en/notebook/computer-sciences/artificial-intelligence/machine-learning/taxonomy-of-machine-learning/
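A minimal sketch contrasting the two behaviours, assuming scikit-learn is available (the data is made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 1

# Batch: needs the full dataset at once; new data means refitting from scratch.
batch_model = LinearRegression().fit(X, y)

# (Pseudo) online: partial_fit updates an already trained model example by example.
online_model = SGDRegressor(random_state=0)
for _ in range(100):                       # several passes over the "stream"
    for i in range(len(X)):
        online_model.partial_fit(X[i:i + 1], y[i:i + 1])

print(batch_model.predict([[10.0]]), online_model.predict([[10.0]]))
```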
Classification by Task Type (Machine learning taxonomy by usage or goal)
- Regression
- Classification
- Clustering
- Association Rule learning
- Decision making
- Blind source separation
- Dimensionality reduction
Regression
𝑌 = 𝑓 (𝑋)
The values of 𝑌 are determined by a human
𝑌 ∈ ℝ is a continuous variable
𝑓 is learned from the data through ML
Regression tries to find the value of a property of a phenomenon depending on the values of other properties or instances of the same kind.
Regression typically falls under supervised learning.
For example, suppose an ice cream seller wants to predict its income based on temperature forecasts. We would be learning the (model and) parameters of a regression if we were to create a software package for this requirement (see the sketch below).
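A minimal sketch of that ice-cream example, assuming scikit-learn is available (the numbers are made up):

```python
from sklearn.linear_model import LinearRegression

temperature = [[15], [20], [25], [30]]   # forecast temperature in °C
income = [200, 350, 500, 650]            # observed daily income

model = LinearRegression().fit(temperature, income)
print(model.predict([[27]]))             # estimated income for a 27 °C day
```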
Classification
𝑌 = 𝑓(𝑋)
The values of 𝑌 are determined by a human
𝑌 ∈ {𝐶₁, …, 𝐶ₖ} is a discrete variable, e.g.
𝐶₁ = Triangle
𝐶₂ = Circle
𝑓 is learned from the data through ML
Classification tries to find boundaries in the dataset so as to separate the elements into a number of classes known (or defined) before the training.
Classification typically falls under supervised learning. However, unsupervised classification also exists, such as anomaly detection or outlier detection.
Clustering
The real values of 𝑌 are unknown
The ML algorithm tries to identify existing patterns in the data (without prior supervision)
Clustering tries to group observations such that elements belonging to the same group (or cluster) are more similar (according to some similarity measure) and those belonging to different groups are more dissimilar. Clustering is typically an unsupervised learning task.
Baseline vs. State-of-the-art-model
Baseline/Benchmark
* Simple model
* Easy/quick to fit
* Reference point for performance analysis
State-of-the-art model
* Usually very complex model
* Costly/optimized fit
* Best possible performance
Supervised machine learning for regression
Linear Regression
Artificial Neural Networks
Deep Artificial Neural Networks
Support Vector Regression (SVR)
K-Nearest Neighbours (k-NN)
Linear Regression
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Model-based : 𝑌 = 𝛽𝑋 + ε
Task: Regression
For Classification, the equivalent model is Logistic Regression
Artificial Neural Networks
Dataset requirement :
Supervised (ANN, RNN, CNN, GAN)
Unsupervised (Autoencoders)
Data provisioning: Batch/Online
Model representation: Model-based
Task: Regression/Classification
Ensemble model
Deep Artificial Neural Networks
Dataset requirement :
Supervised (ANN, RNN, CNN, GAN)
Unsupervised (Autoencoders)
Data provisioning: Batch/Online
Model representation: Model-based
Task: Regression/Classification
Support Vector Regression (SVR)
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Model-based : 𝑌 = 𝐾(𝛽𝑋) + ε
Task: Regression
For Classification, the equivalent model is Support Vector Machines (SVM)
K-Nearest Neighbours (k-NN)
Dataset requirement: Supervised
Data provisioning: Batch/Online
Model representation: Instance-based
Task: Classification/Regression
Regression -> Mean
Classification -> Majority vote (see the sketch below)
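A minimal sketch of both k-NN flavours, assuming scikit-learn is available (toy data):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]

# Regression: predict the mean of the k nearest targets
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X, [1.0, 1.2, 0.8, 9.5, 10.0, 10.5])
print(knn_reg.predict([[2.5]]))   # mean of the 3 nearest neighbours' values

# Classification: predict by majority vote among the k nearest labels
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X, ["a", "a", "a", "b", "b", "b"])
print(knn_clf.predict([[2.5]]))   # majority vote -> 'a'
```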
Supervised Machine Learning for classification
- Naïve Bayes
- Logistic Regression
- Support Vector Machines (SVM)
- Decision Tree
- Random forest
- Artificial Neural Networks
Naïve Bayes
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Model-based
Task: Classification
Logistic Regression
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Model-based : 𝑃(𝑌 = 1|𝑋) = 𝜎(𝛽𝑋)
Task: Classification
For Regression, the equivalent model is Linear Regression
Support Vector Machines (SVM)
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Model-based : 𝑌 = 𝐾(𝛽𝑋) + ε
Task: Classification
For Regression, the equivalent model is Support Vector Regression
Decision Tree
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Instance-based
Task: Regression/Classification
Slide figure: decision tree for Regression vs. Classification.
Random forest
Dataset requirement :
Supervised
Data provisioning: Batch
Model representation: Instance-based
Task: Regression/Classification
Ensemble model
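A minimal sketch contrasting a single decision tree with a random-forest ensemble, assuming scikit-learn is available (toy data):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 1, 0]                       # a tiny XOR-like dataset

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# A random forest is an ensemble: many trees fit on bootstrap samples,
# combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(tree.predict([[0.9, 0.9]]), forest.predict([[0.9, 0.9]]))
```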
Artificial Neural Networks
Dataset requirement :
Supervised (ANN, RNN, CNN, GAN)
Unsupervised (Autoencoders)
Data provisioning: Batch/Online
Model representation: Model-based
Task: Regression/Classification
Ensemble model
Unsupervised Machine Learning
- K-Means Clustering
- Hierarchical clustering
- And many more…
many more:
Dimensionality Reduction
* PCA
* t-SNE
* Autoencoders
Clustering
* DBSCAN
* Self-organizing maps
Reinforcement Learning
* Q-Learning
* Deep Q-Learning
…
K-Means Clustering
Dataset requirement: Unsupervised
Data provisioning: Batch
Model representation: Instance-based
Task: Clustering/pattern recognition
N.B. : As clustering is unsupervised, multiple solutions can be found!
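A minimal k-means sketch, assuming scikit-learn is available (toy 2-D data); note that different initialisations can yield different clusterings:

```python
from sklearn.cluster import KMeans

X = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [8, 9], [9, 8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each sample
print(km.cluster_centers_)  # the learned centroids
```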
Hierarchical clustering
Dataset requirement: Unsupervised
Data provisioning: Batch
Model representation: Instance-based
Task: Clustering/pattern recognition
Machine learning in practice - pipeline
raw data
* collection
* download
* scraping
Data preprocessing
* Data quality (cf. diagnostic)
* missing data
* categorical variables
Train-test split
* single validation
* cross validation
model fit
* fit on training data
* test on testing data
performance evaluation
* performance metric choice
* evaluation on validation data
Splitting data
Data is split for three different uses:
* models (for example, trees of different depths) are fit to the training data,
* their performance is evaluated on the validation set (the lower the validation error, the better),
* and a final estimate of model performance is computed on the test set (see the sketch below).
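A minimal sketch of a train/validation/test split, assuming scikit-learn is available (the 60/20/20 proportions are a common choice, not a rule):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(-1, 1), np.arange(100)

# First split off the test set, then carve a validation set out of the rest.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```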
Splitting data and vocabulary
Feature: With respect to a dataset, a feature represents an attribute and value combination. Color is an attribute. “Color is blue” is a feature (blue is one of the values color can have).
target: target variable, also known as a dependent variable, is the outcome we aim to predict or explain using our model. It is the variable that we want to estimate or classify based on the available data.
sample: a row, i.e. one instance in a dataset, containing a value for every feature (and thus every variable).
Training Set: A set of observations used to generate machine learning models.
Test Set: A set of observations used at the end of model training and validation to assess the predictive power of your model. How generalizable is your model to unseen data?
Categorical data preprocessing
ordinal
one-hot-encoding
Use One-Hot Encoding: When dealing with nominal categorical variables that lack any inherent order.
Use Ordinal Encoding: When you have categorical variables with a clear ordinal relationship and the order between categories holds valuable information.
One-hot-encoding
transforms categorical variables into a binary matrix where each category is represented as a column, and each instance is marked with a ‘1’ in the corresponding column and ‘0’ in all other columns. (So, for instance, with the three values red, green, and yellow you get 3 columns; if an instance is red, there is a 1 in the red column and a 0 in the others. A sketch follows after the limitations below.)
advantages:
1. Preservation of Information: One-hot encoding preserves the uniqueness of each category. It ensures that the algorithm does not assume any ordinal relationship among the categories.
2. Lack of Bias: Since each category is represented independently, one-hot encoding prevents introducing unintended biases based on the order of categories.
3. Suitable for Most Algorithms: One-hot encoded data is widely accepted by various machine learning algorithms, such as decision trees, random forests, and neural networks
limitations:
1. Dimensionality: One-hot encoding can significantly increase the dimensionality of the dataset, especially when dealing with categorical variables with many unique categories. This can lead to the curse of dimensionality and negatively impact model performance.
2. Loss of Order Information: One-hot encoding discards any inherent order that might exist among categories, which can be crucial in some scenarios.
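A minimal one-hot-encoding sketch, assuming pandas is available:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "yellow", "red"]})

# get_dummies creates one indicator column per category
print(pd.get_dummies(df, columns=["color"]))
```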
Ordinal Encoding
Ordinal encoding is a technique that assigns a unique integer value to each category based on their order or rank. It is suitable for categorical variables that exhibit a clear ordinal relationship, where one category is greater or lesser than another (for instance, flight ticket class: first, second, or third; a sketch follows after the limitations below).
advantages:
1. Efficiency in Dimensionality: Ordinal encoding does not inflate the dataset’s dimensionality like one-hot encoding does. It replaces categorical values with integers, saving space and computation time.
2. Retains Order Information: This technique preserves the ordinal information that exists among categories, allowing the algorithm to leverage this information if it is relevant to the problem.
limitations:
1. Assumption of Equal Steps: Ordinal encoding assumes equal intervals between categories, which might not always be the case in real-world scenarios.
2. Potential Misrepresentation: If the assigned integer values do not accurately reflect the ordinal relationships, the encoded data might mislead the algorithm.
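A minimal ordinal-encoding sketch, assuming scikit-learn is available; the explicit category list fixes the order third < second < first:

```python
from sklearn.preprocessing import OrdinalEncoder

tickets = [["third"], ["first"], ["second"]]

enc = OrdinalEncoder(categories=[["third", "second", "first"]])
print(enc.fit_transform(tickets))   # third -> 0, second -> 1, first -> 2
```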
Missing data preprocessing
Case deletion
Missing data imputation
Approaches that take the data distribution into account
Missing data imputation
Generally, replace missing quantitative values using the mean or median; for categorical or qualitative data, use the mode to impute the missing values.
Case deletion
Listwise deletion: if a row has missing values, delete the entire row. This causes some data loss; to avoid it, we can use the pairwise deletion method.
Pairwise deletion: compute the correlation matrix. If a feature is highly correlated with the target variable, use an imputation method to deal with its missing values; if it is not highly correlated with the target, delete the entire column (see the sketch below).
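A minimal sketch of listwise deletion vs. simple imputation, assuming pandas is available (toy data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "city": ["NY", "LA", None, "NY"]})

dropped = df.dropna()                # listwise deletion: drop incomplete rows

imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())        # mean for numeric
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])  # mode for categorical

print(dropped, imputed, sep="\n")
```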
Precision
Exactness of the model
TP / (TP + FP)
Accuracy
Percentage of correct predictions
(TP + TN) / (TP + TN + FP + FN)
Recall
Completeness of the model
TP / (TP+FN)
F1 Score
Combines precision and recall
F1 = 2 × (precision × recall) / (precision + recall)
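A minimal sketch computing all four metrics, assuming scikit-learn is available (toy labels):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_pred))    # (TP + TN) / (TP + TN + FP + FN)
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # 2 * P * R / (P + R)
```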