lecture 5 Flashcards
What is predictive analytics?
Predictive analytics is the process of extracting information from large data sets in order to determine trends and patterns that can be used to generate models and predict behaviors of interest.
Prescriptive analytics
Aims at suggesting (prescribing) the best decision options in order to take advantage of the predicted future utilizing large amounts of data (Šikšnys & Pedersen, 2016).
Incorporates the predictive analytics output and utilizes artificial intelligence, optimization algorithms and expert systems in a probabilistic context in order to provide adaptive, automated, constrained, time-dependent and optimal decisions.
Relation between Predictive and prescriptive (predictive-prescriptive split)
There is considerable overlap between the two areas.
Difference:
Prescriptive analytics depends on predictive analytics. In this course they are treated as two separate steps.
The Venn diagram in the slides shows that machine learning / data mining is mainly predictive analytics, but also falls partly into the prescriptive part.
Probabilistic models sit halfway between the two.
predictive analytics
statistical analysis
prescriptive analytics
mathematical programming
simulation
logic-based models
evolutionary computation
What is AI?
No consensus on a single definition
Thinking Humanly:
Cognitive science/Cognitive modelling
Acting Humanly: Turing test
Thinking Rationally: Logic-based/Deductive Intelligence
Acting Rationally: Rational (trying to achieve the best solution) agents
Is it more about actual intelligence or perceived intelligence?
slide 11
Chinese room argument
Is it more about actual intelligence or perceived intelligence?
Does an AI actually understand or does it simply execute an algorithm/set of rules with (super)human capacities?
Levels of AI
- narrow AI
- general AI
- super AI
What is narrow AI?
Dedicated to assist with or take over specific tasks
General AI
takes knowledge from one domain, transfers to other domains
Super AI
machines that are an order of magnitude smarter than humans
differences between AI, machine learning, and deep learning
AI: computing systems which are capable of performing tasks that humans are very good at, for example recognizing objects.
ML: the field of AI that applies statistical methods to enable computer systems to learn from data towards an end goal.
Deep learning: neural networks with several hidden layers.
Machine learning definition
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
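A toy sketch of the definition, with made-up data: the task T is deciding whether a number lies above an unknown threshold, the experience E is a list of labeled examples, and the performance measure P is accuracy on a fixed test set. All names (`EXPERIENCE`, `TEST_SET`, `fit`, `accuracy`) and the threshold of 5.0 are assumptions for illustration only.

```python
# Task T: decide whether a number lies above an unknown threshold (here 5.0).
# Experience E: labeled examples (x, label). Performance P: test-set accuracy.

EXPERIENCE = [(1.0, 0), (6.0, 1), (3.0, 0), (8.0, 1), (4.5, 0), (5.5, 1)]
TEST_SET = [(i * 0.5, 1 if i * 0.5 > 5.0 else 0) for i in range(21)]


def fit(examples):
    """Estimate the threshold as the midpoint between the two classes seen so far."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((x > threshold) == bool(label) for x, label in TEST_SET)
    return correct / len(TEST_SET)


# More experience E -> better performance P on task T.
for n in (2, 4, 6):
    print(n, "examples -> accuracy", round(accuracy(fit(EXPERIENCE[:n])), 3))
```

In this sketch the accuracy rises as more examples are seen, which is what "learning" means in the definition. Real learners do not always improve with every extra example.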
When to use:
* classical ML
* Reinforcement learning
* ensembles
* neural networks and deep learning
classical ML
* simple data and clear features
Reinforcement learning
* no data, but we have an environment to interact with
ensembles
* when quality is a real problem
neural networks and deep learning
* complicated data, unclear features, belief in a miracle
Data requirements for Machine learning (taxonomy of machine learning)
- Supervised
- unsupervised
- semisupervised
- reinforcement
With supervised learning, you feed the desired output into the system together with the input (for instance, pictures of cats and dogs, each labeled as a cat or a dog, to train the model). This means that in supervised learning the machine is given the correct output for every training example before it starts learning. A basic example of this concept would be a student learning a course from an instructor: the student knows what he/she is learning from the course.
With the output known, all the system needs to do is work out the steps or process needed to get from the input to the output. The algorithm is taught through a training data set that guides the machine.
The type of target variable is either:
* continuous, which results in regression analysis
* categorical, which results in classification.
Examples of categories used in classification include demographic data such as marital status, sex, or age group.
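A small sketch of how the target type decides the kind of problem. It returns the simplest baseline prediction for each case: the most frequent class for a categorical target, the average for a continuous one. The function name `baseline_prediction` and the rule "strings are categorical, numbers are continuous" are assumptions made for this example, not a general test.

```python
from statistics import mean, mode


def baseline_prediction(targets):
    """Return (problem type, simplest sensible prediction) for a target column.

    Assumption for this sketch: string targets are categorical,
    numeric targets are continuous.
    """
    if all(isinstance(t, str) for t in targets):
        return "classification", mode(targets)   # most frequent class
    if all(isinstance(t, (int, float)) for t in targets):
        return "regression", mean(targets)       # average value
    raise ValueError("mixed or unsupported target types")


print(baseline_prediction(["married", "single", "married"]))
print(baseline_prediction([200.0, 250.0, 300.0]))
```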
Even more information if needed
Supervised learning uses a training set to teach models to yield the desired output. This training dataset includes inputs and correct outputs, which allow the model to learn over time. The algorithm measures its accuracy through the loss function, adjusting until the error has been sufficiently minimized.
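The "adjusting until the error has been sufficiently minimized" step can be sketched with gradient descent on a one-parameter model y = w * x and a mean squared error loss. The model, the data, the learning rate and the tolerance are all assumptions chosen for illustration; real supervised learners use richer models and other loss functions.

```python
def mse(w, xs, ys):
    """Loss function: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.01, tolerance=1e-6, max_steps=1000):
    """Adjust w by gradient descent until the loss drops below the tolerance."""
    w = 0.0
    for _ in range(max_steps):
        if mse(w, xs, ys) < tolerance:
            break
        # Derivative of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w


xs = [1.0, 2.0, 3.0, 4.0]   # inputs
ys = [2.0, 4.0, 6.0, 8.0]   # correct outputs (labels), generated from y = 2x
w = train(xs, ys)
print("learned w:", round(w, 3), "loss:", mse(w, xs, ys))
```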
Uses labeled data.
examples:
* Image- and object-recognition: Supervised learning algorithms can be used to locate, isolate, and categorize objects out of videos or images, making them useful when applied to various computer vision techniques and imagery analysis.
* Predictive analytics
* Spam detection: Spam detection is another example of a supervised learning model. Using supervised classification algorithms, organizations can train databases to recognize patterns or anomalies in new data to organize spam and non-spam-related correspondences effectively
challenges of supervised learning
* Supervised learning models can require certain levels of expertise to structure accurately.
* Training supervised learning models can be very time intensive.
* Datasets can have a higher likelihood of human error, resulting in algorithms learning incorrectly.
* Unlike unsupervised learning models, supervised learning cannot cluster or classify data on its own.
IBM
Difference between supervised vs. unsupervised learning vs. semi-supervised learning
Unlike supervised learning, unsupervised learning uses unlabeled data. From that data, it discovers patterns that help solve for clustering or association problems. This is particularly useful when subject matter experts are unsure of common properties within a data set. Common clustering algorithms are hierarchical, k-means, and Gaussian mixture models.
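A minimal k-means sketch on one-dimensional, unlabeled data, to show what "discovering patterns" can look like in practice. The points, the starting centroids and the fixed number of iterations are assumptions for this example. Production implementations choose starting centroids more carefully and stop when the assignment no longer changes.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (an empty cluster keeps its centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]          # no labels attached
centroids, clusters = k_means_1d(points, centroids=[1.0, 1.2])
print(centroids, clusters)
```

The algorithm is never told which group a point belongs to; the two groups emerge from the distances alone.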
Semi-supervised learning occurs when only part of the given input data has been labeled. Unsupervised and semi-supervised learning can be more appealing alternatives as it can be time-consuming and costly to rely on domain expertise to label data appropriately for supervised learning
Unsupervised learning
- Does not use labels
- output is unknown
- far less used than supervised learning
- is seen as the future of ML and its possibilities
- the idea of machines and computers developing the ability to “teach themselves” alludes to the process of unsupervised learning
- no access to labeled ("concrete") datasets
- outcomes of problems are largely unknown
- no reference data at all
Is skippable
example to show difference between supervised and unsupervised learning
Consider a digital image that contains a variety of colored geometric shapes. These shapes need to be sorted into groups according to color and other classification features. For a system that follows supervised learning, this task is fairly simple.
The procedure is straightforward, as you just have to teach the computer all the details pertaining to the figures. You can tell the system that shapes with four equal sides are squares, that shapes with eight sides are octagons, etc. We can also teach the system to interpret the colors and see how the light being given out is classified.
However, in unsupervised learning, the whole process becomes a little trickier. The algorithm for an unsupervised learning system has the same input data as the one for its supervised counterpart (in our case, digital images showing shapes in different colors).
Once it has the input data, the system learns all it can from the information at hand. In fact, the system works by itself to recognize the problem of classification and also the difference in shapes and colors. With information related to the problem at hand, the unsupervised learning system will then recognize all similar objects, and group them together. The labels that it will give to these objects will be designed by the machine itself. Technically, there are bound to be wrong answers, since there is a certain degree of probability. However, just like how we humans work, the strength of machine learning lies in its ability to recognize mistakes, learn from them, and to eventually make better estimations next time around.
Reinforcement learning
Reinforcement learning branches off from the concept of unsupervised learning and gives software agents and machines a high degree of control to determine what the ideal behavior within a context can be. The aim is to maximize the performance of the machine in a way that helps it improve. Simple feedback that informs the machine about its progress (a reward signal) is required to help the machine learn its behavior.
An agent decides the best action based on the current state of its environment.
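A minimal sketch of an agent learning from reward feedback, using tabular Q-learning on a hypothetical five-cell corridor where only reaching the rightmost cell gives a reward. Q-learning is one reinforcement learning technique picked for illustration. The environment, the constants and the function names are assumptions, not something taken from the lecture.

```python
import random

N_STATES = 5                      # cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Environment: move one cell; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values Q(state, action) from interaction with the environment."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # cap on episode length
            action = rng.choice(ACTIONS)          # explore at random
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if state == N_STATES - 1:
                break
    return q


def greedy_policy(q):
    """Best known action for each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


print(greedy_policy(q_learning()))
```

No one tells the agent that "right" is correct; it only sees the reward, and the learned values make "right" the best action in every cell.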
Reinforcement learning vs. supervised learning and unsupervised learning
Reinforcement vs supervised learning
In supervised learning we have an external supervisor who has sufficient knowledge of the environment and shares that knowledge with the agent so that it can form a better understanding and complete the task. However, in problems where the agent can perform many different kinds of subtasks by itself to achieve the overall objective, the presence of a supervisor is unnecessary and impractical. In reinforcement learning there is instead a reward function that lets the system know whether it is progressing down the right path.
Reinforcement vs unsupervised learning
Reinforcement learning essentially has a mapping structure that guides the machine from input to output. Unsupervised learning has no such mapping: the machine focuses on the underlying task of locating patterns rather than on a mapping that progresses towards an end goal.
For example, if the task for the machine is to suggest a good news update to a user, a Reinforcement Learning algorithm will look to get regular feedback from the user in question, and would then through the feedback build a reputable knowledge graph of all news related articles that the person may like. On the contrary, an Unsupervised Learning algorithm will try looking at many other articles that the person has read, similar to this one, and suggest something that matches the user’s preferences.
https://crayondata.ai/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/
Math representation (Taxonomy of Machine Learning)
Divided into model-based and instance-based.
Instance-based: the machine learning technique simply compares new instances to the ones it was trained on.
So new data is compared to the training data and classified based on it.
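Instance-based learning can be sketched with a 1-nearest-neighbour classifier: nothing is fitted, the training examples are simply stored and a new instance gets the label of the closest one. The training points and labels below are made up for illustration.

```python
import math

# Stored training instances: (features, label).
TRAINING = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((5.0, 5.0), "dog")]


def predict(instance, training=TRAINING):
    """Return the label of the stored example closest to the new instance."""
    _, label = min(training, key=lambda example: math.dist(example[0], instance))
    return label


print(predict((4.5, 5.2)))
print(predict((0.9, 1.1)))
```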
Model-based: tries to find a general representation of the relationships in the dataset.
The algorithm chooses a hypothesis, a mathematical representation. Then it determines the parameters of this hypothesis based on the available data. The fitted hypothesis is then used to make estimations on new data.
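Model-based learning can be sketched with the hypothesis y = slope * x + intercept, whose two parameters are determined from the data by ordinary least squares. The choice of a straight line and the data are assumptions for this example.

```python
def fit_line(xs, ys):
    """Determine the parameters of the hypothesis y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted hypothesis on new data; the training data is no longer needed."""
    return slope * x + intercept


slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept, predict(10.0, slope, intercept))
```

Unlike the instance-based approach, only the two fitted parameters are kept after training.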
https://hermit-notebook.site/en/notebook/computer-sciences/artificial-intelligence/machine-learning/taxonomy-of-machine-learning/
Classification by Training behaviour (Taxonomy of Machine Learning)
Many ML techniques do not keep a memory of the entire dataset they were trained on; instead they make iterative adjustments based on the data they are provided with. As a result, many learning techniques cannot adjust an already trained representation to new data while keeping it consistent with its previous training (because there is no memory of the previous data).
batch learning: Learning techniques that require the entire data set for their training.
All the examples must be provided during the training phase. The “predictor” resulting from the training is then used in production and no more learning occurs. In this setting, if we obtain new examples, we need to train a new model from scratch on the complete enriched data set.
online learning: Learning techniques that can adjust an already trained representation to new data. Unlike batch learning, an online learning technique can be provided with new training examples progressively and changes its representation accordingly, even while being used in production. For many underlying representations, true online learning is not possible. However, depending on the formulation, we can often find a pseudo-online algorithm based on recursive algorithms. In this case, the new predictor depends on the current best predictor and all the previous examples (already learnt).
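The contrast between batch and online learning can be sketched with the simplest possible "predictor", the mean of the examples. The batch version needs the whole data set at once. The recursive version updates the current estimate with each new example and only remembers a count that summarizes the earlier ones. The class name `OnlineMean` and the data are assumptions for illustration.

```python
from statistics import mean


class OnlineMean:
    """Recursive update: new estimate = old estimate + correction from the new example."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


stream = [4.0, 8.0, 6.0, 2.0]

online = OnlineMean()
for x in stream:                 # examples arrive one at a time
    online.update(x)

batch = mean(stream)             # batch learning: needs the complete data set
print(online.value, batch)
```

Both approaches end with the same estimate here, but the online version never has to store or revisit earlier examples.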
https://hermit-notebook.site/en/notebook/computer-sciences/artificial-intelligence/machine-learning/taxonomy-of-machine-learning/
Classification by Task Type (Machine learning taxonomy by usage or goal)
- Regression
- Classification
- Clustering
- Association Rule learning
- Decision making
- Blind source separation
- Dimensionality reduction