Domain 4 - Methodology Selection Flashcards
Three categories of analytical models
Descriptive
Predictive
Prescriptive
Prescriptive Techniques
Optimization
Simulation-Optimization
Stochastic Optimization
Types of optimization models
Linear programming, Integer programming, Nonlinear programming, Mixed integer programming, Network optimization, Dynamic Programming, Metaheuristics
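A minimal sketch of the linear-programming idea, using a tiny hypothetical two-variable problem solved by enumerating constraint-boundary intersections (the textbook "corner point" method; real problems use a solver):

```python
from itertools import combinations

# Hypothetical problem: maximize 3x + 2y
# subject to  x + y <= 4,  x <= 2,  x >= 0,  y >= 0
# Each constraint is stored as ((a1, a2), b) meaning a1*x + a2*y <= b.
constraints = [((1, 1), 4), ((1, 0), 2), ((-1, 0), 0), ((0, -1), 0)]
objective = (3, 2)

def feasible(p):
    """Check that a point satisfies every constraint (with float slack)."""
    return all(a[0] * p[0] + a[1] * p[1] <= b + 1e-9 for a, b in constraints)

best_point, best_value = None, float("-inf")
# An optimum of a linear program lies at a vertex of the feasible region:
# intersect each pair of constraint boundaries, keep feasible points,
# and take the best objective value.
for (a1, b1), (a2, b2) in combinations(constraints, 2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:   # parallel boundaries, no unique intersection
        continue
    x = (b1 * a2[1] - b2 * a1[1]) / det
    y = (a1[0] * b2 - a2[0] * b1) / det
    if feasible((x, y)):
        value = objective[0] * x + objective[1] * y
        if value > best_value:
            best_point, best_value = (x, y), value

print(best_point, best_value)   # optimum at (2.0, 2.0) with value 10.0
```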
Prescriptive models are used to…
provide new ways to improve certain types of performance as agreed upon with the customer and documented in the business and analytics problem statements.
Ask: “what is the best action/outcome?”
Predictive models are used to…
predict future trends and possibilities and explain past relationships. Ask: “what could happen?”
Predictive techniques include…
Simulation, Regression, Statistical inference, Classification, Clustering, Artificial intelligence, Game theory
Simulation methods include…
(predictive analysis)
Discrete event simulation
Monte Carlo
Agent-based modeling
Regression methods include…
(predictive analysis)
Logistic
Linear
Step-wise
Statistical Inferences include…
(predictive analysis)
Confidence intervals
Hypothesis testing
Analysis of variance
Design of experiments
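A quick sketch of one statistical-inference tool, a 95% confidence interval for a mean, using the normal approximation (a t-multiplier would be slightly wider at this sample size; the data are hypothetical cycle-time measurements):

```python
import math
import statistics

# Hypothetical sample of 20 cycle-time measurements (minutes)
data = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.0,
        12.1, 11.9, 12.6, 12.2, 11.8, 12.0, 12.3, 12.1, 11.9, 12.2]

mean = statistics.mean(data)
sem = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
margin = 1.96 * sem                                  # ~95% normal-approximation margin
lower, upper = mean - margin, mean + margin
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```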
Descriptive analysis is used to…
answer the question "What happened?"
Descriptive models describe the problem situation for further analysis. They can be based on descriptive statistics that are conveyed through:
(1) charts and graphs such as histograms, scatter plots, etc., and/or
(2) numerical presentations such as the mean, median, mode, variance, and standard deviation of data distributions, and cross tabulations.
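The numerical presentations above map directly onto Python's standard `statistics` module; a sketch with hypothetical defect counts per batch:

```python
import statistics

# Hypothetical sample: defect counts per batch
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # central tendency
median = statistics.median(data)  # middle value
mode = statistics.mode(data)      # most frequent value
spread = statistics.pstdev(data)  # population standard deviation

print(mean, median, mode, spread)
```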
Primary factors that an analyst generally considers when selecting an appropriate methodology (7 things)
- Time to complete project
- Accuracy of the model needed
- Relevance of the methodology and scope of the project
- Accuracy of the data
- Data availability and readiness
- Staff and resource availability
- Methodology popularity (go with the best approach not the most popular)
Discrete event simulation (what is it and why use it)
A simulation methodology that is often used to understand bottlenecks in systems.
Handles cases that cannot be handled by queuing theory.
Often used to model multistage processes with variation in their arrival and service times, where shared resources perform multiple operations.
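A minimal sketch of the discrete-event idea for a single-server process, with hypothetical arrival and service times; each customer starts service only when both they and the server are available, and the waiting times reveal where a bottleneck forms:

```python
# Hypothetical single-server queue (times in minutes)
arrivals = [0, 2, 3, 9]    # arrival times
services = [4, 2, 3, 1]    # service durations

server_free_at = 0
waits = []
for arrive, service in zip(arrivals, services):
    start = max(arrive, server_free_at)   # wait if the server is still busy
    waits.append(start - arrive)
    server_free_at = start + service      # server is busy until service ends

print("waits:", waits)                          # [0, 2, 3, 0]
print("average wait:", sum(waits) / len(waits)) # 1.25
```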
Queuing model (what is it and why use it)
Designed to identify the most efficient pathway to a solution; e.g., at a bank it might identify the number of tellers needed to satisfy customers within a particular time frame, such as no more than 10 minutes waiting in the queue.
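The bank-teller question can be answered analytically with the Erlang C formula for an M/M/c queue; a sketch with hypothetical arrival and service rates:

```python
import math

def erlang_c_wait(arrival_rate, service_rate, servers):
    """Mean time spent waiting in queue (Wq) for an M/M/c queue, via Erlang C."""
    a = arrival_rate / service_rate          # offered load
    rho = a / servers
    if rho >= 1:
        return math.inf                      # unstable: the queue grows forever
    erlang_term = a**servers / math.factorial(servers) / (1 - rho)
    p_wait = erlang_term / (sum(a**k / math.factorial(k)
                                for k in range(servers)) + erlang_term)
    return p_wait / (servers * service_rate - arrival_rate)

# Hypothetical bank: 2 customers/min arrive; each teller serves 0.5 customers/min.
# Find the fewest tellers that keep the average queue wait under 10 minutes.
tellers = 1
while erlang_c_wait(2.0, 0.5, tellers) > 10:
    tellers += 1
print(tellers)
```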
Monte Carlo Simulation (what is it and why use it)
Used when a queuing model isn’t needed.
Used primarily to estimate the randomness of a dependent variable from the randomness of a set of independent variables. This is especially necessary when the distributions of the input variables are not necessarily normal and the relationship used to estimate the dependent variable is not simple (e.g., not purely additive).
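A sketch of that idea with a hypothetical multiplicative profit model: two uncertain inputs (price and demand) are sampled repeatedly, and the resulting distribution of the dependent variable (profit) is summarized:

```python
import random

random.seed(42)   # reproducible runs

def simulate_profit():
    """Hypothetical profit model: multiplicative, so not simply additive."""
    price = random.uniform(4, 6)       # uncertain unit price
    demand = random.uniform(80, 120)   # uncertain unit demand
    return price * demand - 400        # hypothetical fixed cost of 400

trials = [simulate_profit() for _ in range(20_000)]
mean_profit = sum(trials) / len(trials)
prob_loss = sum(t < 0 for t in trials) / len(trials)
print(f"mean profit: {mean_profit:.1f}, P(loss): {prob_loss:.3f}")
```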
Agent-based modeling (what is it and why use it)
A system modeled (simulated) as a collection of autonomous decision-making entities called agents that are used to discover emergent behavior that is hard to predict without simulating it.
System dynamics (what is it and why use it)
A simulation approach used to understand the interactions of a complex system over time.
Game theory (what is it and why use it)
Study of strategic decision-making processes through competition and collaboration
Probability theory (what is it and why use it)
The likelihood of a particular event occurring expressed as a percentage to make decisions under chosen risk or tolerance. Bayesian and conditional probabilities are widely used in analytics.
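A sketch of a conditional (Bayesian) probability update with hypothetical numbers: a defect test that is 95% sensitive and 90% specific, applied to a process with a 1% base defect rate:

```python
# All rates below are hypothetical, chosen to illustrate Bayes' theorem.
p_defect = 0.01
p_pos_given_defect = 0.95   # sensitivity
p_pos_given_ok = 0.10       # false-positive rate (1 - specificity)

# Total probability of a positive test, then Bayes' theorem for the posterior.
p_pos = p_pos_given_defect * p_defect + p_pos_given_ok * (1 - p_defect)
p_defect_given_pos = p_pos_given_defect * p_defect / p_pos
print(f"P(defect | positive test) = {p_defect_given_pos:.3f}")
```

Note how the low base rate dominates: even with a positive test, the posterior probability of a defect stays below 10%.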
Economic analysis (what is it and why use it)
Evaluation often used to guide the optimal allocation of scarce resources. Can include: internal rate of return (IRR), net present value (NPV), future value (FV), and payback period.
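NPV and payback period are short enough to compute directly; a sketch for a hypothetical project (1000 invested up front, 400 returned each year for four years, 10% discount rate):

```python
# Hypothetical project cash flows, indexed by year (year 0 = initial outlay)
cashflows = [-1000, 400, 400, 400, 400]
rate = 0.10   # discount rate

# Net present value: discount each cash flow back to today and sum.
npv = sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Payback period: time until cumulative (undiscounted) cash flow turns
# positive, interpolating within the recovery year.
cumulative, payback = 0.0, None
for t, cf in enumerate(cashflows):
    if payback is None and t > 0 and cumulative + cf >= 0:
        payback = t - 1 + (-cumulative) / cf
    cumulative += cf

print(f"NPV: {npv:.2f}, payback: {payback} years")   # NPV: 267.95, payback: 2.5 years
```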
Regression analysis (what is it and why use it)
A class of statistical methodologies used to relate dependent variables to independent variables and to understand the significance of the variables and their correlations with one another.
Linear regression (what is it and why use it)
Models the relationship between a dependent variable and one or more explanatory variables. The model itself is linear; however, nonlinear relationships can be explored by transforming the input data.
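A sketch of simple linear regression from the closed-form least-squares formulas, with hypothetical data chosen to fit exactly:

```python
# Hypothetical data lying exactly on y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Ordinary least squares: slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean

print(f"y = {slope}x + {intercept}")   # y = 2.0x + 1.0
```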
Stepwise regression (what is it and why use it)
A method of model building that successively adds or deletes variables based on their contribution to model performance.
Logistic regression (what is it and why use it)
Also called logit analysis; a regression analysis often used to predict the outcome of a categorical variable.
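A sketch of how a fitted logistic model turns inputs into probabilities and class labels; the coefficients and the churn interpretation here are hypothetical:

```python
import math

def predict(x, b0=-4.0, b1=1.0):
    """Logistic model with hypothetical coefficients b0, b1.
    The sigmoid maps any input to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# e.g., probability that a customer churns given x support tickets
p_low, p_high = predict(2), predict(6)
print(f"p(x=2) = {p_low:.3f}, p(x=6) = {p_high:.3f}")

labels = [int(predict(x) >= 0.5) for x in (2, 6)]   # threshold at 0.5
print(labels)   # [0, 1]
```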
What are some key artificial intelligence models
Artificial Neural Networks, Fuzzy Logic, Expert Systems
Value-stream mapping
Is a lean-management method for analyzing the current state and designing a future state for the series of events that take a product or service from its beginning through to the customer.
Requires more aggregate data compared to a discrete-event simulation model.
Pros and Cons of aggregating data at a lower level
The lower the level of aggregation, the more accurately and descriptively the model represents the real-life scenario;
however, it will be harder to validate and more prone to mistakes.
Pros and Cons of aggregating data at a higher level
usually provides faster results that are easier to understand but with less accuracy.
How much should you aggregate data?
The general rule of thumb is to model at the highest level of aggregation possible that will ensure a satisfactory level of accuracy within the time permitted.
What are some types of software tools?
Spreadsheets, optimization systems, statistical software, simulation systems, business intelligence systems, data management systems, data integration systems, and Big Data operating systems (like Hadoop)
What are the three portions that data should be divided into for model testing
Building data (training data): used to estimate the needed parameters.
Testing data: used to test (verify) the model's ability to provide accurate results.
Validating data: used to confirm that the model behaves closely to the physical behavior being modeled.
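The three portions can be produced by shuffling and slicing the dataset; a sketch using an illustrative 60/20/20 split on hypothetical record IDs:

```python
import random

random.seed(0)   # reproducible split

records = list(range(100))   # hypothetical dataset of 100 record IDs
random.shuffle(records)      # randomize order before slicing

# Illustrative 60/20/20 split into the three portions
n = len(records)
training = records[: int(0.6 * n)]
testing = records[int(0.6 * n): int(0.8 * n)]
validating = records[int(0.8 * n):]

print(len(training), len(testing), len(validating))   # 60 20 20
```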