P1.F.4.3 Data Analytics - Analytic Tools Flashcards
Types of Data Analytics
P1.F.4.3 Data Analytics - Analytic Tools
- Descriptive: Who, what, when and where?
- Diagnostic: Why did something happen?
- Predictive: What will happen? (forecast)
- Prescriptive: What should happen? Offers the greatest value because it leads to decisions that can create value.
Predictive Analytic Techniques
- Find explanatory variables that correlate with the dependent variable. Example: calculate regression equations.
- Decide which data to include or exclude. Example: outliers (observations outside the norm).
- Derive a regression line supported by backtesting
- Validate the fit: split the data into two groups, one to derive the model and one to test it
- Compare with other models
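The derive-then-test split can be sketched in Python. This is a minimal illustration under stated assumptions. The data are hypothetical. The helper names (`fit_line`, `mse`, `holdout_validate`) are invented for this example. A 70/30 random split is an arbitrary choice. Mean squared error serves as the fit measure, and a naive "predict the training average" model serves as the comparison model.

```python
import random

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)

def holdout_validate(xs, ys, test_share=0.3, seed=42):
    """Derive the line on one group of observations and test it on the other."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # random but reproducible split
    n_test = max(1, int(len(idx) * test_share))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]
    line_mse = mse(test_y, [a + b * x for x in test_x])
    # Comparison model: a naive baseline that always predicts the training mean
    train_mean = sum(ys[i] for i in train_idx) / len(train_idx)
    baseline_mse = mse(test_y, [train_mean] * len(test_y))
    return (a, b), line_mse, baseline_mse

# Hypothetical data: advertising spend (x) vs. sales (y)
spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sales = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.1, 20.9]
(a, b), line_mse, baseline_mse = holdout_validate(spend, sales)
print(f"y = {a:.2f} + {b:.2f}x, test MSE {line_mse:.3f} vs. baseline {baseline_mse:.3f}")
```

The regression line is judged only on observations it never saw. A lower test error than the baseline suggests that the fit generalizes.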
Exploratory Data Analysis
- An exercise undertaken without an existing hypothesis regarding the data.
- The goal is to find new and useful relationships among variables
Limitations of Data Analytics
- Doesn’t explain causation or address motives
- Lacks qualitative measures
- May encourage a transactional focus instead of a relationship focus
- Doesn’t lead to perfect decisions
- Confirmation bias must be overcome
Data Analytic Model Challenges
- Models will never reconcile exactly with reality
- Choosing the right level of detail
- Adding variables increases cost and complexity
- Some randomness always seems to be present
- Choosing and sampling the population
Data Analytic Model Types
- Clustering: defines variables and displays them visually so that groups of similar observations stand out
- Classification: puts observations into categories
- Regression: study of relationships among variables
- Multiple regression: more than one explanatory variable
Sensitivity Analysis
- Refers to the degree to which changes in input variables affect output.
- Shows which variables are critical and how to measure them.
- Demonstrates overall quality and data sufficiency
- Models should be built to accommodate sensitivity analysis
- End results demonstrate model trustworthiness
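One-at-a-time sensitivity analysis can be sketched in Python. Each input moves down and up while the others stay at their base values, and the swing in the output is recorded. This sketch rests on several assumptions. The profit model, its base-case figures, and the ±10% step are all hypothetical. The names `profit` and `one_at_a_time_sensitivity` are illustrative only.

```python
def profit(inputs):
    """Hypothetical model: profit = units * (price - unit_cost) - fixed_costs."""
    return inputs["units"] * (inputs["price"] - inputs["unit_cost"]) - inputs["fixed_costs"]

def one_at_a_time_sensitivity(model, base, change=0.10):
    """Move each input down and up by `change` (others held at base) and
    record how far the output moves from the base-case output."""
    base_output = model(base)
    swings = {}
    for name, value in base.items():
        low = model({**base, name: value * (1 - change)})
        high = model({**base, name: value * (1 + change)})
        swings[name] = (low - base_output, high - base_output)
    return base_output, swings

base_case = {"units": 1_000, "price": 50.0, "unit_cost": 30.0, "fixed_costs": 15_000.0}
base_profit, swings = one_at_a_time_sensitivity(profit, base_case)
# Rank inputs by total output swing: the largest swings mark the critical variables
ranking = sorted(swings, key=lambda k: abs(swings[k][1] - swings[k][0]), reverse=True)
for name in ranking:
    down, up = swings[name]
    print(f"{name:12s} -10%: {down:+9.0f}   +10%: {up:+9.0f}")
```

With these figures, price dominates. A 10% price change moves profit by 5,000 in either direction. Price is therefore the variable that most needs to be measured and controlled carefully.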
Sensitivity Analysis Benefits & Limitations
Benefits
- Demonstrates model veracity
- Spotlights important variables to control
Limitations
- Only shows what to discard
- Overhead cost that doesn’t add to the value chain
Simulation Models
- Systematic way of dealing with uncertainty
- Repeatedly test model with randomized inputs
- Demonstrates range and probability of outputs
- Wide range of applications
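A Monte Carlo simulation tests the same model many times with randomized inputs and summarizes the range and probability of outcomes. Below is a minimal Python sketch. The input distributions are pure assumptions chosen for illustration: uniform demand, normally distributed price, and triangular unit cost. So are the fixed cost of 15,000 and the function name `simulate_profit`.

```python
import random
import statistics

def simulate_profit(trials=10_000, seed=1):
    """Run the hypothetical profit model many times with randomized inputs."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        units = rng.uniform(800, 1_200)         # assumed demand range
        price = rng.normalvariate(50, 3)        # assumed price: mean 50, std dev 3
        unit_cost = rng.triangular(27, 35, 30)  # assumed cost: low, high, most likely
        outcomes.append(units * (price - unit_cost) - 15_000)
    outcomes.sort()
    return {
        "mean": statistics.fmean(outcomes),
        "p5": outcomes[int(0.05 * trials)],     # rough worst-case marker
        "p95": outcomes[int(0.95 * trials)],    # rough best-case marker
        "prob_loss": sum(o < 0 for o in outcomes) / trials,
    }

results = simulate_profit()
print({k: round(v, 2) for k, v in results.items()})
```

The output is a distribution rather than a single number. The 5th and 95th percentiles bracket worst and best cases, the mean approximates the most likely result, and `prob_loss` estimates the chance of a loss. All of these figures depend entirely on the quality of the assumed inputs.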
Simulation Model Benefits & Limitations
Benefits
- Supports decision-making in the face of uncertainty
- Helps replace intuition, prejudice and flat-out guessing
- Creates confidence around best-case, worst-case and most likely scenarios
Limitations
- Can’t predict human responses or behaviors to changes
- Can’t model causal links that affect a particular result in the real world
- Accuracy depends on input quality
What-if & Goal-Seeking
- Both are tools for running scenarios to understand possibilities
- Help prepare for best- and worst-case outcomes
What-if
- Starts with changes in input
- What will happen if we change this?
Goal-seeking
- Starts with output goal
- If we want to change the result, what needs to happen?
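The contrast between the two directions can be sketched in Python. What-if analysis simply evaluates the model with a changed input. Goal-seeking works backward from a target output. This sketch makes several assumptions: a hypothetical linear profit model, a search range of 0 to 10,000 units, and bisection as the search method. Spreadsheet goal-seek tools may use a different algorithm. `goal_seek` is a hypothetical helper name.

```python
def profit(units, price=50.0, unit_cost=30.0, fixed_costs=15_000.0):
    """Hypothetical model: profit = units * (price - unit_cost) - fixed_costs."""
    return units * (price - unit_cost) - fixed_costs

def goal_seek(f, target, low, high, tol=1e-9, max_iter=200):
    """Find x in [low, high] where f(x) equals target, using bisection.
    Assumes f is continuous and the target lies between f(low) and f(high)."""
    g_low = f(low) - target
    if g_low * (f(high) - target) > 0:
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        g_mid = f(mid) - target
        if abs(g_mid) <= tol or (high - low) / 2 <= tol:
            return mid
        if g_low * g_mid < 0:   # sign change in the lower half
            high = mid
        else:                   # otherwise the target is in the upper half
            low, g_low = mid, g_mid
    return (low + high) / 2

# What-if: start from a changed input and see the result
what_if_profit = profit(1_000, price=55.0)           # 10000.0
# Goal-seeking: start from the desired result and solve for the input
units_needed = goal_seek(profit, 10_000, 0, 10_000)  # about 1250 units
break_even = goal_seek(profit, 0, 0, 10_000)         # about 750 units
print(what_if_profit, round(units_needed, 2), round(break_even, 2))
```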
Regression - Simple & Multiple
- Predicts a dependent variable
- from one or more independent (explanatory) variables
- The equation contains constants (the intercept and the variable coefficients)
Simple: one explanatory
Multiple: more than one explanatory
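Multiple regression can be sketched in plain Python by solving the least squares normal equations, (XᵀX)β = Xᵀy. This is an illustrative sketch, not a production method. The function name `multiple_regression` and the data are hypothetical; the data were constructed so that y = 1 + 2·x1 + 3·x2 exactly. The solver is basic Gaussian elimination. In practice a numerical library such as NumPy's `numpy.linalg.lstsq` would normally be used instead.

```python
def multiple_regression(rows, ys):
    """Least squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.
    rows: one tuple of explanatory-variable values per observation."""
    X = [[1.0, *r] for r in rows]  # leading 1.0 column produces the constant b0
    n, k = len(X), len(X[0])
    A = [[sum(X[m][i] * X[m][j] for m in range(n)) for j in range(k)] for i in range(k)]
    rhs = [sum(X[m][i] * ys[m] for m in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta

# Hypothetical observations of two explanatory variables (x1, x2)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]
beta = multiple_regression(rows, ys)
print([round(c, 6) for c in beta])  # approximately [1.0, 2.0, 3.0]
```

With a single explanatory variable, the same routine reduces to simple regression.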
Least Squares Line
The line that minimizes the sum of the squared vertical distances between itself and the data points.
Least Squares Line Equation
Observed value = Fitted value + Residual
- Fitted value: the value on the line for a given x (the vertical distance from the x-axis up to the line)
- Observed value: the actual data point
- Residual: observed value minus fitted value (the vertical gap between the point and the line)
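The identity observed = fitted + residual can be checked with a short Python sketch. This is a toy example under stated assumptions. The line y = 1 + 2x and the three points are hypothetical, and `decompose` is an illustrative helper name.

```python
def decompose(points, a, b):
    """Split each observed y into its fitted value (a + b*x) and residual (observed - fitted)."""
    rows = []
    for x, observed in points:
        fitted = a + b * x
        residual = observed - fitted
        rows.append((x, observed, fitted, residual))
    return rows

points = [(1, 3.5), (2, 4.5), (3, 7.0)]
rows = decompose(points, a=1.0, b=2.0)
for x, obs, fit, res in rows:
    print(f"x={x}: observed {obs} = fitted {fit} + residual {res}")
```

Points above the line have positive residuals, points below it have negative residuals, and a point on the line has a residual of zero.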
Regression Equation Calculations
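y = a + bx
- y = the dependent variable; a = the optimal y-intercept; b = the optimal slope (variable coefficient); x = the independent (explanatory) variable
- Slope: $b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
- b (numerator): for each observation, take the difference between the x value and the mean x value, and between the y value and the mean y value; multiply the x difference by the y difference, then add the products
- b (denominator): square each x difference, then add the squares
- Intercept: the least squares line passes through the point of means, so substitute the mean y value and mean x value into y = a + bx: $a = \bar{y} - b\bar{x}$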
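The calculation steps map directly onto a few lines of Python. The five data points are a hypothetical example, and `least_squares` is an illustrative name rather than any library function.

```python
def least_squares(xs, ys):
    """Return (a, b) for y = a + bx using the deviation-from-mean formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    x_diff = [x - x_bar for x in xs]
    y_diff = [y - y_bar for y in ys]
    numerator = sum(dx * dy for dx, dy in zip(x_diff, y_diff))  # multiply, then add
    denominator = sum(dx ** 2 for dx in x_diff)                  # square, then add
    b = numerator / denominator
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
```

Worked by hand: the means are 3 and 4, the numerator is 6, and the denominator is 10. That gives b = 0.6 and a = 4 − 0.6 × 3 = 2.2.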
Correlation Coefficient: r
- Correlation between two variables
- Ranges from -1 to 1
- -1 implies perfect inverse (negative) correlation: as one variable increases, the other decreases in exact linear proportion.
- 0 implies no linear correlation
- 1 implies perfect positive correlation: both variables increase together in exact linear proportion
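A short Python sketch computes r, assuming the standard Pearson formula: r equals the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The sample lists are hypothetical, and `correlation` is an illustrative name.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(round(correlation([1, 2, 3], [2, 4, 6]), 4))              # 1.0
print(round(correlation([1, 2, 3], [6, 4, 2]), 4))              # -1.0
print(round(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

The first two pairs lie exactly on a line, so r reaches the extremes of its range. The third set trends upward with scatter, so r is positive but below 1.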
Coefficient of Determination: R²
- Measures the fit between the least squares line and the observed data
- % of variance in the dependent variable explained by the least squares line
- Ranges from 0% (no explanation) to 100% (perfect explanation)
- Can’t be meaningfully compared outside its context
- Can be applied to both simple and multiple regression
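R² can be computed as 1 minus the ratio of unexplained (residual) variation to total variation in the dependent variable. This is a minimal Python sketch on hypothetical data. The fitted line 2.2 + 0.6x is the least squares line for these five points, and `r_squared` is an illustrative name. The same function works for multiple regression if the fitted values come from a multiple regression model.

```python
def r_squared(observed, fitted):
    """Share of the variance in the dependent variable explained by the fitted values."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)                  # total variation
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained variation
    return 1 - ss_residual / ss_total

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * x for x in xs]  # least squares line for this data
print(f"R^2 = {r_squared(ys, fitted):.0%}")  # R^2 = 60%
```

For simple regression, R² equals the square of the correlation coefficient r. Here r is about 0.7746, and its square is 0.60.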
Time Series Analysis
- Values of the same variable observed over time
- Used in forecasting
- Trend: generally, are things increasing or decreasing over time, and if so, by how much?
- Cyclical: how things change over long-term cycles (more than one year)
- Seasonal: how things change over a one year cycle
- Irregular: seemingly random and unpredictable; does not repeat in any particular pattern
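A moving average is one simple way to separate the trend from seasonal and irregular swings. Averaging over one full seasonal cycle, such as four quarters, cancels most of the seasonal pattern. This is a minimal Python sketch, not a full decomposition method. The quarterly sales figures are hypothetical, with a fourth-quarter spike each year, and `moving_average` is an illustrative name.

```python
def moving_average(values, window):
    """Trailing moving average over `window` periods."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window one period forward
        out.append(running / window)
    return out

quarterly_sales = [100, 90, 95, 130, 110, 100, 105, 140]  # two hypothetical years
trend = moving_average(quarterly_sales, window=4)
print(trend)  # [103.75, 106.25, 108.75, 111.25, 113.75]
```

The raw series jumps every fourth quarter. The smoothed series rises by a steady 2.5 per quarter, which exposes the underlying trend.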
Time Series Analysis Benefits & Limitations
Benefits
- Assists with understanding decisions
- Applications almost limitless
Limitations
- Only shows correlation: doesn’t help identify root causes
- Echo chamber effect: more reliable within the range of observed data than outside it
- Reliance on lagging indicators (historical observations)
- Random noise can distort picture