Predictive analysis slides Flashcards
What are the branches listed that make up predictive analysis?
- Forecasting
- Regression
- Classification
- Text Analysis
- Decision Trees
- Machine Learning
Probability distribution
The total area under the curve is exactly equal to one, so the most we can get from a probability is one. The shape doesn't matter. The area under the curve over a range of values is the probability that the outcome falls in that range.
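As a rough check of the "area equals one, shape doesn't matter" idea, here is a minimal sketch that approximates the area under two differently shaped curves. The two densities (`uniform_pdf`, `triangular_pdf`) and the helper `area_under` are made up for illustration and are not from the slides.

```python
def area_under(pdf, lo, hi, steps=100_000):
    """Approximate the area under pdf between lo and hi with a midpoint sum."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

def uniform_pdf(x):
    # Hypothetical flat density on [0, 2]: height 0.5 so the total area is 1.
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def triangular_pdf(x):
    # Hypothetical triangle on [0, 2] that peaks at x = 1 with height 1.
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

# Total area: should be about 1 for both shapes.
total_uniform = area_under(uniform_pdf, 0.0, 2.0)
total_triangular = area_under(triangular_pdf, 0.0, 2.0)

# Probability that the value lands between 0.5 and 1.5 depends on the shape.
p_uniform = area_under(uniform_pdf, 0.5, 1.5)
p_triangular = area_under(triangular_pdf, 0.5, 1.5)

print(total_uniform, total_triangular)
print(p_uniform, p_triangular)
```

Both totals come out at about 1, while the probability of the same interval differs between the two shapes (about 0.5 for the flat curve and about 0.75 for the triangle).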
What is the normal distribution? (Histogram)
This is a strong distribution because it is infinite: the curve never comes down to the axis, so you are working toward infinity on both the positive and negative sides.
It is symmetric about the mean. You can calculate a standard deviation.
What is a standard deviation in a normal distribution? (Histogram)
The range of + or - 1 standard deviation around the mean holds 68.3% of the area, so a value will fall within + or - 1 standard deviation about 68.3% of the time.
The standard deviation is used when the distribution resembles a bell curve.
Used to see if a value is statistically significant or a part of expected variation
Remember 68 - 95 - 99.7 rule:
68% is within 1 sigma
95% is within 2 sigma
99.7% is within 3 sigma
Anything beyond 3 sigma is about 0.15% on either the + or - side
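The 68 - 95 - 99.7 rule can be checked with the standard library. This is a small sketch; the helper name `prob_within` is made up for illustration. It uses the fact that for a normal distribution the probability of landing within k standard deviations of the mean is erf(k / sqrt(2)).

```python
import math

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")

# What is left over beyond 3 sigma, split evenly between the two tails.
tail_each_side = (1 - prob_within(3)) / 2
print(f"beyond 3 sigma, one side: {tail_each_side:.5f}")
```

The printed values are 0.6827, 0.9545 and 0.9973. The one-sided tail is about 0.00135, so the 0.15% figure is the rounded value you get from (100 - 99.7) / 2.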
How can you interpret a standard deviation of 22 when your average is 36? (example)
With a mean of 36 and a standard deviation of 22, most of the data points likely fall within the range of 14 to 58 (mean ± 1 SD). This range covers approximately 68% of your data if it follows a normal distribution.
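The same example worked in code, as a sketch that assumes the data really are normally distributed. The value 90 and the helper `z_score` are hypothetical, added only to show how a single value can be compared with the expected variation.

```python
from statistics import NormalDist

mean, sd = 36, 22
dist = NormalDist(mu=mean, sigma=sd)

# Range covered by mean +/- 1 standard deviation.
low, high = mean - sd, mean + sd

# Share of a normal distribution that falls inside that range.
share = dist.cdf(high) - dist.cdf(low)
print(low, high, round(share, 4))

def z_score(value):
    """How many standard deviations a value sits from the mean."""
    return (value - mean) / sd

# Hypothetical observation: more than 2 SD away, so outside the ~95% band.
print(round(z_score(90), 2))
```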
How to approach a Monte Carlo
Use a model to relate input variables to an output
Definition: a model used to predict the probability of a variety of outcomes when random variables are present. It helps explain risk and uncertainty in prediction and forecasting models.
What is the equation for net income?
revenue - cost of goods sold (COGS) - operating expenses - interest - taxes
What is the formula for spending money?
amount earned - food - housing - education - car
What is the equation for happiness?
health + wealth + freedom + relationships
What is our definition of a Monte Carlo simulation? (as defined by my professor)
- Repeat trials where values for input variables are chosen from the set of possibilities given by the user (i.e. quantify uncertainty)
- Output is calculated and plotted (i.e. to assess risk)
Then perform the process over again and again like a roulette game
Decision maker can see the entire range of all possible outcomes
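The steps in this definition can be sketched as follows, using the spending money formula as the model. All dollar amounts in `possibilities` are invented for illustration, and picking each value with equal chance is an assumption, not something the slides specify.

```python
import random

def spending_money(earned, food, housing, education, car):
    # Model: amount earned - food - housing - education - car
    return earned - food - housing - education - car

# Hypothetical monthly possibilities "given by the user" for each input.
possibilities = {
    "earned": [2800, 3000, 3200],
    "food": [300, 400, 500],
    "housing": [1000, 1200],
    "education": [200, 300],
    "car": [150, 250, 600],
}

def run_trials(n_trials, seed=0):
    """Repeat trials: pick one value per input, compute the output, store it."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    outcomes = []
    for _ in range(n_trials):
        inputs = {name: rng.choice(values) for name, values in possibilities.items()}
        outcomes.append(spending_money(**inputs))
    return outcomes

outcomes = run_trials(10_000)

# The range of outcomes seen across all trials.
print(min(outcomes), max(outcomes))
```

With these made-up inputs the worst possible case is 200 and the best is 1550, so every simulated outcome lands between those two values.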
What is an equation for profit?
revenue - expenses
What is an equation for revenue?
items sold * price per item
Summary of Monte Carlo simulation
- Uses repeated trials
- Based on the probability of an event occurring
- Requires a ‘model’ relating inputs to an output: the value of each input is chosen based on its probability distribution (quantify uncertainty), and the probability distribution of the output is built by repeated trials with the inputs (assess risk)
- Decision maker has to determine their risk tolerance
- Advantage is to see all possible outcomes with their risk rather than a single average outcome
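To tie the summary together, here is a minimal sketch that uses profit = revenue - expenses as the model. The input distributions and every number in them are assumptions chosen for illustration, not values from the slides.

```python
import random
import statistics

def profit(items_sold, price_per_item, expenses):
    # Model relating inputs to an output: profit = revenue - expenses
    revenue = items_sold * price_per_item
    return revenue - expenses

def simulate_profit(n_trials=20_000, seed=42):
    """Draw each input from its (assumed) distribution and record the profit."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        items_sold = max(0.0, rng.gauss(1000, 200))    # assumed: normal demand
        price = rng.uniform(9.0, 11.0)                 # assumed: any price in range
        expenses = rng.triangular(8000, 12000, 9500)   # assumed: low, high, most likely
        results.append(profit(items_sold, price, expenses))
    return results

results = simulate_profit()

# A single average outcome hides the spread...
mean_profit = statistics.mean(results)

# ...while the full set of trials shows the risk.
p_loss = sum(r < 0 for r in results) / len(results)
cuts = statistics.quantiles(results, n=20)  # 19 cut points in 5% steps
p5, p95 = cuts[0], cuts[-1]

print(f"mean profit: {mean_profit:.0f}")
print(f"chance of a loss: {p_loss:.1%}")
print(f"5th to 95th percentile: {p5:.0f} to {p95:.0f}")
```

A decision maker would compare the chance of a loss and the 5th percentile against their own risk tolerance, rather than looking only at the mean.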