Principles of Statistics Flashcards
What does analysing data with statistics do?
- Framework to uncover hidden patterns 🏗️
- Objective Perspective 🎯
- Test Hypotheses🧪
- Confident Decisions: Rely on Data > Assumptions e.g. lead time changes💪
How have you applied statistical testing when analysing data?
- Descriptive stats: mean, median etc.
- Inferential stats: hypothesis testing (Pearson’s correlation coefficient) and regression
- Assess Model: RMSE, MAE
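A minimal sketch of the two model-assessment metrics named above, using only the Python standard library. The function names and the numbers are made up for illustration; they are not taken from the original analysis.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: like MAE, but large errors are penalised more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical values, for illustration only
actual = [100, 120, 140, 160]
predicted = [110, 115, 150, 150]

print("MAE :", mae(actual, predicted))
print("RMSE:", round(rmse(actual, predicted), 2))
```

Both metrics are in the same units as the outcome, and RMSE is never smaller than MAE on the same data.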
What is Hypothesis Testing?
- Inferential stats method 📈
- Assess a hypothesis about a larger population based on a sample 👥 🎛️
- 2 competing hypotheses: null (no correlation in the population) and alternative (a correlation exists)❌🔀
- See if the observed result could plausibly be due to chance alone🔭🍀
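One way to picture "could this be due to chance?" is a permutation test: shuffle one variable many times and count how often a shuffled correlation is at least as strong as the observed one. This is a sketch of that idea, not necessarily the test used in the original analysis. The data, the function names, the number of shuffles and the seed are all assumptions.

```python
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)

def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # +1 counts the observed arrangement itself, so p is never exactly 0
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical sample, for illustration only
lead_time = [5, 10, 15, 20, 25, 30, 35, 40]
stock_holding = [50, 62, 58, 75, 80, 78, 95, 99]

p_value = permutation_p_value(lead_time, stock_holding)
print("r =", round(pearson_r(lead_time, stock_holding), 3))
print("p =", round(p_value, 4))
```

If very few shuffles match the observed strength, chance alone is an unlikely explanation and the null hypothesis is rejected.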
What is Inferential Statistics?
- Field of Statistics🌾
- Analytical tools to draw conclusions about a whole population 🌍 based on a sample 🔬
What is Pearson’s correlation test?
Hypothesis test that assesses whether a linear relationship exists between 2 variables (e.g. lead time and stock holding)
What is a t test?
Hypothesis test that compares the means of 2 groups
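A rough sketch of the calculation behind a two-sample t test, using Welch's version (which does not assume the two groups have equal variances). The groups and the function name are made up for illustration. Turning the t statistic into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would handle.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group's mean
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical groups, for illustration only
group_a = [10, 12, 11, 13, 14]
group_b = [15, 17, 16, 18, 19]

t_stat, dof = welch_t(group_a, group_b)
print("t  =", round(t_stat, 3))
print("df =", round(dof, 1))
```

The further the t statistic is from 0, the stronger the evidence that the two group means differ.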
What was the significance level that the P value was tested against?
5% significance level (p < 0.05)
What is a p-value?
- Statistical Measure 📏
- DETERMINES if the results are statistically significant⭐⭐⭐⭐⭐⭐⭐
- A low p value (< 0.05) = reject the null hypothesis and conclude the alternative: there is an effect/relationship/difference
- A high p value (> 0.05) = fail to reject the null hypothesis: not enough evidence of an effect/relationship/difference between 2 variables
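The decision rule can be written as a tiny function. This is only a sketch: the name `decide` and the example p-values are made up, and 0.05 is the significance level used in these cards. Note that a high p-value means "fail to reject" the null, not proof that the null is true.

```python
ALPHA = 0.05  # 5% significance level

def decide(p_value, alpha=ALPHA):
    """Apply the standard decision rule for a hypothesis test."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical p-values, for illustration only
print(decide(0.01))
print(decide(0.20))
```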
Interpret the P value results of the Pearson’s Correlation Test
- P Value < 0.05
- Reject Null
- Conclude Alternative
- WAS a significant relationship between Lead Time & Stock Holding
Interpret the correlation coefficient of the Pearson’s correlation test
- Strength of relationship
- -1 to 1
- Positive value, closer to 0 than to 1
- Weak Positive relationship
- Could infer from the sample: a relationship did exist between lead time and stock holding in the Frozen Warehouse (Inferential Stats example)
Have you encountered a situation where a stats method did not yield the desired results? How did you rectify it?
- Regression = high error & poor fit
- Due to small sample size, DQ issues or weak relationship
- Frozen suppliers did not adhere to lead times
- Summer build stock (irrespective of lead time)
- Customer demand, supplier shortages, warehouse space (not considered by model)
- External factors: historical data may be better
- Time series: identify patterns
What is linear regression?
- Stats method
- Predicts an outcome (dependent variable) from another variable (independent variable)
- By fitting a line of best fit to the data
- The equation of the line allows the model to make predictions
- E.g. if the lead time was 30 days, find 30 on the x axis, go up to the line, and read off the corresponding y value (stock holding) as the prediction
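The line-of-best-fit idea can be sketched with ordinary least squares in plain Python. The data are made up and deliberately lie exactly on a straight line, so the arithmetic is easy to follow; real data would scatter around the line. `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 100 + 2x, for illustration only
lead_time = [10, 20, 40, 50]          # independent variable (x axis), days
stock_holding = [120, 140, 180, 200]  # dependent variable (y axis)

slope, intercept = fit_line(lead_time, stock_holding)

# Prediction for a 30-day lead time: plug x = 30 into the line's equation
prediction = intercept + slope * 30
print("slope =", slope, "intercept =", intercept)
print("predicted stock holding at 30 days =", prediction)
```

Plugging x = 30 into the equation is the same as finding 30 on the x axis and reading the y value off the line.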
When did you use linear regression?
- To predict stock holding from lead time
- Lead Time as the independent variable (x axis)
- Stock Holding as the dependent variable (y axis)
What was the independent variable in your regression model?
Lead time on the x axis
What was the dependent variable in your regression model?
Stock holding on the y axis