Principles of Statistics Flashcards

1
Q

What does analysing data with statistics do?

A
  • Framework to uncover hidden patterns 🏗️
  • Objective Perspective 🎯
  • Test Hypotheses🧪
  • Confident Decisions: Rely on Data > Assumptions e.g. lead time changes💪
2
Q

How have you applied statistical testing when analysing data?

A
  1. Descriptive stats: mean, median etc.
  2. Inferential stats: Hypothesis Testing, e.g. Pearson’s correlation coefficient or Regression
  3. Assess Model: RMSE, MAE
3
Q

What is Hypothesis Testing?

A
  • Inferential stats method 📈
  • Assess a hypothesis about a larger population based on a sample 👥 🎛️
  • 2 Competing hypotheses - null (no sig correlation) and alternative (a sig correlation)❌🔀
  • See if observed data is due to chance🔭🍀
4
Q

What is Inferential Statistics?

A
  • Field of Statistics🌾
  • Analytical tools to draw conclusions about a whole population 🌍 based on a sample 🔬
5
Q

What is Pearson’s correlation test?

A

Type of hypothesis testing that determines if a relationship exists between 2 variables (lead time and stock holding)
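
A minimal sketch of how the coefficient behind the test is computed, in plain Python. The lead time and stock holding figures are made-up values for illustration only, not the data from the analysis, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical values, for illustration only
lead_time_days = [7, 14, 21, 28, 35, 42]
stock_holding = [120, 150, 130, 180, 160, 200]

r = pearson_r(lead_time_days, stock_holding)
print(f"r = {r:.3f}")  # always between -1 and 1
```

The p-value for the test is then derived from r and the sample size; a statistics library would normally report both together.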

6
Q

What is a t test?

A

Hypothesis test that compares the means of 2 groups
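
As a rough sketch of what "comparing the means of 2 groups" involves, the snippet below computes Welch's t statistic (one common variant of the t test, chosen here as an assumption) in plain Python. The two groups are made-up numbers and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    # Standard error of the difference between the two sample means
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical measurements for two groups
group_a = [12.1, 11.8, 12.5, 12.0, 11.9]
group_b = [10.2, 10.9, 10.5, 10.7, 10.4]

t_stat = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}")  # a large |t| means the gap is large relative to the noise
```

Turning the t statistic into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_ind` handles.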

7
Q

What was the significance level that the P value was tested against?

A

5% significance level (p < 0.05)

8
Q

What is a p-value?

A
  • Statistical Measure 📏
  • DETERMINES if the results are statistically significant⭐⭐⭐⭐⭐⭐⭐
  • A low p value < 5% = reject the null hypothesis and conclude the alternative that there is an effect/relationship/difference
  • A high p value > 5% = fail to reject the null hypothesis: not enough evidence of an effect/relationship/difference between 2 variables
9
Q

Interpret the P value results of the Pearson’s Correlation Test

A
  • P Value < 0.05
  • Reject Null
  • Conclude Alternative
  • WAS a significant relationship between Lead Time & Stock Holding
10
Q

Interpret the correlation coefficient of the Pearson’s correlation test

A
  • Strength of relationship
  • -1 to 1
  • Positive Value, far from 1
  • Weak Positive relationship
  • Could infer from the sample: a relationship did exist between lead time and stock holding in the Frozen Warehouse (Inferential Stats example)
11
Q

Have you encountered a situation where a stats method did not yield the desired results? How did you rectify it?

A
  • Regression = high error & poor fit
  • Due to small sample size, DQ issues or weak relationship
  • Frozen Suppliers not adhering to lead times
  • Summer build stock (irrespective of lead time)
  • Customer demand, supplier shortages, warehouse space (not considered by model)
  • External factors: historical data may be better
  • Time series: identify patterns
12
Q

What is linear regression?

A
  • Stats method
  • Predicts an outcome based on another
  • By fitting a line of best fit to the data
  • The equation of the line allows the model to make predictions
  • E.g. if the lead time was 30 days (x axis), you could go up from 30 on the x axis to the line and read off the corresponding y value (stock holding) as the prediction
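
A minimal sketch of fitting a line of best fit and using its equation to predict, in plain Python. The data points are made up for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values, for illustration only
lead_time_days = [10, 20, 30, 40, 50]   # independent variable (x axis)
stock_holding = [110, 135, 160, 170, 205]  # dependent variable (y axis)

slope, intercept = fit_line(lead_time_days, stock_holding)

# Prediction for a lead time of 30 days: plug x = 30 into the line equation
predicted = slope * 30 + intercept
print(f"predicted stock holding at 30 days: {predicted:.1f}")
```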
13
Q

When did you use linear regression?

A
  • To predict stock holding from lead time
  • Lead Time as the independent variable (x axis)
  • Stock Holding as the dependent variable (y axis)
14
Q

What was the independent variable in your regression model?

A

Lead time on the x axis

15
Q

What was the dependent variable on your regression model?

A

Stock holding on the y axis

16
Q

What evaluation metrics did you use to determine the accuracy and effectiveness of your models?

A
  • ROOT MEAN SQUARED ERROR- measures the difference between actual and predicted values (lower value is better)
  • MEAN ABSOLUTE ERROR- showed how much error was in the predictions too (lower value is better)
  • R SQUARED - most common - shows how much data variation is explained by the model. Ranges 0 - 1. 1 = 100% of the variation is explained by the model. Closer to 1 = better fit💯✅
  • Plotted predicted stock/lead time - not a straight 45° line, not performing well
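
The three metrics can be sketched in a few lines of plain Python. The actual and predicted values are made up for illustration, and the function names are hypothetical helpers rather than any particular library's API.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions explain."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_a) ** 2 for a in actual)                # total
    return 1 - ss_res / ss_tot

# Hypothetical values, for illustration only
actual = [100, 120, 140, 160]
predicted = [110, 115, 150, 155]

print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAE  = {mae(actual, predicted):.2f}")
print(f"R^2  = {r_squared(actual, predicted):.3f}")
```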
17
Q

What is R SQUARED and interpret your results

A
  • Number that shows how well the line (LR Model) fits the data🔢
  • Tells me how much of a difference in stock holding can be explained by lead time⏱️
  • My R-squared was no bigger than 0.05, which means only 5% of the differences in stock holding can be explained by lead time
  • Additionally, Training and Test scores were both low, which could suggest the model was too simple to capture the patterns in the data (underfitting)🤺🧪⚪️
18
Q

What does over fitting mean?

A
  • Model is too complex
  • Fits the training data too well
  • Cannot handle data that is different from that
19
Q

What is a limitation of R squared?

A

Sensitive to outliers, and my data had a few that could have influenced the score

20
Q

Why did you choose those error metrics?

A

MAE and RMSE together, as RMSE is more sensitive to outliers and using both can show more insights. E.g. RMSE much bigger than MAE = large errors (outliers) exist that could throw the model off
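
A small sketch of why the pair is informative: two made-up sets of prediction errors with the same MAE, where only the one containing an outlier has a much larger RMSE. The helper names are hypothetical.

```python
import math

def mae_from_errors(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse_from_errors(errors):
    # Squaring gives large errors a disproportionate weight
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Hypothetical prediction errors, for illustration only
even_errors = [5, 5, 5, 5]      # all errors the same size
outlier_errors = [1, 1, 1, 17]  # same total error, one large miss

for name, errors in [("even", even_errors), ("outlier", outlier_errors)]:
    print(name, f"MAE={mae_from_errors(errors):.2f}", f"RMSE={rmse_from_errors(errors):.2f}")
```

RMSE can never be smaller than MAE, so it is the size of the gap between them that hints at outliers.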

21
Q

What is a time series forecast?

A

Type of predictive analysis that predicts future values based on historical data collected at specific intervals. It analyses past trends, patterns and seasonal variations to make these predictions.

22
Q

What tool did you use for the time series forecast model and why?

A

Python

  1. flexibility: exponential smoothing levels
  2. experiment with different models
23
Q

How do you know if your forecast is accurate?

A

Root Mean Sq Error - margin of error between actual and predicted values

Mean Absolute Error - also measures error between actual/predicted values

Also use confidence intervals in the chart to see how confident the model is

24
Q

What do the 4 plots on the decomposition plot show?

A

Observed - actual data
Trend - the long term upward or downward direction of the data
Seasonality - repeating patterns within specific time periods
Residual (noise) - random fluctuations that cannot be explained by trend or seasonality

25
Q

What does decomposition plot do?

A

Breaks down the data into underlying components
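
A rough sketch of a classical additive decomposition in plain Python, to show where the trend, seasonal and residual components come from. It is a simplified illustration (odd seasonal period only, made-up series), not necessarily the routine used for the plot; libraries such as statsmodels offer `seasonal_decompose` for real data.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (odd `period` only)."""
    half = period // 2
    n = len(series)
    # Trend: centred moving average over one full cycle (undefined at the edges)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    # Seasonal: average detrended value at each position in the cycle
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # recentre so the seasonal part sums to zero
    seasonal = [means[i % period] - centre for i in range(n)]
    # Residual: what is left after removing trend and seasonality
    residual = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual

# Hypothetical series: steady upward trend plus a repeating 3-step pattern
pattern = [2, -1, -1]
series = [10 + i + pattern[i % 3] for i in range(12)]

trend, seasonal, residual = decompose_additive(series, period=3)
print(trend[1:4], seasonal[:3], residual[1:4])
```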

26
Q

What does my decomposition plot show specifically?

A

Observed: Peaks and troughs show fluctuation in stock levels over time

Trend: downward trend up to 2020, steep upwards until 2021 - levels off a little

Seasonal: annual pattern - stock levels rising in the second quarter and gradually decreasing throughout the year - working capital management at year-end and ice cream stock building

Residual: random fluctuations do exist that could be due to supplier issues, manual adjustments to orders, changes in space allocated to shelves in shops

27
Q

What time series forecast model did you use to forecast?

A
  • Naive - assumes values would be equal to the most recently observed data value (establishes baseline for comparison)
  • Holt Linear - trend - identifies underlying trend in the data to make future predictions (does not capture autocorrelation, which is the relationship between a variable’s current and past values)
  • Holt Winters - trend and seasonality
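
A minimal sketch of the first two models in plain Python, to show how the naive baseline differs from a trend-aware forecast. The series, the smoothing values (`alpha`, `beta`) and the function names are assumptions for illustration; Holt Winters additionally adds a seasonal term and is not shown here.

```python
def naive_forecast(series, horizon):
    """Repeat the most recently observed value for every future step."""
    return [series[-1]] * horizon

def holt_linear_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Holt's linear method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for y in series[1:]:
        prev_level = level
        # New level blends the observation with the previous one-step forecast
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # New trend blends the latest change in level with the old trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical weekly stock figures with a steady upward trend
series = [10, 12, 14, 16, 18]

naive = naive_forecast(series, horizon=3)
holt = holt_linear_forecast(series, horizon=3)
print("naive:", naive)
print("holt: ", holt)
```

On a trending series the naive forecast stays flat while Holt's method continues the trend, which is why the naive model is useful mainly as a baseline.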
28
Q

What parameters did you use to customise the holt winters model?

A
  • Seasonal periods: 52 for weekly
  • Trend/Seasonality: add
  • Smoothing level: how much weight is given to the most recent observation versus older ones when forecasting future values (0.10)
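
To illustrate what the smoothing level does, the sketch below uses simple exponential smoothing (a simpler relative of Holt Winters, with no trend or seasonal term) in plain Python. The series and the `simple_exp_smoothing` helper are made up for illustration.

```python
def simple_exp_smoothing(series, alpha):
    """Smoothed level after each observation, for a given smoothing level."""
    level = series[0]
    fitted = [level]
    for y in series[1:]:
        # alpha weights the newest observation, (1 - alpha) the previous level
        level = alpha * y + (1 - alpha) * level
        fitted.append(level)
    return fitted

# Hypothetical series with a sudden step up from 100 to 200
series = [100, 100, 100, 200, 200, 200]

slow = simple_exp_smoothing(series, alpha=0.10)  # leans on history, reacts slowly
fast = simple_exp_smoothing(series, alpha=0.90)  # leans on the latest value

print("alpha=0.10:", [round(v, 1) for v in slow])
print("alpha=0.90:", [round(v, 1) for v in fast])
```

A low value such as 0.10 gives a smoother forecast that is slower to follow sudden changes.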
29
Q

What was the outcome of the time series forecast?

A
  • Holt winters 🏆
  • Forecast: higher stock than previous years
  • BUT below 4.3 million unit maximum
  • Winter Stock Build (not considered by model)
  • 5 - 7% error; < 5% would be better, but still better than “finger in the air”
30
Q

What could your stakeholders use the time series forecast model for?

A

Stock prediction used to:

  • 🎯 Optimise Targets and KPIs
  • ⚠️ Foresee potential issues and correct them
  • ⚡ Maintain efficient warehouse operations
31
Q

Explain the Linear Regression Equation

A

y = mx + c

y = predicted stock holding
m = gradient that measures the slope of the line
x = value we change i.e. lead time days
c = intercept: where the line intersects the y axis when x = 0

32
Q

What is Descriptive Statistics?

A
  • Summarise & Describe a dataset that’s a representation of a population
  • Overview of characteristics of the data e.g. central tendency (mean, mode, median) or variability (SD, min, max)
  • Understanding of the data - foundation for inferential statistics
33
Q

What are Limitations of Linear Regression?

A
  • Need a linear relationship between variables (strong relationship = better model)
  • Predictions are limited to the model’s training range (e.g. 100 days would not produce reliable values as the model has not been trained on such values)
  • Predictions are not fact
34
Q

Why did you want to forecast stock holding?

A
  • Action needed before year end W/C metrics?
  • Reduce risk of reaching capacity and impacting operations
  • We pay for space, need to use it efficiently, useful to predict stock holding
35
Q

In Time Series Forecasting, why do you Resample the Data?

A
  • Adjust frequency
  • Smooth Gaps
  • Reduce Noise
  • Easier to identify patterns
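
A minimal sketch of resampling daily data to a weekly frequency using only the standard library. The daily values and the `resample_weekly_mean` helper are assumptions for illustration; with pandas, `DataFrame.resample` covers the same idea.

```python
from collections import defaultdict
from datetime import date, timedelta

def resample_weekly_mean(daily):
    """Group (date, value) pairs by ISO week and average each group."""
    buckets = defaultdict(list)
    for day, value in daily:
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        buckets[(iso[0], iso[1])].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(buckets.items())}

# Hypothetical daily stock levels for two full weeks, starting on a Monday
start = date(2024, 1, 1)
daily = [(start + timedelta(days=i), 100 + i) for i in range(14)]

weekly = resample_weekly_mean(daily)
for week, avg in weekly.items():
    print(week, avg)
```

Averaging within each week lowers the frequency from daily to weekly and smooths out day-to-day noise.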