01. Flow Metrics Basics Flashcards
What are the different types of Averages?
- Mean
- Median
- Mode
What is an average?
- It is the value that is most representative/typical of the dataset.
How do you calculate the mean?
Add all the numbers together, and then divide by how many numbers there are.
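As a minimal sketch of the calculation above, the mean could be computed in Python like this. The function name `mean` and the sample cycle times are illustrative assumptions, not part of the original flashcards.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


cycle_times = [5, 6, 7, 8, 9]  # hypothetical cycle times in days
print(mean(cycle_times))  # 7.0
```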
What are outliers?
Extremely high or low values that stand out from the rest of the dataset.
How do outliers change the representation of the dataset?
- They skew the representation towards the side where the outliers lie.
- Outliers “pull” the data to the left or right.
How do outliers affect the mean?
Outliers pull the mean higher or lower, skewing the data to the left or right.
As a result, the mean may not give the best representation of what a typical value is.
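A small sketch can make the pull of an outlier visible. The numbers below are hypothetical cycle times chosen for illustration. The example uses Python's standard `statistics` module to compare the mean and median before and after one extreme value is added.

```python
import statistics

typical = [5, 6, 7, 8, 9, 12]    # hypothetical cycle times in days
with_outlier = typical + [100]   # the same data plus one extreme value

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

# The mean jumps from about 7.8 to 21; the median only moves from 7.5 to 8.
print(round(mean_before, 1), mean_after)
print(median_before, median_after)
```

In this sample, a single value of 100 nearly triples the mean, while the median barely changes.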
What does it mean if the data is skewed to the right?
Data that is skewed to the right has a “tail” of high outliers that trail off to the right.
What does it mean if the data is skewed to the left?
Data that is skewed to the left has a “tail” of low outliers that trail off to the left.
What does it mean if the data is Symmetric?
If the data is symmetric, the Mean, Median and Mode are in the middle.
No outliers pull the mean in either direction, and the data has the same shape on either side of the centre.
Which averages are not affected as much by outliers?
- Median
- Mode
How is the Median found?
- Line up all the values in ascending order.
- If there is an odd number of values, the median is the one in the middle.
- If there is an even number of values, add the two middle ones together and divide by two.
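The steps above can be sketched as a short Python function. The name `median` and the sample values are assumptions made for illustration. Python's standard library also provides `statistics.median`, which should behave the same way for numeric data.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if even)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)          # line up the values in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle value
        return ordered[mid]
    # even count: add the two middle values and divide by two
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 5, 7]))      # 7
print(median([8, 5, 6, 7]))   # 6.5
```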
Where are the Mean and Median in a right-skewed dataset?
If the data is skewed to the right, the mean is to the right of the median (higher).
What is the Mode?
- The mode of a set of data is the most popular value, the value with the highest frequency
- The mode has to be in the data set.
- It’s the only average that works with categorical data.
Where are the Mean and Median in a Left-skewed dataset?
If the data is skewed to the left, the mean is to the left of the median (lower).
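The relationship between mean and median on the skew cards can be turned into a rough rule of thumb. The sketch below is a heuristic, not a formal skewness test, and the function name `skew_direction` and the sample data are hypothetical.

```python
import statistics


def skew_direction(values):
    """Guess the skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"          # high outliers pull the mean above the median
    if mean < median:
        return "left"           # low outliers pull the mean below the median
    return "no clear skew"      # equal mean and median; not proof of symmetry


print(skew_direction([5, 6, 7, 8, 9, 12, 100]))        # right
print(skew_direction([1, 90, 95, 96, 98, 99, 100]))    # left
```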
Can a dataset have more than one Mode?
- If there is more than one value with the highest frequency, then each one of these values is a mode.
- If the data appears to represent more than one trend or set of data, we can provide a mode for each set.
- This would be a bimodal dataset.
What are the three steps for finding the mode?
- Find all the distinct categories or values in your data set.
- Write down the frequency of each value or category.
- Pick the one(s) with the highest frequency to get the mode.
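The three steps above could be sketched in Python with `collections.Counter`. The function name `modes` and the sample data are assumptions for illustration. It returns a list so that datasets with more than one mode are handled, and it works with categorical values as well as numbers.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)              # steps 1 and 2: distinct values and their frequencies
    if not counts:
        return []
    highest = max(counts.values())
    # step 3: pick the one(s) with the highest frequency
    return [value for value, count in counts.items() if count == highest]


print(modes(["bug", "feature", "bug", "chore"]))  # ['bug']
print(modes([3, 5, 3, 5, 8]))                     # [3, 5]
```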
Flow metrics
What metrics can be used when analysing flow?
- Range
- Quartiles
- Percentiles
- MAD (Median of the Absolute Deviations from the Median)
Define Range.
- We can use the range to measure variability/spread.
- The range lets us know how the data varies
- The more variability, the less predictable the source of the data is
What is the use of range in Flow Analysis?
It shows the full spread/width of flow times.
How do we measure the range?
- The range measures the spread/width of a dataset.
- It’s given by Upper bound - Lower bound.
- Where the upper bound is the highest value, and the lower bound is the lowest value.
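As a minimal sketch, the range is just the highest value minus the lowest. The function name `value_range` and the sample flow times are illustrative assumptions. The second call shows how much a single extreme value widens the result.

```python
def value_range(values):
    """Range = upper bound (highest value) - lower bound (lowest value)."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)


flow_times = [5, 6, 7, 8, 9, 12]          # hypothetical flow times in days
print(value_range(flow_times))            # 7
print(value_range(flow_times + [100]))    # 95
```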
What are the Advantages of Range?
- Simple to calculate
- Highlights extreme variation
What are the disadvantages of Range?
- Sensitive to outliers
- It doesn’t tell whether the values are clustered or spread out. It ignores whether values are bunched near the median or scattered all over.
- It can be misleading if you assume it reflects “overall variation.”
- It only uses two data points—the smallest and largest—and doesn’t care what happens in between.
- It can make a process look less predictable than it usually is, because a single extreme value widens the range.
When is the Range used?
When you want a quick view of variation.
How can we negate the impact of outliers when analysing dispersion?
Use a range within the dataset that excludes the outliers, such as the interquartile range.
Quartiles
Define Quartiles
Quartiles specifically refer to the values that split the data into quarters.
The lowest quartile is the lower or first quartile (Q1).
The highest quartile is known as the upper quartile or third quartile (Q3).
The quartile in the middle (Q2) is the median, as it splits the data in half.
How are Quartiles created?
- First, line up the values in ascending order and then split the data into four equally sized chunks, each containing one-quarter of the data.
How is the Interquartile range calculated?
Subtract the lower quartile (Q1) from the upper quartile (Q3). See picture.
What is the Interquartile range?
Interquartile range = Upper quartile – Lower quartile
The interquartile range is much less sensitive to outliers.
It’s another way in which we can compare different sets of data.
Can Quartiles show variability in your data?
Quartiles also show you how spread out your data is:
The Interquartile Range (IQR) = Q3 - Q1 tells you the range of the middle 50% of your data.
This is a very robust measure of variation — it ignores outliers.
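The following sketch computes quartiles and the IQR with Python's standard `statistics.quantiles`. The data reuses the cycle times from the MAD card in this deck. The choice of `method="inclusive"` is an assumption: quartile conventions differ, and other methods can give slightly different cut points.

```python
import statistics

cycle_times = [5, 6, 7, 8, 9, 12, 100]  # days; includes one extreme value

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(cycle_times, n=4, method="inclusive")
iqr = q3 - q1
full_range = max(cycle_times) - min(cycle_times)

print(q1, q2, q3)   # 6.5 8.0 10.5
print(iqr)          # 4.0
print(full_range)   # 95
```

In this sample the range is 95 days because of the single 100-day item, while the IQR of the middle 50% is only 4 days.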
What can quartiles help you see in Flow Analysis?
Quartiles include the median (Q2), which shows:
- What’s in the middle of your data
- A more robust centre than the mean if your data is skewed
Quartiles help you see the centre, especially in skewed distributions where the mean is misleading.
When using Quartiles, what are the signs of variability?
- If Q1 and Q3 are close together (small Interquartile Range, IQR) → low spread (consistent process)
- If they’re far apart → high spread (variable process)
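To illustrate the two signs above, the sketch below compares the IQR of two hypothetical teams that share the same median cycle time. The team data, the helper name `iqr`, and the `method="inclusive"` convention are all assumptions made for this example.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) using the inclusive quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


team_a = [6, 7, 7, 8, 8, 9, 9]       # hypothetical cycle times, tightly packed
team_b = [2, 4, 7, 8, 12, 15, 20]    # hypothetical cycle times, widely spread

print(statistics.median(team_a), iqr(team_a))  # 8 1.5
print(statistics.median(team_b), iqr(team_b))  # 8 8.0
```

Both teams have a median of 8 days, but the larger IQR suggests the second process is far less consistent.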
How are the quartiles used in Flow Analysis?
Quartiles help understand central tendency and spread.
- Q2 (median) gives a strong view of the middle/typical value.
- Q1 and Q3 show how tightly or loosely the values are packed around the centre.
Together, they give a powerful, outlier-resistant summary of your data.
What are the Advantages of quartiles?
- Resistant to outliers
- Good for skewed data
What are the disadvantages of quartiles?
IQR (from quartiles) gives you a summary of the middle, but it:
- Doesn’t measure how far values deviate from the median (like MAD or SD)
- Ignores the edges of your data, where rare but critical events happen
What questions do quartiles not address, and what answers them?
- How far do values deviate from the centre? Use MAD or standard deviation.
- How bad can the worst-case flow delays get? Use percentiles (90th, 95th, max).
- Are there rare but extreme cases? Look at the tails explicitly or use box plots with outliers.
What are Percentiles?
A percentile indicates that:
“X% of the data points are less than or equal to this value.” Thus:
- The 95th percentile is the value at or below which 95% of the dataset falls.
- The top five percent surpass that value, typically representing delays, exceptions, or outliers.
- Percentiles give you a target you can manage and evidence to show stakeholders.
What are Percentiles used for?
- Setting realistic service level targets (SLA).
- Setting expectations you can communicate to customers or stakeholders.
- Setting performance goals your team can aim for.
- Giving visibility into the worst-case (or best-case) scenarios — especially the ends or “tails” of the distribution
What are the different methods for calculating percentiles?
- Linear Interpolation Between Ranks
- Nearest Rank Method
What is the difference between Linear Interpolation & Nearest Rank percentiles?
- Interpolation is more statistically smooth and useful in continuous data analysis.
- Nearest-rank is simpler and easier to explain and calculate by hand.
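As a sketch of the interpolation approach, the function below places the percentile at position (n - 1) × p / 100 in the sorted data and interpolates between the two neighbouring values. This is one common convention among several, and the function name `percentile_linear` and the sample data are assumptions for illustration.

```python
def percentile_linear(values, p):
    """Percentile by linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("percentile requires at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100    # fractional 0-based position
    lower = int(position)                      # index just below the position
    upper = min(lower + 1, len(ordered) - 1)   # index just above, clamped to the end
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


data = [10, 20, 30, 40]            # hypothetical values
print(percentile_linear(data, 50))  # 25.0
print(percentile_linear(data, 25))  # 17.5
```

Note that 25.0 and 17.5 are not values in the dataset, which is the "smooth" behaviour that separates interpolation from nearest-rank.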
How would you calculate the Nearest Rank Method for Percentiles?
See Pic
If you have 125 numbers and want to find the 10th percentile:
- Start by calculating 10 × 125 ÷ 100. This gives you a value of 12.5.
- Rounding this number up gives you 13, which means that the 10th percentile is the number at position 13.
Flow example: 95% of items are done in under 10 days.
see pic
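The nearest-rank calculation above can be sketched in Python as follows. The function name `percentile_nearest_rank` is hypothetical, and the dataset is simply the numbers 1 to 125 so that the value at position 13 is easy to check.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the value at position ceil(p * n / 100)."""
    if not values:
        raise ValueError("percentile requires at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based position, rounded up
    return ordered[rank - 1]                   # convert to a 0-based index


data = list(range(1, 126))                     # 125 values: 1, 2, ..., 125
print(math.ceil(10 * len(data) / 100))         # 13 (position)
print(percentile_nearest_rank(data, 10))       # 13 (value at that position)
```

Unlike interpolation, the result is always a value that appears in the dataset.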
What are the advantages of percentiles?
- Great for setting expectations
- Can capture tail behaviour
Why are percentiles great for setting expectations?
Percentiles are great tools for clearly stating what results to expect, even when the data is noisy, messy, or not normally distributed.
What does it mean when we say that a Service Level Agreement (SLA) specifies a 90th percentile cycle time of 10 days?
90% of requests are resolved in 10 days or less.
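One way to check such an SLA against real data is to measure the share of items finished within the limit. The sketch below assumes hypothetical function names (`share_within`, `meets_sla`) and made-up cycle times. The 10-day limit and 90% target come from the card above.

```python
def share_within(cycle_times, limit_days):
    """Fraction of items whose cycle time is at or below limit_days."""
    if not cycle_times:
        raise ValueError("need at least one cycle time")
    within = sum(1 for t in cycle_times if t <= limit_days)
    return within / len(cycle_times)


def meets_sla(cycle_times, limit_days=10, target_share=0.90):
    """True if at least target_share of the items finished within limit_days."""
    return share_within(cycle_times, limit_days) >= target_share


on_target = [2, 3, 4, 5, 5, 6, 7, 8, 9, 30]     # 9 of 10 within 10 days
off_target = [2, 3, 4, 5, 5, 6, 7, 8, 12, 30]   # 8 of 10 within 10 days

print(share_within(on_target, 10), meets_sla(on_target))     # 0.9 True
print(share_within(off_target, 10), meets_sla(off_target))   # 0.8 False
```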
Why do percentiles matter?
- Percentiles reflect actual performance for most people
- You can base contracts, dashboards, or SLOs on them
- You don’t get tricked by a few extreme cases (as you can with averages)
What is MAD?
- MAD = Median of the Absolute Deviations from the Median
- Indicates how tightly the values cluster around the median.
- Good for: Processes where you monitor consistency or “typical” deviations from the norm.
- One number gives you a sense of core variation, regardless of shape.
MAD = median(|lead_time_i - median(lead_times)|)
How to Use MAD in Practice
- Calculate the Median of your flow metric (lead time, cycle time, etc.)
- Compute the absolute deviations from the median
- Find the median of those absolute deviations → That’s your MAD
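The three steps above can be sketched in Python using the standard `statistics` module. The function name `mad_steps` and the lead times are hypothetical. The function returns the intermediate results so that each step can be inspected.

```python
import statistics


def mad_steps(values):
    """Return (median, absolute deviations, MAD) for a list of numbers."""
    med = statistics.median(values)                 # step 1: median of the metric
    deviations = [abs(v - med) for v in values]     # step 2: absolute deviations
    mad = statistics.median(deviations)             # step 3: median of the deviations
    return med, deviations, mad


lead_times = [3, 4, 6, 7, 10, 14, 40]               # hypothetical lead times in days
med, deviations, mad = mad_steps(lead_times)
print(med)          # 7
print(deviations)   # [4, 3, 1, 0, 3, 7, 33]
print(mad)          # 3
```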
Why Use MAD in Flow Analysis?
- Robust: Not affected by outliers (unlike standard deviation)
- Great for skewed or non-normal data (typical in lead times, cycle times)
- Highlights consistency in process performance
- One number gives you a sense of core variation, regardless of shape.
Find the MAD for the following cycle times.
Cycle times (days): [5, 6, 7, 8, 9, 12, 100]
- Median = 8
- Absolute deviations = [3, 2, 1, 0, 1, 4, 92]
- MAD = Median of deviations = 2
MAD = 2 tells us that half of the values lie within 2 days of the median, and the 100-day outlier has no effect on the result.
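The worked example can be checked with a short sketch, which also compares MAD with the standard deviation to show the robustness claim. The second dataset, where the 100-day item is replaced by 13 days, is a hypothetical variation added for the comparison, and the function name `mad` is an assumption.

```python
import statistics


def mad(values):
    """Median of the absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


with_outlier = [5, 6, 7, 8, 9, 12, 100]     # cycle times from the card above
without_outlier = [5, 6, 7, 8, 9, 12, 13]   # hypothetical: 100 replaced by 13

print(mad(with_outlier), mad(without_outlier))        # 2 2
print(round(statistics.stdev(with_outlier), 1))       # about 34.9
print(round(statistics.stdev(without_outlier), 1))    # about 3.0
```

The MAD is 2 in both cases, while the sample standard deviation is more than ten times larger when the outlier is present.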
Can MAD be used with control charts?
See Pic