Application of Statistical Tools and Methods Flashcards
In addition to waste there is another basic form of problem, what is it?
It is variation.
Variation is the deviation of single data points within a process.
What is a Six Sigma Black Belt?
A Six Sigma Black Belt is a certified Six Sigma Expert, similar to a certified Lean Consultant being an expert for Lean Consulting.
Overall there are several Six Sigma qualification levels (highest on top):
- Six Sigma Champion
- Six Sigma Master Black Belt
- Six Sigma Black Belt
- Six Sigma Green Belt
Decisions are based on the maximum and minimum values
Why is the collection of data important?
The main target of collecting data is to make “things” measurable.
- Improvements can be shown based on your data collection.
- The data collection is done at the beginning of a project.
- You need to show facts and figures.
- Facts and figures you need to collect in your diagnosis phase.
- You need to present your data (numbers and key performance indicators, KPI) as well.
Why is data within a project important?
- It is important to be clear about the KPIs that can measure the success of the project
- The data within the project is important for a good and accepted baseline.
- Improvements can be shown/proved based on collecting of data at the beginning of the project
- Also quantitative and not only qualitative results are visible!
- The quantitative results mostly convince.
With collected data you can?
You can verify the process
… confirm your hypotheses as valid
or
… reject them as invalid
What is a UCL?
Upper control limit
What is a LCL?
Lower control limit
Why should we combine Lean and Six Sigma-Tools?
Lean Six Sigma connects the implementation speed of Lean with the precision and high sustainability of Six Sigma.
“Lean Six Sigma“ combines the speed, the way of observation and the implementation orientation from Lean and the discipline of this systematic and the statistical analysis tools from Six Sigma.
What is “pure” lean?
Risk that complex problems are underestimated and wrong measures are implemented
What is “pure” six sigma?
Risk that complex analyses are used for simple answers or that simple problems are left behind
What is Lean Six Sigma?
connects the implementation speed of Lean with the precision and high sustainability of Six Sigma.
What is the main difference between Lean & 6 Sigma?
The 4 Proofs
Proof of the Gage repeatability and reproducibility
Proof of main effects
Proof of the improvement of the main KPI
Proof of sustainability
Fast and sustainable results are ensured through the 4 Proofs
Lean focuses on analyzing workflow to reduce cycle time and eliminate waste. Lean strives to maximize value to the customer while using as few resources as possible. Six Sigma strives for near-perfect results that will reduce costs and achieve higher levels of customer satisfaction.
What are the 4 proofs?
Proof of the Gage R&R (gage repeatability and reproducibility) and of the baseline
-I have to ensure that the measurement is correct, then I can see if there is a problem or not
Proof of main effects
-What is the biggest lever, where is the root cause?
Proof of the improvement of the main KPI
-Is the implemented action successful?
Proof of sustainability
-Is it possible to keep this good result?
Lean Six Sigma tools are used to find what?
the strongest lever in our projects, in order to reach the targets with less effort and to stabilize the project results.
What is the basis to improve processes?
Process-oriented thinking
What are the differences between Lean and Six Sigma?
Lean
• High implementation speed
• Sustainability needs to be improved
Six Sigma
• Implementation after lengthy analyses
• High sustainability
Six Sigma uses the power of what to collect data in repetitive processes?
Statistics
What does the “central limit theorem“ in mathematics state?
If you collect enough data, the distribution of the sample means tends toward a normal distribution (bell curve), whatever the shape of the underlying data
This theorem is the foundation for the assumption that every process behaves like a bell curve
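The theorem can be tried out with a short simulation. The following is only a sketch under assumptions: the uniform "process", the sample size of 30 and the helper name `sample_means` are chosen for illustration and are not part of the flashcards.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# A clearly non-normal "process": uniform values between 0 and 1.
population = [random.random() for _ in range(10_000)]

def sample_means(data, sample_size, n_samples):
    """Draw n_samples random samples and return the mean of each one."""
    return [statistics.mean(random.sample(data, sample_size))
            for _ in range(n_samples)]

means = sample_means(population, sample_size=30, n_samples=1_000)

# The means cluster tightly around the population mean, far more
# tightly than the individual values do.
spread_of_data = statistics.stdev(population)
spread_of_means = statistics.stdev(means)
print(round(spread_of_data, 3), round(spread_of_means, 3))
```

Plotting `means` as a histogram would show the bell shape, even though the individual values are uniformly distributed.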
Six Sigma is generally not used in which cases?
Six Sigma is in general not used in non-repetitive processes, since it is hard to collect the adequate amount of data.
Focusing in the process in terms of Six Sigma means what?
looking at the process mathematically
What Is the Definition of “Sigma“?
The standard deviation sigma (symbol “σ“) is a measure for the distribution of measured values.
It describes the distance from the mean μ to the inflection point of the curve.
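As a minimal sketch, mean and sigma can be calculated with Python's standard library. The diameter values are hypothetical and only illustrate the calculation.

```python
import statistics

# Hypothetical diameter measurements in mm (illustrative values only).
diameters = [1.25, 1.27, 1.24, 1.26, 1.25, 1.23, 1.26, 1.24]

mu = statistics.mean(diameters)       # location of the process
sigma = statistics.stdev(diameters)   # sample standard deviation (spread)

print(f"mean = {mu:.4f} mm, sigma = {sigma:.4f} mm")
```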
If σ is small, what does the bell curve look like?
The curve is narrow, which means the process has low variation and is performing well.
If σ is large, what does the bell curve look like?
The curve is widely spread, which means the process has high variation and is performing poorly.
Why are there additional sigmas?
With the addition of several Sigmas, we are able to describe the distribution of the data of a process.
What Six Sigma target did the founders set?
The founders of Six Sigma set the target, that for an almost perfect process a range of 12 Sigma (± 6 Sigma – where the name comes from) should fit within the requirement limits.
That leaves only 3.4 defects per million opportunities (defect rate), which indicates how difficult it is to achieve that target.
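The 3.4 figure can be reproduced from the normal distribution. Note the assumption in this sketch: it uses the conventional 1.5 sigma long-term shift of the process mean, which the flashcards do not mention, and the function name `dpmo` is chosen for illustration.

```python
import math

def dpmo(sigma_level, shift=1.5):
    """Defects per million opportunities for a one-sided normal tail.

    shift=1.5 is the conventional long-term shift assumption; it is not
    stated in the flashcards above.
    """
    z = sigma_level - shift
    tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal
    return tail * 1_000_000

for level in (3, 4, 5, 6):
    print(level, round(dpmo(level), 1))
```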
How do you apply Six Sigma?
DMAIC-cycle. That stands for Define, Measure, Analyze, Improve and Control
In the DMAIC cycle, what does DEFINE mean?
What is the problem?
Detailing the project
Step 1: Project Charter / potential
Step 2: Customer Interview
Step 3: SIPOC (Supplier Input Process Output Customer)
In the DMAIC cycle, what does MEASURE mean?
How can we measure the processes?
Trustworthy measurement
- Gage repeatability and reproducibility
- Process capability analysis
In the DMAIC cycle, what does ANALYZE mean?
What can we deduce from the measurement?
Determine the significant factors
Step 7: Determine the potential X's
Step 8: Analysis of the X
Step 9: Proof of relationship
Step 10: Functional relation Y = f (Xi)
In the DMAIC cycle, what does IMPROVE mean?
How can we improve the processes?
Improve the main parameters
Step 11: Optimal setting of the important X
Step 12: Tolerancing X
Step 13: Process Capability Y and X
In the DMAIC cycle, what does CONTROL mean?
How can we monitor the process result?
Ensuring the improvements
Step 14: Control Plan
Step 15: Project conclusion
What are some of the main ideas to keep in mind about Six Sigma?
- I can't improve what I don't measure
- Each process has variation
- Mean and variation are needed to make a decision
- Make customer requests measurable
- Find out the main effect
- Effective actions are the best actions (PDCA)
If you collect an infinite amount of data, what will the distribution look like?
- like a normal distribution curve
* like a bell curve
If you achieve a world-class process according to Six Sigma understanding, what’s the defect rate then?
3.4 defects per million opportunities (DPMO)
What different kinds of data types are there?
- Continuous
- Binary
- Quantity
- Nominal
- Ordinal
Continuous data is?
Measurable on the basis of a scale, e.g. Diameter 1.25 mm
Continuous data is data that can take any value.
Height, weight, temperature and length are all examples of continuous data.
Some continuous data will change over time; the weight of a baby in its first year or the temperature in a room throughout the day.
Binary data is?
Ok/Not Okay
Binary data is data whose unit can take on only two possible states, traditionally labeled as 0 and 1 in accordance with the binary numeral system and Boolean algebra
Ordinal data is?
School grades
Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known. The ordinal scale is distinguished from the nominal scale by having a ranking.
Quantity data is?
number of errors, number of scratches
Quantitative data is defined as the value of data in the form of counts or numbers where each data set has a unique numerical value associated with it
Quantitative data is data expressing a certain quantity, amount or range
Nominal (“named”) data is?
Colors
Type of data that is used to label variables without providing any quantitative value
It is the simplest form of a scale of measure
One of the most notable features of nominal data is that it cannot be ordered and cannot be measured.
What different kinds of distribution are there?
- Normal distribution
- Poisson distribution
- Binomial distribution
- Weibull distribution
- Exponential distribution
What is normal distribution?
Normal distribution
• Continuous data with a symmetric distribution
• Characteristic “bell shape”
• Our standard distribution
Normal distribution describes continuous data which have a symmetric distribution, with a characteristic ‘bell’ shape.
What is poisson distribution?
Poisson distribution– DISCRETE
Describes the number of events occurring in a fixed time interval or region of opportunity
• Number of binary data from an infinite sample
• Distribution of rare independent events (e.g. errors)
• Average occurrence rate is known
(e.g. Roulette 1/37 per number = 2.70 %)
Poisson distribution describes the distribution of binary data from an infinite sample. Thus it gives the probability of getting r events in a population.
The Poisson distribution is used to describe discrete quantitative data such as counts in which the population size n is large, the probability p of an individual event is small, but the expected number of events, n·p, is moderate (say five or more). Typical examples are the number of deaths in a town from a particular disease per day, or the number of admissions to a particular hospital.
What is binomial distribution?
Binomial distribution
A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times
(pass/fail; yes/no)
• Number of binary data from a finite (limited) sample
• Number of defective units (result OK/NOK)
• Can be approximated by the Poisson distribution for a high number of runs
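How closely the Poisson distribution follows the binomial distribution for many runs can be checked with a small sketch. The numbers (1000 units, 0.5 % defect probability) and the function names are assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k defective units in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.005   # hypothetical: 1000 units, 0.5 % defect probability
lam = n * p          # expected number of defects

# Print both probabilities side by side for a few defect counts.
for k in range(0, 11, 2):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

With a large n and a small p the two columns are nearly identical, which is why the simpler Poisson formula is often used instead.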
What is weibull distribution?
The Weibull distribution is a family of distributions that can take on many shapes, depending on what parameters you choose.
It’s commonly used to assess product reliability, analyze life data and model failure times.
• Time to technical failures
• Reliability / service
• “Bathtub curve”
What is exponential distribution?
Exponential distribution
Probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.
It is often used to model the time elapsed between events
- Grows or falls very quickly at first and then approaches its limit very slowly (endlessly)
- Service life of electrical components
- Radioactive decay
What are three things to keep in mind about data?
- Data can be discrete or continuous
- Discrete data is subdivided into binary, quantity and category
- Different distributions show different data
Discrete data is information that can only take certain values
Continuous data is data that can take any value
What is a sample?
A sample is a subset of a population.
With the help of this sample we try to draw meaningful conclusions with regard to the population.
What is the target of a sample?
Target: Representative and meaningful quantity over the distribution of the basic population and its representatives.
What are the different types of samples?
Stratified Sample Selection
The selection is proportionate to the subgroups
Simple Sample Selection
Random
Systematic Sample Selection
Sample of every xth part
Sample subgroups
For non-normal distributions
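The first three selection types can be sketched in a few lines. The population of 100 parts from two production lines and all function names are hypothetical.

```python
import random

random.seed(1)  # repeatable sketch

# Hypothetical population: 100 parts, 70 from line A and 30 from line B.
population = [("A", i) for i in range(70)] + [("B", i) for i in range(30)]

def simple_sample(items, k):
    """Simple selection: every item has the same chance of being picked."""
    return random.sample(items, k)

def systematic_sample(items, step):
    """Systematic selection: every step-th item."""
    return items[::step]

def stratified_sample(items, k, key):
    """Stratified selection: sample each subgroup in proportion to its size."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    result = []
    for members in groups.values():
        share = round(k * len(members) / len(items))
        result.extend(random.sample(members, share))
    return result

stratified = stratified_sample(population, 10, key=lambda part: part[0])
```

In this example the stratified sample of 10 contains 7 parts from line A and 3 from line B, matching the 70/30 split of the population.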
What is the rule of thumb for minimum sample size?
Average value 30
Frequency distribution 50
Pareto distribution 50
Scatter diagram/plot 30
Control charts 30
Proportions 200
What are some things to keep in mind about samples?
A sample is a subset of a population
With the help of this sample we try to draw meaningful conclusions with regard to the population
Objective of a sample selection is to define a representative and meaningful quantity over the distribution of the basic population and its representatives
A rule of thumb gives you the minimum sample size according to the type of representation
Measurement System Analysis (MSA) is?
A measurement system analysis (MSA) is a thorough assessment of a measurement process to identify the variation in that measurement process.
What is the target of MSA?
The target of an MSA is to ensure a high measurement quality in order to avoid misinterpretations
Identify the share of error generated by the measurement
What are the different types of measuring errors?
- “Bias” or “offset”
- Drift
- “Stability” or “span”
- Linearity
- Repeatability
- Reproducibility
When should we perform a MSA?
When you need to eliminate variation in the measurement system; eliminate measurement system error
Whenever a measurement is being used to assess the quality or quantity of a product, a measurement system study is required.
For Y’s: always
For X’s: for significant effects
The more precise we can measure Y, the easier it will be to recognize the effects of each X.
The MSA for discrete data checks which of the following aspects?
Repeatability (measuring system spread within a single operator)
Reproducibility (measuring system spread between different operators)
Deviation from the standard (standard part = real value)
The Measurement System is considered suitable if there are what percentage of agreements?
The Measurement System is considered suitable if there are 80% of agreements (between runs, employees and standard).
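A minimal sketch of such an agreement check follows. It assumes that a part counts as an agreement only if every run of every operator matches the standard; other counting rules exist. The data and the function name are hypothetical.

```python
# Hypothetical attribute study: 5 parts, 2 operators, 2 runs each.
standard = ["OK", "NOK", "OK", "OK", "NOK"]
ratings = {
    "operator_1": [["OK", "NOK", "OK", "OK", "NOK"],
                   ["OK", "NOK", "OK", "NOK", "NOK"]],
    "operator_2": [["OK", "NOK", "OK", "OK", "NOK"],
                   ["OK", "NOK", "OK", "OK", "NOK"]],
}

def agreement_percent(standard, ratings):
    """Share of parts where every run of every operator matches the standard."""
    agreed = 0
    for part_index, reference in enumerate(standard):
        all_match = all(run[part_index] == reference
                        for runs in ratings.values()
                        for run in runs)
        agreed += all_match
    return 100 * agreed / len(standard)

percent = agreement_percent(standard, ratings)
suitable = percent >= 80   # threshold taken from the flashcard above
```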
What kind of measurements do you use for continuous data?
Non-destructive Measurement– testing and analysis technique used by industry to evaluate the properties of a material, component, structure or system for characteristic differences or welding defects and discontinuities without causing damage to the original part
Measurement of diameter (MSA Crossed)
Destructive measurement, e.g. a tearing test: the measurement destroys or permanently alters the part being measured.
Part can not be measured again (MSA nested)
What are 3 procedures to use for continuous data?
- Calibration: deviation from a standard value
Measuring instruments without operator effect with standard part
- Gage R & R (repeatability and reproducibility)
Measuring instruments with operator effect with real parts
- Repeatability (without operator effect)
Measuring instruments without operator effect with real parts
What are 3 evaluation criteria for continuous measuring system analysis (MSA)?
- Gage R&R
What is the share of the measurement system error (repeatability and reproducibility) compared with total variation?
- Process over tolerance
How high is the share of the measuring system spread compared with part tolerance?
- Number of distinct categories
Between how many categories can the measuring system distinguish? “Resolution” to recognize differences between parts
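The three criteria can be sketched as formulas. The following assumes the commonly used conventions (a 6 sigma measurement spread for the tolerance ratio and the factor 1.41 for the number of distinct categories); the flashcards do not state the formulas, and the function name and input values are hypothetical.

```python
import math

def msa_criteria(sigma_grr, sigma_part, usl, lsl):
    """Return (%GRR of total variation, % of tolerance, distinct categories)."""
    sigma_total = math.sqrt(sigma_grr**2 + sigma_part**2)
    pct_grr = 100 * sigma_grr / sigma_total
    pct_tolerance = 100 * 6 * sigma_grr / (usl - lsl)   # 6-sigma spread vs. tolerance
    ndc = math.floor(1.41 * sigma_part / sigma_grr)
    return pct_grr, pct_tolerance, ndc

# Hypothetical standard deviations and specification limits in mm.
pct_grr, pct_tolerance, ndc = msa_criteria(
    sigma_grr=0.003, sigma_part=0.04, usl=1.40, lsl=1.10)
```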
What are some things to keep in mind for Measurement System Analysis (MSA) for discrete & continuous data?
-Measuring error could be a mean value error (location) or a spread error (variance)
-Many errors can occur during measurements: “bias” or “offset”, drift, “stability” or “span”, linearity, repeatability and reproducibility
- Reproducibility error means different operators get different measurements
- Repeatability error means the same operator gets different measurements
- Different kinds of data require different kinds of measurements and MSA
- Perform a MSA to find out the measuring error
What are the best tools to use to figure out process variation?
Histogram and Box Plot are graphical methods to visualize variance within a process in different ways to get a better understanding of what’s happening in the process.
Box plot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles
In a histogram, each bar groups numbers into ranges. Taller bars show that more data falls in that range. A histogram displays the shape and spread of continuous sample data.
What is a histogram?
A histogram shows the distribution of continuous data (location & spread). Histograms can be created before or during an analysis to support assumptions and steer the further analysis in the right direction.
What is a box plot?
A box plot is a graphical summary of location and spread of a process.
The box plot shows the median and the quartiles, which split the data into four equal parts
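The numbers behind a box plot can be calculated directly. This sketch uses hypothetical lead times; the 1.5 × IQR rule for the whiskers is a common convention and an assumption here, since the flashcards do not specify one.

```python
import statistics

# Hypothetical lead times in days (illustrative values only).
lead_times = [4, 5, 5, 6, 6, 7, 7, 8, 9, 12, 21]

q1, median, q3 = statistics.quantiles(lead_times, n=4, method="inclusive")
iqr = q3 - q1                        # height of the box
upper_fence = q3 + 1.5 * iqr         # common whisker rule (an assumption here)
lower_fence = q1 - 1.5 * iqr
outliers = [x for x in lead_times if x < lower_fence or x > upper_fence]
```

The box runs from `q1` to `q3` with a line at `median`; values beyond the fences are usually drawn as individual points.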
What are some things to keep in mind about process variation?
-A histogram shows the distribution of continuous data; it's a graphical view of the location and spread of continuous sample data
-Histograms can be created before or during an analysis to support assumptions and steer the further analysis in the right direction
- A box plot is a graphical summary of location and spread of a process
What is the difference between location and spread & what should I optimize first?
Location: expected value of the output being measured. For a stable process, this is the value around which the process has stabilized.
Spread: expected amount of variation associated with the output. This tells us the range of possible values that we would expect to see.
What’s the difference between a sample and a population?
- A population includes all of the elements from a set of data.
- A sample consists of one or more observations drawn from the population.
This method is used in the Measure Phase to find out the current performance and deviation of a process.
It is also used in the Improve Phase to show the improvement of the process by comparing it with the initial analysis.
What are some things to keep in mind when using sample and population?
- If the spread is too high, you’ll have an unstable process
- Very tight grouping is worth nothing, if you don’t meet the target
- First work on the spread then work on the location
What do you have to do, when the spread of your process is too wide?
• understand the variation of the process in order to reduce it (smaller sigma)
What do you have to do, when the location of your process is not centered?
• understand the parameters for the location and then move it to the center
What is a Confidence Interval?
We can derive assumptions (statistical conclusions) about the population from a relatively small amount of data (a sample)
A confidence interval tells you in which range (interval) the “true” location and/or spread parameters of the basic population lie with a specific probability
What is the Confidence Interval for Means?
A confidence interval for the mean is a way of estimating the true population mean. Instead of a single number for the mean, a confidence interval gives you a lower estimate and an upper estimate.
As long as the confidence intervals for the means overlap, we can’t prove that there is a statistical difference between the means of the populations that the samples are taken from. You will want to achieve this if the processes need to be stable.
As soon as the Confidence Intervals for mean no longer overlap, the difference is significant. You will want to achieve this if you want to change the process.
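The overlap check can be sketched as follows. This is an approximation under assumptions: it uses the normal distribution instead of the t distribution, which is only reasonable for roughly 30 or more values, and the data and function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation).

    For small samples a t-based interval would be wider; this sketch
    assumes roughly 30 or more values.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.mean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - margin, mean + margin

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical lead times before and after an improvement (days).
before = [10 + (i % 5) for i in range(30)]   # values 10..14
after = [7 + (i % 5) for i in range(30)]     # values 7..11

ci_before = mean_confidence_interval(before)
ci_after = mean_confidence_interval(after)
print(ci_before, ci_after, intervals_overlap(ci_before, ci_after))
```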
The confidence interval for means is used in what phase?
IMPROVE
in order to graphically show the improvement of the new process performance compared to the initial process performance
How does hypothesis testing work?
Hypothesis testing in statistics is a way for you to test the results of a survey or experiment to see if you have meaningful results.
Hypothesis testing ensures the statistical differences between 2 or more process outputs.
- statistically determine if our improvements are significant
- verify suspected causes
What are the Procedures for Hypothesis Testing?
1. What do I want to find out?
• Lead time too long? Difference in paint thickness?
• Improvement is effective? Process is stable?
2. Select the correct statistical test
3. Set up the hypotheses and significance level
• Formulate what is to be proved as Ha (alternative hypothesis) and H0 (null hypothesis)
4. Take a sample and read the p-value
• Collect data, run the test and read the p-value.
The p-value (calculated probability) is the probability, assuming the null hypothesis (H0) is true, of obtaining a result at least as extreme as the one observed
How do you interpret the result?
• Do not accept Ha (H0 cannot be rejected): if p > 0.05
• Accept Ha (reject H0): if p ≤ 0.05
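The procedure can be sketched with a simple test for a difference in means. Assumptions: the sketch uses a normal approximation (a proper t-test from a statistics package would normally be chosen via the decision tree), and the paint thickness data and the function name are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def two_sample_p_value(sample_a, sample_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(sample_a) / len(sample_a)
                   + statistics.variance(sample_b) / len(sample_b))
    z = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical paint thickness readings in micrometres.
line_1 = [50 + (i % 5) for i in range(30)]       # values 50..54
line_2 = [51 + (i % 5) for i in range(30)]       # values 51..55
same = [50 + ((i + 2) % 5) for i in range(30)]   # same values as line_1, other order

alpha = 0.05                                   # significance level
p_shift = two_sample_p_value(line_1, line_2)   # compared with alpha below
p_same = two_sample_p_value(line_1, same)

print("line_1 vs line_2 differ:", p_shift <= alpha)
print("line_1 vs same differ:", p_same <= alpha)
```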
By using hypothesis testing, what are the two main goals?
Either we can test for deviation from a mean/median, that means location.
Or we can test for the standard deviation, that means spread.
The idea of hypothesis testing is that every statistical test involves the formulation of which two complementary statements?
- Null hypothesis (H 0): there is no difference (status quo)
* Alternative hypothesis (H a): there is a difference (change has occurred)
What does a Hypothesis Testing Decision Tree help you do?
The decision tree helps you, depending on the kind of data, to easily choose the correct statistical test.
What does hypothesis do?
- ensures the statistical differences between 2 or more process outputs
- statistically determines if our improvements are significant
- verifies suspected causes
What does the null hypothesis assume?
The null hypothesis (H 0) assumes there is no difference (status quo)
What does the alternative hypothesis assume?
The alternative hypothesis (H a) assumes there is a difference (change has occurred)
What are the two types of wrong decisions possible during hypothesis testing?
Two types of wrong decisions are possible during hypothesis testing, those are called α risk and β risk
What are some things to keep in mind about hypothesis testing?
- α risk: rejecting the null hypothesis even though it is true; the decision is that the measure/action is successful, but in reality it is not
- In case of a β risk, the decision is that “x” is not the cause for this problem, although it actually is
- p-value is a probability value
In statistics a relation between different variables is called what?
a regression
What is graphical regression?
Tools for recognizing/representing (obvious) connections
Suitable for steering committees
What is statistical regression?
Tools for verifying/proving connections
What are different regressions?
Discrete
Continuous
What is the best way to visualize data?
Scatter plot
What is a matrix plot?
A Combination of Scatter Plots
The matrix plot is a matrix of scatter plots with each X and the Y.
The objective of the matrix plot is to look at the influence of each X on the Y and also to recognize any relations between the X's.
What are some things to keep in mind about visualizing data?
- Correlation is the relationship between an independent variable and a dependent variable
- The goal is to prove that there is a linear relationship/correlation between 2 (or more) variables/factors/parameters
- The KPI showing this is called “correlation coefficient”
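The correlation coefficient (Pearson's r) can be calculated by hand as in this sketch. The temperature and thickness values and the function name are hypothetical.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: oven temperature (X) and coating thickness (Y).
temperature = [180, 185, 190, 195, 200, 205]
thickness = [50.1, 51.0, 52.2, 52.9, 54.1, 55.0]

r = correlation_coefficient(temperature, thickness)
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value close to 0 indicates none.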
What happens during the IMPROVE phase of DMAIC?
After we have understood in the Analysis Phase what the root causes for variations are, we have now to derive improvement actions and then implement them.
In the Improve phase we use tools that you already know from the Measure, Analyze and Control phases in order to develop and realize the improvement.
- Spread and location
- Confidence interval
- Hypothesis testing
- Control charts
A process is stable and predictable if the variation is?
natural
We can only improve processes when they are stable and predictable. If not: we need to stabilize the process.
A process is stable when?
A process is stable if all the data points are within two control limits.
To monitor this, a control chart is used for visualization.
A control chart, also called a process-behavior chart, is a statistical process control tool used to determine if a manufacturing or business process is in a state of control
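How the control limits follow from the process data can be sketched for an individuals chart. Assumptions: the sketch uses the average moving range with the standard constant 2.66; other chart types use other constants. The data and the function name are hypothetical.

```python
import statistics

def individuals_control_limits(values):
    """Return (LCL, center line, UCL) for an individuals (I) chart.

    Uses the average moving range and the standard constant 2.66 (= 3 / 1.128).
    """
    center = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.mean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical baseline data from a stable period.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
lcl, center, ucl = individuals_control_limits(baseline)

# New observations are checked against the limits set by the process.
new_points = [10.1, 9.9, 11.2, 10.0]
signals = [x for x in new_points if x < lcl or x > ucl]
```

Points collected in `signals` lie outside the control limits and should be investigated before the process is treated as stable.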
How do you monitor process stabilization?
A control chart is an extension of the run chart and includes specification and control limits.
A run chart, also known as a run-sequence plot is a graph that displays observed data in a time sequence.
A run chart is a line graph of data plotted over time. By collecting and charting data over time, you can find trends or patterns in the process. Because they do not use control limits, run charts cannot tell you if a process is stable. However, they can show you how the process is running.
What is natural variation?
“natural”: within the control limits
The natural variation or “common cause” variation is the natural fluctuations in process flow introduced by individuals, slight differences in execution, or instrument performance fluctuations.
What is exceptional variation?
“exceptional”: outside the control limits
Exceptional variation, also called special cause or assignable cause variation, does not follow a predictable pattern. Exceptional variation is a signal that the process is changing over time.
What are some things to keep in mind about a control chart?
-UCL and LCL are set by the process
-USL and LSL (upper and lower specification limits) are set by the customer requirements; therefore, the control limits should always be within the specification limits
The Control Chart can be used in each phase, since it is a universal method for visualizing the process performance
In the Control Phase of DMAIC it is especially essential because here we want to prove the long-term sustainability of the improvement.
Besides that, the Control Chart can easily be implemented as a standard, e.g. in the Shopfloor Management on a daily basis.
What are some things to keep in mind with control charts?
A control chart sets up control limits to differentiate between common and special causes
Upper/Lower Control Limits (UCL/LCL) are set by the process
Common causes lie inside those control limits - they are constantly present and act simultaneously (background noise/noise)
Special causes lie outside those control limits - they act at a specific point in time in a specific location (systematic variation/signal)
Upper/Lower Specification Limit (USL/LSL) are set by the customer requirements
What three things help you to sustain improvements?
- Shopfloor Management
- Standards
- Stable processes
What is a “Bias” or “offset” error?
is the deviation of the output, when the device is in its zero position, compared with the optimum value
What is a Drift error?
Drift errors are caused by deviations in the performance of the measuring instrument (measurement system) that occur after calibration. Major causes are the thermal expansion of connecting cables and thermal drift of the frequency converter within the measuring instrument.
What is a “Stability” or “span” error?
In measurement system analysis, stability describes how consistent the measurements of the same reference part remain over time.
A stability error is present when the bias of the measurement system changes over time.
What is a Repeatability error?
Repeatability error is the maximum difference in output when approaching the same point twice from the same direction.
The difference between output readings for two or more consecutive pressure cycles to rated range under duplicate conditions, approached from the same (increasing or decreasing) direction.
What is a Reproducibility error?
Refers to the inability to get the same answer from measurements taken by different people under identical conditions
Reproducibility refers to the variation in measurements made on a subject under changing conditions