Six Sigma Statistics and Graphical Presentation Flashcards

By Ron Crabtree

1
Q

Sample

A

Subset of the overall population.

Make sure the samples are representative of the population.

2
Q

Three most standard descriptive/characteristic statistics

A

Mean (arithmetic average)
Standard deviation
Variance

3
Q

What are the symbols for the three most common data characteristics for both population parameters and sample statistics?

A

MEAN
Pop Par: mu
Sample stat: x-bar

STANDARD DEVIATION
PP: sigma
SS: s

VARIANCE
PP: sigma squared
SS: s squared

4
Q

Descriptive statistics

A

Used to describe the process itself

One of the most common tools is the histogram, which shows variation and centering.

5
Q

Inferential statistics

A

Making inferences about the population from your sample

It’s possible to learn meaningful information with as few as 30 measurements.

6
Q

Compare descriptive vs. inferential statistics

A

DESCRIPTIVE
Approach: More inductive (induce information)
Goal: Summarize the data to make decisions
Tools/Techniques: Histograms, interrelationship diagrams, process maps, fishbone diagrams
Interpretations: Fairly straightforward; not as difficult to create

INFERENTIAL
Approach: Deductive (deduce information)
Goal: Infer population characteristics to predict future outcomes
Tools/Techniques: More advanced/complex: chi-squared, binomial, and Poisson distributions; hypothesis testing; confidence intervals; correlation; regression analysis.
Interpretations: Complex

7
Q

Normal distribution

A

Values cluster symmetrically around the mean, with most of them close to it. Also allows for easy inference.
AKA the bell-shaped curve.

The 68-95-99.7 Percent Rule
+/- One St Dev: 68.27%
+/- Two St Dev: 95.45%
+/- Three St Dev: 99.73%
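
The percentages in this rule can be checked directly from the normal distribution. A minimal sketch using only the standard library; the function name `coverage_within` is made up for illustration, and published tables sometimes differ from these values in the last digit because of rounding.

```python
import math

def coverage_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} st dev: {coverage_within(k) * 100:.2f}%")
```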
8
Q

What are the basic tenets of the central limit theorem?

A

Basic tenets:
The sampling distribution of the mean approaches a normal distribution as the sample size increases.

n = 100, get a curve 
n = 500, a peak appears
n = 1000, a normal distribution appears

As you increase samples, you get closer to a perfect bell curve

9
Q

Central limit theorem: effect of the sample size n

A

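n = sample size used for each sample mean

n = 4 already gives a near-normal sampling distribution for many populations

n = 30 is the usual rule of thumb for treating the sampling distribution as approximately normal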
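
The effect of sample size can be seen in a small simulation. This is a sketch under assumptions that are not in the cards: the population is a skewed exponential distribution, 2000 sample means are drawn per sample size, and skewness (0 for a symmetric bell curve) stands in for "how normal" the result looks. The helper names are made up for illustration.

```python
import random
import statistics

def sample_means(sample_size, n_samples, rng):
    """Means of n_samples samples, each of sample_size draws from a skewed population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def skewness(values):
    """Simple skewness estimate: average cubed z-score (0 means symmetric)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(42)  # fixed seed so the run is repeatable
for size in (4, 30):
    means = sample_means(size, 2000, rng)
    print(f"n = {size}: spread = {statistics.stdev(means):.3f}, "
          f"skewness = {skewness(means):.3f}")
```

With the larger sample size, the sample means should be both more tightly grouped and less skewed, which is the behaviour the card describes.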

10
Q

Basic tenets of a confidence interval

A

Used to state some level of confidence that the mean of your population falls within a certain range

  • Collect data for sample
  • Calculate mean and standard deviation of sample
  • Then make inference
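
The three steps above can be sketched as follows. This is an illustration only: it assumes the normal (z) approximation, which suits larger samples (a t-based interval would be wider for small ones), and the data and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (z-based)."""
    n = len(sample)
    x_bar = statistics.mean(sample)      # sample mean
    s = statistics.stdev(sample)         # sample standard deviation
    # Critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

measurements = [10.2, 9.8, 10.1, 10.4, 9.9, 10.3, 10.0, 10.2]  # hypothetical data
low, high = mean_confidence_interval(measurements)
print(f"95% CI for the mean: {low:.3f} to {high:.3f}")
```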
11
Q

Hypothesis testing

A

Test a null hypothesis: a claim about a state of nature whose true outcome you do not know.

H-naught (H0): typically states that two values are equal, or that one is greater/less than or equal to the other.

H-sub-a (Ha): the alternative to the null hypothesis.

Use data to infer the true state of the population
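
As one possible illustration, here is a two-sided one-sample z-test of H0: "the population mean equals mu0". The cards do not name a specific test, so the choice of test, the data, and the function name are all assumptions; with a sample this small a t-test would be the more usual choice.

```python
import math
import statistics

def one_sample_z_test(sample, mu0):
    """Return (z, two-sided p-value) for H0: population mean == mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    z = (x_bar - mu0) / (s / math.sqrt(n))
    # Probability of a result at least this extreme if H0 were true.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value

fill_weights = [10.2, 9.8, 10.1, 10.4, 9.9, 10.3, 10.0, 10.2]  # hypothetical data
z, p = one_sample_z_test(fill_weights, mu0=10.0)
print(f"z = {z:.2f}, p = {p:.3f}")  # a small p (e.g. below 0.05) would reject H0
```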

12
Q

Control chart

A

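Plots data sampled at a consistent rate, usually as pooled subgroups.

X-axis is the subgroup in time order (ex. 20 subgroups); each plotted value is a subgroup mean, i.e. pulling 5 parts every hour and plotting the mean of those.

Center line (mean of process)
UCL (upper control limit)
LCL (lower control limit)

Used to infer the behavior of the entire process from those samples.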
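
A sketch of how the center line and control limits might be computed for subgroups of 5 parts. It assumes the X-bar/R chart method, where the limits are the grand mean plus or minus A2 times the average range and A2 = 0.577 is the tabulated constant for subgroups of 5. The cards do not say which method the course uses, and the data are hypothetical.

```python
import statistics

A2_SUBGROUP_OF_5 = 0.577  # tabulated X-bar/R chart constant for subgroup size 5

def xbar_chart_limits(subgroups):
    """Return (LCL, center line, UCL) for an X-bar chart built from subgroups of 5."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("this sketch only handles subgroups of 5")
    means = [statistics.fmean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = statistics.fmean(means)   # grand mean of the subgroup means
    r_bar = statistics.fmean(ranges)   # average within-subgroup range
    return (center - A2_SUBGROUP_OF_5 * r_bar,
            center,
            center + A2_SUBGROUP_OF_5 * r_bar)

hourly_pulls = [  # hypothetical: 5 parts measured each hour
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.1, 9.9, 10.0, 10.0],
    [9.9, 10.2, 10.1, 9.8, 10.0],
]
lcl, center_line, ucl = xbar_chart_limits(hourly_pulls)
print(f"LCL = {lcl:.3f}, center = {center_line:.3f}, UCL = {ucl:.3f}")
```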

13
Q

Measures of central tendency

A

Whether or not the center of your process falls close to your target. Looking at the centering of your process

14
Q

Measures of dispersion

A

How much variation there is within your process

15
Q

What performance does six sigma aim for?

A

On target performance

As little variation as possible

16
Q

Do you need to look at both central tendency & dispersion?

A

Yes. Need to look at BOTH measures to understand your data

17
Q

The three measures of central tendency

A

Mean - arithmetic average
x-bar = (sum of all values) / n

Median - middle data point after ordering
Data point to use = (n+1)/2
Ex. n = 7, use 4th data point. When n is even, average the two middle data points.

Mode - most frequently occurring value
A data set can have more than one mode; with two modes it is bimodal.
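
A quick check of all three measures with Python's standard `statistics` module. The data set is hypothetical, chosen with n = 7 so the median is the 4th ordered value and with two modes to show a bimodal case.

```python
import statistics

data = [3, 5, 5, 6, 7, 8, 8]  # hypothetical data, n = 7, already ordered

mean = statistics.mean(data)        # sum of all values / n = 42 / 7
median = statistics.median(data)    # (n + 1) / 2 = 4th ordered data point
modes = statistics.multimode(data)  # every value tied for most frequent

print(mean, median, modes)  # 6 6 [5, 8]
```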

18
Q

The three measures of dispersion

A

RANGE
= Maximum value - minimum value
Lets us ask: Which process is more tightly grouped around the mean?

STANDARD DEVIATION
Gives a little more information about how much each data point varies from the mean
AKA Sigma values
Calculation:
s = square root of [ (sum of (Xi - X-bar)^2 over all values) / (n - 1) ]
Xi = a score in the distribution
The smaller the number, the less variation.

VARIANCE
The average of the squared differences from the mean.
Central to projects - goal is to reduce variation around the mean.
No square root is taken, so variance is expressed in squared units rather than the units of the data.
Calculation: same as standard deviation but WITHOUT the square root
s^2 = (sum of (Xi - X-bar)^2 over all values) / (n - 1)
Xi = a score in the distribution
The smaller the number, the less variation.
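
The three calculations can be written out step by step. A minimal sketch with hypothetical data; the function name `dispersion` is made up, and the result is cross-checked against the standard library's sample variance, which also divides by n - 1.

```python
import math
import statistics

def dispersion(data):
    """Return (range, sample variance, sample standard deviation)."""
    n = len(data)
    x_bar = sum(data) / n
    squared_diffs = sum((x - x_bar) ** 2 for x in data)
    variance = squared_diffs / (n - 1)  # divide by n - 1 for a sample
    return max(data) - min(data), variance, math.sqrt(variance)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
data_range, s_squared, s = dispersion(values)
print(data_range, round(s_squared, 3), round(s, 3))

# Cross-check against the standard library.
assert math.isclose(s_squared, statistics.variance(values))
```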

19
Q

Frequency distribution table

A

TWO COLUMNS
1st - Classes
2nd - Frequency of classes in data
(Optional 3rd) - Frequency as percentage

Usually collected with a check sheet

ex Determining how often a park is visited over time by certain classes of visitor.

Histograms are the most common illustration of frequency distributions; pie charts are also used.

20
Q

How to make a frequency distribution table

A
  1. Organize the data into class intervals (ex 0-9, 10-19, etc)
    - Remember, intervals must be mutually exclusive (it’s impossible to fall into 2 categories)
  2. Record the data in the tally column
  3. Calculate frequency (percentage)
21
Q

Tips for class intervals

A

Class intervals should be based on the number of data points.

  • < 100 data points - 5-10 classes
  • > 100 - 10-25 classes
  • OR classes = square root of number of data points

To determine class interval

  • Class interval = range / number of classes = (maximum value - minimum value) / number of classes
  • Make sure classes are mutually exclusive
  • Include all data points
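
These tips can be combined into a small frequency table builder. A sketch under assumptions: it uses the square-root rule for the number of classes and equal-width classes, and it puts the maximum value in the last class so that every data point is included. The function name and data are hypothetical.

```python
import math

def frequency_table(data, n_classes=None):
    """Return rows of (lower bound, upper bound, frequency, percentage)."""
    if n_classes is None:
        n_classes = max(1, round(math.sqrt(len(data))))  # square-root rule
    low, high = min(data), max(data)
    width = (high - low) / n_classes  # class interval = range / number of classes
    if width == 0:                    # all values identical: a single class
        return [(low, high, len(data), 100.0)]
    counts = [0] * n_classes
    for x in data:
        # Each value falls in exactly one class; clamp the maximum into the last one.
        index = min(int((x - low) / width), n_classes - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, c, 100 * c / len(data))
        for i, c in enumerate(counts)
    ]

visits = [1, 3, 4, 7, 8, 9, 12, 13, 15, 16, 18, 21, 22, 25, 28, 33]  # hypothetical
for lower, upper, freq, pct in frequency_table(visits):
    print(f"{lower:5.1f} to {upper:5.1f}: {freq:2d} ({pct:.1f}%)")
```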
22
Q

Cumulative frequency

A

Builds off of the frequency distribution table, but it provides information on the cumulative data.

Used to determine the number of observations above or below a particular value of the data set.

Helpful in understanding the behavior of the data.

Adds columns for

  • Cum frequency
  • Cum percentage
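
The two extra columns are running totals of the frequency column. A minimal sketch; the class frequencies and the function name are hypothetical.

```python
from itertools import accumulate

def cumulative_columns(frequencies):
    """Return rows of (frequency, cumulative frequency, cumulative percentage)."""
    total = sum(frequencies)
    running = list(accumulate(frequencies))  # running total of the frequencies
    return [(f, c, 100 * c / total) for f, c in zip(frequencies, running)]

class_frequencies = [5, 5, 3, 3]  # hypothetical counts per class
for freq, cum_freq, cum_pct in cumulative_columns(class_frequencies):
    print(f"{freq:3d} {cum_freq:3d} {cum_pct:6.1f}%")
```

The last row always reaches the full count and 100%, which is a handy check that no data point was missed.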
23
Q

Scatter diagrams

A

A way to graphically understand if there is a relationship between two variables.

X - Independent variable (Causal variable)
Y - Dependent variable
(Result)

Straight line through the dots

  • Calculated by determining the “best fit” line
  • The slope shows the direction of the relationship; how tightly the points hug the line shows how strongly they’re correlated
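
A sketch of the "best fit" calculation, assuming ordinary least squares (the cards do not name a fitting method). It also returns the Pearson correlation coefficient r, which runs from -1 to +1 and measures how closely the points follow the line. The data and function name are hypothetical.

```python
import math

def best_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # sign matches the slope; size shows strength
    return slope, intercept, r

temperature = [1, 2, 3, 4, 5]         # X: independent (hypothetical)
defects = [2.1, 3.9, 6.2, 7.8, 10.1]  # Y: dependent (hypothetical)
slope, intercept, r = best_fit(temperature, defects)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")
```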
24
Q

What questions do scatter plots answer?

A
  • Is there a relationship?
  • Is there a common pattern?

25
Q

What are the types of correlation?

A

High positive: closely grouped points

  • Ascending from left to right
  • Means when you increase this process parameter, you’re increasing the output characteristic

High negative: still closely grouped

  • Descends from left to right
  • As you increase process parameter, you’re decreasing output characteristic

Low Correlations

  • Still have a best fit line, but the data points are widely scattered
  • The distance between points and the best fit line is large
  • This is a weak relationship and should not be used for estimation

No Correlation
- Uniformly scattered data points with no discernible line

Non-Linear Correlation

  • Points are still closely grouped, and move together
  • But the prediction equation is non-linear (a curve rather than a straight line)

Outliers
Points very far from the cluster.
Require more information to understand (inaccurate measurements? process failure?)

26
Q

Normal probability plot

A

A graphical way of comparing a data set against a theoretical distribution (here, the normal distribution) based on empirical data

Usually built as a scatter plot with a best fit line

27
Q

Creating a probability plot

A

EXAMPLE: complaints tallied over 8 days
Start with the cumulative frequency table.
Convert the cumulative frequencies to cumulative probabilities.

Then graph on the probability plot (day on x-axis, cumulative probability on y-axis) and draw the best fit line.

Do the points fall along a straight line? If so, the data follow a normal distribution.
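
The straight-line check can also be done numerically. The sketch below uses a common variant rather than the exact procedure on this card: each ordered value is paired with the normal score expected at its cumulative probability, taken here as (i - 0.5) / n (one of several plotting-position conventions), and a correlation near 1 means the points are close to a straight line. All names and data are hypothetical.

```python
import math
import statistics

def normal_plot_points(data):
    """Pair each ordered value with the normal score for its cumulative probability."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = statistics.NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def straightness(points):
    """Correlation between normal scores and data; near 1 suggests normality."""
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    mean_z, mean_x = statistics.fmean(zs), statistics.fmean(xs)
    szx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    szz = sum((z - mean_z) ** 2 for z in zs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return szx / math.sqrt(szz * sxx)

skewed = [1, 1, 1, 1, 2, 2, 3, 5, 20, 100]  # hypothetical, clearly not normal
print(f"straightness = {straightness(normal_plot_points(skewed)):.3f}")
```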

28
Q

What question does a probability plot help answer?

A

Trying to understand whether or not you have a normal distribution
- If not, we either transform data or try different techniques

29
Q

Why are probability plots beneficial?

A

Probability plot is beneficial over other graphical techniques because it can use small samples AND is easy to implement.

30
Q

Histograms

A

A way to graphically display information from a frequency distribution

Answers:

  • Is the distribution normal?
  • If something happened in your process, you’ll see it in the data
  • Also compare outputs from 2 different processes
31
Q

How to construct a histogram

A

X-axis (depicts your mutually exclusive variable) can be either intervals from the frequency distribution OR a specified value.

Y-axis is what your frequency values are (number of calls dropped per day, frequency of defective parts per day)

Can be constructed from a check sheet

32
Q

How to interpret a histogram

A

Is the distribution normal? Also ask:

  • What’s the dispersion/spread?
  • Were there changes in the data?
  • You can also see central tendency (mean, mode) & dispersion (range).
  • Multiple peaks/bimodal/multimodal?

Mode = tallest peak of the distribution

Outliers - either on the far left or far right

33
Q

Stem-and-leaf plots

A

Similar to a histogram
Break your data into groups

Stem (1st column) - the leading digit(s) of each data value

Leaf (2nd column) - the final digit of each data value

EXAMPLE
We have a group of numbers between 453 and 527.

Stem - leading two digits (hundreds and tens)

Leaf - final digit (ones place)
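
A sketch of this example in code, assuming non-negative whole numbers with the ones digit as the leaf. The values are hypothetical picks from the 453 to 527 range, and stems with no data are simply left out.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (all but the last digit) with the last digit as leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 453 -> stem 45, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

readings = [467, 453, 502, 458, 527, 461, 467]  # hypothetical data
for stem, leaves in stem_and_leaf(readings).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Longer rows of leaves mark the stems where values occur most often, which is how the plot doubles as a rough histogram.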

34
Q

What is the advantage of a stem and leaf plot?

A

By looking at the “leaf” column, you can see the relative frequency

Can also see the distribution of your data

Can also compare 2 data sets (grades on 2 quizzes a week apart)

35
Q

Benefits of stem and leaf plot

A

Allow for a quick overview of the data

Helps highlight outliers

Used for variable AND categorical data

36
Q

Cons of stem and leaf plot

A

Not good for small data sets

Not good for very large data sets

Need a mid-range data set.

37
Q

Box and whiskers diagram

A
AKA The box plot
Useful in showing how much variation in your data because
- Shows lower 25%
- Middle 50%
- Upper 25%

How spread out it is and where the data lies

38
Q

Making a box and whiskers diagram

A
  1. Calculate the median of the data (middle)
  2. Find the lowest point; the lower whisker spans the bottom 25%
  3. Calculate the median of the values between the lowest point and the overall median; that’s the first quartile, the bottom of the box.
  4. Do the same from the highest point to get the third quartile, the top of the box.
  5. The remaining middle is the interquartile range (25%-75%)
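
The numbers behind a box plot can be sketched as a five-number summary. This assumes one common quartile convention (quartiles are the medians of the lower and upper halves, leaving out the overall median when n is odd); other conventions give slightly different quartiles. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]             # values below the median
    upper_half = ordered[half + (n % 2):]   # values above the median
    return (ordered[0],
            statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half),
            ordered[-1])

cycle_times = [7, 1, 4, 2, 6, 3, 5]  # hypothetical data
minimum, q1, median, q3, maximum = five_number_summary(cycle_times)
print(minimum, q1, median, q3, maximum)   # 1 2 4 6 7
print("interquartile range:", q3 - q1)    # interquartile range: 4
```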
39
Q

How to interpret box and whiskers plots

A

Useful for comparing 2 different samples of data
- Compare the first to the second

Narrow interquartile range = process more in control

See outliers

Identify shape of interquartile range