Six Sigma Statistics and Graphical Presentation Flashcards

Question 1

Q

Sample

Answer

A

Subset of the overall population.

Make sure samples are representative of the population

Question 2

Q

Three most standard descriptive/characteristic statistics

Answer

A

Mean (arithmetic average)
Standard deviation
Variance

Question 3

Q

What are the symbols for the three most common data characteristics for both population parameters and sample statistics?

Answer

A

MEAN
Pop Par: mu
Sample stat: x-bar

STANDARD DEVIATION
PP: sigma
SS: s

VARIANCE
PP: sigma squared
SS: s squared

Question 4

Q

Descriptive statistics

Answer

A

Used to describe the process itself

One of the most common tools: the histogram, which shows variation and centering

Question 5

Q

Inferential statistics

Answer

A

Making inferences about the population from your sample

It’s possible to learn meaningful information from as few as 30 measurements.

Question 6

Q

Compare descriptive vs. inferential statistics

Answer

A

DESCRIPTIVE
Approach: More inductive (induce information)
Goal: Summarize the data to make decisions
Tools/Techniques: Histograms, interrelationship diagrams, process maps, fishbone diagrams
Interpretations: Fairly straightforward, Not as difficult to create

INFERENTIAL
Approach: Deductive (deduce information)
Goal: Infer population characteristics to predict future outcomes
Tools/Techniques: More advanced/complex: chi-squared, binomial, and Poisson distributions, hypothesis testing, confidence intervals, correlation, regression analysis.
Interpretations: Complex

Question 7

Q

Normal distribution

Answer

A

Most of the values in the data set are close to the average for the data, and values become less frequent the farther they are from it. The standard deviation describes how wide the spread is. Also allows for easy inference.
AKA The bell-shaped curve.

The 68-95-99.7 Percent Rule
+/- One St Dev: 68.27%
+/- Two St Dev: 95.45%
+/- Three St Dev: 99.73%
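
The percentages in the rule can be checked numerically. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `coverage` is only illustrative and is not part of the original card.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Fraction of a normal population within +/- k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} st dev: {coverage(k):.2%}")
```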

Question 8

Q

What are the basic tenets of the central limit theorem?

Answer

A

Basic tenets:
The sampling distribution of the mean approaches a normal distribution as the sample size increases.

n = 100, get a curve 
n = 500, a peak appears
n = 1000, a normal distribution appears

As you increase samples, you get closer to a perfect bell curve
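
A small simulation can make this tenet concrete. The sketch below assumes a skewed (exponential) population with mean 1 and a fixed random seed, both chosen only for illustration; the function name `sample_means` is hypothetical. As the sample size grows, the sample means cluster more tightly and more symmetrically around the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, trials=2000):
    """Means of `trials` samples drawn from a skewed (exponential) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(trials)
    ]

for n in (2, 30, 200):
    means = sample_means(n)
    # Center stays near the population mean (1.0); spread shrinks as n grows.
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```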

Question 9

Q

Central limit theorem: how does sample size affect the sampling distribution of the mean?

Answer

A

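n = sample size for the sample mean

n = 4 can give a near-normal sampling distribution when the population itself is roughly symmetric

n = 30 is the usual rule of thumb for treating the sampling distribution of the mean as approximately normal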

Question 10

Q

Basic tenets of confidence interval

Answer

A

Used to state some level of confidence that the mean of your population falls within a certain range

Collect data for sample
Calculate mean and standard deviation of sample
Then make inference
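
The three steps above can be sketched as follows. The sample values are hypothetical, and the interval uses the normal (z) approximation, which is an assumption that is usually reasonable for about 30 or more measurements; smaller samples would typically use a t-based interval instead.

```python
import math
import statistics
from statistics import NormalDist

# Step 1: collect data (hypothetical sample of 30 measurements).
sample = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2] * 5

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the population mean (z-based)."""
    n = len(data)
    mean = statistics.fmean(data)          # Step 2: sample mean
    s = statistics.stdev(data)             # Step 2: sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin    # Step 3: the inference

low, high = mean_confidence_interval(sample)
print(f"95% confidence interval for the mean: {low:.3f} to {high:.3f}")
```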

Question 11

Q

Hypothesis testing

Answer

A

Test a null hypothesis, or a state of nature of which you do not know the true outcome.

H-naught (H0): typically set to test if two values are equal, or if one is greater/less than or equal to the other

H-sub-a (Ha): the alternative to the null hypothesis.

Use data to infer the true state of the population
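
As one possible illustration, the sketch below tests H0 "the population mean equals a target value" against the two-sided alternative. It assumes a large-sample z-test, a 0.05 significance level, and hypothetical data; the function name `one_sample_z_test` is illustrative, and small samples would typically call for a t-test.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 30 measurements (illustrative values only).
sample = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2] * 5

def one_sample_z_test(data, target_mean):
    """Return (z statistic, two-sided p-value) for H0: mean == target_mean."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    z = (mean - target_mean) / (s / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = one_sample_z_test(sample, target_mean=10.0)
alpha = 0.05  # assumed significance level
print("reject H0" if p < alpha else "fail to reject H0")
```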

Question 12

Q

Control chart

Answer

A

Typically plots data pulled at a consistent rate, with samples pooled into subgroups

X-axis is the sequence of plotted values (ex. 20 values), i.e. pulling 5 parts every hour and plotting the mean of those

Center line (mean of process)
UCL (upper control limit)
LCL (lower control limit)

Use the chart to make inferences about the entire population
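
The card does not say how the limits are calculated. One conventional approach, assumed here, is the X-bar and R chart, where the limits sit at the center line plus or minus A2 times the average subgroup range (A2 = 0.577 for subgroups of 5). The subgroup values below are hypothetical.

```python
import statistics

# Hypothetical hourly subgroups of 5 parts each (illustrative values only).
subgroups = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.3, 9.9, 10.1, 10.0],
    [9.7, 10.0, 10.1, 9.9, 10.2],
    [10.2, 10.1, 9.8, 10.0, 10.1],
]

A2 = 0.577  # standard X-bar chart constant for subgroups of size 5

subgroup_means = [statistics.fmean(g) for g in subgroups]
subgroup_ranges = [max(g) - min(g) for g in subgroups]

center_line = statistics.fmean(subgroup_means)  # mean of the process
mean_range = statistics.fmean(subgroup_ranges)  # R-bar

ucl = center_line + A2 * mean_range  # upper control limit
lcl = center_line - A2 * mean_range  # lower control limit

# Subgroup means outside the limits would signal a possible special cause.
out_of_control = [m for m in subgroup_means if not lcl <= m <= ucl]
print(round(lcl, 3), round(center_line, 3), round(ucl, 3), out_of_control)
```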

Question 13

Q

Measures of central tendency

Answer

A

Whether or not the center of your process falls close to your target. Looking at the centering of your process

Question 14

Q

Measures of dispersion

Answer

A

How much variation there is within your process

Question 15

Q

What performance does six sigma aim for?

Answer

A

On target performance

As little variation as possible

Question 16

Q

Do you need to look at both central tendency & dispersion?

Answer

A

Yes. Need to look at BOTH measures to understand your data

Question 17

Q

The three measures of central tendency

Answer

A

Mean - arithmetic average
x-bar = (sum of all values) / n

Median - middle data point when the values are put in order
Data point to use = (n+1)/2 (for an even n, average the two middle values)
Ex. n = 7, use 4th data point

Mode - most frequently occurring value
You can have more than one mode; a distribution with two modes is bimodal.
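
A minimal sketch of the three measures, using Python's standard-library `statistics` module on a hypothetical data set with n = 7 and two modes:

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 8]  # hypothetical values, n = 7

mean = statistics.fmean(data)       # sum of all values / n
median = statistics.median(data)    # middle value of the ordered data
modes = statistics.multimode(data)  # every most-frequent value

# Position rule from the card: (n + 1) / 2 gives the 4th ordered point for n = 7.
position = (len(data) + 1) // 2
median_by_position = sorted(data)[position - 1]  # lists are 0-indexed

print(mean, median, modes, median_by_position)
```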

Question 18

Q

The three measures of dispersion

Answer

A

RANGE
= Maximum value - minimum value
Lets us ask: Which process is more tightly grouped around the mean?

STANDARD DEVIATION
Gives a little more information about how much each data point varies from the mean
AKA Sigma values
Calculation:
s = square root of [ (sum of all (Xi - x-bar)^2) / (n - 1) ]
Xi = a score in the distribution
The smaller the number, the less variation.

VARIANCE
The average of the squared differences from the mean.
Central to projects - goal is to reduce variation around the mean.
No square root is taken, so it is expressed in squared units rather than the units of the data.
Calculation: same as standard deviation but WITHOUT square root
s^2 = (sum of all (Xi - x-bar)^2) / (n - 1)
Xi = a score in the distribution
The smaller the number, the less variation.
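
The three calculations can be sketched as below. The data values are hypothetical, and `sample_variance` is an illustrative helper that follows the n - 1 formula on this card; the standard library's `statistics.variance` and `statistics.stdev` use the same definition.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data_range = max(data) - min(data)  # maximum value - minimum value
variance = sample_variance(data)    # in squared units of the data
std_dev = math.sqrt(variance)       # back in the units of the data

print(data_range, round(variance, 3), round(std_dev, 3))
```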

Question 19

Q

Frequency distribution table

Answer

A

TWO COLUMNS
1st - Classes
2nd - Frequency of classes in data
(Optional 3rd) - Frequency as percentage

Usually collected with a check sheet

Ex. Determining how often a park is visited over time by certain classes of visitor.

Histograms are the most common illustration of frequency distributions; pie charts are also used.

Question 20

Q

How to make a frequency distribution table

Answer

A

Organize the data into class intervals (ex 0-9, 10-19, etc)
- Remember, intervals must be mutually exclusive (it’s impossible to fall into 2 categories)
Record the data in the tally column
Calculate frequency (percentage)

Question 21

Q

Tips for class intervals

Answer

A

Class intervals should be based on the number of data points.

< 100 data points - 5-10 classes
> 100 - 10-25 classes
OR classes = square root of number of data points

To determine class interval

Class interval = range / number of classes, i.e. (maximum value - minimum value) / number of classes
Make sure classes are mutually exclusive
Include all data points
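
These tips can be sketched in code. The data set is hypothetical, the number of classes comes from the square-root rule, and the interval is rounded up to the next whole number (an assumption made here so the largest value always falls inside the last class). Classes are half-open, [lower, upper), which keeps them mutually exclusive.

```python
import math

# Hypothetical data set of 20 measurements (illustrative values only).
data = [12, 15, 21, 22, 25, 27, 31, 33, 34, 36,
        38, 41, 42, 44, 47, 52, 55, 58, 61, 67]

num_classes = round(math.sqrt(len(data)))  # square-root rule
# Range / number of classes, bumped up to the next whole number.
width = math.floor((max(data) - min(data)) / num_classes) + 1

def frequency_table(values, start, width, num_classes):
    """Return (lower, upper, count) rows for half-open classes [lower, upper)."""
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        rows.append((lower, upper, count))
    return rows

table = frequency_table(data, min(data), width, num_classes)
for lower, upper, count in table:
    print(f"{lower}-{upper - 1}: {count} ({count / len(data):.0%})")
```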

Question 22

Q

Cumulative frequency

Answer

A

Builds off of the frequency distribution table, but it provides information on the cumulative data.

Used to determine the number of observations above or below a particular value of the data set.

Helpful in understanding the behavior of the data.

The table ends with additional columns for

Cum frequency
Cum percentage
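
A minimal sketch of the two extra columns, assuming a hypothetical frequency table with four classes:

```python
from itertools import accumulate

# Hypothetical frequency table (illustrative values only).
classes = ["0-9", "10-19", "20-29", "30-39"]
freq = [3, 7, 6, 4]

total = sum(freq)
cum_freq = list(accumulate(freq))               # running total of frequencies
cum_pct = [100 * c / total for c in cum_freq]   # running total as a percentage

for label, f, cf, cp in zip(classes, freq, cum_freq, cum_pct):
    print(f"{label:>6}  {f:>3}  {cf:>3}  {cp:>5.1f}%")
```

Reading down the cumulative column answers questions such as "how many observations fall below 20?" (here, 10 of 20).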

Question 23

Q

Scatter diagrams

Answer

A

A way to graphically understand if there is a relationship between two variables.

X - Independent variable (Causal variable)
Y - Dependent variable (Result)

Draw a straight line through the dots

Calculated by determining the “best fit” line
The slope shows the direction of the relationship; how tightly the points cluster around the line shows how strongly they’re correlated
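
The card does not name a fitting method. The sketch below assumes ordinary least squares for the best fit line and the Pearson correlation coefficient r for the strength of the relationship; the data and the function name `best_fit` are hypothetical.

```python
import math

def best_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # close to +1 or -1 means tightly grouped
    return slope, intercept, r

# Hypothetical process parameter (x) and output characteristic (y).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = best_fit(xs, ys)
print(round(slope, 2), round(intercept, 2), round(r, 4))
```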

Question 24

Q

What questions do scatter plots answer?

Answer

A

Is there a relationship?

- Is there a common pattern?

Question 25

Q

What are the types of correlation?

Answer

A

High positive: closely grouped points

Ascending from left to right
Means when you increase this process parameter, you’re increasing the output characteristic

High negative: still closely grouped

Descends from left to right
As you increase process parameter, you’re decreasing output characteristic

Low Correlations

Still have a best fit line, but the data points are widely scattered
The distance between points and the best fit line is large
This is a weak relationship and should not be used for estimation

No Correlation
- Uniformly scattered data points with no discernible line

Non-Linear Correlation

Points are still closely grouped, and move together
But, prediction equation is non-linear (squiggly line)

Outliers
Points very far from the cluster.
Require more information to understand (inaccurate measurements? process failure?)

Question 26

Q

Normal probability plot

Answer

A

A graphical way of comparing a data set against a theoretical (normal) distribution, based on empirical data

Usually built from scatter plots

Question 27

Q

Creating a probability plot

Answer

A

EXAMPLE: COUNTING COMPLAINTS OVER 8 DAYS
Start with cumulative frequency table.
We end up with cumulative probability

Then we graph on the probability plot (day on x axis, cumulative probability on y-axis). Then we draw the best fit line

Did we end up with a straight line? If so, the data are approximately normally distributed.
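
As a rough companion to the steps above, the sketch below uses the common quantile-based form of the normal probability plot rather than the day-by-day table on this card: each ordered value is paired with the z-score of its cumulative probability, and the straightness of the points is summarized with a correlation. The plotting position (i + 0.5) / n, the data, and both function names are assumptions.

```python
import math
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered value with the z-score of its cumulative probability."""
    ordered = sorted(data)
    n = len(ordered)
    # (i + 0.5) / n keeps every probability strictly between 0 and 1.
    return [(NormalDist().inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

def straightness(points):
    """Pearson correlation of the points; near 1 suggests a straight line."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    szz = sum((z - mean_z) ** 2 for z, _ in points)
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    szx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    return szx / math.sqrt(szz * sxx)

complaints = [12, 15, 9, 14, 11, 13, 16, 10]  # hypothetical daily counts
print(round(straightness(normal_plot_points(complaints)), 3))
```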

Question 28

Q

What question does a probability plot help answer?

Answer

A

Trying to understand whether or not you have a normal distribution
- If not, we either transform data or try different techniques

Question 29

Q

Why are probability plots beneficial?

Answer

A

Probability plot is beneficial over other graphical techniques because it can use small samples AND is easy to implement.

Question 30

Q

Histograms

Answer

A

A way to graphically display information from a frequency distribution

Answers:

Is the distribution normal?
If something happened in your process, you’ll see it in the data
Also compare outputs from 2 different processes

Question 31

Q

How to construct a histogram

Answer

A

X-axis (depicts your mutually exclusive variable) can be either intervals from the frequency distribution OR a specified value.

Y-axis is what your frequency values are (number of calls dropped per day, frequency of defective parts per day)

Can be constructed from a check sheet

Question 32

Q

How to interpret a histogram

Answer

A

Check whether the distribution is normal, then ask:

What’s the dispersion/spread?
Were there changes in the data?
You can also see central tendency (mean, mode) & dispersion (range).
Multiple peaks/bimodal/multimodal?

Mode = tallest peak of the distribution

Outliers - either on the far left or far right

Question 33

Q

Stem-and-leaf plots

Answer

A

Similar to a histogram
Break your data into groups

Stem (1st column) - the initial part of each data point

Leaf (2nd column) - the final digit of each data point

EXAMPLE
We have a group of numbers between 453 and 527.

Stem - Leading two digits (hundreds and tens)

Leaf - final digit (ones place)
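
The example can be sketched in a few lines. The specific numbers between 453 and 527 are hypothetical; `divmod(v, 10)` splits each value into its stem (hundreds and tens) and its leaf (ones place).

```python
from collections import defaultdict

# Hypothetical numbers between 453 and 527 (illustrative values only).
data = [453, 457, 461, 468, 468, 472, 479, 485, 490, 493, 501, 506, 514, 527]

def stem_and_leaf(values):
    """Group each value's final digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```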

Question 34

Q

What is the advantage of a stem and leaf plot?

Answer

A

By looking at the “leaf” column, you can see the relative frequency

Can also see the distribution of your data

Can also compare 2 data sets (grades on 2 quizzes a week apart)

Question 35

Q

Benefits of stem and leaf plot

Answer

A

Allow for a quick overview of the data

Helps highlight outliers

Used for variable AND categorical data

Question 36

Q

Cons of stem and leaf plot

Answer

A

Not good for small data sets

Not good for very large data sets

Need a mid-range data set.

Question 37

Q

Box and whiskers diagram

Answer

A

AKA The box plot
Useful in showing how much variation in your data because
- Shows lower 25%
- Middle 50%
- Upper 25%

How spread out it is and where the data lies

Question 38

Q

Making a box and whiskers diagram

Answer

A

Calculate median of data (middle)
Find the lowest point (the end of the lower whisker, which covers the bottom 25%)
Calculate the median between the overall median and the lowest point; that’s the first quartile (bottom of the box).
Do the same from the highest point to get the third quartile (top of the box).
Remaining middle is the interquartile range (25%-75%)
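
The steps above can be sketched as a five-number summary. Quartile conventions vary between textbooks and software; this sketch assumes the median-of-halves convention (the overall median is left out of both halves when n is odd), and the data and function name are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + n % 2:]  # skip the middle value when n is odd
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical values
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # interquartile range: the middle 50% of the data
print(minimum, q1, median, q3, maximum, iqr)
```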

Question 39

Q

How to interpret box and whiskers plots

Answer

A

Useful for comparing 2 different samples of data
- Compare 1st to second

Narrow interquartile range = less variation in the process

See outliers

Identify shape of interquartile range

By Ron Crabtree