Six Sigma Correlation, Regression, and Hypothesis Testing Flashcards

1
Q

Summary of correlation

A

Investigate the relationship between x factors (inputs) and y (outputs)

Does a relationship exist?

What is it?

What factor has the biggest impact?

2
Q

Famous correlation maxim

A

Correlation does not equal causation

3
Q

When to use?

A
  • Relating x input to y output
  • Look at their relationship over time
  • Identify key x inputs
  • Measure outputs
4
Q

Design of Experiments

A

A rigorous methodology for identifying x factors (inputs) and their effects on y outputs

5
Q

Examples of correlation

A
  • Hours of worker experience correlated to incorrectly installed modules
  • Visual acuity test results correlated to output
  • Age to blood pressure
  • Sales success to level of education/years of experience
6
Q

Scatter plots reintroduction

A
  • Plot
  • Is there a relationship?
  • What type of relationship? (Positive/Negative)
  • How strong?
7
Q

Example of nonlinear correlation

A

Nonlinear relationships are much more complex

EXAMPLE: Oil changes on engine life

  • Manufacturer recommends every 4k miles
  • We know changing every 20k miles has a negative effect
  • But what happens if we change every 1k, 2.5k miles?
8
Q

Correlation coefficient

A

AKA Pearson correlation coefficient

An expression of the linear relationship in our data

Values fall between -1 and +1

Helps us understand the strength or weakness of the relationship (helps us distinguish between factors)

9
Q

Interpreting correlation coefficient

A
1 = perfect positive line (fit of piston in an engine)
.82 = strong positive relationship, but not perfectly tight
0 = no correlation
-.82 = strong negative relationship, equally strong but in the opposite direction
-1 = perfect negative line (noise in environment's effect on concentration)
10
Q

Tips on the correlation coefficient

A
  • Only works for linear relationships
  • Highly sensitive to outliers

11
Q

Calculating the correlation coefficient

A

r = Σ(Xi − x-bar)(Yi − y-bar) / √[ Σ(Xi − x-bar)² · Σ(Yi − y-bar)² ]

Difficult to calculate by hand

GENERAL SUMMARY
The covariance of the two variables divided by the product of their standard deviations. We need:
- Xi- individual values of first variable
- Yi - individual values of the second variable
- n - the number of pairs of data in the dataset

There is a Pearson coefficient lookup table
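A quick way to sanity-check this formula is a minimal Python sketch (the paired data values below are invented purely for illustration; numpy.corrcoef gives the same result):

```python
# Minimal sketch: computing the Pearson correlation coefficient r by hand.
# The paired data values below are invented purely for illustration.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g., hours of worker experience
y = [9.0, 7.5, 6.0, 4.5, 3.5]   # e.g., incorrectly installed modules

n = len(x)                      # number of pairs of data
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.3f}")           # close to -1: a strong negative linear relationship
```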

12
Q

Causation

A

The act of causing/agency that produces the effect.

Understanding/determining which x variables produce which y outputs in our processes.

13
Q

Key components between causation and the x/y factors

A
  1. Asymmetrical (correlation is symmetrical but DOES NOT indicate causation; causation is asymmetrical, or one-directional)
  2. Causation is NOT reversible. (Hurricane causes the phone lines to go down, but not vice versa)
  3. Can be difficult to determine causation. (Is there a third, unknown variable?)
  4. Correlation CAN help POINT to causation. We rule out data that is unrelated.
14
Q

Common mistakes when looking for causation

A
  1. Genuine causation - clear, uncomplicated data to support proposal of causation
  2. Common response - occurs when both x and y react the same way to an unseen, third variable
  3. Confounding - the effect of one variable, x, on y is mixed up with the effects of other explanatory variables on the y output we're looking for in the process
15
Q

The statistical significance of correlation

A

First Ask: Are we focusing in on the right variables?

Then: Which of our correlation coefficients are subject to chance?

Next: What’s the significance of the correlations?

16
Q

P-Value

A

Used to determine and measure the significance (not necessarily the importance) of a relationship between two variables.

It provides statistical evidence of the relationship.

We look for a p-value of less than 0.05 when the alpha factor is 0.05, because we're shooting for 95% confidence.

17
Q

What effect illustrates the importance of asking ‘Is the correlation by chance?’?

A

Known as the Hawthorne Effect - paying attention to something will often increase performance.

Hawthorne Electric, early 1920s: researchers hypothesized that increasing lighting increases productivity

  • Got a baseline on productivity
  • Then upped lighting by 10%
  • Kept doing so until they couldn't go any higher
  • Then asked: what happens if we turn the lights back down to where we started?
  • When they changed it back, productivity increased AGAIN
  • This blew up the lighting = productivity hypothesis

The real correlation: when you pay attention to people and their productivity, they become more productive.

18
Q

What other question is important to ask regarding correlation?

A

What are the chances of finding a correlation value OTHER than what we estimated in our example?

EX: Someone’s height vs self-esteem

19
Q

Regression analysis

A

Forecast the change in the dependent variable in our process.

Describe the relationship between predictor variables ( x ) and output y (response variable).

20
Q

Simple linear regression

A

Gets us a best-fit line (the line going through the center of the scatter plot). Only one x (predictor) per y (response).

Vs. Multiple: where many x's predict one y

EXAMPLE
If we're only comparing height and weight, that's simple linear regression.

If we want to use height, age, and gender against weight, that's three different factors, so we use multiple linear regression.

21
Q

Simple linear regression formula

A

y = B0 + B1·x + e (where B0 is the intercept, B1 is the slope, and e is the error term)

The beta coefficients capture the effect of x on y. We run the formula for various candidate lines, looking for the best fit.

Testing for the best fit, determined by the lowest sum of the squared residuals.
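A minimal Python sketch of the "lowest sum of squared residuals" idea, using invented data and a few arbitrary candidate (B0, B1) pairs:

```python
# Minimal sketch of 'best fit = lowest sum of squared residuals'.
# The data and candidate (B0, B1) pairs are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.8]

def sse(b0, b1):
    """Sum of squared residuals for the candidate line y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

for b0, b1 in [(0.0, 2.0), (0.5, 1.8), (0.1, 1.95)]:
    print(f"B0={b0}, B1={b1}: SSE = {sse(b0, b1):.3f}")
# The candidate line with the smallest SSE fits best; least-squares regression
# finds the (B0, B1) pair that minimizes SSE exactly.
```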

22
Q

Considerations for simple linear least-squares

A
  1. Nonlinear relationship between x factors and y outputs
  2. Importance of outlier data
  3. Consider the inconsistency of the variance in the residuals
23
Q

How does the simple linear least-squares regression help?

A

It's too cumbersome to run the simple linear regression formula for every possible line. There's a simpler way to find the best fit:

Simple linear least-squares regression

  • b-zero and b-one are the sample estimates of the true beta-zero and beta-one
  • Where beta-zero is the true value of the y intercept and beta-one is the true value of the slope
24
Q

Predicting Outcomes with Regression Analysis/Models

A

Regression calculation that allows us to isolate sources of variation

Ex: Sales Forecasting

  • Identifying controllable factors and their effects on sales is a valid exercise
  • What factors come into play for sales success?
25
Q

Key components

A
  1. Apply a linear equation to the data set (obtain a least-squares line)
  2. Helps us predict future values of y based on existing x factors
26
Q

Regression coefficient

A
  1. There are various ways of getting b (the regression coefficient)
  2. Understand the y intercept value (expressed as little a)

Before we use the model, we have to know a and b

27
Q

How to plot and develop data for regression model

A
  1. Plot the scatter diagram (not the line, just the dots)
  2. Get x-bar and y-bar (sum up the totals, divide by the count)
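A minimal Python sketch of these steps, assuming the standard least-squares formulas b = Σ(x − x-bar)(y − y-bar) / Σ(x − x-bar)² and a = y-bar − b·x-bar; the data values are invented:

```python
# Minimal sketch of the two steps above plus the least-squares coefficients,
# assuming the standard formulas b = Sxy/Sxx and a = y_bar - b*x_bar.
# Data values are invented for illustration.
x = [10, 20, 30, 40, 50]          # e.g., process temperature
y = [1.2, 1.5, 1.9, 2.4, 2.6]     # e.g., wall thickness

n = len(x)
x_bar = sum(x) / n                # step 2: x-bar
y_bar = sum(y) / n                # step 2: y-bar

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx                     # regression coefficient (slope)
a = y_bar - b * x_bar             # y intercept

print(f"a (intercept) = {a:.3f}, b (slope) = {b:.4f}")
print(f"predicted y at x = 35: {a + b * 35:.3f}")
```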

28
Q

Real life examples for regression models

A
  1. Reducing handle/hold times in a contact center
    x factor - time to get the CSR's computer turned on
  2. Understanding how processing temperature affects wall thickness in pipe production
    Cost of the material going into the pipe is 50-60% of sales
    Can we find an optimal process temperature that gives a better wall thickness?
29
Q

Hypothesis Testing & Inferential Statistics revolve around what 4 things?

A
  1. Draw conclusions about population based on sample data
  2. Test a claim about population parameter
  3. Provide evidence to support opinion
  4. Check for statistical significance
30
Q

What 6 Sigma phase does the hypothesis testing take place during?

A

DMAIC: the Analyze phase

Example hypothesis: What would be the effect on customer satisfaction if we reduced the time to answer the phone, or the time it takes to provide a quality answer?

Another hypothesis: If we were able to tighten our control over temperature in pipe production, would we be able to minimize costs while maintaining customer satisfaction levels?

31
Q

Describe the descriptive vs relational categories of hypothesis testing

A

Descriptive
What we can physically measure about something (size, form, distribution).
- And our ability to manipulate those measurements

Relational

  • What’s the relationship between the variables?
  • Positive or negative?
  • Greater or lesser than a given value?
  • Ex: the effect of reducing handle times in a customer contact center on satisfaction
32
Q

Types of hypothesis tests

A
  • 1-sample hypothesis test for means
  • 2-sample hypothesis tests for the means
  • Paired t-test
  • Test for proportions
  • Test for variances
  • ANOVA - Analysis of Variances
33
Q

Paired T-test

A

We use two sample means to prove or disprove a hypothesis about two related populations of data.

  • Do we see shifts based on the hypothesis?
  • Ex: Is there a relationship between the handle time for a call and the CSR's experience? (See the sketch below.)
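A minimal paired t-test sketch in Python, assuming SciPy is available; the before/after handle times are invented for illustration:

```python
# Minimal sketch of a paired t-test, assuming SciPy is available.
# Handle times (minutes) for the same CSRs before and after extra training;
# all numbers are invented for illustration.
from scipy import stats

before = [8.2, 7.9, 9.1, 8.5, 7.6, 8.8, 9.4, 8.0]
after  = [7.6, 7.8, 8.4, 8.1, 7.5, 8.2, 8.9, 7.7]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null: the mean handle time shifted between the paired samples.")
else:
    print("Fail to reject the null: no evidence of a shift.")
```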
34
Q

The 5 steps of hypothesis testing

A
  1. Establish our null and alternative hypotheses (Ho & Ha)
  2. Testing our considerations (what are the things we want to test for, and how will we manage the process)
  3. Calculate test statistics
  4. Decide whether to apply the critical-value or p-value method, comparing our desired confidence level to the test results
  5. Interpret results
35
Q

Null hypothesis

A
  • “What they say” and expresses the status quo
  • Assumes any observed differences are due to chance or random variation
  • Often expressed as = , >= or <=
36
Q

Alternative hypothesis

A
  • “What we want to test/prove”
  • Assumes the observed differences are real and NOT due to chance/random variation
  • Often expressed as !=, > or <
37
Q

The null hypothesis

A

Null hypothesis - assuming population parameters of interest are equal and there is no change or difference

Ex: Humidity will not have an effect on the weight of the parts we measure
Ex: The country you live in would not have an effect on your level of life satisfaction

38
Q

The alternative hypothesis

A

Represented by H with subscript “a”.

Wants to look at parameters of interest that are not equal, assuming the difference is real.

Ex: Assume greater level of CSR experience directly correlates to quality of work output

39
Q

Goals in hypothesis testing

A
  1. Reject the null in favor of the alternative (Ha)
  2. Need to prove the result is statistically significant
  3. We go in expecting to find "no difference," but the evidence may lead us to reject the null
  4. Fail to reject: we find insufficient evidence to reject the null hypothesis, i.e. insufficient evidence that the alternative is true
  5. Presenting the results - Even though it's more natural-sounding to state results in terms of Ha, we actually express them in terms of whether or not we reject the null hypothesis:
    - “We reject the null hypothesis.” OR
    - “We fail to reject the null hypothesis.”
40
Q

The types of error

A

Type I error (alpha risk) - rejecting the null when it is actually true

Type II error (beta risk) - failing to detect the effect we're looking for

41
Q

Type I Error (alpha/constant risk)

A

The risk we’re willing to take in rejecting the null hypothesis when it’s actually true (producer’s risk)

  • False alarm
  • False positive
  • Error with the alpha factor

Common alpha factor = 0.05
- Testing: what’s the possibility of making a type 1 error at that confidence level

42
Q

Alpha significance level

A

Signifies the degree/risk of failure that’s acceptable to us in the study at hand.
Helps decide if null can be rejected

43
Q

1-alpha Confidence level acceptance region

A

Signifies the level of assurance we expect from the results of the data being studied
- Describes the uncertainty of the sampling method you're using

44
Q

Type II error (beta risk)

A

Most common beta risk value is 0.10

  • Similar to failing to find the defective piece when producing a product
  • AKA Consumer risk
  • False negative
45
Q

Are alpha and beta inversely proportional?

A

Yes

46
Q

How do test tails work

A

If Ha: mu > mu-0 (hypothesized mean), use a one-tailed test to the right
If Ha: mu < mu-0, use a one-tailed test to the left
If Ha: mu != mu-0, use a two-tailed test (we'll find defects on both sides of the data curve)

47
Q

How do we use the concept of critical value?

A

Used to compute the margin of error

Margin of error = critical value × standard deviation (or standard error) of the statistic

48
Q

How is the critical value test statistic derived?

A

Z = (x-bar - mu)/(sigma/sq root of n)
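A minimal Python sketch of the critical-value method using this Z statistic, assuming SciPy for the normal critical value; the sample numbers are invented:

```python
# Minimal sketch of the critical-value method for a one-sample z-test.
# x_bar, mu_0, sigma, and n are invented for illustration.
from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n = 102.3, 100.0, 8.0, 40
alpha = 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))   # the test statistic from this card
z_crit = norm.ppf(1 - alpha / 2)         # two-tailed critical value (about 1.96)

print(f"z = {z:.2f}, critical value = +/-{z_crit:.2f}")
if abs(z) > z_crit:
    print("z falls outside the acceptance region: reject the null hypothesis.")
else:
    print("z falls inside the acceptance region: fail to reject the null hypothesis.")
```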

49
Q

What is the acceptance region?

A

The confidence level of a test is 1 − alpha

If alpha factor is 5%, resulting confidence factor would be 95%

50
Q

Two-tailed test

A

If Ha: mu != mu-0

Testing on both sides of the mean

51
Q

The power of a test

A

The ability of a test to lead to the correct decision.

The power (or sensitivity) of a statistical test is the probability of rejecting the null hypothesis when it is actually false.

Higher power increases the likelihood of correctly rejecting a false null hypothesis.

52
Q

Four factors on the power of a test

A
  1. Sample size
  2. Population differences
  3. Variability
  4. Alpha level
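A hedged way to see how these four factors interact is a small Monte Carlo sketch (Python, assuming numpy and SciPy are available; all parameter values are invented):

```python
# Minimal Monte Carlo sketch: estimate the power of a two-sample t-test under
# different sample sizes, population differences, variability, and alpha levels.
# All parameter values are invented for illustration.
import numpy as np
from scipy import stats

def estimated_power(n, diff, sd, alpha, sims=2000, seed=0):
    """Fraction of simulated experiments in which the null is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(sims):
        group_a = rng.normal(0.0, sd, n)       # population A
        group_b = rng.normal(diff, sd, n)      # population B, shifted by 'diff'
        _, p = stats.ttest_ind(group_a, group_b)
        if p < alpha:
            rejections += 1
    return rejections / sims

print(estimated_power(n=20, diff=0.5, sd=1.0, alpha=0.05))  # baseline
print(estimated_power(n=80, diff=0.5, sd=1.0, alpha=0.05))  # larger sample: more power
print(estimated_power(n=20, diff=0.5, sd=2.0, alpha=0.05))  # more variability: less power
print(estimated_power(n=20, diff=0.5, sd=1.0, alpha=0.01))  # stricter alpha: less power
```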
53
Q

Sample size

A

The most important factor in the power of a test.

54
Q

Population differences

A

Particularly important when planning the study.

Sample must be large enough to avoid type II errors.

55
Q

Variability

A

Less variability = more power

Ex. If looking at equivalency exams in schools, we know that sample size and variance matter

56
Q

Alpha level

A

Most common: 0.05
Used to determine critical value

Plenty of instances where alpha factor of 0.05 results in rejecting the null hypothesis, but an alpha factor of 0.01 would not.

57
Q

P-value, what is it, what do we use it for

A

Used to determine statistical significance.

Use it to evaluate how well the data supports the null hypothesis.

58
Q

Key things in p-value

A

Effect size
Sample size
Variability of data

59
Q

What does a low p-value mean?

A

Indicates that the sample data contains enough evidence to reject the null for the population.

60
Q

Rhyming maxim for p-value interpretation

A

“If the p is high, null will fly.”

“If p is low, null will go.”

61
Q

Examples of p value

A

If the p-value is less than the alpha factor (0.05 in this case), then we reject the null hypothesis.

If the p-value is greater than the alpha factor, then we fail to reject the null hypothesis.
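A minimal sketch of this decision rule in Python, assuming SciPy for a one-sample t-test; the sample data and hypothesized mean are invented:

```python
# Minimal sketch: obtain a p-value from a one-sample t-test (SciPy assumed)
# and apply the decision rule. The sample data and the hypothesized mean of
# 12.0 are invented for illustration.
from scipy import stats

sample = [12.1, 11.8, 12.4, 12.6, 11.9, 12.3, 12.5, 12.2]
alpha = 0.05

t_stat, p_value = stats.ttest_1samp(sample, popmean=12.0)
print(f"p = {p_value:.4f}")

if p_value < alpha:
    print("p is low, the null will go: reject the null hypothesis.")
else:
    print("p is high, the null will fly: fail to reject the null hypothesis.")
```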