Collecting Data 1 Flashcards

1
Q

Scales of Measurement

A
  • In order of desirability (least to most desirable)
  • Nominal
  • Ordinal (Ranking)
  • Interval
  • Ratio
2
Q

Nominal Scale of Measurement

A
  • Data that consists of names or categories only
  • Allows us to classify the object
    • E.g. whether or not a beach is famous
  • Does not allow ranking
    • E.g. does not rank how famous the beach is
  • Intervals between values cannot be determined
  • No ordering scheme is possible
  • E.g. the colors of M&Ms in a bag
3
Q

Ordinal (Ranking) Scale of Measurement

A
  • Data arranged in order
  • Differences between the values cannot be determined or are meaningless
  • A ranking scale
    • E.g. Likert customer satisfaction scale
      • A rating of 4 does not mean the customer is twice as satisfied as a rating of 2.
    • E.g. Software defect categories
      • 3 UI, 4 data, 1 browser compatibility
4
Q

Interval Scale of Measurement

A
  • Data type which is measured along a scale, in which each point is placed at equal distance from one another
    • Always takes the form of numbers or numerical values, where the distance between any two adjacent points is standardized and equal
  • Has equal, meaningful intervals
  • Data is arranged in order and differences can be found
  • No true zero starting point (zero is arbitrary)
  • Values can be added or subtracted, but not meaningfully multiplied or divided
  • Ratios are meaningless
    • E.g. Pizza temperatures: if one pizza is 100 degrees, a 300 degree pizza is not 3 times as hot
  • Examples:
    • Temperature (in Celsius or Fahrenheit)
    • IQ test
    • Grade level, 1st, 2nd, 3rd grade
    • Dates
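The pizza example can be checked numerically. This is a minimal Python sketch, assuming two hypothetical readings of 10 °C and 20 °C: converting them to Fahrenheit (another interval scale) preserves their difference but not their ratio, which is why ratios are meaningless on an interval scale.

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32

cool, warm = 10.0, 20.0  # hypothetical readings in °C

# Differences survive a change of units: 10 °C apart is always 18 °F apart.
diff_c = warm - cool
diff_f = c_to_f(warm) - c_to_f(cool)

# Ratios do not: "twice as hot" in Celsius is not twice as hot in Fahrenheit.
ratio_c = warm / cool
ratio_f = c_to_f(warm) / c_to_f(cool)

print(f"difference: {diff_c} °C = {diff_f} °F")
print(f"ratio: {ratio_c:.2f} in °C vs {ratio_f:.2f} in °F")
```

Here 20 °C looks like "twice" 10 °C, but the same two temperatures are 68 °F and 50 °F, a ratio of about 1.36.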
5
Q

Ratio Scale of Measurement

A
  • Extension of the interval level that includes a true zero starting point
  • The highest level of variable data
  • There is an inherent zero starting point
  • Both differences and ratios are meaningful
  • Classify objects
  • Rank objects
  • Has equal intervals
  • Has a true zero point
  • E.g. Watches that cost $200 and $400: the second one is twice as expensive as the first
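By contrast, ratios on a ratio scale survive a change of units, because zero really means "none". A minimal sketch using the watch prices from the card; converting to cents is just an illustrative second unit.

```python
watch_a, watch_b = 200, 400  # prices in dollars from the example

# Ratio scale: a true zero means ratios do not depend on the unit.
ratio_in_dollars = watch_b / watch_a
ratio_in_cents = (watch_b * 100) / (watch_a * 100)

# Differences are meaningful too: the gap is $200 either way.
difference_in_dollars = watch_b - watch_a

print(ratio_in_dollars, ratio_in_cents, difference_in_dollars)
```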
6
Q

Types of Data

A

The type of data you have will dictate what you can do and the tools you can use.

  1. Discrete Data
  2. Qualitative Data
  3. Attribute Data
  4. Continuous Data/Variable Data
  5. Location Data
7
Q

Discrete Data

A
  • Best at discerning whether or not we have a defective product or service
  • “Pass/Fail” is better for failure analysis
  • Counted data is discrete
    • E.g. Number of dimples on a golf ball
    • Number of people in a stadium
    • A score of 80/100 is discrete: it comes from a finite set of possible values
  • Whole numbers (counts)
8
Q

Qualitative Data

A
  • An example of qualitative data is color. It cannot be expressed as a number
9
Q

Attribute Data

A
  • Anything that can be classified as either/or
  • Binary in nature
  • Pass/Fail, go/no-go, good/bad
  • Example:
    • Paint chips per unit, percent of defective units in a lot, audit points
  • Attribute charts
    • A kind of control chart to display information about defects and defectives. Helps you visualize variation
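Attribute charts such as the p-chart (proportion defective) place control limits at the average proportion plus or minus three standard errors. A minimal sketch, assuming a constant sample size and hypothetical defective counts; `p_chart_limits` is an illustrative name, not a library function.

```python
import math

def p_chart_limits(defectives: list[int], sample_size: int) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for a p-chart with a constant sample size."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot be negative
    ucl = min(1.0, p_bar + 3 * sigma)  # ...or exceed 1
    return lcl, p_bar, ucl

# Hypothetical data: defective units found in four lots of 100 units each.
lcl, center, ucl = p_chart_limits([2, 3, 1, 4], sample_size=100)
print(f"LCL={lcl:.4f}  CL={center:.4f}  UCL={ucl:.4f}")
```

A lot whose proportion defective falls outside these limits would be flagged as a signal of special-cause variation.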
10
Q

Continuous Data/Variable Data

A
  • Anything that can be measured on a continuous basis
  • Can always be divided into smaller increments
  • Exists on a continuum
  • Preferred over Discrete
  • Use continuous data where possible because it tells us the magnitude of the issue
  • Helpful for controlling the process and providing enough discrimination
  • Examples:
    • Length (inches, half inch, hundredths of an inch…)
    • Weight
    • Temperature
    • Time
    • Anything you can measure: torque, tension, length, volume
11
Q

Teaching Discrete and Continuous Data

A

Imagine you have a young child who says he is sick. As a parent, the first thing you do is touch his forehead to see if he feels warm: that is collecting discrete data.

If he feels like he has a fever, you are likely to use a thermometer to take his temperature, which is another type of data collection. You need to know the magnitude of the fever because it determines the course of action: 105 means the ER, 101 means Tylenol. That temperature reading is continuous data, data that exists on a continuum.

12
Q

Location Data

A
  • Can be recorded on a measles diagram
  • Example:
    • Determining the root cause of paint blemishes occurring on a car production line
  • Measles Diagram/Chart
    • Used specifically to analyze the problem’s location and density, not just to count how often the problem occurs.
    • Helps determine where the common defects on parts are located
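A measles diagram is usually marked by hand, but the same idea (record each defect's location, then look at density) can be sketched in Python. The panel, the grid cell size, and the blemish coordinates below are all hypothetical.

```python
from collections import Counter

# Hypothetical blemish locations (x, y) in inches on a 40 x 20 inch door panel.
blemishes = [(3, 4), (5, 2), (4, 5), (31, 12), (6, 3), (2, 1), (33, 15)]

CELL = 10  # grid cell size in inches (an assumption for this sketch)

def cell_of(point: tuple[int, int]) -> tuple[int, int]:
    """Map a coordinate to the (column, row) of its grid cell."""
    x, y = point
    return x // CELL, y // CELL

density = Counter(cell_of(p) for p in blemishes)
hot_spot, count = density.most_common(1)[0]
print(f"Densest cell {hot_spot} has {count} of {len(blemishes)} blemishes")
```

A simple count would report 7 blemishes; the location tally shows that most of them cluster in one corner of the panel.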
13
Q

Converting Types of Data

A
  • It is difficult to convert attribute (go/no-go) data to variable data after the fact. In most cases, though, you can find a way to record variable data at the time of measurement instead
    • Example: record how far out of tolerance a part is, not just that it failed
  • It is always easy to convert variable data to attribute data if you have a standard.
    • Example: Water is too cold to swim in below 75 degrees, so the standard is “no go” below 75. Label every reading below 75 “no go” and every reading at or above 75 “go”
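The 75-degree swimming standard translates directly into code. A minimal sketch; the readings are hypothetical, and treating exactly 75 degrees as “go” is an assumption, since the standard only says “no go” below 75.

```python
THRESHOLD_F = 75  # the standard: below 75 degrees is "no go"

def to_attribute(temp_f: float) -> str:
    """Convert a continuous temperature reading into go/no-go attribute data."""
    return "no go" if temp_f < THRESHOLD_F else "go"

readings = [72.5, 75.0, 78.3, 74.9, 81.0]  # hypothetical water temperatures
labels = [to_attribute(t) for t in readings]
print(labels)
```

The conversion only runs one way: once a reading is stored as “no go”, there is no way to tell whether it was 74.9 or 60 degrees.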
14
Q

Data Distribution

A
  • A data distribution is a function that specifies all possible values of a variable and quantifies their relative frequency (the probability of how often they occur)
  • Any population whose data is scattered across a range of values has a distribution.
    • It’s important to determine which kind of distribution the population has so we can apply the correct statistical methods when analyzing it
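An empirical distribution can be built straight from data by listing each observed value with its relative frequency. A minimal sketch with hypothetical defect counts, using exact fractions so the frequencies visibly sum to 1.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical data: number of defects found on each of 10 inspected units.
defects = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]

counts = Counter(defects)
n = len(defects)

# Empirical distribution: every observed value and its relative frequency.
distribution = {value: Fraction(count, n) for value, count in sorted(counts.items())}
for value, prob in distribution.items():
    print(value, prob)

assert sum(distribution.values()) == 1  # relative frequencies always sum to 1
```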
15
Q

Types of Continuous Distributions

A
  1. Normal Distribution
    1. Lognormal Distribution
  2. F Distribution
  3. Chi-Square Distribution
  4. Exponential Distribution
  5. T-Student Distribution
  6. Weibull Distribution
  7. Non-Normal Distributions
  8. Odd Distributions
    1. Bivariate Distribution
    2. Bi-Modal
16
Q

Continuous Distribution

A
  • A continuous distribution contains infinitely many (variable) data points that can be displayed on a continuous measurement scale.
  • A continuous variable is a random variable whose set of possible values is infinite and uncountable.
  • It measures something rather than counting it, and is typically described by a probability density function (pdf)
  • Simply: continuous = can take any value within a range
17
Q

Types of Discrete Distributions

A
  1. Binomial Distribution
  2. Poisson Distribution
  3. Hypergeometric Distribution
  4. Geometric Distribution
18
Q

Discrete Distributions

A
  • A discrete distribution results from countable data that has a finite (or countably infinite) number of possible values.
  • Discrete distributions can be reported in tables, and the values of the random variable are countable
    • Example: Rolling dice, counting the number of heads in a series of coin tosses, etc.
  • Simply: discrete = counted
19
Q

Probability Mass Function (pmf)

A
  • Discrete Distributions
  • The probability mass function is a frequency function that gives the probability that a discrete random variable takes each of its possible values
  • Aka Discrete Density Function
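As a small illustration of a pmf, here is the distribution of a fair six-sided die (an example chosen for this sketch, not taken from the card), written with exact fractions.

```python
from fractions import Fraction

def die_pmf(k: int) -> Fraction:
    """pmf of a fair six-sided die: P(X = k)."""
    return Fraction(1, 6) if 1 <= k <= 6 else Fraction(0)

# A pmf assigns a probability to each value; values outside the support get 0.
print(die_pmf(3), die_pmf(7))

# The probabilities over all possible values sum to 1.
print(sum(die_pmf(k) for k in range(1, 7)))
```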
20
Q

Binomial Distribution

A
  • Discrete Distribution
  • The binomial distribution gives the probability of obtaining a given number of successes (or failures) in a fixed number of independent trials, each with the same probability of success
  • Characteristics that are classified into two mutually exclusive and exhaustive classes, such as number of successes/failures or number accepted/rejected, follow a binomial distribution
  • Example: Tossing a coin: the probability of the coin landing heads is ½ and the probability of it landing tails is ½
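The binomial pmf is $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. A minimal Python sketch of that formula applied to the coin example; `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 tosses of a fair coin: 6 * 0.5**4.
print(binomial_pmf(2, n=4, p=0.5))
```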
21
Q

Poisson Distribution

A
  • Discrete Distribution
  • The Poisson distribution is the discrete probability distribution that measures the likelihood of a given number of events occurring in a fixed time period, when the events occur independently at a constant average rate
  • Characteristics that can theoretically take large values, but in practice take small values, follow a Poisson distribution
  • Example: Number of defects, errors, accidents, absentees, etc.
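The Poisson pmf is $P(X = k) = \lambda^k e^{-\lambda} / k!$, where λ is the average number of events per period. A minimal sketch, assuming a hypothetical average of 2 defects per shift.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) when events occur independently at an average rate lam per period."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical: a line averages 2 defects per shift.
for k in range(5):
    print(k, round(poisson_pmf(k, lam=2.0), 4))
```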
22
Q

Hypergeometric Distribution

A
  • Discrete Distribution
  • The hypergeometric distribution is a discrete distribution that measures the probability of a specified number of successes in (n) trials, drawn without replacement from a finite population of size (N).
    • In other words, sampling without replacement
  • Similar to the Binomial Distribution
    • For the binomial distribution, the probability is the same for every trial.
    • For the hypergeometric distribution, each trial changes the probability for each subsequent trial because there is no replacement.
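The hypergeometric pmf counts favorable selections: $P(X = k) = \binom{K}{k}\binom{N-K}{n-k} / \binom{N}{n}$, where K of the N items are "successes". A sketch with a hypothetical lot, comparing the result to a binomial calculation that ignores the lack of replacement.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes) drawing n items without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical lot: 20 units, 5 defective; inspect 4 without replacement.
p_one = hypergeom_pmf(1, N=20, K=5, n=4)

# Binomial calculation that (wrongly, here) assumes replacement, for comparison.
p_binomial = comb(4, 1) * 0.25 * 0.75**3

print(round(p_one, 4), round(p_binomial, 4))
```

The two answers differ noticeably because 4 units is a large fraction of a 20-unit lot; for very large lots they converge.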
23
Q

Geometric Distribution

A
  • The geometric distribution is a discrete distribution that measures the likelihood of when the first success will occur
  • It is the discrete probability distribution of getting the first success after a consecutive run of failures
  • There can be an indefinite number of trials before the first success is obtained
  • The negative binomial distribution extends it (counting trials until the r-th success instead of the first)
  • Example:
    • You ask people outside a polling station who they voted for until you find someone who voted for the independent candidate in a local election. The geometric distribution would represent the number of people you had to poll before you found someone who voted independent.
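The geometric pmf is $P(X = k) = (1-p)^{k-1} p$ for the first success on trial k, with mean 1/p. A sketch applied to the polling example, assuming (hypothetically) that 10% of voters chose the independent candidate.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k) = (1 - p)**(k - 1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.10  # hypothetical share of independent voters
print(round(geometric_pmf(1, p), 4))  # the first person polled says "independent"
print(round(geometric_pmf(5, p), 4))  # four other answers, then "independent"
print(1 / p)                          # expected number of people polled
```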
24
Q

Probability Density Function (pdf)

A
  • Continuous Distributions
  • The probability density function describes the behavior of a continuous random variable.
  • It can be thought of as the smoothed limit of a grouped (relative) frequency distribution.
    • Hence, the probability density function is seen as the “shape” of the distribution
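For a continuous variable, the probability of falling in an interval is the area under the pdf over that interval. This sketch checks that numerically for a standard normal, using Python's `statistics.NormalDist` and a simple trapezoid rule (the step count is an arbitrary choice).

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)

def area_under_pdf(a: float, b: float, steps: int = 1000) -> float:
    """Approximate the integral of the pdf from a to b with the trapezoid rule."""
    h = (b - a) / steps
    inner = sum(dist.pdf(a + i * h) for i in range(1, steps))
    return h * (0.5 * dist.pdf(a) + inner + 0.5 * dist.pdf(b))

area = area_under_pdf(-1.0, 1.0)
exact = dist.cdf(1.0) - dist.cdf(-1.0)

# P(-1 < X < 1) as area under the pdf vs. the exact cdf difference.
print(round(area, 4), round(exact, 4))
```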
25
Q

Normal Distribution

A
  • Continuous Distribution
  • Normal distribution is also known as Gaussian distribution.
  • It is a symmetrical, bell-shaped curve with higher frequency (probability density) around the center value.
  • The frequency decreases sharply as values move away from the central value on either side
    • In other words, values lie in a symmetrical fashion, mostly situated around the mean.
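A familiar consequence of the bell shape is that about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. A sketch using `statistics.NormalDist` with hypothetical process parameters.

```python
from statistics import NormalDist

# Hypothetical process: mean 50.0 mm, standard deviation 2.0 mm.
process = NormalDist(mu=50.0, sigma=2.0)

shares = []
for k in (1, 2, 3):
    low = process.mean - k * process.stdev
    high = process.mean + k * process.stdev
    share = process.cdf(high) - process.cdf(low)  # probability between the limits
    shares.append(share)
    print(f"within ±{k} sigma ({low} to {high} mm): {share:.4f}")
```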
26
Q

Lognormal Distribution

A
  • Continuous Distribution
  • A continuous random variable x follows a lognormal distribution if its natural logarithm, ln(x), follows a normal distribution
  • When you sum independent random variables, the distribution of the sum approaches a normal distribution as the number of variables increases, regardless of the distribution of the individuals (central limit theorem). Multiplication works the same way on the log scale: the product of many independent positive random variables tends toward a lognormal distribution, because the log of a product is a sum of logs.
  • The location parameter is the mean of the data set after log transformation, and the scale parameter is the standard deviation of the data set after log transformation.
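Following the card's definition, the lognormal parameters can be estimated by taking natural logs and computing the mean and standard deviation. A minimal sketch with hypothetical positive data; it does not test whether the logs are actually normal.

```python
import math
import statistics

# Hypothetical positive measurements, e.g. repair times in hours.
data = [1.2, 0.8, 2.5, 1.9, 4.1, 1.1, 3.3, 0.6]

logs = [math.log(x) for x in data]  # ln(x) should look normal if x is lognormal

location = statistics.mean(logs)    # mean of the log-transformed data
scale = statistics.stdev(logs)      # standard deviation of the log-transformed data

print(f"location={location:.3f}, scale={scale:.3f}")
# Back on the original scale, exp(location) estimates the median.
print(f"estimated median={math.exp(location):.3f}")
```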
27
Q

F Distribution

A
  • Continuous Distribution
  • The F distribution is extensively used to test for equality of variances from two normal populations.
  • The F distribution is an asymmetric distribution that has a minimum value of 0, but no maximum value.
  • Notably, the curve approaches zero but never quite touches the horizontal axis
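The F statistic for comparing two variances is the ratio of the sample variances. A sketch with hypothetical data from two machines; it stops at the statistic, since the critical value would come from an F table or a statistics library.

```python
import statistics

# Hypothetical diameters (mm) from two machines.
machine_a = [10.1, 9.8, 10.3, 10.0, 9.7, 10.2]
machine_b = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9]

var_a = statistics.variance(machine_a)  # sample variance (n - 1 denominator)
var_b = statistics.variance(machine_b)

# Convention: put the larger variance in the numerator so F >= 1,
# and list degrees of freedom as (numerator, denominator).
if var_a >= var_b:
    f_stat, df = var_a / var_b, (len(machine_a) - 1, len(machine_b) - 1)
else:
    f_stat, df = var_b / var_a, (len(machine_b) - 1, len(machine_a) - 1)

print(f"F = {f_stat:.2f} with df = {df}")
```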
28
Q

Chi Square Distribution

A
  • Continuous Distribution
  • The Chi Square Distribution results when independent variables with standard normal distribution are squared and summed
  • A chi-square distribution is a continuous distribution with degrees of freedom. It is used to describe the distribution of a sum of squared random variables.
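  • Ex: if $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, then
    • $Y = Z_1^2 + Z_2^2 + Z_3^2 + Z_4^2 + \cdots + Z_n^2$ follows a chi-square distribution with n degrees of freedom
  • The chi square distribution is asymmetric (skewed to the right), bounded below by zero, and approaches the normal distribution shape as the degrees of freedom increase.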
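The definition above can be checked by simulation: sum k squared standard normal draws many times. A sketch with an arbitrary seed and k = 4; the theoretical mean of a chi-square variable equals its degrees of freedom.

```python
import random

def chi_square_sample(k: int, rng: random.Random) -> float:
    """One draw of Y = Z_1^2 + ... + Z_k^2 with independent standard normal Z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(42)   # fixed seed so the sketch is reproducible
k = 4                     # degrees of freedom
draws = [chi_square_sample(k, rng) for _ in range(20_000)]

mean = sum(draws) / len(draws)
print(f"sample mean = {mean:.2f} (theory: {k})")
print(f"min = {min(draws):.4f} (never below 0)")
```

The sample median comes out below the mean, which reflects the right skew.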
29
Q

Exponential Distribution

A
  • Continuous Distribution
  • The exponential distribution is one of the most widely used continuous distributions. It is often used to model items with a constant failure rate.
  • Closely related to the Poisson distribution
  • Has a constant failure rate; it has no shape parameter, so its shape is always the same
  • Example: The lifetime of a bulb, the time between fires in a city
  • The exponential distribution is the probability distribution of the time *between* events in a Poisson process. Waiting longer than time t for the next event means that not a single event happened during that waiting period, which is the Poisson probability P(X = 0). Hence $P(T > t) = e^{-\lambda t}$, where λ is the event rate.
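The link to Poisson(X = 0) can be checked numerically: the exponential survival probability $e^{-\lambda t}$ matches the Poisson probability of zero events when the Poisson mean is λt. The rate and time below are hypothetical.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events when the expected count is `mean`."""
    return mean**k * math.exp(-mean) / math.factorial(k)

def exp_survival(t: float, rate: float) -> float:
    """P(T > t) for an exponential waiting time with the given event rate."""
    return math.exp(-rate * t)

rate = 0.5   # hypothetical: 0.5 failures per 1,000 hours
t = 3.0      # 3,000 hours

# Waiting longer than t means zero events in [0, t]: Poisson(X = 0) with mean rate * t.
print(exp_survival(t, rate), poisson_pmf(0, rate * t))
```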
30
Q

T Student Distribution

A
  • Continuous Distribution
  • T distribution, or Student’s t distribution, is a bell-shaped probability distribution, symmetrical about its mean.
  • Commonly used for hypothesis testing and constructing confidence intervals for means
  • Used in place of the normal distribution when the population standard deviation is unknown
  • As with the normal distribution, it is applied to averages: the distribution of a sample average tends to be normal regardless of the distribution of the individuals
  • The t distribution (aka, Student’s t-distribution) is a probability distribution that is used to estimate population parameters when the sample size is small and/or when the population variance is unknown.
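The one-sample t statistic is $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ with n - 1 degrees of freedom, where the sample standard deviation s stands in for the unknown population value. A sketch with hypothetical fill weights and target.

```python
import math
import statistics

# Hypothetical small sample: fill weights (g) with a target mean of 500 g.
sample = [498.2, 501.5, 497.9, 499.4, 500.8, 496.7, 498.9, 499.6]
target = 500.0

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (population sigma unknown)

t_stat = (x_bar - target) / (s / math.sqrt(n))
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
# Compare |t| with a t-table critical value, e.g. about 2.365 for df = 7 (two-sided, 5%).
```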
31
Q

Weibull Distribution

A
  • Continuous Distribution
  • The basic purpose of Weibull distribution is to model time-to-failure data.
  • Widely used in reliability, medical research and statistical applications.
  • Assumes many shapes depending on the shape, scale, and location parameters. Effect of the shape parameter β on the Weibull distribution:
    • If β is 1, it becomes identical to the exponential distribution.
    • If β is 2, it becomes the Rayleigh distribution.
    • If β is between 3 and 4, it approximates the normal distribution.
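A sketch of the two-parameter Weibull reliability function $R(t) = e^{-(t/\eta)^\beta}$, assuming the location parameter is zero and using a hypothetical characteristic life η. It confirms that β = 1 gives the exponential distribution.

```python
import math

def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """R(t) = P(T > t) for a two-parameter Weibull with scale eta and shape beta."""
    return math.exp(-((t / eta) ** beta))

eta = 1000.0  # hypothetical characteristic life in hours
for beta in (0.5, 1.0, 2.0, 3.5):
    print(beta, round(weibull_reliability(500.0, eta, beta), 4))

# With beta = 1 the Weibull reduces to the exponential with rate 1 / eta.
assert math.isclose(weibull_reliability(500.0, eta, 1.0), math.exp(-500.0 / eta))
```

At t = η the reliability is $e^{-1}$ (about 36.8%) for every β, which is why η is called the characteristic life.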
33
Q

Non-Normal Distributions

A

Hypothesis tests generally assume that the data is a sample from a specific distribution, commonly the normal distribution. That is not always the case: the data may not follow a normal distribution. Nonparametric tests are used when no specific distribution is assumed for the population.

Nonparametric test results are more robust against violations of distributional assumptions. Types of nonparametric tests include the Sign Test, Mood’s Median Test (for two or more samples), the Mann-Whitney Test for independent samples, the Wilcoxon Signed-Rank Test for a single sample, and the Wilcoxon Signed-Rank Test for paired samples.
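The sign test is simple enough to compute exactly: under the null hypothesis, each value falls above or below the hypothesized median with probability ½, so the count follows a binomial distribution. A minimal sketch with hypothetical cycle times; dropping ties with the hypothesized median is one common convention.

```python
from math import comb

def sign_test(sample: list[float], median0: float) -> float:
    """Two-sided exact sign test p-value for H0: population median == median0."""
    above = sum(1 for x in sample if x > median0)
    below = sum(1 for x in sample if x < median0)
    n = above + below                 # values equal to median0 are dropped
    k = min(above, below)
    # Under H0 each non-tied value is above/below with probability 1/2 (binomial).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical cycle times (minutes) tested against a claimed median of 12.
times = [13.1, 14.2, 12.8, 15.0, 11.9, 13.7, 14.4, 12.5, 13.9, 16.2]
print(round(sign_test(times, 12.0), 4))
```

No assumption about the shape of the distribution is needed, only that the observations are independent.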

34
Q

Odd Distributions

Bivariate Distribution

A
  • Continuous distributions (like normal, chi square, exponential) and discrete distributions (like binomial, geometric) are probability distributions of one random variable
  • A bivariate distribution, by contrast, gives the probability that two random variables jointly take particular values; it may be a continuous or a discrete distribution.
  • Bivariate distribution is unique because it is the joint distribution of two variables.
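A discrete bivariate (joint) distribution can be tabulated by enumeration. This sketch is an illustrative example not taken from the card: two fair dice, with X = the first die and S = the sum. X and S are dependent, so their joint probabilities differ from the product of the marginals.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: X = first die, S = sum of both dice.
joint: dict[tuple[int, int], Fraction] = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, d1 + d2)
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# Marginal distributions: sum the joint probabilities over the other variable.
marginal_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in range(1, 7)}
marginal_s = {s: sum(p for (_, si), p in joint.items() if si == s) for s in range(2, 13)}

print(joint[(1, 2)])                    # P(X = 1 and S = 2)
print(marginal_x[1] * marginal_s[2])    # P(X = 1) * P(S = 2)
```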

Bi-modal:

  • A bi-modal distribution has two modes, in other words two outcomes that are more likely than the outcomes in their surrounding region.
  • Often indicates 2 sources of data coming into a single process stream.
35
Q

Positively Skewed Distribution

A

A distribution is said to be skewed to the right if it has a long tail that trails toward the right side. The skewness value of a positively skewed distribution is greater than zero.

Example: Income data for manufacturing employees in Chicago indicates that most people earn somewhere between $20K and $50K per year. Very few earn less than $10K, and very few earn $100K. The center value is $50K. The graph shows a long tail on the right side of the center value.

Because the tail is on the positive side of the center value, the distribution is positively skewed. Unlike a symmetric distribution, it is not equally distributed on both sides of the center value. The graph shows that the mean is the highest, followed by the median and then the mode.

Because the distribution is skewed to the right, the long tail pulls the mean above the median, toward the right. The mode is the value with the highest frequency, which lies to the left of the median. Hence, mode < median < mean.
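A sketch with hypothetical income data (not the Chicago figures above) showing mode < median < mean for a right-skewed sample, plus the moment coefficient of skewness, which is positive for right skew.

```python
import statistics

# Hypothetical annual incomes in $K.
incomes = [20, 25, 25, 25, 30, 30, 35, 40, 50, 100]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)

# Moment coefficient of skewness: mean cubed deviation / sd**3 (population form).
n = len(incomes)
m2 = sum((x - mean) ** 2 for x in incomes) / n
m3 = sum((x - mean) ** 3 for x in incomes) / n
skewness = m3 / m2 ** 1.5

print(f"mode={mode}, median={median}, mean={mean}, skewness={skewness:.2f}")
```

A single $100K earner is enough to pull the mean well above the median, while the mode stays at the most common value.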

36
Q

Symmetrical Distribution

e.g. Normal Distribution

A

Generally, a symmetrical distribution appears as a bell curve. A perfect normal distribution is a probability distribution with zero skewness. However, real-world data never follows a perfect normal distribution, so its skewness is not exactly zero, only close to zero. A symmetrical distribution occurs when the mean, median, and mode occur at the same point and values occur with matching frequencies on either side of the center. Both sides of the mean match and mirror each other.

Example: The weights of high school students are reported between 80 lb and 100 lb, with most students weighing around 90 lb. The weights are equally distributed on both sides of 90 lb, which is the center value. This type of distribution is called a Normal Distribution.

37
Q

Negatively Skewed Distribution

A

A distribution is said to be skewed to the left if it has a long tail that trails toward the left side. The skewness value of a negatively skewed distribution is less than zero.

Example: A professor collected students’ marks in a science subject. Most students scored between 50 and 80, while the center value is 50 marks. The long tail lies on the left side of the center value, so the data follows a negatively skewed distribution.

Here mean < median < mode

38
Q

Statistical Tests Used to Identify Data distribution

A

There are different methods to test the normality of data, including visual (graphical) methods and quantifiable (numerical) methods.

Visual method: Visual inspection may be used to assess whether the data distribution is normal, although this method is subjective and does not guarantee that the data is normal. However, visual methods do help the user judge normality.

Ex: Histogram, boxplot, stem-and-leaf plot, probability-probability (P-P) plot, and quantile-quantile (Q-Q) plot.

Quantifiable method: Quantifiable methods supplement the visual methods. These tests compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation.

Ex: Anderson-Darling Test, Shapiro-Wilk W Test, Kolmogorov-Smirnov Test, etc.
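Among the numerical methods, the Kolmogorov-Smirnov statistic D is the largest vertical gap between the sample's empirical CDF and a fitted normal CDF. A minimal sketch that computes D only; turning it into a p-value needs tables or a statistics library (and, because the mean and standard deviation are estimated from the same data, strictly the Lilliefors variant of the test).

```python
import statistics
from statistics import NormalDist

def ks_statistic_vs_normal(data: list[float]) -> float:
    """Largest gap between the empirical CDF and a normal CDF with the sample's mean and sd."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

print(round(ks_statistic_vs_normal([-1.0, 0.0, 1.0]), 4))
```

A larger D means the sample departs further from the fitted normal curve.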

39
Q

How to Make a Process Follow a Normal Distribution by Using Transforms

A

Sometimes you will be analyzing a process and the data will come out in a non-normal shape. Because normal distributions have mathematical properties that make analysis and control much easier, try to transform the data to a normal distribution if possible.

One way to address a non-normal distribution is to apply a transformation that “normalizes” the data. Typical data transformation methods include Box Cox, log, square root or power, exponential, and reciprocal transformations.

Box Cox transform

  • A Box Cox transformation is a useful power transformation technique for transforming non-normal dependent variables into a normal shape.
  • George Box and Sir D. R. Cox developed this method
  • The applicable formula, in simplified form, is $y' = y^{\lambda}$, where λ is the power used to transform the data (the formal Box Cox definition is $y' = (y^{\lambda} - 1)/\lambda$ for λ ≠ 0 and $y' = \ln y$ for λ = 0).
  • For instance, if λ = 2 the data is squared, and if λ = 0.5 the square root is taken.
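A common way to pick λ is maximum likelihood: try a range of λ values and keep the one under which the transformed data looks most normal. A minimal sketch using the formal Box Cox definition, a coarse grid from -2 to 2, and hypothetical right-skewed data; statistical packages search λ more precisely.

```python
import math
import statistics

def box_cox(y: list[float], lam: float) -> list[float]:
    """Formal Box-Cox transform; requires every y > 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v**lam - 1) / lam for v in y]

def box_cox_log_likelihood(y: list[float], lam: float) -> float:
    """Profile log-likelihood of lam, assuming the transformed data is normal."""
    z = box_cox(y, lam)
    n = len(y)
    var = statistics.pvariance(z)  # maximum-likelihood variance (divide by n)
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)

def best_lambda(y: list[float]) -> float:
    """Pick lam from a coarse grid (-2 to 2 in steps of 0.1) by maximum likelihood."""
    grid = [round(-2 + 0.1 * i, 1) for i in range(41)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(y, lam))

# Hypothetical right-skewed cycle times.
times = [1.1, 1.4, 1.6, 2.0, 2.3, 2.9, 3.8, 5.2, 7.9, 12.5]
lam = best_lambda(times)
print(lam, [round(v, 3) for v in box_cox(times, lam)])
```

The Jacobian term `(lam - 1) * sum(log y)` keeps likelihoods comparable across different λ values; without it, the search would favor whichever λ happens to shrink the variance most.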

Z transform

  • Z transformation is an analysis tool used in signal processing
  • It is a generalization of the Discrete-Time Fourier Transform (DTFT); in particular, it applies to signals for which the DTFT does not exist, allowing those signals to be analyzed
  • It also helps characterize a system with respect to stability and causality
  • The Z transform is the discrete-time counterpart of the Laplace transform