Block 2 Flashcards
attribute data?
counted values: discrete integers (e.g. number of defects)
types of data collection?
- direct observations (needed for quality calcs)
- questions (market research)
Types of data?
- attribute data
- variable data
Variable data?
measured values: continuous (e.g. length, mass)
precision?
how reproducible a value is
accuracy?
how close to the true value it is
ways to describe data?
- frequency distribution
- measures of central tendency
- measures of dispersion
Ways to sort data?
- categorically (e.g. blood type)
- grouped (eg. 1
relative frequency?
the proportion of the data that falls in a specific group or value, expressed as a fraction of 1
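A minimal Python sketch of the relative frequency calculation. The function name and the blood type sample are made up for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {value: proportion of the data equal to that value}."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# Hypothetical sample of blood types, made up for illustration.
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
rel = relative_frequencies(blood_types)
```

The relative frequencies of all groups always add up to 1.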
types of histogram?
- frequency histogram
- relative frequency histogram
- cumulative frequency histogram
- relative cumulative frequency histogram
determining class width?
range/number of classes
determining the number of cells in a histogram?
fewer than 100 observations → 5-9 cells
100-500 observations → 8-17 cells
more than 500 observations → 15-20 cells
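A rough Python sketch that combines the cell-count guideline with class width = range / number of classes. The function names and the measurements are hypothetical, and treating exactly 100 and 500 observations as part of the middle band is an assumption.

```python
def suggested_cells(n_observations):
    """Return the (min, max) number of cells from the guideline on the card."""
    if n_observations < 100:
        return (5, 9)
    if n_observations <= 500:  # assumption: 100 and 500 belong to the middle band
        return (8, 17)
    return (15, 20)

def class_width(values, n_classes):
    """Class width = range / number of classes."""
    return (max(values) - min(values)) / n_classes

# Hypothetical measurements, made up for illustration.
measurements = [2, 4, 10, 12]
low, high = suggested_cells(len(measurements))
width = class_width(measurements, low)  # width when using the fewest suggested cells
```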
Graphs for distribution?
- histogram
- bar graph
- polygon of data
- cumulative frequency distribution
distribution graph with the peak closer to the left?
skewed right / positively skewed (long tail to the right, so the mean is pulled above the median)
two peaks?
bimodal
high peak/kurtosis?
leptokurtic
low peak/kurtosis?
platykurtic
negative excess kurtosis (a4 < 3)?
flatter than a normal distribution with the same mean and standard deviation
aspects of a distribution?
- location (mean)
- spread
- shape (skew)
3 measures of central tendency?
- Average
- Median
- Mode
average of ungrouped data?
the mean: sum of all values / number of values (n)
average of grouped data?
sum of (frequency*midpoint) / total frequency
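A short Python sketch of the grouped average. The function name, midpoints and frequencies are made up for illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Average of grouped data: sum(frequency * midpoint) / total frequency."""
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total

# Hypothetical groups 0-10, 10-20, 20-30 with their midpoints and frequencies.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
mean = grouped_mean(midpoints, frequencies)  # (10 + 75 + 75) / 10
```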
average with different sized groups?
weighted average: sum of (group frequency * group average) / total frequency
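Finding the median value for grouped data?
- find half the total frequency (n/2)
- count up the cumulative frequencies to find the group that contains it
- find how many values into the group it is
- divide by the group frequency and multiply by the group interval
- add the result to the lower boundary of the group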
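The grouped median steps can be sketched in Python as below. This assumes equal-width groups listed in ascending order; the function name and the data are hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, interval):
    """Median of grouped data by interpolating inside the median group."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # frequency counted in the groups below the current one
    for lower, freq in zip(lower_boundaries, frequencies):
        if cumulative + freq >= half:
            # (values into the group / group frequency) * interval, added to the lower boundary
            return lower + (half - cumulative) / freq * interval
        cumulative += freq

# Hypothetical groups 0-10, 10-20, 20-30.
median = grouped_median([0, 10, 20], [2, 5, 3], 10)
```

Here n/2 = 5 falls in the second group, 3 values in, so the median is 10 + (3/5) * 10.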
Mode?
the most frequently occurring value (the peak of the distribution)
a dataset can have several modes or none
Measures of dispersion?
- range
- standard deviation
- variance
Range?
difference between max and min in the dataset
Standard deviation?
square root of [ sum of (Xi - Xbar)^2 / (n - 1) ], where Xi is each value, Xbar is the average and n is the number of values
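A minimal Python sketch of the sample standard deviation, written out by hand and checked against the standard library. The function name and the data are made up for illustration.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the (n - 1) divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_differences / (n - 1))

# Hypothetical measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_std(data)
s_library = statistics.stdev(data)  # the standard library uses the same (n - 1) divisor
```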
Problems with range?
less reliable as the number of observations grows, because an outlier becomes more likely
When to use standard deviation?
when n > 10
otherwise, use the range
Measures of distribution shape?
- skewness
- Kurtosis
- Coefficient of Variation
Skewness?
a3 = 0 → symmetrical
a3 > 0 → skewed to the right
a3 < 0 → skewed to the left
a value of about +1 or -1 already indicates strong skew
How to determine if a skewness value is reliable?
n>100
needs to be unimodal
What does Kurtosis value (a4) mean?
a4=3 → normal distribution
a4>3 → more peaked
a4<3 → less peaked
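A rough Python sketch of a3 and a4. It assumes the moment-based definitions (third and fourth central moments divided by powers of the second central moment, all with divisor n); some textbooks use the sample standard deviation s instead, which changes the values slightly. The function name and data are hypothetical.

```python
def shape_statistics(values):
    """Return (a3, a4): moment-based skewness and kurtosis (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    a3 = m3 / m2 ** 1.5
    a4 = m4 / m2 ** 2
    return a3, a4

# Hypothetical data: one symmetrical set and one with a long right tail.
a3_symmetric, a4_symmetric = shape_statistics([1, 2, 3, 4, 5])
a3_right, a4_right = shape_statistics([1, 1, 1, 2, 10])
```

The symmetrical set gives a3 = 0 and a4 below 3 (flatter than normal); the set with the long right tail gives a3 > 0.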
Coefficient of Variation?
standard deviation (s) * 100% / average (Xbar)
the units of s and the mean cancel, so the result is a percentage
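A short Python sketch of the coefficient of variation, using the sample standard deviation from the standard library. The function name and data are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """CV = s * 100% / Xbar, returned as a percentage."""
    s = statistics.stdev(values)    # sample standard deviation (n - 1 divisor)
    xbar = statistics.mean(values)
    return s / xbar * 100

# Hypothetical measurements: Xbar = 20, s = 10.
cv = coefficient_of_variation([10, 20, 30])
```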
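Coefficient of Variation vs standard deviation?
the coefficient of variation expresses spread relative to the mean, so it is more meaningful when comparing datasets with different means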
population?
set of items of interest
Sample?
subset of a population
sample statistics?
- Average (Xbar)
- Sample standard deviation (s)
population parameter?
- Mean (Xbar0 or lowercase mu)
- Standard deviation (s0 or lowercase sigma)
statistic vs parameter?
statistic for a sample
parameter for a population
average and standard deviation of the standard normal curve?
average (mu) = 0, standard deviation (sigma) = 1
finding the percentage above/below a value on a normal curve?
convert the value to a Z score on the standard normal distribution, Z = (Xi - mu) / sigma, then look up the area in the tables
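A minimal Python sketch of the same lookup, using statistics.NormalDist (Python 3.8+) in place of printed tables. The function name and the example mean and standard deviation are hypothetical.

```python
from statistics import NormalDist

def fraction_below(x, mu, sigma):
    """Fraction of a normal population below x, via the standard normal curve."""
    z = (x - mu) / sigma       # standardise the value
    return NormalDist().cdf(z)  # area to the left of z (mu = 0, sigma = 1)

# Hypothetical process with mu = 100 and sigma = 15.
below = fraction_below(115, 100, 15)  # value one standard deviation above the mean
above = 1 - below
```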
How to check for a normal distribution?
- visual inspection of a histogram (unimodal, symmetrical, tapering tails)
- skewness (a3) close to 0
- kurtosis (a4) close to 3
Probability plots?
- order values small to large (and rank from 1 for smallest)
- use plotting position equation to find percentile
- plot measured value against percentile and draw a line of best fit
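The first two probability plot steps can be sketched in Python as below. The cards do not say which plotting position equation is meant, so 100 * (rank - 0.5) / n is an assumption here; the function name and data are also hypothetical.

```python
def plotting_positions(values):
    """Return (value, percentile) pairs, ordered small to large."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed plotting position equation: 100 * (rank - 0.5) / n, rank 1 = smallest.
    return [(x, 100 * (rank - 0.5) / n) for rank, x in enumerate(ordered, start=1)]

# Hypothetical measurements.
points = plotting_positions([7, 3, 9, 5])
```

Each pair is one point to plot before drawing the line of best fit.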
Chi-square goodness of fit?
- compares observed frequencies with the frequencies expected under the assumed distribution
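A short Python sketch of the chi-square statistic, sum of (observed - expected)^2 / expected over all cells. The function name and the frequencies are made up for illustration; comparing the result with a critical value from tables is left out.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E across all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical cell frequencies.
observed = [10, 20, 30]
expected = [20, 20, 20]
chi2 = chi_square_statistic(observed, expected)  # 100/20 + 0/20 + 100/20
```

A larger value means the observed frequencies sit further from the expected ones.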
Tests for normality?
- probability plots
- chi-square goodness of fit
simplest way to find a cause and effect relationship?
scatter diagram
drawing a “straight line fit” mathematically?
use the least-squares equations to find the gradient (m) and y-intercept (a)
coefficient of correlation?
goodness of fit of the points to the line of best fit, between -1 and 1
1 = all points on a line with positive gradient
-1 = all points on a line with negative gradient
0 = no linear correlation
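A minimal Python sketch of a least-squares straight line fit and the coefficient of correlation, assuming the line is written y = a + m*x. The function name and the data points are hypothetical.

```python
import math

def line_fit(xs, ys):
    """Return (m, a, r): gradient, y-intercept and coefficient of correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                   # gradient
    a = mean_y - m * mean_x         # y-intercept
    r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
    return m, a, r

# Hypothetical points lying exactly on y = 1 + 2x.
m, a, r = line_fit([1, 2, 3, 4], [3, 5, 7, 9])
```

Because every point lies on a line with positive gradient, r comes out as 1.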