Chapter 9 Flashcards
Define statistical analysis
the application of mathematical techniques for collecting, organizing, describing, analyzing, and interpreting numerical data to support decision making and research.
What is the term given to numerical data?
Statistics
What is “population” or “data set” in terms of statistics?
a complete set of collected data: an array of all the values in a population.
Some data sets are found in an identifiable distribution. What is this?
a data set with a specified set of characteristics.
A data distribution includes both data values and the probability of observing each value. What term is used for this?
probability distribution.
What is a distribution function?
a smooth curve representing a data set.
- useful in representing large data.
What are the 3 important types of distributions?
- frequency distributions
- normal distributions
- nonnormal distributions
Define frequency distributions
a set of data organized to show the frequency of occurrence of each possible outcome of a repeatable event observed many times.
- good for large data sets and in assigning probabilities.
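A minimal sketch of a frequency distribution, assuming a small made-up list of die rolls (the data and the names `rolls`, `frequency`, and `relative` are illustrative, not from the chapter). It tallies each outcome and converts the counts into relative frequencies, which can serve as estimated probabilities.

```python
from collections import Counter

# Hypothetical observations: outcomes of rolling a die 10 times.
rolls = [1, 3, 3, 6, 2, 3, 6, 1, 5, 3]

# Frequency distribution: how often each outcome occurred.
frequency = Counter(rolls)

# Relative frequency: each outcome's share of all observations.
relative = {value: count / len(rolls) for value, count in sorted(frequency.items())}

for value, share in relative.items():
    print(value, frequency[value], share)
```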
What is a check sheet?
a page divided into areas, one area for each sorting category.
- the recorder logs each observation by placing a mark in the appropriate category
- e.g., Scantron sheets.
What is a bar chart?
a graphical display of a frequency distribution
Which charts show the size of one category relative to the entire data distribution, when the data belong to each of a few different classifications?
pie charts and bar charts
What is a pie chart?
presents data in the shape of a circle that has been divided into radial sections.
Which two charts/diagrams are traditionally included in the classic collection of seven basic quality tools?
Scatter diagrams and bar charts.
Define normal distribution
the most common form of probability distribution. It is a symmetrical distribution in which the number of values less than the mean equals the number of values greater than the mean.
- values considered discrete variables.
- useful in the branches of statistics related to sampling and forecasting.
What is a discrete variable?
a variable that can take only a finite, or limited, number of values.
- usually only has 1 value.
What are non-normal distributions?
asymmetrical distributions in which the number of values on one side of the mean is greater than the number of values on the other side of the mean.
- usually has a tail, and a peak closer to one end of the curve.
True or False
The variables in nonnormal distributions are generally random rather than discrete variables.
True
What is a random variable?
a variable whose values represent all possible outcomes.
Statistical measures provide information about two characteristics of data distributions: central tendency and dispersion. Define central tendency
representative values that describe the values in the middle of a set of observations. The main measures are: mean, median, mode
Statistical measures provide information about two characteristics of data distributions: central tendency and dispersion. Define measures of dispersion
representative values that describe the distribution of data around specific central values.
- the main measures of dispersion are: range, variance, and standard deviation.
What is descriptive statistics?
statistics that describe the characteristics of a complete population of known values.
What is inferential statistics (AKA sampling statistics)?
statistics that use the characteristics of a known sample in an effort to describe the complete population underlying the sample.
Define population mean
Calculated by summing all the values in a given population and then dividing by the total number of values in the population. Can be expressed as
population mean = sum of values / number of values
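The formula can be sketched in a few lines of Python. The five values in `population` are made up for illustration.

```python
# Hypothetical population of five values.
population = [4, 8, 15, 16, 23]

# Population mean = sum of values / number of values.
population_mean = sum(population) / len(population)

print(population_mean)
```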
What are some limitations of the mean?
1. population values
2. outlier data
What is an outlier in terms of data?
an extremely high or low value that is not representative of the other values in a given data population.
Define the term median
the middle value of a set of values that is arranged in numerical order.
- used to describe economic characteristics of a demographic group, e.g., median family income, median household size, or median education level.
What are the two steps to finding the median?
1. arrange the data values in numerical order
2. count the number of values in the data set. A data set contains either an odd or even number of values.
How does finding the median differ between data sets with an odd and an even number of values?
1. odd number: the median is the middle value
2. even number: the median is calculated by taking the average of the two middle values.
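The two steps and the odd/even rule from the cards above can be sketched as a small Python function. The function name `median` and the sample inputs are illustrative, and the sketch assumes a non-empty list.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # step 1: arrange the values in numerical order
    n = len(ordered)           # step 2: count the values
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the median is the middle value.
        return ordered[middle]
    # Even count: the median is the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 5]))
print(median([7, 1, 5, 3]))
```

For real work, Python's standard `statistics.median` applies the same rule.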
What are the limitations of the median?
it is limited as a measure of central tendency. If the values are evenly distributed, the median is a good way to summarize the data; if they are unevenly distributed, the median does not accurately represent all the values.
Define mode (AKA “population mode”)
the statistical measure that identifies the value that appears most often in the data.
- good for identifying baseline values or data patterns that might require attention.
What are some limitations of the mode?
- can be useless as a measure of central tendency, including when a population has either
1) more than one mode
2) no mode, because no number appears more than once.
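Both limitations can be illustrated with a short Python sketch. The helper `modes` is a hypothetical name. It relies on `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count, and it treats data in which no value repeats as having no mode.

```python
from statistics import multimode


def modes(values):
    """Return all modes, or an empty list when no value appears more than once."""
    candidates = multimode(values)
    # multimode returns every distinct value when all counts are 1,
    # so equal lengths mean that nothing repeats: report "no mode".
    if len(candidates) == len(values):
        return []
    return candidates


print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # more than one mode
print(modes([1, 2, 3]))        # no mode
```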
What is the simplest measure of dispersion?
the range, which is the difference between the lowest and highest values in a particular population.
What is variance?
a measure for describing how far values in a distribution lie from the mean.
How is the variance calculated?
average squared distance between the mean and each individual item in a data set.
What do you call a variance for an entire population?
population variance.
True or False
The larger the variance, the farther the values are from the mean.
True,
since larger variances indicate that values in the population are farther from the mean and more dispersed.
What are the steps that can be used to find the variance of a population?
- list the values in the population
- calculate the mean
- Find the distances between each item in the population and the mean
- square each distance, i.e., multiply each distance by itself.
- find the sum of the squared distances.
- divide the sum of the squared distances by the number of items in the population
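The steps above can be sketched in Python as follows. The function name `population_variance` and the eight sample values are illustrative assumptions.

```python
def population_variance(values):
    """Population variance: the average squared distance from the mean."""
    mean = sum(values) / len(values)          # calculate the mean
    distances = [x - mean for x in values]    # distance of each item from the mean
    squared = [d * d for d in distances]      # square each distance
    total = sum(squared)                      # sum the squared distances
    return total / len(values)                # divide by the number of items


# Hypothetical population of eight values.
population = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(population))
```

Python's standard `statistics.pvariance` computes the same quantity.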
What are the limitations of variance?
not necessarily an accurate measure of the dispersion of values within the population.
does not provide information about the actual distances between data items or between individual data values and the mean
What is standard deviation?
a measure of dispersion in a data set.
- the larger the SD, the farther the values are from the mean.
- used in benchmarking activities.
What is a population standard deviation?
square root of the variance in the population.
How do you calculate the standard deviation?
By calculating the square root of the variance of the data set.
Square root: the value that, if multiplied by itself, produces the given number.
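A minimal sketch of the calculation, assuming a made-up population of eight values. The variable names are illustrative.

```python
import math

# Hypothetical population of eight values.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(population) / len(population)
variance = sum((x - mean) ** 2 for x in population) / len(population)

# Standard deviation = square root of the variance.
standard_deviation = math.sqrt(variance)

print(variance, standard_deviation)
```

The result is in the same units as the data, unlike the variance, which is in squared units.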
What are the limitations of standard deviation?
it is intended to serve as a measure of dispersion for normally distributed data, but most real-world values are not normally distributed.
Define a normal curve.
a depiction of a probability distribution in which the midpoint of the bell is the mean value.
The mean, median and mode are all the same in a normal distribution and the curve is bell shaped.
1. 68.27% of the data falls within plus or minus one standard deviation of the mean.
2. 95.45% of the data falls within plus or minus two standard deviations of the mean.
3. 99.73% of the data falls within plus or minus three standard deviations of the mean.
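The three percentages can be checked with Python's standard `statistics.NormalDist` (Python 3.8+). This is only a sketch, and the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```

Because the shares depend only on the number of standard deviations, the same percentages hold for any normal distribution, whatever its mean and standard deviation.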
What is a control chart?
AKA process control chart or statistical control chart
a chart showing a plot of data observations about a given process against a measure of time.
- Horizontal: time
What is the difference between in control and out of control?
1) in control: observations fall inside the normal range
2) out of control: observations fall outside the normal range
What do you call the threshold values of the normal range in a control chart?
control limits
A control chart has two control limits: upper and lower. Define these
1) upper: highest permissible value, or the upper boundary level, for an in-control observation.
2) lower: lowest permissible value, or the lower boundary level, for an in-control observation.
How do you set the norms for outcomes from a process?
record 20-30 data observations for outcomes
1) find the mean value- this creates the center line of the control chart.
2) consider what range of values is acceptable for outcomes from the process - use this to set upper and lower limits at 2-3 standard deviations from the mean value.
* first norms are usually tentative- they change with larger number of observations.
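A rough sketch of setting tentative norms in Python, under several assumptions not taken from the chapter: the 20 baseline observations are made up, the limits are placed at 3 standard deviations, and the sample standard deviation (`statistics.stdev`) is used because the baseline is only a sample of the process. The function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(observations, k=3):
    """Return (lower limit, center line, upper limit) for a baseline sample."""
    center = mean(observations)   # center line of the control chart
    spread = stdev(observations)  # assumption: sample standard deviation
    return center - k * spread, center, center + k * spread


def out_of_control(observations, lower, upper):
    """Return the observations that fall outside the control limits."""
    return [x for x in observations if x < lower or x > upper]


# Hypothetical baseline: 20 observed outcomes from a process.
baseline = [10, 12, 11, 9, 10, 11, 10, 12, 9, 10,
            11, 10, 9, 11, 10, 12, 10, 9, 11, 10]

lower, center, upper = control_limits(baseline)
print(lower, center, upper)
print(out_of_control([10, 11, 18], lower, upper))
```

As the card notes, these first limits are tentative and would be recomputed as more observations accumulate.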
Name some aspects of operational performance for which financial services companies can use control charts to track emerging observations.
- not-taken rates in NB
- not-in-good order applications in NB
- applications DEC in UW
- first-contact resolution in contact center
- wait times on hold for tele-cx
- cx satisfaction ratings
- abandonment rates for callers on hold
- not-in-good order claims
- talk times
- benefits flagged by internal control.
What are inferential statistics?
methods that allow analysts to draw conclusions about a population on the basis of data gathered from only a portion, or sample, of the population.
What is the law of large numbers?
a mathematical concept which states that, under normal circumstances, the more times a particular event is observed, the more likely it is that the observed results will approximate the true probability that the event will occur.
i.e., larger samples => greater probability that the statistics are accurate.
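A small simulation can illustrate the idea, assuming a fair coin whose true probability of heads is 0.5. The function name and the fixed seed are illustrative choices, and the seed only makes the run repeatable.

```python
import random


def observed_share_of_heads(flips, seed=0):
    """Simulate fair coin flips and return the observed share of heads."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


# Larger numbers of observations tend to land closer to the true value of 0.5.
for flips in (10, 1_000, 100_000):
    print(flips, observed_share_of_heads(flips))
```

The law describes a tendency, so a small run can happen to land close to 0.5 and a large run is not guaranteed to be closer, only very likely to be.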
Define “a sample”
small specimen representative of a larger group or population.
What is probability sampling?
AKA: random sampling
technique in which each member of a population has a determinable chance (probability) of being selected.
- randomly picked.
What are some common examples of probability sampling?
- simple random sampling
- systematic random sampling
- stratified random sampling
Define simple random sampling
method of probability sampling in which every member of the data population has an equal chance of being selected in the sample.
- can use table of random sampling numbers.
- or random number generator.
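A minimal sketch using Python's standard `random` module as the random number generator. The population of 100 numbered items and the sample size of 10 are made up for illustration.

```python
import random

# Hypothetical population: items numbered 1 to 100.
population = list(range(1, 101))

# A fixed seed is used only to make the example repeatable.
rng = random.Random(42)

# Draw 10 items without replacement; every item has an equal chance of selection.
sample = rng.sample(population, k=10)

print(sample)
```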
What is a random number generator?
a table or software application that automatically identifies a pattern of values that would be produced by sampling a population distribution.
What do you call a selection process that is not random?
Biased
What is systematic random sampling?
involves selecting items from a population at a uniform interval which is generally measured by time, order, or space.
- every item in the population has an equal chance of being selected.
- does not need a random number generator.
- if the sample size is large enough, it produces a good statistic
How can a researcher select a representative sample when using systematic random sampling?
- select large sample size
- select random interval size
- choose random starting point for sampling
- proceed through entire population.
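The procedure can be sketched in Python. The function name `systematic_sample` is hypothetical, the interval is passed in by the caller, and the random starting point is drawn from the first interval.

```python
import random


def systematic_sample(population, interval, seed=None):
    """Select every interval-th item after a random starting point."""
    rng = random.Random(seed)
    start = rng.randrange(interval)     # random starting point within the first interval
    return population[start::interval]  # proceed through the entire population


# Hypothetical population of 100 items in their natural order.
population = list(range(100))
sample = systematic_sample(population, interval=10, seed=1)
print(sample)
```

If the ordering of the population has a pattern that repeats at the chosen interval, the sample can be unrepresentative, which is one reason to choose the interval and starting point at random.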
When would you find limitations of simple and systematic random sampling?
if the population is varied or clearly segmented.
What is stratified random sampling?
technique in which the user divides the population into segments, and then selects from each segment at random a proportional number of items.
- created to address problem of segmented populations.
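A sketch of proportional selection from each segment, assuming made-up segment names and sizes and a hypothetical helper `stratified_sample`. Rounding each segment's share to a whole number is a simplification, so the total sample size can differ slightly from the target.

```python
import random


def stratified_sample(segments, fraction, seed=None):
    """Randomly select the same fraction of items from every segment."""
    rng = random.Random(seed)
    sample = {}
    for name, members in segments.items():
        k = round(len(members) * fraction)     # proportional number of items
        sample[name] = rng.sample(members, k)  # random selection within the segment
    return sample


# Hypothetical population divided into three segments of different sizes.
segments = {
    "term": list(range(60)),
    "whole_life": list(range(30)),
    "annuity": list(range(10)),
}

picked = stratified_sample(segments, fraction=0.1, seed=7)
print({name: len(items) for name, items in picked.items()})
```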
Define the non-randomized sample selection method “non-probability sampling”
bases sample selection on specific personally selected criteria.
- not useful in predicting the characteristics of a population based on the sample characteristics.
When would a company use non-probability sampling?
when there is reason to believe that examining a few clearly defined members of a population can identify a “typical” member.
How does one calculate the sample mean?
calculated by summing all the values in the sample and dividing the total by the number of values in the sample.
How do you calculate the sample variance?
1. calculate the sample mean
2. find the distance of each value in the sample from the mean and then square each distance.
3. sum the squared distances
4. divide the total of the squared distances by the number of values minus 1.
(this is designed to correct for sampling bias)
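The four steps can be sketched in Python as follows. The function name `sample_variance` and the eight sample values are illustrative, and the sketch assumes at least two values.

```python
def sample_variance(values):
    """Sample variance: squared distances summed, then divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                       # 1. calculate the sample mean
    squared = [(x - mean) ** 2 for x in values]  # 2. square each distance from the mean
    total = sum(squared)                         # 3. sum the squared distances
    return total / (n - 1)                       # 4. divide by the number of values minus 1


# Hypothetical sample of eight values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(sample))
```

Dividing by n - 1 instead of n gives a slightly larger result than the population formula. Python's standard `statistics.variance` uses the same n - 1 divisor.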
How do you find the sample standard deviation?
square root of sample variance.
True or False
The values derived from sample data are usually different from the population values?
True
What is the sample error?
the difference between the population values and the values derived from a sample of the population.
What is validity in the world of statistics?
the degree to which an observed result can be relied upon and not attributed to random error in sampling or measurement.
- depends on sample size.
In terms of validity: what is the degree of confidence?
defined as the likelihood that a calculated value accurately predicts the true value.
- the larger the sample size, the higher the degree of confidence.
In terms of inferential statistics and validity: what is the margin of error?
the measure that indicates the likely range of inaccuracy in a given sample, relative to a result based on the total population.
- basically indicates how accurately a given sample represents the population.
- larger sample = smaller margin of error.
- 3% is acceptable.
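The chapter does not give a formula for the margin of error. As an illustration only, the sketch below uses a common textbook approximation for a sample proportion at roughly 95% confidence, 1.96 * sqrt(p * (1 - p) / n), where the worst-case proportion p = 0.5 and the function name are assumptions.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion (z = 1.96 for ~95% confidence)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Larger samples give smaller margins of error.
for n in (100, 1_000, 4_000):
    print(n, round(margin_of_error(n) * 100, 1), "%")
```

Under these assumptions, a sample of roughly 1,000 gives a margin of error close to the 3% mentioned above, and quadrupling the sample size halves the margin.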