additional Flashcards
Definition of statistics
the art and science of collecting, analyzing, presenting and interpreting data
the term Statistics refers to
numerical facts such as averages, medians, percentages and maximums that help us understand a variety of business and economic situations
Data
the facts and figures collected, analyzed and summarized for presentation and interpretation
data set
all the data collected in a particular study
Elements
entities on which data are collected
Variable
characteristic of interest for the elements
Observation
set of measurements obtained for a particular element
What are the scales of measurement
- Nominal scale
- Ordinal Scale
- Interval Scale
- Ratio Scale
What is the nominal scale
the data for a variable consists of labels or names used to identify an attribute of the element
What is ordinal scale
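the data exhibit the properties of nominal data and, in addition, the order or rank of the data is meaningful
What is interval scale
the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure
What is ratio scale
the data have all the properties of interval data and the ratio of two values is meaningful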
The statistical method appropriate for summarizing data depends on whether the data are
categorical or quantitative
categorical data
data that can be grouped by specific categories
- uses either the nominal or ordinal scale of measurement
Quantitative data
uses numeric values to indicate how much or how many
- uses either the interval or ratio scale of measurement
Cross-sectional data
data collected at the same or approximately the same point in time
Time series data
data collected over several time periods
An observation is the set of measurements obtained for each element in a data set. Hence, the number of observations is always the same as
the number of elements
Quantitative data can be
Discrete (finite)
Continuous (e.g., time, weight): no separation between possible data values
Descriptive statistics
summaries of data which may be tabular, graphical or numerical
Statistical inference
the process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population
Data mining
deals with methods for developing useful decision-making information from large databases
- very useful for companies with a strong consumer focus, such as retail businesses, financial organizations, and communication companies
- the process of using procedures from statistics and computer science to extract useful information from extremely large databases
descriptive statistics
tabular, graphical, and numerical summaries of data
Data visualization
used to describe the use of graphical displays to summarize and present information about a data set
frequency distribution
a tabular summary of data showing number (frequency) of observations in each of several nonoverlapping categories or classes
Relative frequency distribution
a tabular summary of data showing the relative frequency for each class; the relative frequencies total 1
Percent frequency
summarizes the percent frequency of the data for each class; the percentages total 100
How can you summarize data for categorical variables
- tabular or graphical displays
What types of tabular data can be used for categorical variables
Frequency distribution table
- the number (frequency) of observations in each of several nonoverlapping categories
- how many times an element appears
what types of graphical displays are there for categorical variables
- pie charts
- bar charts
describe pie charts
use relative frequency or % frequency
- not generally the best display; people can usually judge differences in bar length better than differences in slice size
Describe bar charts
shows categorical data in frequency, relative frequency or % frequency
pareto diagram
when the bars are arranged in descending order of height from left to right with the most frequently occurring cause appearing first
Often the number of classes in a frequency distribution for categorical data
is the same as the number of categories
e.g., Coke, Diet Coke, ...
it is recommended that classes with smaller frequencies be
grouped into an aggregate class called “other”; this usually applies to classes with frequencies of 5% or less
The sum of frequencies in a frequency distribution always equals
the number of observations
The sum of the relative frequencies in any relative frequency distribution always equals
1.00
the sum of the percentages in a percent frequency distribution always equals
100
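A minimal sketch of the three tabular summaries for a categorical variable, assuming a small made-up sample of soft drink purchases (the data and the `frequency_tables` helper are illustrative, not from the cards). Printing the categories from most to least frequent gives the ordering a Pareto diagram would use.

```python
from collections import Counter

# Hypothetical sample of soft drink purchases (illustrative data only).
purchases = ["Coke", "Diet Coke", "Coke", "Pepsi", "Coke",
             "Sprite", "Pepsi", "Diet Coke", "Coke", "Pepsi"]

def frequency_tables(data):
    """Return frequency, relative frequency, and percent frequency per category."""
    n = len(data)
    freq = Counter(data)                           # count of observations per category
    rel = {k: c / n for k, c in freq.items()}      # frequency / number of observations
    pct = {k: 100 * r for k, r in rel.items()}     # relative frequency * 100
    return dict(freq), rel, pct

freq, rel, pct = frequency_tables(purchases)

# List categories in descending order of frequency (Pareto ordering).
for category, count in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<10} {count:>3} {rel[category]:>6.2f} {pct[category]:>6.1f}")
```

The three totals can be checked directly: the frequencies sum to the number of observations, the relative frequencies to 1, and the percent frequencies to 100.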
How can you summarize Quantitative Variable
- tabular
- graphical
What is the tabular summary for Quantitative variables
Frequency distribution
- but we need to be more careful in defining the non overlapping classes
what are the steps to constructing a frequency distribution for quantitative data
- determine the # of nonoverlapping classes (5-20)
- determine the width of the classes - use the same width for each class
approximate width = (largest data value - smallest data value) / # of classes; class widths can be rounded
- determine the class limits (so each data item belongs to exactly one class)
- each class has a lower limit and an upper limit
What graphical representations can we use for Quantitative data
- Dot Plot
- Histogram
- Stem and Leaf
what is the formula to determine the class widths
(largest data value - smallest data value) / # of classes
what is the class midpoint
is the value halfway between the lower and upper class limits
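The class-width formula, class limits, and class midpoints can be sketched as follows. This is an illustration under stated assumptions: the audit times are made-up integer data, `build_classes` is a hypothetical helper, and rounding the width up to the next whole number is one possible rounding choice rather than a fixed rule.

```python
import math

# Hypothetical audit times in days (illustrative data, not from the cards).
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

def build_classes(data, num_classes):
    """Return (class_width, [(lower_limit, upper_limit), ...]) for integer data."""
    # Approximate width = (largest data value - smallest data value) / # of classes
    approx_width = (max(data) - min(data)) / num_classes
    # Rounding choice (an assumption): go up to the next whole number so that
    # num_classes classes are guaranteed to cover the largest value.
    width = math.floor(approx_width) + 1
    classes = []
    lower = min(data)
    for _ in range(num_classes):
        upper = lower + width - 1      # limits do not overlap for integer data
        classes.append((lower, upper))
        lower = upper + 1
    return width, classes

width, classes = build_classes(audit_days, num_classes=5)

# Each data value falls in exactly one class.
frequencies = [sum(lo <= x <= hi for x in audit_days) for lo, hi in classes]
# Class midpoint: halfway between the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), mid, f in zip(classes, midpoints, frequencies):
    print(f"{lo}-{hi}  midpoint={mid}  frequency={f}")
```

In practice the lower limit of the first class is often rounded to a convenient value (for example 10 instead of 12); the sketch starts at the smallest data value only to keep the code short.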
describe the dot plot
- one of the simplest graphical summaries
- horizontal axis shows the range for the data
- useful for comparing the distribution of the data for two or more variables
describe the histogram
- for quantitative data (categorical use bar chart)
- similar to a bar chart but has no spaces between the bars
- common for quantitative data
What does the histogram help show
the shape or the skewness of the distribution
What type of skewness are there
- skewed left
- skewed right
- symmetrical
what are some examples of distributions that are roughly symmetrical
SAT scores, heights and weights of people
what are some examples of distributions that are generally skewed right (more data closer to 0 than the higher side)
Data from applications in business and economics often tend to be skewed right
example:
1. housing prices, salaries, purchase amounts, etc
Describe the stem and leaf
graphical display used to show the rank order and shape of a distribution of data
What advantages does the stem and leaf have over a histogram
- easier to construct by hand
- within a class interval, the stem and leaf provides more information than the histogram because the stem and leaf shows the actual data
frequency distribution, histogram and stem and leaf
do not have an absolute number of classes, rows, or stems
stretched stem and leaf: whenever a stem value is stated twice, the first value corresponds to leaf values of _________ and the second value corresponds to leaf values of __________
- 0-4
- 5-9
is a stem and leaf display with more than 3 digits possible
yes, note that a single digit is used to define each leaf and that only the first 3 digits of each data value are used to construct the display
example: for the number 1565, the stem is 15 and the leaf is 6; the final digit 5 is dropped
however, it is not possible to reconstruct the exact values
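A rough sketch of how a stem and leaf display can be built, assuming non-negative integer data. The `stem_and_leaf` and `print_display` helpers and the `leaf_unit` parameter are hypothetical names introduced for illustration; `leaf_unit=10` reproduces the 1565 case, where digits below the leaf unit are dropped.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted list of single-digit leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = v // leaf_unit            # drop digits smaller than the leaf unit
        stem, leaf = divmod(scaled, 10)    # last remaining digit is the leaf
        stems[stem].append(leaf)
    return dict(stems)

def print_display(display):
    """Print one row per stem, including stems that have no leaves."""
    for stem in range(min(display), max(display) + 1):
        leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
        print(f"{stem:>3} | {leaves}")

# Hypothetical test scores (illustrative data only).
scores = [68, 72, 75, 81, 85, 86, 92]
print_display(stem_and_leaf(scores))

# With a leaf unit of 10, the value 1565 gets stem 15 and leaf 6.
print(stem_and_leaf([1565], leaf_unit=10))
```

Because the dropped digit is not stored anywhere, the exact value 1565 cannot be recovered from the display when the leaf unit is 10.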
what is an open-ended class (when speaking of classes for a frequency distribution)
an open-end class requires only a lower class limit or an upper class limit
- ex. suppose two of the audit times had taken 58 and 65 days. rather than continuing with classes of width 5 (35-39, 40-44, etc.), we could simplify the distribution
- we could show an open-end class of "35 or more"
when do you most often see open end classes
at the upper end of the distribution
sometimes they are seen at the lower end
and occasionally at both ends
the last entry in a cumulative frequency distribution always equals
the total number of observations
the last entry in a cumulative relative frequency distribution is always
1.00
the last entry in a cumulative percent frequency distribution is always
100
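The three "last entry" facts above can be checked with a short sketch. The class frequencies here are hypothetical, and the variable names are chosen only for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order (illustrative data only).
class_freqs = [7, 7, 3, 2, 1]
n = sum(class_freqs)                          # total number of observations

cum_freq = list(accumulate(class_freqs))      # running total of frequencies
cum_rel = [c / n for c in cum_freq]           # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]     # cumulative percent frequency

for f, cf, cr, cp in zip(class_freqs, cum_freq, cum_rel, cum_pct):
    print(f"{f:>3} {cf:>4} {cr:>6.2f} {cp:>6.1f}")
```

The last row always shows the total number of observations, 1.00, and 100, because every observation has been counted by the time the final class is reached.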
How can you summarize data for 2 variables
- Cross tabulations
- graphically
Define cross tabulations to summarize 2 variables
- both variables can be categorical, or both can be quantitative
- one variable can be categorical and the other quantitative
give an example of a cross tabulation
Restaurant   Quality Rating   Meal Price ($)
1            good             18
2            very good        22
3            excellent        28
4            bad              38
etc.
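A minimal sketch of turning records like these into a cross tabulation, assuming made-up restaurant data and price classes of width $10. The records, the `price_class` helper, and the class boundaries are all illustrative assumptions. Each cell counts the restaurants with a given quality rating (categorical) and meal price class (quantitative, grouped into classes).

```python
from collections import Counter

# Hypothetical (quality rating, meal price in $) records, illustrative only.
restaurants = [("good", 18), ("very good", 22), ("excellent", 28), ("good", 38),
               ("very good", 33), ("good", 14), ("excellent", 35), ("very good", 24)]

def price_class(price):
    """Assign a meal price to a class of width $10, e.g. 18 -> '$10-19'."""
    lower = (price // 10) * 10
    return f"${lower}-{lower + 9}"

# Count observations for every (quality, price class) combination.
crosstab = Counter((quality, price_class(price)) for quality, price in restaurants)

rows = ["good", "very good", "excellent"]
cols = ["$10-19", "$20-29", "$30-39"]

# Print the table: one row per quality rating, one column per price class.
print(f"{'':<12}" + "".join(f"{c:>9}" for c in cols) + f"{'Total':>9}")
for r in rows:
    counts = [crosstab[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{x:>9}" for x in counts) + f"{sum(counts):>9}")
```

The row totals give the frequency distribution for quality rating on its own, and summing each column would give the frequency distribution for meal price.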