Qu1 Chapter 2: Presenting Data in Tables and Charts - Notes Flashcards
How do you organize numerical data?
1. The ordered array
2. The stem-and-leaf display
What is the ordered array?
An ordered sequence of raw data, in rank order from the smallest observation to the largest. It makes it easy to pick out extremes, typical values, and the area where the majority of the values are concentrated.
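A minimal Python sketch of building an ordered array, using the ten raw values listed under "What is raw form". Sorting is all that is needed; the first and last entries are then the extremes.

```python
# Raw data in the order collected (the ten values from the "raw form" card).
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# The ordered array: the same observations ranked from smallest to largest.
ordered = sorted(raw)

print(ordered)                                   # [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
print("min:", ordered[0], "max:", ordered[-1])   # min: 21 max: 41
```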

What is a stem-and-leaf display?
A valuable tool for organizing a set of data and understanding how the values distribute and cluster over the range of the observations in the data set.
It separates data entries into leading digits, or stems, and trailing digits, or leaves.
The first column of numbers is the stem, or leading digits.
The leaves, or trailing digits, branch out to the right of these numbers.
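The display described above can be sketched in Python for two-digit data. This assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` and `display_lines` are hypothetical helper names. The data are the 20 sorted values from the frequency-distribution card.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group non-negative integers into {stem: [leaves]}.

    Assumption: stem = value // 10 (leading digits), leaf = value % 10 (last digit).
    """
    groups = defaultdict(list)
    for value in sorted(data):          # sorting keeps each row's leaves in order
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)

def display_lines(groups):
    """Format one 'stem | leaves' row per stem, including empty stems in the range."""
    return [
        f"{stem} | {''.join(str(leaf) for leaf in groups.get(stem, []))}"
        for stem in range(min(groups), max(groups) + 1)
    ]

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
for line in display_lines(stem_and_leaf(data)):
    print(line)
```

For these values the rows come out as `1 | 237`, `2 | 144677`, `3 | 02578`, `4 | 1346` and `5 | 38`, which shows the cluster in the 20s and 30s at a glance.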

What is raw form?
Data in the form in which they were collected, for example:
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
What is a frequency distribution?
A summary table in which the data are arranged into conveniently established, numerically ordered class groupings or categories.
It is a table that displays the frequency of various outcomes in a sample.
Each entry in the table contains the frequency, or count, of the occurrences of values within a particular group or interval; in this way, the table summarizes the distribution of values in the sample.

Describe how to construct a frequency distribution.
- Sort the raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find the range: 58 - 12 = 46
- Select the number of classes: 5 (usually between 5 and 15)
- Compute the class interval (width): 10 (46/5 = 9.2, then round up)
- Determine the class boundaries (limits): 10, 20, 30, 40, 50, 60
- Compute the class midpoints: 15, 25, 35, 45, 55
- Count the observations and assign them to classes
The histogram
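The construction steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a textbook routine: each class includes its lower boundary and excludes its upper one (10 to under 20, and so on), and the first boundary is the minimum rounded down to a multiple of the width, which reproduces the boundaries 10 through 60 in the card. `frequency_distribution` is a hypothetical helper name.

```python
import math

def frequency_distribution(data, num_classes):
    """Return (boundaries, midpoints, counts) following the card's steps.

    Assumptions: classes are [lower, upper); the first boundary is the minimum
    rounded down to a multiple of the width; the data are not all identical
    (otherwise the width would be 0).
    """
    ordered = sorted(data)                              # step 1: sort
    data_range = ordered[-1] - ordered[0]               # step 2: range
    width = math.ceil(data_range / num_classes)         # steps 3-4: width, rounded up
    start = (ordered[0] // width) * width               # first lower boundary
    boundaries = [start]
    while boundaries[-1] <= ordered[-1]:                # step 5: add boundaries until
        boundaries.append(boundaries[-1] + width)       # the maximum is covered
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]  # step 6
    counts = [0] * (len(boundaries) - 1)
    for x in ordered:                                   # step 7: tally each value
        counts[(x - start) // width] += 1
    return boundaries, midpoints, counts

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
boundaries, midpoints, counts = frequency_distribution(data, 5)
for lo, hi, mid, n in zip(boundaries, boundaries[1:], midpoints, counts):
    print(f"{lo} to under {hi}: midpoint {mid:g}, frequency {n}")
```

With this data the frequencies are 3, 6, 5, 4 and 2. The `while` loop adds an extra class rather than silently dropping a maximum that falls on or past the last boundary.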

To enhance the analysis of a frequency distribution, what can we use?
- Relative frequency distribution (proportion), or
- Percentage distribution
What is the formula for the relative frequency distribution?
Frequency in each class of the frequency distribution / total number of observations
What is the formula for the percentage distribution?
Multiply each relative frequency, or proportion, by 100.
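A short Python sketch applying both formulas. The counts are those obtained by tallying the 20 sorted values into the classes 10 to under 20, ..., 50 to under 60 (assuming each class includes its lower boundary).

```python
# Class frequencies for 10-20, 20-30, 30-40, 40-50, 50-60.
counts = [3, 6, 5, 4, 2]
total = sum(counts)                                    # 20 observations

relative = [n / total for n in counts]                 # proportion in each class
percentages = [100 * n / total for n in counts]        # proportion x 100

for n, p, pct in zip(counts, relative, percentages):
    print(f"frequency {n}: proportion {p:.2f}, percentage {pct:.0f}%")
```

The proportions always sum to 1 and the percentages to 100 (up to rounding in the printout).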
What is the cumulative distribution?
A table of cumulative percentages
- A useful technique for tabulating data
- In probability terms, the cumulative distribution function describes the probability that a real-valued random variable X with a given probability distribution will be found to have a value less than or equal to x.
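A minimal Python sketch of a cumulative percentage table for the same 20-observation example (class counts 3, 6, 5, 4, 2). Each row gives the percentage of observations less than the class's upper boundary.

```python
from itertools import accumulate

counts = [3, 6, 5, 4, 2]            # frequencies for classes 10-20, ..., 50-60
upper = [20, 30, 40, 50, 60]        # upper boundary of each class
total = sum(counts)

cumulative_counts = list(accumulate(counts))                  # running totals
cumulative_pct = [100 * c / total for c in cumulative_counts]

for u, pct in zip(upper, cumulative_pct):
    print(f"less than {u}: {pct:.0f}%")
```

The last cumulative percentage is always 100%, because every observation falls below the final upper boundary.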
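Describe the histogram.
A chart in which rectangular bars are constructed at the boundaries of each class, with the height of each bar showing the frequency, proportion, or percentage of observations in that class. “A picture is worth a thousand words.”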
What is the polygon?
- The percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
- When plotting, the variable of interest is displayed along the horizontal axis.
- The vertical axis represents the number, proportion, or percentage of observations per class interval.
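A minimal Python sketch of the points a percentage polygon connects, using the boundaries 10 to 60 and the class percentages 15, 30, 25, 20, 10 from the running example. Any plotting tool can then join the points with straight lines.

```python
boundaries = [10, 20, 30, 40, 50, 60]
percentages = [15.0, 30.0, 25.0, 20.0, 10.0]   # percentage of observations per class

# Each class is represented by its midpoint (x) at its class percentage (y).
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
points = list(zip(midpoints, percentages))
print(points)
```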

What is the cumulative polygon (ogive)?
- A graphical representation of a cumulative percentage distribution
When plotting cumulative polygons:
- The variable of interest is displayed along the x-axis.
- The y-axis represents the percentages of cumulated observations.
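A minimal Python sketch of the ogive's points for the running example. It assumes the common convention of plotting, at each class boundary (x-axis), the percentage of observations less than that boundary (y-axis), so the curve starts at 0% and ends at 100%.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
counts = [3, 6, 5, 4, 2]            # frequencies for classes 10-20, ..., 50-60
total = sum(counts)

# Percentage of observations below each boundary: 0% below the first lower
# boundary, 100% below the last upper boundary.
pct_below = [0.0] + [100 * c / total for c in accumulate(counts)]
ogive = list(zip(boundaries, pct_below))
print(ogive)
```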
What is bivariate numerical data?
Data relating to two numerical variables,
for example, height and weight.
- Use a scatter diagram, plotting one variable on the horizontal axis and the other on the vertical axis.
What are the tables and charts for Categorical Data?
- Summary Table
- The Bar Chart
- Pie Chart
- Pareto Diagram
Explain the summary table for categorical data.
- Used for categorical data
- Similar in format to the frequency distribution table
- How to develop: list each category with its frequency (count) and percentage
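A minimal Python sketch of a summary table built from hypothetical survey answers (the categories and responses are invented for illustration). `collections.Counter` does the tallying; `most_common()` lists categories from most to least frequent.

```python
from collections import Counter

# Hypothetical survey answers: where each respondent keeps most of their savings.
responses = ["Bank", "Online", "Bank", "Credit union", "Bank",
             "Online", "Bank", "Credit union", "Online", "Bank"]

counts = Counter(responses)
total = sum(counts.values())

print(f"{'Category':<14}{'Frequency':>10}{'Percent':>9}")
for category, n in counts.most_common():
    print(f"{category:<14}{n:>10}{100 * n / total:>8.0f}%")
print(f"{'Total':<14}{total:>10}{100:>8}%")
```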
Explain the bar chart.
Used for categorical data
- Each category is depicted by a bar.
- The length of the bar indicates the frequency or percentage of observations falling into the category.
Explain the pie chart.
- Percentages are rounded to the nearest percent.
- Based on the fact that a circle has 360 degrees
- The pie is divided into slices according to the percentages in each category.
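A minimal Python sketch of the slice arithmetic, using hypothetical category counts: each percentage is rounded to the nearest percent and then converted to its share of 360 degrees.

```python
# Hypothetical category counts.
counts = {"Bank": 5, "Online": 3, "Credit union": 2}
total = sum(counts.values())

slices = {}
for category, n in counts.items():
    pct = round(100 * n / total)                 # nearest percent (round() sends exact halves to even)
    slices[category] = (pct, pct * 360 / 100)    # slice angle = share of 360 degrees

for category, (pct, degrees) in slices.items():
    print(f"{category}: {pct}% -> {degrees:.1f} degrees")
```

Because the percentages are rounded first, the slices may not add up to exactly 360 degrees for every data set; here they do (180 + 108 + 72).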
What is the purpose of graphical presentation?
To display data accurately and clearly
Research on human perception of graphs shows:
The bar chart is preferred to the pie chart.
When do you use a bar chart?
When comparison of categories is most important
When do you use a pie chart?
When observing the proportion of the whole that is in a particular category is most important
Which diagram provides more visual information than either the bar chart or the pie chart?
Pareto diagram
What is the Pareto diagram?
A special type of vertical bar chart
- Categorized responses are plotted in descending rank order of their frequencies.
- It is combined with a cumulative polygon on the same graph.
- Ability to separate the “vital few” from the “trivial many”
- This enables one to focus on the important categories.
- Great when the categorical variable of interest contains many categories
What is the Pareto diagram used for?
- When the categorical variable of interest contains many categories
- Used in analyzing process and product quality
Definition of Pareto chart
A Pareto chart is a bar graph. The lengths of the bars represent frequency or cost (time or money), and the bars are arranged with the longest on the left and the shortest on the right. In this way the chart visually depicts which situations are more significant.
When do you use a Pareto chart?
When analyzing data about the frequency of problems or causes in a process.
When there are many problems or causes and you want to focus on the most significant.
When analyzing broad causes by looking at their specific components.
When communicating with others about your data.
What is the Pareto chart procedure?
1. Decide what categories you will use to group items.
2. Decide what measurement is appropriate. Common measurements are frequency, quantity, cost and time.
3. Decide what period of time the Pareto chart will cover: One work cycle? One full day? A week?
4. Collect the data, recording the category each time. (Or assemble data that already exist.)
5. Subtotal the measurements for each category.
6. Determine the appropriate scale for the measurements you have collected. The maximum value will be the largest subtotal from step 5. (If you will do optional steps 8 and 9 below, the maximum value will be the sum of all subtotals from step 5.) Mark the scale on the left side of the chart.
7. Construct and label bars for each category. Place the tallest at the far left, then the next tallest to its right and so on. If there are many categories with small measurements, they can be grouped as “other.”
Steps 8 and 9 are optional but are useful for analysis and communication.
8. Calculate the percentage for each category: the subtotal for that category divided by the total for all categories. Draw a right vertical axis and label it with percentages. Be sure the two scales match: For example, the left measurement that corresponds to one-half should be exactly opposite 50% on the right scale.
9. Calculate and draw cumulative sums: Add the subtotals for the first and second categories, and place a dot above the second bar indicating that sum. To that sum add the subtotal for the third category, and place a dot above the third bar for that new sum. Continue the process for all the bars. Connect the dots, starting at the top of the first bar. The last dot should reach 100 percent on the right scale.
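A minimal Python sketch of the numeric part of the procedure (ordering the bars, per-category percentages, and the cumulative line), using hypothetical defect subtotals invented for illustration. Drawing the chart itself is left to any plotting tool.

```python
from itertools import accumulate

# Hypothetical subtotals per category (the result of step 5).
subtotals = {"Scratches": 8, "Dents": 30, "Misalignment": 12, "Paint defects": 50}

# Step 7: order categories from tallest bar to shortest.
ordered = sorted(subtotals.items(), key=lambda item: item[1], reverse=True)
total = sum(subtotals.values())

# Step 8: percentage for each bar; step 9: running (cumulative) percentage.
percent = [100 * n / total for _, n in ordered]
cumulative = [100 * c / total for c in accumulate(n for _, n in ordered)]

for (category, n), pct, cum in zip(ordered, percent, cumulative):
    print(f"{category:<14}{n:>4}{pct:>7.1f}%{cum:>7.1f}%")
```

Here the first two categories already account for 80% of the total, which is the “vital few” the chart is meant to expose.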
What are the ways of tabulating and graphing bivariate categorical data?
- Contingency table
- Side-by-side bar chart
What is a contingency table?
- A two-way table of cross-classification
Why would you use a contingency table?
To simultaneously study the responses to two categorical variables
Give an example of a use for a contingency table.
You might be interested in examining whether there is a pattern or relationship between a fund's 5-star rating and its fundscope risk rating.
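A minimal Python sketch of cross-classifying two categorical variables into a contingency table. The fund data below are hypothetical, and `contingency_table` is an illustrative helper name, not a library function.

```python
from collections import Counter

def contingency_table(pairs, row_levels, col_levels):
    """Cross-classify (row, column) pairs into a dict of dicts of counts."""
    counts = Counter(pairs)
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical funds: (star rating, risk rating)
funds = [("5 stars", "Low"), ("5 stars", "Low"), ("5 stars", "Average"),
         ("4 stars", "Average"), ("4 stars", "High"),
         ("3 stars", "High"), ("3 stars", "High"), ("3 stars", "Average")]
rows = ["5 stars", "4 stars", "3 stars"]
cols = ["Low", "Average", "High"]

table = contingency_table(funds, rows, cols)
print(f"{'':<9}" + "".join(f"{c:>9}" for c in cols) + f"{'Total':>7}")
for r in rows:
    cells = table[r]
    print(f"{r:<9}" + "".join(f"{cells[c]:>9}" for c in cols) + f"{sum(cells.values()):>7}")
```

Reading across a row shows how one rating group splits over the risk levels; a pattern such as higher ratings clustering in low risk would show up as counts concentrated toward one corner.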
Describe the side-by-side bar chart.
A useful way to visually display bivariate categorical data when looking for patterns or relationships
What are the basic features of a proper graph?
- Showing the data
- Getting the viewer to focus on the substance of the graph, rather than on how the graph was developed
- Avoiding distortion
- Encouraging comparisons of data
- Serving a clear purpose
- Being integrated with the statistical and verbal descriptions of the graph
What are the principles of graphical excellence?
- Graphical excellence is a well-designed presentation of data that provides substance, statistics and design
- Graphical excellence communicates complex ideas with clarity, precision and efficiency
- Graphical excellence gives the viewer the largest number of ideas in the shortest time, with the least ink
- Graphical excellence almost always involves several dimensions
- Graphical excellence requires telling the truth about the data
What are the ways of evaluating the excellence of a graph?
- data-ink ratio
- lie factor
Describe the data-ink ratio.
The proportion of a graphic's ink that is devoted to the non-redundant display of data information
Data-ink ratio = data ink / total ink used to print the graphic
- The objective is to maximize the proportion of the ink used in the graph that is devoted to the data.
- Within reasonable limits, non-data ink and redundant data ink should be eliminated.
What are some examples of non-data ink?
- Aspects of the graph that do not relate to the substantive features of the data
- Grid lines that may be imposed on the graph
- Chartjunk
What is chartjunk?
Decoration that is non-data ink or redundant data ink
- In its extreme form, it represents self-promoting graphics that focus the viewer on the style of the graph, not the data presented.
What is the lie factor?
The ratio of the size of the effect shown in the graph to the size of the effect in the data
- Do not use a graph to distort the data (e.g., by drawing values as wine glasses).
- The amount of distortion can be measured using the lie factor.
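A minimal Python sketch of the lie factor. It assumes Edward Tufte's definition of the size of an effect as the relative change between two values, |second - first| / first, measured once on the graphic and once on the data. The numbers are hypothetical.

```python
def size_of_effect(first, second):
    """Relative change from the first value to the second (Tufte's 'size of effect')."""
    return abs(second - first) / first

def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data (1.0 = faithful)."""
    return size_of_effect(shown_first, shown_second) / size_of_effect(data_first, data_second)

# Hypothetical: sales rise from 100 to 150 units (a 50% increase), but the
# bars drawn for them grow from 2.0 cm to 5.0 cm (a 150% increase).
print(lie_factor(2.0, 5.0, 100, 150))   # 3.0
```

A lie factor of 3.0 means the graphic exaggerates the change threefold; a value near 1.0 means the drawing is proportional to the data.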
What are the essential components of statistical graphs?
- Title, which answers:
a. What
b. Where
c. When
- Coordinate axes (labeled)
- Scale divisions ("tick marks")
- Zero line or other line of reference
- Break in vertical scale, when necessary
- Scale numbers and legends
- Curves and curve labels