Qu1 Chapter 2: Presenting Data in Tables and Charts - Notes Flashcards

1
Q

How do you organize numerical data?

A
  1. the ordered array
  2. the stem-and-leaf display
2
Q

What is the ordered array?

A

An ordered sequence of the raw data, ranked from the smallest observation to the largest. It makes it easy to pick out extremes, typical values, and the area where the majority of the values are concentrated.

3
Q

What is a stem-and-leaf display?

A

A valuable tool for organizing a set of data and understanding how the values distribute and cluster over the range of the observations in the data set.

It separates data entries into leading digits, or stems, and trailing digits, or leaves.

The first column of numbers holds the stems, or leading digits.

The leaves, or trailing digits, branch out to the right of these numbers.
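
A minimal sketch of how such a display could be built, assuming non-negative integers where the tens part is the stem and the units digit is the leaf. The function name `stem_and_leaf` is a hypothetical helper, not something from the chapter.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(values):      # sorting first keeps the leaves in order
        stems[value // 10].append(value % 10)
    return dict(stems)


sample = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(sample)

# One row per stem, with the leaves branching out to the right.
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Reading down the stems shows where the values cluster: most of this sample sits in the 20s.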

4
Q

What is raw form?

A

Data as collected, before any ordering or grouping, for example:

24, 26, 24, 21, 27, 27, 30, 41, 32, 38

5
Q

What is a frequency distribution?

A

A summary table in which the data are arranged into conveniently established, numerically ordered class groupings or categories.

It is a table that displays the frequency of various outcomes in a sample.

Each entry in the table contains the frequency, or count, of the occurrences of values within a particular group or interval. In this way, the table summarizes the distribution of values in the sample.

6
Q

What is a frequency distribution?

A

A summary table in which the data are arranged into class groupings or categories.

7
Q

Describe Frequency Distributions

A
  1. Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
  2. Find range: 58 - 12 = 46
  3. Select number of classes: 5 (usually between 5 and 15)
  4. Compute class interval (width): 10 (46 / 5 = 9.2, then round up)
  5. Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
  6. Compute class midpoints: 15, 25, 35, 45, 55
  7. Count observations and assign to classes
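
The steps above can be sketched in Python as follows. Two things here are assumptions rather than part of the notes: the starting boundary of 10 is picked by hand as a convenient round number at or below the minimum, and a value that falls exactly on a boundary (such as 30) is counted in the upper class.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

num_classes = 5
data_range = max(data) - min(data)            # 58 - 12 = 46
width = math.ceil(data_range / num_classes)   # 9.2 rounded up to 10
lower = 10                                    # assumed starting boundary

boundaries = [lower + i * width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Assumed convention: each class includes its lower boundary, excludes its upper.
frequencies = [0] * num_classes
for value in sorted(data):
    frequencies[(value - lower) // width] += 1

for lo, hi, mid, count in zip(boundaries, boundaries[1:], midpoints, frequencies):
    print(f"{lo} to under {hi} (midpoint {mid}): {count}")
```

Under these assumptions the five classes hold 3, 6, 5, 4 and 2 observations, which add up to the 20 values in the data set.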
8
Q

The histogram

A
9
Q

What can we use to enhance the analysis of a frequency distribution?

A
  1. relative frequency distribution (proportion) or
  2. percentage distribution
10
Q

What is the formula for the relative frequency distribution?

A

frequency in each class of the frequency distribution / total number of observations

11
Q

What is the formula for the percentage distribution?

A

multiply each relative frequency, or proportion, by 100
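
A short sketch of both formulas. The class frequencies 3, 6, 5, 4, 2 are an assumption for illustration: they are the counts obtained when the 20 sorted values from the frequency-distribution card are grouped into the classes 10-20 through 50-60, with boundary values counted in the upper class.

```python
# Assumed class frequencies for classes 10-20, 20-30, 30-40, 40-50, 50-60.
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative = [count / total for count in frequencies]

# Percentage: each relative frequency multiplied by 100.
percentages = [100 * proportion for proportion in relative]

for count, proportion, percent in zip(frequencies, relative, percentages):
    print(f"frequency {count}: proportion {proportion:.2f}, percentage {percent:.1f}%")
```

The proportions sum to 1 and the percentages to 100, which is a quick check that no observation was dropped.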

12
Q

What is the cumulative distribution?

A

A table of cumulative percentages.

  • useful technique for tabulating data
  • describes the probability that a real-valued random variable X with a given probability distribution will be found to have a value less than or equal to x
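
As a rough illustration, a cumulative percentage table can be built by running a total over the class frequencies. The frequencies and upper class boundaries below are assumed example values (the 20 observations from the frequency-distribution card grouped into classes of width 10).

```python
from itertools import accumulate

# Assumed example: classes 10-20, 20-30, 30-40, 40-50, 50-60.
upper_boundaries = [20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Running totals of the class frequencies, then converted to percentages.
cumulative_counts = list(accumulate(frequencies))
cumulative_percentages = [100 * count / total for count in cumulative_counts]

for boundary, percent in zip(upper_boundaries, cumulative_percentages):
    print(f"less than {boundary}: {percent:.1f}%")
```

The last row always reaches 100%, since every observation lies below the final upper boundary.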
13
Q

Describe the histogram

A

A chart in which rectangular bars are constructed at the boundaries of each class. “A picture is worth a thousand words.”

need more

14
Q

What is the polygon?

A
  • the percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages
  • when plotting, the variable of interest is displayed along the horizontal axis
  • the vertical axis represents the number, proportion, or percentage of observations per class interval
15
Q

What is the cumulative polygon (ogive)?

A
  • a graphical representation of a cumulative percentage distribution

When plotting cumulative polygons:

  • the variable of interest is displayed along the x-axis
  • the y-axis represents the percentages of cumulated observations
16
Q

What is bivariate numerical data?

A

Data relating to two numerical variables,

for example, height and weight.

  • use a scatter diagram
17
Q

What are the tables and charts for Categorical Data?

A
  1. Summary Table
  2. The Bar Chart
  3. Pie Chart
  4. Pareto Diagram
18
Q

Explain the summary table for categorical information

A
  • for categorical data
  • similar in format to the frequency distribution table
  • how to develop
19
Q

Explain the bar chart

A

Used for categorical data.

  • each category is depicted by a bar
  • the length of the bar indicates the frequency or percentage of observations falling into the category
20
Q

explain the pie chart

A
  • percentages are rounded to the nearest percent
  • based on the fact that the circle has 360 degrees
  • pie is divided into slices according to the percentages in each category
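
Because the full circle is 360 degrees, each slice's angle is its share of the total times 360. A minimal sketch, where the function name `slice_angles` and the category counts are hypothetical example values:

```python
def slice_angles(counts):
    """Return the angle in degrees of each category's slice of the pie."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}


# Hypothetical category counts, for illustration only.
example = {"A": 50, "B": 30, "C": 20}
angles = slice_angles(example)

for category, angle in angles.items():
    percent = round(100 * example[category] / sum(example.values()))
    print(f"{category}: {percent}% of the pie, {angle:.0f} degrees")
```

Equivalently, each percentage point corresponds to 3.6 degrees of the circle.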
21
Q

What is the purpose of graphical presentation?

A

To display data accurately and clearly.

Research on human perception of graphs shows that the bar chart is preferred to the pie chart.

22
Q

When do you use a bar chart?

A

When comparison of categories is most important.

23
Q

When do you use a pie chart?

A

When observing the proportion of the whole that is in a particular category is most important.

24
Q

Which diagram provides more visual information than either the bar chart or the pie chart?

A

The Pareto diagram

25
Q

What is the Pareto diagram?

A

A special type of vertical bar chart.

  • categorized responses are plotted in the descending rank order of their frequencies
  • combined with a cumulative polygon on the same graph
  • able to separate the “vital few” from the “trivial many”
  • this enables one to focus on the important categories
  • great when the categorical variable of interest contains many categories
26
Q

What is the Pareto diagram used for?

A
  • when the categorical variable of interest contains many categories
  • used in analyzing process and product quality
27
Q

Definition of Pareto chart

A

A Pareto chart is a bar graph. The lengths of the bars represent frequency or cost (time or money), and are arranged with longest bars on the left and the shortest to the right. In this way the chart visually depicts which situations are more significant.

28
Q

when to use a Pareto Chart

A

  • When analyzing data about the frequency of problems or causes in a process.
  • When there are many problems or causes and you want to focus on the most significant.
  • When analyzing broad causes by looking at their specific components.
  • When communicating with others about your data.

29
Q

what is the Pareto chart procedure

A

  1. Decide what categories you will use to group items.
  2. Decide what measurement is appropriate. Common measurements are frequency, quantity, cost and time.
  3. Decide what period of time the Pareto chart will cover: One work cycle? One full day? A week?
  4. Collect the data, recording the category each time. (Or assemble data that already exist.)
  5. Subtotal the measurements for each category.
  6. Determine the appropriate scale for the measurements you have collected. The maximum value will be the largest subtotal from step 5. (If you will do optional steps 8 and 9 below, the maximum value will be the sum of all subtotals from step 5.) Mark the scale on the left side of the chart.
  7. Construct and label bars for each category. Place the tallest at the far left, then the next tallest to its right and so on. If there are many categories with small measurements, they can be grouped as “other.”

Steps 8 and 9 are optional but are useful for analysis and communication.

  8. Calculate the percentage for each category: the subtotal for that category divided by the total for all categories. Draw a right vertical axis and label it with percentages. Be sure the two scales match: For example, the left measurement that corresponds to one-half should be exactly opposite 50% on the right scale.
  9. Calculate and draw cumulative sums: Add the subtotals for the first and second categories, and place a dot above the second bar indicating that sum. To that sum add the subtotal for the third category, and place a dot above the third bar for that new sum. Continue the process for all the bars. Connect the dots, starting at the top of the first bar. The last dot should reach 100 percent on the right scale.
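
The numeric part of this procedure (subtotals sorted largest first, then percentages and cumulative sums) can be sketched as below. The function name `pareto_table` and the defect counts are hypothetical, chosen only for illustration; drawing the bars and the cumulative line is left out.

```python
def pareto_table(subtotals):
    """Return rows of (category, subtotal, percent, cumulative percent), largest first."""
    total = sum(subtotals.values())
    ordered = sorted(subtotals.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, value in ordered:
        running += value                      # cumulative sum across the bars
        rows.append((category, value, 100 * value / total, 100 * running / total))
    return rows


# Hypothetical subtotals per category, for illustration only.
defects = {"scratch": 12, "dent": 30, "misprint": 5, "crack": 3}

for category, value, percent, cumulative in pareto_table(defects):
    print(f"{category:<10} {value:>3} {percent:>6.1f}% {cumulative:>6.1f}%")
```

In this made-up example the first two categories already account for 84% of the total, which is the kind of "vital few" pattern the chart is meant to expose.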

30
Q

what are the ways of tabulating and graphing Bivariate Categorical Data?

A
  1. Contingency table
  2. Side-by-side bar chart
31
Q

What is a contingency table?

A
  • a two-way table of cross-classification
32
Q

Why would you use a contingency table?

A

To simultaneously study the responses to two categorical variables.
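
A minimal sketch of cross-classifying two categorical variables by counting how often each pair of responses occurs. The function name `contingency_table` and the observations are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-classify (row_category, column_category) pairs into nested counts."""
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    column_labels = sorted({column for _, column in pairs})
    # Counter returns 0 for combinations that never occur.
    return {row: {column: counts[(row, column)] for column in column_labels}
            for row in row_labels}


# Hypothetical observations: (first variable, second variable).
observations = [("A", "low"), ("A", "high"), ("B", "low"),
                ("A", "low"), ("B", "high"), ("B", "high")]

table = contingency_table(observations)
for row, cells in table.items():
    print(row, cells)
```

Each cell is the number of observations sharing that row category and that column category, so the cells add up to the number of observations.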

33
Q

Give an example of a use for a contingency table

A

You might be interested in examining whether there is a pattern or relationship between a fund’s 5 star rating and its fundscope risk rating.

34
Q

Describe the side-by-side chart

A

A useful way to visually display bivariate categorical data when looking for patterns or relationships.

35
Q

what are the basic features of a proper graph

A
  1. Showing the data
  2. Getting the viewer to focus on the substance of the graph, rather than on how the graph was developed
  3. Avoiding distortion
  4. Encouraging comparisons of data
  5. Serving a clear purpose
  6. Being integrated with the statistical and verbal descriptions of the graph
36
Q

What are the principles of graphical excellence

A
  1. Graphical excellence is a well-designed presentation of data that provides substance, statistics and design
  2. Graphical excellence communicates complex ideas with clarity, precision and efficiency
  3. Graphical excellence gives the viewer the largest number of ideas in the shortest time, with the least ink
  4. Graphical excellence almost always involves several dimensions
  5. Graphical excellence requires telling the truth about the data
37
Q

what are the ways of evaluating the excellence of a graph

A
  1. data-ink ratio
  2. lie factor
38
Q

Describe Data-ink ratio

A

The proportion of the graphic’s ink that is devoted to the non-redundant display of data information.

data-ink ratio = data ink / total ink used to print the graphic

  • the objective is to maximize the proportion of the ink used in the graph that is devoted to the data
  • within reasonable limits, non-data ink and redundant data ink should be eliminated
39
Q

what are some examples of non-data ink

A
  • aspects of the graph that do not relate to the substantive features of the data
  • as well as grid lines that may be imposed on the graph
  • chartjunk
40
Q

What is chartjunk

A

decoration that is non-data ink or redundant data ink

  • in its extreme form it represents self-promoting graphics that focus the viewer on the style of the graph, not the data presented
41
Q

What is the lie factor?

A

The ratio of the size of the effect shown in the graph to the size of the effect in the data.

  • a graph should not be used to distort the data (e.g., using wine glasses)
  • the amount of distortion can be measured using the lie factor
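
A rough sketch of the calculation. The notes only give the ratio, so measuring "size of effect" as the relative change between two values is an assumption here, and the function names and numbers are hypothetical.

```python
def effect_size(first, second):
    """Relative change from first to second (assumed definition of effect size)."""
    return abs(second - first) / abs(first)


def lie_factor(graph_first, graph_second, data_first, data_second):
    """Ratio of the effect shown in the graph to the effect in the data."""
    return effect_size(graph_first, graph_second) / effect_size(data_first, data_second)


# Hypothetical example: the data doubles (10 to 20), but the drawn
# symbol grows from 2 cm to 8 cm.
print(lie_factor(2, 8, 10, 20))
```

A value near 1 means the graph represents the change faithfully; the further from 1, the greater the distortion.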
42
Q

what are the essential components of statistical graphs

A
  1. Title which answers:
    a. what
    b. where
    c. when
  2. coordinate axes (label)
  3. Scale divisions - “tick marks”
  4. zero line or other line of reference
  5. Break in vertical scale, when necessary
  6. Scale numbers and legends
  7. Curves and Curve labels