Module 2 Notes - Organizing & Visualizing Variables Flashcards
_______ & ______ Summaries both guide further exploration and facilitate decision making
Tabular & Visual summaries
______ Summaries enable rapid review of larger amounts of data & show possible significant patterns
Visual
Summary table
One categorical variable, tallying data
Contingency Table
Two categorical variables, tallying data
A _______ table tallies the frequencies or percentages of items in a set of categories so that you can see the differences between categories
summary
- Used to study patterns that may exist between the responses of two or more categorical variables
- Cross-tabulates, or tallies jointly, the responses of the categorical variables
- For two variables, the tallies for one variable are located in the rows and the tallies for the second variable are located in the columns
Contingency table
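To make "tallies jointly" concrete, here is a minimal Python sketch that builds a summary table (one variable) and a contingency table (two variables) from raw responses. The six responses are hypothetical, made up only for illustration; `collections.Counter` does the tallying.

```python
from collections import Counter

# Hypothetical raw survey responses: (weekend activity, midterm result) per student.
responses = [
    ("Studied", "Good"), ("Studied", "Good"), ("Bar Hopped", "Not good"),
    ("Studied", "Not good"), ("Bar Hopped", "Good"), ("Bar Hopped", "Not good"),
]

# Summary table: tally ONE categorical variable.
summary = Counter(activity for activity, _ in responses)

# Contingency table: tally TWO categorical variables jointly (cross-tabulation).
joint = Counter(responses)
rows = sorted({a for a, _ in responses})
cols = sorted({r for _, r in responses})

# Rows hold one variable, columns hold the other, plus a row Total.
print("".ljust(12) + "".join(c.ljust(10) for c in cols) + "Total")
for a in rows:
    cells = [joint[(a, c)] for c in cols]
    print(a.ljust(12) + "".join(str(n).ljust(10) for n in cells) + str(sum(cells)))
```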
(Q): Of those who went bar hopping before the exam in the sample, what (1) percent of them did well and what (2) percent of them didn’t do well on the midterm?
           | good grades | not good grades | Total
Studied    | 80          | 20              | 100
Bar Hopped | 30          | 70              | 100
Total      | 110         | 90              | 200
(1) 30% (30/100) did well
(2) 70% (70/100) didn’t do well on the midterm
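This question conditions on a row ("of those who went bar hopping"), so each cell is divided by its row total. A minimal Python sketch of that arithmetic, using the counts from the table; `row_percentages` is a hypothetical helper name.

```python
# Contingency table from the card: rows = weekend activity,
# columns = midterm outcome (counts of students).
table = {
    "Studied":    {"good": 80, "not good": 20},
    "Bar Hopped": {"good": 30, "not good": 70},
}

def row_percentages(table, row):
    """Percent of each column category within one row (row total = 100%)."""
    counts = table[row]
    row_total = sum(counts.values())
    return {col: 100 * n / row_total for col, n in counts.items()}

print(row_percentages(table, "Bar Hopped"))  # {'good': 30.0, 'not good': 70.0}
```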
(Q): Of those who didn’t get good grades in the sample, what (1) percent of them studied hard and what (2) percent of them went bar hopping?
           | good grades | not good grades | Total
Studied    | 80          | 20              | 100
Bar Hopped | 30          | 70              | 100
Total      | 110         | 90              | 200
(1) 22% (20/90) studied hard
(2) 78% (70/90) went bar hopping
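Here the condition is a column ("of those who didn’t get good grades"), so the denominator switches to the column total (90). A minimal sketch; `column_percentages` is a hypothetical helper name.

```python
table = {
    "Studied":    {"good": 80, "not good": 20},
    "Bar Hopped": {"good": 30, "not good": 70},
}

def column_percentages(table, col):
    """Percent of each row category within one column (column total = 100%)."""
    col_total = sum(row[col] for row in table.values())
    return {name: 100 * row[col] / col_total for name, row in table.items()}

pct = column_percentages(table, "not good")
print({k: round(v) for k, v in pct.items()})  # {'Studied': 22, 'Bar Hopped': 78}
```

Comparing with the previous card shows why the wording matters: "of those who bar hopped" divides by a row total, while "of those who did poorly" divides by a column total.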
Tables Used for Organizing Numerical Data
Ordered Array, Frequency Distributions, Cumulative Distributions
An _______ _____ is a sequence of data in rank order, from the smallest value to the largest value.
- Shows range (min value to max value)
- May help identify outliers (unusual observations)
Ordered Array
A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature in degrees Fahrenheit
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 12, 12, 38, 41, 43, 44, 27, 53, 27
What type of numerical data organization is this?
Frequency Distribution
- Sort raw data in ascending order: 12, 12, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find the range (58 - 12 = 46)
- Select the number of classes: 5 (usually between 5 & 15)
- Compute the class interval (width): 10 (46/5 = 9.2, then round up)
- Determine class boundaries (limits): 10 but less than 20, 20 but less than 30, 30 but less than 40, 40 but less than 50, 50 but less than 60
- Compute class midpoints: 15, 25, 35, 45, 55
- Count observations & assign to classes.
Frequency Distribution (Cont.)
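The steps above can be sketched in Python for the 20 winter high temperatures. One assumption is labeled in the code: the first class starts at 10, a convenient value just below the minimum of 12, which matches the midpoints 15, 25, ..., 55.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 12, 12, 38, 41, 43, 44, 27, 53, 27]

ordered = sorted(temps)                      # ordered array
data_range = ordered[-1] - ordered[0]        # 58 - 12 = 46
num_classes = 5
width = math.ceil(data_range / num_classes)  # 46 / 5 = 9.2 -> 10

# Assumption: start the first class at 10 (just below the minimum),
# giving classes 10-<20, 20-<30, ..., 50-<60.
start = 10
lower_limits = [start + i * width for i in range(num_classes)]
midpoints = [low + width / 2 for low in lower_limits]

# Count observations: a value x belongs to class [low, low + width).
frequencies = [sum(low <= x < low + width for x in temps) for low in lower_limits]

for low, mid, f in zip(lower_limits, midpoints, frequencies):
    print(f"{low} but less than {low + width} | midpoint {mid:g} | {f}")
print(frequencies)  # [3, 6, 5, 4, 2]
```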
Relative Frequency
Relative Frequency = Frequency/Total
Cumulative Percentage
Cumulative Percentage = (Cumulative Frequency / Total) * 100
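Both formulas applied to the temperature classes (frequencies 3, 6, 5, 4, 2 for 10-<20 through 50-<60). A minimal sketch; `itertools.accumulate` produces the running cumulative frequencies.

```python
from itertools import accumulate

# Class frequencies for the 20 winter high temperatures
# (classes 10-<20, 20-<30, 30-<40, 40-<50, 50-<60).
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)  # 20

relative = [f / total for f in frequencies]              # Frequency / Total
cumulative = list(accumulate(frequencies))               # 3, 9, 14, 18, 20
cumulative_pct = [100 * c / total for c in cumulative]   # (Cum. Frequency / Total) * 100

print(relative)        # [0.15, 0.3, 0.25, 0.2, 0.1]
print(cumulative_pct)  # [15.0, 45.0, 70.0, 90.0, 100.0]
```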
- Condenses raw data into a more useful form
- Allows for a quick visual interpretation of the data
- Enables the determination of the major characteristics of the data set, including where the data are concentrated/clustered
Reasons to use a frequency distribution
Pie or Doughnut Chart, Bar Chart, Pareto Chart
Summary Table for one variable
Side-by-side bar chart, Doughnut chart
Contingency table for two variables
The ___ _____ visualizes a categorical variable as a series of bars. The length of each bar represents either the frequency or percentage of values for each category. Each bar is separated by a space called a gap.
Bar chart
The ___ _____ is a circle broken up into slices that represent categories. The size of each slice of the ___ varies according to the percentage in each category (e.g., market share).
Pie chart, pie
The ________ _____ is the outer part of a circle broken up into pieces that represent categories. The size of each piece of the _______ varies according to the percentage in each category.
Doughnut chart, doughnut
Free Exercise - look up Pareto chart to understand it
Pareto chart moment
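As a starting point for the exercise: a Pareto chart, as it is commonly defined, is a bar chart whose categories are sorted from most to least frequent, with a line showing the cumulative percentage. The sketch below computes that ordering and the cumulative line; the category names and counts are hypothetical, chosen only so the percentages are easy to read.

```python
from itertools import accumulate

# Hypothetical tallies for one categorical variable (illustration only).
counts = {"A": 12, "B": 45, "C": 8, "D": 25, "E": 10}

# Bars go in descending order of frequency.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
total = sum(counts.values())  # 100

# Cumulative percentage line drawn over the bars.
cum_pct = [100 * c / total for c in accumulate(n for _, n in ordered)]

for (cat, n), cp in zip(ordered, cum_pct):
    print(f"{cat}: {n} (cumulative {cp:.0f}%)")
```

Sorting first is the whole point: the "vital few" categories appear at the left, where the cumulative line rises fastest.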
The ____-__-____ ___ _____ represents the data from a contingency table.
side-by-side bar chart
A ________ _____ can be used to represent the data from a contingency table
doughnut chart
Ordered Array, Stem-and-leaf Display, Frequency Distributions & Cumulative Distributions, Histogram, Polygon, Ogive
Numerical Data Graphical Displays
A simple way to see how the data are distributed and where concentrations of data exist.
Method: Separate the sorted data series into leading digits (the ______) and trailing digits (the ______).
Stem-and-Leaf Display, Stems, Leaves.
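A minimal sketch of the method for the 20 winter temperatures, assuming two-digit non-negative whole numbers so that the stem is the tens digit and the leaf is the units digit. `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 12, 12, 38, 41, 43, 44, 27, 53, 27]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

display = stem_and_leaf(temps)
for stem, leaves in display.items():
    print(stem, "".join(str(d) for d in leaves))
```

Each printed row reads like the displays in this module, e.g. stem 1 with leaves 2, 2, 7 stands for 12, 12, 17.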
The ____-___-____ _______ organizes data into groups (called _____) so that the values within each group (the ______) branch out to the right on each row.
stem-and-leaf display, stems, leaves
Free Exercise - Look up Histogram to understand it (2:09 on Module 2.4)
Histogram moment
A vertical bar chart of the data in a frequency distribution is called a _________.
histogram
In a _________, there are no gaps between adjacent bars.
histogram
The _____ __________ (or _____ _________) are shown on the horizontal axis.
class boundaries, class midpoints
The vertical axis is either _________, ________ __________, or _________.
frequency, relative frequency, or percentage.
The heights of the bars represent the _________, ________ __________, or __________.
frequency, relative frequency, or percentage.
A __________ _______ is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
percentage polygon
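The points of a percentage polygon for the temperature classes can be computed as (midpoint, class percentage) pairs. The sketch also follows a common textbook convention, labeled as such in the code: an empty class is added at each end so the polygon starts and ends at 0%.

```python
# Class midpoints and frequencies for the 20 winter high temperatures.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

percentages = [100 * f / total for f in frequencies]

# Common convention: add an empty class at each end so the polygon
# starts and ends on the horizontal axis (0%).
width = midpoints[1] - midpoints[0]
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0.0] + percentages + [0.0]

print(list(zip(xs, ys)))
```

Connecting these points with straight lines gives the polygon; drawing two groups' polygons on the same axes is what makes them easy to compare.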
The __________ __________ _______, or _____, displays the variable of interest along the X axis and the cumulative percentages along the Y axis.
Cumulative percentage polygon, or ogive
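One way to get the ogive's points for the temperature classes is to pair each upper class boundary with the percentage of values below it, starting from 0% at the lowest boundary. A minimal sketch under that convention:

```python
from itertools import accumulate

# Classes 10-<20, ..., 50-<60 for the 20 winter high temperatures.
lower_limits = [10, 20, 30, 40, 50]
frequencies = [3, 6, 5, 4, 2]
width = 10
total = sum(frequencies)

# Each point: (boundary, % of values below that boundary).
xs = [lower_limits[0]] + [low + width for low in lower_limits]
ys = [0.0] + [100 * c / total for c in accumulate(frequencies)]

print(list(zip(xs, ys)))
# [(10, 0.0), (20, 15.0), (30, 45.0), (40, 70.0), (50, 90.0), (60, 100.0)]
```

Because the Y values are cumulative, the curve never decreases and always ends at 100%.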
Useful when there are two or more groups to compare.
Percentage polygon
Free Exercise - Look at Percentage Polygon Chart (3:28 on Module 2.4)
Percentage Polygon Moment
Free Exercise - Look at Cumulative Percentage Polygon (Ogive) Chart (3:58 on Module 2.4)
Ogive Moment
Scatter plot, Time-series Plot
Two Numerical Variables
_______ _____ are used for numerical data consisting of paired observations taken from two numerical variables
Scatter plots
One variable’s values are displayed on the horizontal or X axis and the other variable’s values are displayed on the vertical or Y axis
Scatter plot
_______ _____ are used to examine possible relationships between two numerical variables (positive or negative)
Scatter plots
Free Exercise - Look at Scatter Plot Chart EX (5:00 on Module 2.4)
Scatter plot chart moment
A ___-______ ____ is used to study patterns in the values of a numeric variable over time
Time-series plot
Numeric variable’s values are on the vertical axis and the time period is on the horizontal axis
Time-series plot
Free Exercise - Look at Time Series Plot Chart EX (5:57 on Module 2.4)
Time Series Plot Moment
Free Exercise - Look at Module 2 Excel Application Manual
Module 2 Excel Application Manual Moment
An insurance company evaluates many numerical variables about a person before deciding on an appropriate rate for automobile insurance. A representative from a local insurance agency selected a random sample of insured drivers and recorded, X, the number of claims each made in the last 3 years, with the following results.
X f
1 14
2 18
3 12
4 5
5 1
Referring to this scenario, how many drivers are represented in the sample?
50
An insurance company evaluates many numerical variables about a person before deciding on an appropriate rate for automobile insurance. A representative from a local insurance agency selected a random sample of insured drivers and recorded, X, the number of claims each made in the last 3 years, with the following results.
X f
1 14
2 18
3 12
4 5
5 1
Referring to this scenario, how many total claims are represented in the sample?
111
At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen by each information systems officer.
X f
1 7
2 5
3 11
4 8
5 9
Referring to this scenario, how many regional offices are represented in the survey results?
40
At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen by each information systems officer.
X f
1 7
2 5
3 11
4 8
5 9
Referring to this scenario, across all the regional offices, how many total employees were supervised by those surveyed?
127
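The insurance and information-systems cards use the same two sums: the number of observations is the sum of the f column, and the total quantity weights each X by how often it occurs (sum of X times f). A minimal sketch; `frequency_totals` is a hypothetical helper, and, as the answer of 40 implies, each regional office is assumed to have one surveyed officer.

```python
def frequency_totals(table):
    """Return (sum of f, sum of X*f) for a table mapping each value X to its frequency f."""
    n = sum(table.values())
    total = sum(x * f for x, f in table.items())
    return n, total

# X = claims per driver in the last 3 years -> f = number of drivers.
insurance = {1: 14, 2: 18, 3: 12, 4: 5, 5: 1}
# X = employees supervised per officer -> f = number of officers (one per office).
is_officers = {1: 7, 2: 5, 3: 11, 4: 8, 5: 9}

print(frequency_totals(insurance))    # (50, 111)
print(frequency_totals(is_officers))  # (40, 127)
```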
A professor of economics at a small Texas university wanted to determine what year in school students were taking his tough economics course. Shown below is a pie chart of the results. What percentage of the class took the course prior to reaching their senior year?
(Pie Chart)
Freshmen - 10%
Sophomores - 46%
Juniors - 30%
Seniors - 14%
86%
A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The stem-and-leaf display of the data is shown below.
Stem Leaves
3 24
4 03478999
5 0112345
6 12566
7 01
8
9 2
Referring to this scenario, what percentage of the respondents rated overall television quality with a rating of 80 or above?
4%
A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The stem-and-leaf display of the data is shown below.
Stem Leaves
3 24
4 03478999
5 0112345
6 12566
7 01
8
9 2
Referring to this scenario, what percentage of the respondents rated overall television quality with a rating of 50 or below?
44%
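Both stem-and-leaf questions reduce to rebuilding the individual ratings and counting. A minimal sketch, assuming stems are tens digits and leaves are units digits (so stem 4, leaf 7 is 47); `pct` is a hypothetical helper name.

```python
# Stem-and-leaf display from the card: stem -> string of leaves.
display = {3: "24", 4: "03478999", 5: "0112345", 6: "12566",
           7: "01", 8: "", 9: "2"}

# Rebuild the individual ratings, e.g. stem 4 with leaf 7 -> 47.
ratings = [10 * stem + int(leaf) for stem, leaves in display.items() for leaf in leaves]
n = len(ratings)  # 25 respondents

def pct(condition):
    """Percent of ratings satisfying a condition."""
    return 100 * sum(condition(r) for r in ratings) / n

print(pct(lambda r: r >= 80))  # 4.0   (only 92)
print(pct(lambda r: r <= 50))  # 44.0  (32, 34, eight 40s, and 50)
```

Note that "50 or below" includes the single 50 on stem 5, which is easy to miss when reading the display by eye.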
The following are the durations in minutes of a sample of long-distance phone calls made within the continental United States reported by one long-distance carrier.
Time (in minutes)   | Relative frequency
0 but less than 5   | 0.37
5 but less than 10  | 0.22
10 but less than 15 | 0.15
15 but less than 20 | 0.10
20 but less than 25 | 0.07
25 but less than 30 | 0.07
30 or more          | 0.02
Referring to this scenario, what is the width of each class?
5 minutes
The following are the durations in minutes of a sample of long-distance phone calls made within the continental United States reported by one long-distance carrier.
Time (in minutes)   | Relative frequency
0 but less than 5   | 0.37
5 but less than 10  | 0.22
10 but less than 15 | 0.15
15 but less than 20 | 0.10
20 but less than 25 | 0.07
25 but less than 30 | 0.07
30 or more          | 0.02
Referring to this scenario, if 100 calls were randomly sampled, how many calls lasted 15 minutes or longer?
26
The following are the durations in minutes of a sample of long-distance phone calls made within the continental United States reported by one long-distance carrier.
Time (in minutes)   | Relative frequency
0 but less than 5   | 0.37
5 but less than 10  | 0.22
10 but less than 15 | 0.15
15 but less than 20 | 0.10
20 but less than 25 | 0.07
25 but less than 30 | 0.07
30 or more          | 0.02
Referring to this scenario, if 100 calls were sampled, _______ of them would have lasted less than 5 minutes or at least 30 minutes.
39
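The two "if 100 calls were sampled" cards both multiply the sample size by the sum of the relevant relative frequencies. A minimal sketch; keying classes by their lower limit (with 30 meaning "30 or more") and the helper name `expected_calls` are assumptions made for illustration. Rounding guards against floating-point residue such as 26.000000000000004.

```python
# Relative frequencies of call durations, keyed by class lower limit (minutes);
# the key 30 stands for the open-ended class "30 or more".
rel_freq = {0: 0.37, 5: 0.22, 10: 0.15, 15: 0.10, 20: 0.07, 25: 0.07, 30: 0.02}
n = 100

def expected_calls(lower_limits):
    """Expected number of the n calls falling in the given classes."""
    return round(n * sum(rel_freq[low] for low in lower_limits))

print(expected_calls([15, 20, 25, 30]))  # 26  (15 minutes or longer)
print(expected_calls([0, 30]))           # 39  (under 5 or at least 30)
```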
When studying the simultaneous responses to two categorical questions, you should set up a ___________ _____.
contingency table
In a survey, 150 executives were asked what they think is the most common mistake candidates make during job interviews. Six different mistakes were given. What is the best chart for presenting the information?
A bar chart
You have collected information on the market share of 5 different search engines used by U.S. Internet users in a quarter. What is the best chart for presenting the information?
A pie chart
Which of the following is appropriate for displaying data collected on the different brands of cars students at a major university drive?
A Pareto chart
You have collected data on the number of complaints for 6 different brands of automobiles sold in the US over a 10-year period. What is the best chart for presenting the data?
A side-by-side bar chart
Data on 1,500 students’ heights were collected at a large university on the East Coast. What is the best chart for presenting the information?
A histogram
Data on the number of part-time hours students at a public university worked in a week were collected. What is the best chart for presenting the information?
A percentage polygon
You have collected data on the approximate retail price (in $) and the energy cost per year (in $) of 15 refrigerators. What is the best chart for presenting the data?
A scatter plot
One of the developing countries is experiencing a baby boom, with the number of births rising for the fifth year in a row, according to a BBC News report. What is the best chart for displaying these data?
A time-series plot
A sample of 200 students at a Big-Ten university was taken after the midterm to ask them whether they went bar hopping the weekend before the midterm or spent the weekend studying, and whether they did well or poorly on the midterm. The following table contains the result.
           | Did Well | Did Poorly
Studied    | 80       | 20
Bar Hopped | 30       | 70
Referring to this scenario, of those who went bar hopping the weekend before the midterm in the sample, _______ percent of them did well on the midterm.
30
A sample of 200 students at a Big-Ten university was taken after the midterm to ask them whether they went bar hopping the weekend before the midterm or spent the weekend studying, and whether they did well or poorly on the midterm. The following table contains the result.
           | Did Well | Did Poorly
Studied    | 80       | 20
Bar Hopped | 30       | 70
Referring to this scenario, if the sample is a good representation of the population, we can expect _______ percent of those who spent the weekend studying to do poorly on the midterm.
20
A sample of 200 students at a Big-Ten university was taken after the midterm to ask them whether they went bar hopping the weekend before the midterm or spent the weekend studying, and whether they did well or poorly on the midterm. The following table contains the result.
           | Did Well | Did Poorly
Studied    | 80       | 20
Bar Hopped | 30       | 70
Referring to this scenario, if the sample is a good representation of the population, we can expect _______ percent of those who did poorly on the midterm to have spent the weekend studying.
22