Module 2 Notes - Organizing & Visualizing variables Flashcards

1
Q

_______ & ______ Summaries both guide further exploration and facilitate decision making

A

Tabular & Visual summaries

2
Q

______ Summaries enable rapid review of larger amounts of data & show possible significant patterns

A

Visual

3
Q

Summary table

A

One categorical variable, tallying data

4
Q

Contingency Table

A

Two categorical variables, tallying data

5
Q

a _______ table tallies the frequencies or percentages of items in a set of categories so that you can see the differences between categories

A

summary

6
Q

-Used to study patterns that may exist between the responses of two or more categorical variables
-Cross tabulates or tallies jointly the responses of the categorical variables.
-For two variables the tallies for one variable are located in the rows and the tallies for the second variable are located in the columns

A

Contingency table

7
Q

Of those in the sample who went bar hopping before the exam, (1) what percent did well and (2) what percent didn’t do well on the midterm?

           | Good grades | Not good grades | Total
Studied    | 80          | 20              | 100
Bar Hopped | 30          | 70              | 100
Total      | 110         | 90              | 200

A

(1) 30% (30/100) did well
(2) 70% (70/100) didn’t do well on the midterm

8
Q

Of those in the sample who didn’t get good grades, (1) what percent studied hard and (2) what percent went bar hopping?

           | Good grades | Not good grades | Total
Studied    | 80          | 20              | 100
Bar Hopped | 30          | 70              | 100
Total      | 110         | 90              | 200

A

(1) 22% (20/90) studied hard
(2) 78% (70/90) went bar hopping
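
The two kinds of percentage asked for on the bar-hopping cards (share of a row total versus share of a column total) can be checked with a short sketch. The dictionary layout and the function names `row_percent` and `column_percent` are illustrative choices, not part of the course material.

```python
# Counts from the study/bar-hopping contingency table on the card.
table = {
    "Studied":    {"good": 80, "not good": 20},
    "Bar Hopped": {"good": 30, "not good": 70},
}

def row_percent(table, row, col):
    """Percent of one row's total that falls in the given column."""
    row_total = sum(table[row].values())
    return 100 * table[row][col] / row_total

def column_percent(table, row, col):
    """Percent of one column's total that falls in the given row."""
    col_total = sum(cells[col] for cells in table.values())
    return 100 * table[row][col] / col_total

# Share of bar hoppers who got good grades (row percentage).
print(row_percent(table, "Bar Hopped", "good"))
# Share of the not-good-grades group who studied (column percentage).
print(round(column_percent(table, "Studied", "not good"), 1))
```

The denominator is what changes: "of those who went bar hopping" divides by a row total (100), while "of those who didn't get good grades" divides by a column total (90).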

9
Q

Tables Used for Organizing Numerical Data

A

Ordered Array, Frequency Distributions, Cumulative Distributions

10
Q

An _______ _____ is a sequence of data in rank order, from the smallest value to the largest value.
- Shows range (min value to max value)
- May help identify outliers (unusual observations)

A

Ordered Array

11
Q

A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature in degrees Fahrenheit.

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

What type of numerical data organization is this?

A

Frequency Distribution

12
Q

- Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find range: 58 - 12 = 46
- Select number of classes: 5 (usually between 5 & 15)
- Compute class interval (width): 10 (46/5 = 9.2, then round up)
- Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
- Compute class midpoints: 15, 25, 35, 45, 55
- Count observations & assign to classes

A

Frequency Distribution (Cont.)
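
The steps on this card can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the data list is assumed to be the 20 readings with 12 and 13 as the two smallest values, the width is rounded up to the next whole number, and the first boundary is placed at a multiple of the width. The variable names are illustrative.

```python
import math

# Assumed data: the 20 daily highs, with 12 and 13 as the two smallest values.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

ordered = sorted(temps)                       # ordered array
data_range = ordered[-1] - ordered[0]         # 58 - 12
num_classes = 5
width = math.ceil(data_range / num_classes)   # 46 / 5 = 9.2, rounded up to 10
start = (ordered[0] // width) * width         # assumed: first boundary at a multiple of the width

boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [lo + width / 2 for lo in boundaries[:-1]]

# Each class holds values with lower <= value < upper.
frequencies = [sum(lo <= t < lo + width for t in ordered) for lo in boundaries[:-1]]

for lo, mid, f in zip(boundaries, midpoints, frequencies):
    print(f"{lo} but less than {lo + width}: midpoint {mid}, frequency {f}")
```

Using half-open classes ("10 but less than 20") means a value that sits exactly on a boundary is counted once, in the upper class.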

13
Q

Relative Frequency

A

Relative Frequency = Frequency/Total

14
Q

Cumulative Percentage

A

Cumulative Frequency / Total * 100
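
As a quick check of the relative frequency and cumulative percentage formulas, here is a small sketch. The class frequencies are assumed for illustration (five classes, 20 observations), and the variable names are hypothetical.

```python
from itertools import accumulate

# Assumed class frequencies (five classes, 20 observations in total).
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

relative = [f / total for f in frequencies]             # Frequency / Total
cumulative = list(accumulate(frequencies))              # running total of frequencies
cumulative_pct = [100 * c / total for c in cumulative]  # Cumulative Frequency / Total * 100

for f, r, c, p in zip(frequencies, relative, cumulative, cumulative_pct):
    print(f, r, c, p)
```

The last cumulative percentage is always 100, which is a handy sanity check on a hand-built table.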

15
Q

-condenses raw data into a more useful form
-allows for a quick visual interpretation of the data.
- enables the determination of the major characteristics of the data set including where the data are concentrated/clustered.

A

Reasons to use a frequency distribution

16
Q

Pie or Doughnut Chart, Bar Chart, Pareto Chart

A

Summary Table for one variable

17
Q

Side by side bar chart, Doughnut chart

A

Contingency table for two variables

18
Q

the ___ _____ visualizes a categorical variable as a series of bars. The length of each bar represents either the frequency or percentage of values for each category. Each bar is separated by a space called a gap.

A

Bar chart

19
Q

The ___ _____ is a circle broken up into slices that represent categories. The size of each slice of the ___ varies according to the percentage in each category (e.g., Market share)

A

Pie chart, pie

20
Q

the ________ _____ is the outer part of a circle broken up into pieces that represent categories. The size of each piece of the _______ varies according to the percentage of each category.

A

Doughnut chart, doughnut

21
Q

Free Exercise - look up Pareto chart to understand it

A

Pareto chart moment

22
Q

the ____ __ ____ _____ represents the data from a contingency table.

A

side by side bar chart

23
Q

A ________ _____ can be used to represent the data from a contingency table

A

doughnut chart

24
Q

Ordered Array, Stem-and-Leaf Display, Frequency Distributions & Cumulative Distributions, Histogram, Polygon, Ogive

A

Numerical Data Graphical Displays

25
Q

A simple way to see how the data are distributed and where concentrations of data exist.

Method: Separate the sorted data series into leading digits (the ______) and the trailing digits (the ______).

A

Stem-and-Leaf Display, Stems, Leaves.

26
Q

the ____-___-____ _______ organizes data into groups (called _____) so that the values within each group (the ______) branch out to the right on each row.

A

stem-and-leaf display, stems, leaves
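
A stem-and-leaf display can be built by splitting each value into its tens digit (stem) and ones digit (leaf). The sketch below assumes two-digit whole numbers; the function name `stem_and_leaf` and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and list ones digits (leaves)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Include empty stems so gaps in the data stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}

# Hypothetical sample values, chosen only for illustration.
sample = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30]

for stem, leaves in stem_and_leaf(sample).items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

Sorting first keeps the leaves in ascending order on each row, which is what makes concentrations of data easy to spot.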

27
Q

Free Exercise - Look up Histogram to understand it (2:09 on Module 2.4)

A

Histogram moment

28
Q

a vertical bar chart of the data in a frequency distribution is called a _________.

A

histogram

29
Q

in a _________ there are no gaps between adjacent bars.

A

histogram

30
Q

the _____ __________ (or _____ _________) are shown on the horizontal axis

A

class boundaries, class midpoints

31
Q

the vertical axis is either _________, ________ __________, or _________

A

frequency, relative frequency, or percentage.

32
Q

the height of the bars represents the _________, ________ __________, or __________

A

frequency, relative frequency, or percentage.
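
To make the link between a frequency distribution and histogram bars concrete, here is a rough plain-text sketch. It is turned on its side (one bar per line rather than vertical bars), and the boundaries, frequencies, and the function name `histogram_rows` are all assumed for illustration.

```python
# Assumed class boundaries and frequencies for a five-class frequency distribution.
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

def histogram_rows(boundaries, frequencies, as_percentage=False):
    """Return one text bar per class; bar length is the frequency or rounded percentage."""
    total = sum(frequencies)
    rows = []
    for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
        length = round(100 * f / total) if as_percentage else f
        rows.append(f"{lo:>3}-{hi:<3} " + "#" * length)
    return rows

print("\n".join(histogram_rows(boundaries, frequencies)))
```

Switching `as_percentage` changes only the scale of the bars, not their relative shape, which mirrors the choice of frequency or percentage on the vertical axis.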

33
Q

A __________ _______ is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.

A

percentage polygon

34
Q

the __________ __________ _______, or _____, displays the variable of interest along the X axis and the cumulative percentages along the Y axis.

A

Cumulative percentage polygon, or ogive

35
Q

Useful when there are two or more groups to compare.

A

Percentage polygon

36
Q

Free Exercise - Look at Percentage Polygon Chart (3:28 on Module 2.4)

A

Percentage Polygon Moment

37
Q

Free Exercise - Look at Cumulative Percentage Polygon (Ogive) Chart (3:58 on Module 2.4)

A

Ogive Moment

38
Q

Scatter plot, Time-series Plot

A

Two Numerical Variables

39
Q

_______ _____ are used for numerical data consisting of paired observations taken from two numerical variables

A

Scatter plots

40
Q

One variable’s values are displayed on the horizontal or X axis and the other variable’s values are displayed on the vertical or Y axis

A

Scatter plot

41
Q

_______ _____ are used to examine possible relationships between two numerical variables (positive or negative)

A

Scatter plots

42
Q

Free Exercise - Look at Scatter Plot Chart EX (5:00 on Module 2.4)

A

Scatter plot chart moment

43
Q

A ___-______ ____ is used to study patterns in the values of a numeric variable over time

A

Time-series plot

44
Q

Numeric variable’s values are on the vertical axis and the time period is on the horizontal axis

A

Time-series plot

45
Q

Free Exercise - Look at Time Series Plot Chart EX (5:57 on Module 2.4)

A

Time Series Plot Moment

46
Q

Free Exercise - Look at Module 2 Excel Application Manual

A

Module 2 Excel Application Manual Moment

47
Q

An insurance company evaluates many numerical variables about a person before deciding on an appropriate rate for automobile insurance. A representative from a local insurance agency selected a random sample of insured drivers and recorded X, the number of claims each made in the last 3 years, with the following results.

X | f
1 | 14
2 | 18
3 | 12
4 | 5
5 | 1

Referring to this scenario, how many drivers are represented in the sample?

A

50 (14 + 18 + 12 + 5 + 1)

48
Q

An insurance company evaluates many numerical variables about a person before deciding on an appropriate rate for automobile insurance. A representative from a local insurance agency selected a random sample of insured drivers and recorded X, the number of claims each made in the last 3 years, with the following results.

X | f
1 | 14
2 | 18
3 | 12
4 | 5
5 | 1

Referring to this scenario, how many total claims are represented in the sample?
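
One way to work this out: the number of drivers is the sum of the f column, while the total number of claims weights each X value by its frequency. A minimal sketch, with the table stored as a dictionary (the name `claims_table` is an illustrative choice):

```python
# Frequency table from the card: X = claims per driver, f = number of drivers.
claims_table = {1: 14, 2: 18, 3: 12, 4: 5, 5: 1}

drivers = sum(claims_table.values())                        # total of the f column
total_claims = sum(x * f for x, f in claims_table.items())  # each X weighted by its frequency

print(drivers, total_claims)
```

For this table the sketch gives 50 drivers and 111 total claims (14 + 36 + 36 + 20 + 5).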

49
Q

At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen by each information systems officer.

X | f
1 | 7
2 | 5
3 | 11
4 | 8
5 | 9

Referring to this scenario, how many regional offices are represented in the survey results?

A

40 (7 + 5 + 11 + 8 + 9)

50
Q

At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen by each information systems officer.

X | f
1 | 7
2 | 5
3 | 11
4 | 8
5 | 9

Referring to this scenario, across all the regional offices, how many total employees were supervised by those surveyed?

A

127 (1×7 + 2×5 + 3×11 + 4×8 + 5×9)

51
Q

A professor of economics at a small Texas university wanted to determine what year in school students were taking his tough economics course. Shown below is a pie chart of the results. What percentage of the class took the course prior to reaching their senior year?

(Pie Chart)
Freshmen - 10%
Sophomores - 46%
Juniors - 30%
Seniors - 14%

A

86% (10% + 46% + 30%)

52
Q

A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The stem-and-leaf display of the data is shown below.

Stem | Leaves
3    | 24
4    | 03478999
5    | 0112345
6    | 12566
7    | 01
8    |
9    | 2

Referring to this scenario, what percentage of the respondents rated overall television quality with a rating of 80 or above?

A

4% (1 of the 25 ratings, the single 92)

53
Q

A survey was conducted to determine how people rated the quality of programming available on television. Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality). The stem-and-leaf display of the data is shown below.

Stem | Leaves
3    | 24
4    | 03478999
5    | 0112345
6    | 12566
7    | 01
8    |
9    | 2

Referring to this scenario, what percentage of the respondents rated overall television quality with a rating of 50 or below?
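
Questions like this one can be checked by expanding the stem-and-leaf display back into individual ratings and counting. The sketch below assumes each stem is a tens digit and each leaf a ones digit; the helper name `percent_where` is hypothetical.

```python
# Stem-and-leaf display from the card: stem = tens digit, each leaf = ones digit.
display = {3: "24", 4: "03478999", 5: "0112345", 6: "12566", 7: "01", 8: "", 9: "2"}

ratings = [10 * stem + int(leaf) for stem, leaves in display.items() for leaf in leaves]

def percent_where(values, condition):
    """Percentage of values that satisfy the condition."""
    return 100 * sum(1 for v in values if condition(v)) / len(values)

print(len(ratings))                               # number of respondents
print(percent_where(ratings, lambda r: r <= 50))  # rated 50 or below
print(percent_where(ratings, lambda r: r >= 80))  # rated 80 or above
```

There are 25 ratings in all. "50 or below" picks up the two ratings in the 30s, the eight in the 40s, and the single 50, so 11 of 25, or 44%.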

54
Q

The following are the duration in minutes of a sample of long-distance phone calls made within the continental United States reported by one long-distance carrier.

Time (in Minutes)   | Relative Frequency
0 but less than 5   | 0.37
5 but less than 10  | 0.22
10 but less than 15 | 0.15
15 but less than 20 | 0.10
20 but less than 25 | 0.07
25 but less than 30 | 0.07
30 or more          | 0.02

Referring to this scenario, what is the width of each class?

A

5 minutes (the last class, 30 or more, is open-ended)

55
Q

The following are the duration in minutes of a sample of long-distance phone calls made within the continental United States reported by one long-distance carrier.

Time (in Minutes)   | Relative Frequency
0 but less than 5   | 0.37
5 but less than 10  | 0.22
10 but less than 15 | 0.15
15 but less than 20 | 0.10
20 but less than 25 | 0.07
25 but less than 30 | 0.07
30 or more          | 0.02

Referring to this scenario, if 100 calls were randomly sampled, how many calls lasted 15 minutes or longer?

A

26 (0.10 + 0.07 + 0.07 + 0.02 = 0.26 of 100 calls)

56
Q

The following are the duration in minutes of a sample of long-distance phone calls made within the continental United States reported by one long-distance carrier.

Time (in Minutes)   | Relative Frequency
0 but less than 5   | 0.37
5 but less than 10  | 0.22
10 but less than 15 | 0.15
15 but less than 20 | 0.10
20 but less than 25 | 0.07
25 but less than 30 | 0.07
30 or more          | 0.02

Referring to this scenario, if 100 calls were sampled, _______ of them would have lasted less than 5 minutes or at least 30 minutes or more.
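
Turning relative frequencies into expected counts is a matter of adding the shares for the classes of interest and multiplying by the sample size. A small sketch, with each class keyed by its lower bound (the function name `expected_count` is an illustrative choice):

```python
# Relative frequency table from the card: (lower bound in minutes, relative frequency).
# The last class, starting at 30, is open-ended.
classes = [(0, 0.37), (5, 0.22), (10, 0.15), (15, 0.10), (20, 0.07), (25, 0.07), (30, 0.02)]
sample_size = 100

def expected_count(selected_lower_bounds):
    """Expected number of calls in the selected classes for the given sample size."""
    share = sum(rf for lower, rf in classes if lower in selected_lower_bounds)
    return round(sample_size * share)

print(expected_count({0, 30}))           # under 5 minutes, or 30 minutes or more
print(expected_count({15, 20, 25, 30}))  # 15 minutes or longer
```

For 100 calls, the first and last classes together account for 0.37 + 0.02 = 0.39 of the sample, so 39 calls.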

57
Q

When studying the simultaneous responses to two categorical questions, you should set up a

A

contingency table

58
Q

In a survey, 150 executives were asked what they think is the most common mistake candidates make during job interviews. Six different mistakes were given. What is best for presenting the information?

A

A bar chart

59
Q

You have collected information on the market share of 5 different search engines used by U.S. Internet users in a quarter. What is best for presenting the information?

A

A pie chart

60
Q

Which of the following is appropriate for displaying data collected on the different brands of cars students at a major university drive?

A

A Pareto chart

61
Q

You have collected data on the number of complaints for 6 different brands of automobiles sold in the US over a 10-year period. What is best for presenting the data?

A

A side-by-side bar chart

62
Q

Data on 1,500 students’ heights were collected at a large university on the East Coast. What is the best chart for presenting the information?

A

A histogram

63
Q

Data on the number of part-time hours students at a public university worked in a week were collected. What is the best chart for presenting the information?

A

A percentage polygon

64
Q

You have collected data on the approximate retail price (in $) and the energy cost per year (in $) of 15 refrigerators. What is best for presenting the data?

A

A scatter plot

65
Q

One of the developing countries is experiencing a baby boom, with the number of births rising for the fifth year in a row, according to a BBC News report. What is best for displaying this data?

A

a time-series plot

66
Q

A sample of 200 students at a Big-Ten university was taken after the midterm to ask them whether they went bar hopping the weekend before the midterm or spent the weekend studying, and whether they did well or poorly on the midterm. The following table contains the result.

           | Did Well | Did Poorly
Studied    | 80       | 20
Bar Hopped | 30       | 70

Referring to this scenario, of those who went bar hopping the weekend before the midterm in the sample, _______ percent of them did well on the midterm.

A

30 (30/100)

67
Q

A sample of 200 students at a Big-Ten university was taken after the midterm to ask them whether they went bar hopping the weekend before the midterm or spent the weekend studying, and whether they did well or poorly on the midterm. The following table contains the result.

           | Did Well | Did Poorly
Studied    | 80       | 20
Bar Hopped | 30       | 70

Referring to this scenario, if the sample is a good representation of the population, we can expect _______ percent of those who spent the weekend studying to do poorly on the midterm.

A

20 (20/100)

68
Q

A sample of 200 students at a Big-Ten university was taken after the midterm to ask them whether they went bar hopping the weekend before the midterm or spent the weekend studying, and whether they did well or poorly on the midterm. The following table contains the result.

           | Did Well | Did Poorly
Studied    | 80       | 20
Bar Hopped | 30       | 70

Referring to this scenario, if the sample is a good representation of the population, we can expect _______ percent of those who did poorly on the midterm to have spent the weekend studying.

A

About 22.2 (20/90)