Module 4--Statistics: Collecting, Organizing, Grouping and Displaying Data Flashcards

1
Q

Population vs. Sample Data

A

■ Population
* A population consists of all the values in the set of data to be analyzed.
* At work, you will usually know whether you are working with a population. In other situations, you
need key words to make that determination, such as “all,” “every,” or “this is a population.”

■ Sample
* A sample is a subset of a population that can be selected in a number of ways.

■ Value of samples
* Sample statistics are used to estimate population parameters.
* We use sample statistics to make inferences about populations.
* Since we can obtain many samples from a population, the key to sampling is to obtain a
representative sample.
* The sample must reflect the population if it is to be of value.

■ There are many, many samples that may be obtained from a population.
■ Our task is to obtain a sample that is representative of the population.
■ We usually will not know how well our sample represents the population until well after the fact.
The only way we could know this in advance is to have the population data available.
■ If we have access to the population data, there is little reason to sample.
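To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The ages are hypothetical illustration values (not the module's data set), and simple random sampling via `random.Random.sample` is assumed as the selection method. The population mean is a parameter; the mean of a sample drawn from it is a statistic used to estimate that parameter.

```python
import random
from statistics import mean

# Hypothetical population: every value in the set of data to be analyzed.
population = [18, 22, 25, 29, 31, 34, 38, 41, 45, 47, 52, 57, 60, 64]

def sample_mean(values, n, seed=None):
    """Draw a simple random sample of size n (without replacement) and return its mean."""
    rng = random.Random(seed)
    return mean(rng.sample(values, n))

mu = mean(population)                       # population parameter
x_bar = sample_mean(population, 5, seed=1)  # sample statistic: an estimate of mu
print(f"population mean = {mu:.2f}, one sample mean = {x_bar:.2f}")

# Different samples give different estimates; none is guaranteed to equal mu.
for seed in range(3):
    print(f"sample {seed}: {sample_mean(population, 5, seed=seed):.2f}")
```

If the whole population is available, a "sample" of size `len(population)` reproduces `mu` exactly, which is why there is little reason to sample in that case.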

2
Q

What are the two sampling procedures?

A

Sampling Procedures

■ Random sampling – Chance determines which items are selected.

■ Non-random sampling – The data collector deliberately selects the items to be studied.
* Convenience sampling – The data collector selects on the basis of convenience or availability.
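A small Python sketch can contrast the two procedures. The roster of employee IDs is hypothetical, and "convenience" is modeled, as an assumption, as simply taking the first n people on the list (the ones easiest to reach).

```python
import random

# Hypothetical roster of 20 employee IDs, in the order they happen to be available.
roster = [f"E{i:02d}" for i in range(1, 21)]

def random_sample(items, n, seed=None):
    """Random sampling: chance alone determines which items are selected."""
    return random.Random(seed).sample(items, n)

def convenience_sample(items, n):
    """Non-random (convenience) sampling: the collector takes whoever is easiest
    to reach, modeled here as the first n items on the list."""
    return items[:n]

print(random_sample(roster, 5, seed=42))
print(convenience_sample(roster, 5))  # always ['E01', 'E02', 'E03', 'E04', 'E05']
```

The convenience sample never changes, so anything systematic about the front of the list (for example, the earliest hires) carries straight into the sample.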

3
Q

Creating Frequency Distributions: Getting Started

A

■ What is the variable of interest?
* Age – These are the ages of all employees in a small business. Such demographic information
can be collected to tailor total rewards programs for their employee population.

■ What is the level of measurement for your variable of interest?
* Ratio – We can say whether the employees are the same or different ages (at least nominal);
we can place them in rank order by their ages (at least ordinal); the difference between 35 and
45 is the same as the difference between 47 and 57 (at least interval); and there is a true zero
for age (ratio).

■ Do these data represent a sample or a population?
* Population – We are only interested in these 50 employees. It would have been a sample if these
were just a subset of a larger database.

■ Are these data suitable for grouping into intervals (i.e., categories, groups, etc.)?
* Yes – In their current (raw, unordered) format, the data are chaotic. We want to create order out of chaos.

■ Are there any other variables available that can be used to better understand the variable of
interest?
* Not at the present time – We only know their ages. We do not know their job types, their
performance scores, or any other variables about them at present.

4
Q

How do you group data?

A

■ Group the data
* Intervals – Group the data into intervals or categories in order to better visualize the
distribution and look for patterns or trends.
– Determine number of intervals
– Determine interval widths
– Create intervals
– Distribute raw data into intervals
– Create a picture
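The five steps can be sketched in Python as follows. This is an illustration under stated assumptions: the data are integers, intervals use inclusive integer limits (e.g., 20-24), the width is chosen by the analyst beforehand, and the first interval starts at the largest multiple of the width at or below the minimum. The ages are hypothetical, and `group_into_intervals` is a helper named for this example.

```python
import math

def group_into_intervals(data, width, start=None):
    """Group integer data into equal-width, non-overlapping intervals.

    Returns a list of ((lower, upper), count) pairs with inclusive limits.
    Assumption: `start` defaults to the largest multiple of `width` <= min(data).
    """
    lo, hi = min(data), max(data)
    if start is None:
        start = (lo // width) * width
    # Number of intervals needed to cover start..hi with the chosen width.
    n_intervals = math.ceil((hi - start + 1) / width)
    # Create the intervals, e.g. (15, 19), (20, 24), ...
    intervals = [(start + k * width, start + (k + 1) * width - 1) for k in range(n_intervals)]
    # Distribute the raw data: each value lands in exactly one interval.
    counts = [0] * n_intervals
    for x in data:
        counts[(x - start) // width] += 1
    return list(zip(intervals, counts))

ages = [18, 21, 23, 24, 24, 27, 33, 35, 38, 41, 44, 46, 52, 58, 64]  # hypothetical

# Create a picture: a simple text bar graph of the absolute frequencies.
for (lower, upper), count in group_into_intervals(ages, 5):
    print(f"{lower}-{upper}: {'#' * count}")
```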

5
Q

Grouping Data

A

Grouping Data into Intervals
■ 5 to 20 intervals – One rarely uses fewer than 5 or more than 20 intervals (some sources suggest
5 to 15 intervals). The number of intervals is an arbitrary decision.
■ Constant width – Make sure each interval is the same width (i.e., a constant width for each
interval).
■ No overlaps – Make sure each data point can go into one and only one interval (i.e., each interval is
mutually exclusive, with no overlaps).
■ Accommodate all data – Make sure the set of intervals will accommodate all the data (i.e.,
collectively exhaustive).
■ Consider odd widths – Consider using odd widths for intervals when appropriate, in order to
ensure that the midpoints of the intervals will be integers (whole numbers).
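The guidelines above can be checked mechanically. A minimal Python sketch, assuming integer data and inclusive integer limits (so the interval 20-24 has width 5); `check_intervals` and `midpoint` are hypothetical helpers written for this example.

```python
def check_intervals(intervals, data):
    """Check inclusive integer intervals [(lower, upper), ...] against the guidelines:
    constant width, no overlaps (mutually exclusive), all data covered (collectively exhaustive)."""
    widths = {upper - lower + 1 for lower, upper in intervals}
    ordered = sorted(intervals)
    return {
        "constant_width": len(widths) == 1,
        "no_overlaps": all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:])),
        "exhaustive": all(any(lo <= x <= hi for lo, hi in intervals) for x in data),
    }

def midpoint(interval):
    """Midpoint of an inclusive interval; an integer whenever the width is odd."""
    lower, upper = interval
    return (lower + upper) / 2

good = [(15, 19), (20, 24), (25, 29)]
bad = [(15, 20), (20, 24), (25, 29)]  # 20 fits in two intervals, and widths differ

print(check_intervals(good, [18, 22, 27]))
print(check_intervals(bad, [18, 22, 27]))
print(midpoint((20, 24)), midpoint((20, 23)))  # odd width 5 vs. even width 4
```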

6
Q

Frequency Distribution Based on Intervals of Grouped Data

A

■ Absolute frequency – Enter the number of data points that fall within each interval. For example,
four data points fall in the interval 20-24.
■ Relative frequency – Calculate the relative frequency by dividing the absolute frequency by the
total number of data points. For the interval 20-24, the relative frequency is 4 ÷ 50 = 0.08, or 8%.
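The arithmetic can be expressed as a short Python sketch. The 4-out-of-50 figure comes from the card; the three-interval table afterwards uses hypothetical counts purely to show the layout.

```python
def relative_frequency(count, total):
    """Relative frequency = absolute frequency / total number of data points."""
    return count / total

def frequency_table(abs_freq):
    """Return (interval, absolute, relative) rows for a dict of absolute counts."""
    total = sum(abs_freq.values())
    return [(interval, count, relative_frequency(count, total)) for interval, count in abs_freq.items()]

# The card's example: 4 of the 50 employees fall in the 20-24 interval.
print(f"{relative_frequency(4, 50):.0%}")  # prints 8%

# Hypothetical counts; the relative frequencies always sum to 1 (100%).
counts = {"15-19": 2, "20-24": 5, "25-29": 3}
for interval, absolute, relative in frequency_table(counts):
    print(f"{interval}: {absolute:>2}  {relative:.0%}")
```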

7
Q

What can a bar graph do for grouped age data?

A

Use a frequency distribution table to create a picture and arrange the data for analysis. A bar graph is
a good example of the type of picture that can be created (as shown above).

This process began with raw, unorganized data, and now those data are beginning to tell us something.
When data are organized, they start to become information.

In this example, the graph shows absolute frequencies instead of relative frequency percents because
we are not comparing this distribution of ages against any other data set. If we were to compare this
distribution of ages against some other data set, such as across departments, across geographic
locations, or across industries, we would most likely want to convert these absolute frequencies to
relative frequency percents, since it is unlikely that a comparison data set would have the exact same
number of people as this data set.
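The reason for switching to relative frequency percents when comparing groups can be shown in a few lines of Python. The two departments and their counts are hypothetical, and the text bars stand in for a real bar graph (one `#` per 5 percentage points, an arbitrary choice).

```python
def to_relative_percent(abs_freq):
    """Convert absolute frequencies to relative frequency percents."""
    total = sum(abs_freq.values())
    return {label: 100 * count / total for label, count in abs_freq.items()}

def text_bars(freq, unit=5):
    """Render a horizontal text bar graph with one '#' per `unit`."""
    return "\n".join(f"{label:>5} | {'#' * int(value // unit)}" for label, value in freq.items())

# Hypothetical departments of different sizes (40 people vs. 20 people).
dept_a = {"20-29": 10, "30-39": 20, "40-49": 10}
dept_b = {"20-29": 5, "30-39": 10, "40-49": 5}

# The absolute counts differ, but the percents show the two age distributions have the same shape.
print(to_relative_percent(dept_a) == to_relative_percent(dept_b))
print(text_bars(to_relative_percent(dept_a)))
```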

8
Q

How do you Determine the Widths of Intervals?

A

Determining the Widths of Intervals
■ Find the range – Find the difference between the highest and lowest values (i.e., find the range).
64 – 18 = 46

■ Divide the range by 5 and by 20
46 ÷ 20 = 2.3
46 ÷ 5 = 9.2
For this distribution of age data, interval widths between approximately 3 and 9 will yield between 5
and 20 intervals.

■ Make a guided arbitrary decision – We will make the guided arbitrary decision of using an interval
width of 5.
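The range-based guideline can be reproduced in Python using the card's own numbers (lowest age 18, highest age 64). `intervals_needed` is a hypothetical helper that only approximates the count: with inclusive integer limits, the exact number also depends on where the first interval starts (starting at 15 with width 5 gives 15-19 through 60-64, which is 10 intervals).

```python
import math

def width_bounds(low, high, min_intervals=5, max_intervals=20):
    """Return (range, narrowest width, widest width) for the 5-to-20 interval guideline."""
    data_range = high - low
    return data_range, data_range / max_intervals, data_range / min_intervals

def intervals_needed(data_range, width):
    """Approximate number of intervals of the given width needed to span the range."""
    return math.ceil(data_range / width)

r, narrow, wide = width_bounds(18, 64)
print(r, narrow, wide)  # 46 2.3 9.2
print([(w, intervals_needed(r, w)) for w in (3, 5, 9)])  # [(3, 16), (5, 10), (9, 6)]
```

Widths 3, 5, and 9 all stay within the 5-to-20 interval guideline, which is why the final choice of 5 is a guided but still arbitrary decision.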

9
Q

What are the two types of frequency distribution based on intervals of grouped data?

A

■ Absolute frequency – Enter the number of data points that fall within each interval. For example,
four data points fall in the interval 20-24.

■ Relative frequency – Calculate the relative frequency by dividing the absolute frequency by the
total number of data points. For the interval 20-24, the relative frequency is 4 ÷ 50 = 0.08, or 8%.

10
Q

What is good about a bar graph?

A

A bar graph can help visualize a frequency distribution.

11
Q

Testing the Model with a Case Study: Madrid Data

Territory: Madrid

Employees: 20 field representatives of a large organization, sampled from this particular geographic
region

Variable of Interest: Bonus points

A

Testing the Model with a Case Study: Madrid Data
■ What is the variable of interest?
* Bonus points

■ What is the level of measurement for your variable of interest?
* Ratio – This depends on how bonus points are administered. If we can assume field
representatives receive one bonus point for each new customer acquired, then the data satisfy the
ratio level of measurement (zero bonus points means zero new customers, a true zero). This is why
identifying and understanding our metrics is important.

■ Do these data represent a sample or a population?
* Sample – The case states that the 20 field representatives were sampled from this region.

■ Are these data suitable for grouping into intervals (i.e., categories, groups, etc.)?
* Yes – We will review the data to make this determination.

■ Are there any other variables available that can be used to better understand the variable of
interest?
* Not at the present time

12
Q
You have been given the task of examining the effect of a pay-for-performance project on employee productivity. You have obtained a productivity rating score for all program participants.

Which of the following best describes your data?
A. The data represent a sample
B. The data represent a population
C. Productivity rating scores only produce nominal data
D. Productivity rating scores always produce ratio data

A

B. The data represent a population

13
Q
Why do we sample?
A. Because samples are more accurate than populations
B. Because of the lack of time and money
C. Because computers only work with sample data
D. Because population data are unreliable

A

B. Because of the lack of time and money

14
Q
Which of the following provide useful guidelines for grouping data into categories?
A. Use a constant width for all categories
B. Use mutually exclusive categories
C. Do not allow categories to overlap
D. All of the above

A

D. All of the above
