Module 1 Flashcards

Question

mutually exclusive

Answer 1

The category definitions cause each data value to be placed in one and only one category.

Answer 2

Also ensure that the set of categories you create for the new, recoded variables includes all the data values being recoded, a property known as being collectively exhaustive. If you are recoding a categorical variable, you can preserve one or more of the original categories, as long as your recodings are both mutually exclusive and collectively exhaustive.

Answer 3

The frame is a complete or partial listing of the items that make up the population from which the sample will be selected.

Answer 4

In a nonprobability sample, you select the items or individuals without knowing their probabilities of selection.

Answer 5

In a probability sample, you select items based on known probabilities. Whenever possible, you should use a probability sample, because such a sample allows you to make inferences about the population being analyzed.

Answer 6

In a convenience sample, you select items that are easy, inexpensive, or convenient to sample. For example, in a warehouse of stacked items, selecting only the items located on the top of each stack and within easy reach would create a convenience sample. So, too, would be the responses to surveys that the websites of many companies offer visitors. While such surveys can provide large amounts of data quickly and inexpensively, the convenience samples selected from these responses will consist of self-selected website visitors. (Read the Think About This essay on page 54 for a related story.)

Answer 7

In a judgment sample, you collect the opinions of preselected experts in the subject matter. Although the experts may be well informed, you cannot generalize their results to the population.

Answer 8

In a simple random sample, every item from a frame has the same chance of selection as every other item, and every sample of a fixed size has the same chance of selection as every other sample of that size. Simple random sampling is the most elementary random sampling technique. It forms the basis for the other random sampling techniques. However, simple random sampling has its disadvantages. Its results are often subject to more variation than other sampling methods. In addition, when the frame used is very large, carrying out a simple random sample may be time consuming and expensive.

Answer 9

Sampling with replacement means that after you select an item, you return it to the frame, where it has the same probability of being selected again. Imagine that you have a fishbowl containing N business cards, one card for each person. On the first selection, you select the card for Grace Kim. You record pertinent information and replace the business card in the bowl. You then mix up the cards in the bowl and select a second card. On the second selection, Grace Kim has the same probability of being selected again, 1/N. You repeat this process until you have selected the desired sample size.

Answer 10

Sampling without replacement means that once you select an item, you cannot select it again. The chance that you will select any particular item in the frame (for example, the business card for Grace Kim) on the first selection is 1/N. The chance that you will select any card not previously chosen on the second selection is now 1/(N - 1).
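
The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration: the frame of five cards, the seed, and the sample size of 3 are assumptions, and only Grace Kim's card comes from the card text.

```python
import random

# Hypothetical frame of N = 5 business cards (only "Grace Kim" is from the text).
frame = ["Grace Kim", "Card B", "Card C", "Card D", "Card E"]
rng = random.Random(1)  # fixed seed so the sketch is repeatable

# With replacement: every draw gives each card probability 1/N, so repeats can occur.
with_replacement = rng.choices(frame, k=3)

# Without replacement: once a card is selected it cannot be selected again.
without_replacement = rng.sample(frame, k=3)

print(with_replacement)
print(without_replacement)
```

The first list may contain the same card more than once; the second never does.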

Answer 11

In a systematic sample, you partition the N items in the frame into n groups of k items, where k = N/n. You round k to the nearest integer. To select a systematic sample, you choose the first item to be selected at random from the first k items in the frame. Then, you select the remaining n - 1 items by taking every kth item thereafter from the entire frame.
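
A minimal sketch of this procedure, assuming a hypothetical frame of N = 40 numbered items and n = 4. The function name `systematic_sample` is made up for illustration.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start among the first k items, then take every kth item."""
    # k = N/n rounded to the nearest integer (Python rounds exact halves to even).
    k = round(len(frame) / n)
    start = rng.randrange(k)  # random position within the first k items
    # If k was rounded up, the slice can hold fewer than n items.
    return frame[start::k][:n]

frame = list(range(1, 41))  # hypothetical frame, N = 40
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)
```

With N = 40 and n = 4, k is 10, so the selected items are always exactly 10 positions apart.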

Answer 12

In a stratified sample, you first subdivide the N items in the frame into separate subpopulations, or strata.

Answer 13

In a cluster sample, you divide the N items in the frame into clusters that contain several items.

Answer 14

Clusters are often naturally occurring groups, such as counties, election districts, city blocks, households, or sales territories. You then take a random sample of one or more clusters and study all items in each selected cluster.
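
As a rough illustration, the sketch below draws two clusters at random and keeps every item in them. The territory names and item labels are invented for the example.

```python
import random

# Hypothetical frame already divided into clusters (sales territories).
clusters = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1", "w2"],
}
rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Randomly select clusters, not individual items.
chosen = rng.sample(sorted(clusters), k=2)

# Study all items in each selected cluster.
sample = [item for name in chosen for item in clusters[name]]
print(chosen, sample)
```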

Answer 15

Coverage error occurs if certain groups of items are excluded from the frame so that they have no chance of being selected in the sample or if items are included from outside the frame. Coverage error results in a selection bias.

Answer 16

Coverage error occurs if certain groups of items are excluded from the frame so that they have no chance of being selected in the sample or if items are included from outside the frame. Coverage error results in a selection bias.

Answer 17

Not everyone is willing to respond to a survey. Nonresponse error arises from failure to collect data on all items in the sample and results in a nonresponse bias.

Answer 18

Nonresponse error arises from failure to collect data on all items in the sample and results in a nonresponse bias.

Answer 19

Sampling error reflects the variation, or “chance differences,” from sample to sample, based on the probability of particular individuals or items being selected in the particular samples.

Answer 20

A summary table tallies the values as frequencies or percentages for each category. A summary table helps you see the differences among the categories by displaying the frequency, amount, or percentage of items in a set of categories in a separate column.

Answer 21

A contingency table cross-tabulates, or tallies jointly, the values of two or more categorical variables, allowing you to study patterns that may exist between the variables. Tallies can be shown as a frequency, a percentage of the overall total, a percentage of the row total, or a percentage of the column total, depending on the type of contingency table you use. Each tally appears in its own cell, and there is a cell for each joint response.
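
The joint tallying can be sketched with Python's `collections.Counter`. The two variables (fund type and risk level) and the eight responses are hypothetical.

```python
from collections import Counter

# Hypothetical joint responses: (fund type, risk level) for eight items.
responses = [
    ("Growth", "Low"), ("Growth", "High"), ("Value", "Low"), ("Growth", "High"),
    ("Value", "Low"), ("Value", "High"), ("Growth", "Low"), ("Growth", "High"),
]

cells = Counter(responses)                         # one tally per joint response
total = sum(cells.values())                        # overall total
row_totals = Counter(row for row, _ in responses)  # totals per fund type

for (row, col), count in sorted(cells.items()):
    pct_overall = 100 * count / total
    pct_row = 100 * count / row_totals[row]
    print(f"{row:6} {col:4} n={count} overall={pct_overall:.1f}% row={pct_row:.1f}%")
```

Each printed line corresponds to one cell of the table, shown as a frequency, a percentage of the overall total, and a percentage of its row total.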

Answer 22

An ordered array arranges the values of a numerical variable in rank order, from the smallest value to the largest value.

Answer 23

A frequency distribution tallies the values of a numerical variable into a set of numerically ordered classes. Each class groups a mutually exclusive range of values, called a class interval.

Answer 24

Each class groups a mutually exclusive range of values, called a class interval.

Answer 25

Interval width = (highest value - lowest value) / number of classes = 55/10 = 5.5
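
The same arithmetic as a short Python check. The highest and lowest values of 80 and 25 are an assumption chosen to give the range of 55 used on this card.

```python
# Assumed extremes: any pair of values 55 apart gives the same result.
highest, lowest = 80, 25
number_of_classes = 10

interval_width = (highest - lowest) / number_of_classes
print(interval_width)  # 5.5
```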

Answer 26

Because each value can appear in only one class, you must establish proper and clearly defined class boundaries for each class. For example, if you chose $10 as the class interval for the restaurant data, you would need to establish boundaries that would include all the values and simplify the reading and interpretation of the frequency distribution. Because the cost of a city restaurant meal varies from $25 to $80, establishing the first class interval as $20 to less than $30, the second as $30 to less than $40, and so on, until the last class interval is $80 to less than $90, would meet the requirements. Table 2.9 contains frequency distributions of the cost per meal for the 50 city restaurants and the 50 suburban restaurants using these class intervals.
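
A minimal sketch of tallying values into "lower to less than upper" classes. The 14 meal costs are made up for illustration and are not the data behind Table 2.9.

```python
# Hypothetical meal costs in dollars (not the Table 2.9 data).
costs = [25, 29, 33, 38, 41, 47, 52, 55, 58, 63, 66, 71, 74, 80]
width, first_lower, last_upper = 10, 20, 90

# One class per lower boundary: $20 to less than $30, ..., $80 to less than $90.
frequency = {lower: 0 for lower in range(first_lower, last_upper, width)}

for cost in costs:
    # Integer division places each value in exactly one class.
    lower = first_lower + ((cost - first_lower) // width) * width
    frequency[lower] += 1

for lower, count in frequency.items():
    print(f"${lower} to less than ${lower + width}: {count}")
```

Because a boundary value such as $30 falls only in the class that starts at $30, the classes stay mutually exclusive.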

Answer 27

For some charts discussed later in this chapter, class intervals are identified by their class midpoints, the values that are halfway between the lower and upper boundaries of each class. For the frequency distributions shown in Table 2.9, the class midpoints are $25, $35, $45, $55, $65, $75, $85, and $95. Note that well-chosen class intervals lead to class midpoints that are simple to read and interpret, as in this example.

Answer 28

A relative frequency distribution presents the relative frequency, or proportion, of the total for each group that each class represents.

Answer 29

A percentage distribution presents the percentage of the total for each group that each class represents. When you compare two or more groups, knowing the proportion (or percentage) of the total for each group is more useful than knowing the frequency for each group, as Table 2.11 demonstrates. Compare this table to Table 2.9 on page 71, which displays frequencies. Table 2.11 organizes the meal cost data in a manner that facilitates comparisons.

Answer 30

The proportion, or relative frequency, in each group is equal to the number of values in each class divided by the total number of values.

Answer 31

Proportion = relative frequency = (number of values in each class) / (total number of values). If there are 80 values and the frequency in a certain class is 20, the proportion of values in that class is 20/80 = 0.25 and the percentage is 0.25 × 100% = 25%.

Answer 32

The cumulative percentage distribution provides a way of presenting information about the percentage of values that are less than a specific amount. You use a percentage distribution as the basis to construct a cumulative percentage distribution.
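
As a sketch, the percentage distribution and its cumulative version can be built from class frequencies as follows. The frequencies and the class width of 10 are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies keyed by lower class boundary (80 values in total).
frequency = {20: 8, 30: 20, 40: 28, 50: 16, 60: 8}
total = sum(frequency.values())

# Percentage distribution: share of the total in each class.
percentage = {lower: 100 * count / total for lower, count in frequency.items()}

# Cumulative percentage: share of values less than each class's upper boundary.
running = list(accumulate(percentage.values()))
cumulative = {lower + 10: pct for lower, pct in zip(frequency, running)}

for upper, pct in cumulative.items():
    print(f"less than {upper}: {pct:.1f}%")
```

The last cumulative percentage is always 100%, since every value is below the upper boundary of the final class.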

Answer 33

In an unstacked format, you create separate numerical variables for each group. For example, if you entered the meal cost data used in the examples in this section in unstacked format, you would create two numerical variables—city meal cost and suburban meal cost—enter the top data in Table 2.8A on page 70 as the city meal cost data, and enter the bottom data in Table 2.8A as the suburban meal cost data. In a stacked format, you pair a numerical variable that contains all of the values with a second, separate categorical variable that contains values that identify to which group each numerical value belongs. For example, if you entered the meal cost data used in the examples in this section in stacked format, you would create a meal cost numerical variable to hold the 100 meal cost values shown in Table 2.8A and create a second location (categorical) variable that would take the value “City” or “Suburban,” depending upon whether a particular value came from a city or suburban restaurant (the top half or bottom half of Table 2.8A).
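
The two layouts can be sketched with plain Python structures. The six meal costs are invented and the real values are in Table 2.8A.

```python
# Unstacked: one numerical variable per group (hypothetical values).
unstacked = {"City": [42, 55, 61], "Suburban": [38, 47, 52]}

# Stacked: every value is paired with a categorical variable naming its group.
stacked = [(location, cost) for location, costs in unstacked.items() for cost in costs]

# Converting back: collect the values for each location again.
rebuilt = {}
for location, cost in stacked:
    rebuilt.setdefault(location, []).append(cost)

print(stacked)
print(rebuilt)
```

Both layouts hold the same information; which one to use depends on what the analysis tool expects.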

Answer 34

In a Pareto chart, the tallies for each category are plotted as vertical bars in descending order, according to their frequencies, and are combined with a cumulative percentage line on the same chart. Pareto charts get their name from the Pareto principle, the observation that in many data sets, a few categories of a categorical variable represent the majority of the data, while many other categories represent a relatively small, or trivial, amount of the data. Pareto charts help you to visually identify the “vital few” categories from the “trivial many” categories.
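
The numbers behind a Pareto chart (bars in descending order plus a cumulative percentage line) can be sketched as follows. The complaint categories and tallies are hypothetical.

```python
from itertools import accumulate

# Hypothetical tallies for a categorical variable.
tallies = {"Billing": 7, "Late delivery": 50, "Other": 3, "Wrong item": 25, "Damaged": 15}

# Bars: categories in descending order of frequency.
ordered = sorted(tallies.items(), key=lambda pair: pair[1], reverse=True)

# Line: cumulative percentage across the ordered categories.
total = sum(tallies.values())
cumulative = list(accumulate(100 * count / total for _, count in ordered))

for (category, count), cum in zip(ordered, cumulative):
    print(f"{category:14} {count:3} {cum:5.1f}%")
```

In this made-up data the first two categories already account for 75% of the tallies.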

Answer 35

A stem-and-leaf display visualizes data by presenting the data as one or more row-wise stems that represent a range of values. In turn, each stem has one or more leaves that branch out to the right of their stem and represent the values found in that stem. For stems with more than one leaf, the leaves are arranged in ascending order.

Answer 36

A histogram visualizes data as a vertical bar chart in which each bar represents a class interval from a frequency or percentage distribution. In a histogram, you display the numerical variable along the horizontal (X) axis and use the vertical (Y) axis to represent either the frequency or the percentage of values per class interval.

Answer 37

When using a categorical variable to divide the data of a numerical variable into two or more groups, you visualize data by constructing a percentage polygon. This chart uses the midpoints of each class interval to represent the data of each class and then plots the midpoints, at their respective class percentages, as points on a line along the X axis.

Answer 38

The cumulative percentage polygon, or ogive, uses the cumulative percentage distribution discussed in Section 2.2 to plot the cumulative percentages along the Y axis. Unlike the percentage polygon, the lower boundaries of the class intervals for the numerical variable are plotted, at their respective cumulative percentages, as points on a line along the X axis.

Answer 39

A scatter plot explores the possible relationship between two numerical variables by plotting the values of one numerical variable on the horizontal, or X, axis and the values of a second numerical variable on the vertical, or Y, axis.

Answer 40

A time-series plot plots the values of a numerical variable on the Y axis and plots the time period associated with each numerical value on the X axis. A time-series plot can help you visualize trends in data that occur over time.

Answer 41

Both Excel and Minitab can organize many variables at the same time, but the two programs have different strengths. Using Excel, you can create a PivotTable.

Answer 42

Obscuring data, creating false impressions, and chartjunk.

Answer 43

Central tendency is the extent to which the values of a numerical variable group around a typical, or central, value. Variation measures the amount of dispersion, or scattering, away from a central value that the values of a numerical variable show. The shape of a variable is the pattern of the distribution of values from the lowest value to the highest value.

Answer 44

The mean is the most common measure of central tendency. The mean can suggest a typical or central value and serves as a “balance point” in a set of data, similar to the fulcrum on a seesaw. The mean is the only common measure in which all the values play an equal role. You compute the mean by adding together all the values and then dividing that sum by the number of values in the data set.

Answer 45

The sample mean is the sum of the values in a sample divided by the number of values in the sample: X̄ = (X1 + X2 + ... + Xn) / n.

Answer 46

The median is the middle value in an ordered array of data that has been ranked from smallest to largest. Half the values are smaller than or equal to the median, and half the values are larger than or equal to the median.
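
A short sketch comparing the median with the mean on one small sample. The ten values are illustrative, and `statistics.median` is Python's standard-library function.

```python
import statistics

# Illustrative sample of n = 10 values, already in rank order.
values = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]

# Mean: sum of the values divided by the number of values.
mean = sum(values) / len(values)

# Median: with an even n, the average of the two middle ranked values.
median = statistics.median(values)

print(mean, median)
```

Here the two middle values are 39 and 40, so the median is 39.5, slightly below the mean of 39.6.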

Answer 47

When you want to measure the rate of change of a variable over time, you need to use the geometric mean instead of the arithmetic mean.
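
A sketch of why the arithmetic mean misleads for rates of change. It assumes two hypothetical yearly rates of return, +50% followed by -50%, and uses the standard geometric mean rate of return, the nth root of the product of (1 + rate) minus 1.

```python
import math

# Hypothetical yearly rates of return: +50% in year 1, -50% in year 2.
rates = [0.50, -0.50]

# Arithmetic mean suggests no change at all.
arithmetic = sum(rates) / len(rates)

# Geometric mean rate of return: nth root of the overall growth factor, minus 1.
growth = math.prod(1 + r for r in rates)     # 1.5 * 0.5 = 0.75
geometric = growth ** (1 / len(rates)) - 1   # about -0.134 per year

print(arithmetic, geometric)
```

An investment that gains 50% and then loses 50% ends at 75% of its starting value, which the geometric mean reflects and the arithmetic mean does not.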

Answer 48

Variation measures the spread, or dispersion, of the values of a numerical variable.

Answer 49

The range is the difference between the largest and smallest values.

Answer 50

The variance and the standard deviation measure the “average” scatter around the mean: how larger values fluctuate above it and how smaller values fluctuate below it.

Answer 51

A simple measure of variation around the mean might take the difference between each value and the mean and then sum these differences. However, if you did that, you would find that these differences sum to zero because the mean is the balance point for every numerical variable.

Answer 52

The sample variance (S²) is the sum of the squared differences around the mean divided by the sample size minus 1.

Answer 53

The sample standard deviation (S) is the square root of the sample variance.
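
The steps from differences around the mean to the sample standard deviation can be sketched as follows. The ten values are illustrative, and the `statistics` functions are used only as a cross-check.

```python
import math
import statistics

values = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]  # illustrative sample, n = 10
mean = sum(values) / len(values)

# Differences around the mean sum to zero (up to floating-point rounding).
deviations = [x - mean for x in values]

# Sample variance: sum of squared differences divided by n - 1.
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / (len(values) - 1)

# Sample standard deviation: square root of the sample variance.
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
print(statistics.variance(values), statistics.stdev(values))  # cross-check
```

Squaring the differences is what keeps them from canceling out when summed.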

Answer 54

The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.
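
As a quick check of the definition, the sketch below computes the coefficient of variation for an illustrative sample using Python's `statistics` module.

```python
import statistics

values = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]  # illustrative sample

# Standard deviation relative to the mean, expressed as a percentage.
cv = statistics.stdev(values) / statistics.mean(values) * 100
print(f"{cv:.2f}%")
```

Because the result is a percentage rather than a value in the original units, it can be compared across variables measured on different scales.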

Answer 55

The Z score of a value is the difference between that value and the mean, divided by the standard deviation. A Z score of 0 indicates that the value is the same as the mean. If a Z score is a positive or negative number, it indicates whether the value is above or below the mean and by how many standard deviations.
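
A minimal sketch of computing Z scores for an illustrative sample, using the sample standard deviation from Python's `statistics` module.

```python
import statistics

values = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]  # illustrative sample
mean = statistics.mean(values)
std_dev = statistics.stdev(values)  # sample standard deviation

# Z score: distance from the mean in units of standard deviations.
z_scores = [(x - mean) / std_dev for x in values]

for x, z in zip(values, z_scores):
    print(f"{x}: {z:+.2f}")
```

The value 52 sits about 1.8 standard deviations above the mean, while 29 sits about 1.6 below it.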

Answer 56

Kurtosis measures the extent to which values that are very different from the mean affect the shape of the distribution of a set of data.

Answer 57

A distribution that has a sharper-rising center peak than the peak of a normal distribution has positive kurtosis, a kurtosis value that is greater than zero, and is called leptokurtic. A distribution that has a slower-rising (flatter) center peak than the peak of a normal distribution has negative kurtosis, a kurtosis value that is less than zero, and is called platykurtic.

Answer 58

Quartiles split the values into four equal parts. The first quartile (Q1) divides the smallest 25.0% of the values from the other 75.0% that are larger. The second quartile (Q2) is the median; 50.0% of the values are smaller than or equal to the median, and 50.0% are larger than or equal to the median. The third quartile (Q3) divides the smallest 75.0% of the values from the largest 25.0%. Equations (3.10) and (3.11) define the first and third quartiles.

Answer 59

The interquartile range (also called the midspread) measures the difference in the center of a distribution between the third and first quartiles.

Answer 60

The five-number summary for a variable consists of the smallest value (Xsmallest), the first quartile, the median, the third quartile, and the largest value (Xlargest).
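
A sketch of a five-number summary and interquartile range using Python's `statistics.quantiles`. Note the assumption: this function interpolates between ranked values, so its quartiles can differ slightly from ones obtained with a textbook's own rounding rules. The sample is illustrative.

```python
import statistics

values = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]  # illustrative sample, n = 10

# Quartiles at positions based on (n + 1), with linear interpolation.
q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

# Five-number summary: smallest, Q1, median, Q3, largest.
five_number = (min(values), q1, q2, q3, max(values))

print(five_number, iqr)
```

These five numbers are also what a boxplot draws: the box spans Q1 to Q3, with a line at the median.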

Answer 61

The boxplot uses a five-number summary to visualize the shape of the distribution for a variable.

Answer 62

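Descriptive statistics summarize the data in a sample; inferential statistics use sample data to draw conclusions about a population, for example by testing a hypothesis.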

(86 cards)