Module 1 - Describing and Summarizing Data Flashcards

Question

Suppose you actually want to calculate the mean annual healthcare expenditures of the 192 countries. Which of the following Excel functions calculates the mean? SELECT ALL THAT APPLY.
* =MEAN(B2:B193)
* =AVERAGE(B2:B193)
* =MEDIAN(B2:B193)
* =SUM(B2:B193)/192
* =MODE.SNGL(B2:B193)

Answer 1

* =MEAN(B2:B193): MEAN is not a function in Excel.
* **=AVERAGE(B2:B193)**: **=AVERAGE(B2:B193) calculates the mean of the annual healthcare expenditures. Note that another option is also correct.**
* =MEDIAN(B2:B193): =MEDIAN(B2:B193) finds the median, or middle value, of the annual healthcare expenditures.
* **=SUM(B2:B193)/192**: **=SUM(B2:B193)/192 calculates the sum of the annual healthcare expenditures and divides that sum by 192, the number of data points. This formula calculates the mean of the annual healthcare expenditures. Note that another option is also correct.**
* =MODE.SNGL(B2:B193): =MODE.SNGL(B2:B193) finds the mode, or most common value, of the annual healthcare expenditures.
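The equivalence of the two correct options can be checked outside Excel. The following is a minimal Python sketch, assuming a small made-up list of expenditures (the values are hypothetical, not the course data set):

```python
from statistics import mean

# Hypothetical expenditure values, used only for illustration.
expenditures = [120.0, 85.5, 240.0, 60.5, 94.0]

# Analogous to =AVERAGE(range).
mean_builtin = mean(expenditures)

# Analogous to =SUM(range)/count, where count is the number of data points.
mean_manual = sum(expenditures) / len(expenditures)

print(mean_builtin, mean_manual)
```

Both expressions give the same number because the mean is defined as the sum divided by the count.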

Answer 2

STDEV.S(B2:B101) is approximately $67.32 billion. You can also use the descriptive statistics tool, making sure to link directly to values in order to obtain the correct answer.

Answer 3

* The standard deviation would remain the same. See correct answer for explanation.
* The standard deviation would increase. See correct answer for explanation.
* **The standard deviation would decrease.** **The standard deviation gives more weight to observations that are farther from the mean. Therefore, removing the outliers would decrease the standard deviation.**
* The answer cannot be determined without further information. See correct answer for explanation.
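The effect of removing outliers can be seen in a small numerical experiment. This is a minimal Python sketch with hypothetical data; the cutoffs 40 and 60 used to define "outlier" are arbitrary assumptions chosen for this example:

```python
from statistics import stdev

# Hypothetical sample with two extreme values (120 and 5).
values = [48, 52, 50, 47, 53, 51, 49, 120, 5]

# Keep only observations inside an arbitrary band around the bulk of the data.
trimmed = [v for v in values if 40 <= v <= 60]

sd_all = stdev(values)       # sample standard deviation, like STDEV.S
sd_trimmed = stdev(trimmed)  # same statistic without the extreme values

print(round(sd_all, 2), round(sd_trimmed, 2))
```

Because squared deviations grow quickly with distance from the mean, the two extreme values dominate the first result, and dropping them shrinks the standard deviation.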

Answer 4

From the Data menu, select Data Analysis, then select Histogram. The Input Range is B1:B101 and the Bin Range is C1:C15. You must check the Labels in first row box to ensure that the histogram’s axes are appropriately labeled.

Answer 5

* Delete both data points because they are outliers. The consultant should not delete data points simply because they are outliers.
* Leave both data points in because one should never delete research-based data. The consultant cannot assume that research-based data sets are without fault. There may be situations where data should be deleted: because of measurement or entry error, because the data are not representative of the population of interest, or for any of many other reasons.
* **Research the data points and then make a decision based on the findings.** **The consultant should delete or change data points only if careful examination of the data and the data sources indicates that the data points are incorrect or irrelevant to the research at hand. The consultant must use his or her experience and knowledge of the research question to make decisions on a case-by-case basis. Doing business analytics effectively requires judgment. In this case, the National Museum of American History underwent renovations that significantly reduced the number of visits to the museum in 2007 and 2008. The data points for 2007 and 2008 are correct and should not be changed. However, the fact that the museum was closed during most of that two-year period should be considered when drawing conclusions from this data set.**
* Change the data point for 2008 to 4,800,000 and research the data point for 2007. Although data entry errors may occur, the consultant cannot know this without researching the data points first. In this case, the National Museum of American History underwent renovations that significantly reduced the number of visits to the museum in 2007 and 2008. The data points for 2007 and 2008 are correct and should not be changed. However, the fact that the museum was closed during most of that two-year period should be considered when drawing conclusions from this data set.

Answer 6

AVERAGE(A2:A100)=0.98 and STDEV.S(A2:A100)=0.42. You can also use the descriptive statistics tool, making sure to link directly to values in order to obtain the correct answer.

Answer 7

Option A. The range of this histogram is approximately 7 - 0 = 7. This is the smallest range in this set of histograms.

Answer 8

This is a conditional mean, so you can use either AVERAGEIF(B2:B61,"Boys",A2:A61)=4.48 and AVERAGEIF(B2:B61,"Girls",A2:A61)=5.55, or AVERAGEIF(B2:B61,D2,A2:A61)=4.48 and AVERAGEIF(B2:B61,D3,A2:A61)=5.55. You could also just sort the data by gender and compute the average for each gender, but we want you to learn how to do conditional averages in Excel. As always, it is important that you link to the cells with the data.
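The logic behind a conditional mean can be sketched in a few lines of Python. The helper name `average_if` and the sample rows below are hypothetical, and the sketch uses exact string matching, which is a simplification of how Excel compares text criteria:

```python
def average_if(criteria_range, criterion, average_range):
    """Average the entries of average_range whose paired criteria entry matches."""
    matched = [
        value
        for key, value in zip(criteria_range, average_range)
        if key == criterion
    ]
    if not matched:
        # No row satisfies the criterion, so there is nothing to average.
        raise ValueError("no rows match the criterion")
    return sum(matched) / len(matched)


# Hypothetical rows: a group label and a numeric value per row.
groups = ["Boys", "Girls", "Boys", "Girls", "Girls"]
values = [4.0, 6.0, 5.0, 5.0, 5.5]

boys_mean = average_if(groups, "Boys", values)
girls_mean = average_if(groups, "Girls", values)
print(boys_mean, girls_mean)
```

The argument order mirrors AVERAGEIF: the range that is tested comes first, then the criterion, then the range that is averaged.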

Answer 9

Use the descriptive statistics tool to calculate all of the summary statistics. The Input Range is B1:B186. You must check the Labels in first row box to ensure that the output table is appropriately labeled. You must select Summary Statistics in order to produce the output table.

Answer 10

* Mean/Standard Deviation: This is the inverse of the formula for the coefficient of variation.
* **Standard Deviation/Mean**: **This is the formula for the coefficient of variation, the best statistic to compute to compare the variability of two data sets with different distributions. Dividing by the mean provides a measure of the distribution’s variation relative to the mean.**
* Mean - Median: Although the difference between the mean and the median may provide information about whether a data set is skewed, it does not provide useful information for comparing variability across different distributions.
* Median - Mean: Although the difference between the mean and the median may provide information about whether a data set is skewed, it does not provide useful information for comparing variability across different distributions.
* Mean/Variance: The mean and variance are measured in different units. For example, if the mean is measured in feet, the variance is measured in square feet. The coefficient of variation is calculated using the mean and standard deviation, both of which have the same units.
* Variance/Mean: The mean and variance are measured in different units. For example, if the mean is measured in feet, the variance is measured in square feet. The coefficient of variation is calculated using the mean and standard deviation, both of which have the same units.

Answer 11

Coefficient of Variation = Standard Deviation/Mean. Entering =E6/E2 calculates the coefficient of variation, which is approximately 0.03. You must link directly to values in order to obtain the correct answer.
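As an illustration of why the ratio is useful, here is a minimal Python sketch that computes the coefficient of variation for two hypothetical data sets measured on different scales (the function name and the numbers are assumptions made for this example):

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical data sets on different scales.
large_scale = [98.0, 100.0, 102.0]   # standard deviation 2, mean 100
small_scale = [9.0, 10.0, 11.0]      # standard deviation 1, mean 10

cv_large = coefficient_of_variation(large_scale)
cv_small = coefficient_of_variation(small_scale)
print(round(cv_large, 4), round(cv_small, 4))
```

The first data set has the larger standard deviation but the smaller coefficient of variation, so relative to its mean it is the less variable of the two.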

Answer 12

This is a conditional mean, so you can use either AVERAGEIF(B2:B126,"Technology",C2:C126) or AVERAGEIF(B2:B126,E2,C2:C126). The average number of employees at technology companies in this data set is approximately 7,318.

Answer 13

PERCENTILE.INC(B2:B76,0.60)=$74.40 billion. You must link directly to values in order to obtain the correct answer.
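For readers curious about what an inclusive percentile does, the sketch below shows the usual linear-interpolation rule: sort the data, locate the position k × (n − 1), and interpolate between the two neighboring values. The function name `percentile_inc` and the sample values are hypothetical, and this is an illustration of the rule rather than a statement about Excel's internals:

```python
import math


def percentile_inc(values, k):
    """Inclusive k-th percentile (0 <= k <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1 inclusive")
    ordered = sorted(values)
    rank = k * (len(ordered) - 1)      # fractional position in the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    # Interpolate between the two values that bracket the position.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical values, deliberately unsorted.
revenues = [50.0, 10.0, 40.0, 20.0, 30.0]
p60 = percentile_inc(revenues, 0.60)
print(round(p60, 2))
```

With five sorted values the position is 0.6 × 4 = 2.4, so the result lies 40% of the way from the third value (30) to the fourth (40).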

Answer 14

From the Insert menu, select Scatter, then select Scatter With Only Markers. The Input Y Range is C1:C101 and the Input X Range is B1:B101. You must check the Labels in first row box to ensure that the scatter plot’s axes are appropriately labeled.

Answer 15

CORREL(B2:B101,C2:C101)=-0.32. The correlation coefficient between the acceptance rate at the top 100 U.S. MBA programs and the percent of students that are employed upon graduation is approximately -0.32. You must link directly to values in order to obtain the correct answer.
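The correlation coefficient itself is the covariance of the two variables scaled by the product of their spreads. A minimal Python sketch of that calculation follows, using hypothetical acceptance and employment percentages rather than the actual MBA data:

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Hypothetical data: acceptance rate (%) and percent employed at graduation.
acceptance = [10, 20, 30, 40, 50]
employed = [95, 90, 92, 85, 80]

r = correl(acceptance, employed)
print(round(r, 2))
```

A negative result means the two variables tend to move in opposite directions; the value always falls between -1 and 1.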

Answer 16

* On average, as the acceptance rate increases, the percent of students employed upon graduation increases. A positive correlation coefficient would indicate that, on average, as acceptance rate increases, the percent of students employed upon graduation increases.
* On average, as the acceptance rate decreases, the percent of students employed upon graduation decreases. A positive correlation coefficient would indicate that, on average, as acceptance rate decreases, the percent of students employed upon graduation decreases.
* **On average, as the acceptance rate decreases, the percent of students employed upon graduation increases.** **The coefficient -0.32 is negative, which indicates that, on average, as acceptance rate decreases, the percent of students employed upon graduation increases.**
* On average, as the acceptance rate increases, the percent of students employed upon graduation remains the same. A correlation coefficient of zero would indicate no relationship.

Answer 17

* The distribution of the data is symmetric. When the distribution of data is symmetric, the mean and median are equal.
* The distribution of the data is skewed to the left. When the distribution of data is skewed to the left, the mean is most likely less than the median. The extreme values in the left tail pull the mean towards them.
* **The distribution of the data is skewed to the right.** **When the distribution of data is skewed to the right, the mean is most likely greater than the median. The extreme values in the right tail pull the mean towards them.**
* The distribution of the data is bimodal. When the distribution of data is bimodal, the mean may be less than, equal to, or greater than the median.
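The pull of a long right tail on the mean is easy to demonstrate. This minimal Python sketch uses a hypothetical right-skewed sample with one very large value:

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, one is very large.
sample = [30, 32, 35, 38, 40, 45, 250]

sample_mean = mean(sample)
sample_median = median(sample)
print(round(sample_mean, 2), sample_median)
```

The single extreme value raises the mean well above the median, while the median stays at the middle observation.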

Answer 18

Cross-sectional

Answer 19

Cross-sectional

Answer 20

Cross-sectional

Answer 21

* Quality of life is a hidden variable because it cannot be measured directly but must be inferred from measurable variables such as wealth, success, and environment. A hidden variable is one that is correlated with each of two variables that are not fundamentally related to each other. In this example, we are not looking at a correlation between two variables, but rather trying to determine a single variable, quality of life.
* A recent study showed a correlation between a country’s chocolate consumption and the number of Nobel prizes won by its scientists. The hidden variable is a strong university system that fosters talented researchers. A hidden variable is one that is correlated with each of two variables that are not fundamentally related to each other. Although a strong university system is probably correlated with the number of Nobel prizes, it is probably not related to the amount of chocolate consumed, and so does not function as a hidden variable between prizes and chocolate.
* The correlation between smoking and lung cancer was a hidden variable for a long time because the cigarette lobby paid to keep the relationship hidden. A hidden variable is one that is correlated with each of two variables that are not fundamentally related to each other; it is not one that is being hidden due to political pressures.
* **There is a correlation between the number of firefighters who show up at a fire and how much damage the fire causes. The hidden variable is the size of the fire.** **A hidden variable is one that is correlated with each of two variables that are not fundamentally related to each other. In this case, the size of the fire leads to a call for more firefighters, and the size of the fire also generally leads to more damage. The number of firefighters does not lead to a greater amount of fire damage.**
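The firefighter example can be simulated to show how a hidden variable produces a correlation between two quantities that do not affect each other. The sketch below is a toy model with made-up numbers: fire size is drawn at random, and both the number of firefighters and the damage are generated from fire size plus independent noise. The coefficients, noise levels, and seed are all assumptions.

```python
import math
import random


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden variable: the size of each fire (arbitrary units).
fire_size = [rng.uniform(1, 10) for _ in range(200)]

# Both observed variables depend on fire size, never on each other.
firefighters = [2 * size + rng.gauss(0, 1) for size in fire_size]
damage = [10 * size + rng.gauss(0, 5) for size in fire_size]

r_observed = correl(firefighters, damage)
print(round(r_observed, 2))
```

In this toy model the number of firefighters never enters the formula for damage, yet the two are strongly correlated because each one tracks the size of the fire.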

(49 cards)