Data Analytics Flashcards
11. Business intelligence (BI) has all of the following characteristics except
A. Focusing on strategic objectives.
B. Giving immediate information about an organization’s critical success factors.
C. Displaying information in graphical format.
D. Providing advice and answers to top management from a knowledge-based system.
Answer (D) is correct.
BI serves the needs of top management for managerial control and strategic planning. BI focuses on
strategic (long-range) objectives and gives immediate information about a firm’s critical success
factors. BI is not a program for providing top management with advice and answers from a knowledge-based
(expert) system.
12. Cook Co.’s total costs of operating five sales offices last year were $500,000, of which $70,000
represented fixed costs. Cook has determined that total costs are significantly influenced by the number of sales
offices operated. Last year’s costs and number of sales offices can be used as the bases for predicting annual costs.
What would be the budgeted cost for the coming year if Cook were to operate seven sales offices?
A. $700,000
B. $672,000
C. $602,000
D. $586,000
Answer (B) is correct.
Using the formula y = a + bx, y is the total budgeted cost, a is the fixed costs, b is the variable cost per
unit, and x is the number of budgeted sales offices. The fixed costs are $70,000, the variable cost per
unit is $86,000 [($500,000 – $70,000) ÷ 5], and the number of budgeted sales offices is 7. Thus, the
budgeted cost for the coming year assuming seven sales offices is $672,000 [$70,000 + (7 × $86,000)].
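The arithmetic above can be checked with a short sketch. The function name `budgeted_cost` and the variable names are illustrative choices, not part of the original card; the figures come from the question.

```python
def budgeted_cost(fixed_cost, variable_cost_per_unit, units):
    """Evaluate y = a + bx for a linear cost function."""
    return fixed_cost + variable_cost_per_unit * units

total_last_year = 500_000
fixed = 70_000
offices_last_year = 5

# b = (total cost - fixed cost) / number of offices operated last year
variable_per_office = (total_last_year - fixed) / offices_last_year

print(variable_per_office)                            # 86000.0
print(budgeted_cost(fixed, variable_per_office, 7))   # 672000.0
```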
14. A regression equation
A. Estimates the dependent variable(s).
B. Encompasses factors outside the relevant range.
C. Is based on objective and constraint functions.
D. Estimates the independent variable.
Answer (A) is correct.
Regression analysis is used to find an equation for the linear relationship among variables. The
behavior of the dependent variable is explained in terms of one or more independent variables.
Regression analysis is often used to estimate a dependent variable (such as cost) given a known
independent variable (such as production).
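As a minimal sketch of the idea, the following fits a straight line by ordinary least squares and then estimates the dependent variable for a known independent value. The data set and the helper name `fit_line` are hypothetical; the points were chosen to lie exactly on cost = 500 + 20 × units so the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data: units produced (independent), total cost (dependent).
units = [100, 200, 300, 400]
cost = [2500, 4500, 6500, 8500]

a, b = fit_line(units, cost)
estimate = a + b * 250   # estimated cost at 250 units
print(a, b, estimate)    # 500.0 20.0 5500.0
```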
15. Mat Co. estimated its materials handling costs at two activity levels as follows:
Kilos Handled    Cost
80,000           $160,000
60,000           $132,000
What is Mat’s estimated cost for handling 75,000 kilos?
A. $150,000
B. $153,000
C. $157,500
D. $165,000
Answer (B) is correct.
The high-low method estimates variable cost by dividing the difference in costs incurred at the highest
and lowest observed levels of activity by the difference in activity. Once the variable cost is found, the
fixed portion is determinable. Hence, unit variable handling cost is $1.40 [($160,000 – $132,000) ÷
(80,000 kilos – 60,000 kilos)], the fixed cost is $48,000 [$132,000 – (60,000 kilos × $1.40)], and the
cost of handling 75,000 kilos is $153,000 [$48,000 + (75,000 kilos × $1.40)].
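The high-low computation above can be reproduced with a short sketch. The function name `high_low` is an illustrative assumption; the two data points are the ones given in the question.

```python
def high_low(high, low):
    """Return (variable cost per unit, fixed cost) from two (activity, cost) points."""
    (high_activity, high_cost), (low_activity, low_cost) = high, low
    variable = (high_cost - low_cost) / (high_activity - low_activity)
    fixed = low_cost - variable * low_activity
    return variable, fixed

variable, fixed = high_low((80_000, 160_000), (60_000, 132_000))
estimate = fixed + variable * 75_000

print(round(variable, 2), round(fixed), round(estimate))  # 1.4 48000 153000
```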
16. Multiple regression differs from simple regression in that it
A. Provides an estimated constant term.
B. Has more dependent variables.
C. Allows the computation of the coefficient of determination.
D. Has more independent variables.
Answer (D) is correct.
Improved accuracy of forecasts may often be achieved by regressing the dependent variable on more
than one independent variable. The usual multiple regression equation is linear and is in the following
form when y is the dependent variable; a is the y-axis intercept; x1, x2, etc., are the independent
variables; b1, b2, etc., are the coefficients of the independent variables; and e is the error term:
y = a + b1x1 + b2x2 + … + e
20. For cost estimation, simple regression differs from multiple regression in that simple regression
uses only
A. One dependent variable, while multiple regression uses all available data to estimate the cost function.
B. Dependent variables, while multiple regression can use both dependent and independent variables.
C. One independent variable, while multiple regression uses more than one independent variable.
D. One dependent variable, while multiple regression uses more than one dependent variable.
Answer (C) is correct.
Simple regression uses the algebraic formula for a straight line, y = a + bx, where x is the independent
variable. Multiple regression is used when there is more than one independent variable. Multiple
regression allows a firm to identify many factors (independent variables) and to weight each one
according to its influence on the overall outcome (y = a + b1x1 + b2x2 + b3x3 + etc.).
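The difference between the two forms can be sketched by evaluating the same equation with one and then two independent variables. The intercept, coefficients, and input values below are hypothetical, as is the helper name `predict`; in practice the coefficients would come from fitting the regression to data.

```python
def predict(intercept, coefficients, values):
    """Evaluate y = a + b1*x1 + b2*x2 + ... for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical cost function: a = 10,000 fixed cost,
# b1 = 5 per machine hour, b2 = 12 per labor hour.
a = 10_000
b = [5, 12]

simple = predict(a, b[:1], [1_000])        # simple regression form: one x
multiple = predict(a, b, [1_000, 400])     # multiple regression form: two x's
print(simple, multiple)  # 15000 19800
```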
25. Which of the following best describes unstructured data?
A. Data with a high level of organization.
B. Data systematically stored with markers to enforce hierarchies of records and fields within the data.
C. Information that is not organized in a pre-defined manner (e.g., text-heavy facts, dates, numbers, and images).
D. Conforms with the organization of data models associated with relational databases.
Answer (C) is correct.
Unstructured data refers to information that is not organized in a pre-defined manner (e.g., text-heavy
facts, dates, numbers, and images).
26. Each of the following represents a characteristic of big data except
A. Size.
B. Mixture.
C. Speed.
D. Uniformity.
Answer (D) is correct.
Big data is often characterized by the “4 Vs”: volume (size), variety (mixture), velocity (speed), and
veracity. Thus, uniformity is not a characteristic of big data.
27. Which of the following are key technologies of big data?
I. In-memory analytics
II. Data mining
III. Text mining
A. I only.
B. II only.
C. I and III only.
D. I, II, and III.
Answer (D) is correct.
Key technologies of big data include data mining, text mining, data management, in-memory analytics,
predictive analytics, and Hadoop.
28. Which of the following is a correct statement regarding Hadoop?
A. It is an open source software framework that stores large amounts of data and runs applications on clusters
of commodity hardware.
B. It analyzes data from system memory instead of hard drives.
C. It is a technology that uses data, statistical algorithms, and machine-learning techniques to identify the
likelihood of future outcomes based on historical data.
D. It analyzes text data from the web, comment fields, books, and other text-based sources through the use
of machine learning or natural language processing technology.
Answer (A) is correct.
Hadoop is an open source software framework that stores large amounts of data and runs applications
on clusters of commodity hardware.
29. Which of the following is a correct statement regarding in-memory analytics?
A. It is an open source software framework that stores large amounts of data and runs applications on
clusters of commodity hardware.
B. It analyzes data from system memory instead of hard drives.
C. It is a technology that uses data, statistical algorithms, and machine-learning techniques to identify the
likelihood of future outcomes based on historical data.
D. It examines large amounts of data to discover patterns in the data.
Answer (B) is correct.
In-memory analytics analyzes data from system memory instead of hard drives.
30. Which of the following is a correct statement regarding volume-based value?
A. The faster businesses can inject data into their data and analytics platform, the more time they will have
to ask the right questions and seek answers.
B. Rapid analysis capabilities provide businesses with the right decision in time to achieve their customer
relationship management objectives.
C. The more data businesses have on the customers, both recent and historical, the greater the insights.
D. In the digital era, capability to acquire and analyze varied data is extremely valuable.
Answer (C) is correct.
Volume-based value holds that the more data businesses have on the customers, both recent and historical,
the greater the insights.
31. All of the following are correct statements regarding velocity-based value except
A. The faster businesses can inject data into their data and analytics platform, the more time they will have
to ask the right questions and seek answers.
B. Rapid analysis capabilities provide businesses with the right decision in time to achieve their customer
relationship management objectives.
C. The computing power required to quickly process huge volumes and varieties of data can overwhelm a
single server or multiple servers. Organizations must apply adequate computer power to big data tasks to
achieve the desired velocity.
D. The more data businesses have on the customers, both recent and historical, the greater the insights.
Answer (D) is correct.
The statement that the more data businesses have on the customers, both recent and historical, the greater
the insights describes volume-based value, not velocity-based value.
32. An automobile parts manufacturer has received complaints from customers about declining quality.
After a quick review, management realizes the problem has no single source. To perform a thorough process of
problem identification, the most appropriate tool is a(n)
A. Fishbone diagram.
B. Histogram.
C. Pareto diagram.
D. Statistical control charts.
Answer (A) is correct.
A fishbone diagram (also called a cause-and-effect diagram or an Ishikawa diagram) is used in total
quality management for process improvement. It is useful in studying causation (why the actual and
desired situations differ). This format organizes the analysis of causation and helps to identify possible
interactions among causes.
33. The new purchasing director is analyzing purchase orders for the organization. Which of the
following analyses would best be displayed on a histogram?
A. In the past year the organization placed 10,000 purchase orders. Organize the number of orders placed
with each supplier, sorted in descending order.
B. The average turnaround time from issuing a purchase order to receiving the merchandise is 7 days.
Review the last 2,000 purchase orders, and using 10 days as the upper control limit and 4 days as the
lower control limit, graph the turnaround time for each order.
C. The organization purchased US $27 million worth of inventory in the past year. Distribute by value, using
US $500 increments, the quantity of purchase orders that fall within each range.
D. Identify and organize the reasons the average turnaround time for purchase orders falls outside the control
parameters of 4-10 days.
Answer (C) is correct.
A histogram displays a continuous frequency distribution of the independent variable in the form of
a bar graph. Here, the y-axis is the quantity of purchase orders and the x-axis is the purchase order amount.
Thus, a histogram would best display the quantity of purchase orders by dollar value.
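A minimal sketch of the binning behind such a histogram follows. The order amounts are hypothetical, and `bin_counts` is an illustrative helper; the US $500 bin width matches the increment named in the question. Empty bins are simply omitted from the output.

```python
from collections import Counter

def bin_counts(amounts, width=500):
    """Count values per half-open bin [lower, lower + width), keyed by lower bound."""
    counts = Counter(int(amount // width) * width for amount in amounts)
    return dict(sorted(counts.items()))

# Hypothetical purchase order amounts in US dollars.
orders = [120, 480, 505, 760, 999, 1_000, 1_250, 2_300]
width = 500

# Text rendering: x-axis is the value range, bar length is the order count.
for lower, count in bin_counts(orders, width).items():
    print(f"{lower:>6} to {lower + width - 1:<6} {'#' * count}")
```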
34. The director of sales asks for a count of customers grouped in descending numerical rank by (1) the
number of orders they place during a single year and (2) the dollar amounts of the average order. The visual format
of these two pieces of information is most likely to be a
A. Fishbone diagram.
B. Cost of quality report.
C. Kaizen diagram.
D. Pareto diagram.
Answer (D) is correct.
A Pareto diagram displays the values of an independent variable such that managers can quickly
identify the areas most in need of attention. The variables involved must be quantifiable.
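The descending ranking that underlies a Pareto diagram can be sketched as follows. The customer names and order counts are hypothetical, and the cumulative-percentage column is a common addition to Pareto diagrams rather than something stated in the card.

```python
def pareto_table(counts):
    """Sort categories by count, descending, and add cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        rows.append((name, count, round(100 * running / total, 1)))
    return rows

# Hypothetical number of orders placed by each customer in one year.
orders_by_customer = {"Acme": 40, "Birch": 10, "Cedar": 30, "Dune": 20}

for row in pareto_table(orders_by_customer):
    print(row)
```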
36. Statistical quality control often involves the use of control charts whose basic purpose is to
A. Determine when accounting control procedures are not working.
B. Control labor costs in production operations.
C. Detect performance trends away from normal operations.
D. Monitor internal control applications of information technology.
Answer (C) is correct.
Statistical control charts are graphic aids for monitoring the status of any process subject to acceptable
and unacceptable variations during repeated operations. The chart consists of three horizontal lines
plotted on a horizontal time scale. The vertical scale represents the appropriate quantitative measure.
The center line represents the average range or overall mean for the process being controlled. The other
two lines are the upper control limit and the lower control limit. The processes are measured
periodically, and the values are plotted on the chart. If the value falls within the control limits, no
action is taken. If the value falls outside the limits, the process is considered out of control, and an
investigation is made for possible corrective action. Another advantage of the chart is that it makes
trends visible.
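The decision rule described above (no action inside the limits, investigation outside them) can be sketched in a few lines. The measurements, the center line of 7, and the limits of 4 and 10 are hypothetical values chosen for illustration.

```python
def out_of_control(samples, lower_limit, upper_limit):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(samples)
            if v < lower_limit or v > upper_limit]

# Hypothetical periodic measurements around a center line of 7,
# with a lower control limit of 4 and an upper control limit of 10.
measurements = [7, 6, 8, 11, 7, 3, 9]

print(out_of_control(measurements, lower_limit=4, upper_limit=10))
# [(3, 11), (5, 3)]
```

Each flagged pair identifies the sample position and the value that would trigger an investigation.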
37. A manufacturer mass produces nuts and bolts on its assembly line. The line supervisors sample
every nth unit for conformance with specifications. Once a nonconforming part is detected, the machinery is shut
down and adjusted. The most appropriate tool for this process is a
A. Fishbone diagram.
B. Cost of quality report.
C. Regression analysis.
D. Statistical quality control chart.
Answer (D) is correct.
Statistical quality control is a method of determining whether the shipment or production run of units
lies within acceptable limits. It is also used to determine whether production processes are out of
control. Statistical control charts are graphic aids for monitoring the status of any process subject to
acceptable or unacceptable variations during repeated operations.
38. A chief executive officer (CEO) believes that a major competitor may be planning a new campaign.
The CEO sends a questionnaire to key personnel asking for original thinking concerning what the new campaign
may be. The CEO selects the best possibilities and then sends another questionnaire asking for the most likely
option. The process employed by the CEO is called the
A. Least squares technique.
B. Delphi technique.
C. Simulation technique.
D. Simple regression.
Answer (B) is correct.
The Delphi approach solicits opinions from experts, summarizes the opinions, and feeds the summaries
back to the experts (without revealing participants to each other).
40. A cereal producer often receives complaints that boxes are underweight. To gather data on the
extent of this problem, the producer should develop which of the following?
A. Fishbone diagram.
B. Statistical control chart.
C. Cost-of-quality analysis.
D. Pareto chart.
Answer (B) is correct.
A statistical control chart is used to monitor the extent of variances between actual and expected
results. The chart has three horizontal lines. The midline is the target value, and the other lines are the
upper and lower control limits. Results are plotted on the chart. Accordingly, the cereal producer
should develop a control chart to gather data on the extent of the variances between the actual and
expected (midline) weight(s) of the cereal boxes.
41. Which of the following can be discovered using a data-mining process?
A. Data structure.
B. Previously unknown information.
C. Artificial intelligence.
D. Standard query reporting.
Answer (B) is correct.
Data mining examines large amounts of data to discover patterns in the data (i.e., unexpected
relationships among data). A classic example of the use of data mining is the discovery by convenience
stores that diapers and beer often appear on the same sales transaction in the late evening. Thus,
previously unknown information can be discovered using a data-mining process.
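A toy version of this kind of discovery is counting which items appear together on the same transaction. The transactions below are hypothetical and `pair_counts` is an illustrative helper; real data-mining tools use far more scalable association algorithms than this sketch.

```python
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count how often each unordered pair of items shares a transaction."""
    counts = Counter()
    for items in transactions:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

# Hypothetical late-evening sales transactions.
transactions = [
    ["beer", "diapers", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["beer", "diapers", "milk"],
]

print(pair_counts(transactions).most_common(1))
# [(('beer', 'diapers'), 3)]
```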
42. Which of the following forecasting methods relies mostly on judgment?
A. Time series models.
B. Sensitivity analysis.
C. Delphi.
D. Regression.
Answer (C) is correct.
The Delphi approach solicits opinions from experts, summarizes the opinions, and feeds the summaries back to the experts (without revealing participants to each other). Hence, this method relies mostly on
expert judgments.
43. Which of the following is a critical success factor in data mining a large data store?
A. Pattern recognition.
B. Effective search engines.
C. Image processing systems.
D. Accurate universal resource locator (URL).
Answer (A) is correct.
Data mining allows a user to discover hidden relationships, such as associations, sequences of events,
classifications (descriptions of the groups to which the item belongs), or clusters (new groupings
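previously not known). Typical applications of data mining are identification of potential customers
and purchasing power. Because each of these relationships is a pattern in the data, pattern recognition
is critical to successful data mining.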
44. Which of the following best describes a characteristic of big data?
A. Collected data often provides straightforward answers to users.
B. Data collected are free from useless information or incorrect variables.
C. Big data is in a visual context, such as a graph or chart, rather than a text format.
D. Data of untapped markets is often not collected.
Answer (D) is correct.
One limitation of big data is that user-level data results are incomplete. Generally, the data available to
an organization are restricted to data of persons who have had some contact with the organization (e.g.,
visited the organization’s website or called the organization). The data are only representative of the
target market; thus, untapped markets could potentially exist, the data of which are not being captured.
45. A company uses big data analytics in marketing. Which of the following is a limitation of using big data?
A. The company can use big data to predict customer behaviors.
B. Data results cannot be visualized to identify and forecast customer trends.
C. Big data cannot explain why customers behave in certain ways.
D. Data collected only represent untapped customers but not tapped customers.
Answer (C) is correct.
One limitation of big data is that determining why the analysis results are what they are is difficult.
While big data analysis can show that there is a certain pattern in monthly sales, it fails to show what
causes the pattern. Further and more complicated analyses are needed, the results of which tend to be
more difficult for non-technical people to understand.
52. What-if analysis is also known as
A. Goal seeking analysis.
B. Trend analysis.
C. Regression analysis.
D. Horizontal analysis.
Answer (A) is correct.
What-if analysis is a process of determining the effects on outcomes in a model through changes in
scenarios. What-if analysis is also known as goal seeking analysis. Goal seeking occurs when the
decision maker has a specific outcome in mind and needs to determine how it can be achieved.
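The two ideas in this explanation can be sketched side by side: changing an input to see the outcome, and searching for the input that produces a desired outcome. The linear cost model, its figures, and the bisection-based `goal_seek` helper are illustrative assumptions; spreadsheet goal-seek tools may use a different search method.

```python
def total_cost(offices, fixed=70_000, variable=86_000):
    """Linear cost model y = a + bx (figures are illustrative)."""
    return fixed + variable * offices

# What-if: change the scenario and observe the effect on the outcome.
for offices in (5, 6, 7):
    print(offices, total_cost(offices))

def goal_seek(model, target, low, high, tolerance=1e-6):
    """Bisection search for the input at which an increasing model reaches target.

    Assumes model(low) <= target <= model(high).
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if model(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# Goal seeking: which input produces a total cost of 500,000?
print(round(goal_seek(total_cost, 500_000, 0, 20), 3))  # 5.0
```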
57. An analyst is preparing a time series analysis for the sales of swimwear. He notices that, for the last
3 years, the swimwear sales rise during May and August and fall during November and February. Which of the
following patterns best describes this scenario?
A. Irregular pattern.
B. Cyclical pattern.
C. Chronological pattern.
D. Seasonal pattern.
Answer (D) is correct.
A seasonal pattern often exists when a time series is influenced by seasonal factors (e.g., the quarter of
the year, the month, or the day of the week). The swimwear sales rise during May and August (summer
months) and fall during November and February (winter months). Thus, this scenario is best described
as a seasonal pattern.
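One simple way to make a seasonal pattern visible is to compare each month's average with the overall average. The sales figures below are hypothetical, and `seasonal_indices` is an illustrative helper rather than a full time series decomposition.

```python
from collections import defaultdict

def seasonal_indices(observations):
    """Average value per month divided by the overall average.

    observations: list of (month, value) pairs covering several years.
    An index above 1 marks a month that tends to run above average.
    """
    by_month = defaultdict(list)
    for month, value in observations:
        by_month[month].append(value)
    overall = sum(value for _, value in observations) / len(observations)
    return {m: (sum(vs) / len(vs)) / overall for m, vs in by_month.items()}

# Hypothetical swimwear sales (units) for four months over three years.
sales = [
    ("Feb", 20), ("May", 80), ("Aug", 90), ("Nov", 10),
    ("Feb", 25), ("May", 85), ("Aug", 95), ("Nov", 15),
    ("Feb", 15), ("May", 75), ("Aug", 85), ("Nov", 5),
]

for month, index in seasonal_indices(sales).items():
    print(month, round(index, 2))
```

With this data, May and August come out well above 1 and November and February well below 1, year after year, which is the signature of a seasonal pattern.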
58. Which of the following most likely shows a cyclical pattern?
A. Traffic peaks in the morning and evening hours.
B. An increase in sales of gloves in the winter.
C. Economic data affected by recession.
D. An increase in oil prices due to an oil workers’ strike.
Answer (C) is correct.
A cyclical pattern exists when data points show rises and falls that are not of a fixed or seasonal
pattern. The duration of these cyclical fluctuations is usually at least a couple of years. An example is
business cycles (i.e., expansion, peak, recession, depression), which typically last several years, but the
length of the current cycle is never known in advance.
59. Which of the following most likely shows an irregular pattern?
A. Increase in GDP during expansion.
B. Decrease in the number of tourists due to an earthquake.
C. Decrease in the sales of ice cream during winter.
D. Increase in the number of tourists during weekends and holidays.
Answer (B) is correct.
An irregular pattern exists when random or unplanned factors occur. An earthquake is an unplanned
factor. Thus, a decrease in the number of tourists due to an earthquake is an example of an irregular pattern.
60. Time series analysis
A. Uses trial and error to determine the effects of changes in assumptions on outcome.
B. Analyzes current results against past activity to determine shifts in trends.
C. Solicits opinions from experts, summarizes the opinions, and feeds the summaries back to the experts
without revealing participants to each other.
D. Assists with determining whether a sample is representative of the population.
Answer (B) is correct.
Time series analysis is the process of projecting future trends based on past experience. Benefits of
time series analysis include analyzing current results against activity in the past to determine shifts in trends.
61. Which is the correct order of the steps in the data mining process?
I. Perform regression analysis to generate an equation that models the data.
II. Identify anomalies and unusual data records.
III. Prepare visual presentations and reports.
IV. Generalize the relationships among data.
V. Find relationships between variables and group the relationships.
A. I, V, IV, II, III.
B. II, V, IV, I, III.
C. II, I, IV, V, III.
D. II, IV, V, III, I.
Answer (B) is correct.
The steps for the data mining process are
1. Identify anomalies and unusual data records.
2. Find relationships between variables and group the relationships.
3. Generalize the relationships among data.
4. Perform regression analysis to generate an equation that models the data.
5. Prepare visual presentations and reports.
62. The goal of structured query language (SQL) is to
A. Evaluate and document the structure of the database.
B. Clean up incorrect, incomplete, or duplicated data before uploading it into the database.
C. Create, update, retrieve, and manage data within a database.
D. Provide detailed information about the size, format, usage, meaning, and ownership of every data
Answer (C) is correct.
SQL is a language used to access and manipulate data within relational database management systems.
Queries are constructed and executed using a set of commands to create, update, and retrieve