Data Analytics Flashcards
11. Business intelligence (BI) has all of the following characteristics except
A. Focusing on strategic objectives.
B. Giving immediate information about an organization’s critical success factors.
C. Displaying information in graphical format.
D. Providing advice and answers to top management from a knowledge-based system.
Answer (D) is correct.
BI serves the needs of top management for managerial control and strategic planning. BI focuses on
strategic (long-range) objectives and gives immediate information about a firm’s critical success
factors. BI is not a program for providing top management with advice and answers from a
knowledge-based (expert) system.
12. Cook Co.’s total costs of operating five sales offices last year were $500,000, of which $70,000
represented fixed costs. Cook has determined that total costs are significantly influenced by the number of sales
offices operated. Last year’s costs and number of sales offices can be used as the bases for predicting annual costs.
What would be the budgeted cost for the coming year if Cook were to operate seven sales offices?
A. $700,000
B. $672,000
C. $602,000
D. $586,000
Answer (B) is correct.
Using the formula y = a + bx, y is the total budgeted cost, a is the fixed costs, b is the variable cost per
unit, and x is the number of budgeted sales offices. The fixed costs are $70,000, the variable cost per
unit is $86,000 [($500,000 – $70,000) ÷ 5], and the number of budgeted sales offices is 7. Thus, the
budgeted cost for the coming year assuming seven sales offices is $672,000 [$70,000 + (7 × $86,000)].
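The arithmetic above can be checked with a short Python sketch. The function name and parameter names are illustrative assumptions, not part of the original question; the figures come from the problem.

```python
def budgeted_cost(fixed_cost, total_cost, offices_observed, offices_budgeted):
    """Estimate total cost with y = a + bx from one observed activity level."""
    # b: variable cost per sales office (last year's total cost less fixed cost)
    variable_per_office = (total_cost - fixed_cost) / offices_observed
    # y = a + bx
    return fixed_cost + variable_per_office * offices_budgeted


cost = budgeted_cost(fixed_cost=70_000, total_cost=500_000,
                     offices_observed=5, offices_budgeted=7)
print(f"${cost:,.0f}")  # $672,000
```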
14. A regression equation
A. Estimates the dependent variable(s).
B. Encompasses factors outside the relevant range.
C. Is based on objective and constraint functions.
D. Estimates the independent variable.
Answer (A) is correct.
Regression analysis is used to find an equation for the linear relationship among variables. The
behavior of the dependent variable is explained in terms of one or more independent variables.
Regression analysis is often used to estimate a dependent variable (such as cost) given a known
independent variable (such as production).
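As a rough illustration of estimating a dependent variable from an independent one, the sketch below fits a least-squares line y = a + bx by hand. The data points and the function name fit_line are hypothetical and chosen only so the result is easy to verify.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations: production (independent) and cost (dependent)
production = [1, 2, 3, 4, 5]
cost = [12, 14, 16, 18, 20]

a, b = fit_line(production, cost)
estimated_cost = a + b * 6  # estimate cost at a production level of 6
print(a, b, estimated_cost)
```

Here the independent variable (production) is known, and the fitted equation is used to estimate the dependent variable (cost).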
15. Mat Co. estimated its materials handling costs at two activity levels as follows:
Kilos Handled    Cost
80,000           $160,000
60,000           $132,000
What is Mat’s estimated cost for handling 75,000 kilos?
A. $150,000
B. $153,000
C. $157,500
D. $165,000
Answer (B) is correct.
The high-low method estimates variable cost by dividing the difference in costs incurred at the highest
and lowest observed levels of activity by the difference in activity. Once the variable cost is found, the
fixed portion is determinable. Hence, unit variable handling cost is $1.40 [($160,000 – $132,000) ÷
(80,000 kilos – 60,000 kilos)], the fixed cost is $48,000 [$132,000 – (60,000 kilos × $1.40)], and the
cost of handling 75,000 kilos is $153,000 [$48,000 + (75,000 kilos × $1.40)].
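The high-low computation above can be sketched in Python as follows. The function name high_low and its argument layout are assumptions made for illustration; the two activity levels are those given in the question.

```python
def high_low(high, low):
    """Take (activity, cost) pairs for the high and low points.

    Returns (fixed_cost, variable_rate).
    """
    (x_hi, y_hi), (x_lo, y_lo) = high, low
    # Variable cost per unit: change in cost divided by change in activity
    rate = (y_hi - y_lo) / (x_hi - x_lo)
    # Fixed cost: total cost at one point less the variable portion
    fixed = y_lo - rate * x_lo
    return fixed, rate


fixed, rate = high_low(high=(80_000, 160_000), low=(60_000, 132_000))
estimate = fixed + rate * 75_000
print(f"rate ${rate:.2f}/kilo, fixed ${fixed:,.0f}, estimate ${estimate:,.0f}")
# rate $1.40/kilo, fixed $48,000, estimate $153,000
```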
16. Multiple regression differs from simple regression in that it
A. Provides an estimated constant term.
B. Has more dependent variables.
C. Allows the computation of the coefficient of determination.
D. Has more independent variables.
Answer (D) is correct.
Improved accuracy of forecasts may often be achieved by regressing the dependent variable on more
than one independent variable. The usual multiple regression equation is linear and is in the following
form when y is the dependent variable; a is the y-axis intercept; x1, x2, etc., are the independent
variables; b1, b2, etc., are the coefficients of the independent variables; and e is the error term:
y = a + b1x1 + b2x2 + … + e
20. For cost estimation, simple regression differs from multiple regression in that simple regression
uses only
A. One dependent variable, while multiple regression uses all available data to estimate the cost function.
B. Dependent variables, while multiple regression can use both dependent and independent variables.
C. One independent variable, while multiple regression uses more than one independent variable.
D. One dependent variable, while multiple regression uses more than one dependent variable.
Answer (C) is correct.
Simple regression uses the algebraic formula for a straight line, y = a + bx, where x is the independent
variable. Multiple regression is used when there is more than one independent variable. Multiple
regression allows a firm to identify many factors (independent variables) and to weight each one
according to its influence on the overall outcome (y = a + b1x1 + b2x2 + b3x3 + …).
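To make the weighting idea concrete, the sketch below evaluates a multiple regression equation for given values of the independent variables. The intercept, coefficients, and cost drivers are hypothetical numbers, not estimates from any data in these flashcards, and the error term is omitted because this is a point prediction.

```python
def predict(intercept, coefficients, values):
    """Evaluate y = a + b1*x1 + b2*x2 + ... for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical cost function with two independent variables:
# x1 = machine hours (b1 = 5), x2 = number of setups (b2 = 20)
y = predict(intercept=1_000, coefficients=[5, 20], values=[100, 10])
print(y)  # 1700

# Simple regression is the special case with a single independent variable
y_simple = predict(intercept=1_000, coefficients=[5], values=[100])
print(y_simple)  # 1500
```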
25. Which of the following best describes unstructured data?
A. Data with a high level of organization.
B. Data systematically stored with markers to enforce hierarchies of records and fields within the data.
C. Information that is not organized in a pre-defined manner (e.g., text-heavy facts, dates, numbers, and
images).
D. Data that conforms with the organization of data models associated with relational databases.
Answer (C) is correct.
Unstructured data refers to information that is not organized in a pre-defined manner (e.g., text-heavy
facts, dates, numbers, and images).
26. Each of the following represents a characteristic of big data except
A. Size.
B. Mixture.
C. Speed.
D. Uniformity.
Answer (D) is correct.
Big data is often characterized by the “4 Vs”: volume (size), variety (mixture), velocity (speed), and
veracity. Thus, uniformity is not a characteristic of big data.
27. Which of the following are key technologies of big data?
I. In-memory analytics
II. Data mining
III. Text mining
A. I only.
B. II only.
C. I and III only.
D. I, II, and III.
Answer (D) is correct.
Key technologies of big data include data mining, text mining, data management, in-memory analytics,
predictive analytics, and Hadoop.
28. Which of the following is a correct statement regarding Hadoop?
A. It is an open source software framework that stores large amounts of data and runs applications on clusters
of commodity hardware.
B. It analyzes data from system memory instead of hard drives.
C. It is a technology that uses data, statistical algorithms, and machine-learning techniques to identify the
likelihood of future outcomes based on historical data.
D. It analyzes text data from the web, comment fields, books, and other text-based sources through the use
of machine learning or natural language processing technology.
Answer (A) is correct.
Hadoop is an open source software framework that stores large amounts of data and runs applications
on clusters of commodity hardware.
29. Which of the following is a correct statement regarding in-memory analytics?
A. It is an open source software framework that stores large amounts of data and runs applications on
clusters of commodity hardware.
B. It analyzes data from system memory instead of hard drives.
C. It is a technology that uses data, statistical algorithms, and machine-learning techniques to identify the
likelihood of future outcomes based on historical data.
D. It examines large amounts of data to discover patterns in the data.
Answer (B) is correct.
In-memory analytics analyzes data from system memory instead of hard drives.
30. Which of the following is a correct statement regarding volume-based value?
A. The faster businesses can inject data into their data and analytics platform, the more time they will have
to ask the right questions and seek answers.
B. Rapid analysis capabilities provide businesses with the right decision in time to achieve their customer
relationship management objectives.
C. The more data businesses have on the customers, both recent and historical, the greater the insights.
D. In the digital era, capability to acquire and analyze varied data is extremely valuable.
Answer (C) is correct.
Volume-based value means that the more data businesses have on their customers, both recent and
historical, the greater the insights.
31. All of the following are correct statements regarding velocity-based value except
A. The faster businesses can inject data into their data and analytics platform, the more time they will have
to ask the right questions and seek answers.
B. Rapid analysis capabilities provide businesses with the right decision in time to achieve their customer
relationship management objectives.
C. The computing power required to quickly process huge volumes and varieties of data can overwhelm a
single server or multiple servers. Organizations must apply adequate computer power to big data tasks to
achieve the desired velocity.
D. The more data businesses have on the customers, both recent and historical, the greater the insights.
Answer (D) is correct.
The statement that the more data businesses have on their customers, both recent and historical, the
greater the insights describes volume-based value, not velocity-based value.
32. An automobile parts manufacturer has received complaints from customers about declining quality.
After a quick review, management realizes the problem has no single source. To perform a thorough process of
problem identification, the most appropriate tool is a(n)
A. Fishbone diagram.
B. Histogram.
C. Pareto diagram.
D. Statistical control charts.
Answer (A) is correct.
A fishbone diagram (also called a cause-and-effect diagram or an Ishikawa diagram) is used in total
quality management for process improvement. It is useful in studying causation (why the actual and
desired situations differ). This format organizes the analysis of causation and helps to identify possible
interactions among causes.
33. The new purchasing director is analyzing purchase orders for the organization. Which of the
following analyses would best be displayed on a histogram?
A. In the past year the organization placed 10,000 purchase orders. Organize the number of orders placed
with each supplier, sorted in descending order.
B. The average turnaround time from issuing a purchase order to receiving the merchandise is 7 days.
Review the last 2,000 purchase orders, and using 10 days as the upper control limit and 4 days as the
lower control limit, graph the turnaround time for each order.
C. The organization purchased US $27 million worth of inventory in the past year. Distribute by value, using
US $500 increments, the quantity of purchase orders that fall within each range.
D. Identify and organize the reasons the average turnaround time for purchase orders falls outside the control
parameters of 4-10 days.
Answer (C) is correct.
The histogram displays a continuous frequency distribution of the independent variable in the form of
a bar graph. The y axis is the quantity of purchase orders and the x axis is the purchase order amount.
The histogram would best display the quantity of purchase orders by dollar value.
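The binning behind such a histogram can be sketched in Python as follows. The purchase order amounts are hypothetical, the function name bin_by_value is an assumption, and the text bars stand in for a real chart.

```python
from collections import Counter


def bin_by_value(amounts, width=500):
    """Count orders per value range; each key is the lower bound of a range."""
    return Counter(int(amount // width) * width for amount in amounts)


# Hypothetical purchase order amounts in US dollars
orders = [120.00, 480.00, 500.00, 999.99, 1500.00]
width = 500
counts = bin_by_value(orders, width)

# x axis: purchase order amount (range); y axis: quantity of purchase orders
for lower in sorted(counts):
    print(f"${lower:,} to under ${lower + width:,}: {'#' * counts[lower]}")
```

Each order falls into exactly one US $500 range, so the bar heights sum to the total number of purchase orders.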