20.2 Analytics and Big Data Flashcards
All of the following are correct statements regarding velocity-based value except
A. The faster businesses can inject data into their data and analytics platform, the more time they will have to ask the right questions and seek answers.
B. Rapid analysis capabilities provide businesses with the right decision in time to achieve their customer relationship management objectives.
C. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or multiple servers. Organizations must apply adequate computing power to big data tasks to achieve the desired velocity.
D. The more data businesses have on the customers, both recent and historical, the greater the insights.
D. The more data businesses have on the customers, both recent and historical, the greater the insights.
Statement D describes volume-based value, not velocity-based value: the more data businesses have on their customers, both recent and historical, the greater the insights.
Define and describe the 4 Vs of big data
- Volume: the amount of data
- Variety: data exist in a wide variety of file types
- Velocity: the speed at which big data are generated and must be analyzed
- Veracity: the trustworthiness of the data
All of the following are correct statements regarding big data except
A. Big data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information.
B. Big data includes information collected from social media, data from Internet-enabled devices, machine data, video, and voice recordings. The information collected is converted from high-density data into low-density data.
C. Big data is often characterized by the “4 Vs”: volume, variety, velocity, and veracity.
D. Big data involves processing data with analytic and algorithmic tools to reveal meaningful information.
B. Big data includes information collected from social media, data from Internet-enabled devices, machine data, video, and voice recordings. The information collected is converted from high-density data into low-density data.
The information collected is converted from low-density data to high-density data, not from high-density data to low-density data.
A hospital has observed an increase in the number of cases of a disease and has asked an analyst to collect data on the cases over the last 3 years. The analyst noted that the disease appeared 3 years ago during the second quarter of the year. Since then, the third and fourth quarters of each year showed significant spikes in the number of cases when compared to the first two quarters. What is the best way to present these findings?
A. Table, showing the number of cases in each month for the last 3 years.
B. Pie chart, showing the number of cases in each quarter for the last 3 years.
C. Scatter plot, showing the change in the number of cases for each quarter for the last 3 years.
D. Bar graph, showing the number of cases in each quarter for the last 3 years.
D. Bar graph, showing the number of cases in each quarter for the last 3 years.
A bar graph (also called a bar chart) is the best way to present the findings because it displays the number of cases in each quarter side by side, so the spikes in the third and fourth quarters stand out against the first two quarters.
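To illustrate why one bar per quarter makes the seasonal spikes easy to see, here is a minimal text-only sketch. The case counts and the `text_bar_chart` helper are hypothetical, made up for illustration; in practice a plotting library would draw the bars.

```python
# Hypothetical quarterly case counts (illustrative only, not from the source).
# The disease first appears in Q2 of year 1, so the series starts there.
cases = {
    "Y1 Q2": 12, "Y1 Q3": 40, "Y1 Q4": 38,
    "Y2 Q1": 15, "Y2 Q2": 14, "Y2 Q3": 45, "Y2 Q4": 42,
    "Y3 Q1": 16, "Y3 Q2": 18, "Y3 Q3": 50, "Y3 Q4": 47,
}

def text_bar_chart(counts: dict[str, int], scale: int = 2) -> str:
    """Render one horizontal bar per quarter; each '#' stands for `scale` cases."""
    width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.items():
        lines.append(f"{label.ljust(width)} | {'#' * (n // scale)} {n}")
    return "\n".join(lines)

print(text_bar_chart(cases))
```

The Q3 and Q4 bars come out visibly longer than the Q1 and Q2 bars each year, which is exactly the comparison the question asks the chart to support.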
Which one of the following statements defines data mining?
A. A process of using statistical techniques to extract and analyze data from large databases to discern patterns and trends.
B. A system used to develop a firm’s performance metrics.
C. A process of using algorithms that serve to facilitate efficient communication within a firm.
D. A system used to organize and interpret complex data to ensure the data has been accurately recorded in the database.
A. A process of using statistical techniques to extract and analyze data from large databases to discern patterns and trends.
Data mining examines large amounts of data to discover patterns using statistical models and techniques. The term data mining is somewhat misleading because its purpose is the discovery of patterns in large amounts of data, not the extraction of the data itself.
A simple regression equation has an r² of 0.85. This means that
A. 85% of the variation of the dependent variable is explained by the regression line.
B. 85% of the variation of the independent variable is explained by the regression line.
C. The dependent and independent variables have a correlation coefficient of 0.85.
D. The dependent variable does not have a strong correlation with the independent variable.
A. 85% of the variation of the dependent variable is explained by the regression line.
The coefficient of determination (r²) measures the fit between the independent and dependent variables. It is the proportion of the total variation in the dependent variable that is accounted for by the independent variable. The value of r² ranges from 0 to 1; the closer r² is to 1, the more useful the independent variable (x) is for explaining or predicting the variation in the dependent variable (y). An r² of 0.85 therefore means that 85% of the variation in the dependent variable is explained by the regression line. Option C is incorrect because, in simple regression, the correlation coefficient is the square root of r² (about 0.92 here, carrying the sign of the slope), not r² itself.
The type of data analytics that is most likely to yield the most impact for an organization but is also the most complex is called
A. Diagnostic analysis.
B. Predictive analysis.
C. Descriptive analysis.
D. Prescriptive analysis.
D. Prescriptive analysis.
Prescriptive analysis concentrates on what an organization needs to do for the predicted future results to actually occur. In other words, prescriptive analysis tells a company how to get to where it wants to go. This type of analysis provides the most benefit but is also the most complex, requiring the most inputs.
An organization wants to utilize business intelligence (BI) to assist in the evaluation of key metrics. The IT manager has suggested incorporating a dashboard feature in its BI tool. Which one of the following is the main reason that management should implement the dashboard feature?
A. It allows management to have as many different charts as possible.
B. It shows patterns and trends in data across the organization.
C. It is designed to focus on metrics that have not been met.
D. It can automatically generate reorders of important materials for production.
B. It shows patterns and trends in data across the organization.
Business intelligence refers to software applications, tools, and practices that can be used to analyze and organize raw data. The objective is to speed up and improve managerial decision making and to uncover new business opportunities. Dashboard features make visualizing and analyzing business data easier. A standard dashboard for key analytical functions also makes collaboration easier by enabling a large organization to share its analyses across the entire entity. In short, a dashboard is an interactive user interface that presents information in an easy-to-read, easy-to-understand manner.
A car insurance company is considering opening branches in a foreign country. The country’s population has been growing rapidly for the last 5 years. The company wants to know whether the number of car accidents is correlated with the recent population growth. Which of the following charts prepared by an analyst would be most helpful to the company?
A. A line chart showing each month on the x-axis and the number of car accidents on the y-axis.
B. A scatter plot showing the number of car accidents on the y-axis and the population size on the x-axis.
C. A table showing the number of car accidents in rows and the population size in columns.
D. A pie chart showing the number of car accidents in each city for the last 5 years.
B. A scatter plot showing the number of car accidents on the y-axis and the population size on the x-axis.
A scatter plot is the best way to present the relationship because it plots the number of car accidents (y variable) against the population size (x variable), making any correlation between the two visible.
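A scatter plot shows the relationship visually; the Pearson correlation coefficient puts a number on it. The following is a minimal sketch with hypothetical yearly figures (population in millions, accident counts). `pearson_r` is an illustrative helper, not a library function:

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: each pair is one year's (population, accidents) point on the scatter plot.
population = [10.0, 10.8, 11.7, 12.5, 13.4]  # x-axis, millions
accidents = [5200, 5600, 6100, 6400, 7000]   # y-axis

print(f"r = {pearson_r(population, accidents):.3f}")
```

A value near +1 means the points lie close to an upward-sloping line. Correlation alone does not show that population growth causes the accidents.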
An analyst is preparing a time series analysis for the sales of swimwear. He notices that, for the last 3 years, the swimwear sales rise during May and August and fall during November and February. Which of the following patterns best describes this scenario?
A. Seasonal pattern
B. Irregular pattern
C. Cyclical pattern
D. Chronological pattern
A. Seasonal pattern
A seasonal pattern often exists when a time series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week). The swimwear sales rise during May and August (summer months) and fall during November and February (winter months). Thus, this scenario is best described as a seasonal pattern.
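One simple way to quantify a seasonal pattern is a seasonal index: average each calendar month across the years, then divide by the overall mean, so that values above 1 mark above-average months. The following is a minimal sketch with hypothetical monthly swimwear sales over 3 years. It uses a simplified average-based index that ignores trend, which is an assumption made for brevity:

```python
import calendar
from statistics import fmean

def seasonal_indices(sales: list[float], period: int = 12) -> list[float]:
    """Average each position in the cycle across years, relative to the overall mean."""
    if len(sales) % period:
        raise ValueError("series must contain whole cycles")
    overall = fmean(sales)
    return [fmean(sales[m::period]) / overall for m in range(period)]

# Hypothetical unit sales, January to December, for year 1; later years add a small uplift.
base_year = [20, 18, 25, 40, 70, 90, 95, 85, 50, 30, 22, 19]
sales = [units + 5 * year for year in range(3) for units in base_year]

month_names = list(calendar.month_abbr)[1:]  # ['Jan', ..., 'Dec']
for month, index in zip(month_names, seasonal_indices(sales)):
    print(f"{month}: {index:.2f}")
```

May through August print indices above 1 and November through February print indices below 1, which is the repeating within-year shape that defines a seasonal pattern.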
An analyst is performing a sensitivity analysis on how changes in the inflation rate and the interest rate can affect bond prices. Which of the following most likely causes the sensitivity analysis to be inaccurate?
A. Inflation is expected to increase in the next year
B. Interest rates are affected by changes in the inflation rate
C. Bond prices are extremely sensitive to changes in both inflation and interest rates
D. The interest rate changes affect bond prices to a greater extent than the inflation rate
B. Interest rates are affected by changes in the inflation rate
Sensitivity analysis is limited because it considers variables individually (changing one while holding the others constant) rather than all together. Because interest rates are themselves affected by changes in the inflation rate, the two inputs are not independent, and varying each one separately may distort the results of the sensitivity analysis.
The new purchasing director is analyzing purchase orders for the organization. Which of the following analyses would best be displayed on a histogram?
A. The organization purchased US $27 million worth of inventory in the past year. Distribute by value, using US $500 increments, the quantity of purchase orders that fall within each range.
B. In the past year the organization placed 10,000 purchase orders. Organize the number of orders placed with each supplier, sorted in descending order.
C. The average turnaround time from issuing a purchase order to receiving the merchandise is 7 days. Review the last 2,000 purchase orders, and using 10 days as the upper control limit and 4 days as the lower control limit, graph the turnaround time for each order.
D. Identify and organize the reasons the average turnaround time for purchase orders falls outside the control parameters of 4-10 days.
A. The organization purchased US $27 million worth of inventory in the past year. Distribute by value, using US $500 increments, the quantity of purchase orders that fall within each range.
A histogram displays the frequency distribution of a continuous variable in the form of a bar graph. Here, the x-axis is the purchase order amount, divided into US $500 ranges, and the y-axis is the quantity of purchase orders falling within each range. A histogram would therefore best display the quantity of purchase orders by dollar value.
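The binning behind option A can be sketched in a few lines: assign each purchase order to the US $500 range that contains it, then count orders per range. The amounts below and the `histogram_bins` helper are hypothetical. Empty ranges are kept with a count of 0 so the histogram shows gaps rather than silently dropping them:

```python
from collections import Counter

WIDTH = 500  # US $ per bin, as in option A

def histogram_bins(amounts: list[float], width: int = WIDTH) -> dict[int, int]:
    """Count orders per bin; each key is the bin's lower bound in US $."""
    counts = Counter(int(a // width) for a in amounts)
    last = max(counts)
    return {b * width: counts.get(b, 0) for b in range(last + 1)}

# Hypothetical purchase order amounts in US $.
amounts = [120, 480, 510, 760, 999.99, 1000, 1450, 2300]

for lower, n in histogram_bins(amounts).items():
    print(f"US ${lower:,} to under US ${lower + WIDTH:,}: {'#' * n} ({n})")
```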
The director of sales asks for a count of customers grouped in descending numerical rank by (1) the number of orders they place during a single year and (2) the dollar amounts of the average order. The visual format of these two pieces of information is most likely to be a
A. Cost of quality report.
B. Fishbone diagram.
C. Pareto diagram.
D. Kaizen diagram.
C. Pareto diagram.
A Pareto diagram displays the values of a variable as bars sorted in descending order, often with a cumulative-percentage line, so managers can quickly identify the areas most in need of attention. The variables involved must be quantifiable.
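The data behind a Pareto diagram is easy to sketch. Sort the categories in descending order of value and track the running cumulative percentage. The customer names, order counts, and the `pareto_table` helper below are hypothetical:

```python
def pareto_table(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Return (category, value, cumulative %) rows sorted by value, largest first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for name, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((name, value, 100 * running / total))
    return rows

# Hypothetical number of orders placed by each customer in a single year.
orders = {"Customer A": 120, "Customer B": 45, "Customer C": 300, "Customer D": 35}

for name, value, cumulative in pareto_table(orders):
    print(f"{name}: {value} orders, cumulative {cumulative:.0f}%")
```

The descending order puts the customers who account for most of the volume at the left of the chart, which is what lets the director of sales see them at a glance.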
Which of the following is a correct statement regarding volume-based value?
A. The faster businesses can inject data into their data and analytics platform, the more time they will have to ask the right questions and seek answers.
B. In the digital era, capability to acquire and analyze varied data is extremely valuable.
C. The more data businesses have on the customers, both recent and historical, the greater the insights.
D. Rapid analysis capabilities provide businesses with the right decision in time to achieve their customer relationship management objectives.
C. The more data businesses have on the customers, both recent and historical, the greater the insights.
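Statement C is the correct statement regarding volume-based value: the more data businesses have on their customers, both recent and historical, the greater the insights. Statements A and D describe velocity-based value, and statement B describes variety-based value.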
The data analytics tool that is used to explore data sets and group them into predefined categories is called
A. Clustering
B. Logistic regression
C. Linear regression
D. Classification
D. Classification
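Classification is a statistical technique used to explore data and group the data into predefined categories. Clustering, by contrast, groups data without predefined categories and discovers the groups from the data itself.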
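The following sketch shows what "predefined categories" means in practice. It uses a nearest-centroid classifier, one simple classification method among many. The labels "occasional" and "frequent", the customer features (orders per year, average order in US $100s), and the helper names are all hypothetical. The labels are fixed before any new record is classified, whereas a clustering method would have to discover the groups on its own:

```python
from math import dist

Sample = tuple[tuple[float, ...], str]  # (features, predefined label)

def train_centroids(samples: list[Sample]) -> dict[str, tuple[float, ...]]:
    """Average the feature vectors of each predefined label."""
    grouped: dict[str, list[tuple[float, ...]]] = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(features: tuple[float, ...], centroids: dict[str, tuple[float, ...]]) -> str:
    """Assign the predefined label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))

# Hypothetical labelled training data: (orders per year, average order in US $100s).
training = [
    ((2, 1.0), "occasional"), ((3, 1.5), "occasional"),
    ((20, 4.0), "frequent"), ((25, 5.0), "frequent"),
]
centroids = train_centroids(training)
print(classify((4, 1.2), centroids))   # occasional
print(classify((18, 3.0), centroids))  # frequent
```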