Final Flashcards

1
Q

There are basic chart types and specialized chart types. A Gantt chart is a specialized chart type.

A

True

2
Q

This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set.

A

arithmetic mean
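A minimal Python sketch of the formula on this card (sum of the observations divided by their count). The sample values are made up for illustration, and the standard-library `statistics.mean` is shown only as a cross-check.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [4, 8, 15, 16, 23, 42]   # illustrative observations
print(arithmetic_mean(data))    # 18.0
print(mean(data))               # standard-library cross-check, same value
```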

3
Q

Subject-oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

A

False

4
Q

Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones.

A

False

5
Q

A(n) ________ architecture is used to build a scalable and maintainable infrastructure that includes a centralized data warehouse and several dependent data marts.

A

Hub-and-spoke

6
Q

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?

A

Classification

7
Q

Converting continuous-valued numerical variables to ranges and categories is referred to as discretization.

A

TRUE
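A minimal sketch of discretization (binning) in plain Python. The bin edges and labels are hypothetical age ranges invented for illustration; real projects usually choose edges from domain knowledge or from equal-width or equal-frequency rules.

```python
def discretize(value, edges, labels):
    """Map a continuous value to the label of the range it falls in.

    edges are ascending bin boundaries; bin i covers [edges[i], edges[i+1]).
    """
    for low, high, label in zip(edges, edges[1:], labels):
        if low <= value < high:
            return label
    raise ValueError(f"{value} is outside [{edges[0]}, {edges[-1]})")

# Hypothetical age bins, chosen only for illustration
edges = [0, 18, 65, 130]
labels = ["minor", "adult", "senior"]
ages = [12, 18, 40, 70]
print([discretize(a, edges, labels) for a in ages])
# ['minor', 'adult', 'adult', 'senior']
```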

8
Q

Data source reliability means that data are correct and are a good match for the analytics problem.

A

False

9
Q

The competitive imperatives for BI include all of the following except

A

Right user

10
Q

In the 2000s, the DW-driven DSSs began to be called BI systems.

A

TRUE

11
Q

How are enterprise resource planning (ERP) systems related to supply chain management (SCM) systems?

A

Complementary systems

12
Q

OLTP systems are designed to handle ad hoc analysis and complex queries that deal with many data items.

A

FALSE

13
Q

Information dashboards enable ________ operations that allow the users to view underlying data sources and obtain more detail.

A

drill-down/drill-through

14
Q

In the opening case, police detectives used data mining to identify possible new areas of inquiry.

A

False

15
Q

Clustering partitions a collection of things into segments whose members share

A

Similar Characteristics

16
Q

Ratio data is a type of categorical data.

A

False

17
Q

Which of the following is a data mining myth?

A

Data mining requires a separate, dedicated database.

18
Q

If using a mining analogy, “knowledge mining” would be a more appropriate term than “data mining.”

A

True

19
Q

The cost of data storage has plummeted recently, making data mining feasible for more firms.

A

True

20
Q

In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

A

False

21
Q

All of the following statements about data mining are true EXCEPT

A

the process aspect means that data mining should be a one-step process to results.

22
Q

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

A

true

23
Q

K-fold cross-validation is also called sliding estimation.

A

False
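K-fold cross-validation is also known as rotation estimation, which is why the statement above is false. The sketch below uses a hypothetical helper (not any library's API) that splits n record indices into k mutually exclusive folds and uses each fold exactly once as the test set; in practice the records are usually shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n indices are divided into k nearly equal, mutually exclusive folds;
    each fold serves as the test set exactly once.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print("test fold:", test)
```

The overall accuracy estimate is then the average of the k test-fold accuracies.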

24
Q

Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is

A

the processing power needed for the centralized model would overload a single computer.

25
Q

In the Opening Vignette on Sports Analytics, what type of modeling was used to predict offensive tactics?

A

Heat Maps

26
Q

What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?

A

Prescriptive

27
Q

Demands for instant, on-demand access to dispersed information decrease as firms successfully integrate BI into their operations.

A

False

28
Q

The use of dashboards and data visualizations is seldom effective in identifying issues in organizations, as demonstrated by the Silvaris Corporation Case Study.

A

false

29
Q

Today, many vendors offer diversified tools, some of which are completely preprogrammed (called shells). How are these shells utilized?

A

All a user needs to do is insert the numbers.

30
Q

The growth in hardware, software, and network capacities has had little impact on modern BI innovations.

A

False

31
Q

Information systems that support such transactions as ATM withdrawals, bank deposits, and cash register scans at the grocery store represent transaction processing, a critical branch of BI.

A

False

33
Q

Because of performance and data quality issues, most experts agree that the federated architecture should supplement data warehouses, not replace them.

A

True

34
Q

Ratio data is a type of categorical data.

A

FALSE

35
Q

The use of dashboards and data visualizations is seldom effective in identifying issues in organizations, as demonstrated by the Silvaris Corporation Case Study.

A

False

36
Q

Market basket

A

False

37
Q

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.

A

True

38
Q

________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.

A

Cohesion

39
Q

What has caused the growth of the demand for instant, on-demand access to dispersed information?

A

the more pressing need to close the gap between the operational data and strategic objectives

40
Q

The need for more versatile reporting than what was available in 1980s-era ERP systems led to the development of what type of system?

A

executive information systems

41
Q

What storage system and processing algorithm were developed by Google for Big Data?

A

* Google developed the Google File System (GFS) for storing large amounts of data in a distributed way; its open-source counterpart, the Hadoop Distributed File System (HDFS), was released as an Apache project.
* Google developed the MapReduce algorithm for pushing computation to the data, instead of pushing data to a computing node; an open-source implementation of MapReduce is part of Apache Hadoop.
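To make the MapReduce idea concrete, here is a single-process Python sketch of the classic word-count example. It only imitates the map, shuffle, and reduce phases with hypothetical helper functions; it is not Hadoop's API, and in a real cluster each map task would run on the node that already stores its input split.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the list of values for each key into one result
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big ideas", "data beats ideas"]   # hypothetical input splits
mapped = [pair for doc in splits for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```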

42
Q

Describe the role of the simple split in estimating the accuracy of classification models.

A

The simple split (or holdout or test sample estimation) partitions the data into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set. The training set is used by the inducer (model builder), and the built classifier is then tested on the test set. An exception to this rule occurs when the classifier is an artificial neural network. In this case, the data is partitioned into three mutually exclusive subsets: training, validation, and testing.
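A minimal sketch of the simple split in plain Python, assuming the conventional two-thirds/one-third proportion described above. `simple_split` is a hypothetical helper, and the fixed seed only makes the shuffle reproducible.

```python
import random

def simple_split(records, train_fraction=2/3, seed=42):
    """Partition records into mutually exclusive training and test sets."""
    shuffled = records[:]                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(30))                      # stand-in for 30 labeled records
train, test = simple_split(data)
print(len(train), len(test))                # 20 10
```

The model is built on `train` only and its accuracy is measured on `test`, so the estimate reflects performance on records the model has not seen.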

43
Q

Data is the contextualization of information, that is, information set in context

A

False

44
Q

This measure of dispersion is calculated by simply taking the square root of the variance.

A

standard deviation
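A short sketch of the calculation in plain Python: compute the variance (the average squared deviation from the mean, dividing by n - 1 for a sample) and take its square root. The data are illustrative, and the standard-library `statistics.stdev` and `statistics.pstdev` serve only as cross-checks.

```python
import math
import statistics

def standard_deviation(values, sample=True):
    """Square root of the variance; the sample version divides by n - 1."""
    n = len(values)
    m = sum(values) / n
    squared_dev = sum((x - m) ** 2 for x in values)
    variance = squared_dev / (n - 1 if sample else n)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # illustrative values
print(standard_deviation(data, sample=False))   # 2.0 (population)
print(standard_deviation(data))                 # about 2.138 (sample)
```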

45
Q

Nominal data represent the labels of multiple classes used to divide a variable into specific groups.

A

False

46
Q

In the Dallas Cowboys case study, the focus was on using data analytics to decide which players would play every week.

A

False

47
Q

This plot is a graphical illustration of several descriptive statistics about a given data set.

A

Box and whisker plot
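A box-and-whisker plot typically draws the five-number summary: minimum, first quartile, median, third quartile, and maximum. The sketch below computes those values with the standard-library `statistics.quantiles` (Python 3.8+). Quartile conventions vary between tools, and many plotting libraries draw whiskers to 1.5 times the IQR and show outliers separately instead of using the plain minimum and maximum used here.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the statistics a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

data = [1, 3, 5, 7, 9, 11, 13]     # illustrative values
print(five_number_summary(data))   # (1, 4.0, 7.0, 10.0, 13)
```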

48
Q

Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration?

A

Pie chart

49
Q

Which type of visualization tool can be very helpful when a data set contains location data?

A

Geographic map

50
Q

The data storage component of a business reporting system builds the various reports and hosts them for, or disseminates them to, users. It also provides notification, annotation, collaboration, and other services.

A

False

51
Q

One way an operational data store differs from a data warehouse is the recency of their data.

A

True

52
Q

Properly integrating data from various databases and other disparate sources is a trivial process.

A

False

53
Q

What is Six Sigma?

A

a methodology aimed at reducing the number of defects in a business process

54
Q

A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a

A

three-tier architecture

55
Q

The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.

A

True

56
Q

When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure?

A

Star schema
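A minimal star schema sketched with Python's built-in `sqlite3` module: one fact table holding the measure (`amount`) plus a foreign key to each dimension, and two dimension tables that connect only to the fact table. All table names, column names, and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: descriptive attributes, one row per member
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
    -- Fact table: numeric measures plus a foreign key to each dimension
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Laptops'), (2, 'Phones');
    INSERT INTO dim_store   VALUES (10, 'East'), (20, 'West');
    INSERT INTO fact_sales  VALUES (1, 10, 900.0), (2, 10, 300.0), (2, 20, 450.0);
""")

# A typical star join: the fact table joined to each dimension, then aggregated
rows = con.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_store   s ON f.store_id   = s.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
print(rows)
```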

57
Q

User-initiated navigation of data through disaggregation is referred to as “drill up.”

A

False

58
Q

Data warehouses are subsets of data marts.

A

False

59
Q

The BPM development cycle is essentially a one-shot process where the requirement is to get it right the first time.

A

False

60
Q

Operational or transaction databases are product-oriented, handling transactions that update the database. In contrast, data warehouses are

A

Subject-oriented and nonvolatile

61
Q

What type of analytics seeks to determine what is likely to happen in the future?

A

Predictive

62
Q

Online transaction processing (OLTP) systems handle a company’s routine ongoing business. In contrast, a data warehouse is typically

A

a distinct system that provides storage for data that will be made use of in analysis.

63
Q

In the Opening Vignette on Sports Analytics, what was adjusted to drive one-time ticket sales?

A

Ticket prices

64
Q

Successful BI is a tool for the information systems department, but is not exposed to the larger organization.

A

False

65
Q

Business intelligence (BI) is a specific term that describes architectures and tools only.

A

False

66
Q

Managing information on operations, customers, internal procedures and employee interactions is the domain of cognitive science.

A

False

67
Q

The user interface of a BI system is often referred to as a(n) ________.

A

Dashboard

68
Q

As the number of potential BI applications increases, the need to justify and prioritize them arises. This is not an easy task due to the large number of ________ benefits.

A

Intangible

69
Q

________ series forecasting is the use of mathematical modeling to predict future values of the variable of interest based on previously observed values.

A

Time
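One of the simplest time series forecasting methods is a moving average, which predicts the next value from the most recent observations. The sketch assumes a hypothetical monthly sales series and a window of 3; methods such as exponential smoothing or ARIMA models are common alternatives and are not shown.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

monthly_sales = [120, 132, 128, 141, 150, 147]   # hypothetical observations
print(moving_average_forecast(monthly_sales))    # (141 + 150 + 147) / 3 = 146.0
```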

70
Q

Dashboards present visual displays of important information that are consolidated and arranged on a single ________.

A

Screen

71
Q

Descriptive statistics is all about describing the sample data on hand.

A

True

72
Q

Which characteristic of data requires that the variables and data values be defined at the lowest (or as low as required) level of detail for the intended use of the data?

A

data granularity

73
Q

In the FEMA case study, the BureauNet software was the primary reason behind the increased speed and relevance of the reports FEMA employees received.

A

True

74
Q

Dashboards provide visual displays of important information that is consolidated and arranged across several screens to maintain data order.

A

False

75
Q

Which characteristic of data means that all the required data elements are included in the data set?

A

Data richness

76
Q

Data source reliability means that data are correct and are a good match for the analytics problem.

A

False

77
Q

With the balanced scorecard approach, the entire focus is on measuring and managing specific financial goals based on the organization’s strategy.

A

False

78
Q

Moving the data into a data warehouse is usually the easiest part of its creation.

A

False

79
Q

Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure.

A

False

80
Q

Because the recession has raised interest in low-cost open source software, it is now set to replace traditional enterprise software.

A

False

81
Q

The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.

A

Data stores

82
Q

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

83
Q

In the Influence Health case study, what was the goal of the system?

A

increasing service use

84
Q

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from

A

analyzing the vast data amounts routinely collected.

85
Q

Statistics and data mining both look for data sets that are as large as possible.

86
Q

The data field “ethnic group” can be best described as

A

nominal data.

87
Q

In the Target case study, why did Target send maternity ads to a teenager?

A

Target’s analytic model suggested she was pregnant based on her buying habits.

88
Q

One way to accomplish privacy and protection of individuals’ rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.

A

de-identification

89
Q

Patterns have been manually ________ from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches.

90
Q

In the Influence Health case, the company was able to evaluate over ________ million records in only two days.

91
Q

What are the most important assumptions in linear regression?

A
1. Linearity. This assumption states that the relationship between the response variable and the explanatory variables is linear. That is, the expected value of the response variable is a straight-line function of each explanatory variable, while holding all other explanatory variables fixed. Also, the slope of the line does not depend on the values of the other variables. It also implies that the effects of different explanatory variables on the expected value of the response variable are additive in nature.
2. Independence (of errors). This assumption states that the errors of the response variable are uncorrelated with each other. Uncorrelated errors are a weaker condition than full statistical independence, which is often not needed for linear regression analysis.
3. Normality (of errors). This assumption states that the errors of the response variable are normally distributed. That is, they are supposed to be totally random and should not show any nonrandom patterns.
4. Constant variance (of errors). This assumption, also called homoscedasticity, states that the response variable has the same error variance regardless of the values of the explanatory variables. In practice, this assumption is invalid if the response variable varies over a wide enough range/scale.
5. No multicollinearity. This assumption states that the explanatory variables are not correlated (i.e., they do not replicate the same information but each provides a different perspective on the information needed for the model). Multicollinearity can be triggered by presenting two or more perfectly correlated explanatory variables to the model (e.g., if the same explanatory variable is mistakenly included in the model twice, once with a slight transformation). A correlation-based data assessment usually catches this error.

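A small plain-Python sketch of two checks related to these assumptions, using made-up housing data: fit a one-variable least-squares line and inspect the residuals (which should look like patternless noise), then screen two candidate predictors for multicollinearity with a Pearson correlation. The helper names are hypothetical, and a full diagnosis would also use residual plots and formal tests.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Hypothetical data: floor area (m^2) and price (thousands)
area = [50, 60, 70, 80, 90]
price = [150, 178, 212, 238, 272]
b0, b1 = fit_simple_ols(area, price)
residuals = [p - (b0 + b1 * a) for a, p in zip(area, price)]
# Residuals should look like random noise: no trend, roughly constant spread
print(round(b0, 2), round(b1, 2), [round(e, 2) for e in residuals])

# Multicollinearity screen: a second predictor that is just area in ft^2
area_ft2 = [a * 10.7639 for a in area]
print(round(pearson_r(area, area_ft2), 6))   # 1.0, so keep only one of the two
```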
92
Q

With ________, all the data from every corner of the enterprise is collected and integrated into a consistent schema so that every part of the organization has access to the single version of the truth when and where needed.

A

Enterprise Resource Planning (ERP)

93
Q

Briefly describe five techniques (or algorithms) that are used for classification modeling.

A

* Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena.
* Statistical analysis. Statistical techniques were the primary classification algorithm for many years until the emergence of machine-learning techniques. Statistical classification techniques include logistic regression and discriminant analysis.
* Neural networks. These are among the most popular machine-learning techniques that can be used for classification-type problems.
* Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case into the most probable category.
* Bayesian classifiers. This approach uses probability theory to build classification models based on past occurrences that are capable of placing a new instance into the most probable class (or category).
* Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples.
* Rough sets. This method takes into account the partial membership of class labels to predefined categories in building models (collection of rules) for classification problems.
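As an illustration of a Bayesian classifier, here is a minimal naive Bayes sketch for categorical attributes in plain Python. It assumes the attributes are conditionally independent given the class (the "naive" assumption) and uses Laplace smoothing so an unseen attribute value does not zero out a class score; the weather-style training data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    distinct = defaultdict(set)           # attribute index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct

def predict(model, row, alpha=1.0):
    """Pick the class maximizing P(class) * prod_i P(value_i | class), Laplace-smoothed."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                       # prior P(class)
        for i, value in enumerate(row):
            seen = value_counts[(label, i)][value]
            score *= (seen + alpha) / (count + alpha * len(distinct[i]))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training cases: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("overcast", "no"), ("rainy", "no"), ("overcast", "yes")]
labels = ["yes", "no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "no")))
```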

94
Q

Six Sigma rests on a simple performance improvement model known as DMAIC. What are the steps involved?

A

1. Define. Define the goals, objectives, and boundaries of the improvement activity. At the top level, the goals are the strategic objectives of the company. At lower levels (department or project levels), the goals are focused on specific operational processes.
2. Measure. Measure the existing system. Establish quantitative measures that will yield statistically valid data. The data can be used to monitor progress toward the goals defined in the previous step.
3. Analyze. Analyze the system to identify ways to eliminate the gap between the current performance of the system or process and the desired goal.
4. Improve. Initiate actions to eliminate the gap by finding ways to do things better, cheaper, or faster. Use project management and other planning tools to implement the new approach.
5. Control. Institutionalize the improved system by modifying compensation and incentive systems, policies, procedures, manufacturing resource planning, budgets, operation instructions, or other management systems.

95
Q

Many business users in the 1980s referred to their mainframes as “the black hole,” because all the information went into it, but little ever came back and ad hoc real-time querying was virtually impossible.

96
Q

Computerized support is only used for organizational decisions that are responses to external pressures, not for taking advantage of opportunities.

97
Q

Data generation is a precursor, and is not included in the analytics ecosystem.

98
Q

In what decade did disjointed information systems begin to be integrated?

99
Q

Major commercial business intelligence (BI) products and services were well established in the early 1970s.

100
Q

BI represents a bold new paradigm in which the company’s business strategy must be aligned to its business intelligence analysis initiatives.

101
Q

Kaplan and Norton developed a report that presents an integrated view of success in the organization called

A

balanced scorecard-type reports.

102
Q

Interval data are variables that can be measured on interval scales.

103
Q

Predictive algorithms generally require a flat file with a target variable, so making data analytics ready for prediction means that data sets must be transformed into a flat-file format and made ready for ingestion into those predictive algorithms.

104
Q

Data accessibility means that the data are easily and readily obtainable.

105
Q

This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set.

A

arithmetic mean

106
Q

Structured data is what data mining algorithms use and can be classified as categorical or numeric.

107
Q

Key performance indicators (KPIs) are metrics typically used to measure

A

Internal results

108
Q

Visual analytics is aimed at answering “What is happening?” and is usually associated with business analytics.

109
Q

Oper marts are created when operational data needs to be analyzed

A

multidimensionally.

110
Q

With the balanced scorecard approach, the entire focus is on measuring and managing specific financial goals based on the organization’s strategy.

A

False

111
Q

Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests?

A

parallel processing

112
Q

When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is

A

Drill down

113
Q

_______ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

A

Enterprise information integration (EII)

114
Q

Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts?

A

hub-and-spoke data warehouse architecture

115
Q

All of the following are benefits of hosted data warehouses EXCEPT

A

greater control of data.

116
Q

Why is a performance management system superior to a performance measurement system?

A

because measurement alone has little use without action

118
Q

Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?

119
Q

In estimating the accuracy of data mining (or other) classification models, the true positive rate is

A

the ratio of correctly classified positives divided by the total positive count.
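The true positive rate on this card, and the false positive defined elsewhere in this deck, can both be computed from a confusion matrix. A minimal sketch, assuming binary labels with 1 as the positive class; the labels and predictions are hypothetical.

```python
def classification_rates(actual, predicted, positive=1):
    """Confusion-matrix rates for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return {
        # correctly classified positives / all actual positives
        "true_positive_rate": tp / (tp + fn),
        # actual negatives wrongly classified as positive / all actual negatives
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / len(pairs),
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical ground truth
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # hypothetical model output
print(classification_rates(actual, predicted))
```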

120
Q

What is the main reason parallel processing is sometimes used for data mining?

A

because of the massive data amounts and search efforts involved

121
Q

Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?

122
Q

________ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

A

Enterprise information integration (EII)

123
Q

Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts?

A

hub-and-spoke data warehouse architecture

124
Q

Data warehouses provide indirect benefits to organizations. Which of the following is an indirect benefit of data warehouses?

A

improved customer service

125
Q

All of the following are true about in-database processing technology except

A

The potentially useful aspect means that the results should lead to some business benefit

126
Q

The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.

A

True

127
Q

List four possible analytics applications in the retail value chain.

A

Inventory, Price Elasticity, Shopper Insight, Store Layout

128
Q

In the Dell case study, the largest issue was how to properly spend the online marketing budget.

129
Q

The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company’s benefit.

130
Q

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

A

True

131
Q

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from

A

analyzing the vast data amounts routinely collected.

132
Q

Which of the following is a data mining myth?

A

Data mining requires a separate, dedicated database.

133
Q

Nominal data represent the labels of multiple classes used to divide a variable into specific groups.

A

False

134
Q

Which type of question does visual analytics seek to answer?

A

Why did it happen?

135
Q

To respond to its market challenges, Sirius XM decided to focus on manufacturing efficiency.

136
Q

Data is the main ingredient for any BI, data science, and business analytics initiative.

137
Q

Google Maps has set new standards for data visualization with its intuitive web mapping software.

138
Q

Dashboards provide visual displays of important information that is consolidated and arranged across several screens to maintain data order.

A

False

139
Q

Traditional BI systems use a large volume of static data that has been extracted, cleansed, and loaded into a data warehouse to produce reports and analyses.

140
Q

Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is

A

the processing power needed for the centralized model would overload a single computer.

141
Q

Which of the following is NOT an example of transaction processing?

A

Sales report

142
Q

Data generation is a precursor, and is not included in the analytics ecosystem.

143
Q

What type of analytics seeks to determine what is likely to happen in the future?

A

Predictive

144
Q

If using a mining analogy, “knowledge mining” would be a more appropriate term than “data mining.”

A

True

145
Q

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

146
Q

In data mining, classification models help in prediction.

147
Q

Structured data is what data mining algorithms use and can be classified as categorical or numeric.

148
Q

Which of the following is LEAST related to data/information visualization?

A

Statistical graphics

149
Q

Visualization differs from traditional charts and graphs in complexity of data sets and use of multiple dimensions and measures.

150
Q

Dashboards can be presented at all the following levels EXCEPT

A

The visual cube level

151
Q

Descriptive statistics is about describing the sample data on hand.

A

True

152
Q

Business applications have moved from transaction processing and monitoring to other activities. Which of the following is NOT one of those activities?

A

Data monitoring

153
Q

Managing data warehouses requires special methods, including parallel computing and/or Hadoop/Spark.

154
Q

The very design that makes an OLTP system efficient for transaction processing makes it inefficient for

A

end-user ad hoc reports, queries, and analysis.

155
Q

Real-time data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is

A

speed of data transfer.

156
Q

Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure.

A

False

157
Q

Data warehouses are subsets of data marts.

A

False

158
Q

Subject-oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

A

False

159
Q

Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses.

160
Q

Which approach to data warehouse integration focuses more on sharing process functionality than data across systems?

A

Enterprise application integration

161
Q

Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates?

A

Independent data mart

162
Q

A large storage location that can hold vast quantities of data (mostly unstructured) in its native/raw format for future/potential analytics consumption is referred to as a(n)

163
Q

The “islands of data” problem in the 1980s describes the phenomenon of unconnected data being stored in numerous locations within an organization.

164
Q

Which of the following developments is NOT contributing to facilitating growth of decision support and analytics?

A

Locally concentrated workforces

165
Q

During classification in data mining, a false positive is an instance classified as true by the model while being false in reality.
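To make the definition concrete, here is a minimal Python sketch that tallies the four confusion-matrix outcomes for a binary classifier. The labels are made-up illustration data, and `True` is assumed to mean the positive class.

```python
def confusion_counts(actual, predicted):
    """Tally binary classification outcomes; True means the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for is_positive, predicted_positive in zip(actual, predicted):
        if predicted_positive and is_positive:
            counts["TP"] += 1
        elif predicted_positive and not is_positive:
            counts["FP"] += 1  # false positive: classified true, actually false
        elif not predicted_positive and not is_positive:
            counts["TN"] += 1
        else:
            counts["FN"] += 1  # false negative: classified false, actually true
    return counts


# Hypothetical labels for five instances
actual = [True, False, True, False, False]
predicted = [True, True, False, False, True]
print(confusion_counts(actual, predicted))  # {'TP': 1, 'FP': 2, 'TN': 1, 'FN': 1}
```

Instances 2 and 5 are the false positives: the model labeled them true, but they are false in reality.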

166
Q

In data mining, finding that two products commonly appear together in a shopping cart is known as

A

association rule mining
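Association rules are commonly scored by support and confidence. The Python sketch below is a minimal illustration (not Apriori or any particular product's algorithm) that scores every two-item rule in a few hypothetical shopping baskets; the `min_support` threshold of 0.5 is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Score every two-item rule lhs -> rhs by support and confidence."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical shopping carts
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket that contains beer also contains diapers, so beer -> diapers has confidence 1.00, while diapers -> beer has confidence 0.75.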

167
Q

All of the following statements about data mining are true EXCEPT:

A

The ideas behind it are relatively new

168
Q

Third-party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by

A

removing identifiers such as names and social security numbers.

169
Q

Which data mining process

170
Q

Contextual metadata for a dashboard includes all the following EXCEPT

A

which operating system is running the dashboard server software.

171
Q

What is the management feature of a dashboard?

A

Operational data that identify what actions to take to resolve a problem

172
Q

Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the following EXCEPT

A

they explore massive amounts of data in hours, not days.

173
Q

When you tell a story in a presentation, all of the following are true EXCEPT

A

a well-told story should have no need for subsequent discussion

174
Q

Relational databases began to be used in the:

175
Q

Decision support system (DSS) and management information system (MIS) have precise definitions agreed to by practitioners.

176
Q

Computer applications have moved from transaction processing

177
Q

Describe and define Big Data. Why is a search engine a Big Data application?

A

Big Data is data that cannot be stored in a single storage unit. It refers to data that arrive in multiple forms: structured, unstructured, or streaming. A search engine is a Big Data application because, when a user searches for a topic or question, it must search billions of Web pages and return the most relevant results in a fraction of a second.

178
Q

There are several basic information system architectures that can be used for data warehousing. What are they?

A

The basic IS architectures that can be used for data warehousing are one-tier, two-tier, and three-tier architectures.

179
Q

List 5 reasons for the growing popularity of data mining in the business world

A

Recognize fraud
Identify risk factors
Can improve customer relationships
Advances in both computer hardware and software
More accessible and affordable

180
Q

List the five most common functions of a business report

A

To ensure that all departments are functioning properly
To provide information
To provide the results of an analysis
To persuade others to act
To create an organizational memory (as part of a knowledge management system)

181
Q

More data, coming in faster and requiring immediate conversion into decisions, means that organizations are confronting the need for RDW. What is RDW?

A

Real-time data warehousing (RDW), also known as active data warehousing (ADW), is the process of loading and providing data via the data warehouse as they become available.

182
Q

Which of the following is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies?

183
Q

Describe the difference between descriptive and inferential statistics

A

Descriptive statistics describe the sample data on hand (for example, its mean or standard deviation). Inferential statistics use a sample to draw conclusions about the larger population from which it was drawn.
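A minimal Python sketch of the distinction, using made-up sample values: the mean and standard deviation only describe the sample, while the confidence interval is an inference about the population mean. The hard-coded t critical value and the assumption of a roughly normal population are illustrative choices.

```python
import math
import statistics

# Hypothetical sample of eight observations
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]

# Descriptive statistics: summarize only the data on hand
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: estimate the population mean from the sample.
# 2.365 is the two-sided 95% t critical value for n - 1 = 7 degrees of freedom;
# the interval assumes the population is roughly normal.
T_CRIT = 2.365
margin = T_CRIT * stdev / math.sqrt(len(sample))
confidence_interval = (mean - margin, mean + margin)

print(f"sample mean = {mean}, sample stdev = {stdev:.3f}")
print(f"95% CI for population mean: "
      f"({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")
```

The first two numbers are statements about these eight values only; the interval is a claim about values that were never observed.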

184
Q

A common way of introducing data warehousing is to refer to its fundamental characteristics. Describe three characteristics of data warehousing.

A

Subject oriented: Data are organized by detailed subject, such as sales, products, or customers, containing data relevant to decision support.
Integrated: Data warehouses must place data from different sources into a consistent format. To do so, they must deal with conflicts such as inconsistent naming and discrepancies among units of measure.
Nonvolatile: After the data is entered into the data warehouse, users cannot change the data or update it. Changes are recorded as new data.
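The "integrated" characteristic can be illustrated with a minimal Python sketch. The two source systems, their field names, and the cents-to-dollars conversion are all hypothetical; the point is only that records from different sources end up with the same field names and units.

```python
# Hypothetical extracts from two source systems that describe customers differently
crm_rows = [{"cust_id": 1, "name": "Ana", "revenue_usd": 1200.0}]
erp_rows = [{"customer_no": 2, "customer_name": "Ben", "revenue_cents": 98050}]


def from_crm(row):
    """Map a CRM record onto the warehouse's common schema."""
    return {"customer_id": row["cust_id"], "name": row["name"],
            "revenue_usd": row["revenue_usd"]}


def from_erp(row):
    """Map an ERP record, renaming fields and converting cents to dollars."""
    return {"customer_id": row["customer_no"], "name": row["customer_name"],
            "revenue_usd": row["revenue_cents"] / 100}


# After integration, every row has the same field names and units
warehouse = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
print(warehouse)
```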

185
Q

Based on lessons learned from the Target case, what legal warning would you give another retailer using data mining for marketing?

A

The case shows that Target did not violate any law. Target did not use any information that violated customer privacy; it used only the kind of transactional data that every other retail store collects and stores. In legal terms, it did nothing wrong.

186
Q

In the Tito’s Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers.

187
Q

Search engine optimization (SEO) is a means by which

A

Web site developers can increase Web site search rankings

188
Q

In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users.

189
Q

What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?

A

small- to medium-sized documents

190
Q

Search engine optimization (SEO) techniques play a minor role in a Web site’s search ranking because only well-written content matters.

191
Q

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.

192
Q

In the car insurance case study, text mining was used to identify auto features that caused injuries.

193
Q

________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.

A

Propinquity

194
Q

________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.

195
Q

Categorization and clustering of documents during text mining differ only in the preselection of categories.

196
Q

Web-based media has nearly identical cost and scale structures as traditional media.

197
Q

Web site usability may be rated poor if

A

Web site visitors download few of your offered PDFs and videos.

198
Q

Companies understand that when their product goes “viral,” the content of the online conversations about their product does not matter, only the volume of conversations.

199
Q

In text mining, tokenizing is the process of

A

categorizing a block of text in a sentence
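In this sense a token is a categorized block of text. A minimal Python sketch, assuming three made-up categories (WORD, NUMBER, PUNCT) defined by regular expressions; real text-mining tools use their own, richer rules.

```python
import re

# Hypothetical token categories, tried in order at each position
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)"   # integers and decimals
    r"|(?P<WORD>[A-Za-z][\w']*)"   # a letter followed by letters, digits, or apostrophes
    r"|(?P<PUNCT>[^\w\s])"         # any single non-word, non-space character
)


def tokenize(sentence):
    """Return (category, text) tokens; characters matching no category, such as spaces, are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(sentence)]


print(tokenize("Sales rose 4.5% in Q3."))
```

For the sample sentence, "4.5" is tagged NUMBER, "Q3" is tagged WORD, and "%" and "." are tagged PUNCT.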

200
Q

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.

201
Q

IBM’s Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called