Chapter 8 Flashcards

1
Q

Information overload

A

An overabundance of irrelevant data.

2
Q

Dirty data

A

Problematic data. Examples are a value of B for customer gender and a value of 213 for customer age. Other examples are a value of 999-999-9999 for a North American phone number, a part colour of "gren" (a misspelling of green), and an email address of WhyMe@GuessWhoIAM.org. All these values are problematic when data mining.
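
A minimal sketch of how such values might be flagged before data mining. The field names, the accepted gender codes, the age limit, and the phone pattern are all assumptions made for illustration; a real system would define its own rules.

```python
import re

# Hypothetical validation rules; every value here is an assumption.
VALID_GENDERS = {"F", "M"}
MAX_PLAUSIBLE_AGE = 120
PLACEHOLDER_PHONE = "999-999-9999"
PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")


def find_problems(record):
    """Return the names of fields whose values look problematic."""
    problems = []
    if record.get("gender") not in VALID_GENDERS:
        problems.append("gender")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= MAX_PLAUSIBLE_AGE:
        problems.append("age")
    phone = record.get("phone", "")
    # A well-formed number can still be an obvious placeholder.
    if not PHONE_PATTERN.match(phone) or phone == PLACEHOLDER_PHONE:
        problems.append("phone")
    return problems


dirty = {"gender": "B", "age": 213, "phone": "999-999-9999"}
clean = {"gender": "F", "age": 42, "phone": "604-555-0137"}
print(find_problems(dirty))
print(find_problems(clean))
```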

3
Q

Granularity

A

The level of detail in data. Customer name and account balance are large-granularity (coarse) data. Customer name, balance, and the order details and payment history of every customer order are smaller-granularity (finer) data.

4
Q

Clickstream data

A

E-commerce data that describe a customer’s clicking behaviour. Such data include everything the customer does at the website.

5
Q

Online transaction processing (OLTP)

A

Collecting data electronically and processing transactions online.

6
Q

Online analytic processing (OLAP) and decision support systems (DSSs)

A

Systems that focus on making data collected in OLTP useful for decision making.

7
Q

Online analytic processing (OLAP)

A

A dynamic type of reporting system that provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. Such reports are dynamic because users can change the format of the reports while viewing them.
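
A rough sketch of the idea, using made-up sales rows. The `summarize` helper is hypothetical; it sums, counts, and averages amounts per group, and regrouping the same rows by a different column stands in for a user changing the report while viewing it.

```python
from collections import defaultdict

# Made-up sales rows: (region, product, amount).
SALES = [
    ("East", "Widget", 100.0),
    ("East", "Gadget", 50.0),
    ("West", "Widget", 70.0),
    ("West", "Widget", 30.0),
]


def summarize(rows, key_index):
    """Sum, count, and average the amounts, grouped by one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {
        key: {
            "sum": sum(amounts),
            "count": len(amounts),
            "average": sum(amounts) / len(amounts),
        }
        for key, amounts in groups.items()
    }


by_region = summarize(SALES, 0)   # report grouped by region
by_product = summarize(SALES, 1)  # same data, regrouped by product
print(by_region)
print(by_product)
```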

8
Q

Drill down

A

With an OLAP report, to further divide the data into more detail.
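
One way to picture drilling down, assuming a small made-up sales table: totals by region are divided further into totals by region and product. The `total_by` helper and the column names are hypothetical.

```python
from collections import defaultdict

# Made-up sales rows for illustration.
SALES = [
    {"region": "East", "product": "Widget", "amount": 100},
    {"region": "East", "product": "Gadget", "amount": 50},
    {"region": "West", "product": "Widget", "amount": 70},
    {"region": "West", "product": "Widget", "amount": 30},
]


def total_by(rows, *dimensions):
    """Total the amounts for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


top_level = total_by(SALES, "region")           # summary view
drilled = total_by(SALES, "region", "product")  # same data in more detail
print(top_level)
print(drilled)
```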

9
Q

Data resource challenge

A

Occurs when data are collected in OLTP but are not used to improve decision making.

10
Q

Business intelligence (BI) system

A

A system that provides the right information, to the right user, at the right time. A tool produces the information, but the system ensures that the right information is delivered to the right user at the right time.

11
Q

Group decision support systems (GDSSs)

A

An application that enables more than one individual to take part in making a decision. Often includes voting and brainstorming functions.

12
Q

Reporting systems

A

Systems that create information from disparate data sources and deliver that information to the proper users on a timely basis.

13
Q

Data-mining system

A

Information system that processes data using sophisticated statistical techniques, such as regression analysis and decision-tree analysis, to find patterns and relationships that cannot be found by simpler operations, such as sorting, grouping, and averaging.

14
Q

Market-basket analysis

A

A data-mining technique for determining sales patterns. A market-basket analysis shows the products that customers tend to buy together.
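
A small sketch of the counting step behind such an analysis, assuming each transaction is stored as a set of product names. The baskets and the `pair_counts` helper are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions; each set is one customer's basket.
BASKETS = [
    {"mask", "fins", "snorkel"},
    {"mask", "fins"},
    {"mask", "tank"},
    {"fins", "snorkel"},
]


def pair_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(BASKETS)
for pair, times in counts.items():
    print(pair, times)
```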

15
Q

Knowledge management (KM) systems

A

Information systems for storing and retrieving organizational knowledge, whether that knowledge is in the form of data, documents, or employee know-how.

16
Q

Expert systems

A

Knowledge-sharing systems that are created by interviewing experts in a given business domain and codifying the rules used by those experts.

17
Q

RFM analysis

A

A way of analyzing and ranking customers according to their purchasing patterns.

RFM analysis considers:

- How Recently a customer has ordered
- How Frequently a customer orders
- How much Money the customer spends per order
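
The three measures can be sketched as follows, assuming orders are available as (customer, date, amount) rows. The customer names, dates, amounts, and the `rfm_metrics` helper are made up; how the measures are then turned into ranks or scores is left open here.

```python
from datetime import date

# Made-up order history: (customer, order date, order amount).
ORDERS = [
    ("Ajax", date(2024, 5, 1), 200.0),
    ("Ajax", date(2024, 6, 1), 400.0),
    ("Bloom", date(2024, 1, 15), 50.0),
]
AS_OF = date(2024, 6, 30)  # hypothetical analysis date


def rfm_metrics(orders, as_of):
    """Compute recency, frequency, and money-per-order for each customer."""
    by_customer = {}
    for customer, order_date, amount in orders:
        by_customer.setdefault(customer, []).append((order_date, amount))
    metrics = {}
    for customer, rows in by_customer.items():
        last_order = max(order_date for order_date, _ in rows)
        metrics[customer] = {
            "recency_days": (as_of - last_order).days,
            "frequency": len(rows),
            "money_per_order": sum(amount for _, amount in rows) / len(rows),
        }
    return metrics


metrics = rfm_metrics(ORDERS, AS_OF)
# Rank customers from most to least recent order.
most_recent_first = sorted(metrics, key=lambda c: metrics[c]["recency_days"])
print(metrics)
print(most_recent_first)
```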

18
Q

Data warehouse

A

A facility used to extract and clean data from operational systems and other sources so that the data can be used for reporting and data mining.

19
Q

Metadata

A

Data that describe data.

20
Q

Data mart

A

A database that prepares, stores, and manages data for reporting and data mining for specific business functions.

21
Q

Data mining

A

The application of statistical techniques to find patterns and relationships among data and to classify and predict.

22
Q

Unsupervised data mining

A

A form of data mining whereby the analysts do not create a model or hypothesis before running the analysis. Instead, they apply the data-mining technique to the data and observe the results. With this method, analysts create hypotheses after the analysis to explain the patterns found.

23
Q

Cluster analysis

A

An unsupervised data-mining technique whereby statistical techniques are used to identify groups of entities that have similar characteristics.

A common use for cluster analysis is to find groups of similar customers in data about customer orders and customer demographics.
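
As an illustration only, the sketch below uses k-means, one common clustering technique (the card does not name a specific one). The customer data, the choice of two groups, and the starting centres are assumptions.

```python
def k_means(points, centres, iterations=10):
    """Assign each 2-D point to its nearest centre, then move each
    centre to the mean of its group; repeat a fixed number of times."""
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            distances = [
                (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centres
            ]
            groups[distances.index(min(distances))].append(p)
        centres = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else c  # keep a centre in place if its group is empty
            for g, c in zip(groups, centres)
        ]
    return groups, centres


# Made-up customers: (age, orders per year).
CUSTOMERS = [(22, 2), (25, 3), (24, 1), (58, 12), (61, 15), (63, 14)]
groups, centres = k_means(CUSTOMERS, centres=[(22, 2), (63, 14)])
print(groups)
print(centres)
```

No hypothesis about the groups is stated in advance; the analyst inspects the resulting groups afterwards and then tries to explain them.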

24
Q

Supervised data mining

A

A form of data mining in which data miners develop a model prior to the analysis and apply statistical techniques to data to estimate values of the parameters of the model.

25
Q

Regression analysis

A

A type of supervised data mining that estimates the values of parameters in a linear equation. Used to determine the relative influence of variables on an outcome and also to predict future values of that outcome.
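
A minimal sketch for the simplest case, one input variable, using ordinary least squares to estimate the slope and intercept of the line y = slope * x + intercept. The data and variable names are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # predict the outcome for a new input
print(slope, intercept, predicted)
```

The estimated slope indicates how strongly the input influences the outcome, and the fitted equation is then used to predict a value not present in the data.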

26
Q

Neural networks

A

A popular supervised data-mining technique used to predict values and make classifications, such as good prospect or poor prospect.

27
Q

Lift

A

In market-basket terminology, the ratio of confidence to the base probability of buying an item. Lift shows how much the base probability changes when other products are purchased. If the lift is greater than 1, the change is positive; if it is less than 1, the change is negative.
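
A small worked sketch, assuming that confidence means the share of baskets containing one item that also contain the target item (the card does not define it). The baskets and the `lift` helper are made up for illustration.

```python
# Made-up transactions; each set is one customer's basket.
BASKETS = [
    {"mask", "fins", "snorkel"},
    {"mask", "fins"},
    {"mask", "tank"},
    {"fins", "snorkel"},
]


def lift(baskets, given, target):
    """Return confidence(given -> target) divided by the base
    probability of buying the target item."""
    with_given = [b for b in baskets if given in b]
    base_probability = sum(target in b for b in baskets) / len(baskets)
    confidence = sum(target in b for b in with_given) / len(with_given)
    return confidence / base_probability


fins_lift = lift(BASKETS, given="fins", target="snorkel")
mask_lift = lift(BASKETS, given="mask", target="snorkel")
print(fins_lift)  # greater than 1: buying fins raises the chance of a snorkel
print(mask_lift)  # less than 1: buying a mask lowers it in this sample
```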

28
Q

Big Data

A

An imprecise term that generally refers to large volumes of varied data, collected over a long period of time, that are used to draw general and specific inferences, for example about the spread of disease, customer preferences, or individual behaviours.