Chapter 3 Flashcards

1
Q

Algorithm

A

A set of procedures used to solve a mathematical problem. To use machine learning to automatically classify email as either spam or legitimate email, as described by Paul Graham, we need to choose the algorithm that best fits our situation.

2
Q

Artificial intelligence (AI)

A

the ability of a machine to simulate human abilities such as vision, communication, recognition, learning, and decision making in order to achieve a goal.

3
Q

Automation

A

Organizations hope to use AI to increase the automation, or the process of making systems operate without human intervention, of mundane tasks typically done by humans.

4
Q

BI analysis

A

the process of creating business intelligence. The three fundamental categories of BI analysis are reporting, data mining, and Big Data.

5
Q

BI application

A

The software component of a BI system is called a BI application.

6
Q

Big Data

A

A term used to describe data collections that are characterized by huge volume, rapid velocity, and great variety.

7
Q

BI server

A

A Web server application that is purpose-built for publishing business intelligence. The Microsoft SQL Server Report Manager is the most popular such product today.

BI servers provide two major functions: management and delivery.

8
Q

Business intelligence (BI)

A

Patterns, relationships, trends, and predictions are referred to as business intelligence. As information systems, BI systems have the five standard components: hardware, software, data, procedures, and people.

9
Q

Business intelligence systems

A

are information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals and other knowledge workers.

10
Q

Cluster analysis

A

Unsupervised data mining using statistical techniques to identify groups of entities that have similar characteristics. A common use for cluster analysis is to find groups of similar customers in data about customer orders and customer demographics.
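
The card does not name a specific technique, so the following is only a minimal sketch of one common clustering method, k-means, applied to made-up customer data. The customer tuples (age, annual spend), the starting centroids, and the function name `k_means` are all assumptions for illustration.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of its group."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

# Hypothetical customers: (age, annual spend in dollars)
customers = [(22, 300), (25, 350), (27, 320), (58, 1500), (61, 1650), (64, 1400)]
centroids, groups = k_means(customers, centroids=[(20, 300), (60, 1500)])
print(groups[0])  # the younger, lower-spending group
print(groups[1])  # the older, higher-spending group
```

No model of "customer types" is supplied in advance; the groups emerge from the data. In practice the features would usually be rescaled first so that spend does not dominate the distance.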

11
Q

Content management systems (CMS)

A

Information systems that support the management and delivery of documentation including reports, Web pages, and other expressions of employee knowledge.

12
Q

Continuous intelligence

A

uses machine learning to analyze real-time data and automatically make business decisions. Businesses can use continuous intelligence to make better decisions because they can evaluate all possible alternatives and apply business rules in a fraction of a second. Transportation, shipping, retail, accommodation, and manufacturing companies would all gain significant competitive advantages if they were able to automate decision making based on real-time data.

13
Q

Corpus of knowledge

A

a large set of related data and texts.

14
Q

Data acquisition

A

In business intelligence systems, the process of obtaining, cleaning, organizing, relating, and cataloging source data.

15
Q

Data aggregator

A

A company that gathers and sells information from multiple sources. Data purchased from a data aggregator may not be compatible with internal operational data.

16
Q

Data discovery

A

Processes that allow users to visually analyze and explore data in a user-friendly way.

17
Q

Data lake

A

is a central repository for large amounts of raw unstructured data.

18
Q

How are data lakes and warehouses different?

A

Data lakes can contain more types of data than a data warehouse, and they can store them in their raw unstructured forms. Data lakes can also store real-time data from smart devices, websites, and mobile applications. Data lakes are useful for storing large amounts of data to be later used by data scientists in machine learning and deep learning (discussed later in this chapter). Analysis of data from data lakes can provide new insights that can’t be found in data warehouses, which are traditionally focused on reporting, trends, and answering operational questions.

Data lakes also have their own set of unique problems. If data in a data lake are not managed and cataloged correctly, data may become inadvertently hidden over time. A company’s data lake may become a data swamp.

19
Q

Data swamp

A

A poorly managed data lake that stores large amounts of data that may never be used.

20
Q

Data mart

A

A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business.

21
Q

Data mining

A

the application of statistical techniques to find patterns and relationships among data for classification and prediction. As shown in Figure 3-18, data mining resulted from a convergence of disciplines, including artificial intelligence and machine learning.

Data mining techniques fall into two broad categories: unsupervised and supervised. We explain both types in the following sections.

22
Q

Data visualization

A

The graphical representation of data, which allows users to quickly understand complex data. Data discovery tools, like data visualization, are increasing in popularity because of their usefulness. However, data discovery tools may miss meaningful patterns or correlations that would be found by data mining techniques.

23
Q

Data warehouse

A

Larger organizations, however, typically create and staff a group of people who manage and run a data warehouse, which is a facility for managing an organization’s BI data. The functions of a data warehouse are to:

Obtain data
Cleanse data
Organize and relate data
Catalog data

24
Q

Decision support systems

A

Some authors define BI systems as supporting decision making only, in which case they use the older term decision support systems as a synonym for decision-making BI systems.

25
Q

Deep learning

A

A method for simulating multiple layers of neural networks rather than just a single layer.

26
Q

Dimension

A

is a characteristic of a measure. Purchase date, customer type, customer location, and sales region are all examples of dimensions.

27
Q

Drill down

A

With an OLAP report, it is possible to drill down into the data. This term means to further divide the data into more detail. In Figure 3-17, for example, the user has drilled down into the stores located in California; the OLAP report now shows sales data for the four cities in California that have stores.
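
As a rough sketch of what drilling down means, the snippet below totals a measure (sales amount) by one dimension (state) and then divides the California total further by a second dimension (city). The rows, cities, and amounts are invented for illustration and are not the data behind Figure 3-17; `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical sales rows: "amount" is the measure; "state" and "city" are dimensions.
sales = [
    {"state": "California", "city": "Sacramento", "amount": 120},
    {"state": "California", "city": "San Diego", "amount": 200},
    {"state": "California", "city": "San Diego", "amount": 50},
    {"state": "Nevada", "city": "Reno", "amount": 80},
]

def summarize(rows, dimension, where=None):
    """Sum the measure for each value of `dimension`, optionally
    restricted to rows that match every key/value pair in `where`."""
    totals = defaultdict(int)
    for row in rows:
        if where is None or all(row[k] == v for k, v in where.items()):
            totals[row[dimension]] += row["amount"]
    return dict(totals)

by_state = summarize(sales, "state")
california_by_city = summarize(sales, "city", where={"state": "California"})
print(by_state)            # coarse view: one total per state
print(california_by_city)  # drilled-down view: California split by city
```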

28
Q

Dynamic reports

A

BI documents that are updated at the time they are requested. A sales report that is current at the time the user accessed it on a Web server is a dynamic report. In almost all cases, publishing a dynamic report requires the BI application to access a database or other data source at the time the report is delivered to the user.

29
Q

Exception reports

A

Another type of report, exception reports are produced when something out of predefined bounds occurs. For example, a hospital might want an exception report showing which doctors are prescribing more than twice the amount of pain medication prescribed by the average doctor. This could help the hospital reduce the potential for patient addiction to pain medications.
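
The hospital example can be sketched in a few lines. The doctor names and prescription counts are made up, and `exception_report` is a hypothetical helper; the only rule taken from the card is "more than twice the average".

```python
from statistics import mean

# Hypothetical monthly pain-medication prescription counts per doctor.
prescriptions = {"Dr. A": 40, "Dr. B": 35, "Dr. C": 130, "Dr. D": 45}

def exception_report(counts, factor=2.0):
    """Return only the entries that exceed `factor` times the average."""
    threshold = factor * mean(counts.values())
    return {name: n for name, n in counts.items() if n > threshold}

flagged = exception_report(prescriptions)
print(flagged)
```

The report stays empty when every doctor falls within bounds, which is the point of an exception report: it surfaces only the cases that need attention.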

30
Q

Granularity

A

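Data can also have the wrong granularity, a term that refers to the level of detail represented by the data. Granularity can be too fine or too coarse. Better to be too fine: data that is too fine can be summed and combined into coarser totals, but data that is too coarse cannot be separated back into its constituent parts.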

31
Q

Hadoop

A

An open source program supported by the Apache Foundation that implements MapReduce on potentially thousands of computers. Hadoop could drive the process of finding and counting the Google search terms, but Google uses its own proprietary version of MapReduce to do so instead. Some companies implement Hadoop on server farms they manage themselves, and others, as you’ll read more about in Chapter 6, run Hadoop in the cloud.

32
Q

Knowledge management (KM)

A

The process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need that capital. The goal of knowledge management is to prevent situations in which one employee does not know how to do something while another employee does.

33
Q

Machine learning

A

A subset of AI: the extraction of knowledge from data based on algorithms created from training data. Essentially, machine learning is focused on predicting outcomes based on previously known training data.

34
Q

MapReduce

A

Because Big Data is huge, fast, and varied, it cannot be processed using traditional techniques. MapReduce is a technique for harnessing the power of thousands of computers working in parallel to analyze different parts of Big Data.
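
A toy version of the idea, assuming the task is counting search terms: each log segment is counted independently (the map phase), and the partial counts are then combined (the reduce phase). This sketch runs in a single process; a real MapReduce system would hand each segment to a different computer. The segments and function names are invented for illustration.

```python
from collections import Counter
from functools import reduce

def map_phase(log_segment):
    """Count the terms in one segment of the log, independently of the others."""
    return Counter(term.lower() for term in log_segment)

def reduce_phase(partial_counts):
    """Combine the per-segment counts into one overall result."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())

# Hypothetical search-log segments, one per worker machine.
segments = [["Hadoop", "BI", "hadoop"], ["bi", "OLAP"], ["hadoop"]]
totals = reduce_phase(map_phase(segment) for segment in segments)
print(totals.most_common())
```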

35
Q

Master data management

A

Data acquired from different sources is made uniform and consistent through a process called master data management. Master data management is necessary because data from one source may not be consistently formatted with data from another source.

36
Q

Measure

A

An OLAP report has measures and dimensions. A measure is the data item of interest. It is the item that is to be summed or averaged or otherwise processed in the OLAP report. Total sales, average sales, and average cost are examples of measures.

37
Q

Naïve Bayes Classifier

A

An algorithm that predicts the probability of a certain outcome based on prior occurrences of related events. In other words, we’re going to try to predict whether a new email is spam or not based on attributes of previous spam messages.
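
A minimal sketch of the idea, not the chapter's implementation: word counts from previously labeled messages are used to score a new message as spam or legitimate ("ham"). The training messages are invented, the add-one (Laplace) smoothing is an assumption the card does not mention, and the function names are hypothetical.

```python
from collections import Counter
from math import log

def train(messages):
    """messages: list of (text, label) pairs, where label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the higher log-probability score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Prior: how common this label was among the training messages.
        score = log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing (an assumption) keeps unseen words from zeroing the score.
            score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train(training)
print(classify("free money", word_counts, label_counts))
print(classify("monday meeting", word_counts, label_counts))
```

The classifier is "naïve" because it treats every word as independent of the others, which is why the per-word log-probabilities are simply added.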

38
Q

Natural language processing (NLP)

A

IBM’s artificial intelligence named Watson is a question answering system that draws on several areas of AI. First, it uses natural language processing (NLP), or the ability of a computer system to understand spoken human language, to answer questions.

39
Q

Neural network

A

a computing system modeled after the human brain that is used to predict values and make classifications.

40
Q

OLAP cube and an OLAP report are the same thing.

A

Example: an Excel sheet of total sales broken out by store location.

Online analytical processing (OLAP), a second type of reporting application, is more generic than RFM. OLAP provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. The defining characteristic of OLAP reports is that they are dynamic. The viewer of the report can change the report’s format, hence the term online.

41
Q

Hadoop includes a query language titled

A

Pig

42
Q

Publish results

A

is the process of delivering business intelligence to the knowledge workers who need it

43
Q

Push publishing

A

delivers business intelligence to users without any request from the users; the BI results are delivered according to a schedule or as a result of an event or particular data condition.

44
Q

Pull publishing

A

requires the user to request BI results. Publishing media include print as well as online content delivered via Web servers, specialized Web servers known as report servers, automated applications, knowledge management systems, and content management systems.

45
Q

Regression analysis

A

A type of supervised data mining that estimates the values of parameters in a linear equation. Used to determine the relative influence of variables on an outcome and also to predict future values of that outcome.
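
For a single input variable, the "parameters in a linear equation" are a slope and an intercept, and they can be estimated with ordinary least squares. The sketch below assumes that simplest case; the ages and weekend minutes are invented numbers chosen to lie on a straight line, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimate of slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: customer age vs. weekend cell phone minutes.
ages = [20, 30, 40, 50]
weekend_minutes = [110, 90, 70, 50]

slope, intercept = fit_line(ages, weekend_minutes)
predicted = intercept + slope * 35  # predicted minutes for a 35-year-old
print(slope, intercept, predicted)
```

The sign and size of the slope show the influence of the variable on the outcome, and the fitted equation can then be used to predict values for new customers.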

46
Q

Reporting analysis

A

the process of sorting, grouping, summing, filtering, and formatting structured data.

47
Q

Reporting application

A

a BI application that inputs data from one or more sources and applies reporting processes to that data to produce business intelligence.

48
Q

RFM analysis

A

A technique, readily implemented with basic reporting operations, that is used to analyze and rank customers according to their purchasing patterns. RFM considers how recently (R) a customer has ordered, how frequently (F) a customer ordered, and how much money (M) the customer has spent.
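
One way to turn R, F, and M into scores, sketched under an assumption the card does not state: customers are sorted on each attribute, split into five equal groups, and scored 1 (best fifth) through 5 (worst fifth). The customer names, figures, and helper names are hypothetical.

```python
def quintile_scores(values, best_is_low):
    """Map each customer to a score from 1 (best fifth) to 5 (worst fifth)."""
    ordered = sorted(values, key=values.get, reverse=not best_is_low)
    return {name: 1 + (rank * 5) // len(ordered) for rank, name in enumerate(ordered)}

def rfm(customers):
    """Return an (R, F, M) score tuple for every customer."""
    r = quintile_scores({n: c["days_since_order"] for n, c in customers.items()}, best_is_low=True)
    f = quintile_scores({n: c["orders"] for n, c in customers.items()}, best_is_low=False)
    m = quintile_scores({n: c["spend"] for n, c in customers.items()}, best_is_low=False)
    return {n: (r[n], f[n], m[n]) for n in customers}

# Hypothetical customers.
customers = {
    "Ajax":      {"days_since_order": 3,   "orders": 40, "spend": 900},
    "Bloom":     {"days_since_order": 10,  "orders": 25, "spend": 5000},
    "Caruthers": {"days_since_order": 200, "orders": 30, "spend": 300},
    "Davidson":  {"days_since_order": 400, "orders": 2,  "spend": 100},
    "Evans":     {"days_since_order": 30,  "orders": 10, "spend": 2000},
}
scores = rfm(customers)
print(scores["Ajax"])      # ordered recently and often, mid-level spend
print(scores["Davidson"])  # weakest on all three attributes
```

Only sorting and grouping are needed, which is why the card describes RFM as readily implemented with basic reporting operations.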

49
Q

Static reports

A

are BI documents that are fixed at the time of creation and do not change. A printed sales analysis is an example of a static report. In the BI context, most static reports are published as PDF documents.

50
Q

Strong AI or artificial general intelligence

A

AI that can complete all of the same tasks a human can. This includes the ability to process natural language; to sense, learn, and interact with the physical world; to represent knowledge; to reason; and to plan. Most AI researchers believe we will have strong AI capabilities sometime around 2040.

51
Q

Structured data

A

data in the form of rows and columns. Most of the time structured data means tables in a relational database, but it can refer to spreadsheet data as well.

52
Q

Subscriptions

A

A BI server extends alert/RSS functionality to support user subscriptions, which are user requests for particular BI results on a particular schedule or in response to particular events. For example, a user can subscribe to a daily sales report, requesting that it be delivered each morning.

53
Q

Super intelligence

A

AI capable of intelligence more advanced than human intelligence. Some researchers see superintelligence as a potential threat to humans. Others disagree and argue that this level of AI is hundreds of years away.

54
Q

Supervised data mining

A

With supervised data mining, data miners develop a model prior to the analysis and apply statistical techniques to data to estimate parameters of the model. For example, suppose marketing experts in a communications company believe that cell phone usage on weekends is determined by the age of the customer and the number of months the customer has had the cell phone account. A data mining analyst would then run an analysis that estimates the effect of customer and account age.

55
Q

The singularity

A

Ray Kurzweil developed a concept he calls the Singularity, which is the point at which an AI becomes sophisticated enough that it can adapt and create its own software and, hence, adapt its behavior without human assistance. Apply this idea to unsupervised data mining. What happens when machines can direct their own data mining activities? There will be an accelerating positive feedback loop among AIs. A single AI will have more processing power than all possible human cognitive power combined. We may even have the technology to merge human intelligence with AIs and gain knowledge that we could never have comprehended before. Kurzweil predicts this could happen by 2045.

56
Q

Turing test

A

An early computer scientist named Alan Turing said a machine could be considered intelligent if a human could have a conversation with it and not be able to tell if it was a machine or a human.

57
Q

Unsupervised data mining

A

Analysts do not create a model or hypothesis before running the analysis. Instead, they apply a data mining application to the data and observe the results. With this method, analysts create hypotheses after the analysis, in order to explain the patterns found.

58
Q

Weak AI

A

Currently, we have weak AI that is focused on completing a single specific task.