Chapter 3 Flashcards
Algorithm
We’ll use machine learning to automatically classify email as either spam or legitimate email as described by Paul Graham. In order to do so, we’ll need to choose an algorithm, or a set of procedures used to solve a mathematical problem, that best fits our situation.
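As a rough illustration of what choosing an algorithm can look like, here is a minimal sketch of a naive Bayes word-count classifier, one common way to approach spam filtering. The training messages, the `train` and `classify` function names, and the "spam"/"ham" labels are all assumptions made up for this example. It is not the exact method from the chapter or from Paul Graham's essay.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label. messages is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the higher naive Bayes log score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: share of training messages carrying this label
        # (assumes each label appears at least once in the training data).
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, hand-made training data for illustration only.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train(training)
print(classify("win a prize", word_counts, label_counts))
print(classify("monday meeting", word_counts, label_counts))
```

A real filter would need far more training data and better tokenization; the point here is only that the algorithm is a fixed procedure applied to counted evidence.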
Artificial intelligence (AI)
the ability of a machine to simulate human abilities such as vision, communication, recognition, learning, and decision making in order to achieve a goal.
Automation
Organizations hope to use AI to increase the automation, or the process of making systems operate without human intervention, of mundane tasks typically done by humans.
BI analysis
the process of creating business intelligence. The three fundamental categories of BI analysis are reporting, data mining, and Big Data.
BI application
The software component of a BI system is called a BI application.
Big Data
A term used to describe data collections that are characterized by huge volume, rapid velocity, and great variety.
BI server
is a Web server application that is purpose-built for the publishing of business intelligence. The Microsoft SQL Server Report Manager is the most popular such product today.
BI servers provide two major functions: management and delivery.
Business intelligence (BI)
patterns, relationships, trends, and predictions are referred to as business intelligence. As information systems, BI systems have the five standard components: hardware, software, data, procedures, and people.
Business intelligence systems
are information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals and other knowledge workers.
Cluster analysis
Unsupervised data mining using statistical techniques to identify groups of entities that have similar characteristics. A common use for cluster analysis is to find groups of similar customers in data about customer orders and customer demographics.
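To make the idea of finding groups of similar customers concrete, here is a minimal sketch of k-means, one widely used clustering technique. The card does not name a specific technique, so k-means is an assumption, as are the `kmeans` function, the customer values, and the choice to seed centroids with the first k points.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point
    to its nearest centroid and then recomputing the centroids."""
    # Assumption: seed centroids with the first k points (keeps the sketch deterministic).
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical customers: (orders per year, average order value in dollars).
customers = [(2, 20.0), (40, 210.0), (3, 25.0), (42, 190.0), (1, 15.0), (38, 200.0)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, sorted(members))
```

No labels are supplied anywhere, which is what makes this unsupervised: the groups emerge from the data alone. In practice the features would usually be scaled first so that one measure does not dominate the distance.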
Content management systems (CMS)
Information systems that support the management and delivery of documentation including reports, Web pages, and other expressions of employee knowledge.
Continuous intelligence
uses machine learning to analyze real-time data and automatically make business decisions. Businesses can use continuous intelligence to make better decisions because they can evaluate all possible alternatives and apply business rules in a fraction of a second. Transportation, shipping, retail, accommodation, and manufacturing companies would all gain significant competitive advantages if they were able to automate decision making based on real-time data.
Corpus of knowledge
a large set of related data and texts.
Data acquisition
In business intelligence systems, the process of obtaining, cleaning, organizing, relating, and cataloging source data.
Data aggregator
A company that gathers and sells information from multiple sources. Data purchased from a data aggregator may not be compatible with internal operational data.
Data discovery
Processes that allow users to visually analyze and explore data in a user-friendly way.
Data lake
is a central repository for large amounts of raw unstructured data.
How are data lakes and warehouses different?
Data lakes can contain more types of data than a data warehouse, and they can store them in their raw, unstructured forms. Data lakes can also store real-time data from smart devices, websites, and mobile applications. Data lakes are useful for storing large amounts of data to be later used by data scientists in machine learning and deep learning (discussed later in this chapter). Analysis of data from data lakes can provide new insights that can’t be found in traditional data warehouses, which typically focus on reporting, trends, and answering operational questions.
Data lakes also have their own set of unique problems. If data in a data lake are not managed and cataloged correctly, data may become inadvertently hidden over time. A company’s data lake may become a data swamp.
Data swamp
stores large amounts of data that may never be used.
Data mart
A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business.
Data mining
the application of statistical techniques to find patterns and relationships among data for classification and prediction. As shown in Figure 3-18, data mining resulted from a convergence of disciplines, including artificial intelligence and machine learning.
Data mining techniques fall into two broad categories: unsupervised and supervised. We explain both types in the following sections.
Data visualization
The graphical representation of data, which allows users to quickly understand complex data. Data discovery tools, like data visualization, are increasing in popularity because of their usefulness. However, data discovery tools may miss meaningful patterns or correlations that would be found by data mining techniques.
Data warehouse
A facility for managing an organization’s BI data. Larger organizations typically create and staff a group of people who manage and run a data warehouse. The functions of a data warehouse are to:
Obtain data
Cleanse data
Organize and relate data
Catalog data
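The four functions above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the function names, the sample order and customer rows, and the shape of the catalog entry are all hypothetical, and a real data warehouse would use database and ETL tooling, not in-memory lists.

```python
def obtain():
    """Obtain data: hypothetical raw rows from two source systems."""
    orders = [
        {"customer_id": "C1", "amount": "120.50"},
        {"customer_id": "C2", "amount": ""},  # missing value
        {"customer_id": "C1", "amount": "30.00"},
    ]
    customers = [
        {"customer_id": "C1", "region": " west "},
        {"customer_id": "C2", "region": "East"},
    ]
    return orders, customers

def cleanse(orders, customers):
    """Cleanse data: drop orders with no amount, normalize types and text."""
    clean_orders = [
        {"customer_id": o["customer_id"], "amount": float(o["amount"])}
        for o in orders
        if o["amount"]
    ]
    clean_customers = [
        {"customer_id": c["customer_id"], "region": c["region"].strip().title()}
        for c in customers
    ]
    return clean_orders, clean_customers

def organize_and_relate(orders, customers):
    """Organize and relate data: attach each order to its customer's region."""
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    return [{**o, "region": region_by_id[o["customer_id"]]} for o in orders]

def catalog(rows, source):
    """Catalog data: record metadata describing what was stored."""
    return {"source": source, "row_count": len(rows), "columns": sorted(rows[0])}

raw_orders, raw_customers = obtain()
clean_orders, clean_customers = cleanse(raw_orders, raw_customers)
related = organize_and_relate(clean_orders, clean_customers)
metadata = catalog(related, source="orders+customers")
print(related)
print(metadata)
```

The catalog step is the one that keeps stored data findable later, because it records where each data set came from and what it contains.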