Ch 1 Big data Flashcards
big data
Big Data is a field dedicated to the analysis, processing, and storage of large
collections of data that frequently originate from disparate sources.
datasets
Collections or groups of related data are generally referred to as datasets.
data analysis and its goal
Data analysis is the process of examining data to find facts, relationships, patterns,
insights and/or trends. The overall goal of data analysis is to support better decision-making.
Data analytics
Data analytics is a discipline that includes the management of the complete data
lifecycle, which encompasses collecting, cleansing, organizing, storing, analyzing
and governing data.
four general categories of analytics
• descriptive analytics
• diagnostic analytics
• predictive analytics
• prescriptive analytics
Descriptive Analytics
Descriptive analytics are carried out to answer questions about events that have already
occurred. This form of analytics contextualizes data to generate information.
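As a rough illustration of how descriptive analytics turns raw records of past events into information, the sketch below totals sales by month. The records, the `sales` list and the `total_by_month` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical records of events that have already occurred: (month, amount).
sales = [("Jan", 120.0), ("Jan", 80.0), ("Feb", 150.0)]

def total_by_month(records):
    """Sum the sale amounts for each month."""
    totals = defaultdict(float)
    for month, amount in records:
        totals[month] += amount
    return dict(totals)

# Answers a question about the past: how much was sold in each month?
monthly_totals = total_by_month(sales)
print(monthly_totals)
```

The raw rows are data; the per-month totals are the contextualized information.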
Diagnostic Analytics
Diagnostic analytics aim to determine the cause of a phenomenon that occurred in the past
using questions that focus on the reason behind the event. The goal of this type of
analytics is to determine what information is related to the phenomenon in order to enable
answering questions that seek to determine why something has occurred.
Predictive Analytics
Predictive analytics are carried out in an attempt to determine the outcome of an event that
might occur in the future.
Prescriptive Analytics
Prescriptive analytics build upon the results of predictive analytics by prescribing actions
that should be taken.
Business Intelligence (BI)
BI enables an organization to gain insight into the performance of an enterprise by
analyzing data generated by its business processes and information systems.
Key Performance Indicators (KPI)
A KPI is a metric that can be used to gauge success within a particular business context.
KPIs are linked with an enterprise’s overall strategic goals and objectives.
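A minimal sketch of gauging a KPI against a target, assuming each KPI is a single number compared with a goal value. The `kpi_status` function, its parameters and the sample figures are hypothetical.

```python
def kpi_status(actual: float, target: float, higher_is_better: bool = True) -> str:
    """Report whether a measured KPI value meets its target."""
    if higher_is_better:
        met = actual >= target
    else:
        met = actual <= target
    return "on target" if met else "off target"

# Hypothetical KPI where a higher value is better (customer retention, in percent).
print(kpi_status(actual=95.0, target=90.0))

# Hypothetical KPI where a lower value is better (average delivery time, in days).
print(kpi_status(actual=4.2, target=3.0, higher_is_better=False))
```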
Big Data Characteristics
- volume
- velocity
- variety
- veracity
- value
Velocity
From an enterprise’s point of view, the velocity of data translates into the amount
of time it takes for the data to be processed once it enters the enterprise’s perimeter.
Variety
Data variety refers to the multiple formats and types of data that need to be supported by
Big Data solutions.
Veracity
Veracity refers to the quality or fidelity of data. Data that enters Big Data environments
needs to be assessed for quality, which can lead to data processing activities to resolve
invalid data and remove noise.
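One simple way to picture a data quality step is a filter that drops missing and out-of-range values. This is only a sketch under assumed rules: the `clean_readings` function, the valid range and the sample readings are hypothetical, and real quality checks depend on the dataset.

```python
from typing import Optional

def clean_readings(raw: list[Optional[float]], low: float, high: float) -> list[float]:
    """Keep only readings that are present and fall within [low, high]."""
    return [r for r in raw if r is not None and low <= r <= high]

# Hypothetical temperature readings with a missing value and a sentinel error code.
readings = [21.5, None, 22.0, -999.0, 21.8]
cleaned = clean_readings(readings, low=-50.0, high=60.0)
print(cleaned)
```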
Noise in data
Noise is data that cannot be converted into information and thus has no value.
signals in data
Signals are data that have value and lead to meaningful information.
Value, and its dependencies
Value is defined as the usefulness of data for an enterprise. The value characteristic is
intuitively related to the veracity characteristic in that the higher the data fidelity, the more
value it holds for the business. Value is also dependent on how long data processing takes
because analytics results have a shelf-life.