Ch 1 Big data Flashcards
big data
analysis, processing, and storage of large collections of
data that frequently originate from disparate sources.
datasets
Collections or groups of related data are generally referred to as datasets.
data analysis and its goal
Data analysis is the process of examining data to find facts, relationships, patterns,
insights and/or trends. The overall goal of data analysis is to support better decision-making.
Data analytics
Data analytics is a
discipline that includes the management of the complete data lifecycle, which
encompasses collecting, cleansing, organizing, storing, analyzing and governing data.
four general categories of analytics
• descriptive analytics
• diagnostic analytics
• predictive analytics
• prescriptive analytics
Descriptive Analytics
Descriptive analytics are carried out to answer questions about events that have already
occurred. This form of analytics contextualizes data to generate information.
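As a small illustration of contextualizing raw data into information, the sketch below totals past sales by month. The records, field layout, and the helper name total_by_month are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical sales records as (month, amount) pairs. Illustrative data only.
sales = [
    ("2024-01", 120.0),
    ("2024-01", 80.0),
    ("2024-02", 200.0),
    ("2024-03", 50.0),
    ("2024-03", 25.0),
]


def total_by_month(records):
    """Sum the sale amounts for each month (a summary of events that already occurred)."""
    totals = defaultdict(float)
    for month, amount in records:
        totals[month] += amount
    return dict(totals)


monthly_totals = total_by_month(sales)
print(monthly_totals)
```

The raw rows are data; the per-month totals are the kind of information a descriptive question such as "What was the sales volume over the past 12 months?" asks for.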
Diagnostic Analytics
Diagnostic analytics aim to determine the cause of a phenomenon that occurred in the past
using questions that focus on the reason behind the event. The goal of this type of
analytics is to determine what information is related to the phenomenon in order to enable
answering questions that seek to determine why something has occurred.
Predictive Analytics
Predictive analytics are carried out in an attempt to determine the outcome of an event that
might occur in the future.
Prescriptive Analytics
Prescriptive analytics build upon the results of predictive analytics by prescribing actions
that should be taken.
Business Intelligence (BI)
BI enables an organization to gain insight into the performance of an enterprise by
analyzing data generated by its business processes and information systems.
Key Performance Indicators (KPI)
A KPI is a metric that can be used to gauge success within a particular business context.
KPIs are linked with an enterprise’s overall strategic goals and objectives.
Big Data Characteristics
- volume
- velocity
- variety
- veracity
- value
Velocity
From an enterprise’s point of view, the
velocity of data translates into the amount of time it takes for the data to be processed once
it enters the enterprise’s perimeter.
Variety
Data variety refers to the multiple formats and types of data that need to be supported by
Big Data solutions.
Veracity
Veracity refers to the quality or fidelity of data. Data that enters Big Data environments
needs to be assessed for quality, which can lead to data processing activities to resolve
invalid data and remove noise.
Noise in data
Noise is data that cannot be converted into information and thus has no value.
signals in data
Signals are data that have value and lead to meaningful information.
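One way to picture the split between noise and signals is a simple validity filter. This is only a sketch: the readings, the plausible-range rule, and the name is_signal are assumptions made for the example, not a rule from the source.

```python
# Hypothetical temperature readings in degrees Celsius; None marks a failed read.
readings = [21.5, None, 22.1, -999.0, 21.9, 1000.0]


def is_signal(value, low=-50.0, high=60.0):
    """Treat a reading as signal only if it is present and inside a plausible range.

    The range is an assumption chosen for this illustration.
    """
    return value is not None and low <= value <= high


# Readings that can be turned into information.
signals = [r for r in readings if is_signal(r)]
# Readings that cannot, and would be removed during data cleansing.
noise = [r for r in readings if not is_signal(r)]

print(signals)
print(noise)
```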
Value, and its dependencies
Value is defined as the usefulness of data for an enterprise. The value characteristic is
intuitively related to the veracity characteristic in that the higher the data fidelity, the more
value it holds for the business. Value is also dependent on how long data processing takes
because analytics results have a shelf-life.
Human-generated data
Human-generated data is the result of human interaction with systems, such as online
services and digital devices.
Machine-generated data
Machine-generated data is generated by software programs and hardware devices in
response to real-world events.
Structured Data
Structured data conforms to a data model or schema and is often stored in tabular form, typically in a relational database.
Unstructured Data
Data that does not conform to a data model or data schema is known as unstructured data.
It is estimated that unstructured data makes up 80% of the data within any given
enterprise.
Semi-structured Data
Semi-structured data has a defined level of structure and consistency, but is not relational
in nature. Instead, semi-structured data is hierarchical or graph-based.
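To make "hierarchical, not relational" concrete, the sketch below parses a nested JSON record and flattens it into table-like rows. The record contents and the helper flatten_orders are hypothetical.

```python
import json

# A hypothetical customer record in JSON: nested and variable-length,
# so it does not fit a single flat table row as-is.
raw = '{"id": 7, "name": "Ana", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]}'

record = json.loads(raw)


def flatten_orders(customer):
    """Turn one hierarchical record into flat (customer_id, sku, qty) rows."""
    return [(customer["id"], order["sku"], order["qty"]) for order in customer["orders"]]


rows = flatten_orders(record)
print(rows)
```

The JSON text has a consistent structure (keys, nesting), but only after flattening does it resemble the tabular form associated with structured data.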
Metadata
Metadata provides information about a dataset’s characteristics and structure. This type of
data is mostly machine-generated and can be appended to data.
The first and most important step in any data analysis project is
The first and most important step in any data analysis project is to establish a clear goal: not a goal
defined only by the data or the method, but one that makes sense to the business as a whole.
Descriptive analysis
technique that allows you to view and measure your company and
customer characteristics.
Customer Profile
snapshot of exactly who is buying your products or
services.
Market penetration analysis and wallet share analysis
are techniques for measuring the
performance of your customer base in comparison with the performance of the overall market for your industry
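The card does not give formulas, so the sketch below uses common working definitions as assumptions: market penetration as your customers divided by the total addressable market, and wallet share as a customer's spend with you divided by their total spend in the category. Function names and figures are illustrative.

```python
def market_penetration(own_customers, market_size):
    """Fraction of the addressable market that already buys from you (assumed definition)."""
    if market_size <= 0:
        raise ValueError("market_size must be positive")
    return own_customers / market_size


def wallet_share(spend_with_us, total_category_spend):
    """Fraction of a customer's category spending that goes to you (assumed definition)."""
    if total_category_spend <= 0:
        raise ValueError("total_category_spend must be positive")
    return spend_with_us / total_category_spend


# Hypothetical figures: 2,500 customers in a market of 50,000 potential buyers,
# and a customer who spends 300 with us out of 1,200 in the category.
penetration = market_penetration(2500, 50000)
share = wallet_share(300.0, 1200.0)
print(f"penetration={penetration:.1%}, wallet share={share:.1%}")
```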
response model
typically the first type of target model that a company seeks to develop.
win-back model
A win-back model is used to invite former customers to reconsider their relationship to the business.
activation model
An activation model predicts whether a prospect will become a customer.
revenue model
predicts the dollar amount of an expected sale
usage model
predicts the amount of use given to a product or service
cross-sell model
A cross-sell model predicts the probability or value of a current customer buying a different product or service from the same company.
up-sell model
An up-sell model predicts the probability or
value of a customer buying more of the same product or service.
Among three drugs, which one provides the best results?
Prescriptive Analytics
When is the best time to trade a particular stock?
Prescriptive Analytics
What are the chances that a customer will default on a loan if they have missed a
monthly payment?
Predictive Analytics
What will be the patient survival rate if Drug B is administered instead of Drug A?
Predictive Analytics
If a customer has purchased Products A and B, what are the chances that they will
also purchase Product C?
Predictive Analytics
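A question like the Product C one above can be approximated from purchase history as a conditional frequency. This is a minimal sketch with made-up baskets and a hypothetical helper purchase_probability; a real predictive model would typically use far more data and features.

```python
# Hypothetical purchase histories: each set is one customer's products.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B", "C", "D"},
]


def purchase_probability(histories, given, target):
    """Estimate P(target | given) as the share of customers who bought `target`
    among those who bought every product in `given`.

    Returns None when no customer matches the condition.
    """
    given = set(given)
    matching = [h for h in histories if given <= h]
    if not matching:
        return None
    return sum(target in h for h in matching) / len(matching)


p_c_given_ab = purchase_probability(baskets, {"A", "B"}, "C")
print(p_c_given_ab)
```

With this sample data, four customers bought both A and B and three of them also bought C, so the estimate is 0.75.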
Why were Q2 sales less than Q1 sales?
Diagnostic Analytics
Why have there been more support calls originating from the Eastern region than
from the Western region?
Diagnostic Analytics
Why was there an increase in patient re-admission rates over the past three months?
Diagnostic Analytics
What was the sales volume over the past 12 months?
Descriptive Analytics
What is the number of support calls received as categorized by severity and
geographic location?
Descriptive Analytics
What is the monthly commission earned by each sales agent?
Descriptive Analytics