wk 8 - business intelligence and big data Flashcards

Question 1

Q

Business intelligence (def)

Answer

A

Business Intelligence (BI) is the process of collecting, cleansing, combining, consolidating, analysing, interpreting and communicating internal and available external data, relevant for the decision making process in the organisation

Question 2

Q

why the interest in business data?

Answer

A

Availability of Data

Amount of digital data growing exponentially
In their 2010 report, Gartner defined a small data warehouse as less than 5 terabytes (TB), a medium data warehouse as 5 TB to 20 TB, and a large data warehouse as greater than 20 TB

Question 3

Q

What’s feeding massive data increase?

Answer

A

Digital channel data, content posted on social media, data collected by smart, connected devices (cf. the Internet of Things), real-time sensor readings, supply chain technologies such as RFID, etc.

Other data is available through “open data” initiatives (e.g. public services making data freely available), Google Maps, real-time data feeds (e.g. stock exchanges), etc.

Question 4

Q

Business trends that drive Business intelligence

Answer

A

Implementation of corporate performance management systems
Compliance with new regulatory frameworks
Importance of Customer Relationship Management (CRM) and one-to-one marketing
Trends such as market globalisation, company mergers, etc.
Digital business, digital marketing, social media, etc.
Idea of the data-driven organisation that competes on analytics

Question 5

Q

Corporate performance management

Answer

A

Also referred to as enterprise performance management (EPM), business performance management (BPM), strategic performance management (SPM), or simply performance management

Refers to a set of management processes, often supported by technology, that involve measuring and monitoring performance in support of better strategic decision making

“Trying to improve something without having a goal, a numerical goal, is like trying to lose weight without having a scale.” (Subir Chowdhury, ‘The Power of Six Sigma’)
Examples: balanced scorecard, six sigma, etc.

Question 6

Q

Customer relationship management

Answer

A

Customer Relationship Management (CRM) systems: designed to help firms manage customer interactions and maximise the customer lifetime value for the firm
“It is 6-7 times more expensive to gain a new customer than retain an existing customer.” (Bain & Co. study, HBR)
Operational CRM: systems supporting customer-facing processes (e.g. sales lead management, call centre & customer service support, etc.)
Analytical CRM: analysis of customer data to provide insights or models to optimise aspects of our customer relationships (e.g. which customer segments to target with retention campaign, cross-selling opportunities, etc.)
One of the key difficulties: having a “single customer view”

Question 7

Q

why Data Warehousing?

Answer

A

Relational databases are typically optimised to support everyday operations (e.g. sales transaction processing), not so much analytics activities
Working on same database may slow system down
Data structure not ideally suited
History typically not systematically kept
Different systems for various functional areas of the company (sales, marketing, customer service, production, etc.), data from all of which may be needed for analysis
Different data encodings/formats, inconsistencies, etc. that have to be reconciled first

Question 8

Q

Data warehousing (def)

Answer

A

How to define a data warehouse
“A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” (Inmon)
“A copy of transaction data specifically structured for query and analysis” (Kimball)

“Subject-oriented”
Organised around the major subjects of an enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and order processing)

“Integrated”
Integrates application-oriented data from different source systems, which must be made consistent to present a unified view to the users
Centralised & cross-functional
Common data sources:
Transaction processing systems, relational databases, ERP, etc.
Legacy systems
External data

Question 9

Q

Extraction, Transformation and loading (ETL)

Answer

A

Typically, ETL tools are used to set up and configure an automated system that regularly updates the data warehouse
Extract data from source systems (operational, legacy, etc.)
Identifying which records in the source have changed since the last update
Transform data
Substeps: format → cleanse → aggregate and merge → enrich
Load transformed data into the data warehouse
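
The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory `SOURCE` list stands in for real source systems, a dictionary stands in for the warehouse, and the field names, dates and function names are all hypothetical rather than taken from any ETL tool.

```python
from datetime import date

# Hypothetical source records; a real job would read from operational systems.
SOURCE = [
    {"id": 1, "customer": " alice ", "amount": "10.50", "changed": date(2024, 1, 5)},
    {"id": 2, "customer": "BOB", "amount": "7.00", "changed": date(2024, 1, 9)},
    {"id": 3, "customer": "Alice", "amount": "4.50", "changed": date(2024, 1, 12)},
]


def extract(records, last_update):
    """Extract: keep only records changed since the last warehouse update."""
    return [r for r in records if r["changed"] > last_update]


def transform(records):
    """Transform: format and cleanse fields, then aggregate per customer."""
    totals = {}
    for r in records:
        name = r["customer"].strip().title()           # cleanse inconsistent spellings
        amount = float(r["amount"])                    # format text as a number
        totals[name] = totals.get(name, 0.0) + amount  # aggregate and merge
    return totals


def load(warehouse, totals):
    """Load: add the transformed totals to the (dictionary) warehouse."""
    for name, amount in totals.items():
        warehouse[name] = warehouse.get(name, 0.0) + amount
    return warehouse


# Only the records changed after 6 January are picked up by this run.
warehouse = load({}, transform(extract(SOURCE, date(2024, 1, 6))))
```

The "enrich" substep (adding data from other sources) is left out to keep the sketch short.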

Question 10

Q

Big data

Answer

A

Increasingly data comes from a multitude of different sources, is often unstructured and unintegrated, and there are ever larger amounts of it – hence the term big data

Question 11

Q

Storing (and processing) big data: Hadoop

Answer

A

Hadoop is an open-source framework that has become popular for distributed storage and parallel processing of massive amounts of data

Distributed file system that can spread the data over a large cluster of (inexpensive) machines

MapReduce – programming model that allows for large batch-processing jobs to be divided into smaller tasks that can be run in parallel

MapReduce was originally developed by Google to index the exploding volume of content on the web

Hadoop environment may complement (or replace) a traditional data warehouse
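
The MapReduce idea can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop or any of its APIs; the function names are hypothetical and chosen only to mirror the map, shuffle and reduce stages.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map call only looks at one document, so on a cluster
    # the calls could run in parallel on different machines.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big data big insights"])
```

Because every map task and every reduce task works on its own piece of the data, the job can be split into smaller tasks, which is the property the card describes.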

Question 12

Q

Online analytical processing (OLAP)

Answer

A

Online Analytical Processing (OLAP) is defined as: interactive analysis of large volumes of data from multiple dimensions

Originally applied to data from data warehouse or data mart but now also OLAP-on-Hadoop engines emerging

OLAP tools allow you to interactively break down summary measures such as total unit sales, sales revenue, costs, etc. according to various available grouping criteria

Originally data pre-loaded internally in the form of an OLAP “cube” to facilitate analysis; this pre-processing step now increasingly made obsolete by in-memory analytics solutions

Question 13

Q

OLAP operations: drilling

Answer

A

Drilling: navigating through a dimension hierarchy to desired level of detail

Drill down: go down the hierarchy or introduce extra dimension (i.e. break down in more detail)
Total sales
Total sales per city
Total sales per city per shop

Drill up or roll up: climb up hierarchy or reduce dimensions (i.e. get measure at more aggregate level)

Drill across: within same dimension select another attribute value
After viewing the results for 2011, change the selection to 2012
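
A small sketch of the drilling operations, assuming a made-up list of sales rows and a hypothetical helper `total_sales`; a real OLAP tool would do this interactively against a cube or warehouse.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, city, shop, sales)
SALES = [
    (2011, "Leeds", "Shop A", 100),
    (2011, "Leeds", "Shop B", 150),
    (2011, "York", "Shop C", 80),
    (2012, "Leeds", "Shop A", 120),
    (2012, "York", "Shop C", 90),
]
COLUMNS = {"year": 0, "city": 1, "shop": 2}


def total_sales(rows, by=(), year=None):
    """Sum sales grouped by the dimensions in `by`, optionally for one year."""
    totals = defaultdict(int)
    for row in rows:
        if year is not None and row[0] != year:
            continue
        key = tuple(row[COLUMNS[d]] for d in by)
        totals[key] += row[3]
    return dict(totals)


overall = total_sales(SALES, year=2011)                        # total sales
per_city = total_sales(SALES, by=("city",), year=2011)         # drill down
per_shop = total_sales(SALES, by=("city", "shop"), year=2011)  # drill down further
per_city_2012 = total_sales(SALES, by=("city",), year=2012)    # drill across to 2012
```

Going from `per_shop` back to `per_city` or `overall` is the roll-up direction.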

Question 14

Q

OLAP operations - slicing and dicing

Answer

A

Slicing: take horizontal or vertical cut of cube, i.e. restrict one dimension
Sales data for product X
Sales data for shop A

Dicing: restrict two or more dimensions
Sales data for products X and Y, in shops A and B, during the summer
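
Slicing and dicing both come down to filtering the cube on one or more dimensions. The sketch below assumes a tiny made-up cube stored as a list of records and a hypothetical helper `restrict`.

```python
# Hypothetical cube cells: one record per product, shop and season
CUBE = [
    {"product": "X", "shop": "A", "season": "summer", "sales": 10},
    {"product": "X", "shop": "B", "season": "winter", "sales": 20},
    {"product": "Y", "shop": "A", "season": "summer", "sales": 30},
    {"product": "Y", "shop": "C", "season": "summer", "sales": 40},
    {"product": "Z", "shop": "B", "season": "summer", "sales": 50},
]


def restrict(cells, **allowed):
    """Keep the cells whose value on each named dimension is in the allowed set."""
    return [
        c for c in cells
        if all(c[dim] in values for dim, values in allowed.items())
    ]


# Slice: restrict one dimension (sales data for product X)
slice_x = restrict(CUBE, product={"X"})

# Dice: restrict two or more dimensions
dice = restrict(CUBE, product={"X", "Y"}, shop={"A", "B"}, season={"summer"})
```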

Question 15

Q

limits of OLAP analysis

Answer

A

You could e.g. break down a product’s revenue according to customer demographics but what if there are 10,000s of products to manually investigate?

Also, does not yet produce any prediction for the future!
Example for customer attrition:
Basic reports or dashboards may show that overall churn rate (i.e. the percentage of customers lost) was up in last quarter

OLAP analysis may reveal churn rates were higher for this product group or that segment of customers

Predictive analytics: can we forecast churn rates for the next quarter, or, even better, identify which individual customers have highest probability of churning given the data we have on them?

Question 16

Q

Predictive analytics and data mining

Answer

A

Data mining/analytics: applying computational techniques to find interesting patterns or derive a predictive model

To a large extent computer-driven discovery of (possibly unexpected) patterns or model building, as opposed to OLAP’s user-driven verification of hypothesised patterns

Typically involves more complex analyses at a deeper level of detail than simply aggregating data

Could involve building predictive models, i.e. which predict future events based on past observations (predictive analytics)

example applications

Credit scoring
Detecting fraud
Insurance claims prediction
Stock market forecasting
Market basket analysis (e.g. beer and diapers story)
Online recommender systems
Customer lifetime value (CLV) modelling, churn prediction, etc.
Targeted marketing: response modelling
Market segmentation

Question 17

Q

Types of analytics tasks

Answer

A

Predictive analytics: use past data to learn to predict some variable of interest (target variable) for an individual or instance, from other observable variables (input variables) – three types of tasks:
Classification
Regression or estimation
Forecasting
Descriptive analytics: identify and describe patterns present in the data (no separate target variable!)
Association analysis
Segmentation / clustering

Question 18

Q

predictive analytics - classification

Answer

A

Classification: use input variables to classify subject into one of two or more predefined target classes (e.g. predict whether individual customer will be good or bad payer, will churn or stay, will respond to campaign or not, etc.)

Example applications:
Use income, residential status, credit history, etc. to classify customer as likely “good” or “bad” payer (credit scoring)

Use recency, frequency, and monetary value of previous purchases, service calls made, etc. to classify customer as likely churner or not

Example models used are decision trees, scorecards, etc., but analysts can choose from range of statistical or machine learning techniques
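
As a rough illustration of a scorecard used for classification, the sketch below adds up points for a few input variables and compares the total with a cut-off. The variables, point values and cut-off are invented for the example; in practice they would be derived from historical data on past customers.

```python
# Hypothetical scorecard: the points and the cut-off are made up for illustration.
def credit_score(income, owns_home, missed_payments):
    """Add up points for each input variable, as a simple scorecard would."""
    score = 0
    score += 30 if income >= 30000 else 10
    score += 20 if owns_home else 5
    score += 25 if missed_payments == 0 else 0
    return score


def classify(applicant, cutoff=50):
    """Assign one of two predefined target classes based on the total score."""
    return "good" if credit_score(**applicant) >= cutoff else "bad"


label_a = classify({"income": 45000, "owns_home": True, "missed_payments": 0})
label_b = classify({"income": 18000, "owns_home": False, "missed_payments": 2})
```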

Question 19

Q

predictive analytics - regression

Answer

A

Regression or estimation: predict value of a continuous (numeric) target variable (e.g. profit in GBP, loss, etc.)

Example applications:
Predict volume of future spending of customer, e.g. based on recency, frequency, and monetary value of previous purchases

Predict size of loss if customer defaults on loan based on customer characteristics, type and value of collateral, etc.

Again first collect sample from previous time period, build model for that sample, then apply to current customers

Example technique: linear regression (provides estimate for target variable using weighted sum of input variables, e.g.: estimated spending = 102 + 3.2 × age + 0.1 × income)
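
Applying the example model is just a weighted sum. The sketch below uses the coefficients from the card; the customer values (age 40, income 30,000) are made up.

```python
def estimated_spending(age, income):
    """Linear regression prediction: intercept plus weighted sum of the inputs."""
    return 102 + 3.2 * age + 0.1 * income


# Hypothetical customer: 102 + 3.2 * 40 + 0.1 * 30000
prediction = estimated_spending(age=40, income=30000)
```

In a real project the intercept and weights would be estimated from the sample of past customers rather than written by hand.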

Question 20

Q

predictive analytics - forecasting

Answer

A

Forecasting: regression over time-series data
Example applications:
Forecast monthly sales figures, stock prices, total energy consumption, etc., taking into account:
Autocorrelation: next period depends on recent periods
Long-term trend – upward or downward?
Seasonality (e.g. we always sell more in the winter holiday season)
Techniques include various types of time series analysis
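
To make trend and seasonality concrete, here is a deliberately simple forecasting rule: take the value from the same season one cycle ago and add the average season-over-season change. It is an illustrative stand-in chosen for brevity, not one of the techniques the card refers to, and the sales figures are made up.

```python
def seasonal_naive_with_trend(series, season_length):
    """Forecast the next value as the value one season ago plus the
    average season-over-season change observed so far."""
    if len(series) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    changes = [
        series[i] - series[i - season_length]
        for i in range(season_length, len(series))
    ]
    trend = sum(changes) / len(changes)
    return series[-season_length] + trend


# Hypothetical quarterly sales for two years; the fourth quarter is the seasonal peak.
quarterly_sales = [10, 12, 11, 20, 12, 14, 13, 22]
next_q1 = seasonal_naive_with_trend(quarterly_sales, season_length=4)
```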

Question 21

Q

descriptive analytics - association analysis

Answer

A

Detect frequently occurring patterns of items in a large transaction database

Association rule example:
If a customer buys bread, then (s)he is also likely to buy milk (based on previously observed transactions by our customers)
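
Two measures commonly used to judge a rule such as "bread then milk" are support (how often the items occur together) and confidence (how often the rule holds when its "if" part is present). The card does not name them, so treat this as a supplementary sketch on made-up baskets.

```python
# Hypothetical transactions; each set is the content of one basket.
BASKETS = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "milk"}, BASKETS)          # 3 of 5 baskets
rule_confidence = confidence({"bread"}, {"milk"}, BASKETS)  # 3 of the 4 bread baskets
```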

Example applications
Market basket analysis (e.g. selecting product coupons for individual customers, cross-selling related products, etc.)

Web analytics (e.g. shortcuts to pages often part of same visit)
Recommender systems (‘you may also like’)
…

Question 22

Q

descriptive analytics - segmentation / clustering

Answer

A

Identify clusters or segments of homogeneous subjects (i.e. having similar values for a series of variables)

Example applications
Market segmentation: e.g. identify groups of customers that are similar in terms of demographics, interests, etc.

Can then think of best way to market products to a certain group, products they might be interested in, etc.

Note: no separate target variable that we are trying to predict! (difference with classification)
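
One widely used clustering technique is k-means, sketched below in plain Python. The card does not name a specific algorithm, so this is just one possible choice; the customer data and the starting centroids are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else old  # keep an empty cluster's centroid where it was
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers described by (age, annual spend in thousands).
customers = [(22, 1), (25, 2), (24, 1), (58, 9), (61, 10), (60, 8)]
centroids, clusters = kmeans(customers, centroids=[(20, 0), (65, 10)])
```

Note that the function receives only the input variables: there is no target column, which is exactly the difference with classification.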

Question 23

Q

types of big data analytics

Answer

A

Based on type of input data, can distinguish between e.g.:
Text mining
Predicting car insurance fraud based on accident report
Sentiment analysis: is a post, review, etc. positive or negative?

Image processing
e.g., captioning/tagging images based on content they contain

Social network analytics
e.g., a telco making a more accurate prediction of the probability customer X will churn by also considering how many of the customers whom X regularly calls have recently churned
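
The telco example amounts to computing an extra input variable from the call network. A minimal sketch, assuming a made-up call graph and a hypothetical helper `churned_contact_share`:

```python
# Hypothetical call graph: customer -> customers they regularly call.
CALLS = {
    "X": ["A", "B", "C", "D"],
    "A": ["X", "B"],
    "B": ["X", "A"],
}
CHURNED = {"B", "C"}  # customers who recently left


def churned_contact_share(customer, calls, churned):
    """Share of a customer's regular contacts who have recently churned."""
    contacts = calls.get(customer, [])
    if not contacts:
        return 0.0
    return sum(c in churned for c in contacts) / len(contacts)


share_x = churned_contact_share("X", CALLS, CHURNED)
```

The resulting share could then be fed into a churn classification model alongside the usual customer variables.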