Chapter 1: Introduction Flashcards
New Sources of Data
- Tweets: 12 TB
- Facebook: 25 TB
- Google, YouTube …
- RFID
- Smart Meters
- Cameras
- GPS
1 Source of large data
- Customer transactional data –> how do customers behave?
Traditional Data Warehousing
- Several Sources (e.g. online transaction system) –>
- Extractor / Monitor –>
- Integration System (<–> Meta Data) –>
- Data warehouse (Mngmt decision support)
- –> Clients
Volume, Velocity, and Variety
- Volume: Enterprises are awash with ever-growing data of all types.
- Turn 12 terabytes of Tweets each day into improved product sentiment analysis
- Convert 350 billion annual meter readings to better predict power consumption
- Velocity: For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
- Scrutinize 5 million trade events created each day to identify potential fraud
- Analyze 500 million daily call detail records in real time to predict customer churn faster
- Variety: Big data is any type of data, structured or unstructured, such as text, sensor data, audio, video, click streams, log files and more.
- Monitor hundreds of live video feeds from surveillance cameras to target points of interest
- Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Aggregating Data from Different Sources
The challenge for most organizations is to manage and analyze the various sources of structured, unstructured, and streaming data.
- Websites
- Billing, ERP, CRM
- RFID
- Network switches
- Social media
New Trends in Data Organization
- Main memory databases can run in seconds queries that previously took hours
- Distributed file systems allow for effective parallelization (e.g., Apache Hadoop)
Business Analytics (Definition)
Business analytics makes extensive use of statistical analysis, including explanatory and predictive modeling, and fact-based management to drive decision making. It is therefore closely related to management science. Analytics may be used as input for human decisions or may drive fully automated decisions.
Descriptive Analytics
What has occurred?
How much did I sell?
BI, Data engineering, statistics …
Data Engineering and Statistics:
Organize data, execute large queries, describe means, trends, and test hypotheses
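As a minimal sketch of the descriptive step, the snippet below summarizes a sales series with a total, a mean, a standard deviation, and an average month-over-month change. The `monthly_sales` values are made up for illustration and do not come from any real data set.

```python
from statistics import mean, stdev

# Hypothetical monthly unit sales (illustrative values only).
monthly_sales = [120, 135, 150, 160, 155, 170]

total_sold = sum(monthly_sales)        # answers "How much did I sell?"
average_sales = mean(monthly_sales)    # central tendency
spread = stdev(monthly_sales)          # sample standard deviation

# A very simple trend measure: the average change from one month to the next.
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]
average_change = mean(changes)

print(total_sold, round(average_sales, 2), round(spread, 2), average_change)
```

All four figures describe what has already happened; none of them says anything about future months.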
Predictive Analytics
What will occur?
Try to understand behaviour, e.g. which customers are likely to switch.
Data Mining and Econometrics
Forecast events, predict time series, or discrete choice decisions of customers
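One simple way to predict a time series is simple exponential smoothing, sketched below. This is only one of many forecasting techniques and is not named in the notes. The function name, the `alpha` value, and the `weekly_demand` numbers are assumptions chosen for illustration.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    alpha (between 0 and 1) controls how strongly recent values are weighted.
    """
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]  # start from the first observation
    for value in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly demand figures (illustrative values only).
weekly_demand = [100, 110, 105, 120]
forecast = exponential_smoothing_forecast(weekly_demand, alpha=0.5)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent changes; a smaller one smooths out noise.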
Prescriptive Analytics
What should occur?
Network flow, Management science …
Algorithms and Optimization
Develop algorithms and optimization models for planning, scheduling, pricing, and revenue management.
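A toy pricing problem illustrates the prescriptive idea of choosing the best action rather than describing or forecasting. The sketch assumes a hypothetical linear demand curve and searches a grid of candidate prices for the one that maximizes revenue. Both the demand function and the price range are invented for this example.

```python
def demand(price):
    # Assumed linear demand curve (hypothetical): fewer units sell at higher prices.
    return max(0, 100 - 2 * price)


def best_price(candidate_prices):
    # Pick the candidate price with the highest revenue = price * units sold.
    return max(candidate_prices, key=lambda price: price * demand(price))


optimal_price = best_price(range(0, 51))
optimal_revenue = optimal_price * demand(optimal_price)
print(optimal_price, optimal_revenue)
```

Real planning, scheduling, and revenue management models would typically use dedicated optimization methods instead of exhaustive search, but the question answered is the same: what should occur?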
Relationship to Business Intelligence (BA related to predictive / inductive statistics and BI related to descriptive analytics / statistics)
- Business analytics (related to predictive analytics / inductive statistics)
  - focuses on developing new insights and understanding of business performance based on data and statistical methods.
  - may be used as input for human decisions or may drive fully automated decisions.
- Business intelligence (related to descriptive analytics / statistics)
  - traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.
  - is often associated with querying, reporting, OLAP, and "alerts".
From Data to Information (Flow)
- Data consolidation (data input and queries) –> DWH
- Selection and processing (make sense out of large table)
- Business analytics (model that fits data)
- Interpretation and evaluation (insights)
Predictive Analytics
- Algorithms and Databases
- Association Rule Algorithms
- Algorithm Design Techniques
- Algorithm Analysis
- Statistics and Econometrics
- Bayes' Theorem
- Regression Analysis
- EM Algorithm
- Clustering
- Time Series Analysis
- Machine Learning and Data Mining
- Decision Tree and other Classification Algorithms
- Clustering
- Neural Networks
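Association rules, the first technique listed above, are commonly scored by support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a tiny hypothetical basket data set. The transactions and function names are invented for illustration, and no particular mining algorithm is implemented.

```python
# Hypothetical shopping baskets (illustrative values only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, round(rule_confidence, 3))
```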
Numerical prediction
Given a collection of data with known numeric outputs, create a function that outputs a predicted value from a new set of inputs.
E.g. given the gestation time of an animal, predict its maximum life span.
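A least-squares line is one of the simplest functions that can be fitted for numerical prediction. The sketch below fits such a line in plain Python. The gestation and life-span numbers are made-up toy values, not real animal data, and the function names are assumptions for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    return slope * x + intercept


# Toy training data (invented): known inputs with known numeric outputs.
gestation_days = [20, 60, 110, 150]
max_lifespan_years = [3, 7, 12, 16]

slope, intercept = fit_line(gestation_days, max_lifespan_years)
predicted_lifespan = predict(200, slope, intercept)  # prediction for a new input
print(round(slope, 3), round(intercept, 3), round(predicted_lifespan, 1))
```

The known outputs are used only to fit the function; once fitted, it can be applied to inputs it has never seen.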
Classification
- From data with known labels, create a classifier that determines which label to apply to a new observation
- E.g. Identify new loan applicants as low, medium, or high risk based on existing applicant behavior.
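A nearest-neighbor rule is one simple way to build such a classifier: a new observation receives the label of the most similar labeled example. The sketch below assumes two hypothetical features per applicant (debt-to-income ratio and share of payments made on time). The features, values, and labels are invented for illustration.

```python
import math


def nearest_neighbor_label(labeled_examples, new_point):
    """Return the label of the labeled example closest to new_point."""
    _, label = min(
        labeled_examples,
        key=lambda example: math.dist(example[0], new_point),
    )
    return label


# Hypothetical existing applicants: (debt-to-income ratio, on-time payment share).
labeled_applicants = [
    ((0.10, 0.95), "low"),
    ((0.40, 0.70), "medium"),
    ((0.80, 0.30), "high"),
]

new_applicant = (0.75, 0.35)
risk = nearest_neighbor_label(labeled_applicants, new_applicant)
print(risk)
```

Decision trees and neural networks learn more elaborate decision rules, but the setup is the same: known labels in, a labeling function out.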