Scavenger Hunt 8 Flashcards
What is the rate of data growth?
Doubling every 6 months…unprecedented and will continue regardless of budget constraints
What is “information overload”? What is its alleged impact?
Vast flood of info is disruptive, like drinking from a fire hydrant
People can’t do their jobs with so much data
By contrast, what is “information abundance” and what are its implications for knowledge workers?
Take advantage of all the new data by learning to turn it into meaningful info and managerial knowledge
Geek up!
Business Intelligence
Combining aspects of reporting, data exploration & ad hoc queries, & sophisticated data modeling & analysis
Analytics
Statistical and quantitative analysis of data
Explanatory and predictive models, and fact-based management to drive decisions and actions
Data
Raw facts and figures
Tells you nothing alone
Information
Data that has been presented in such a way that it answers questions or supports decision making
Knowledge
Insight derived from experience and enterprise-savvy information
Structured Data
Organized data that conforms to a model so that it can be searched, analyzed, and queried using traditional analysis tools
Unstructured Data
Not organized, no schema
Ex. Text, email, Facebook pages, news stories
Binary
Table
Organized collection of data made up of records and fields
Record
Part of table
Row of data
Individual observation
Fields
Part of table
Column
Attribute for data (fixed schema-textual data)
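A minimal sketch of the Table / Record / Fields cards, using Python's built-in sqlite3 module. The `customers` table and its values are made up for illustration.

```python
import sqlite3

# Hypothetical table: the name "customers" and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Detroit", 34), ("Ben", "Lansing", 51)],
)

cursor = conn.execute("SELECT * FROM customers")
fields = [col[0] for col in cursor.description]  # column names = fields
records = cursor.fetchall()                      # each tuple = one record (row)

print(fields)
for record in records:
    print(record)
```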
Relational Database
A database that correlates data from multiple tables
Relational Database Benefits
Combines and simplifies data
Relational Database Key Field
One of the fields in a table marked with a key so that the data items are unique for that row
Never repeat
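A small sketch of "never repeat", assuming a hypothetical `students` table whose key field is `student_id`. In SQLite, a second row with the same key value is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# student_id is the key field: every row must have a different value.
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Ana')")

try:
    # Same key value again: the database refuses the row.
    conn.execute("INSERT INTO students VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```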
Relational Database Valid Relationship Types
One:One -> A key value occurs exactly once in Table A and exactly once in Table B
One:Many -> A key value occurs once in its own table but can occur many times in other tables
Relational Database Views
Display data from multiple related tables by combining them for reporting and display
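One way to picture a one-to-many relationship and a view, sketched with sqlite3. The `customers` and `orders` tables and the view name `customer_orders` are assumptions for illustration, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One customer (key field customer_id) can have many orders.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- The view combines both tables for reporting.
    CREATE VIEW customer_orders AS
        SELECT c.name, o.order_id, o.amount
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id;
""")

rows = conn.execute(
    "SELECT name, order_id, amount FROM customer_orders ORDER BY order_id"
).fetchall()
for row in rows:
    print(row)
```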
SQL
Structured Query Language
Most common language for creating & manipulating databases
Ruling champion database in business world
TPS
Transaction Processing Systems
Ex. ATM, retail sales transactions, websites, searches
(TPS) What is a “transaction?” What are its two key characteristics?
Any business exchange
1. Standardized-schema
2. Occurs repeatedly
(TPS) Point of Sale system
Retail computer systems that collect sales data and are hooked directly into the store’s inventory-control system
Scan barcode, transaction happens
(TPS) How do loyalty cards generate valuable data?
Membership program in which the company pays you, through bonuses, for data about you that you otherwise would not give them
Enterprise Software
CRM, ERP, SCM
Applications that address the needs of multiple users throughout an organization or work group
CRM
Customer Relationship Management System
Every sales call, every customer inquiry, every follow up call=data
ERP
Enterprise Resource Planning System
Paychecks, invoices, payments=business transactions/data to seek insights
SCM
Supply Chain Management System
Each order for finished goods/raw materials=transactions
Business operations — examples
Health care patient data
Michigan tags cows at birth which gives lifetime stream of data for each cow raised in the state
Transportation industry: Plane engine produces 10 TB of data every 30 min (sensors on aircraft)
Swiss Rails: 100 data items a second (sensors on train & track)
Sources of customer-provided data
Customer surveys: customer insight on products/services they received
Product registration cards: data about customer income, where they live, highest education, hobbies
Contests: cheap data where thousands of people apply, giving the company new data from all the entries
Data aggregator
A company whose sole job is to collect data from a wide variety of sources, organize it, clean it, link related items together, and then sell access to it to others
Data silos
Data collections completely separated with no possibility of communication or sharing
How do data silos come into being?
Company may have some data trapped inside of obsolete legacy systems
Why are data silos a problem?
Incompatible systems lead to missed opportunities to see patterns, trends, and correlations, and to develop new insights to answer questions and make decisions
How do inconsistent data formats impact a business?
Makes it hard to sort data
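A rough sketch of why mixed formats get in the way of sorting, and one possible cleanup. The three date formats and the sample values are assumed for illustration.

```python
from datetime import datetime

# Hypothetical formats that three different systems might use for dates.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]

def normalize_date(text):
    """Try each known format and return the date as YYYY-MM-DD text."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

raw_dates = ["2024-03-05", "03/07/2024", "06.03.2024"]
naive_sort = sorted(raw_dates)                              # sorts as text, not by date
clean_dates = sorted(normalize_date(d) for d in raw_dates)  # sorts by actual date
print(naive_sort)
print(clean_dates)
```

Once every value is in one format, plain sorting puts the dates in calendar order.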
Operational data
Data that is produced by your organization’s day-to-day operations, such as customer, inventory, and purchase data
How does the analysis of operational data compete with customers?
Analysis queries add significant load to the system during business hours, causing delays and lost sales (best not to query operational data directly)
What can a company do about the operational data vs customers problem?
Separate data repository:
- One for operational data
- One for reporting and analytics
Combine data from many sources and clean it
Historical data builds as months and days go by; used as a resource to see trends
Periodic import from operational systems allows analytical system to be up to date enough to come up with inferences
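The periodic import can be sketched as below. This is only an illustration: both databases are in-memory SQLite stand-ins, and the names `sales`, `sales_history`, and `periodic_import` are hypothetical.

```python
import sqlite3

# Two separate repositories: one for operations, one for reporting and analytics.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
warehouse.execute("CREATE TABLE sales_history (sale_id INTEGER PRIMARY KEY, amount REAL)")

def periodic_import(source, target):
    """Copy only the sales that the analytical repository has not seen yet."""
    last_id = target.execute(
        "SELECT COALESCE(MAX(sale_id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = source.execute(
        "SELECT sale_id, amount FROM sales WHERE sale_id > ?", (last_id,)
    ).fetchall()
    target.executemany("INSERT INTO sales_history VALUES (?, ?)", new_rows)
    target.commit()
    return len(new_rows)

operational.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
first_batch = periodic_import(operational, warehouse)

operational.execute("INSERT INTO sales VALUES (3, 5.0)")
second_batch = periodic_import(operational, warehouse)

# Analysis runs against the copy, so it adds no load to the operational system.
total = warehouse.execute("SELECT SUM(amount) FROM sales_history").fetchone()[0]
print(first_batch, second_batch, total)
```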
What is a Data Warehouse? What are its characteristics?
Collection of databases that supports decision making
- Many sources
- Operational systems-periodic transfer
- Historical data
- Fast Queries
- Exploration
How is a Data Mart different from a Data Warehouse?
Same thing, different scale
Data Mart looks at specific problem/unit rather than the entire enterprise
What three characteristics are necessary for something to be “Big Data”? (three V’s) Explain each
Volume:
The notion that data is “too big” to be analyzed with traditional methods (hundreds of millions of data items)
Velocity:
Rapid arrival & feedback loop. Data is too fast. Cannot react fast enough.
Variety:
text, images, sound, video, human input, sensors, servers, so many types of data
What is Hadoop?
open source system designed to be able to consume ANY data you want (structured, unstructured, all data types)
Distributed computing platform
Scalable, cost effective, flexible, fault-tolerant
Examples of big data - How do you see the Three V’s in each?
Predictive Policing in LA
(historical crime data)
Tesco grocery chain
(optimized fridge costs with in store fridges providing 70M data points per store per year; proactive maintenance reduced maintenance costs by ID’ing problems before they happen)
Actions speak louder than words
(Veteran therapy; military suicide prevention in the US; uses pattern recognition to identify signs and types of psychological distress through video measurements)
Canned Reports + Pros & Cons
Reports that provide regular summaries of information in a predetermined format
Answer specific questions
Pros: Easy & useful
Cons: Inflexible & IT overhead
Ad-Hoc Reporting Tools + Pros & Cons
Tools that put users in control so that they can create custom reports on an as-needed basis by selecting fields, ranges, summary conditions, and other parameters
Pros: Users define their own reports, Powerful/flexible
Cons: Demanding of user, Potentially steep learning curve, Business knowledge, Understand data schema
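A toy sketch of the ad-hoc idea, where the user picks the fields and a range. The function `ad_hoc_report` and the sample records are hypothetical, not a real reporting tool.

```python
# Hypothetical sales records for illustration.
records = [
    {"region": "East", "product": "Milk", "amount": 40.0},
    {"region": "West", "product": "Milk", "amount": 25.0},
    {"region": "East", "product": "Eggs", "amount": 10.0},
]

def ad_hoc_report(rows, fields, min_amount=0.0):
    """Return only the chosen fields for rows whose amount is at least min_amount."""
    return [
        {field: row[field] for field in fields}
        for row in rows
        if row["amount"] >= min_amount
    ]

# The user has to know the schema (field names) to ask for a report.
report = ad_hoc_report(records, fields=["region", "amount"], min_amount=20.0)
for line in report:
    print(line)
```

Note that the caller must already know which fields exist, which mirrors the "understand data schema" con on the card.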
Dashboards
Graphic view of what is happening inside the software system
Some customization
A picture is worth a thousand words
OLAP + Pros & Cons
Online analytical processing
The manipulation of information to create business intelligence in support of strategic decision making
used for enormous amounts of data
Pros: Huge data, Pre-processed + Summarized, User reports fast
Cons: No access to details; user only sees summary
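A simplified sketch of the "pre-processed + summarized" idea: totals are computed once, ahead of time, for every region/quarter combination, so user lookups are fast but only summaries are available. The sample sales and the "ALL" marker are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, quarter, amount).
sales = [
    ("East", "2024-Q1", 100.0),
    ("East", "2024-Q2", 150.0),
    ("West", "2024-Q1", 80.0),
    ("West", "2024-Q1", 20.0),
]

# Pre-compute totals for each combination, using "ALL" as the roll-up level.
summary = defaultdict(float)
for region, quarter, amount in sales:
    summary[(region, quarter)] += amount
    summary[(region, "ALL")] += amount
    summary[("ALL", quarter)] += amount
    summary[("ALL", "ALL")] += amount

# User reports are simple lookups; the individual sales are no longer visible.
print(summary[("West", "2024-Q1")])
print(summary[("ALL", "2024-Q1")])
print(summary[("ALL", "ALL")])
```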
Data Mining
The process of analyzing data to extract information not offered by the raw data alone
Enormous historical datasets
Identify patterns
Build Models
Predict Future
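The pattern / model / predict steps can be sketched with a straight-line trend fit by least squares. This is just one simple model chosen for illustration, and the monthly sales figures are made up.

```python
# Hypothetical historical data: month number and sales for that month.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

# Build the model: least-squares line sales = intercept + slope * month.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

def predict(month):
    """Use the fitted trend to estimate sales for a future month."""
    return intercept + slope * month

forecast = predict(6)
print(slope, intercept, forecast)
```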