8/28 Chapter 1 Flashcards
Business Intelligence
a set of technologies and processes that use data to understan, analyze, and improve business performance
BI Levels
Access and reporting
Analytics
BI 1. Access and reporting
Ex: key performance indicator(KPI) Corporate war room
Enabling Technologies: data warehouse
BI 2. Analytics
Target marketing, Recommender systems
Enabling Technologies: data mining
BI Enablers
Organizations have accumulated huge amounts of data due to the extensive use of IT for years
Rapid advancement of data processing capabilities of modern computers and DBMS
Studies
MIT study, companies who use data driven decision making are 5% more productive and 6% more profitable
Why database
it makes sense, excel is too confusing when you have large amounts of data
Why Data Warehouse
Knowledge Management Problems(drowning in data, starving for knowled)
- Can’t access data (easily)
- Give me only what’s important(knowledge)
- I need to reduce data to what’s important by slicing and dicing
- Data inconsisteny and poor data utility
- Need to improve the practice of making informed decisions
- Hard and slow to query the database
Can’t access data (easily) why?
Isolated databases distributed in an enterprise
Give me only what’s important
historical data is archived in offline storage systems
Cause 3 database is designed to process transactions but not to answer decision support queries
Complex queries
bad query performance
Solution: in data warehouse, organize data in subject - oriented way rather than process-oriented way-dimensional modeling
Data warehouse
a subject oriented, integrated, time-variant, non-volatile colleciton of data in support of management’s decision making process
Subject Oriented
means the data warehouse focues on high level entities of business such as sales, products, and customers. This is a in contrast to database systems, which deals with processes such as placing an order
Integrated
data is integrated from distributed data sources and historical data sources and stored in a consistent forma
Time-variant
means the data associates with a point in time
Non-volatile
means the data doesn’t change once it gets into the warehouse
Data warehouse data does it change
once it is in the warehouse it doesn’t change
Data warehouse
purpose: Decision Support data organization: subject oriented Data model: Dimensional modeling Time span: historical and current data Query processing: scan a substantial subset of data Operation: Read-Only
Data base
Purpose: Transaction Processing Data Organziation: Process Oriented Data Model: ER Modeling Time Span: Current Data Query Processing: Scan a small set of data Operation: Read & update
Database purpose is geared towards
operation
Data warehouse
is for decision support
Exams:
two in class on Sep 25 and Oct 30
conceptual and problem solving
is primarily from the lecture
Lecture 2 Planning and Requirements Analysis
learn
Data Warehouse Architecture
Operational Source System, Data Staging Area, Data Warehouse, End User Data Analysis
Operational Source System
can be anywhere, we extract from this
Data Staging Area
area where we Transform
Transformation
Clean; Combine; Remove duplicates;Transform
Data Warehouse process
We extract, load, and feed data to users
Data Warehouse: Data Mart #1
smaller data warehouse
End User Data Analysis
slide 3 of lecture 2 Ad hoc query tools Report Writers End user applications Models: forcasting, optimizing, data mining, etc
Data Warehouse Lifecycle
Project Planning Requirements Analysis Logical Design Physical Design Data Staging Implementation Maintenance Data Analysis
Logical Design
ER Modeling-Dimensional Modeling
Design appropriate table structure and primary key/foreign key relationship
Physical design
Database selection Storage Selection Web based or not performance index plan aggregation plan
Data Staging
ETL Extraction Transformation Load Tool(Pentaho)
Data Analysis
Reporting
Ad hoc query
Graphical Analysis
Query types
drillup and drill down query
Planning
Valuable IT projects should be
aligned with organizaiton’s business strategy and organizational structure
driven by business needs
Planning Step 2 Feasibility Analysis
is it possible
Technical Feasibility
less familiarity creates more risk as well as size and compatibility can be an issue
Economic Feasibility
Development costs, annual oper cost, tangible benefits, intangible benefits
Organizational Feasibility
is it aligned with our business?
critical success factors for DW projects
Requirement Analysis
Joint application development
Questionaire, Interview
joint application development
allow sthe project team, users, and management to work together to identify requirements for the DW project
often the most useful method for collecting requirements from users
Questionnaire
written questions to gather data
Document Analysis
provides clues aobut existing databases typical documents forms reports policy manuals models of current databases
Dimensional Modeling(Star Schema)
a DW logical design technique that seeks to present data in a freamework that is intuitive for data access and allows for high performance data access
intuitive: easy to write SQL
High performance: high performance SQL
Analytical Report
Query where we list sales in jan by customer state and product category
Parts of a dimensional model
fact table - ex: sales
dimension table: ex: something we want to analyze the fact by
fact table
attributes in fact tables are measurements for analysis or contents in reports
dimension table
attributes in dimension tables are constraints for the measurements or headers in reports
Fact Attributes
purpose: measurements for analysis
reporting use: report content
data type: most facts are numerica and additive. there are semi additive or no additive facts
size: larger number of records
Dimension Attributes
Purpose: constraints for the measurements
reporting use: row or column headers
data type: textual descriptive
size: smaller number of records
How do you identify facts and dimensions
requirements analysis
ER Model
F1 Calculation
items are calculated in a data warehouse
F:
refers to special considerations for fact table or special types of fact table
Database, should we have calculated items in it?
you cannot have calculated items in the database
D1: Slowly changing dimension
values of attributes in dimension tables may evole over time. for ex: cust move
Option 1: Overwrite-you lose historical data
Option 2: Add a new attribute to record current value of the changing attribute- you don’t have in between data
Option 3: Add a record whenever a dimension attribute changes
Option 4: warehouse key + method 3
warehouse key is a sequence of non-negative integers served as primary keys of tables in data warehouse
D:
refers to special considerations for dimension table or special types of dimension table
Adding a key in data warehouse
helps us allocate data correctly, it allows us to account for change
Notation for primary key
a dark rectangle
compact primary keys
there can be a primary key that includes a number of items in the fact table
D2: Time Dimension
part of dimension tables, a data warehouse needs more explicit dimensions to differentiate between what we are analyzing
Primary purpose of data warehouse
to query items
D3: Snowflake
normally should be avoided in data warehouse
advantage of avoiding:improve query performance
disadvantage: require more storage space
look at database review slides on his page
do it