Recent Developments in Data Warehousing - Watson Flashcards
Recent Developments in Data Warehousing - Watson
INTRODUCTION and aim of the paper?
- data warehouses are created to provide dedicated source of data to support decision-making applications data repositories play important role in understanding customer behaviour, connecting trading partners along supply chain, supporting performance management systems etc
- aim of paper: tutorial on data warehousing that introduces newest concepts in field
Recent Developments in Data Warehousing - Watson
What is a Data warehouse?
collection of data created to support decision making; provides data infrastructure & thus eliminates failures that are due to lack of quality ‘single version of truth’
Recent Developments in Data Warehousing - Watson
What are 4 DATA WAREHOUSE CHARACTERISTICS ?
1) Subject oriented: organised around specific subjects (sales, customer, products)
2) Integrated: data collected from multiple systems (internal/external) and integrated around subjected (eg in the end data around one customer identifier)
3) Time variant: maintains historical data
4) Non-volatile: users cannot change or update data; updates happen through IT controlled load processes
Recent Developments in Data Warehousing - Watson
What is Data warehousing?
Data warehousing: process that encompasses range of activities in data warehouse
extract data from source system -> transform the data -> load data to data stores -> User and applications access the data
Recent Developments in Data Warehousing - Watson
What is a Data mart?
Data mart: similar to data warehouse but stores data only for limited number of subject areas (smaller in scope than data warehouses)
Recent Developments in Data Warehousing - Watson
What is a Independent data mart?
built directly from source systems; often ‘point solution’ that solves immediate problem but creates new ones in long run when firm tries to implement enterprise-wide data infrastructure (eg data warehouse)
Recent Developments in Data Warehousing - Watson
What is a Dependent data mart?
created with data drawn from data warehouse; provides copy of data extracted from data warehouse to give users customised view of specific data; are preferred over independent ones as they come from organisation-wide source - ‘single version of truth’ is maintained
Recent Developments in Data Warehousing - Watson
What is an Operational data store (ODS): ?
consolidates data from multiple source systems & provides near real-time, integrated view of volatile, current (max 60 day old) data; aim to provide integrated data for organisational purposes; often used to avoid ERP implementation
Recent Developments in Data Warehousing - Watson
What is are Oper marts?
created when current operational data needs to be analysed multi-dimensionally; data is only subset of data in ODS; when analysis is completed, oper mart is deleted
Recent Developments in Data Warehousing - Watson
What is Architecture for data warehousing: ?
includes the component parts and the relationships among the parts Data sources, ETL software, data stores, data analysis tools & applications, users
-> * data are extracted from the various sources and then fed into the staging arear where it is transformed (cleaned) then loaded into data warehouse where it is stored and later accessed by various users with tools and applications
Recent Developments in Data Warehousing - Watson
What is the Enterprise data warehouse approach ?
Enterprise data warehouse approach: by Bill Inmon; top-down approach of developing
Recent Developments in Data Warehousing - Watson
What is the Data mart strategy: ?
by Ralph Kimball; bottom-up approach of developing
both approaches provide benefits but involve limitations
Recent Developments in Data Warehousing - Watson
What is the Data mart strategy: ?
‘start small, think big approach’ begins with specific business need for data and grows after time
* initial data mart contains data for only limited number of subject areas, draws from limited sources
* recognising that data mart will grow: needs to be planned and exercised beforehand as mart will grow if it is successful if not: difficult to integrate data logically & physically
* advantages: due to limited scope initially developed quickly, at low cost, with less financial risk
* disadvantages: difficult to successfully grow a data mart while integrating new subject areas, data, users, applications along the way
Recent Developments in Data Warehousing - Watson
What is the Enterprise data warehouse approach ?
Data warehouse approach: traditional approach t warehousing; does not preclude creation of data marts but mart are created after warehouse is built & pull data from warehouse, not from source systems (dependent data)
* advantages: faster system response time as marts + simpler data view as marts are dependent; results in integrated data warehouse containing many subject areas for multiple users
* disadvantages: risk that it is never completed or fails to meet end users needs
Recent Developments in Data Warehousing - Watson
EXTRACTION, TRANSFORMATION & LOADING PROCESSES (ETL) meaning?
ETL: data extraction, transformation, loading takes fata from source systems, prepares it for decision-support purposes & places it in target data base; ‘plumbing’ – dirty, complex, time-consuming, expensive
Recent Developments in Data Warehousing - Watson
DATA SOURCES: applications?
data in applications are difficult to access for decision-support decision as accessing slows the processing of transactions
Recent Developments in Data Warehousing - Watson
DATA SOURCES: ERP systems?
ERP systems: important source of support data but software often stores data in complex structure which makes extracting difficult use of ERP vendor solutions
Recent Developments in Data Warehousing - Watson
DATA SOURCES: clickstream data?
gathered from customers’ visits to firm website (IP address, download time, user agent, cookie data etc); very voluminous – must be filtered and cleaned up; used to support operational processes
Recent Developments in Data Warehousing - Watson
DATA SOURCES: external data?
provided by third party organisations
Recent Developments in Data Warehousing - Watson
DATA EXTRACTION: Custom-write data extraction programs?
firm writes own ETL software when: wanting to avoid costs of purchasing, knows source systems well, understand complexities not the trend
Recent Developments in Data Warehousing - Watson
DATA EXTRACTION: Purchasing commercial ETL software?
Purchasing commercial ETL software: available from major database vendors & firms specialising in ETL software; allows firms to specify source systems, indicate tables and columns to be used, move data to specified targets & automate process