Ch 5.4 Data Warehouses and Data Marts Flashcards
If a manaageer wants to know the profit margin on used books, how would they find this?
use SQL or query by example
this allows managers to look up things in the database
What is QBE
making database queries that allow user to search the doc based on an example (string of text )
Why do we need data warehouses and data marts if be have QBE or SQL
this is because if we want to look at trends, we need a complicated system
what do data warehouses and data marts have that sql doesnt
1) info is organized better than sql is
2) these databases are read only!!! no need for updating or processing
3) SQL is used to access one record at a time; Data warehouss and mart is designed to access large groups of related records
Data warehouse
holds historiccal data organized by subject
-> very expensive
data mart
cheaper alternative to data warehouses,
low cost and scaled down
-> implemented in less than 90 days
how are data warehouses and marts organized?
1) organized by business dimension or subject (customer, vendor, product, price etc)
2) use online analytical processing (analyzes data from end users)
3)integrated: data collected from multiple systems and integrated around subjects
4) time variant: have HISTORICAL data
5) non volatile: users cant change or update data
6) multidimensional: store data in more than 2 dimensions “data cube”
what are components of the data warehouse environment:
1) source systems that provide data
2) data integration tech
3) diff architectures for storing data
4) diff tools and applications for users
5) metadata (Data about the data)
Source systems
of data warehouses and data marts
“organizational pain point” motivates businesses to develop BI, this BI needs data to work
PROVIDE DATA TO THE WAREHOUSE OR MART
EX: operational data, erp, website data, 3rd party data,
relational database vs multidinensional data warehouses and data marts
Relational database is like ERD tables: separate tables w PKs and FKs
Data cube: its a matrix of 3 dimensiosns
Data integration- data warehouse environment
USES ETL
tech that processes and prepares data for usage
steps of ETL
EXTRACTION: handwritted code or SQL queries
TRANSFORMATION: data integrated around a common key (student id, and all data around the student classes courses major parking)
- sometimes transformation is changing 0 and 1 to M and F
LOADED: loading data into the mart/warehouse (loading window)
STORING DATA- environment of data warehouses/marts
1) most common: data warehouse
-> one single read-only version of truth
2) independant data mart: store data for a single/few applications (ex: marketing or finance) FUNCTIONAL AREA CENTRIC
-> not very effective, inconsistent data
3) Hub and spoke: a central data warehouse that has the data plus many dependant data martsMe
METADATA- environment of data warehouses/marts
data about the data, things like data quality and the data wrehouse status
INCLUDES UPDATE SCHEDULES
DATA QUALITY- environment of data warehouses/marts
QUALITY of data in the warehouse must be high!
DATA cleansing needs to be used to ensure data meets users needs