Lecture 2 Flashcards
Cardinality
The way tables are related. 3 types:
- One-to-one (or zero-to-one)
- One-to-many
- Many-to-many (M-to-N) often via a Bridge table.
Bridge table
Decomposes the M-to-N table in to one-to-many relationships.
CIF and Programming languages composed of
- Production (also called back-end)
- Assembly, logistics, and storage (includes Data-warehousing, is a DBMS)
- Processing, analysis and consumption (also called front-end)
SQL basic form
Select [fields] from [one or more tables] Where [criteria] Group by [product] Having [criteria]
Join
Generally composed of an equality comparison between foreign key and primary key of related tables.
Different ways of creating joins (Venn Diagrams)
- Inner join (result is overlap)
- Left outer join (result is left)
- Right outer join (result is right)
- Full outer join (result is everything)
- Minus (result is left or right without overlap)
Trends in the database world
- From disk-based to in-memory databases (SAP HANA).
History: CPU is relatively fast, memory is very slow. -> the further away from the CPU, the slower.
Current/future: memory became cheaper, and larger. Therefor it is possible to store databases in-memory. However there is a technical movement -> instead of working in rows, they use columns, which makes it faster. - From local databases to cloud databases (Azure, Google Cloud etc.)
Data warehouse
A database that is maintained separately from the organization’s operational databases for the purpose of managerial decision-making.
Data warehousing
The process of constructing and using data warehouses.
Why do we need a separate data warehouse?
- Data quality: If you want to make use of the data, you want to have high data quality.
- Historical data: The problem is time-related. Mostly, databases in the back-end do not have much historical data.
- Data consolidation: You want to aggregate the data.
Components of a data warehouse framework
- Production
- ETL
- Data warehouse
- Data marts
- Meta data
- BI Applications
Production
Contains the different databases.
ETL
Extract, Transform, Load. Within the component, you want to clean, convert, and link the data to improve the data quality.
Data warehouse:
The data in this part is more ‘formed’ to support the information requirements for managers and decision-making. This data is cleaned, ordered and linked. -> Star and Snowflake modelling. This part does not have up-to-date data. There is a certain delay because of the ETL process.
Data Marts
A subset of a data warehouse. There are less details, history and dimensions. It is user/group oriented.
Dependent Data Mart
Dependent on the central warehouse.
Independent Data Mart
Independent from the central warehouse and is directly linked to the ETL.
Metadata
Data about the data (such as location, meaning, transformations etc.)
BI applications
Connected with the data warehouse or data mart. Two trends:
- Integrated BI/BA-suits
- Enterprise information portals.
What is a datawarehouse
- Subject oriented
- Integrated
- Time-variant
- Nonvolatile
Collection of data in support of management’s decision-making process.
Subject oriented
- Focussing on the analysis of data or decision makers not on on daily operations/transition processing.
- Providing a simple view around particular subjects by excluding data that are not useful in the decision support.
Integrated
- Constructed by integrating multiple heterogenous data sources.
- Data cleaning and data integration techniques are applied.
+ Ensure consistency in naming conventions.
+ When data is moved to the data warehouse, it is converted.
Time-variant
- Time horizon for the warehouse is significantly longer than that of operational systems.
+ Operational database
+ Data warehouse data: provide information from historical perspective. - Every key structure in the data warehouse.
+ Contains an element of time.
Non-volatile
- Operational updates of data do not occur in the data warehouse environment. This requires 2 operations:
+ Initial loading (SQL: Insert)
+ Access of data (SQL: Select)
Different types of data warehouses
- Direct BI on source systems
- Canned data warehouse (in ERP system)
- Independent data marts
- Bus architecture
- Enterprise data warehouse
- Hub & spoke
- Federated data warehouse
- Data lake
Central data warehouse
A pure architect -> Top down -> Enterprise data warehouse approach.
- Has one integrated warehouse.
- Advantage is that it has a simple ETL process.
- Entity-relational modelling.
- Examples: Hub & Spoke and Federated data warehouse
Federated architecture
Architecture is composed of data marts (no central data warehouse) -> Bottom up -> Canned data warehouse.
- Start at the requirements of a single department, and after that they scale up. Data warehouse = collection of data marts. Typical feature = dimensional modelling. Cheap to build.
- Dimensional modelling: consistency achieved by conformed dimensions.
- Problems:
+ very much ETL has to be done.
+ low data consistency -> solution is data mart bus architecture, which links the different data marts by conformed dimensions.
- Examples: Independent data marts, bus architecture, and the canned data warehouse are within this category.
Data mart vs Enterprise data warehouse
No one-size fits all strategy to data-warehousing. Depends on:
- Management’s information needs -> when management want a central overview, the enterprise data warehouse approach is probably better.
- Information interdependence between organisational units.