Lecture 2 Flashcards
Cardinality
The way tables are related. 3 types:
- One-to-one (or zero-to-one)
- One-to-many
- Many-to-many (M-to-N) often via a Bridge table.
Bridge table
Decomposes the M-to-N table in to one-to-many relationships.
CIF and Programming languages composed of
- Production (also called back-end)
- Assembly, logistics, and storage (includes Data-warehousing, is a DBMS)
- Processing, analysis and consumption (also called front-end)
SQL basic form
Select [fields] from [one or more tables] Where [criteria] Group by [product] Having [criteria]
Join
Generally composed of an equality comparison between foreign key and primary key of related tables.
Different ways of creating joins (Venn Diagrams)
- Inner join (result is overlap)
- Left outer join (result is left)
- Right outer join (result is right)
- Full outer join (result is everything)
- Minus (result is left or right without overlap)
Trends in the database world
- From disk-based to in-memory databases (SAP HANA).
History: CPU is relatively fast, memory is very slow. -> the further away from the CPU, the slower.
Current/future: memory became cheaper, and larger. Therefor it is possible to store databases in-memory. However there is a technical movement -> instead of working in rows, they use columns, which makes it faster. - From local databases to cloud databases (Azure, Google Cloud etc.)
Data warehouse
A database that is maintained separately from the organization’s operational databases for the purpose of managerial decision-making.
Data warehousing
The process of constructing and using data warehouses.
Why do we need a separate data warehouse?
- Data quality: If you want to make use of the data, you want to have high data quality.
- Historical data: The problem is time-related. Mostly, databases in the back-end do not have much historical data.
- Data consolidation: You want to aggregate the data.
Components of a data warehouse framework
- Production
- ETL
- Data warehouse
- Data marts
- Meta data
- BI Applications
Production
Contains the different databases.
ETL
Extract, Transform, Load. Within the component, you want to clean, convert, and link the data to improve the data quality.
Data warehouse:
The data in this part is more ‘formed’ to support the information requirements for managers and decision-making. This data is cleaned, ordered and linked. -> Star and Snowflake modelling. This part does not have up-to-date data. There is a certain delay because of the ETL process.
Data Marts
A subset of a data warehouse. There are less details, history and dimensions. It is user/group oriented.
Dependent Data Mart
Dependent on the central warehouse.
Independent Data Mart
Independent from the central warehouse and is directly linked to the ETL.
Metadata
Data about the data (such as location, meaning, transformations etc.)
BI applications
Connected with the data warehouse or data mart. Two trends:
- Integrated BI/BA-suits
- Enterprise information portals.
What is a datawarehouse
- Subject oriented
- Integrated
- Time-variant
- Nonvolatile
Collection of data in support of management’s decision-making process.
Subject oriented
- Focussing on the analysis of data or decision makers not on on daily operations/transition processing.
- Providing a simple view around particular subjects by excluding data that are not useful in the decision support.
Integrated
- Constructed by integrating multiple heterogenous data sources.
- Data cleaning and data integration techniques are applied.
+ Ensure consistency in naming conventions.
+ When data is moved to the data warehouse, it is converted.
Time-variant
- Time horizon for the warehouse is significantly longer than that of operational systems.
+ Operational database
+ Data warehouse data: provide information from historical perspective. - Every key structure in the data warehouse.
+ Contains an element of time.
Non-volatile
- Operational updates of data do not occur in the data warehouse environment. This requires 2 operations:
+ Initial loading (SQL: Insert)
+ Access of data (SQL: Select)