Lecture 2 Flashcards
Cardinality
The way tables are related. 3 types:
- One-to-one (or zero-to-one)
- One-to-many
- Many-to-many (M-to-N) often via a Bridge table.
Bridge table
Decomposes the M-to-N table in to one-to-many relationships.
CIF and Programming languages composed of
- Production (also called back-end)
- Assembly, logistics, and storage (includes Data-warehousing, is a DBMS)
- Processing, analysis and consumption (also called front-end)
SQL basic form
Select [fields] from [one or more tables] Where [criteria] Group by [product] Having [criteria]
Join
Generally composed of an equality comparison between foreign key and primary key of related tables.
Different ways of creating joins (Venn Diagrams)
- Inner join (result is overlap)
- Left outer join (result is left)
- Right outer join (result is right)
- Full outer join (result is everything)
- Minus (result is left or right without overlap)
Trends in the database world
- From disk-based to in-memory databases (SAP HANA).
History: CPU is relatively fast, memory is very slow. -> the further away from the CPU, the slower.
Current/future: memory became cheaper, and larger. Therefor it is possible to store databases in-memory. However there is a technical movement -> instead of working in rows, they use columns, which makes it faster. - From local databases to cloud databases (Azure, Google Cloud etc.)
Data warehouse
A database that is maintained separately from the organization’s operational databases for the purpose of managerial decision-making.
Data warehousing
The process of constructing and using data warehouses.
Why do we need a separate data warehouse?
- Data quality: If you want to make use of the data, you want to have high data quality.
- Historical data: The problem is time-related. Mostly, databases in the back-end do not have much historical data.
- Data consolidation: You want to aggregate the data.
Components of a data warehouse framework
- Production
- ETL
- Data warehouse
- Data marts
- Meta data
- BI Applications
Production
Contains the different databases.
ETL
Extract, Transform, Load. Within the component, you want to clean, convert, and link the data to improve the data quality.
Data warehouse:
The data in this part is more ‘formed’ to support the information requirements for managers and decision-making. This data is cleaned, ordered and linked. -> Star and Snowflake modelling. This part does not have up-to-date data. There is a certain delay because of the ETL process.
Data Marts
A subset of a data warehouse. There are less details, history and dimensions. It is user/group oriented.