Data Warehouse Design - Chapter 1.3 Flashcards
What are 5 architecture properties that are essential for a data warehouse system?
- Separation
- Scalability
- Extensibility
- Security
- Administerability
What is meant by seperation?
Analytical and transactional processing should be kept apart as much as possible
What is meant by scalability?
Hardware and software architectures should be easy to upgrade as the data volume and number of users requirements increase
What is meant by extensibility?
It should be able to host new appliations and technologies without redesiging the whole system
What is meant by security?
Monitoring access is essential because of the strategic data stored in data warehouses
What is meant by administerability?
Management of the system should not be overly difficult
What are the characteristics of a single-layer data warehouse?
- There is only one physical layer of data, which is the source layer of operational data
- The data warehouse is virtual, created by middleware.
- It does not meet the requirement ‘seperation’
- Rarely used in real life
When can a single layer data warehouse be successful?
If analysis needs are particularly restricted and the data volume to analyze is huge.
What are the stages in the two-layer architecture?
- Source layer
- Data staging
- Data warehouse layer
- Analysis
–> So it has four stages, but two physical layers, namely the source layer and the data warehouse layer
What is the source layer?
Data that is originally stored to corporate relational databases or legacy databases
What is meant by data staging?
Here data that is stored in the physical layer is extracted, cleansed and integrated into one common scheme.
For this, ETL tools are used.
What are ETL Tools?
Extraction, Transformation and Loading tools - to merge, extract, transform, cleanse, alidate, filter and load source data into a data warehouse.
What is the data warehouse layer?
The place where the information is stored to one logically centralized repository.
This data warehouse can sometimes automatically be accesed or sometimes data marts are used.
What is a data mart?
A subset or an aggregation of the data stored to a primary data warehouse. It includes a set of information pieces relevant to a specific business area, deparment or group of users
What happes in the analysis stage?
Integrated data is efficiently and flexibly accessed to issue reports, analyze information and simulate business scenarios
What are three reasons data marts are useful?
- They can be used as building blocks while incrementally developing data warehouses;
- They mark out the information required by a specific group of users to solve queries;
- They can deliver better performance because they are smaller than primary data warehouses
Which two types of data marts are there?
- Dependent data marts - populated from a primary data warehouse
- Independent data marts - populated by the sources
(streamlines the design, but prone to inconsistencies between the different marts)
How can you avoid data inconsistencies between independent data marts?
By creating a primary data warehouse that is populated by the individual marts (instead of the other way around).
What are five advantages of the two-layer architecture?
- Good quality information is always available, even if access to the sources is unavailable.
- Analysis queries do not affect the management of transactions
- Data warehouses are logically structured according to the multi dimensional model
- OLAP system have historical and summarized data as well - so you avoid a mismatch in terms of time and level of detail
- Data warehouses can be designed specifically to optimize performance
What is the key characteristic of the three-layer architecture?
It has all the stages from the two-layer architecture, but in between the data staging and the data warehouse there is another layer called ‘reconciled data’.
-> So the data warehouse is populated by the reconciled data, not by the operational sources
What are the advantages of the added ‘reconciled data’ layer?
- It creates a common reference data model for a whole enterprise.
- Sometimes the reconciled layer can be used to better accomplish operational tasks (instead of using the operational sources)
Which main architectual principles are used for data warehouse systems?
- One layer system
- Two layer system
- Three layer system
Which main five types of system are distinguished in scientific literature that also include the beforementioned layers?
- Independent data marts architecture
- Bus architecture
- Hub-and-Spoke architecture
- Centralized architecture
- Federated architecture
What is the independent data mart architecture?
Different data marts are separately designed and build in a non-integrated fashion.
- > used when there is no desire for an enterprise-wide warehousing project or when divisions are only loosely related
- > Not the best data integration and cross-reporting
What is the bus architecture?
Similar to the independent data mart architecture, but the data marts are all designed according the same conformed dimensions to ensure logical integration and an enterprise-wide view of information.
What is the hub-and-spoke architecture?
The data marts are feeded by the ‘reconciled data’ layer. Users access the data marts, but sometimes also the reconciled layer.
- > Attention to scalability, extensibility & an enterprise-wide view.
- > Most used in medium to large companies.
What is the centralized architecture?
It is an implementation of the hub-and-spoke architecture, in which the reconciled layer and the data marts are collapsed into a single physical repository.
What is the federated architecture?
There are multiple data marts that are supplied by the operational data sources. These data marts are then either physically or virtually integrated with the others iin the logical-phyiscal integration phase.
-> Used in dynamix context where preexisting data warehouses / data marts need to be integrated. (think with a merger)
Which 4 factors are particularly influential when choosing one of these architectures?
- The amount of interdependent information exchanged between organizational units
- Restrictions on (human) capital resources could lead to a ‘quick’ architecture
- Enterprise strategy
- The need for integration of pre-existing data warehouses can lead to a federal system.