Data Warehouse Flashcards
What is a data warehouse?
It is a database designed for analytical needs.
Can be SQL, NoSQL.
It is consolidated data from multiple locations.
What is the very basic difference between a data warehouse and an operational database?
Data warehouse
- contains historical information which helps in analyzing business metrics
- data warehouse is used to mainly read data
- end users are business analysts/data analysts
Operational Database
- Contains current information that is required to run the business
- Database is mainly used to write the data
- End users are ops team members
What is data warehousing?
- Data warehousing is the act of organizing & storing data in a way so as to make its retrieval efficient and insightful.
- it is also called as the process of transforming data into information.
What is OLAP?
- OLAP (Online Analytical Processing) is a flexible way to make complicated analysis of multidimensional data.
- Data present in a data warehouse is accessed by running OLAP queries. DBs however, are queried by running OLTP (Online transaction processing) operations.
- OLAP activities are performed by converting the multi-dimensional data in a warehouse into an OLAP cube.
- OLAP data are de normalized.
What is an OLAP cube?
A multidimensional array of data with any more than 3 dimensions.
What is OLTP?
- OLTP stands for online transaction processing.
- OLTP queries are used to perform DML operations within a database. e.g. INSERT, UPDATE, DELETE
- OLTP uses data stored in the form of two dimensional tables by rows and columns
How is OLAP different from OLTP?
OLAP is used to perform real-time analysis on a database where the data is de-normalized. OLAP data is archived and historical. Queries for OLAP are slow and complex, involving joins and aggregations. Used for READ or SELECT operations.
OLTP is designed to be fast query processing, maintained in multi-access environments and effectiveness is designed at number of transactions per second. Used specifically for INSERT, DELETE, and UPDATE operations.
What is a dimension table?
- a dimension table is a structure that categorizes facts and measures in order to answer a business question
- dividing a data warehouse project into dimensions provide structured information for analysis and reporting.
What is a fact table?
- a table which contains a measure of the dimensions in a dimension table.
- the fact table contains the measures of the dimension table and the dimension table contains the facts derived from the fact table using aggregations, etc.
What is the level of granularity within a fact table?
- the depth of the data is known as data granularity
- a fact table is designed to be a low level of granularity
- for example, date dimension can be day, month, year within a fact table.
What is the difference between additive, semi-additive, and non-additive facts?
- an additive fact is a measure in a fact table that can be fully summed across any of the dimensions associated with it. An example of additive fact are sales purchased from a store. You can add date information to get sales / (hour or week or month). Likewise, you can get sales data per region.
- a semi-additive fact is a measure in a fact table that can be summed across some dimensions associated with it, but not all. For example, a checking account. It doesn’t make sense to add the balance amounts from October, November, and December.
- a non-additive fact is measure in a fact table that cannot be summed across any of the dimensions associated with it.
What is a factless fact table
- factless fact tables are the fact tables which do not contain numeric fact column in the fact table.
- essentially a factless table does not have any measures. it is an intersection of dimensions which contains only keys. In other words, a many-to-many table.
What are conformed dimensions?
- a conformed dimension is a single dimension composed of multiple dimensions which can be shared across multiple data marts. For example, the time, product, and staff dimensions are shared by the sales and inventory data marts.
-
What are conformed facts?
- a conformed fact is a shared fact which is used in more than one fact table and can be shared among multiple data marts.
What are aggregate tables?
- Aggregate tables are tables which contain the existing warehouse data, grouped to certain levels of dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.
- this table reduces the load in the database server and increases the performance of the query.
What is summary information?
- an area in the data warehouse where predefined aggregations are kept.