Data Warehouse Flashcards
What is a data warehouse?
It is a database designed for analytical needs.
Can be SQL, NoSQL.
It is consolidated data from multiple locations.
What is the very basic difference between a data warehouse and an operational database?
Data warehouse
- contains historical information which helps in analyzing business metrics
- data warehouse is used to mainly read data
- end users are business analysts/data analysts
Operational Database
- Contains current information that is required to run the business
- Database is mainly used to write the data
- End users are ops team members
What is data warehousing?
- Data warehousing is the act of organizing & storing data in a way so as to make its retrieval efficient and insightful.
- it is also called as the process of transforming data into information.
What is OLAP?
- OLAP (Online Analytical Processing) is a flexible way to make complicated analysis of multidimensional data.
- Data present in a data warehouse is accessed by running OLAP queries. DBs however, are queried by running OLTP (Online transaction processing) operations.
- OLAP activities are performed by converting the multi-dimensional data in a warehouse into an OLAP cube.
- OLAP data are de normalized.
What is an OLAP cube?
A multidimensional array of data with any more than 3 dimensions.
What is OLTP?
- OLTP stands for online transaction processing.
- OLTP queries are used to perform DML operations within a database. e.g. INSERT, UPDATE, DELETE
- OLTP uses data stored in the form of two dimensional tables by rows and columns
How is OLAP different from OLTP?
OLAP is used to perform real-time analysis on a database where the data is de-normalized. OLAP data is archived and historical. Queries for OLAP are slow and complex, involving joins and aggregations. Used for READ or SELECT operations.
OLTP is designed to be fast query processing, maintained in multi-access environments and effectiveness is designed at number of transactions per second. Used specifically for INSERT, DELETE, and UPDATE operations.
What is a dimension table?
- a dimension table is a structure that categorizes facts and measures in order to answer a business question
- dividing a data warehouse project into dimensions provide structured information for analysis and reporting.
What is a fact table?
- a table which contains a measure of the dimensions in a dimension table.
- the fact table contains the measures of the dimension table and the dimension table contains the facts derived from the fact table using aggregations, etc.
What is the level of granularity within a fact table?
- the depth of the data is known as data granularity
- a fact table is designed to be a low level of granularity
- for example, date dimension can be day, month, year within a fact table.
What is the difference between additive, semi-additive, and non-additive facts?
- an additive fact is a measure in a fact table that can be fully summed across any of the dimensions associated with it. An example of additive fact are sales purchased from a store. You can add date information to get sales / (hour or week or month). Likewise, you can get sales data per region.
- a semi-additive fact is a measure in a fact table that can be summed across some dimensions associated with it, but not all. For example, a checking account. It doesn’t make sense to add the balance amounts from October, November, and December.
- a non-additive fact is measure in a fact table that cannot be summed across any of the dimensions associated with it.
What is a factless fact table
- factless fact tables are the fact tables which do not contain numeric fact column in the fact table.
- essentially a factless table does not have any measures. it is an intersection of dimensions which contains only keys. In other words, a many-to-many table.
What are conformed dimensions?
- a conformed dimension is a single dimension composed of multiple dimensions which can be shared across multiple data marts. For example, the time, product, and staff dimensions are shared by the sales and inventory data marts.
-
What are conformed facts?
- a conformed fact is a shared fact which is used in more than one fact table and can be shared among multiple data marts.
What are aggregate tables?
- Aggregate tables are tables which contain the existing warehouse data, grouped to certain levels of dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.
- this table reduces the load in the database server and increases the performance of the query.
What is summary information?
- an area in the data warehouse where predefined aggregations are kept.
What is ETL?
- ETL stands for extract, transform, and load.
- It is the process of using a software to extract the desired data from various sources, then transform that data by using rules and lookup tables to meet your requirements, then loading it into a target data warehouse.
What are the tools available for ETL?
Informatica Powercenter, SQL server integration services (SSIS), SAP data services, data migrator (IBI), IBM infosphere information server, SAS data management, data junction, Oracle warehouse builder.
What is a data mart?
- a smaller version of the data warehouse which deals with a single subject.
- data marts are focused on one area. Hence, they draw data from a limited number of sources.
- time taken to build data marts is very less compared to the time taken to build a data warehouse.
- for example, a data warehouse can be subdivided into sales, marketing, and operations data within the data mart.
What is metadata?
- defined as data about data.
- Metadata in a data warehouse defines the source data i.e. flat file, relational database, and other objects.
- metadata is used to define which table is source and target, and which concept is used to build business logic called transformation to the actual output.
What is data mining?
Data mining is the process of analyzing data in different dimensions & summarizing it into useful info. Data is searched, retrieved and analyzed from a data warehouse to answer business questions.
What’s the difference between a data warehouse and data mining?
Data warehouse is about storing analytical data in a structure suitable for data mining. This analytical data is extracted from operational systems usually on a daily basis.
List the types of OLAP servers
Multi-dimensional OLAP: MOLAP is a form of OLAP that processes and stores the data directly into a multidimensional database. Benefit is, it can perform complex calculations b ut, only limited data can be handled.
- Relational OLAP: ROLAP is a form of OLAP that performs analysis of multidimensional data stored in a relational database rather than a multidimensional database. Greater amount of data can be processed but it requires more processing time/disk space.
- Hybrid OLAP: HOLAP is a combination of advantages of MOLAP and ROLAP. HOLAP can drill through from the cube into underlying relational data.
Which one is faster, multidimensional or relational OLAP?
Multidimensional OLAP is faster than relational OLAP.