Data Warehouse Flashcards

You may prefer our related Brainscape-certified flashcards:
1
Q

What is a data warehouse?

A

It is a database designed for analytical needs.

Can be SQL, NoSQL.

It is consolidated data from multiple locations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the very basic difference between a data warehouse and an operational database?

A

Data warehouse
- contains historical information which helps in analyzing business metrics

  • data warehouse is used to mainly read data
  • end users are business analysts/data analysts

Operational Database
- Contains current information that is required to run the business

  • Database is mainly used to write the data
  • End users are ops team members
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is data warehousing?

A
  • Data warehousing is the act of organizing & storing data in a way so as to make its retrieval efficient and insightful.
  • it is also called as the process of transforming data into information.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is OLAP?

A
  • OLAP (Online Analytical Processing) is a flexible way to make complicated analysis of multidimensional data.
  • Data present in a data warehouse is accessed by running OLAP queries. DBs however, are queried by running OLTP (Online transaction processing) operations.
  • OLAP activities are performed by converting the multi-dimensional data in a warehouse into an OLAP cube.
  • OLAP data are de normalized.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is an OLAP cube?

A

A multidimensional array of data with any more than 3 dimensions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is OLTP?

A
  • OLTP stands for online transaction processing.
  • OLTP queries are used to perform DML operations within a database. e.g. INSERT, UPDATE, DELETE
  • OLTP uses data stored in the form of two dimensional tables by rows and columns
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How is OLAP different from OLTP?

A

OLAP is used to perform real-time analysis on a database where the data is de-normalized. OLAP data is archived and historical. Queries for OLAP are slow and complex, involving joins and aggregations. Used for READ or SELECT operations.

OLTP is designed to be fast query processing, maintained in multi-access environments and effectiveness is designed at number of transactions per second. Used specifically for INSERT, DELETE, and UPDATE operations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a dimension table?

A
  • a dimension table is a structure that categorizes facts and measures in order to answer a business question
  • dividing a data warehouse project into dimensions provide structured information for analysis and reporting.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a fact table?

A
  • a table which contains a measure of the dimensions in a dimension table.
  • the fact table contains the measures of the dimension table and the dimension table contains the facts derived from the fact table using aggregations, etc.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the level of granularity within a fact table?

A
  • the depth of the data is known as data granularity
  • a fact table is designed to be a low level of granularity
  • for example, date dimension can be day, month, year within a fact table.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is the difference between additive, semi-additive, and non-additive facts?

A
  • an additive fact is a measure in a fact table that can be fully summed across any of the dimensions associated with it. An example of additive fact are sales purchased from a store. You can add date information to get sales / (hour or week or month). Likewise, you can get sales data per region.
  • a semi-additive fact is a measure in a fact table that can be summed across some dimensions associated with it, but not all. For example, a checking account. It doesn’t make sense to add the balance amounts from October, November, and December.
  • a non-additive fact is measure in a fact table that cannot be summed across any of the dimensions associated with it.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a factless fact table

A
  • factless fact tables are the fact tables which do not contain numeric fact column in the fact table.
  • essentially a factless table does not have any measures. it is an intersection of dimensions which contains only keys. In other words, a many-to-many table.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are conformed dimensions?

A
  • a conformed dimension is a single dimension composed of multiple dimensions which can be shared across multiple data marts. For example, the time, product, and staff dimensions are shared by the sales and inventory data marts.

-

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are conformed facts?

A
  • a conformed fact is a shared fact which is used in more than one fact table and can be shared among multiple data marts.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are aggregate tables?

A
  • Aggregate tables are tables which contain the existing warehouse data, grouped to certain levels of dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.
  • this table reduces the load in the database server and increases the performance of the query.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is summary information?

A
  • an area in the data warehouse where predefined aggregations are kept.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is ETL?

A
  • ETL stands for extract, transform, and load.
  • It is the process of using a software to extract the desired data from various sources, then transform that data by using rules and lookup tables to meet your requirements, then loading it into a target data warehouse.
18
Q

What are the tools available for ETL?

A

Informatica Powercenter, SQL server integration services (SSIS), SAP data services, data migrator (IBI), IBM infosphere information server, SAS data management, data junction, Oracle warehouse builder.

19
Q

What is a data mart?

A
  • a smaller version of the data warehouse which deals with a single subject.
  • data marts are focused on one area. Hence, they draw data from a limited number of sources.
  • time taken to build data marts is very less compared to the time taken to build a data warehouse.
  • for example, a data warehouse can be subdivided into sales, marketing, and operations data within the data mart.
20
Q

What is metadata?

A
  • defined as data about data.
  • Metadata in a data warehouse defines the source data i.e. flat file, relational database, and other objects.
  • metadata is used to define which table is source and target, and which concept is used to build business logic called transformation to the actual output.
21
Q

What is data mining?

A

Data mining is the process of analyzing data in different dimensions & summarizing it into useful info. Data is searched, retrieved and analyzed from a data warehouse to answer business questions.

22
Q

What’s the difference between a data warehouse and data mining?

A

Data warehouse is about storing analytical data in a structure suitable for data mining. This analytical data is extracted from operational systems usually on a daily basis.

23
Q

List the types of OLAP servers

A

Multi-dimensional OLAP: MOLAP is a form of OLAP that processes and stores the data directly into a multidimensional database. Benefit is, it can perform complex calculations b ut, only limited data can be handled.

  • Relational OLAP: ROLAP is a form of OLAP that performs analysis of multidimensional data stored in a relational database rather than a multidimensional database. Greater amount of data can be processed but it requires more processing time/disk space.
  • Hybrid OLAP: HOLAP is a combination of advantages of MOLAP and ROLAP. HOLAP can drill through from the cube into underlying relational data.
24
Q

Which one is faster, multidimensional or relational OLAP?

A

Multidimensional OLAP is faster than relational OLAP.

25
Q

What are the operations that can be performed by an OLAP cube?

A
  • Roll-up: performs aggregation on a data cube by either
    1. climbing up a concept hierarchy for a dimension
    2. dimension reduction

roll up is an aggregation on top of an aggregation. For example, you can aggregate products sales by product which is an aggregate function. A roll up aggregate function can summarize total sales of the products.

  • Pivot: a rotation operation which transposes the axes in order to provide an alternative presentation of data.
26
Q

How many dimensions are selected in a slice operation?

A

Only one dimension is selected for the slice operation.

27
Q

How many dimensions are selected in a dice operation?

A

two or more dimensions are selected for a given cube.

28
Q

What is normalization?

A
  • Normalization is the process of splitting up the data into multiple tables.
  • different normalized forms of the data are 1NF, 2NF, and 3NF. The process converting data into their normal forms is called normalizing.
29
Q

What’s the benefit of normalization?

A

it helps in reducing data redundancy.

30
Q

What is an ER diagram?

A

Entity-relationship diagram. Illustrates the interrelationships between the various entities in a database.

31
Q

is an ER diagram implemented in a data warehouse?

A

No, ER diagram is not implemented in a data warehouse. We make use of dimensional modeling in a data warehouse.

32
Q

What is called as dimensional modeling?

A
  • dimensional modelling is a concept which can be used by data warehouse deigners to build thier own data warehouse. This model can be stored in two types of tables - Facts and dimension table. Fact tables has the facts and measurements of the business and dimension tables contain the context of measurements.
  • Types of dimensional modelling are:

1) conceptual modeling
2) logical modeling
3) physical modeling

33
Q

What is SCD?

A
  • slowly changing dimensions is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
34
Q

What are the three types of SCDs?

A
  • 1) Type 1, Overwriting; Type 1 SCD the new data overwrites the existing data. The existing data is lost as it is not stored anywhere else.
  • 2) Type 2 SCD - Creating another dimension record; retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record. Each record contains the effective time and expiration time to identify the time period between which the record was active.
  • 3) Type 3 SCD - Creating a current value field; Type 3 SCD stores two versions of values for certain selected level attributes. Each record stores the previous value and the current value of the selected attribute. When the value of any of the selected attribute changes, the current value is stored as the old value and the new value becomes the current value.
35
Q

What’s type 1 SCD?

A
  • 1) Type 1, Overwriting; Type 1 SCD the new data overwrites the existing data. The existing data is lost as it is not stored anywhere else.
36
Q

What’s type 2 SCD?

A

Type 2 SCD - Creating another dimension record; retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record. Each record contains the effective time and expiration time to identify the time period between which the record was active.

37
Q

What’s type 3 SCD?

A

Type 3 SCD - Creating a current value field; Type 3 SCD stores two versions of values for certain selected level attributes. Each record stores the previous value and the current value of the selected attribute. When the value of any of the selected attribute changes, the current value is stored as the old value and the new value becomes the current value.

38
Q

What are the key columns in a dimension tables?

A
  • The key column in dimension tables are the primary keys of entity tables.
39
Q

What are the key columns in a fact table?

A
  • the key column of a fact table are the foreign keys of entity tables.
40
Q

Which schemas do data warehouses implement?

A
  • each dimension in a star schema is represented with a one-dimensional table which contains a set of attributes