data warehouse Flashcards

1
Q

What are the three levels of decision making in a company and how do they impact the underlying data infrastructure?

A

Operational Level
Tactical Level
Strategic Level

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Operational Level:

A

Primary Focus: Handle simple data operations, such as INSERT, UPDATE, DELETE, and SELECT statements.
Key Performance Indicators (KPIs): Transaction throughput and response time.
Role in Decision Making:
Data Processing: Perform real-time processing and support day-to-day business operations.
Automation: Facilitate the automation of routine tasks and ensure that transactions are processed efficiently.
Examples: Point of Sale (POS) systems, Online Transaction Processing (OLTP) systems

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Tactical Level:

A

Primary Focus: Provide information to support mid-term decision-making.
Key Features:
Complex Queries: Support complex ad-hoc SELECT statements for in-depth analysis.
Data Summarization: Offer various levels of data aggregation to present summarized information.
Multidimensional Data Analysis: Allow users to explore data from different perspectives (e.g., OLAP systems).
Role in Decision Making:
Trend Analysis: Detect and analyze trends and patterns within the data.
Scenario Analysis: Evaluate different scenarios and their potential outcomes to make informed decisions.
Examples: Management Information Systems (MIS), Business Intelligence (BI) tools.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Strategic Level:

A

Primary Focus: Aid in long-term decision-making and strategic planning.
Key Features:
Data Integration: Integrate data from various sources, providing a comprehensive view of the business.
Data Warehousing: Utilize data warehouses to store and manage large volumes of historical data.
Advanced Data Analysis: Support complex analyses, such as predictive modeling, forecasting, and time-series analysis.
Role in Decision Making:
Strategic Planning: Assist in formulating and evaluating strategic initiatives.
Long-term Trends: Analyze long-term trends and patterns to guide future business strategies.
Examples: Executive Information Systems (EIS), Enterprise Resource Planning (ERP) systems, Data Warehouses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

characteristics of
Operational and strategic levels

A

Operational and strategic levels have different decision-making horizons and require distinct data infrastructure support.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a data warehouse and how does it differ from traditional databases?

A

A data warehouse is a centralized repository of integrated data from multiple sources, designed to support business analysis and decision-making.

Key differences from traditional databases:

Purpose: While traditional databases are optimized for transactional processing, data warehouses are focused on analytical reporting and decision support.
Schema: Data warehouses typically have a star or snowflake schema, optimized for querying large datasets, whereas traditional databases often have a normalized schema for efficient updates.
Data: Data warehouses often contain historical data, while traditional databases focus on current data.
Access: Data warehouses are typically accessed by analysts and decision-makers, while traditional databases are used by operational systems

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

characteristics of data warehouse

A

Subject-Oriented:

Integrated:

non volatile
time variant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

subject oriented data warehouse

A

Subject-Oriented:

Focuses on a specific business subject (e.g., customers, products, sales).
Data is organized around these subjects, making it easier to analyze and understand.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

integerated in data warehouse

A

Integrated:

Combines data from various sources (e.g., different departments, systems).
Ensures a consistent view of the data across the organization.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

non valatie in data warehouse

A

Data is primarily read-only.
This means that once data is loaded into the warehouse, it’s typically not updated in place.
data loading and data retrieval

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

time variant in Datawarehouse

A

Captures data over time as a series of periodic snapshots.
time series of periodic snapshots

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

what is the primary role of decision support systems

A

Decision Support Systems (DSS) since their primary aim is to provide information to support decisions either in the mid or long term. Those DSSs require other types of data manipulation than traditional operational systems.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

To successfully merge and consolidate all this data, the data warehouse needs to ensure that all data is named, transformed and represented in a similar way.

A

data integration

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

time variant in operational data

A

Operational data is always up-to-date and represents the most recent state of the data elements, whereas a data warehouse is not necessarily up-to-date but represents the state at some specific moments in time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Day-to-Day Operations: Designed to handle day-to-day business activities and operations.

A

transactional system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Real-Time Processing: Works with real-time data to support immediate business needs.

A

transactional system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Application-Oriented: The design is oriented around specific applications or business processes.

A

transactional system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Decision Support: Focuses on supporting tactical and strategic decision-making processes.

A

Datawarehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Historical Data Analysis: Utilizes periodic snapshots, including historical data, to provide a broader context for analysis.

A

Datawarehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Subject-Oriented: The design is centered around key subjects or areas of interest (e.g., sales, finance, customer data).

A

data warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Data Latency in transactional systems and Datawarehouse

A

Transactional System:
Real-Time Data: Operates with real-time data, ensuring that the most current information is available for immediate operations.
Data Warehouse:
Periodic Snapshots: Works with data that is updated periodically, reflecting the state of the business at specific intervals over time.
Historical Data: Includes historical snapshots to support trend analysis and long-term planning.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are the primary methods for representing conceptual data models for transactional databases?

A

ER (Entity-Relationship) and UML (Unified Modeling Language) are the two most common methods.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Why are data warehouses typically designed at the logical level?

A

The main goal of data warehousing is to provide a foundation for decision-making., implementation

24
Q

What is a star schema in data warehousing?

A

A star schema has a central fact table connected to multiple dimension tables, storing measurement data and reference attributes for analysis.

25
Q

When would you use a snowflake schema instead of a star schema?

A

A snowflake schema is used when dimension tables are large, requiring more efficient storage, or when normalized data is preferred.

26
Q

What is a fact constellation schema?

A

A fact constellation schema, also known as a galaxy schema, includes multiple fact tables sharing dimension tables.

27
Q

Why are surrogate keys used in data warehouses instead of business keys?

A

Surrogate keys are used for consistency, better performance, and to shield the data warehouse from changes in the operational environment.

28
Q

What does the granularity of a fact table refer to?

A

Granularity refers to the level of detail in the fact table. Higher granularity means more detailed data; lower granularity means more aggregated data.

29
Q

What is a factless fact table?

A

A factless fact table contains only foreign keys and no measurement data, used primarily to track events or relationships.

30
Q

What is the ETL process in data warehousing?

A

he ETL process stands for Extract, Transform, Load. It involves:

Extracting data from various operational sources (which can be outdated or poorly documented).
Transforming the data to fit the data warehouse schema by formatting, cleansing, merging, and enriching it.
Loading the data into the data warehouse, filling the fact and dimension tables.

31
Q

what is extracting ?

A

Extracting data from various operational sources (which can be outdated or poorly documented).

32
Q

what is transforming ?

A

Transforming the data to fit the data warehouse schema by formatting, cleansing, merging, and enriching it.

33
Q

what is loading ?

A

Loading the data into the data warehouse, filling the fact and dimension tables.

34
Q

what is cdc in etl

A

CDC stands for Changed Data Capture. It’s a technique used in data warehousing and ETL (Extract, Transform, Load) processes to efficiently extract only the data that has changed since the last extraction. This approach can significantly improve performance and reduce the amount of data transferred.

35
Q

What is a data mart and why is it used?

A

A data mart is a smaller, focused version of a data warehouse designed to meet the needs of a specific group of users, like a department (e.g., marketing, finance).

36
Q

dependant data mart

A

pull data directly from central data warehouse

37
Q

independent data mart

A

standalone systems drawing data directly from
operational systems, external sources or combination

38
Q

why using virtualization

A

A disadvantage of a physical data warehouse or data mart is that both consume physical storage and must be updated periodically. Hence, they never contain the most recent version of the data.

39
Q

Describe an Operational Data Store (ODS).

A

Provides query facilities.
Is suitable for analysis tools requiring near real-time data.
Is used for less complex analyses, while more complex ones are performed on the data warehouse.

40
Q

Business intelligence (BI)

A

set of activities, techniques and tools aimed at
understanding patterns in past data and predicting future

41
Q

Types of BI

A

verification-based BI
discovery-oriented BI

42
Q

Query and reporting

A

business user can graphically and interactively design
query and corresponding report
self service BI

43
Q

Query by Example (QBE)

A

a query is composed in a user-friendly and visual way
– report can be refreshed at any time

44
Q

olap

A

OLAP allows you interactively analyze data, summarize
it and visualize it in various ways
* Provide business-user with tool for ad-hoc
querying

45
Q

molap

A

Multidimensional OLAP (MOLAP)
– stores multidimensional data using a Multidimensional DBMS (MDBMS)
whereby data is stored in multi-dimensional array-based data structure
optimized for efficient storage and quick access

46
Q

molap challenges

A

Challenges include high storage needs, poor scalability with increasing dimensions, proprietary data structures, inefficient updates, and lack of portability.

47
Q

rolap vs molap

A

MOLAP excels in speed and efficiency for complex queries with precomputed data, but it is less scalable and more rigid.
ROLAP offers better scalability, flexibility, and ease of integration with existing relational databases but may have slower query performance due to on-the-fly computations.

48
Q

rolap

A

Relational OLAP (ROLAP)
– stores data in relational data warehouse, which can be
implemented using star, snowflake or fact constellation
schema

49
Q

holap

A

HOLAP is designed to take advantage of the high performance of MOLAP and the scalability and flexibility of ROLAP.

50
Q

rollup

A

aggregating set of fact values within or across one or more dimensions

51
Q

types of rollup

A

Hierarchical Roll-up
dimensional roll-up

52
Q

Hierarchical Roll-up

A

Hierarchical Roll-up: Aggregates data by moving up a hierarchy within a single dimension, like going from day to week to month.

53
Q

roll down

A

The reverse of roll-up, it breaks down aggregated data into more detailed levels.

54
Q

Drill-across

A

Drill-across
– information from 2 or more connected fact tables is accessed

55
Q

Slicing

A

one of dimensions is set at a particular value

56
Q

Dicing

A

Dicing
– range selection on one or more dimensions