DBMS Flashcards
What is OLTP and OLAP?
OLTP stands for Online Transaction Processing. It is an online transactional system that tracks the day-to-day transactions of an organisation, typically in banking, sales and trading. It manages database modifications.
OLAP stands for Online Analytical Processing. It is an online system for data retrieval and analysis.
Compare the focus of OLTP and OLAP
OLTP focuses on INSERT, UPDATE and DELETE operations. OLAP focuses on extracting data and analysing it for decision making.
Compare data sources for OLTP and OLAP
The data source for OLTP is the system's own operational transactions.
The data sources for OLAP are various OLTP databases.
Compare transaction length between OLTP and OLAP
OLTP has short transactions.
OLAP has long transactions.
Compare queries in OLTP and OLAP
Queries in OLTP are simpler
Queries in OLAP are more complex
How is data in OLTP and OLAP normalised?
Data in OLTP is normalised to 3NF.
Data in OLAP is typically denormalised.
What is the difference regarding integrity constraints between OLTP and OLAP?
Data in OLTP is modified frequently, so integrity constraints must be maintained continuously.
Data in OLAP is rarely modified, so data integrity is less of a concern.
How does OLTP and OLAP store data?
OLTP: traditional DBMS
OLAP: Data warehouse
Who uses OLTP and OLAP?
OLTP: Clerks and IT Professionals
OLAP: Knowledge workers
Compare the number of users of OLTP and OLAP
OLTP: Thousands of users
OLAP: Hundreds of users
What is a data warehouse?
A data warehouse is a centralised repository designed to store, organise and manage large amounts of structured data from various sources. It is built to support complex querying and analysis, which are essential for business intelligence activities, decision-making and reporting. The warehouse is structured so that data from different domains, such as sales, marketing, HR or finance, can be correlated and analysed together.
Key features:
Integration of Data: It integrates data from various sources and formats, ensuring consistency across different data types and systems.
Orthogonal Data Dimensions: It provides a multi-dimensional view of data, which includes dimensions like time (historical and current data) and subject areas (like sales and HR).
Objective: The primary objective is to facilitate reporting and data analysis, providing support for OLAP operations and helping organizations to make data-driven decisions.
Business Intelligence: It is a key component of business intelligence frameworks, offering a comprehensive view of an organization’s data.
Architecture: The architecture of a data warehouse often includes an operational layer of source systems, an integration layer where ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes and data cleansing occur, and the data warehouse itself where the processed data is stored.
Compare DBMS and Data Warehouse
DBMS is tuned for OLTP
Data Warehouse is tuned for OLAP
DBMS is used to run a business
Data Warehouse is used to optimise a business
What are the two forms of data loading, a key topic in data warehousing?
ELT and ETL
What is ETL?
ETL stands for Extract, Transform and Load. It is a process to move raw data from one or more sources into a data warehouse, where it can be stored, queried and analysed.
Extract: The first step is to pull the data from different sources. The data extracted can be structured or unstructured and may come in different formats.
Transform: The next step is where data is cleansed, enriched and transformed into a suitable format for analysis.
Data cleansing to correct or remove corrupt or inaccurate records.
Data mapping to ensure that data from one source fits into the destination.
Data conversion for standardizing formats, such as dates and numerical values.
Joining or splitting data fields, such as combining first and last names into a full name field or vice versa.
Aggregation or summarization of data, such as calculating total sales.
Load: The last step is to load the transformed data into the data warehouse.
This ETL process is critical to make sure we have a unified system where data is in a suitable format for comprehensive analysis and Business Intelligence.
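The three steps above can be sketched in a minimal pipeline. This is an illustrative example only: the table names, columns and sample rows are invented for the sake of the sketch, and SQLite stands in for both the source system and the warehouse.

```python
import sqlite3

# Hypothetical source system with raw, messy data (names split,
# amounts stored as padded strings).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (first TEXT, last TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Ada", "Lovelace", " 120.50 "), ("Alan", "Turing", "99")],
)

# Extract: pull the raw rows from the source.
rows = source.execute("SELECT first, last, amount FROM sales").fetchall()

# Transform: join the name fields into one full name and convert the
# amount strings into a standard numeric format.
clean = [(f"{first} {last}", float(amount.strip()))
         for first, last, amount in rows]

# Load: write the transformed rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", clean)

total = warehouse.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
```

Note how the transform step performs both data conversion (standardising amounts) and field joining (combining first and last names), two of the transformations listed above.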
What are the 3 layers involved in data warehousing?
- Staging Layer: This is where we store raw data from the different sources. It is a temporary storage area used for data processing during the ETL process.
- Integration Layer: This layer is responsible for combining data from the staging layer and organising it into a structure suitable for analysis and querying in the data warehouse. Data is organised into hierarchical groups, or dimensions, and facts (quantitative data) are separated from these dimensions. The schema used in a data warehouse typically includes facts and dimensions, which are combined to build star or snowflake schemas that organise data into a multidimensional database structure.
- Access Layer: This layer is the front end that users interact with to retrieve data from the data warehouse. It includes tools for reporting, querying and data analysis.
What are challenges faced during data warehousing?
Data volume (long processing times, load spikes), dirty data, and synchronisation across sources.
What are the 4 main categories of NoSQL?
- Key Value Database: Redis
- Document Database: MongoDB
- Columnar Database: Cassandra
- Graph Database: Neo4j
What is key-value database?
Key-Value Database:
It represents the simplest form of NoSQL databases.
The core structure is a key-value store, which functions like a simple hash table.
Access to the data within the database is exclusively through keys.
Basic operations include:
get: Retrieve the value associated with a key.
put: Set the value for a key.
delete: Remove a key from the database.
It does not support complex query filters, and operations like joins are typically handled outside the database, such as in application code written in languages like Python.
Key-value stores are praised for their efficiency due to their simplicity.
They are designed to be highly distributable, making them suitable for deployment across various systems and locations.
Key-value databases are particularly well-suited for scenarios where quick read and write access to data is required without the overhead of complex data modeling. They are commonly used for session storage, caching, and situations where the dataset can be easily partitioned.
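The get/put/delete interface above can be sketched as a toy in-memory store. This is a thin wrapper over a plain hash table, purely to illustrate the access model; real key-value databases add persistence, replication and distribution on top of this core.

```python
class KeyValueStore:
    """Toy key-value store: access to data is exclusively through keys."""

    def __init__(self):
        self._data = {}  # the underlying hash table

    def put(self, key, value):
        """Set the value for a key."""
        self._data[key] = value

    def get(self, key):
        """Retrieve the value associated with a key (None if absent)."""
        return self._data.get(key)

    def delete(self, key):
        """Remove a key from the store."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book"]})
value = store.get("session:42")
store.delete("session:42")
```

Note that there is no query filter or join: anything beyond key lookup (e.g. "find all sessions for alice") must be handled in application code, exactly as described above.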
What are unique features of Redis?
- Operations in Redis are performed in memory, which is significantly faster than disk-based storage. Redis also offers data persistency features, which means it can save the updated database to disk at specified intervals to prevent data loss.
- Redis supports rich data structures beyond simple key-value pairs like lists, hashes, sets, bitmaps
- There are many bindings available to many programming languages making Redis available from a wide range of systems. Examples are Python, Java, C, C#
How is data storage done in documents?
They store data in formats like JSON, Binary JSON (BSON), or XML. These are flexible, human-readable formats widely used for representing complex and nested data structures on the web.
What is a Document in Document Database?
Document databases do not have a fixed schema like relational databases do. Documents are effectively schema-less: the structure of a document can vary from one to the next.
What are the features of documents?
Nesting: Documents can be nested with complex structures and can also be indexed, which improves search performance.
Faster: They provide potentially faster operations due to less translation overhead compared to converting data into tabular forms as in SQL databases.
Simplicity: They offer more straightforward data assembling and disassembling, meaning data can be accessed and manipulated without the need for complex joins or transactions.
Flexibility: They offer flexibility with changeable data structures, accommodating evolving data models without requiring a predefined schema
In summary, documents in a document database offer a more flexible and dynamic approach to data storage, accommodating varied and evolving data structures without the constraints of a fixed schema. This makes them suitable for applications where data requirements are non-uniform and change frequently.
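A minimal sketch of the ideas above, using an in-memory dict as a stand-in for a document collection (the document ids and fields are invented for illustration). Two documents with different shapes coexist without any schema, and nested documents round-trip through JSON with no tabular mapping.

```python
import json

# Toy document "collection": documents keyed by id, no fixed schema.
collection = {}

def insert(doc_id, document):
    collection[doc_id] = document

# Two documents with different structures -- schema-less storage.
insert("u1", {"name": "Alice", "address": {"city": "Singapore"}})
insert("u2", {"name": "Bob", "tags": ["admin", "editor"]})

# Documents serialise and deserialise naturally as JSON; no translation
# into tabular form is needed, and nested fields stay nested.
restored = json.loads(json.dumps(collection["u1"]))
city = restored["address"]["city"]
```

Accessing `restored["address"]["city"]` directly shows the nesting feature: the hierarchy is part of the document itself rather than spread across joined tables.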
What are the benefits of Star Schema
- Denormalised: queries do not need many joins, as data comes from only a few tables.
- Simpler queries: simple join logic
- Read-optimised: denormalised data suits read-intensive workloads.
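The "simple join logic" point can be illustrated with a tiny star schema in SQLite (all table names, columns and values here are invented for the example): one fact table joined directly to each denormalised dimension, with no deeper chains of joins.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimensions: denormalised, one table per dimension.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                          name TEXT, category TEXT);
-- Fact table: quantitative data plus foreign keys to the dimensions.
CREATE TABLE fact_sales (date_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics');
INSERT INTO fact_sales VALUES (1, 10, 1200.0), (2, 10, 1500.0);
""")

# Simple join logic: the fact table joins one step out to each dimension.
row = db.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON f.date_id = d.date_id
    JOIN dim_product p ON f.product_id = p.product_id
    WHERE d.year = 2024
    GROUP BY d.year, p.category
""").fetchone()
```

In a snowflake schema, `dim_product` would itself be split (e.g. a separate category table), adding another level of joins to the same query.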
What are the disadvantages of Star Schema
- Data integrity is not enforced: normalisation helps ensure data integrity, but here the data is denormalised, so data loading must be tightly controlled to guard against inconsistencies.
- Not flexible
- Not suitable for complex analytics
What is a snowflake schema?
Its distinguishing feature is that dimensions can be linked to further, smaller dimensions, producing many hierarchical layers. It is like normalisation applied within a data warehouse: each dimension is decomposed into smaller tables.
What are the benefits of snowflake schema?
Storage savings and data integrity.