Databases Flashcards

1
Q

What is a transaction?

A

Transaction: A logical unit of work that must be entirely completed or aborted

2
Q

List and explain the transaction properties

A
  • Atomicity – Ensures a transaction is treated as a single, logical unit of work. Either all of its operations are completed, or none are. If any part of the transaction fails, the entire transaction is rolled back. (See the sketch after this list.)
  • Consistency - A transaction should bring the database from one consistent state to another. It means that the integrity constraints and business rules must be satisfied before and after the transaction.
  • Isolation - A database transaction property in which a data item used by one transaction is not available to other transactions until the first one ends.
  • Durability - The transaction property that ensures that once transaction changes are done and committed, they cannot be undone or lost, even in the event of a system failure.
  • Serializability - Means that a series of concurrent transactions will yield the same result as if they were executed serially (one after another).
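A minimal sketch of atomicity and durability using Python’s built-in sqlite3 module; the accounts table and transfer amounts are hypothetical, not from the source material:

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()

try:
    # Both updates form one logical unit of work: a funds transfer.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.commit()        # durability: once committed, the changes persist
except sqlite3.Error:
    conn.rollback()      # atomicity: on failure, neither update survives

print(conn.execute("SELECT * FROM accounts").fetchall())
```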
3
Q

Explain the function and use of a transaction log

A

Transaction log: Records all operations that modify the database, so the DBMS can redo committed work and roll back uncommitted transactions after a failure.
The transaction log therefore plays a crucial role in maintaining database concurrency control and integrity.

4
Q

What is concurrency control and what is its objective?

A

Concurrency control: Coordination of the simultaneous execution of transactions in a multiuser database system
Objective: Ensure serializability of transactions to guarantee data integrity and consistency in a database management system.

5
Q

List and explain data integrity and consistency problems (3)

A
  • Lost updates: Occurs when two concurrent transactions update the same data element and one of the updates is overwritten and lost (sketched below).
  • Uncommitted data: Occurs when two transactions execute concurrently and the first is rolled back after the second has already accessed its uncommitted data (a “dirty read”).
  • Inconsistent retrievals: Occurs when a transaction accesses data before and after one or more other transactions finish working with that data, so the values it retrieves are mutually inconsistent.
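To make the lost-update problem concrete, here is a toy interleaving in plain Python; the two “transactions” are just sequential reads and writes of one shared variable, not real DBMS transactions:

```python
# Hypothetical interleaving: T1 and T2 both read balance = 100,
# then each writes back a result computed from its own stale read.
balance = 100

t1_read = balance        # T1 reads 100
t2_read = balance        # T2 reads 100, before T1 writes

balance = t1_read + 20   # T1 writes 120
balance = t2_read - 10   # T2 writes 90, overwriting T1's update

print(balance)           # 90 -- T1's +20 update is lost
```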
6
Q

What are locking methods?

A

Locking methods: Involves placing locks on database objects (e.g., rows, tables) to prevent concurrent transactions from accessing or modifying the same data simultaneously.

7
Q

Explain different lock types (3)

A
  1. Binary lock: Has only two states, locked (1) and unlocked (0). If a data item is locked, no other transaction can use that data item.
  2. Exclusive lock: Issued when a transaction must update (WRITE) a data item. It is granted only when no other lock (shared or exclusive) is currently held on that data item by any other transaction.
  3. Shared lock: Issued when a transaction must read data from the database and no exclusive locks are held on the data to be read.
8
Q

Explain 2PL

A

2PL (two-phase locking): A set of rules that governs how transactions acquire and release locks so as to guarantee serializability, although it does not prevent deadlocks.

9
Q

Explain the 2PL phases

A
  1. Growing phase: The transaction acquires all required locks without unlocking any data.
  2. Shrinking phase: The transaction releases all its locks and cannot obtain any new lock (see the sketch below).
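A minimal sketch of the two phases, assuming a hypothetical lock table built from Python threading locks (a real DBMS lock manager is far more elaborate):

```python
import threading

# Hypothetical lock table: one lock per data item.
locks = {"X": threading.Lock(), "Y": threading.Lock()}

def transaction():
    # Growing phase: acquire every required lock; release nothing.
    locks["X"].acquire()
    locks["Y"].acquire()
    try:
        pass  # read/write data items X and Y here
    finally:
        # Shrinking phase: release locks; no new locks may be acquired.
        locks["Y"].release()
        locks["X"].release()

transaction()
```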
10
Q

What is the write-ahead log protocol?

A

Write-ahead log protocol: In concurrency control, a process that ensures transaction logs are written to permanent storage before any database data is actually updated (sketched below).
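A sketch of the write-ahead idea in plain Python; the file names and log-record format are invented for illustration:

```python
import os

def update_with_wal(log_path, data_path, new_value):
    # 1. Write the log record and force it to permanent storage FIRST.
    with open(log_path, "a") as log:
        log.write(f"UPDATE -> {new_value}\n")
        log.flush()
        os.fsync(log.fileno())  # log is durable before data changes
    # 2. Only then update the actual database data.
    with open(data_path, "w") as data:
        data.write(new_value)

update_with_wal("db.log", "db.dat", "42")
```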

11
Q

Discuss basic database performance tuning concepts

A

SQL performance tuning (client side): Writing SQL queries that return the correct answer in the least amount of time, using the minimum amount of resources at the server end.
DBMS performance tuning (server side): Properly configuring the DBMS environment to respond to clients’ requests as fast as possible, while making optimum use of existing resources.

12
Q

Explain how the DBMS processes SQL queries

A
  1. Parsing - The DBMS parses the SQL query and chooses the most efficient access/execution plan (see the sketch after this list).
  2. Execution - The DBMS executes the SQL query using the chosen execution plan.
  3. Fetching - The DBMS fetches the data and sends the result set back to the client.
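One way to inspect the plan chosen in the parsing step is to ask the DBMS for it. A sketch using SQLite’s EXPLAIN QUERY PLAN (other DBMSs use similar commands, e.g. EXPLAIN in MySQL and PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")

# Ask the DBMS which access/execution plan its optimizer chose.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE id = 1"
).fetchall()
print(plan)  # e.g. SEARCH customers USING INTEGER PRIMARY KEY (rowid=?)
```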
13
Q

Explain the role of indexes in speeding up data access

A

Indexes: Used to speed up data access by creating ordered sets of values that contain index keys and pointers to actual table rows.
Data sparsity: The number of different (unique) values a column can have; it helps determine whether a column is a good candidate for an index.
Low-sparsity columns (few unique values) – less suitable for indexing
High-sparsity columns (many unique values) – benefit from indexing (see the sketch below)
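A sketch of creating an index on a high-sparsity column with sqlite3 (the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

# customer_id has many distinct values (high sparsity), so an ordered
# set of index keys + row pointers can speed up lookups on it.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()
print(plan)  # the plan should now mention idx_orders_customer
```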

14
Q

Differentiate between centralized and distributed DBMS

A
  • Centralized DBMS: Data is stored in a single location, allowing for straightforward management and control.
  • Distributed DBMS (DDBMS): Data is distributed across multiple locations, which may enhance reliability and performance by reducing the risk of a single point of failure.
15
Q

What is a fully distributed DBMS?

A
  • A fully distributed system spreads both data and processing tasks across multiple sites, optimizing resource utilization and improving access speed for users.
16
Q

Differentiate between heterogeneous and homogeneous DDBMS

A
  • Homogeneous DDBMS: Integrates only one type of database management system (DBMS) across the network, ensuring consistency in data management practices.
  • Heterogeneous DDBMS: Integrates multiple types of DBMSs and platforms, allowing for diverse data management solutions within a single distributed framework.
17
Q

Differentiate between distributed processing and distributed database

A

Distributed processing: Sharing the logical processing of a database over 2+ sites connected by a network.
Distributed database: A logically related database that is stored in 2+ physically independent sites.

18
Q

What are DDBMS transparency features?

A

DDBMS transparency features: Hide the system’s complexities from the end user, so the end user can keep working as if the database were centralized, without having to be concerned about its distributed nature.

19
Q

List and explain transparency features (5)

A
  1. Distribution transparency: Allows a distributed DB to be treated as a single logical DB. (The user does not need to know how the data is partitioned, geographically dispersed, or replicated among sites).
  2. Transaction transparency: Allows a transaction to update data at more than one network site. (Ensures that a transaction will either be entirely completed or aborted to maintain DB integrity).
  3. Failure transparency: Ensures that the system will continue to operate in the event of a node or network failure.
  4. Performance transparency: Allows the system to perform as if it were a centralized DBMS. (It will not suffer performance degradation and will find the most cost-effective path to access remote data).
  5. Heterogeneity transparency: Allows the integration of several different local DBMSs (relational, network, hierarchical) under a common or global schema
20
Q

Discuss data fragmentation (definition and 3 strategies)

A
Data fragmentation: Breaking a single logical object (e.g., a table) into two or more fragments that can be stored at different network sites. Three strategies (a toy sketch follows the list):
  1. Horizontal Fragmentation: Divides a relation into subsets of tuples (rows). Each fragment contains unique rows and is stored at a different node.
  2. Vertical Fragmentation: Divides a relation into subsets of attributes (columns). Each fragment contains unique columns, except for the key column, and is stored at a different node.
  3. Mixed Fragmentation: Combines both horizontal and vertical fragmentation strategies.
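A toy illustration of horizontal and vertical fragmentation on an in-memory relation (plain Python; the customer rows are hypothetical):

```python
# Hypothetical relation: CUSTOMER(id, name, region, credit_limit)
customers = [
    {"id": 1, "name": "Ana", "region": "EU", "credit_limit": 500},
    {"id": 2, "name": "Ben", "region": "US", "credit_limit": 900},
]

# Horizontal fragmentation: subsets of rows, e.g. split by region,
# each fragment stored at a different node.
frag_eu = [r for r in customers if r["region"] == "EU"]
frag_us = [r for r in customers if r["region"] == "US"]

# Vertical fragmentation: subsets of columns; the key column (id)
# is repeated in every fragment so rows can be rejoined.
frag_names = [{"id": r["id"], "name": r["name"]} for r in customers]
frag_credit = [{"id": r["id"], "credit_limit": r["credit_limit"]} for r in customers]

# Mixed fragmentation would apply both splits in sequence.
print(frag_eu, frag_names)
```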
21
Q

Discuss data replication (definition and 3 strategies)

A
Data replication: Storing copies of data fragments (or whole databases) at multiple sites. Three strategies:
  1. Fully replicated databases: Store copies of all data at multiple sites, though this can be impractical due to system overhead.
  2. Partially replicated databases: Replicate only some data fragments at multiple sites, a more manageable approach.
  3. Unreplicated databases: Store each fragment at only one site, with no duplicates.
22
Q

Discuss data allocation (definition and 3 strategies)

A
Data allocation: Deciding where to locate (store) the data. Three strategies:
  1. Centralized data allocation: Entire database stored at one site
  2. Partitioned data allocation: Database is divided into two or more disjoint fragments and stored at two or more sites
  3. Replicated data allocation: Copies of one or more database fragments are stored at several sites
23
Q

Differentiate between operational data and decision support data (3 categories and differences)

A
  1. Time span: Operational data (OD) covers a short time frame, while decision support data (DSD) tends to cover a longer time frame.
  2. Granularity: Describes the levels of aggregation. Decision support data must be presented at various levels of aggregation, from highly summarized (DSD) to nearly atomic (OD).
  3. Dimensionality: Operational data focuses on representing individual transactions, rather than the effect of those transactions over time (DSD). Decision support data includes multiple dimensions to show how data relates across those dimensions.
24
Q

Using a star schema in a data warehouse environment

A

The star schema consists of a central fact table that stores transactional or measured data. Surrounding the fact table are one or more dimension tables, linked to the fact table through primary key/foreign key relationships. The fact table contains the measurements or numerical data, while the dimension tables describe these facts by answering key questions such as “Who,” “What,” “Where,” “When,” and “How.” (A minimal DDL sketch follows.)
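A minimal star schema sketch in sqlite3, assuming a hypothetical sales fact with two dimensions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables describe the facts: "who" and "when".
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT)")

# The central fact table holds the numeric measurements plus
# foreign keys pointing at each surrounding dimension table.
conn.execute("""
    CREATE TABLE fact_sales (
        customer_id INTEGER REFERENCES dim_customer,
        time_id     INTEGER REFERENCES dim_time,
        units_sold  INTEGER,
        revenue     REAL
    )
""")
```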

25
Q

What is OLAP?

A

Online Analytical Processing (OLAP): Provides tools for multidimensional data analysis, supporting decision making, business modelling, and operations research.

26
Q

List and explain the OLAP characteristics

A
  1. Multidimensional Data Analysis: Advanced data presentation, aggregation, computation, and modelling functions for business analysis.
  2. Advanced Database Support: Provides access to various DBMSs, aggregated data, and rapid query responses with advanced navigation.
  3. User-Friendly Interfaces: Offers an analytical interface to simplify and accelerate decision-making processes.
27
Q

Explain the role of Big Data

A

Big Data: Refers to datasets whose volume, velocity, and variety (the 3 V’s) make them unsuitable for traditional relational database management systems.
Big data allows businesses to generate and track continuous data streams, enabling real-time processing and insights.

28
Q

List the 3 V’s

A
  • Volume: The amount of data being generated and stored
  • Velocity: Speed at which data is generated and needs to be processed
  • Variety: The diverse formats and types of data collected.
29
Q

Traditional Relational Database Storage vs. Big Data

A

Comparison across data structure, scalability, and performance:
  • Data structure: A traditional RDBMS stores structured data in tables with a fixed, predefined schema; Big Data stores also handle semi-structured and unstructured data.
  • Scalability: A traditional RDBMS typically scales vertically (a bigger single server); Big Data systems scale horizontally across clusters of commodity machines.
  • Performance: RDBMS performance degrades as data volume grows very large; Big Data systems maintain throughput by distributing storage and processing across nodes.

30
Q

What is the Hadoop framework?

A

Hadoop: A Java-based framework designed for the distributed storage and processing of large data sets across clusters of computers. It is not a database; rather, it is a framework that enables Big Data analytics.
It consists of two main components: Hadoop Distributed File System (HDFS) for storage and MapReduce for processing.

31
Q

How data is stored in Hadoop

A

HDFS (Hadoop Distributed File System): Designed to store large files across multiple machines, providing high availability and fault tolerance.
* Data is divided into fixed-size blocks (typically 128 MB or 256 MB) and replicated across several nodes to ensure durability. For example, each block may be stored on three different nodes.
* The NameNode manages the metadata (information about file structure and block locations), while DataNodes store the actual data blocks (see the toy sketch below).
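A toy sketch of the block-and-replica idea in plain Python; the block size is shrunk to bytes and the node names are invented:

```python
# Toy model of HDFS placement: split a "file" into fixed-size blocks
# and record which nodes hold each block's replicas.
BLOCK_SIZE = 128        # bytes here; HDFS defaults to 128 MB
REPLICAS = 3
nodes = ["node1", "node2", "node3", "node4"]

data = b"x" * 500       # a hypothetical 500-byte file
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# The "NameNode" metadata: block number -> nodes holding a replica.
placement = {i: [nodes[(i + k) % len(nodes)] for k in range(REPLICAS)]
             for i in range(len(blocks))}
print(placement)        # e.g. {0: ['node1', 'node2', 'node3'], ...}
```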

32
Q

How data is processed in Hadoop

A

MapReduce: A programming model and processing engine in Hadoop that allows for the distributed processing of large datasets.
Two phases (word-count sketch below):
* Map phase: Involves dividing the data into smaller chunks, which are processed in parallel. Each Mapper processes the input data and produces intermediate key-value pairs.
* Reduce phase: Aggregates the results from the Mappers to produce the final output. This processing model allows for efficient handling of large datasets by leveraging distributed computing.
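The classic word-count example, sketched in plain Python rather than Hadoop’s native Java API, to show the two phases end to end:

```python
from collections import defaultdict

docs = ["big data big ideas", "data data data"]  # hypothetical input chunks

# Map phase: each mapper turns its input chunk into (key, value) pairs.
mapped = []
for doc in docs:  # in Hadoop these mappers run in parallel across nodes
    for word in doc.split():
        mapped.append((word, 1))

# Shuffle: group the intermediate pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: aggregate each key's values into the final output.
result = {word: sum(counts) for word, counts in groups.items()}
print(result)  # {'big': 2, 'data': 4, 'ideas': 1}
```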

33
Q

Four major approaches of the NoSQL data model

A

  • Key-value stores: Store data as simple key-value pairs; the value is opaque to the database.
  • Document stores: Store self-describing documents (e.g., JSON) that can be queried by their contents.
  • Column-family stores: Organize data into rows whose related columns are grouped into column families.
  • Graph databases: Model data as nodes and edges, optimized for relationship-centric queries.