One - three Flashcards
DATABASE MANAGEMENT SYSTEM (DBMS)
A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. In a sense, a database resembles a very well-organized electronic filing cabinet in which powerful software (the DBMS) helps manage the cabinet’s contents.
DBMS is used to define, manage, and process databases. A DBMS does the following:
It allows modification of data.
It allows the creation of new databases and their data structures.
It allows retrieval of data from the database through queries.
It allows long-term storage of data, so that data held in the DBMS is not easily lost.
It enables recovery in times of failure: if a system failure or an error causes data loss, the data can be recovered.
It controls user access: you can define who can access the database and who cannot.
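The responsibilities above can be sketched with SQLite, a small embedded DBMS that ships with Python's standard library (the table and column names here are illustrative, not from the notes):

```python
import sqlite3

# Create a new database and define its data structure (in-memory for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Store and modify data
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Alice')")
conn.execute("UPDATE student SET name = 'Alicia' WHERE id = 1")
conn.commit()

# Retrieve data using a query
rows = conn.execute("SELECT id, name FROM student").fetchall()
print(rows)  # → [(1, 'Alicia')]
```

A production DBMS would add the access control and recovery features listed above; SQLite only covers definition, storage, modification, and querying.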
THE ENTITY RELATIONSHIP MODEL
Business rules set the stage for properly identifying how the database will behave, its constraints, and its working environment.
The conceptual models such as the ERM can be used to understand and design the data requirements of an organization. Therefore, the ERM is independent of the database type. Conceptual models are used in the conceptual design of databases, while relational models are used in the logical design of databases.
The Entity Relationship Model represents the conceptual database as viewed by the end user. This model depicts the database’s main components: entities, attributes, and relationships.
ENTITY RELATIONSHIP DIAGRAM (ERD)
An Entity Relationship Diagram (ERD) is a data modelling technique that graphically illustrates an information system’s entities and the relationships between those entities. An ERD is a conceptual and representational model of data used to represent the entity framework infrastructure.
The elements of an ERD are:
Entities
Attributes
Relationships
- Techopedia (2017)
Entities
A single unique object in the real world that is being modelled. Examples of an entity are a single person, a single product, or a single organization. Because an entity represents a real-world object, the words entity and object are often used interchangeably.
Attributes
A characteristic or trait of an entity type that describes the entity. For example, the PERSON entity type has a date of birth (dob) attribute, and a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, customer address, and possibly a customer credit limit. An entity will have zero or more attributes, and each of those attributes applies only to that entity. Attributes are the equivalent of fields in file systems.
Relationships
Describes an association among entities. For example, a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent.
TYPES OF KEYS
In the relational model, keys are important because they are used to ensure that each row in a table is uniquely identifiable. They are also used to establish relationships among tables and to ensure the integrity of the data. A key consists of one or more attributes that determine other attributes.
Primary Key
Surrogate Key
Alternate Key
Super Key
Composite Key
Candidate Key
Foreign Key
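Several of these key types can be sketched in SQLite DDL (the tables and columns below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Primary key: uniquely identifies each row
    CREATE TABLE module (
        module_code TEXT PRIMARY KEY,
        title       TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- surrogate key (system-generated)
        email      TEXT UNIQUE            -- candidate key not chosen as primary = alternate key
    );
    -- Composite primary key made of two attributes, each also a foreign key
    CREATE TABLE enrolment (
        student_id  INTEGER REFERENCES student(student_id),
        module_code TEXT    REFERENCES module(module_code),
        PRIMARY KEY (student_id, module_code)
    );
""")
```

Note that in SQLite the foreign-key constraints are only enforced after `PRAGMA foreign_keys = ON` is issued on the connection.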
JOINS
A NATURAL JOIN links tables by selecting only the rows with common values in their common attribute(s).
Natural join is normally just referred to as JOIN in formal treatments. JOIN is denoted by the symbol ⋈. The JOIN of the STUDENT and MODULE relations would be written as follows:
STUDENT ⋈ MODULE
An INNER JOIN only returns matched records from the tables that are being joined.
In an OUTER JOIN, matched pairs would be retained, and any unmatched values in the other table would be left NULL.
It is an easy mistake to think that an outer join is the opposite of an inner join. However, it is more accurate to think of an outer join as an “inner join plus.” The OUTER JOIN still returns all of the matched records that the inner join returns, plus it returns the unmatched records from one of the tables.
A LEFT OUTER JOIN yields all of the rows in TABLE A, including those that do not have a matching value in TABLE B.
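The difference between an inner and a left outer join can be demonstrated in SQLite (the STUDENT/MODULE data below is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, module_code TEXT);
    CREATE TABLE module  (code TEXT PRIMARY KEY, title TEXT);
    INSERT INTO module  VALUES ('DB101', 'Databases');
    INSERT INTO student VALUES (1, 'Alice', 'DB101'),
                               (2, 'Bob',   NULL);   -- Bob has no module
""")

# INNER JOIN: only matched rows are returned (Bob is dropped)
inner = conn.execute("""
    SELECT s.name, m.title FROM student s
    JOIN module m ON s.module_code = m.code
    ORDER BY s.id
""").fetchall()
print(inner)  # → [('Alice', 'Databases')]

# LEFT OUTER JOIN: every student row is kept; the unmatched side is NULL
outer = conn.execute("""
    SELECT s.name, m.title FROM student s
    LEFT JOIN module m ON s.module_code = m.code
    ORDER BY s.id
""").fetchall()
print(outer)  # → [('Alice', 'Databases'), ('Bob', None)]
```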
DATA DEFINITION LANGUAGE (DDL)
What is DDL?
DDL (Data Definition Language) is a set of SQL commands used to define, manage, and restructure the physical database structure. These commands are immediate, permanent, and focus on the design of database objects (tables, schemas, views, etc.).
Key Features
Defines Structure: Creates and modifies the database structure.
Immediate Effect: Changes are auto-committed and cannot be rolled back.
Design Focused: Deals with tables, columns, constraints, and other database objects.
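A short sketch of the core DDL commands, run here through SQLite (the table names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new database object
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER: restructure an existing object
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# DROP: remove an object entirely
conn.execute("CREATE TABLE scratch (x INTEGER)")
conn.execute("DROP TABLE scratch")

# The structural change is visible immediately
cols = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(cols)  # → ['id', 'name', 'email']
```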
DATA MANIPULATION LANGUAGE (DML)
DML is the set of SQL commands used to manipulate the values, objects, and figures held within tables - inserting, updating, and deleting data, for example to correct mistakes in the datasets. DML commands are typically used once the tables have been created and the database has been designed with DDL commands.
These commands are among the most frequently used, especially when making changes to the data held in a database management system.
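The three modifying DML commands can be sketched as follows (illustrative table and data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# INSERT: add new rows
conn.execute("INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob')")

# UPDATE: correct a mistake in the data
conn.execute("UPDATE student SET name = 'Robert' WHERE id = 2")

# DELETE: remove rows that are no longer needed
conn.execute("DELETE FROM student WHERE id = 1")

rows = conn.execute("SELECT * FROM student").fetchall()
print(rows)  # → [(2, 'Robert')]
```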
DATA CONTROL LANGUAGE (DCL)
A data control command is an SQL command that governs users’ access to the information stored within the database system. DCL handles matters of data control, such as determining which users may gain access to which parts of the database system.
Data control commands are used to control access to data objects, such as giving one user permission to only view the STUDENT table, and giving another user permission to change the data in the STUDENT table.
TRANSACTION CONTROL LANGUAGE (TCL)
Each individual transaction must display Atomicity, Consistency, Isolation, and Durability.
These four properties are sometimes referred to as the ACID test.
The DML commands in SQL are executed within the context of a transaction, which is a logical unit of work composed of one or more SQL statements, as defined by business rules. SQL provides commands to control the processing of these statements as an indivisible unit of work.
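A sketch of COMMIT and ROLLBACK treating a transfer as one indivisible unit of work (the account table, balances, and validation rule are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()

# A transfer is one logical unit of work: both statements commit, or neither does
try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 999")
    # the second UPDATE matched no rows (no account 999), so enforce the rule
    if conn.execute("SELECT changes()").fetchone()[0] == 0:
        raise ValueError("credit side not found")
    conn.commit()    # COMMIT: make both changes permanent
except ValueError:
    conn.rollback()  # ROLLBACK: undo the debit as well

print(conn.execute("SELECT balance FROM account WHERE id = 1").fetchone())  # → (100,)
```

Because the unit of work was rolled back, the debit against account 1 is undone and the database stays consistent.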
DATABASE LIFE CYCLE (DBLC)
A cycle that traces the history of a database within an information system.
The cycle is divided into six phases as below:
1. Database Initial Study
2. Database Design
3. Implementation & Loading
4. Testing & Evaluation
5. Operations
6. Maintenance & Evolution
WHAT IS NORMALISATION?
All rows are distinct
The relational model is built on sets, and a set cannot have duplicate elements. This implies that all rows, and all query results, should be unique and simple to relate to. Simplicity is achieved by removing redundancy and non-simple domains through normalization with primary keys. In practice those concepts translate into two rules:
Do not store lists of values
Do not repeat a column
Without Normalization in SQL, we may face many issues such as:
Insertion anomaly: It occurs when we cannot insert data into the table without the presence of another attribute
Update anomaly: It is a data inconsistency that results from data redundancy and a partial update of data.
Deletion Anomaly: It occurs when certain attributes are lost because of the deletion of other attributes.
WHY DO WE NEED NORMALISATION?
Eliminate data redundancies (and therefore use less space)
Make it easier to make changes to data, and avoid anomalies when doing so
Make referential integrity constraints easier to enforce
Produce an easily comprehensible structure that closely resembles the situation the data represents, and allows for growth
NORMALISATION FORMS
1st Normal Form (1NF)
In this Normal Form, we tackle the problem of atomicity. Here atomicity means values in the table should not be further divided. In simple terms, a single cell cannot hold multiple values. If a table contains a composite or multi-valued attribute, it violates the 1NF. Each table must have a primary key, i.e., a minimal set of attributes that can uniquely identify a record. Eliminate repeating groups (categories of data that would seem to be required a different number of times on different records) by defining keyed and non-keyed attributes appropriately. Atomicity: Each attribute must contain a single value, not a set of values.
2nd Normal Form (2NF)
The first condition in the 2NF is that the table has to be in 1NF. The table also should not contain partial dependency. Here partial dependency means the proper subset of candidate key determines a non-prime attribute. In addition, if a table has a composite key, all attributes must be related to the whole key. And, data that is redundantly duplicated across multiple rows of a table is moved out to a separate table.
3rd Normal Form (3NF)
Data stored in a table must be dependent only on the primary key and not on any other field in the table. The database must meet all the requirements of the 2NF. Any field that is dependent not only on the primary key but also on another field is moved out to a separate table. That means non-prime attributes (which do not form part of a candidate key) should not be dependent on other non-prime attributes in a given table. So a transitive dependency is a functional dependency in which A → C (A determines C) holds indirectly, by virtue of A → B and B → C (where it is not the case that B → A).
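A small sketch of a 3NF decomposition (the tables and data are invented for illustration): in the flat table, module_title depends on module_code rather than on the primary key, a transitive dependency, so it is moved to its own table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Violates 3NF: module_title depends on module_code (a non-prime attribute),
# not on the primary key student_id - a transitive dependency
conn.executescript("""
    CREATE TABLE student_flat (
        student_id   INTEGER PRIMARY KEY,
        module_code  TEXT,
        module_title TEXT
    );
    INSERT INTO student_flat VALUES (1, 'DB101', 'Databases'),
                                    (2, 'DB101', 'Databases');  -- redundancy
""")

# 3NF decomposition: the transitively dependent data moves to its own table
conn.executescript("""
    CREATE TABLE module  (module_code TEXT PRIMARY KEY, module_title TEXT);
    CREATE TABLE student (student_id INTEGER PRIMARY KEY,
                          module_code TEXT REFERENCES module(module_code));
    INSERT INTO module  SELECT DISTINCT module_code, module_title FROM student_flat;
    INSERT INTO student SELECT student_id, module_code FROM student_flat;
""")

# The title is now stored once, and a join reconstructs the original view
rows = conn.execute("""
    SELECT s.student_id, m.module_title FROM student s
    JOIN module m USING (module_code) ORDER BY s.student_id
""").fetchall()
print(rows)  # → [(1, 'Databases'), (2, 'Databases')]
```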
DE-NORMALIZATION
Denormalization is a database optimization technique where we add redundant data to the database to avoid complex join operations. This is done to speed up database access. Denormalization is done after normalization to improve the performance of the database. Data from one table is included in another table to reduce the number of joins in a query, which helps speed up performance.
When we normalize tables, we break them into multiple smaller tables. So when we want to retrieve data from multiple tables, we need to perform some kind of join operation on them. In that case, we can use the denormalization technique, which mitigates this drawback of normalization.
Advantages
Enhance Query Performance
Make database more convenient to manage
Facilitate and accelerate reporting
Disadvantages
It takes more storage due to data redundancy.
It makes updates and inserts on a table more expensive.
It makes update and insert code harder to write.
WHAT IS A TRANSACTION ANALYSIS (TA)
In order to understand the functionality of the DBMS, we need to understand the functionality of its transactions. Hence, the performance criteria of transactions must be analysed to determine which are the ‘important’ transactions and which are secondary transactions. The types, sizes, and times of transactions may affect the performance of the system. It is critical that these are identified to ensure there are enough resources available to meet the transactional demand.
The transaction concept is one that is closely related to operation abstractions. The transaction concept suggests that instead of allowing arbitrary operations on a database, these operations must be structured into sets of actions such that when each set is executed, the integrity of the database is maintained.
TYPES OF DATABASE SYSTEMS
Database systems can be classified into three main categories:
According to the number of users
According to the type of use
According to database site locations
DBs ACCORDING TO THE NUMBER OF USERS
According to the number of users, database systems can be further subdivided into two categories:
Single-user database systems - the database resides on a PC, on its hard disk. All the applications run on the same PC and directly access the database. In single-user database systems, the application is the DBMS. A single user accesses the applications, and the business rules are enforced in the applications running on the PC.
Multiuser database systems - many PCs are connected through a Local Area Network (LAN) and a file server stores a copy of the database files. Each PC on the LAN is given a volume name on the file server. Applications run on each PC that is connected to the LAN and access the same set of files on the file server. The application is the DBMS, and each user runs a copy of the same application and accesses the same files. The applications must handle the concurrency control, and the business rules are enforced in the application.
DBs ACCORDING TO THE TYPE OF USE
According to the type of use, databases can be further subdivided into three categories:
Production or Transactional Database Systems
Used for management of the supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items. Transactional database systems are also used for purchases on credit cards and generation of monthly statements, and in banks for customer information, accounts, loans, and banking transactions. These database systems are known as Online Transaction Processing (OLTP) systems.
Data Warehouses
A specially built dimensional database management system designed to consolidate data from transaction processing systems. It can be loosely defined as any centralised data repository which can be queried for business benefit. Data warehouses are not meant to be used for live transactions; data is stored historically. These database systems are known as Online Analytical Processing (OLAP) systems.
Decision Support Database Systems
Interactive, computer-based systems that aid users in judgement and choice activities. They provide data storage and retrieval but enhance the traditional information access and retrieval functions with support for model building and model based reasoning. They support framing, modelling and problem solving. Typical application areas of DSS’s are management and planning in business, health care, military and any area in which management will encounter complex decision situations. They can be either OLTP or OLAP systems.
DBs ACCORDING TO THE DATABASE SITE LOCATION (1)
1 Centralized Database System
What it is: A single processor manages the database, storage, and peripherals.
Where it’s used:
Small enterprises (on a PC).
Large enterprises (on a mainframe computer).
How it works:
Applications and data run on the same machine.
Users access via simple terminals (no processing power).
Text-mode screens for interaction; business rules handled by the application.
2 Parallel Database Systems
What it is: A system where multiple processors (tightly or loosely coupled) work together to handle database tasks.
Key Features:
Links smaller machines for higher throughput.
Offers greater scalability and reliability compared to single-processor systems.
Use Cases:
Querying extremely large databases.
Handling a high number of transactions per second.
3 Distributed Database Systems
What it is: A system where data is spread across multiple databases managed by various DBMSs on different machines.
Key Features:
Machines have their own data and applications.
Communication happens via networks.
Machines act as both servers and clients.
Types:
Homogeneous: Same DBMS and schema across all nodes.
Heterogeneous: Different DBMSs and schemas across nodes.
4 Client/Server Database Systems
What it is: A database architecture where a server manages the DBMS, and clients access it via a network.
Key Features:
Clients are personal computers connected to the server.
Evolved from centralized systems due to cheaper hardware and increased PC usage.
Benefits:
Efficient use of personal computers.
Decentralized access to data.
WHY WE NEED TRANSACTION ANALYSIS IN DATABASES
- Enhanced Logical Design
Helps identify which tables and attributes will be impacted by specific transactions.
Aids in estimating the required hardware and storage space.
- Performance Criteria Identification
Highlights transactions that:
Run frequently and significantly affect performance.
Are critical to business operations.
Identifies peak load times (e.g., daily or weekly high-demand periods) to anticipate and address performance bottlenecks.
- High-Level Transaction Functionality
Helps structure key functionalities, including:
Selecting appropriate file storage organizations and indexes.
Identifying regularly updated attributes.
Estimating the number of queries made by the organization.
Understanding the search criteria used in queries.
Designing user views based on requirements.
Implementing necessary security measures.
Purpose:
Transaction analysis ensures a well-designed database system by addressing logical structure, performance needs, and functionality, ultimately leading to optimized database operations and security.
PERFORMING TRANSACTION ANALYSIS
It is often not possible to analyse all transactions, so we investigate the most ‘important’ ones. To help identify these, we can use:
Transaction/relation cross-reference matrix, showing relations that each transaction accesses, and/or transaction usage map, indicating which relations are potentially heavily used.
Estimating the number of transactions per second/hour/day (Transaction Volume) using a Transaction Analysis Form
We use CRUD in multiple programming concepts to determine what is happening to data or who can do what to the data:
C CREATE
R READ
U UPDATE
D DELETE
In SQL, the equivalent set of commands is:
I INSERT
S SELECT
U UPDATE
D DELETE
PURPOSE OF TAs AND TVs
Design User Views based on the most often used queries
Design and apply Indexes for each table
Define Underlying Storage Structures for each table
Estimate hardware and architecture of the database
Estimate total Disk Space Requirement for the database
Design the Security Requirements for the database
TRANSACTIONS (ACID)
A transaction is a sequence of operations performed (using one or more SQL statements) on a database as a single logical unit of work. The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all rolled back (undone from the database). A database transaction must be atomic, consistent, isolated and durable (ACID).
Atomicity : Either all the changes are made or none. If any of the statements in the batch fail, all the changes, if any, must be reversed.
Consistency : All the data involved in an operation must be left in a consistent state upon completion or rollback of the transaction; database integrity cannot be compromised.
Isolation : One transaction should not be aware of the modifications made to the data by any other transaction unless it was committed to the database. Different isolation levels can be set to modify this default behaviour.
Durability : The results of a transaction that has been successfully committed to the database stay in the database.
CONCURRENCY TRANSACTIONS PROBLEMS
- The Lost Update Problem
What happens:
When two transactions update the same data simultaneously, the update of one transaction may overwrite the update of the other.
Why it’s a problem:
One update is “lost” because the database doesn’t maintain a proper sequence or isolation between the operations.
Example:
Transaction A: Reads a value X = 10. Updates it to X = X + 5.
Transaction B: Reads the same value X = 10. Updates it to X = X - 3.
Both transactions complete, but only one update reflects in the database (e.g., X = 7 or X = 15), losing the effect of the other.
- The Uncommitted Dependency Problem (Dirty Read)
What happens:
A transaction reads data that another transaction has modified but not yet committed. If the second transaction rolls back, the first transaction has read invalid data.
Why it’s a problem:
The first transaction may use uncommitted (and potentially invalid) data, leading to incorrect computations or decisions.
Example:
Transaction A: Updates X = 50 but hasn’t committed.
Transaction B: Reads X = 50 (dirty read) and uses this value for further calculations.
If Transaction A rolls back and resets X = 10, Transaction B’s calculations are now based on an invalid state.
- The Inconsistent Analysis Problem
What happens:
A transaction reads multiple values from the database, but during its execution, another transaction updates some of those values. This causes the first transaction to work with a mix of old and new data.
Why it’s a problem:
The result of the first transaction becomes inconsistent, leading to errors in summarizations or calculations.
Example:
Transaction A: Totals balances for all accounts in a database.
Transaction B: Updates account balances while Transaction A is still summing them.
The result of Transaction A is incorrect because it includes a combination of pre-update and post-update values.
TRANSACTIONS CONTROL TECHNIQUES
- Timestamping
What it is: A protocol that assigns each transaction a timestamp to maintain order. Older transactions (with smaller timestamps) get priority during conflicts.
How it works:
Transactions are executed in timestamp order.
Newer transactions may be delayed or rolled back to ensure consistency.
- Timestamp
What it is: A unique identifier assigned by the DBMS to indicate the starting time of a transaction.
Purpose: Ensures correct transaction ordering and avoids conflicts.
- Serializability
What it is: Ensures that the execution of transactions is equivalent to a serial order (one-at-a-time execution).
Rules for Conflicts:
No conflict if transactions only read or access different data items.
Conflict occurs when one transaction writes and another reads or writes the same data.
Execution order matters in such conflicts to ensure consistency.
- Locking
What it is: A technique to control concurrent access to data by restricting other transactions while one is using it.
Types of Locks:
Shared Lock (S-Lock): Allows multiple transactions to read the data but not update it.
Exclusive Lock (X-Lock): Allows a single transaction to read and update the data.
Purpose: Prevents incorrect results and ensures safe concurrency.
TRANSACTIONS CONTROL LOCK ISSUES
Deadlocks - a situation that arises when two (or more) sessions are each waiting to acquire a lock on a shared resource, and none of them can proceed because each session already holds a lock on another resource that is required by one of the others.
TRANSACTIONS CONTROL TECHNIQUES USAGE
Pessimistic Concurrency Control
Locking schemes (lock‐based database scheduler)
Full serialization of transactions
Optimistic Concurrency Control (OCC)
Optimistic execution of operations, check of conflicts (validation)
Optimistic and timestamp‐based database schedulers
Mixed Concurrency Control (e.g., PostgreSQL)
Combines locking and OCC
Might return synchronization errors