DBMS Flashcards
What is DBMS and what is its utility? Explain RDBMS with examples.
Database Management System (DBMS) is a software for storing and retrieving users’ data while considering appropriate security measures. It consists of a group of programs which manipulate the database. The DBMS accepts the request for data from an application and instructs the operating system to provide the specific data. In large systems, a DBMS helps users and other third-party software to store and retrieve data.
RDBMS stands for Relational Database Management System and access and store data more efficiently than DBMS. RDBMS stores data in the form of tables as compared to DBMS which stores data as files. Storing data as rows and columns makes it easier to locate specific values in the database and makes it more efficient as compared to DBMS.
Mention the issues with traditional file-based systems that make DBMS a better choice?
The absence of indexing in a traditional file-based system leaves us with the only option of scanning the full page and hence making the access of content tedious and super slow. The other issue is redundancy and inconsistency as files have many duplicate and redundant data and changing one of them makes all of them inconsistent. Accessing data is harder in traditional file-based systems because data is unorganized in them.
Advantages of DBMS
Data Sharing, Integrity constraints, Controlling redundancy in a database, Data Independence, Provides backup and recovery facility, Data Security
Explain different languages present in DBMS.
DDL(Data Definition Language): It contains commands which are required to define the database.
E.g., CREATE, ALTER, DROP, TRUNCATE, RENAME, etc.
DML(Data Manipulation Language): It contains commands which are required to manipulate the data present in the database.
E.g., SELECT, UPDATE, INSERT, DELETE, etc.
DCL(Data Control Language): It contains commands which are required to deal with the user permissions and controls of the database system.
E.g., GRANT and REVOKE.
TCL(Transaction Control Language): It contains commands which are required to deal with the transaction of the database.
E.g., COMMIT, ROLLBACK, and SAVEPOINT.
What is meant by ACID properties in DBMS?
To ensure the consistency of the database, certain properties are followed by all the
transactions occurring in the system.
Atomicity - either executing the whole query or executing nothing at all.
Consistency - data remains consistent before and after a transaction in a database.
Isolation - each transaction is occurring independently of the others.
Durability - data is not lost in cases of a system failure or restart and is present in the same state as it was before the system failure or restart.
These are those properties that ensure a safe and secure way of sharing data among multiple users.
What is meant by normalization and denormalization?
Normalization is a process of reducing redundancy by organizing the data into multiple tables. Normalization leads to better usage of disk spaces and makes it easier to maintain the integrity of the database.
Denormalization is the reverse process of normalization as it combines the tables which have been normalized into a single table so that data retrieval becomes faster. JOIN operation allows us to create a denormalized form of the data by reversing the normalization.
- What is a lock. Explain the major difference between a shared lock and an exclusive lock during a transaction in a database.
A database lock is a mechanism to protect a shared piece of data from getting updated by two or more database users at the same time. When a single database user or session has acquired a lock then no other database user or session can modify that data until the lock is released.
Shared Lock: is required for reading a data item and many transactions may hold a lock on the same data item in a shared lock. Multiple transactions are allowed to read the data items in a shared lock.
Exclusive lock: An exclusive lock is a lock on any transaction that is about to perform a write operation. This type of lock doesn’t allow more than one transaction and hence prevents any inconsistency in the database.
- Explain different levels of data abstraction in a DBMS.
The process of hiding irrelevant details from users is known as data abstraction. Data abstraction can be divided into 3 levels:
Physical Level: it is the lowest level and is managed by DBMS. This level consists of data storage descriptions and the details of this level are typically hidden from system admins, developers, and users.
Conceptual or Logical level: it is the level on which developers and system admins work and it determines what data is stored in the database and what is the relationship between the data points.
External or View level: it is the level that describes only part of the database and hides the details of the table schema and its physical storage from the users. The result of a query is an example of View level data abstraction. A view is a virtual table created by selecting fields from one or more tables present in the database.
What is Data Warehousing?
The process of collecting, extracting, transforming, and loading data from multiple sources and storing them in one database is known as data warehousing. A data warehouse can be considered as a central repository where data flows from transactional systems and other relational databases and is used for data analytics. A data warehouse comprises a wide variety of an organization’s historical data that supports the decision-making process in an organization.
Explain the difference between a 2-tier and 3-tier architecture in a DBMS.
The 2-tier architecture refers to the client-server architecture in which applications at the client end directly communicate with the database at the server end without any middleware involved.
Example – Contact Management System created using MS-Access or Railway Reservation System, etc.
The 3-tier architecture contains another layer between the client and the server to provide GUI to the users and make the system much more secure and accessible. In this type of architecture, the application present on the client end interacts with an application on the server end which further communicates with the database system.
Example – Designing registration form which contains a text box, label, button or a large website on the Internet, etc.
Explain different types of keys in a database.
Candidate Key: a set of properties that can uniquely identify a table.
Super Key: set of attributes that can uniquely identify a tuple.
Primary Key: any one of the candidate key can be considered as primary key.
Unique Key: similar to the primary key except that primary keys don’t allow NULL values in the column but unique keys allow them.
Alternate Key: All the candidate keys which are not chosen as primary keys.
Foreign Key: The foreign key defines an attribute that can only take the values present in one table common to the attribute present in another table.
First Normalization Form
Every column must have a single value and should be atomic.
Second Normalization Form
The table should be in its 1NF i.e. satisfy all the conditions of 1NF.
Every non-prime attribute of the table should be fully functionally dependent on the primary key.
Third Normalization Form
A table to be in its third normal form should satisfy the following conditions:
The table should be in its 2NF i.e. satisfy all the conditions of 2NF.
There is no transitive functional dependency of one attribute on any attribute in the same table.
BCNF
BCNF stands for Boyce-Codd Normal Form and is an advanced form of 3NF. It is also referred to as 3.5NF for the same reason.
A table to be in its BCNF normal form should satisfy the following conditions:
The table should be in its 3NF i.e. satisfy all the conditions of 3NF.
For every functional dependency of any attribute A on B
(A->B), A should be the super key of the table. It simply implies that A can’t be a non-prime attribute if B is a prime attribute.
SQL VS NoSql
Storage:
SQL: Tables with fixed schemas
NoSQL: Documents, key-value pairs, graphs, wide-columns
Scalability:
SQL: Vertical (hardware upgrade)
NoSQL: Horizontal (adding servers)
Schema:
SQL: Rigid, predefined
NoSQL: Flexible, dynamic
ACID Properties:
SQL: Fully ACID compliant
NoSQL: Usually sacrifices ACID for performance/scalability
Use Cases:
SQL: Complex queries, transactions (banking, ERP)
NoSQL: Big data, real-time data, unstructured data (social media, IoT)
Consistency:
SQL: Strong consistency
NoSQL: Eventually consistent (BASE model)
Query Language:
SQL: Standardized SQL
NoSQL: Database-specific languages
Difference between vertical and horizontal scaling ?
Scaling alters the size of a system. In the scaling process, we either compress or expand the system to meet the expected needs. The scaling operation can be achieved by adding resources to meet the smaller expectation in the current system, or by adding a new system in the existing one, or both.
Vertical scaling keeps your existing infrastructure but adds computing power. Your existing pool of code does not need to change — you simply need to run the same code on machines with better specs. By scaling up, you increase the capacity of a single machine and increase its throughput. Vertical scaling allows data to live on a single node, and scaling spreads the load through CPU and RAM resources for your machines.
Horizontal scaling simply adds more instances of machines without first implementing improvements to existing specifications. By scaling out, you share the processing power and load balancing across multiple machines.
What is sharding
Sharding is a method of splitting and storing a single logical dataset in multiple databases. By distributing the data among multiple machines, a cluster of database systems can store larger dataset and handle additional requests. Sharding is necessary if a dataset is too large to be stored in a single database. Moreover, many sharding strategies allow additional machines to be added. Sharding allows a database cluster to scale along with its data and traffic growth.
Indexing in DBMS
Indexing is a way to optimise the performance of a database by minimising the number of disk accesses required when a query is processed. It is a data structure technique which is used to quickly locate and access the data in a database.
Conflict Serializability in DBMS
Serializability is a concept that helps us to check which schedules are serializable. A serializable schedule is the one that always leaves the database in consistent state.
A schedule is called conflict serializability if after swapping of non-conflicting operations, it can transform into a serial schedule. The schedule will be a conflict serializable if it is conflict equivalent to a serial schedule.
What is CCP ? (Concurrency Control Protocols)
Concurrency Control is the management procedure that is required for controlling concurrent execution of the operations that take place on a database.
The concurrency control protocols ensure the atomicity, consistency, isolation, durability and serializability of the concurrent execution of the database transactions.
Therefore, these protocols are categorised as:
Lock Based Concurrency Control Protocol
Timestamp Concurrency Control Protocol
Validation Based Concurrency Control Protocol