TB2 Flashcards
WHAT IS A RELATIONAL DATABASE?
A Relational Database (RDB) is a collective set of multiple data sets organized by tables, records and columns. RDBs establish a well-defined relationship between database tables. Tables communicate and share information, which facilitates data searchability, organization and reporting.
RDBs use Structured Query Language (SQL), which is a standard user application that provides an easy programming interface for database interaction.
Core characteristics:
.Relational databases organizes data into one or more tables.
.Each row (tuple) in the table represents a unique record
.Each column represents a specific attribute or field of that record.
.The relational structure allows us to identify and access data in relation to another piece of data in the database.
WHAT IS A DATABASE MANAGEMENT SYSTEM?
A DBMS is a complex piece of software specifically designed to interact with end-users, applications, and the database itself to capture and analyse data. It serves as an interface between the database and end users or application programs, ensuring that data is consistently organized and remains easily accessible.
Core characteristics
It provides a systematic approach to store, manipulate, and access data efficiently, ensuring data integrity and security.
A DBMS allows users to define the structure of the database, specify relationships between data elements, and perform operations such as adding, updating, and deleting data.
It acts as an interface between users and the database, facilitating the storage and retrieval of data while maintaining data consistency and reducing data redundancy.
THE ADVANTAGES OF THE RDBs
Database management systems (DBMS) have evolved over the years to efficiently store and retrieve data. Relational DBMS offer data consistency and integrity, scalability, and ease of use.
.Data Integrity and ACID Compliance: Relational databases follow ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliability in every transaction.
.Data Structure: The tabular structure makes it easier to understand the schema and relationship between different entities.
.Sophisticated Querying: SQL (Structured Query Language) enables complex queries and operations, offering greater flexibility.
.Mature and Well-Supported: Relational databases have been around for decades, so there’s a wealth of community and vendor support.
.Normalization: Relational databases allow data normalization, which minimizes data redundancy and enhances data integrity.
.Flexibility in Indexing: You can index columns in a relational database to improve data retrieval speed.
.Strong Security: Features like role-based access control are often built-in, offering robust security options.
.Multi-users: Enable data to be easily accessed and shared across multiple applications and users.
THE DISADVANTAGES OF THE RDBs
While powerful, relational databases also have their drawbacks. They can have performance issues with very large datasets, complexity in creating and managing, and cost of DBMS software and hardware.
.Scalability: While they can be scaled vertically by adding more power to the individual server, they are generally more challenging to scale horizontally (across multiple servers).
.Complexity: The tabular structure and relationships can become complex as more tables and relationships are added.
.Cost: High-quality RDBMS systems can be expensive in terms of licensing and hardware requirements.
.Performance: For read-heavy and write-heavy operations, particularly with large datasets, NoSQL databases like MongoDB or Cassandra may offer better performance.
.Rigidity: Changing the schema can be difficult and time-consuming, especially for large databases and applications that are already in production.
.Overhead: Features that ensure data integrity and ACID compliance can introduce computational overhead, which may not be suitable for all kinds of applications (e.g., real-time analytics).
COMPONENTS OF A RELATIONAL DATABASE
Tables: The primary building blocks, tables store data in a structured format. Each table holds data for a specific object, like “Customers” or “Orders.”
Columns: Each table is made up of columns, (aka fields). These define the type of data that can be stored, such as integers, text, or dates.
Rows or Records: Each row in a table is a record. It is a single instance that contains data for each field in the table.
Primary Key (PK): This is a unique identifier for a record in a table. No two records can have the same primary key, ensuring data integrity and enabling quick data retrieval.
Foreign Key (FK): Used to establish relationships between tables, a foreign key in one table refers to the primary key of another table. This helps maintain referential integrity.
Index: This is a data structure that improves the speed of data retrieval operations. Indexes can be created on one or multiple columns.
Schema: The blueprint of the database, the schema defines the structure, including tables, fields, and the relationships between them.
Constraints: Rules applied to columns to maintain data integrity, such as “NOT NULL,” “UNIQUE,” or “CHECK” constraints.
Data Dictionary: Also known as the system catalogue, this holds metadata like table definitions, field data types, and other properties of the database.
Views: These are virtual tables that provide a layer of security and abstraction, presenting data in a specific way without changing the underlying data.
Triggers and Stored Procedures: These are sets of SQL statements that are stored and can be executed on the database server to enforce business rules or automate tasks.
WHAT IS AN ENTITY?
An ENTITY can be defined as a thing or an object that is distinguishable from all other objects.
The entities might include
Customers
Products
Orders
etc.
Entities have relationships with each other.
RELATIONSHIP TYPES (CARDINALITIES)
One-to-One (1:1): Each A ↔️ One B
Example: Person ↔️ Passport
One-to-Many (1:M) Each A ↔️ Many B
Example: Teacher ↔️ Students
Many-to-One (M:1): Many A ↔️ One B
Example: Employees ↔️ Department
Many-to-Many (M:M) Many A ↔️ Many B
Example: Students ↔️ Courses
Zero-to-One (0:1): A ↔️ None or One B
Example: Person ↔️ Parking Spot
Zero-to-Many (0:M) A ↔️ None or Many B
Example: Customer ↔️ Orders
One-to-Zero or Many (1:0..*): Each A ↔️ None or Many B
Example: Product ↔️ Reviews
ER Diagram Symbols:
1:1: Single tick on both ends.
1:M: Single tick on one end, crow’s foot on the other.
M:N: Crow’s foot on both ends.
Purpose: Define relationships, ensure data integrity, and minimize redundancy.
TYPES OF KEYS
Primary Key
Definition: A unique identifier for each record in a table.
Example: student_id in a Students table.
Foreign Key
Definition: A field in one table that uniquely identifies a row of another table, creating a relationship between the two tables.
Example: department_id in an Employees table referencing department_id in a Departments table.
Composite Key
Definition: A key that consists of two or more columns to uniquely identify a record.
Example: (student_id, course_id) in an Enrollments table.
Candidate Key
Definition: A set of one or more columns that can uniquely identify a record. There can be multiple candidate keys in a table.
Example: email or student_id in a Students table.
Super Key
Definition: A set of one or more columns that can uniquely identify a record. A super key can have additional attributes that are not necessary for unique identification.
Example: (student_id, email) in a Students table.
Alternate Key
Definition: A candidate key that is not the primary key.
Example: If student_id is the primary key, email can be an alternate key in a Students table.
Surrogate Key
Definition: An artificial key that is used as a unique identifier for a table. It is typically an auto-incremented number.
Example: id in a Users table, where id is an auto-incremented integer.
ATTRIBUTES
.Attributes are the properties or characteristics of an entity in a database.
.Each attribute represents a specific piece of data, such as name, age, or address.
.They are represented by columns in a database table.
.An attribute’s value can change over time, reflecting the current state of the entity.
.They are essential for defining the structure of data in a database.
DATA TYPEs
.Data types define the nature of the data that can be stored and manipulated.
.Common data types include INTEGER (INT), FLOAT, VARCHAR, CHAR, BOOLEAN, DATE, TIME, etc.
.Each data type has a specific range of values it can represent.
.Data types are important for ensuring data integrity in a database.
.They can influence the kind of operations that can be performed on the data. For instance, arithmetic operations can be performed on numerical data types (integers), but not on string (characters) or Boolean data types.
.In databases, the chosen data type can also impact the storage space required for each attribute.
A Composite Key
A Composite Key in a database is a type key that consists of two or more attributes or columns to uniquely identify a record in a table. It is used when a single attribute is not sufficient to guarantee uniqueness. The combination of these attributes can ensure the uniqueness of each record in the table, fulfilling the role of a primary key when required.
NORMALISATION
Database NORMALISATION is a systematic approach to designing databases in a way that reduces data redundancy and improves data integrity. It involves dividing a database into two or more tables and defining relationships between those tables to eliminate duplicate data, simplify queries, and ensure that logical dependencies are maintained correctly. The process is typically conducted in stages, known as “normal forms”, each with a specific set of requirements.
1NF – FIRST NORMAL FORM
The FIRST NORMAL FORM (1NF) is a fundamental level of database normalisation that sets the basic rules for an organised database:
Each table cell should contain a single value.
Each record needs to be unique, that is, no duplicate rows/columns are allowed.
Entries in a column (attribute) are of the same kind and follow a specific data type (e.g. do not have CHARACTER type in ROW1 and NUMERIC in ROW2).
Order does not matter* - although are good practice aspects (we will discuss those)
The primary goal of 1NF is to eliminate duplicative data from the same table and create separate columns or rows for each group of related data, establishing a unique identifier (PKs), for each record.
2NF – SECOND NORMAL FORM
The SECOND NORMAL FORM (2NF) is the next stage of database normalisation, which builds on the rules set by the 1NF. A table is in 2NF if:
It is already in 1NF
All non-key attributes (i.e., columns that are not part of the PK) are fully functionally dependent on the PK. This means they must depend on the entire PK for their existence and uniqueness, not just part of it.
This process helps to reduce redundancy in the database by ensuring that all non-key attributes of the table relate directly to the PK. It’s particularly applicable when the PK is a Composite Key (CK).
3NF – THIRD NORMAL FORM
The THIRD NORMAL FORM (3NF) is the next level of database normalisation, extending the rules of 2NF. A table is in 3NF if:
It is already in 2NF
All the attributes (columns) are functionally dependent on solely the PK and not on any other non-key attributes or sets of attributes (i.e., there are no transitive dependencies).
This form ensures that each non-key column of a table is not only dependent on the PK but is also directly dependent on it. It helps to preserve data consistency and eliminate redundancy caused by transitive dependencies.
DE-NORMALISATION
DENORMALISATION is a database optimisation technique in which we add redundant data or group data in order to speed up complex queries. In contrast to normalisation, which aims to minimize redundancy and keep data consistent by breaking down tables, denormalisation does the opposite by combining tables and adding redundant data. The main goal is to reduce the complexity of queries and improve the performance of read-heavy databases at the expense of potential inconsistencies in data and increased storage.
DATABASE CONSTRAINTS
Database constraints are a set of rules that are used to maintain the quality, integrity, and accuracy of the data in a database. They define certain properties that data in a database must comply with. They can apply to a column, a whole table, more than one table or an entire schema. Common types of constraints include:
Primary Key Constraint: This ensures that a column (or a combination of two or more columns) has a unique value and none of its values are null. It uniquely identifies each record in a table.
Foreign Key Constraint: This is used to prevent actions that would destroy the links between tables. It maintains the referential integrity of data.
Unique Constraint: This ensures that all values in a column are different. Unlike primary keys, they do not imply a sort order.
Check Constraint: This allows specifying a condition on each row in a table. It provides a way to limit the kind of data that can be stored in a table by ensuring that expressions for a column return a Boolean value.
Not Null Constraint: This enforces a column to not accept null values, ensuring that a value is always present.
PHYSICAL INTEGRATION NOTES
The process of transitioning from a logical to a physical database design involves translating your ERD into actual database tables, complete with specific data types, keys, constraints, and relationships. The transition from logical to physical database design is a key step in the database design process, where careful planning and execution can lead to an efficient and robust database system.
Table Creation Order
When creating tables, especially those with FKs relationships, the order matters. Parent tables, those referenced by a FK in another table, must be created before their child tables. If the parent table doesn’t exist when the child table is created, you’ll encounter an error.
Data Insertion Order
Similarly, when inserting data, the order matters as well. Rows should first be inserted into the parent tables before inserting into the child tables. If a row in a child table references a non-existent row in a parent table, the database will throw a FK constraint violation error.
NULL Values:
Understanding the implications of NULL values is important in physical database design. NULL in a database does not mean zero or empty, but rather that the value is unknown or not applicable. Depending on the constraints set on the column, NULL may or may not be an acceptable value.
Constraints and Data Integrity
Defining appropriate constraints during the physical design phase is crucial for maintaining data integrity. Constraints like PKs, FKs, CK, unique constraints, and checks can ensure that only valid data is inserted into the database.
DATA LANGUAGES WITH SQL
SQL is traditionally divided into four main languages, each with a specific set of commands that serve different purposes within the database management system:
Data Query Language (DQL):
Used for querying the database to retrieve information. The primary command is SELECT, which is used to fetch data from a database.
Data Manipulation Language (DML):
Used for inserting, updating, deleting, and managing data within database objects. The core commands are INSERT, UPDATE, and DELETE.
Data Definition Language (DDL):
Used for defining and modifying the database structure or schema. Includes commands like CREATE, ALTER, and DROP which can be used to create, alter, and delete databases and their objects (like tables, indexes, constraints, etc.).
Data Control Language (DCL):
Used for controlling access to the data in the database. Commands such as GRANT and REVOKE are used to manage database security and user permissions.
THREATS TO DATABASE SECURITY
Cyber Attacks
SQL Injection: A malicious technique where attackers execute unauthorized SQL commands by exploiting vulnerabilities in the input fields of an application, leading to unauthorized access and manipulation of the database.
Phishing: Attackers deceive database users into revealing sensitive information, such as login credentials, which can then be used to gain unauthorized access to the database.
Malware and Ransomware: Malicious software is used to disrupt database operations, steal data, or encrypt database contents for ransom. These attacks can cause significant downtime and data loss.
Insider Threats
Accidental Misuse: Users might unintentionally delete important data or grant excessive privileges due to lack of awareness or training.
Malicious Insiders: Employees or trusted individuals with access to the database might intentionally misuse their privileges for personal gain or to harm the organization.
Physical Threats
Theft: Physical theft of servers or devices containing database information can lead to a direct breach of data.
Natural Disasters: Events like floods, earthquakes, and fires can damage physical infrastructure, resulting in data loss unless proper backups and disaster recovery plans are in place.
Network Attacks
Denial of Service (DoS) and Distributed Denial of Service (DDoS) Attacks: These attacks aim to overwhelm the database server’s resources, making the database unavailable to legitimate users.
Man-in-the-Middle (MitM) Attacks: Attackers intercept and possibly alter the communication between two parties (e.g., between a database client and the server), which can compromise data integrity and confidentiality.
Securing the Physical Database
.Access Controls: Limit physical access to the database servers to authorized personnel only. Use biometric access controls, security badges, and surveillance systems to monitor and control access to server rooms.
. Environmental Controls: Protect hardware against environmental threats, such as excessive heat, humidity, and water damage, with proper cooling systems, fire suppression systems, and waterproof enclosures.
. Hardware Security: Use hardware security modules (HSMs) for encryption key management and secure encrypted data storage devices to enhance data protection at the physical level.
Roles and Access Control
. Principle of Least Privilege (PoLP): Ensure that users and applications have only the minimum levels of access—or permissions—needed to perform their tasks. Regularly review access privileges and adjust them as roles change within the organization.
. Role-Based Access Control (RBAC): Define roles within your organization and assign permissions to these roles rather than to individual users. This makes managing access rights more scalable and understandable.
. Audit and Monitor Access: Implement auditing and monitoring tools to track who accesses the database, when they access it, and what actions they perform. This not only helps in detecting unauthorized access attempts but also in ensuring compliance with regulatory standards.
Data Sanitation (Input Validation)
. Validation: Ensure that all data input into applications is validated for type, length, format, and range. This helps prevent SQL injection and other forms of injection attacks by ensuring that inputs cannot be interpreted as commands or queries.
. Sanitization: Use data sanitization techniques to cleanse input data, removing potentially harmful characters or patterns that could be used in an injection attack. This is particularly important for data that will be used in SQL queries or that comes from untrusted sources.
. Prepared Statements and Parameterized Queries: Use prepared statements with parameterized queries in your application code to separate SQL logic from data values. This approach ensures that user input is treated as data, not as executable code, effectively neutralizing SQL injection attacks.
Implementing Comprehensive Security Measures
. Encryption: Encrypt sensitive data both at rest and in transit to protect it from unauthorized access. Use strong encryption algorithms and keep your encryption keys secure.
. Backup and Recovery Plans: Maintain regular, secure backups of your database and test your recovery procedures to ensure you can quickly restore data in case of loss or corruption.
. Security Policies and Procedures: Develop and enforce comprehensive security policies and procedures that cover all aspects of database security, from physical security to user training and incident response plans.