TB2 Flashcards

1
Q

WHAT IS A RELATIONAL DATABASE?

A

A Relational Database (RDB) is a collection of data sets organized into tables, records and columns. RDBs establish well-defined relationships between database tables; tables communicate and share information, which facilitates data searchability, organization and reporting.
RDBs use Structured Query Language (SQL), a standard language that provides a straightforward programming interface for database interaction.

Core characteristics:
.Relational databases organize data into one or more tables.
.Each row (tuple) in the table represents a unique record
.Each column represents a specific attribute or field of that record.
.The relational structure allows us to identify and access data in relation to another piece of data in the database.
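
A minimal sketch of these ideas in SQL (table, column and value names are illustrative, not from the source):

-- A table: each column is an attribute, each row (tuple) is a record
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- unique identifier for each record
    name        VARCHAR(100),
    email       VARCHAR(255)
);

-- Insert a record, then retrieve it in relation to another piece of data (its id)
INSERT INTO customers (customer_id, name, email)
VALUES (1, 'Ada Lovelace', 'ada@example.com');

SELECT name, email FROM customers WHERE customer_id = 1;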

2
Q

WHAT IS A DATABASE MANAGEMENT SYSTEM?

A

A DBMS is a complex piece of software specifically designed to interact with end-users, applications, and the database itself to capture and analyse data. It serves as an interface between the database and end users or application programs, ensuring that data is consistently organized and remains easily accessible.

Core characteristics

It provides a systematic approach to store, manipulate, and access data efficiently, ensuring data integrity and security.
A DBMS allows users to define the structure of the database, specify relationships between data elements, and perform operations such as adding, updating, and deleting data.
It acts as an interface between users and the database, facilitating the storage and retrieval of data while maintaining data consistency and reducing data redundancy.

3
Q

ADVANTAGES OF RDBs

A

Database management systems (DBMS) have evolved over the years to efficiently store and retrieve data. Relational DBMS offer data consistency and integrity, scalability, and ease of use.

.Data Integrity and ACID Compliance: Relational databases follow ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliability in every transaction.

.Data Structure: The tabular structure makes it easier to understand the schema and relationship between different entities.

.Sophisticated Querying: SQL (Structured Query Language) enables complex queries and operations, offering greater flexibility.

.Mature and Well-Supported: Relational databases have been around for decades, so there’s a wealth of community and vendor support.

.Normalization: Relational databases allow data normalization, which minimizes data redundancy and enhances data integrity.

.Flexibility in Indexing: You can index columns in a relational database to improve data retrieval speed.

.Strong Security: Features like role-based access control are often built-in, offering robust security options.

.Multi-users: Enable data to be easily accessed and shared across multiple applications and users.

4
Q

DISADVANTAGES OF RDBs

A

While powerful, relational databases also have drawbacks: they can suffer performance issues with very large datasets, they can be complex to create and manage, and DBMS software and hardware can be costly.

.Scalability: While they can be scaled vertically by adding more power to the individual server, they are generally more challenging to scale horizontally (across multiple servers).

.Complexity: The tabular structure and relationships can become complex as more tables and relationships are added.

.Cost: High-quality RDBMS systems can be expensive in terms of licensing and hardware requirements.

.Performance: For read-heavy and write-heavy operations, particularly with large datasets, NoSQL databases like MongoDB or Cassandra may offer better performance.

.Rigidity: Changing the schema can be difficult and time-consuming, especially for large databases and applications that are already in production.

.Overhead: Features that ensure data integrity and ACID compliance can introduce computational overhead, which may not be suitable for all kinds of applications (e.g., real-time analytics).

5
Q

COMPONENTS OF A RELATIONAL DATABASE

A

Tables: The primary building blocks, tables store data in a structured format. Each table holds data for a specific object, like “Customers” or “Orders.”

Columns: Each table is made up of columns (aka fields). These define the type of data that can be stored, such as integers, text, or dates.

Rows or Records: Each row in a table is a record. It is a single instance that contains data for each field in the table.

Primary Key (PK): This is a unique identifier for a record in a table. No two records can have the same primary key, ensuring data integrity and enabling quick data retrieval.

Foreign Key (FK): Used to establish relationships between tables, a foreign key in one table refers to the primary key of another table. This helps maintain referential integrity.

Index: This is a data structure that improves the speed of data retrieval operations. Indexes can be created on one or multiple columns.

Schema: The blueprint of the database, the schema defines the structure, including tables, fields, and the relationships between them.

Constraints: Rules applied to columns to maintain data integrity, such as “NOT NULL,” “UNIQUE,” or “CHECK” constraints.

Data Dictionary: Also known as the system catalogue, this holds metadata like table definitions, field data types, and other properties of the database.

Views: These are virtual tables that provide a layer of security and abstraction, presenting data in a specific way without changing the underlying data.

Triggers and Stored Procedures: These are sets of SQL statements that are stored and can be executed on the database server to enforce business rules or automate tasks.
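
As a rough illustration, the sketch below (hypothetical table and column names) ties several of these components together: tables, columns, PK, FK, constraints, an index and a view.

-- Parent table with a primary key and column constraints
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,              -- PK: unique identifier
    email       VARCHAR(255) NOT NULL UNIQUE     -- NOT NULL and UNIQUE constraints
);

-- Child table whose FK references the parent, preserving referential integrity
CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),  -- FK
    total       NUMERIC(10, 2) CHECK (total >= 0)                     -- CHECK constraint
);

-- Index to speed up retrieval of a customer's orders
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- View: a virtual table presenting data without changing the underlying tables
CREATE VIEW order_summary AS
SELECT c.email, COUNT(*) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.email;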

6
Q

WHAT IS AN ENTITY?

A

An ENTITY can be defined as a thing or an object that is distinguishable from all other objects.

The entities might include
Customers
Products
Orders
etc.

Entities have relationships with each other.

7
Q

RELATIONSHIP TYPES (CARDINALITIES)

A

One-to-One (1:1): Each A ↔️ One B
Example: Person ↔️ Passport

One-to-Many (1:M) Each A ↔️ Many B
Example: Teacher ↔️ Students

Many-to-One (M:1): Many A ↔️ One B
Example: Employees ↔️ Department

Many-to-Many (M:M) Many A ↔️ Many B
Example: Students ↔️ Courses

Zero-to-One (0:1): A ↔️ None or One B
Example: Person ↔️ Parking Spot

Zero-to-Many (0:M) A ↔️ None or Many B
Example: Customer ↔️ Orders

One-to-Zero or Many (1:0..*): Each A ↔️ None or Many B
Example: Product ↔️ Reviews

ER Diagram Symbols:

1:1: Single tick on both ends.
1:M: Single tick on one end, crow’s foot on the other.
M:N: Crow’s foot on both ends.
Purpose: Define relationships, ensure data integrity, and minimize redundancy.
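
In SQL these cardinalities are typically realised with foreign keys; a many-to-many relationship needs a junction table. A hedged sketch with invented table names:

-- One-to-many: one teacher has many students
CREATE TABLE teachers (teacher_id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    name       TEXT,
    teacher_id INTEGER REFERENCES teachers (teacher_id)  -- many students point to one teacher
);

-- Many-to-many: students and courses linked via a junction (associative) table
CREATE TABLE courses (course_id SERIAL PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students (student_id),
    course_id  INTEGER REFERENCES courses (course_id),
    PRIMARY KEY (student_id, course_id)   -- composite key prevents duplicate pairs
);

-- One-to-one: a person has at most one passport (UNIQUE on the FK enforces it)
CREATE TABLE people (person_id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE passports (
    passport_id SERIAL PRIMARY KEY,
    person_id   INTEGER UNIQUE REFERENCES people (person_id)
);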

8
Q

TYPES OF KEYS

A

Primary Key

Definition: A unique identifier for each record in a table.
Example: student_id in a Students table.

Foreign Key
Definition: A field in one table that uniquely identifies a row of another table, creating a relationship between the two tables.
Example: department_id in an Employees table referencing department_id in a Departments table.

Composite Key
Definition: A key that consists of two or more columns to uniquely identify a record.
Example: (student_id, course_id) in an Enrollments table.

Candidate Key
Definition: A set of one or more columns that can uniquely identify a record. There can be multiple candidate keys in a table.
Example: email or student_id in a Students table.

Super Key
Definition: A set of one or more columns that can uniquely identify a record. A super key can have additional attributes that are not necessary for unique identification.
Example: (student_id, email) in a Students table.

Alternate Key
Definition: A candidate key that is not the primary key.
Example: If student_id is the primary key, email can be an alternate key in a Students table.

Surrogate Key
Definition: An artificial key that is used as a unique identifier for a table. It is typically an auto-incremented number.
Example: id in a Users table, where id is an auto-incremented integer.
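
A small sketch (hypothetical schema) showing several of these key types side by side:

CREATE TABLE departments (
    department_id SERIAL PRIMARY KEY,    -- surrogate key acting as the primary key
    name          TEXT NOT NULL
);

CREATE TABLE employees (
    employee_id   SERIAL PRIMARY KEY,                              -- surrogate PK
    email         VARCHAR(255) NOT NULL UNIQUE,                    -- candidate key; alternate key since it is not the PK
    department_id INTEGER REFERENCES departments (department_id)   -- foreign key
);

-- Composite key: the pair of columns uniquely identifies each row
CREATE TABLE project_assignments (
    employee_id INTEGER REFERENCES employees (employee_id),
    project_id  INTEGER,
    PRIMARY KEY (employee_id, project_id)
);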

9
Q

ATTRIBUTES

A

.Attributes are the properties or characteristics of an entity in a database.
.Each attribute represents a specific piece of data, such as name, age, or address.
.They are represented by columns in a database table.
.An attribute’s value can change over time, reflecting the current state of the entity.
.They are essential for defining the structure of data in a database.

10
Q

DATA TYPES

A

.Data types define the nature of the data that can be stored and manipulated.
.Common data types include INTEGER (INT), FLOAT, VARCHAR, CHAR, BOOLEAN, DATE, TIME, etc.
.Each data type has a specific range of values it can represent.
.Data types are important for ensuring data integrity in a database.
.They can influence the kind of operations that can be performed on the data. For instance, arithmetic operations can be performed on numerical data types (integers), but not on string (characters) or Boolean data types.
.In databases, the chosen data type can also impact the storage space required for each attribute.
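
For example (table and column names invented), a table mixing common PostgreSQL data types:

CREATE TABLE products (
    product_id   INTEGER,
    name         VARCHAR(100),   -- variable-length string
    code         CHAR(8),        -- fixed-length string
    price        NUMERIC(10, 2), -- exact decimal, suitable for money
    weight_kg    FLOAT,          -- approximate floating point
    in_stock     BOOLEAN,
    released_on  DATE,
    last_updated TIMESTAMP
);

-- Arithmetic works on numeric columns but not on text or boolean ones
SELECT name, price * 1.2 AS price_with_tax FROM products;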

11
Q

A Composite Key

A

A Composite Key in a database is a type of key that consists of two or more attributes or columns to uniquely identify a record in a table. It is used when a single attribute is not sufficient to guarantee uniqueness. The combination of these attributes can ensure the uniqueness of each record in the table, fulfilling the role of a primary key when required.

12
Q

NORMALISATION

A

Database NORMALISATION is a systematic approach to designing databases in a way that reduces data redundancy and improves data integrity. It involves dividing a database into two or more tables and defining relationships between those tables to eliminate duplicate data, simplify queries, and ensure that logical dependencies are maintained correctly. The process is typically conducted in stages, known as “normal forms”, each with a specific set of requirements.

13
Q

1NF – FIRST NORMAL FORM

A

The FIRST NORMAL FORM (1NF) is a fundamental level of database normalisation that sets the basic rules for an organised database:

Each table cell should contain a single value.
Each record needs to be unique, that is, no duplicate rows are allowed (and there are no repeating groups of columns).
Entries in a column (attribute) are of the same kind and follow a specific data type (e.g. do not have CHARACTER type in ROW1 and NUMERIC in ROW2).
The order of rows and columns does not matter (although consistent ordering is good practice).

The primary goal of 1NF is to eliminate duplicative data from the same table and create separate columns or rows for each group of related data, establishing a unique identifier (PKs), for each record.
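
A rough before/after sketch (invented columns): the unnormalised version stores several phone numbers in one cell; the 1NF version gives each value its own row.

-- Violates 1NF: one cell holds several values
-- students(student_id, name, phone_numbers)
--   1 | 'Ada' | '555-1111, 555-2222'

-- 1NF: atomic values, one row per phone number, with a unique identifier
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name       VARCHAR(100)
);

CREATE TABLE student_phones (
    student_id INTEGER REFERENCES students (student_id),
    phone      VARCHAR(20),
    PRIMARY KEY (student_id, phone)   -- no duplicate rows
);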

14
Q

2NF – SECOND NORMAL FORM

A

The SECOND NORMAL FORM (2NF) is the next stage of database normalisation, which builds on the rules set by the 1NF. A table is in 2NF if:

It is already in 1NF
All non-key attributes (i.e., columns that are not part of the PK) are fully functionally dependent on the PK. This means they must depend on the entire PK for their existence and uniqueness, not just part of it.

This process helps to reduce redundancy in the database by ensuring that all non-key attributes of the table relate directly to the PK. It’s particularly applicable when the PK is a Composite Key (CK).
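
Illustrative sketch (hypothetical columns): with a composite PK (student_id, course_id), an attribute such as course_title depends on course_id alone (a partial dependency), so 2NF moves it into its own table.

-- Not in 2NF: course_title depends only on course_id, part of the composite PK
-- enrollments(student_id, course_id, course_title, grade)

-- 2NF: every non-key attribute depends on the whole key
CREATE TABLE courses (
    course_id    INTEGER PRIMARY KEY,
    course_title VARCHAR(100)
);

CREATE TABLE enrollments (
    student_id INTEGER,
    course_id  INTEGER REFERENCES courses (course_id),
    grade      CHAR(2),                     -- depends on the full (student_id, course_id) key
    PRIMARY KEY (student_id, course_id)
);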

15
Q

3NF – THIRD NORMAL FORM

A

The THIRD NORMAL FORM (3NF) is the next level of database normalisation, extending the rules of 2NF. A table is in 3NF if:

It is already in 2NF
All non-key attributes (columns) are functionally dependent solely on the PK and not on any other non-key attributes or sets of attributes (i.e., there are no transitive dependencies).

This form ensures that each non-key column of a table is not only dependent on the PK but is also directly dependent on it. It helps to preserve data consistency and eliminate redundancy caused by transitive dependencies.
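
Illustrative sketch (invented columns): department_name depends on department_id, which in turn depends on the PK employee_id (a transitive dependency), so 3NF splits it out.

-- Not in 3NF: employee_id -> department_id -> department_name (transitive)
-- employees(employee_id, name, department_id, department_name)

-- 3NF: every non-key attribute depends only on the PK
CREATE TABLE departments (
    department_id   INTEGER PRIMARY KEY,
    department_name VARCHAR(100)
);

CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,
    name          VARCHAR(100),
    department_id INTEGER REFERENCES departments (department_id)
);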

16
Q

DE-NORMALISATION

A

DENORMALISATION is a database optimisation technique in which we add redundant data or group data in order to speed up complex queries. In contrast to normalisation, which aims to minimize redundancy and keep data consistent by breaking down tables, denormalisation does the opposite by combining tables and adding redundant data. The main goal is to reduce the complexity of queries and improve the performance of read-heavy databases at the expense of potential inconsistencies in data and increased storage.
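
One hedged example of the trade-off (table and column names invented): copying customer_name into orders avoids a join on every read, at the cost of keeping the duplicated value in sync.

-- Normalised: reads need a join
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;

-- Denormalised: redundant column added to orders for faster reads
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100);
-- Every read now avoids the join, but any change to customers.customer_name
-- must also be propagated to orders, or the data becomes inconsistent.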

17
Q

DATABASE CONSTRAINTS

A

Database constraints are a set of rules that are used to maintain the quality, integrity, and accuracy of the data in a database. They define certain properties that data in a database must comply with. They can apply to a column, a whole table, more than one table or an entire schema. Common types of constraints include:

Primary Key Constraint: This ensures that a column (or a combination of two or more columns) has a unique value and none of its values are null. It uniquely identifies each record in a table.

Foreign Key Constraint: This is used to prevent actions that would destroy the links between tables. It maintains the referential integrity of data.

Unique Constraint: This ensures that all values in a column are different. Unlike a primary key, a table can have several unique constraints, and a unique column can typically still allow NULL values.

Check Constraint: This allows specifying a condition on each row in a table. It provides a way to limit the kind of data that can be stored in a table by requiring that a Boolean expression evaluates to true for every row.

Not Null Constraint: This enforces a column to not accept null values, ensuring that a value is always present.
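
A compact sketch (hypothetical tables) applying each constraint type:

CREATE TABLE categories (category_id SERIAL PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE products (
    product_id  SERIAL PRIMARY KEY,                           -- primary key constraint
    sku         VARCHAR(20) NOT NULL UNIQUE,                  -- NOT NULL + UNIQUE constraints
    price       NUMERIC(10, 2) CHECK (price >= 0),            -- CHECK constraint
    category_id INTEGER REFERENCES categories (category_id)   -- foreign key constraint
);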

18
Q

PHYSICAL IMPLEMENTATION NOTES

A

The process of transitioning from a logical to a physical database design involves translating your ERD into actual database tables, complete with specific data types, keys, constraints, and relationships. The transition from logical to physical database design is a key step in the database design process, where careful planning and execution can lead to an efficient and robust database system.

Table Creation Order
When creating tables, especially those with FK relationships, the order matters. Parent tables, those referenced by a FK in another table, must be created before their child tables. If the parent table doesn’t exist when the child table is created, you’ll encounter an error.

Data Insertion Order
Similarly, when inserting data, the order matters as well. Rows should first be inserted into the parent tables before inserting into the child tables. If a row in a child table references a non-existent row in a parent table, the database will throw a FK constraint violation error.
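
A hedged sketch of both ordering rules (table names invented): the parent is created and populated first; reversing either order fails with a FK error.

-- 1. Create the parent table first
CREATE TABLE departments (department_id SERIAL PRIMARY KEY, name TEXT);

-- 2. Then the child table that references it
CREATE TABLE employees (
    employee_id   SERIAL PRIMARY KEY,
    name          TEXT,
    department_id INTEGER REFERENCES departments (department_id)
);

-- 3. Insert into the parent before the child
INSERT INTO departments (name) VALUES ('Engineering');
INSERT INTO employees (name, department_id) VALUES ('Ada', 1);

-- This would fail with a foreign key constraint violation:
-- INSERT INTO employees (name, department_id) VALUES ('Bob', 999);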

NULL Values:
Understanding the implications of NULL values is important in physical database design. NULL in a database does not mean zero or empty, but rather that the value is unknown or not applicable. Depending on the constraints set on the column, NULL may or may not be an acceptable value.

Constraints and Data Integrity
Defining appropriate constraints during the physical design phase is crucial for maintaining data integrity. Constraints like PKs, FKs, unique constraints, and checks can ensure that only valid data is inserted into the database.

19
Q

DATA LANGUAGES WITH SQL

A

SQL is traditionally divided into four main languages, each with a specific set of commands that serve different purposes within the database management system:

Data Query Language (DQL):
Used for querying the database to retrieve information. The primary command is SELECT, which is used to fetch data from a database.

Data Manipulation Language (DML):
Used for inserting, updating, deleting, and managing data within database objects. The core commands are INSERT, UPDATE, and DELETE.

Data Definition Language (DDL):
Used for defining and modifying the database structure or schema. Includes commands like CREATE, ALTER, and DROP which can be used to create, alter, and delete databases and their objects (like tables, indexes, constraints, etc.).

Data Control Language (DCL):
Used for controlling access to the data in the database. Commands such as GRANT and REVOKE are used to manage database security and user permissions.
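
One representative command from each language (object and role names are placeholders):

-- DDL: define structure
CREATE TABLE accounts (account_id SERIAL PRIMARY KEY, balance NUMERIC(12, 2));

-- DML: manipulate data
INSERT INTO accounts (balance) VALUES (100.00);
UPDATE accounts SET balance = balance - 25 WHERE account_id = 1;
DELETE FROM accounts WHERE balance = 0;

-- DQL: query data
SELECT account_id, balance FROM accounts;

-- DCL: control access (assumes the role reporting_user already exists)
GRANT SELECT ON accounts TO reporting_user;
REVOKE SELECT ON accounts FROM reporting_user;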

20
Q

THREATS TO DATABASE SECURITY

A

Cyber Attacks

SQL Injection: A malicious technique where attackers execute unauthorized SQL commands by exploiting vulnerabilities in the input fields of an application, leading to unauthorized access and manipulation of the database.

Phishing: Attackers deceive database users into revealing sensitive information, such as login credentials, which can then be used to gain unauthorized access to the database.

Malware and Ransomware: Malicious software is used to disrupt database operations, steal data, or encrypt database contents for ransom. These attacks can cause significant downtime and data loss.

Insider Threats

Accidental Misuse: Users might unintentionally delete important data or grant excessive privileges due to lack of awareness or training.

Malicious Insiders: Employees or trusted individuals with access to the database might intentionally misuse their privileges for personal gain or to harm the organization.

Physical Threats

Theft: Physical theft of servers or devices containing database information can lead to a direct breach of data.

Natural Disasters: Events like floods, earthquakes, and fires can damage physical infrastructure, resulting in data loss unless proper backups and disaster recovery plans are in place.

Network Attacks

Denial of Service (DoS) and Distributed Denial of Service (DDoS) Attacks: These attacks aim to overwhelm the database server’s resources, making the database unavailable to legitimate users.

Man-in-the-Middle (MitM) Attacks: Attackers intercept and possibly alter the communication between two parties (e.g., between a database client and the server), which can compromise data integrity and confidentiality.

21
Q

Securing the Physical Database

A

.Access Controls: Limit physical access to the database servers to authorized personnel only. Use biometric access controls, security badges, and surveillance systems to monitor and control access to server rooms.

. Environmental Controls: Protect hardware against environmental threats, such as excessive heat, humidity, and water damage, with proper cooling systems, fire suppression systems, and waterproof enclosures.

. Hardware Security: Use hardware security modules (HSMs) for encryption key management and secure encrypted data storage devices to enhance data protection at the physical level.

22
Q

Roles and Access Control

A

. Principle of Least Privilege (PoLP): Ensure that users and applications have only the minimum levels of access—or permissions—needed to perform their tasks. Regularly review access privileges and adjust them as roles change within the organization.

. Role-Based Access Control (RBAC): Define roles within your organization and assign permissions to these roles rather than to individual users. This makes managing access rights more scalable and understandable.

. Audit and Monitor Access: Implement auditing and monitoring tools to track who accesses the database, when they access it, and what actions they perform. This not only helps in detecting unauthorized access attempts but also in ensuring compliance with regulatory standards.
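
A minimal PostgreSQL sketch of these ideas (role and table names invented): permissions are granted to roles rather than individuals, login users are added to roles, and least privilege means granting only what each role needs.

-- Define roles rather than granting rights to individual users
CREATE ROLE analyst NOLOGIN;
GRANT SELECT ON orders TO analyst;                 -- read-only: least privilege

CREATE ROLE order_clerk NOLOGIN;
GRANT SELECT, INSERT, UPDATE ON orders TO order_clerk;

-- Attach an actual login user to a role
CREATE ROLE alice LOGIN PASSWORD 'change-me';
GRANT analyst TO alice;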

23
Q

Data Sanitation (Input Validation)

A

. Validation: Ensure that all data input into applications is validated for type, length, format, and range. This helps prevent SQL injection and other forms of injection attacks by ensuring that inputs cannot be interpreted as commands or queries.

. Sanitization: Use data sanitization techniques to cleanse input data, removing potentially harmful characters or patterns that could be used in an injection attack. This is particularly important for data that will be used in SQL queries or that comes from untrusted sources.

. Prepared Statements and Parameterized Queries: Use prepared statements with parameterized queries in your application code to separate SQL logic from data values. This approach ensures that user input is treated as data, not as executable code, effectively neutralizing SQL injection attacks.
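
In application code this is usually done through the driver's parameterized-query API; the same idea can be sketched in plain PostgreSQL with PREPARE/EXECUTE (table and statement names invented), where user input is bound as a value, never spliced into the SQL string.

-- The parameter $1 is always treated as data, not as executable SQL
PREPARE find_user (text) AS
    SELECT user_id, email FROM users WHERE email = $1;

-- Even a malicious input containing quotes or OR 1=1 is just a literal value here
EXECUTE find_user('alice@example.com');

DEALLOCATE find_user;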

24
Q

Implementing Comprehensive Security Measures

A

. Encryption: Encrypt sensitive data both at rest and in transit to protect it from unauthorized access. Use strong encryption algorithms and keep your encryption keys secure.

. Backup and Recovery Plans: Maintain regular, secure backups of your database and test your recovery procedures to ensure you can quickly restore data in case of loss or corruption.

. Security Policies and Procedures: Develop and enforce comprehensive security policies and procedures that cover all aspects of database security, from physical security to user training and incident response plans.

25
Q

Encryption

A

Encryption is a security method where information is encoded in such a way that only authorized parties can access it. It transforms readable data, or plaintext, into an unreadable format, known as ciphertext, using an encryption algorithm and a key. This ensures that even if data breaches occur, the information remains unreadable and secure.

Importance of Encryption:

.Data Security: Protects sensitive data such as personal information, financial records, and confidential corporate data from breaches and unauthorized access.

.Privacy Compliance: Many industries are subject to regulations requiring the protection of sensitive information, such as GDPR, HIPAA, etc., which mandate the use of encryption.

.Trust: Maintains the trust of customers and stakeholders by ensuring that their data is handled securely and responsibly.

.Data Integrity: Ensures that data is not altered or tampered with during storage or transmission, maintaining its accuracy and reliability.

26
Q

Types of Data That Require Encryption

A

.Personally Identifiable Information (PII): Names, addresses, social security numbers, etc.

.Financial Information: Credit card numbers, bank account details, transaction records.

.Confidential Business Information: Trade secrets, proprietary technology, strategic plans.

27
Q

Types of Encryption in Databases

A

Databases use various encryption techniques to protect data at different levels, from individual cells to entire databases. These techniques can be broadly categorized into symmetric and asymmetric encryption, each with its advantages and use cases.

Symmetric Encryption

Utilizes a single key for both encryption and decryption. This method is efficient and typically faster, making it suitable for encrypting large volumes of data.
Algorithms - Advanced Encryption Standard (AES), Triple Data Encryption Standard (3DES), and Blowfish.
Ideal for encrypting data at rest, such as entire database files or backups, where encryption and decryption speed is crucial.

Asymmetric Encryption

Employs a pair of keys – a public key for encryption and a private key for decryption. This method is more secure but also more computationally intensive.
Algorithms - RSA (Rivest-Shamir-Adleman), Elliptic Curve Cryptography (ECC), and Digital Signature Algorithm (DSA).
Best suited for encrypting data in transit or for establishing secure connections for remote database access.

28
Q

Encryption at Rest vs. Encryption in Transit

A

Rest: Protects data stored within the database or on disk. Techniques include Transparent Data Encryption (TDE) and file-system level encryption. It’s crucial for preventing data breaches resulting from physical theft or unauthorized access to storage media.

Transit: Secures data as it moves between the database and applications or between servers. Implemented through protocols like TLS (Transport Layer Security) and SSL (Secure Sockets Layer), it safeguards against interception and eavesdropping.

29
Q

Application-Level vs. Database-Level Encryption

A

Application-Level Encryption: The application encrypts data before sending it to the database. This approach provides fine-grained control over what data is encrypted and allows for application-specific encryption schemes.

Database-Level Encryption: The database system itself manages encryption, offering a more straightforward implementation that requires less modification to existing applications. However, it may provide less flexibility in terms of which data is encrypted.

30
Q

Data retrieval

A

Data retrieval in the context of encrypted databases involves fetching the requested data and converting it from its encrypted form (ciphertext) back into a readable format (plaintext). This process is crucial for maintaining the confidentiality and integrity of sensitive information while ensuring it remains accessible to authorized users.

The Retrieval Process

Request Authentication: Verifies the identity of the user or system requesting data, ensuring that only authorized parties can initiate data retrieval.

Decryption: Once access is granted, the database decrypts the requested data using the appropriate decryption key. This step requires careful management of encryption keys to ensure they are accessible to legitimate users while being protected from unauthorized access.

Data Presentation: The decrypted data is then presented to the user or application in a readable format. This step often involves additional security measures, such as secure transmission protocols and session management, to protect data as it’s delivered to the end user.

31
Q

Challenges in Data Retrieval and Decryption

A

Performance Overhead: Encryption and decryption processes can introduce latency, affecting the performance of database queries and data retrieval operations.

Key Management Complexity: Managing the lifecycle of encryption keys, including generation, storage, rotation, and revocation, poses significant challenges. Mismanagement can lead to data loss or breaches.

Access Control and Authentication: Implementing robust access control mechanisms is essential to prevent unauthorized data access. This includes managing permissions and roles within the database management system.

32
Q

Best Practices for Secure Retrieval

A

Implement Role-Based Access Control (RBAC): Define roles and permissions clearly to ensure users have access only to the data they are authorized to view.

Use Secure Transmission Protocols: When transmitting data, especially over public networks, use protocols like TLS (Transport Layer Security) to protect the data in transit.

Regular Audits and Monitoring: Conduct regular security audits and monitor access logs to detect and respond to unauthorized access attempts promptly.

Encryption Key Management: Utilize dedicated key management solutions to automate key rotation, securely store keys, and ensure they are only accessible to authorized applications and users.

Data Masking: For extra security, especially in development and testing environments, use data masking techniques to obscure sensitive information, ensuring that even if data is accessed, it cannot be misused.

33
Q

Emerging Technologies and Approaches

A

Homomorphic Encryption: Allows computations on encrypted data, providing results without ever decrypting the data. This advanced technique is promising for secure data processing and analysis.

Blockchain for Data Integrity: Leveraging blockchain technology to ensure the integrity and immutability of transaction logs and sensitive data records.

34
Q

PostgreSQL ENCRYPTION

A

The pgcrypto extension enables PostgreSQL with cryptographic functionality, facilitating encryption/decryption and secure storage of data directly within the database.

It supports both symmetric and limited asymmetric encryption methods, allowing for the protection of sensitive information and secure password storage through hashing.

The pgcrypto extension in PostgreSQL supports symmetric encryption using PGP_SYM and AES, allowing secure data storage. Remember: Symmetric encryption means the same key is used to encrypt and decrypt data.

PGP_SYM is versatile, ideal for encrypting text data with a passphrase, making it user-friendly for scenarios where data needs to be shared securely between parties who have the passphrase.

AES is known for its speed and robust security, suitable for encrypting large volumes of data efficiently. It’s recommended for situations demanding high performance and strong security, such as storing sensitive user information or financial records.
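
A hedged example of symmetric encryption with pgcrypto (table, column and passphrase are placeholders; key management is out of scope here):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE secrets (
    id      SERIAL PRIMARY KEY,
    payload BYTEA                       -- ciphertext is stored as bytes
);

-- Encrypt with a passphrase (pgp_sym_encrypt); the same passphrase decrypts it
INSERT INTO secrets (payload)
VALUES (pgp_sym_encrypt('card number 4111-1111', 'my-passphrase'));

SELECT pgp_sym_decrypt(payload, 'my-passphrase') AS plaintext FROM secrets;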

35
Q

HASHING VS ENCRYPTION

A

In secure password storage practices, passwords are hashed, not encrypted. The distinction is crucial: hashing is a one-way process, meaning once a password is hashed, it cannot be reversed or decrypted back to its original plaintext form. This is why it’s impossible to “decrypt” and view the original password from a hash stored in the database.

Hashing is a one-way process used for verifying the integrity of data. The same input always produces the same output, but it’s computationally infeasible to reverse the process and retrieve the original input from the hash output.
Encryption is a two-way process that allows data to be made unreadable via encryption and then returned to its original readable form via decryption, using a specific key.

36
Q

Why You Can’t Decrypt Hashed Passwords

A

The purpose of hashing passwords before storing them is to protect user credentials. Even if a database is compromised, the attackers cannot retrieve the actual passwords, only their hashes.

Secure password hashing algorithms, especially those designed for password storage like bcrypt, scrypt, or Argon2, are intentionally designed to make this reversal computationally impractical.

37
Q

Crypt Function

A

A PostgreSQL function used to hash passwords.

Utilizes a cryptographic hash function to convert plaintext passwords into a secure, fixed-size hash.

Ensures that stored passwords are not kept in plain text, enhancing security.

38
Q

Algorithms for hashing passwords

A

BF (Blowfish): This is the algorithm used by bcrypt, which is well-regarded for its security due to its adaptive cost factor. It allows you to scale the algorithm’s complexity and resistance to brute-force attacks as hardware capabilities improve.

MD5: An older hashing algorithm that is much faster but significantly less secure than Blowfish. It’s generally not recommended for new systems due to vulnerabilities to collision attacks and its susceptibility to fast brute-force attacks.

XDES (Extended DES): An extension of the traditional DES (Data Encryption Standard) algorithm, offering better security through a configurable number of encryption rounds. Like Blowfish, it can be set to be computationally intensive, though it’s generally less used than bcrypt.

DES: The original Data Encryption Standard algorithm. It’s considered obsolete for most purposes due to its short key length, which makes it vulnerable to brute-force attacks.

39
Q

Salt Function

A

A salt is a random sequence of characters added to the input of a hash function along with the password.
The same password with different salts will result in different hashes.
Significantly improves hash security by preventing pre-computation attacks.
Typically used with crypt via the gen_salt function, which supports multiple hashing algorithms.

Purpose

Uniqueness: By adding a salt to each password before it is hashed, even identical passwords will produce unique hash values, thus preventing attackers from using pre-computed hash tables (rainbow tables) to crack the passwords.

Security Enhancement: Salts increase the complexity and uniqueness of hashed passwords, making them much harder to crack. This is particularly important in a database breach scenario where attackers gain access to hashed passwords.
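
A typical pattern with pgcrypto's crypt and gen_salt (table and column names invented): store only the hash; verify by re-hashing the candidate password using the stored hash as the salt.

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE users (
    user_id       SERIAL PRIMARY KEY,
    email         VARCHAR(255) UNIQUE NOT NULL,
    password_hash TEXT NOT NULL
);

-- Hash with a freshly generated Blowfish (bcrypt) salt
INSERT INTO users (email, password_hash)
VALUES ('ada@example.com', crypt('s3cret-password', gen_salt('bf')));

-- Verify: crypt(candidate, stored_hash) equals the stored hash only for the right password
SELECT user_id
FROM users
WHERE email = 'ada@example.com'
  AND password_hash = crypt('s3cret-password', password_hash);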

40
Q

What is Database Performance?

A

Database performance refers to the effectiveness of database systems in managing data operations, measured by the system’s response time, throughput, and resource utilization. High performance means that the database can handle queries and transactions quickly and efficiently, with minimal delays and optimal use of hardware resources.

Why is tuning important?

User Experience: The speed at which a database processes and returns information can significantly affect the user’s interaction with an application. Faster responses improve user satisfaction and engagement.

Resource Optimization: Efficient database operations consume less CPU, memory, and disk I/O, which not only improves the current system’s responsiveness but also scales better with increased load, delaying or eliminating the need for costly hardware upgrades.

Consistency and Reliability: Well-tuned databases handle peak loads effectively, maintain consistent performance levels under varying loads, and ensure data integrity and security.

41
Q

BAD DATABASE DESIGN

A

Redundancy: Poor design often leads to unnecessary duplication of data across the database. This not only wastes storage space but also complicates updates, as the same data may need to be updated in multiple places. Over time, redundancy can lead to significant inefficiencies and increased likelihood of data inconsistencies.

Update Anomalies: A database that hasn’t been properly normalized is prone to update anomalies. This means that changes to data in one part of the database can inadvertently lead to inconsistencies elsewhere. For example, if duplicate data exists in multiple tables, updating it in one place but not the others can lead to discrepancies, making the database unreliable.

Inefficiency: Inefficient data organization can slow down query performance, especially as the volume of data grows. For instance, without proper indexing or separation of frequently accessed data from less frequently used information, queries can become slower due to the need to scan large amounts of irrelevant data.

Scalability Issues: Databases designed without considering future growth can encounter scalability issues. This may manifest as performance degradation under increased load, difficulty in implementing necessary schema changes, or challenges in optimizing queries to meet evolving business requirements.

Loss of Flexibility: A rigidly structured database can significantly hinder the implementation of new features or adjustments to the business logic. When the database schema is too closely tied to the current application logic, any change in business requirements can require extensive modifications to the database, leading to higher development costs and potential downtime.

42
Q

QUERY OPTIMIZATION

A

Query optimization is a crucial aspect of database management aimed at reducing the time and resources required to execute SQL queries. It involves rewriting queries, choosing the most efficient execution paths, and employing database features like indexes and partitioning to speed up data retrieval. Effective query optimization can significantly improve application performance and responsiveness, especially in databases with large volumes of data.

43
Q

Principles of Query Optimization

A

Execution Plan: Databases use execution plans to determine how to carry out a query. Understanding these plans, which can be viewed using commands like EXPLAIN in PostgreSQL, is fundamental to identifying bottlenecks and optimizing queries.

Index Usage: Proper use of indexes is one of the most effective ways to improve query performance. However, indexes should be used strategically, as unnecessary indexes can degrade write performance and consume additional storage.

Query Rewriting: Often, the way a query is written can impact performance. Rewriting queries to be more efficient, such as minimizing subqueries, using joins appropriately, and filtering data as early as possible in the query, can lead to significant improvements.

Statistics and Cardinality Estimates: Modern DBMSs collect statistics about table sizes and data distributions, which are used to estimate the “cost” of different query plans. Keeping these statistics up to date is crucial for the optimizer to make accurate decisions.

Leverage Database Features: Use database-specific features like partitioning to improve query performance on large tables, and consider using materialized views to cache complex queries.
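
For instance (table and column invented), EXPLAIN ANALYZE shows the chosen plan, and adding an index can turn a sequential scan into an index scan for a selective filter:

-- Inspect the execution plan for a slow query
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- If the plan shows a sequential scan over a large table, an index may help
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Keep planner statistics up to date so cost estimates stay accurate
ANALYZE orders;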

44
Q

MONITORING

A

MONITORING AND CONTINUOUS IMPROVEMENT

Continuous monitoring helps identify performance bottlenecks, inefficient queries, and potential data integrity issues. By tracking performance metrics and patterns over time, database administrators and developers can proactively address issues before they impact the user experience or lead to more significant problems.

Key Areas for Monitoring

Query Performance: Identify slow-running queries that may need optimization.
Resource Utilization: Monitor CPU, memory, and disk I/O to ensure the database server has sufficient resources.
Index Usage and Efficiency: Ensure indexes are being used effectively and identify opportunities for additional indexing.
Errors and Warnings: Track error logs for any unusual activity or recurrent issues that need attention.
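
As one hedged example, the pg_stat_statements extension (if installed) exposes per-query statistics that can be queried directly; column names vary slightly between PostgreSQL versions.

-- Requires the pg_stat_statements extension to be enabled
SELECT query,
       calls,
       total_exec_time,          -- total time spent in this statement (ms, PostgreSQL 13+)
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;                         -- the five most expensive queries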

45
Q

CONTINUOUS IMPROVEMENT

A

Continuous Improvement Cycle

Assess: Regularly review performance metrics and logs.

Plan: Identify issues and prioritize fixes based on impact.

Implement: Apply optimizations, such as query rewriting, index adjustments, or configuration changes.

Review: Assess the impact of changes and document lessons learned.

46
Q

Complexity in Design and Management

A

Design Challenges: Creating an efficient database requires careful planning and understanding of the data’s nature. A poorly designed database can lead to redundancy, inconsistency, and inefficient data retrieval.

Management Overhead: Databases need ongoing maintenance to ensure they run smoothly. This includes tasks like updating systems, backing up data, and optimizing performance. Such management tasks require skilled administrators and can become time-consuming and complex, especially for large databases.

47
Q

Cost Considerations

A

Initial Setup Costs: The cost of setting up a database system can be high, especially for large-scale operations. This includes hardware, software licenses, and the cost of hiring skilled personnel to design and implement the database.

Ongoing Operational Costs: Beyond the initial setup, databases incur ongoing operational costs, including maintenance, security measures, and updates. For complex and large-scale systems, these costs can be significant.

48
Q

Performance Issues and Scalability Challenges

A

Handling Large Data Volumes: As the amount of data grows, databases can experience slowdowns if not properly optimized or scaled. Performance tuning and scaling solutions are necessary but can add to the complexity and cost.

Scalability Limitations: While databases are designed to scale, doing so effectively requires careful planning and additional resources. Scaling challenges can arise from hardware limitations, software architecture, or the database model used.

49
Q

Security Vulnerabilities

A

Data Breaches and Attacks: Despite robust security measures, databases are constant targets for cyberattacks, leading to potential data breaches. The consequences of such breaches can be severe, including loss of sensitive information, financial loss, and damage to reputation.

Complex Security Management: Ensuring a database is secure involves managing access controls, encrypting data, and monitoring for suspicious activity. This complexity can be overwhelming, especially for organizations without dedicated security experts.

50
Q

Common Security Threats

A

SQL Injection: A technique used by attackers to execute malicious SQL commands by exploiting vulnerabilities in the database layer. This can lead to unauthorized access to sensitive information.

Data Breaches: Unauthorized access to the database can result in sensitive data being stolen, including personal information, financial records, and intellectual property.

Insider Threats: Sometimes, the threat comes from within an organization. Employees with access to databases might misuse their privileges, intentionally or accidentally exposing data.

51
Q

Implementing Robust Security Measures

A

Encryption: Encrypting data stored in databases is fundamental to protecting it from unauthorized access. Both data at rest and in transit should be encrypted.

Access Controls: Implement strict access control policies to ensure that only authorized personnel can access sensitive data. Use role-based access control to minimize the risk of insider threats.

Regular Audits and Monitoring: Conduct regular security audits to check for vulnerabilities and monitor database activity to detect any suspicious behaviour promptly.

52
Q

Best Practices for Database Security

A

Keep Software Up to Date: Regularly update database management software to protect against known vulnerabilities.

Use Strong Authentication Mechanisms: Implement strong password policies and consider multi-factor authentication to enhance security.

Backup and Recovery Plans: Maintain regular backups of data and have a recovery plan in place to deal with data loss or corruption incidents.

53
Q

TYPES OF BACKUPS

A

Full Backup: Copies all data from the database. It provides the foundation for other types of backups but requires more storage space and time to complete.
Example: Performing a full backup weekly to ensure a complete copy of all data at that point in time.

Incremental Backup: Only backs up the data that has changed since the last backup (either full or incremental). It’s faster and requires less storage than a full backup but depends on the previous backups for a complete restore.
Example: Daily incremental backups to capture the changes made each day, reducing the time and storage space needed.

Differential Backup: Captures the changes made since the last full backup. Unlike incremental backups, each differential backup grows in size, as it accumulates all changes since the last full backup.
Example: Using differential backups mid-week to quickly recover data without needing to process all daily increments.

54
Q

BACKUP STRATEGIES

A

Regular Scheduling: Implementing a regular schedule for full, incremental, and differential backups to balance between protection and resources.

Storage Solutions: Utilizing various storage solutions, including on-site servers and cloud storage, to protect against site-specific disasters.

Testing and Verification: Regularly testing backups to ensure data can be effectively restored and verifying backup integrity to confirm data is not corrupted.

55
Q

DATABASE RECOVERY PROCESS

A

Assessment: Quickly assess the extent of the data loss or corruption to determine the appropriate recovery method.

Choosing a Recovery Point: Decide on the most appropriate point to restore the data from, considering the data loss event and the state of available backups.

Preparation: Ensure the recovery environment is ready, which may include setting up hardware, configuring software, or preparing the network for data restoration.

Restoration: Execute the recovery process, which may involve restoring from a full backup, applying incremental backups, or using point-in-time recovery techniques.

Validation: After restoration, validate the integrity and completeness of the recovered data to ensure it meets operational requirements.

Review: Conduct a post-recovery review to understand the cause of the data loss, evaluate the effectiveness of the recovery process, and identify improvements for future recovery plans.

56
Q

PostgreSQL features

A

ACID Compliance: Ensures reliable transaction processing and maintains data integrity in complex transactions.

Extensive Data Type Support: Handles a wide array of data types including geometric, custom types, and JSON, facilitating flexible data storage and manipulation.

Advanced Replication: Built-in support for streaming replication and logical replication, allowing for high availability and flexible replication strategies.

Point-in-Time Recovery (PITR): Supports continuous archiving of transaction logs, enabling precise restoration of data to any point in time.

Robust Security Features: Offers strong security features like role-based access control, SSL-encrypted connections, and row-level security to protect sensitive data.

Flexible Backup Options: PostgreSQL offers a range of backup options including SQL dump, file system level backup, and continuous archiving, catering to different recovery point objectives (RPOs).

Efficient Disaster Recovery: The inclusion of features like WAL (Write-Ahead Logging - a standard method used in database systems to ensure data integrity) archiving and streaming replication facilitates efficient disaster recovery strategies, minimizing downtime.

Tooling and Extensions: A wealth of third-party tools and extensions like pgBackRest, Barman, and WAL-E enhance PostgreSQL’s native backup and recovery capabilities, providing automation, efficiency improvements, and added flexibility.

Performance and Scalability: Even during backup and recovery processes, PostgreSQL’s architecture and performance optimizations ensure minimal impact on operational efficiency.

57
Q

Planning and Strategy

A

Regular Backup Schedules: Automate backups to occur during off-peak hours to minimize performance impact. Decide on the frequency of full, incremental, and differential backups based on data change rate and recovery objectives.

Backup Types and Rotation: Use a combination of full, differential, and WAL backups. Implement a retention policy that aligns with business requirements and storage capacity.

Disaster Recovery Plan: Develop a comprehensive disaster recovery plan that includes backup, recovery procedures, roles, and responsibilities.

58
Q

Automation

A

Backup Automation: Utilize tools such as cron jobs, pgAdmin, or custom scripts to automate the backup process. Ensure notifications are in place for any failures or issues.

Automated Testing: Schedule regular automated recovery drills to validate backup integrity and the restoration process.

Monitoring and Alerts:
Monitoring Systems: Implement monitoring systems to track backup processes. Monitor disk usage, error logs, and the completion status of scheduled backups.

Alerting Mechanisms: Set up alerts for backup failures, storage capacity thresholds, and other critical events that might require immediate attention.

59
Q

Documentation and Training

A

Documentation: Maintain thorough documentation of the backup and recovery procedures, including configurations and step-by-step guides.

Training: Regularly train staff involved in backup and recovery processes to ensure they are familiar with procedures and best practices.

60
Q

Testing and Verification

A

Test Restores: Periodically perform test restores from backup to ensure that data can be recovered successfully and within the required time frame.

Data Integrity Checks: Perform checksums and data integrity checks post-recovery to ensure that the data is consistent and intact.

61
Q

Storage and Management

A

Offsite and Onsite Storage: Store backups both onsite for quick recovery and offsite to protect against local disasters.

Backup Storage Management: Monitor and manage storage space to ensure that backups do not fail due to insufficient disk space.