TB2 878u7 Flashcards
Normalisation
Normalisation is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.
1NF: A table is in 1NF if it contains no repeating groups or arrays, which means each cell contains a single value. Each record needs to be unique.
Key Points: Elimination of duplicate columns from the same table, creation of separate tables for each group of related data, identification of each record with a unique attribute or set of attributes known as the PKs.
2NF: A table is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the PK. This means each non-key attribute must be a fact about the key, the whole key, and nothing but the key.
Key Points: Removal of subsets of data that apply to multiple rows of a table and place them in separate tables, creation of relationships between these new tables and their predecessors through the use of FKs.
3NF: A table is in 3NF if it is in 2NF and there are no transitive dependencies of non-key attributes on the PK. That is, every non-key attribute is non-transitively dependent on the PK.
Key Points: Elimination of fields that do not depend on the PK; that is, any field that is functionally dependent on another field which is not a PK itself should be moved to a separate table.
TYPES OF KEYS
In the relational model, keys are important because they are used to ensure that each row in a table is uniquely identifiable. They are also used to establish relationships among tables and to ensure the integrity of the data. A key consists of one or more attributes that determine other attributes.
Foreign Key
Primary Key
Composite Key
Alternate Key
A key that contain all the properties needed to become a Candidate Key are known as Alternate Keys. An Alternate Key can be a set of a single attribute or multiple attributes.
It can be NULL as well but they must be UNIQUE. (e.g. email or phone number)
JOINs
In SQL, a JOIN is an operation that combines rows from two or more tables based on a related column between them (The PKs/FKs). This allows data from multiple tables to be integrated and queried as if it were in a single table.
INNER JOIN: Also known as a simple JOIN, it returns rows when there is a match in both tables.
LEFT JOIN: (or LEFT OUTER JOIN) - Returns all rows from the LEFT TABLE and the matched rows from the RIGHT TABLE. If no match is found, the result is NULL on the right side.
Right Join (or RIGHT OUTER JOIN): Returns all rows from the RIGHT TABLE and the matched rows from the LEFT TABLE. If no match is found, the result is NULL on the left side.
FULL JOIN (or FULL OUTER JOIN): Returns rows when there is a match in one of the tables. If there is no match, the result is NULL on either side.
SELF JOIN: A self join is a regular join, but the table is joined with itself.
CROSS JOIN: A cross join returns the CARTESIAN product of rows from both tables.
LEFT JOINs
A LEFT JOIN is a type of join that combines rows from two or more tables based on a related column between them. The key feature of a LEFT JOIN is that it includes all rows from the left table plus any matching rows from the right table.
If there are no matches in the right table, the result still includes the rows from the left table, with NULL in the columns from the right table.
RIGHT JOINs
A RIGHT JOIN is a type of join that combines rows from two or more tables based on a related column between them. The key feature of a RIGHT JOIN is that it includes all rows from the right table , plus any matching rows from the left table
If there are no matches in the left table, the result still includes the rows from the left table, with NULL in the columns from the right table.
FULL JOINs
A FULL JOIN is a type of join that combines rows from two or more tables based on a related column between them. The key feature of a FULL JOIN is that it includes all rows from both the left and the right tables. When there is a match in the related column between the two tables, it combines the matched rows into a single row.
If there is no match, the result will still show every row from both tables, but with NULL in the columns from the table that lacks a corresponding match.
Thus, a FULL JOIN provides a complete set of records from both tables, with matching records from both sides where available. For non-matching records from either table, the query fills in with NULLs to indicate the absence of a match. This type of join is useful when you need to understand the relationship between two tables, including the presence of unmatched records on either side.
VIEWs
A VIEW in databases is a virtual table created by a query that selects data from one or more underlying tables. It does not store data itself but presents data from these tables in a format specified by the query. Views can simplify complex queries, provide a specific data perspective or subset, and enhance data security by restricting access to certain data.
Views adv
Advantages of Using VIEWs:
Data Abstraction: Views can simplify the complexity of data by providing a specific, focused perspective of the data, which is especially useful when the underlying data structures are complex with multiple tables and joins.
Security: Views can act as a layer of security to restrict user access to specific rows and columns of data, ensuring users only see the data they are authorized to access.
Query Simplification: Views allow users to save complex queries as virtual tables, so they can be reused without needing to rewrite the query. This can make frequent complex queries much more manageable.
Views Disadv
Disadvantages of Using VIEWs:
Performance: Views can sometimes lead to performance issues, especially when they are built on top of other views or involve complex calculations, as the database must execute the underlying queries each time the view is accessed.
Update Restrictions: Some views are not updatable or insertable, particularly those that contain aggregations, distinct clauses, or joins across multiple tables, which can limit their use in dynamic data environments.
Maintenance: If the underlying table structures change, the views may need to be updated or recreated. This can lead to additional maintenance overhead, especially if there are many views or if they are used in multiple applications.
INDEX-es
An INDEX in a database is a data structure that improves the speed of data retrieval operations on a table. It works similarly to an index in a book: just as a book index allows you to quickly find specific information without reading the entire book, a database index allows the database engine to find and retrieve specific rows much faster than it could by scanning the entire table.
Indexes are typically used on columns that are frequently searched or used as join keys. However, while they speed up data retrieval, they can slow down data insertion, deletion, and updating, as the index must be updated whenever the data it indexes is altered.
Advantages of INDEX-es:
Improved Query Performance: Indexes provide a way to quickly locate and retrieve specific rows from a table, significantly speeding up the execution of SELECT statements that involve filtered or sorted data.
Faster Joins: When multiple tables are joined in a query, indexes on the join columns help optimize the join operation, reducing the time needed to combine data from different tables.
Efficient Data Retrieval: Indexes allow for efficient data retrieval without the need to scan the entire table, making them particularly useful for large datasets and complex queries.
Facilitates Unique Constraints: INDEX-es can enforce the uniqueness of values in one or more columns, ensuring that duplicate data is not allowed in those columns. This is especially useful for maintaining data integrity and preventing data duplication in database tables. (e.g. CREATE UNIQUE INDEX index_name ON table_name (column1, column2);)
Disadvantages of INDEX-es
Increased Storage Overhead: Indexes require additional storage space to store the index data structure. This can be a concern for large tables with multiple indexes, as it can consume a significant amount of disk space.
Slower Data Modification: While indexes improve data retrieval speed, they can slow down INSERT, UPDATE, and DELETE operations, as the index structures need to be maintained whenever the underlying data changes.
Maintenance Overhead: Indexes need to be maintained, which means that as data is modified, indexes must be updated to reflect these changes. This maintenance overhead can impact system performance and requires careful management.
Index Selection: Choosing the right columns to index is crucial. Creating too many indexes or indexing the wrong columns can lead to unnecessary overhead and may not necessarily improve query performance. Proper index design and analysis are essential to benefit from indexes effectively.
SUBQUERIES (NESTED QUERIES)
A subquery, also known as a nested query, is a SQL query that is embedded within another SQL query. It is used to retrieve data from one or more tables and is enclosed within parentheses and it is used to perform operations that require multiple steps of logic in a database. The result of the subquery can be used as a part of the main query to filter, manipulate, or make decisions about the data being retrieved.
Advantages of Nested Queries:
Complexity Management: They help in breaking down complex problems into simpler, manageable parts.
Reusability: Subqueries can be reused in different parts of the outer query to avoid redundancy.
Logical Grouping: They allow for logical grouping of data retrieval in a single query, making it easier to understand the relationship between different data sets.
Performance Issues: Nested queries can lead to performance degradation, especially if not optimized properly, as they may require multiple scans of the same table.
Readability: Complex nested queries can become difficult to read and maintain, especially for those not familiar with the database schema.
Limited Flexibility: In some cases, subqueries are less flexible than JOIN operations, particularly when working with multiple tables.
SUBQUERIES vs JOINS
A subquery is a query nested inside another query, used to perform operations that require multiple steps of logic.
.A subquery is ideal for operations that require a single value returned, like in WHERE clauses or select lists.
.Useful for conditions that depend on the result of another query.
.Better for readability when performing complex queries that require a step-by-step approach.
Simplifies the SQL logic by breaking down complex conditions into manageable parts.
.Can be slower and less efficient than JOINs, especially if the subquery is executed multiple times (e.g., in a correlated subquery).
.Optimization by the SQL engine varies; performance can sometimes be improved by rewriting as JOINs.
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
.Best suited for retrieving data from multiple tables where the tables are directly related.
.Efficient for operations that require comparing large datasets.
.Generally faster for straightforward data retrieval from multiple tables.
.Allows for more dynamic and complex interactions between multiple tables.
.Typically more efficient for operations involving multiple tables.
.Better optimized by SQL engines, but can become complex and harder to read with multiple JOIN operations.