SQL Flashcards
JOIN
The JOIN clause is used to retrieve data from multiple tables based on a specified condition.
INNER JOIN
Retrieves records that have matching values in both tables.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
LEFT JOIN (or LEFT OUTER JOIN)
Retrieves all records from the left table and the matched records from the right table. If there is no match, the result is NULL on the side of the right table.
Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
RIGHT JOIN (or RIGHT OUTER JOIN)
Retrieves all records from the right table and the matched records from the left table. If there is no match, the result is NULL on the side of the left table.
Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
FULL JOIN (or FULL OUTER JOIN)
Retrieves records when there is a match in either left or right table. If there is no match, the result is NULL on the side where there is no match.
Syntax:
SELECT columns
FROM table1
FULL JOIN table2
ON table1.column = table2.column;
CROSS JOIN
Produces a Cartesian product of the two tables, meaning every combination of rows from both tables.
Syntax:
SELECT columns
FROM table1
CROSS JOIN table2;
ON:
The ON clause is used to specify the condition for the JOIN. It defines how rows from one table relate to rows in another table. The condition typically involves comparing columns from each table.
Syntax:
SELECT columns
FROM table1
JOIN table2
ON table1.column = table2.column;
Example:
SELECT students.ID, students.name, courses.title
FROM students
JOIN enrollments ON students.ID = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;
Physical data independence
Physical data independence is the ability of a database system to separate the conceptual schema (the logical structure of data) from the physical schema (the physical storage of data).
Importance of Physical Data Independence:
Flexibility: Database administrators can modify the underlying storage without disrupting user queries or requiring changes to applications.
Optimization means making a database work faster by improving how data is stored or accessed, without changing the basic structure of the data (the logical schema). This allows the database to run smoothly and more efficiently, without affecting how users or applications interact with it..
Cost-Effectiveness: By separating physical and logical layers, system upgrades can be performed without costly changes to the overall system.
Scalability: As data grows, the database system can alter storage strategies to handle larger volumes without affecting users.
logical schema
a data model that describes the structure and relationships of data in a database, and how it’s stored
relation instance
Represents the actual data (the contents or rows) currently stored in the table. A relation instance can change over time as rows are added, deleted, or updated.
Relation Schema:
Describes the structure or blueprint of the table (i.e., the table name, column names, and types). It defines what the table looks like, but doesn’t hold the actual data.
domain
In the context of databases, a domain refers to the set of allowable values that an attribute (or column) can take. It defines the data type and possibly any constraints or rules for what values are valid for a particular column.
The domain specifies the type of data that can be stored in a column, such as INTEGER, VARCHAR, DATE
data integrity
guarantees that the data remains trustworthy and free from corruption or errors.
foreign key
referencing table
primary key
referenced table
An atomic domain
refers to a domain (or set of allowable values for a column) where the values are indivisible or atomic, meaning each value in the column is a single, indivisible unit of data. An atomic domain means that each value in a column is indivisible, and no column can contain multiple values
Graph Database vs Relational Database
Graph Database and Relational Database are two different types of database models designed for different kinds of data storage and querying needs. Here’s a comparison between the two:
- Data Structure
Graph Database:
Data is stored in the form of nodes (represent entities like people, products, or locations) and edges (represent relationships between entities).
This is a flexible, highly connected data model where relationships are first-class citizens.
Common graph databases: Neo4j, Amazon Neptune, ArangoDB.
Relational Database:
Data is organized into tables (also called relations), with rows representing records and columns representing attributes.
Relationships between data are represented through foreign keys and joins.
Common relational databases: MySQL, PostgreSQL, Oracle.
- Relationships
Graph Database:
Relationships are explicitly stored as edges, making them easy to query. Traversing relationships (e.g., finding connected nodes) is very efficient.
Ideal for scenarios with complex, many-to-many relationships, like social networks, recommendation engines, and fraud detection.
Relational Database:
Relationships are implicit and must be created using foreign keys. Querying relationships often involves joins between tables, which can become expensive in terms of performance when dealing with large, complex datasets.
- Query Language
Graph Database:
Uses graph-specific query languages, such as Cypher (in Neo4j), Gremlin, or SPARQL.
These query languages are designed for traversing relationships, making it simple to perform recursive queries or find the shortest path between nodes.
Relational Database:
Uses SQL (Structured Query Language), a declarative language that is highly standardized.
SQL is powerful for handling structured data and is widely used in relational databases for querying, updating, and managing data.
Normalization
Normalization is a systematic process in database design used to organize data in a way that reduces redundancy and improves data integrity. It involves dividing a database into two or more related tables and defining relationships between them to ensure that data is stored efficiently. The primary goal of normalization is to minimize duplication of data, avoid update anomalies, and ensure data consistency.
Why NULL?
Data Completeness: Null values let the database store incomplete information without causing problems. Without them, you might need to put placeholder data like “N/A” or “Unknown,” which can make the data less accurate.
Flexibility: In real life, sometimes not all information is available or necessary. Null values give you the flexibility to leave some fields empty when that information isn’t needed yet, or it doesn’t apply to every record.
Integrity: Using null values helps prevent mistakes. If you’re forced to put random or incorrect data just to fill in a field, it can create confusion or errors. Null values ensure you only enter real data when it’s available.
Efficiency: Null values save space by not storing unnecessary or placeholder information. This makes the database more efficient, especially when certain fields don’t always need data.
Normalisation Forms
Normal Description
1NF :Atomic values, no repeating groups
2NF: 1NF + no partial dependencies
3NF :2NF + no transitive dependencies
BCNF: 3NF + every determinant is a candidate key
4NF: BCNF + no multi-valued dependencies
5NF: 4NF + every join dependency is implied by candidate keys