Database Flashcards
SQL Pipeline from query submit to result returned
[User] | Query | [Query Parser] - Parser, Preprocessor/Validation, Optimizer | Query Plan | [Execution Engine] | Result
ACID Properties of Database System
- Atomicity
- All or nothing
- Consistency
- Strong (always consistent)
- Eventual (May have stale data for short-while)
- Isolation
- each Transaction is executed in isolation (like in its own VM)
- Durability
- Result/effect of transaction, if completed, is permanent
Data Model (or java Class, or Data Structure)
Data Model (is similar to a Java Class) that it describes a “Table”
1) Structure of Data/Table (what fields/data members)
2) Constraints on Data (value type/range/format of each filed)
3) Operations on Data (what data operations insert/join/append/getter/setter)
Object instances to Java class is Tables to DBMS Relational Data Models
Inside Each Table, then the “Records” can be represented by Object instances. That is a DBMS Relational Data Models stores objects
General Data Model vs DBMS Data Model
General Data Model is like a java class, that it extends fro Java Object class.
The java Object class does NOT describe relationships between objects. Thus overall in general, java objects do not carry Relationship with other objects.
DBMS Relational Data Model (A “Table” class is a like a specialized java class, that all Data Models in DBMS extends from.
Each DBMS Relational Data Model Instance defines a “Table” or “Relation”
DBMS Relational Data Model class defines these members and behaviors:
1) Schema (attributes, and validators0
2) Tuples (records of the table)
3) Getter and Setter (of tuples)
// standard ops on Relational Data Model, such as // join, select, ...
// Schema of the Model can be used to generate attributes // and their validators.
class DBMS_Relational_DataModel { public relation_name; public List attributes; public List attributes_types; public String primary_key_column; public String Schema
public Set tuples; //records of the table DBMS_Relational_DataModel join(DBMS_Relational_DataModel that) { return new DBMS_Relational_DataModel( this.relation_name.join(that.relation_name, "\_\_"), this.attributes.append(that.attributes), this.attributes_types.append(that.attributes_types) ) } };
// Car table inherits ops such as join, select, …
Class Car extends DBMS_Relational_DataModel {
/* Creating a Car instance == Instantiate a Table /
Car() {
DBMS_Relational_DataModel(
“Cars”,
{“Index”, “Model”, “Make”, “MFR”, “Year”, “MRP”}
{Integer.class , String.class, String.class, String.class, Date.class, Integer.class}
),
“Index” / as primary key */
};
}
Representation of a DBMS Data Model Instance is
R( such as
R(“Index”, “Model”, “Make”,…“MRP”);
Relation, Instance, Table, Record, Keys
A Table is a Relation (among its attributes)
An Record/tuple of the table is an Instance of Relation
The set of attributes that uniquely identify a Record/Instance is “Keys Of Relationship”
(I.e. the constraints on the values of the “Keys” attributes is that they cannot have same value between two records/instances/tuples)
3 Data Representations in SQL
Table, View and (Transient Temporary Tables)
- Stored Relations (Tables)
- Views (Seen by clients, Computation Results, Temporary Tables in memory to present computation results) E.g. the “Join” view of two Tables.
- Temporary Tables (not seen by clients…intermediate tables created by DBMS implementation)
SQL user only can create TABLE or VIEW
CREATE TABLE employee …
or
CREATE VIEW employee2 AS SELECT ….
Why SQL is a high-level language
In DBMS impl, there is the class DBMS_Relational_DataModel, and can be used to instantiate each DBMS Relation class/Table. But from user’s perspective, we are not using java or C++ code to instantiate DBMS Relation Data Model instance. Instead, we use high-level language like SQL to “instantiate” a Table, and manipulate. This way it also does not matter if backend is implemented with JAVA or C or NodeJS etc.
E..g
SQL> CREATE table “Movies” (
“name”, CHAR(100),
“length”, INT)
)
will trigger DBMS to instance a new class
new DBMS_Relational_DataModel( "Movies", {"Index", "name", "length"} {Integer.class, String.class , Integer.class} ),
Algebra vs (Relational Algebra) vs SQL
Algi-brei-ik Query Language is the center of SQL
SQL built on Relational Algebra
**Algebra: Relationship between variables x, y z such as x^2 + y^2 = z^2
**Relational Algebra: Relationship between Variables that are Sets/Relation/tables (operator Overloading)
===== SET OPERATIONS====(union/intersection/difference)
“+” in Algebra means “join” In Relational Algebra
**SQL: High-level language that express Relational Algebra
Instead of saying X+Y we say X OUTTER JOIN Y
Relational Algebra = Set/Select/Project Operations
UNION = Set operations SELECT = Projection WHERE = Selection AS = Rename
Relational Algebra between two Table Variables R, S,
1) union (R u S) , intersection (R n S), difference (R - S), of tuples of Relation R, S
(‘UNION’ is not JOIN)
2) Selection (row elimination) return a new Relation with selected all cols and selected rows
3) Projection (col elimination)
Return a new Relation with all rows and selected cols (new duplicate tuples may result)
4) Renaming (meta data change only)
Statement:
SELECT last, first, address FROM employee AS ‘short_form’ WHERE last == ‘john’
SELECT
– ProjectionWHERE
– SelectionAS
– Rename
Relational Algebra = Set/Select/Project Operations
UNION | EXCEPT = Set operations (union, diff)
SELECT = Projection
WHERE = Selection
AS = Rename
Relational Algebra between two Table Variables R, S,
1) union (R u S) , intersection (R n S), difference (R - S), of tuples of Relation R, S
(‘UNION’ is not JOIN)
2) Selection (row elimination) return a new Relation with selected all cols and selected rows
3) Projection (col elimination)
Return a new Relation with all rows and selected cols (new duplicate tuples may result)
4) Renaming (meta data change only)
Statement:
SELECT last, first, address FROM employee AS ‘short_form’ WHERE last == ‘john’
SELECT
– ProjectionWHERE
– SelectionAS
– Rename
Cartesian Product (Cross Product)
Ka-‘ti-sian Product between R, S
R: m cols, u rows
S: n cols, v rows
R, S shares at least 1 column, let’s say it is B
Cartesian(R, S) on B = (m+n) cols, u * v rows
for every row/tuple in R, append each row from S to form a new tuple
r0 r0.B, s0.B, s0
r0 r0.B, s1.B, s1
…
r0 r0.B, svB, sv
r1 r1.B, s0B, s0
r1 r1.B, s1B, s1
…
r1 r1.B, svB, sv
... ru ru.B, s0B, s0 ru ru.B, s1B, s1 ... ru ru.B, svB, sv
Then different JOINs apply SELECTION and PROJECTION on the Cartesian (R, S) result
Natural Join R, S on attributes {A1, A2…}
Apply __________ on Cartesian Product
Apply ___SELECTION____ on Cartesian Product where WHERE condition
T = Cartesian Product (R, S)
T1 = SELEC * FROM T where R.A1 == S.A1 AND R.A2 == S.A2 AND R.A3 == S.A3
Theta Join R, S on any Condition C
Apply __________ on Cartesian Product
Apply ___SELECTION____ on Cartesian Product with WHERE condition
T = Cartesian Product (R, S)
SELEC * FROM T WHERE C
SQL Query–> Albegra Relation Operators:
Selection, Projection, Rename,
Union, Intersection, Difference
Join
SQL: 1 statement => Multiple Operations
SQL parse statement into primitive operations (Query Plan), then submit for Optimized Execution.
SELECT titles, years FROM movies AS fox_movies
where
length > 100
AND studioName == “Fox”
- T = Cartesian Product (R, S) for tables in FROM
- T1 = SELECT * FROM movies WHERE length > 100
- T2 = SELECT * FROM movies WHERE studioName==”fox”
return Project (titles, years) from (T1 n T2)
As such, the query is broken into
1) Cartesian product (FROM)
2) Selection (WHERE)
3) Intersection/Union/Difference (AND|OR)
4) Projection (SELECT)
5) Rename.(AS)
Map SQL to Operations SELECT-->? FROM --> ? WHERE (AND|OR) --> ? AS --> ?
SELECT–> Projection
FROM –> Cartesian Product
WHERE (AND|OR) –> Selection|Intersection/Union/Diff
AS –> Rename