Exam Flashcards
What is a Data Model?
1. Mathematical representation of data. Examples: relational model = tables; semistructured model = trees/graphs. 2. Operations on data. 3. Constraints.
What is a Relation?
A Relation is a table with Attributes(column headers), Tuples(rows) and Relation name.
Relation Schema, Database Schema, Database
Relation schema = relation name and attribute list. Optionally: types of attributes. Database schema = set of all relation schemes in the database. Database = collection of relations.
A Key in Relations
Key = tuples cannot have
the same value in all key attributes.
Creating a Relation in SQL
Simplest form is: CREATE TABLE (
);
To delete a relation:
DROP TABLE ;
Elements of Table Declarations
Most basic element: an attribute and its
type.
The most common types are:
INT or INTEGER (synonyms).
REAL or FLOAT (synonyms).
CHAR(n) = fixed-length string of n characters.
VARCHAR(n) = variable-length string of up to n characters.
Declaring Keys
An attribute or list of attributes may be
declared PRIMARY KEY or UNIQUE.
Either says that no two tuples of the
relation may agree in all the attribute(s)
on the list.
Declaring Single-Attribute Keys
Place PRIMARY KEY or UNIQUE after the
type in the declaration of the attribute.
Declaring Multi-attribute Keys
A key declaration can also be another
element in the list of elements of a CREATE TABLE statement.
This form is essential if the key consists
of more than one attribute.
May be used even for one-attribute keys.
PRIMARY KEY vs. UNIQUE
- There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes.
- No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL.
Graphs of Semi-structured Data
- Nodes = objects.
- Arc labels (properties of objects).
- Atomic values at leaf nodes (nodes with no arcs out).
- Flexibility: no restriction on:
Labels out of a node.
Number of successors with a given label.
What is an “Algebra”
Mathematical system consisting of: Operands --- variables or values from which new values can be constructed. Operators --- symbols denoting procedures that construct new values from given values.
What is Relational Algebra?
An algebra whose operands are relations or variables that represent relations. Operators are designed to do the most common things that we need to do with relations in a database. The result is an algebra that can be used as a query language for relations.
Core Relational Algebra
Union, intersection, and difference. Usual set operations, but both operands must have the same relation schema. Selection: picking certain rows. Projection: picking certain columns. Products and joins: compositions of relations. Renaming of relations and attributes.
Selection in RA
R1 := σC (R2)
1. C is a condition (as in “if” statements) that
refers to attributes of R2.
2. R1 is all those tuples of R2 that satisfy C.
Projection in RA
R1 := πL (R2)
- L is a list of attributes from the schema of R2.
- R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1.
- Eliminate duplicate tuples, if any.
Extended Projection in RA
Using the same πL operator, we allow the list L to contain arbitrary expressions involving attributes:
- Arithmetic on attributes, e.g., A+B->C.
- Duplicate occurrences of the same attribute.
Product in RA
R3 := R1 Χ R2
Pair each tuple t1 of R1 with each tuple t2 of R2.
Concatenation t1t2 is a tuple of R3.
Schema of R3 is the attributes of R1 and then R2, in
order.
But beware attribute A of the same name in R1 and R2:
use R1.A and R2.A.
Theta-Join in RA
R3 := R1 XC R2
Take the product R1 Χ R2.
Then apply σC to the result.
As for σ, C can be any boolean-valued condition.
Natural Join in RA
A useful join variant (natural join) connects two relations by:
Equating attributes of the same name
Projecting out one copy of each pair of
equated attributes
Renaming in RA
The ρ operator gives a new schema to a
relation.
R1 := ρR1(A1,…,An)(R2) makes R1 be a relation with
attributes A1,…,An and the same tuples as R2.
Precedence of relational operators
- [σ,π,ρ] (highest).
- [Χ,(Theta-Join)].
- ∩.
- [∪,—]
Define a Bag
A bag (or multiset ) is like a set, but an element may appear more than once.
Operations on Bags
Selection applies to each tuple, so its effect on bags is like its effect on sets. Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates. Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.
Bag Union
An element appears in the union of two
bags the sum of the number of times it
appears in each bag.
Bag Intersection
An element appears in the intersection
of two bags the minimum of the
number of times it appears in either
Bag Difference
An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B. But never less than 0 times
Bag Laws != Set Laws
The commutative law for
union (R∪S =S∪R) DOES hold for bags.
Set union is idempotent, meaning that S∪S = S.
However, for bags, if x appears n times in S, then it appears 2n times in S∪S.
Select-From-Where Statements
SELECT desired attributes
FROM one or more tables
WHERE condition about tuples of
the tables
Meaning of Single-Relation Query
Begin with the relation in the FROM clause.
Apply the selection indicated by the WHERE clause.
Apply the extended projection indicated by the SELECT clause.
Renaming Attributes
If you want the result to have different attribute names, use “AS ” to rename an attribute.
Patterns in SQL
A condition can compare a string to a pattern by:
LIKE or NOT LIKE
Pattern is a quoted string with % = “any string”; _ = “any character.”
Comparing NULL’s to Values
The logic of conditions in SQL is really 3-valued logic: TRUE, FALSE, UNKNOWN.
Comparing any value (including NULL itself) with NULL yields UNKNOWN.
A tuple is in a query answer iff the WHERE clause is TRUE (not FALSE or UNKNOWN).
Three-Valued Logic
To understand how AND, OR, and NOT work in 3-valued logic, think of TRUE = 1, FALSE = 0, and UNKNOWN = ½.
AND = MIN; OR = MAX, NOT(x) = 1-x.
Subqueries
A parenthesized SELECT-FROM-WHERE statement (subquery ) can be used as a value in a number of places, including FROM and WHERE clauses
The IN Operator
IN () is true if and only if the tuple is a member of the relation produced by the subquery.
Opposite: NOT IN ().
IN-expressions can appear in WHERE clauses.
The Exists Operator
EXISTS() is true if and only if the subquery result is not empty.
The Operator ANY`
x = ANY() is a boolean condition that is true iff x equals at least one tuple in the subquery result.
= could be any comparison operator.
The Operator ALL
x <> ALL() is true iff for every tuple t in the relation, x is not equal to t.
That is, x is not in the subquery result.
<> can be any comparison operator.
Union, Intersection, and Difference
Union, intersection, and difference of relations are expressed by the following forms, each involving subqueries:
() UNION ()
() INTERSECT ()
() EXCEPT ()
Controlling Duplicate Elimination
Force the result to be a set by SELECT DISTINCT . . .
Force the result to be a bag (i.e., don’t eliminate duplicates) by ALL, as in . . . UNION ALL . . .
Products and Natural Joins in SQL
Natural join:
R NATURAL JOIN S;
Product:
R CROSS JOIN S;
Theta Join in SQL
R JOIN S ON
Duplicate Elimination in Ext. RA
R1 := δ(R2).
R1 consists of one copy of each tuple that appears in R2 one or more times.
Sorting in Ext. RA
R1 := τL (R2).
L is a list of some of the attributes of R2.
R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on.
Break ties arbitrarily.
τ is the only operator whose result is neither a set nor a bag.
Grouping Operator in Ext. RA
R1 := γL (R2). L is a list of elements that are either:
1. Individual (grouping ) attributes.
2. AGG(A ), where AGG is one of the aggregation
operators and A is an attribute.
An arrow and a new attribute name renames the component.
Group R according to all the grouping attributes on list L.
Within each group, compute AGG(A ) for each aggregation on list L.
Outerjoin in Ext. RA
Suppose we join R ⋈c S.
A tuple of R that has no tuple of S with which it joins is said to be dangling.
Similarly for a tuple of S.
Outerjoin preserves dangling tuples by padding them NULL.
It is modified by:
Optional NATURAL in front of OUTER.
Optional ON after JOIN.
Optional LEFT, RIGHT, or FULL before OUTER.
Aggregations in SQL
SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column.
Grouping in SQL
The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group
HAVING Clauses
HAVING may follow a GROUP BY clause.
If so, the condition applies to each group, and groups not satisfying the condition are eliminated
Kinds of Database Modifications
Three kinds of modifications:
Insert a tuple or tuples.
Delete a tuple or tuples.
Update the value(s) of an existing tuple or tuples
Insertion in SQL
To insert a single tuple:
INSERT INTO
VALUES ( );
We may add to the relation name a list of attributes
Adding Default Values
In a CREATE TABLE statement, we can follow an attribute by DEFAULT and a value.
When an inserted tuple has no value for that attribute, the default will be used.
We may insert the entire result of a query into a relation, using the form:
INSERT INTO
( );
Deletion
To delete tuples satisfying a condition from some relation:
DELETE FROM
WHERE ;
Updates
To change certain attributes in certain tuples of a relation:
UPDATE
SET
WHERE ;
Constraints and Triggers
A constraint is a relationship among data elements that the DBMS is required to enforce.
Triggers are only executed when a specified condition occurs, e.g., insertion of a tuple.
Kinds of Constraints
Keys. Foreign-key, or referential-integrity. Value-based constraints. Tuple-based constraints. Assertions: any SQL boolean expression
Foreign Keys
Values appearing in attributes of one relation must appear together in certain attributes of another relation.
Use keyword REFERENCES, either:
After an attribute (for one-attribute keys).
As an element of the schema:
FOREIGN KEY ()
REFERENCES ()
Referenced attributes must be declared PRIMARY KEY or UNIQUE.
Enforcing Foreign-Key Constraints
If there is a foreign-key constraint from relation R to relation S, two violations are possible:
An insert or update to R introduces values not found
in S.
A deletion or update to S causes some tuples of R to
“dangle.”
A deletion or update to Beers that removes a beer value found in some tuples of Sells can be handled in three ways. List them
Default : Reject the modification.
Cascade : Make the same changes in Sells.
Deleted beer: delete Sells tuple.
Updated beer: change value in Sells.
Set NULL : Change the beer to NULL
When we declare a foreign key, we may choose policies SET NULL or CASCADE independently for deletions and updates.
Follow the foreign-key declaration by:
ON [UPDATE, DELETE][SET NULL CASCADE]
Attribute-Based Checks
Constraints on the value of a particular attribute.
Add CHECK() to the declaration for the attribute.
The condition may use the name of the attribute, but any other relation or attribute name must be in a subquery.
Attribute-based checks are performed only when a value for that attribute is inserted or updated
Tuple-Based Checks
CHECK () may be added as a relation-schema element.
The condition may refer to any attribute of the relation.
But other attributes or relations require a subquery.
Checked on insert or update only
Assertions
These are database-schema elements, like relations or views.
Defined by:
CREATE ASSERTION
CHECK ();
Condition may refer to any relation or attribute in the database schema.
In principle, we must check every assertion after every modification to any relation of the database.
A clever system can observe that only certain changes could cause a given assertion to be violated.
Event-Condition-Action Rules (Triggers)
Event : typically a type of database modification, e.g., “insert on Sells.”
Condition : Any SQL boolean-valued expression.
Action : Any SQL statements.
CREATE TRIGGER
CREATE TRIGGER
Or:
CREATE OR REPLACE TRIGGER
Trigger Event
AFTER can be BEFORE.
Also, INSTEAD OF, if the relation is a view.
INSERT can be DELETE or UPDATE.
And UPDATE can be UPDATE . . . ON a particular
attribute.
Trigger FOR EACH ROW
Triggers are either “row-level” or “statement-level.”
FOR EACH ROW indicates row-level; its absence indicates statement-level.
Row level triggers : execute once for each modified tuple.
Statement-level triggers : execute once for a SQL statement, regardless of how many tuples are modified
Trigger REFERENCING
INSERT statements imply a new tuple (for row-level) or new table (for statement-level).
The “table” is the set of inserted tuples.
DELETE implies an old tuple or table.
UPDATE implies both.
Refer to these by
[NEW OLD][TUPLE TABLE] AS
Triggering The Condition
Any boolean-valued condition.
Evaluated on the database as it would exist before or after the triggering event, depending on whether BEFORE or AFTER is used.
Access the new/old tuple/table through the names in the REFERENCING clause
Triggering The Action
There can be more than one SQL statement in the action.
Surround by BEGIN . . . END if there is more than one
Transactions
Transaction = process involving database queries and/or modification.
Normally with some strong properties regarding concurrency.
Formed in SQL from single statements or explicit programmer control.
ACID Transactions
ACID transactions are:
Atomic : Whole transaction or none is done.
Consistent : Database constraints preserved.
Isolated : It appears to the user as if only one process executes at a time.
Durable : Effects of a process survive a crash
COMMIT
The SQL statement COMMIT causes a transaction to complete.
It’s database modifications are now permanent in the database.
ROLLBACK
The SQL statement ROLLBACK also causes the transaction to end, but by aborting.
No effects on the database.
Failures like division by 0 or a constraint violation can also cause rollback, even if the programmer does not request it.
Isolation Levels
SQL defines four isolation levels = choices about what interactions are allowed by transactions that execute at about the same time.
Choosing the Isolation Level
Within a transaction, we can say: SET TRANSACTION ISOLATION LEVEL X where X = SERIALIZABLE REPEATABLE READ READ COMMITTED READ UNCOMMITTED
Serializable Transactions
If Sally = (max)(min) and Joe = (del)(ins) are each transactions, and Sally runs with isolation level SERIALIZABLE, then she will see the database either before or after Joe runs, but not in the middle
Read-Commited Transactions
If Sally runs with isolation level READ COMMITTED, then she can see only committed data, but not necessarily the same data each time
Repeatable-Read Transactions
Requirement is like read-committed, plus: if data is read again, then everything seen the first time will be seen the second time. But the second and subsequent reads may see more tuples as well.
Read Uncommitted
A transaction running under READ UNCOMMITTED can see data in the database, even if it was written by a transaction that has not committed (and may never).
Views
A view is a relation defined in terms of stored tables (called base tables ) and other views.
Two kinds:
Virtual = not stored in the database; just a query for
constructing the relation.
Materialized = actually constructed and stored.
Declaring Views
Declare by:
CREATE [MATERIALIZED] VIEW AS ;
Default is virtual.
Indexes
Index = data structure used to speed access to tuples of a relation, given values of one or more attributes.
Could be a hash table, but in a DBMS it is always a balanced search tree with giant nodes (a full disk page) called a B-tree.
Declaring Indexes
Typical syntax:
CREATE INDEX BeerInd ON Beers(manf);
CREATE INDEX SellInd ON Sells(bar, beer);
Entity Sets
Entity = “thing” or object.
Entity set = collection of similar entities.
Attribute = property of (the entities of) an
entity set.
Relationships in ER
A relationship connects two or more
entity sets.
It is represented by a diamond, with
lines to each of the entity sets involved
Relationship Set
The current “value” of an entity set is the set of entities that belong to it. The “value” of a relationship is a relationship set, a set of tuples with one component for each related entity set
Many-Many Relationships
In a many-many relationship, an entity
of either set can be connected to many
entities of the other set
Many-One Relationships
Some binary relationships are many-one from one entity set to another.
Each entity of the first set is connected to at most one entity of the second set.
But an entity of the second set can be connected to zero, one, or many entities of the first set
One-One Relationships
In a one-one relationship, each entity of
either entity set is related to at most one
entity of the other set
Representing “Multiplicity”
Show a many-one relationship by an arrow entering the “one” side. Show a one-one relationship by arrows entering both entity sets. Rounded arrow = “exactly one,” i.e., each entity of the first set is related to exactly one entity of the target set.
Attributes on Relationships
Sometimes it is useful to attach an
attribute to a relationship.
Think of this attribute as a property of
tuples in the relationship set.
Roles in ER
Sometimes an entity set appears more than once in a relationship. Label the edges between the relationship and the entity set with names called roles
Subclasses
Subclass = special case = fewer entities = more properties. Assume subclasses form a tree (no multiple inheritance). Isa triangles indicate the subclass relationship (Point to the superclass). E/R entities have representatives in all subclasses to which they belong
Keys
A key is a set of attributes for one
entity set such that no two entities in
this set agree on all the attributes of
the key. It is allowed for two entities to agree on
some, but not all, of the key attributes.
We must designate a key for every
entity set
Weak Entity Sets
Entity set E is said to be weak if in order to identify entities of E uniquely, we need to follow one or more many-one relationships from E and include the key of the related entities from the connected entity sets.
A weak entity set has one or more many-one relationships to other (supporting) entity sets
Decomposition
d={R1,…,Rk} decomposition, if R1U…URk=R
Functional Dependencies
X ->Y is an assertion about a relation R that
whenever two tuples of R agree on all the
attributes of X, then they must also agree on all
attributes in set Y
Keys of Relations FD
K is a superkey for relation R if K functionally determines all of R.
K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Armstrong axioms
A1 (reflexivity or trivial fd): if Y subset of X then X->Y.
A2 (augmentation): if X->Y then XZ->YZ.
A3 (tranzitivity): if X->Y and Y->Z then X->Z
More rules about functional dependencies
Splitting (decomposition) rule X->Y and Z->Y then X->Z. Combining (union) rule X->Y and X->Z then X->YZ. Pseudotranzitivity X->Y and WY->Z then XW->Z.
Closure of set of attributes
X+(F):={A | F|-X->A}
The closure of X under the FD’s in S is the set of
attributes A such that every relation that satisfies
all the FD’s in set S also satisfies XA, that is
XA follows from the FD’s of S.