ch8 Flashcards
Explain the unary relational operations: SELECT and PROJECT.
SELECT:
- ~~row op: way of putting constraints to get a subset of the tuples you want from a relation in the result
- ~~filter
- ~~restrict
- ~~horizontal partition (tuples that are part of result and tuples that are not)
- sigma_{selection condition}(relation)
PROJECT:
- ~~column op: way of selecting attributes from a relation that you want in a result
- ~~vertical partition (attributes that are part of the result and attributes that are not)
- pi_{attr list}(relation)
Given two union-compatible relations R and S, define the three operations UNION, INTERSECTION, and SET DIFFERENCE on them
UNION:
* R UNION S is the set of tuples that are in R or are in S or are in R and in S
INTERSECTION:
* R INTERSECTION S is the set of tuples that are in R and in S
SET DIFFERENCE:
* R SET DIFFERENCE S is the set of tuples that are in R but not in S
Explain the CARTESIAN PRODUCT operation. What is the degree and cardinality of the resultant relation? The CARTESTIAN PRODUCT operation by itself is generally meaningless; explain how it can be made meaningful.
CARTESTIAN PRODUCT
AKA CROSS PRODUCT
AKA CROSS JOIN:
* R X S
* a cartesian product gives you all possible combinations of tuples from both relations
* degree of resulting relation is n+m where n is the number of attributes in R and m is the number of attributes in S
* cardinality of resulting relation is n_R*n_S where n is the number of tuples from R and n_S is the number of tuples in S
- the CARTESIAN PRODUCT can be useful if its used in combination with a SELECT operation
- this is so common and useful in DB systems that they turned the CARTESIAN PRODUCT, then SELECT operation into its own thing: a JOIN
example of CARTESIAN PRODUCT: USER: email,firstname,favoritecolor hmiller40,hope,blue dlalich90,dan,blue morleyt,Morley,grey
FOOTBALLTEAM: city,mascot,conference philadelphia,eagles,NFC cleveland,browns,AFC kansas city,chiefs,AFC green bay, packers,NFC
USER X FOOTBALLTEAM
email,firstname,favoritecolor,city,mascot,conference
hmiller40,hope,blue,philadelphia,eagles,NFC
hmiller40,hope,blue,cleveland,browns,AFC
hmiller40,hope,blue,kansas city,chiefs,AFC
hmiller40,hope,blue,green bay,packers,NFC
dlalich90,dan,blue,philadelphia,eagles,NFC
dlalich90,dan,blue,cleveland,browns,AFC
dlalich90,dan,blue,kansas city,chiefs,AFC
dlalich90,dan,blue,green bay,packers,NFC
morleyt,Morley,gray,philadelphia,eagles,NFC
morleyt,Morley,gray,cleveland,browns,AFC
morleyt,Morley,gray,kansas city,chiefs,AFC
morleyt,Morley,gray,green bay,packers,NFC
degree = n + m = 3 + 3 = 6 cardinality = n_R * n_S = 3 * 4 = 12
Discuss the various types of inner join operations. Why is theta join required?
JOIN in general: a CROSS PRODUCT followed by a SELECTION
inner joins: joins where tuples that are NULL or do not satisfy the join condition are eliminated from the result
outer joins: joins where all of the tuples from one or both relations are included in the results regardless of whether or not they satisfy the join condition; pad the mismatches with NULLs as needed
THETA JOIN: a type of inner join with a general condition like R \bowtie_{\theta} S where theta is the join condition
- tuples that satisfy the join condition are included in the result
- tuples that do not satisfy the join condition or are null are not included in the result
EQUIJOIN: a type of inner join where the only operator used in the join condition (theta) is equals
NATURAL JOIN *: a type of inner join; an EQUIJOIN that then removes one set of the duplicate attributes from one of the relations in the result; requires that the attributes in the join condition have the same name in both relations; join condition is constructed by equating each pair of join attributes that have the same name in both relations and combining these conditions with AND
LEFT OUTER JOIN: include all of the tuples from the left relation in the result regardless of whether or not they satisfy the join condition
RIGHT OUTER JOIN: include all of the tuples from the right relation in the result regardless of whether or not they satisfy the join condition
FULL OUTER JOIN: preserve all of the tuples from the left and right relations in the result regardless of whether or not they satisfy the JOIN condition
What is the difference between EQUIJOIN and NATURAL JOIN operation? Explain with the help of an example.
an EQUIJOIN (R \bowtie_{\theta}) is a type of inner join where the only operator used in the join condition (\theta) is the equals sign
a NATURAL JOIN (R * S) is a type of inner join and it’s basically just an EQUIJOIN with one set of the superfluous attributes removed
ex: USER name,email Hope,hmiller40 Dan,dlalich90 Morley,morleyt
INTEREST user_email,interest hmiller40,running hmiller40,cats hmiller40,football dlalich90,football dlalich90,space dlalich90,video games morleyt,sleeping morleyt,eating morleyt,getting on the counter
EQUIJOIN: USER \bowtie_{email=user_email} INTEREST name,email,user_email,interest Hope,hmiller40,hmiller40,running Hope,hmiller40,hmiller40,cats Hope,hmiller40,hmiller40,football Dan,dlalich90,dlalich90,football Dan,dlalich90,dlalich90,space Dan,dlalich90,dlalich90,video games Morley,morleyt,morleyt,sleeping Morley,morleyt,morleyt,eating Morley,morleyt,morleyt,getting on the counter
NATURAL JOIN: **IF INTEREST.user_email were just called "email" instead** USER * INTEREST name,email,interest Hope,hmiller40,running Hope,hmiller40,cats Hope,hmiller40,football Dan,dlalich90,football Dan,dlalich90,space Dan,dlalich90,video games Morley,morleyt,sleeping Morley,morleyt,eating Morley,morleyt,getting on the counter
What is the FUNCTION operation? For what is it used?
not really sure where this came from… assuming “FUNCTION” == “AGGREGATE FUNCTION”…
AGGREGATE FUNCTION is used to group tuples in a relation by applying an aggregate function independently to each group.
examples of functions: COUNT, AVERAGE, MIN, MAX, SUM
How are the OUTER JOIN operations different from the INNER JOIN operations? How is the OUTER UNION operation different from UNION?
in INNER JOINs: tuples that are NULL or do not satisfy the join condition are not included in the result
in OUTER JOINs: all tuples from one or both relations are included in the result regardless of whether they satisfy the join condition or not
in UNIONs (ex: R U S): only tuples that are in both R and S are included in the result
in OUTER UNIONs: takes the unions of two relations R(X,Y) and S(X,Z) that are only partially compatible (i.e. not type compatible); the result includes tuples where the attributes are union compatible (each such tuple is only represented once) and tuples where the attributes are not union compatible
* OUTER JOIN == FULL OUTER JOIN on the common attributes of both relations
In what sense does relational calculus differ from relational algebra, and in what sense are they similar?
RELATIONAL ALGEBRA:
- procedural
- how the query is written influences how the query is evaluated
RELATIONAL CALCULUS:
- nonprocedural, declaritive
- how the query is written does not effect how the query is evaluted; i.e. specifies WHAT to retrieve, rather than HOW to retrieve it
BOTH:
* same expressive power (both are relationally complete)
How is a query tree representation different from a query graph representation of queries?
- a query tree implies the order that the operations should be applied
- a query graph does not imply any order to obtain the result
- now apparently query trees are prefered because the RDBMS modules have to specify an order of execution anyway
Discuss the transformation of the universal and existential quantifiers.
“One general transformation can be described informally as follows:
- Transform one type of quantifier into the other with negation (preceded by NOT);
- AND and OR replace one another;
- a negated formula becomes unnegated;
- and an unnegated formula becomes negated
Define the following terms with respect to the tuple calculus: tuple variable
t
- a variable that ranges over a particular DB relation
- i.e. it can take any individual tuple from that relation as its value
Define the following terms with respect to the tuple calculus: range relation
R
- for each tuple variable; specified by a condition of the form R(t)
- if you don’t specify a condition then the tuple variable ranges over all possible tuples IN THE UNIVERSE!!!
Define the following terms with respect to the tuple calculus: atom
- predicate calculus atoms
- make up a formula
- can be one of the following:
- form R(t_i) where:
* R is a relation name,
* t_i is a tuple variable
* identifies the range of the tuple variable t_i as the relation whose name is R; evaluates to TRUE if t_i is a tuple in relation R, else FALSE - form t_i.A op t_j.B where:
* op is one of the comparison operators {=, !=, , <=, >=},
* t_i and t_j are tuple variables,
* A is an attribute of the relation on which t_i ranges,
* B is an attribute of the relation on which t_j ranges - form t_i.A op c or c op t_j.B where:
* op is one of the comparison operators {=, !=, , <=, >=},
* t_i and t_j are tuple variables,
* A is an attribute of the relation on which t_i ranges,
* B is an attribute of the relation on which t_j ranges
* c is a constant value
Define the following terms with respect to the tuple calculus: formula
AKA condition
AKA well-formed formula
AKA WFF
COND
* made up of one or more atoms connected via the logical operators AND OR NOT and defined recursively by the following two rules:
- Every atom is a formula
- If F1 and F2 are formulas, then so are:
* (F1 AND F2)
* (F1 OR F2)
* NOT (F1)
* NOT (F2)
the truth values of these formulas are derived from their component formulas F1 and F2 as follows:
a. (F1 AND F2) is TRUE if both F1 and F2 are TRUE; otherwise FALSE
b. (F1 OR F2) is FALSE if both F1 and F2 are FALSE; otherwise TRUE
c. NOT (F1) is TRUE if F1 is FALSE; it is FALSE if F1 is TRUE
d. NOT (F2) is TRUE if F2 is FALSE; it is FALSE if F2 IS TRUE
Define the following terms with respect to the tuple calculus: expression
a general expression of tuple calculus is of the form:
{t_1.A_j, t_2.A_k, …, t_n.A_m | COND(t_1, t_2, …, t_n, t_{n+1}, t_{n+2}, …, t_{n+m} ) }
where:
* t_1, t_2, …, t_n, t_{n+1}, t_{n+2}, …, t_{n+m} are tuple variables
* each A_i is an attribute of the relation on which t_i ranges
* COND is a condition or formula [made up of atoms, meets the rules that every atom is a formula and that if F1 and F2 are formulas then so are (F1 AND F2), (F1 OR F2), NOT(F1), NOT(F2) ]