Exam 2 Flashcards
Popular DBMS systems?
DB2 from IBM Oracle from Oracle Sybase DBMS from SAP SqlServer from Microsoft MySql (Open Source) PostgreSQL (Open Source)
What is a flat file?
Each relation in the database has a flat structure
What is a table?
Each row in the table represents a collection of related data values
What does a tuple represent in a database?
a row
What is an attribute in a database?
a column header
What does a table represent?
a relation
What is a domain?
A domain D is a set of atomic values
What is an atomic value?
Each value in the domain is indivisible
Example domains?
Usa_phone_numbers: the set of 10 digit phone numbers valid in the United States Academic_department_names: The set of academic department names in a university (Computer Science, Economics, Physics)
What is a data type / format?
data type for the domain: Usa_phone_numbers can be declared as a character string of the form (ddd)ddd-dddd, where each d is a numeric (decimal) digit and the first three digits form a valid telephone area code
What is relation schema?
Made up of a relation name R and a list of attributes A1, A2, … An Used to describe a relation Denoted as R
Attribute in relation schema
Each attribute Ai is the name of a role played by a domain D in the relation schema R
Domain in relation schema
D is called the domain of Ai and is denoted by dom(Ai)
what is a degree (arity)
Number of attributes n in the relation schema
Relation of degree 7 example:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa) Also sometimes written as… STUDENT(Name: string , Ssn: string , Home_phone: string , Address: string , Office_phone: string , Age: integer , Gpa: real )
What is a Relation state?
A relation state of schema R(A 1 , A 2 , … , A n ) is a set of tuples: r = {t 1 , t 2 , … , t m } Also called relational intention or relation extension:
Cartesian product (x)
Specifies all possible combinations of values from the underlying domains
Current relation state
Only the valid tuples that represent a particular state of the real world
Subset (⊆)
A smaller part of a larger set
Cardinality
Total number of values in a domain
Ordering of tuples in a relation. How does that work?
Elements in a set have no order. Tuples in a mathematical relation have no order However in programming, tuples do have an order There is no preference for one ordering over another
Alternative definition of a relation
Each tuple ti is a mapping from one R to D. D is a union of the attribute domains Tuple is a set of attribute value pairs Tuples list has no order
Self describing data
When a description of each value (like the attribute name) is included in the tuple
Flat relational model
Each value in a tuple is an atomic value that is not divisible Composite and multivalued attributes are now allowed Also called first normal form assumption
Null values
Null can mean value unknown, not available, or does not apply We cannot compare null values. For instance, 2 customers with null addresses do not have the same address! It’s best to avoid null values as much as possible in database design
Assertion
Assertion is a predicate expressing a condition we wish the database to always satisfy Predicate The values in each tuple are values that satisfy the predicate
Closed world assumption
The only true facts in the universe are those present within the extension of the relation
Relational notation
A relation schema R of degree n is denoted by R(A 1 , A 2 , … , A n ). The uppercase letters Q, R, S denote relation names. The lowercase letters q, r, s denote relation states. The letters t, u, v denote tuples.
Constraints
Restrictions on values in the database state Derived from rules in the miniworld
Inherent constraints
Constraints that are inherent in the data model Also called implicit constraints Example: can’t have duplicate tuples
Schema-based constraints
Constraints directly expressed in the schema Also called explicit constraints Domain constraints, key constraints, constraints on ulls, entity integrity, etc
Application-based constraints
Constraints that can’t be expressed in schema but is enforced by the application programs Also called semantic constraints or business rules
Domain constraints
Within each tuple, the value of each attribute must be an atomic value from the domain
Candidate key
A key that can possibly be a primary key in a table Each attribute for a row is unique
Primary Key
The key chosen to maintain uniqueness in a table A candidate key that is used to identify tuples in a relation
Superkey
This is basically a combination of multiple keys that form a primary key Specifies a uniqueness constraint that no two distinct tuples in any state r of R can have the same value for SK Similar to a primary key? Can be used to uniquely identify each tuple
Unique key
Other candidate keys that are not chosen to be a primary key
Relational database schema
A relational database schema S is a set of relation schema and a set of integrity constraints IC The tables and the columns
Relational database state
The rows of a table Also called snapshot or instance
A database that does not obey all its integrity constraints…
A database that does not obey all its integrity constraints is considered not valid Otherwise it is called a valid state Integrity constraints are expected to hold for every valid database state
Entity integrity constraint
No primary key value can be NULL This means we can’t identify some tuples
Referential integrity constraint
Used to maintain the consistency among tuples in two relations For example, a department number in the EMPLOYEE tuple must match an actual department number in the DEPARTMENT table
Foreign key
If a relational schema R1 is a foreign key that references R2, it must have the same domain and must actually refer to a real value
Referencing relation
R1 is the referencing relation and R2 is the referenced relation If this holds, the referential integrity constraint holds
Semantic integrity constraints
Not part of DDL For example: age of child should not exceed age of parent
Constraint specification language
How to specify more general constraints
Trigger and assertions
Can specify some of the constraints in SQL However, more common to check in the actual application code itself
State constraints
Define constraints on a valid state of the database
Transaction constraint
Deal with changes to the database Typically enforced by application programs
Result relation
The answer to a user’s query
Database modification or update
3 basic mechanics: insert, delete, update (modify)
How does an insert work when talking about tuples and schemas?
A new tuple t gets inserted into relation R Can potentially violate constraints Violate domain constraint Attribute value isn’t in domain Violate key constraints New tuple t is already in the relation Violate entity integrity If any part of the primary key is NULL Violate referential integrity If the foreign key is bad The default action is to reject an insert if it violates something
How does a delete work when talking about tuples and schemas?
Removing a tuple Can only violate referential integrity if it’s being referenced from other tuples in the database Several ways to fix this: Restrict Reject the deletion Cascade Propagate the deletion by deleting tuples that reference the tuple being deleted Set null / set default Modify the referencing attribute instead Can cause a violation if part of primary key
How does update work when talking about tuples and schemas?
Change value or one or more attributes in a tuple Only need to check if data type and domain are correct Modifying a primary key can cause problems similar to Delete
What is a transaction?
Executing a database operation Can include any number of retrieval operations or update operations
Relational calculus
Higher level declarative language for specifying relational queries No order of operations to specify how to retrieve the query result
Unary operations (relational algebra)
SELECT and PROJECT operate on single relations
Binary operations (relational algebra)
Operate on two tables by combining related tuples based on join conditions
Aggregate functions (relational algebra)
Conditions that can summarize data from the tables
SELECT operation (relational algebra)
Choose a subset of the tuples from a relation that satisfies a selection condition σ Salary > 30000 ( EMPLOYEE ) σ (sigma) is used to denote the SELECT operator Each select is unary applying to a single relation and each tuple individually Degree is the number of attributes is the same as R for the relation Selectivity Fraction of tuples selected by a selection condition
PROJECT operation (relational algebra)
Selects certain columns from the table and discards other columns π Lname, Fname, Salary ( EMPLOYEE ) π (pi) is the symbol used to represent the PROJECT operation Degree is equal to the number of attributes in the list Project operation removes any duplicate tuples (known as duplicate elimination) since the result is a set (if duplicates weren’t removed it would be a multiset or bag) Similar to SELECT DISTINCT in SQL
How to do a sequence of events in relational algebra
In-line expression for relational algebra π Fname, Lname, Salary (σ Dno = 5 ( EMPLOYEE )) assignment operation DEP5_EMPS ← σ Dno = 5 ( EMPLOYEE ) RESULT ← π Fname, Lname, Salary ( DEP5_EMPS ) Rename TEMP ← σ Dno = 5 ( EMPLOYEE ) R ( First_name, Last_name, Salary ) ← π Fname, Lname, Salary ( TEMP )
Rename (relational algebra)
ρ S ( B1, B2, … , Bn ) ( R ) the symbol ρ (rho) is used to denote the RENAME operator
Binary set operators (relational algebra)
Union, intersection, set difference
Union (relational algebra)
R ∪ S Includes tuples that are in R or S or both
Intersection (relational algebra)
R ∩ S Includes tuples that are in both R and S
Set Difference (relational algebra)
R – S Includes tuples that are in R but not in S Not commutative
Cross Product (cartesian product, cross join) (relational algebra)
Denoted by × Concatenates all possible combination of tuples in the two relations (tables)
Join operation (relational algebra)
JOIN operation, denoted by ⋈ Used to combine related tuples from two relations Cartesian product followed by a select Where two id numbers match
Equijoin and natural join (relational algebra)
A join with an = is an equijoin For equijoins we have extra attributes with identical values Natural join is basically an EQUIJOIN followed by the removal of superfluous attributes Natural join is denoted by * Join attribute (which attribute to join on) Join selectivity: expected size of join result divided by maximum size These are types of Inner joins
Division operator (relational algebra)
The DIVISION operation, denoted by ÷ Retrieve the names of employees who work on all the projects that “john smith” works on Can be expressed as sequence: T1 ← π Y (R) T2 ← π Y ((S × T1) – R) T ← T1 – T2 Confusing Produces a relation R(X) that includes all tuples t[X] in R 1 (Z) that appear in R 1 in combination with every tuple from R 2 (Y), where Z = X ∪ Y.
Query tree (relational algebra)
Data structure that corresponds to a relational algebra expression The result for each step needs to be done before the next one can execute
Generalized projection (relational algebra)
Allows functions of attributes to be included in the projection list For example, tax * 0.5
Aggregate functions and grouping (relational algebra)
Sum, maximum, minimum, count F in cursive is used
Recursive closure operations (relational algebra)
Joining on the attributes of the same type
Outer join (relational algebra)
Used to keep leftover attributes in R or S or both after joining Left outer join ⟕ Keeps every tuple in first relation Right outer join ⟖ Keeps every tuple in second relation Full Outer join ⟗ Keeps every tuple in both relations
Tuple relational calculus
There is no order given how to evaluate a query This is a nonprocedural language Expressive power is identical SQL is based on tuple relational calculus
Tuple variables
{t | COND(t)} T is a tuple variable and COND(t) is a conditional expression that’s either true or false {t | EMPLOYEE(t) AND t.Salary > 50000 } {t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary > 50000 } Range relation We need to specify a range relation R of t, otherwise it will pick tuples from everything in the universe Condition A condition to select a particular combination of tuples. Requested attributes A set of attributes to be retrieved
Expression in tuple relational calculus
{t 1 .A j , t 2 .A k , … , t n .A m | COND (t 1 , t 2 , …, t n , t n+1 , t n+2 , …, t n+m )} COND is condition or formula Formula is made up of atoms Atoms can be A relation and tuple variable A comparison operator between two attributes A comparison between an attribute and a regular number Truth value: a relation that yields true or false Every atom is a formula
What are quantifiers?
Universal quantifier ∀ Existential quantifier ∃
Free or bound?
A tuple variable in F that is an atom is free in F The free variables [hereafter abbreviated to FV] in a statement stand for objects that the statement says something about…The fact that you can plug in different values for a free variable means that it is free to stand for anything. Bound variables [hereafter abbreviated to BV], on the other hand, are simply letters that are used as a convenience to help express an idea and should not be thought of as standing for any particular object. A bound variable can always be replaced by a new variable without changing the meaning of the statement, and often the statement can be rephrased so that the bound variables are eliminated altogether.
Existential quantifier:
There exists some tuple that makes F true
Universal quantifier
For all situations To make the quantified formula TRUE , the inner formula must be TRUE for all tuples in the universe.
Select project join queries (tuple relational calculus)
Don’t involve complex quantification
Query graph (tuple relational calculus)
Graph of how the query is structured Represented by relation nodes Constant nodes
Safe expresion (tuple relational calculus)
Guaranteed to yield a finite number of tuples If not, it is unsafe {t | NOT (EMPLOYEE(t)) } is unsafe
Domain relational calculus
While SQL was being developed by IBM Research, another language QBE was being developed based on domain relational calculus Instead of variables ranging over tuples, the variables range over single values from domains of attributes {x 1 , x 2 , …, x n | COND (x 1 , x 2 , …, x n , x n+1 , x n+2 , …, x n+m )}
Logical database design
Also known as data model mapping How to create a relational schema from an EER diagram
Logical Database Design Step 1: Mapping regular entity types
For each entity type, create a relation R that has all the simple attributes of E. Include only the simple component attributes of a composite attribute Entities created from this are called entity relations because each tuple is an instance
Logical Database Design Step 2: Mapping of weak entity types
For each weak entity type W, create a relation R and include all simple attributes of W as attributes of R Add a key to point to the parent attribute
Logical Database Design Step 3: Mapping of binary 1:1 relationship types
3 approaches 1: foreign key approach Choose 1 relation and include it as a foreign key of the other 2: merge relation approach Merge the two entity types into a single relation 3: cross-reference / relationship relation approach Set up a third relation for cross referencing the primary keys of the two relations S and T
Logical Database Design Step 4: Mapping of binary 1:N relationship types
Two approaches: 1: foreign key approach Identify the relation S that specifies the participating entity type at the N side of the relationship. Include a foreign key for the other relation 2: relationship relation approach Create a cross reference table including primary keys and foreign keys to tie
Logical Database Design Step 5: Mapping of binary M:N relationship types
The only option here is to use a relationship relation (cross reference) option
Logical Database Design Step 6: Mapping of multivalued attributes
For each multivalued attribute A, create a new relation R Include foreign key and the name in the new table. The primary key can be listed multiple times
Logical Database Design Step 7: Mapping of n-ary relationship types
Use the relationship relation (cross reference) option For each n-ary relationship where n > 2, create a new cross reference table S to represent the relation. Add foreign keys to tie things together
Mapping of specialization or generalization
Two main options: Map the specialization into a single table Map into multiple tables
Logical Database Design Step 8: options for mapping specialization or generalization
Multiple relations: superclass and subclass Multiple relations: subclass relations only Single relation with one TYPE attribute Single relation with multiple TYPE attributes
Mapping of shared subclass
Must all have the same key
Logical Database Design Step 9: mapping of union types (categories)
Add a new surrogate key to correspond to the union type
How to map a normal entity with attributes?
The identifying attribute, A, becomes the primary key of the ET table. All the attributes become keys in the table
How to map composite attributes?
Map each part of the composite attribute (C is composed of D and E, so only map D and E). The relational model is flat.
How to map multi-value property types?
ET table with attribute A for primary key ET-F table with A and F as part of the primary key. BOTH ARE UNDERLINED!
How to map 1 - 1 relationship?
ET1 with primary key A and foreign key B ET2 with primary key B OR ET1 with primary key A ET2 with primary key B and foreign key B
How to map 1 - 1 relationship when one part of the relationship is mandatory?
If ET2 has a mandatory link to relationship R: ET1 with primary key A ET2 with primary key B and foreign key B since B can never be NULL in ET2
How to map 1 to many relationship types:
If ET2 is on the many side: ET1 with primary key A ET2 with primary key B and foreign key B
How to map many to many relationship types:
Add new table for R with A and B underlined together as foreign keys as part of the primary key to tie everything together.
How to map weak entity types with identifying relationships
If Et2 is the weak attribute: ET 1 table has A as the primary key ET2 has a table with A and B underlined together as the primary key
How to map a mandatory disjoint?
Two tables for each of the two types. ET1 has all the attributes specific to ET1, and all the attributes in its parent type ET ET2 has all the attributes specific to ET2, and all the attributes in its parent type ET
How to map a mandatory overlap?
For parent class: ET: Primary Key A, B ET1: Foreign Key A, C ET2: Foreign Key A, D
How to map a non-mandatory overlap?
For parent class: ET: Primary Key A, B ET1: Foreign Key A, C ET2: Foreign Key A, D
How to map a non-mandatory overlap?
For parent class: ET: Primary Key A, B ET1: Foreign Key A, C ET2: Foreign Key A, D
How to map a union type?
Make new primary key ET-ID for parent type with key B Table for ET1: Primary Key C with ET-ID foreign key Table for ET2: Primary Key D with ET-ID foreign key
R * S in relational algebra?
natural join
What does p looking thing in relational algebra do?
rename
(o looking thing) in relational algebra
selection
What is selecting?
Finding rows
What is projection?
Finding columns
What symbol for selection?
o looking thing
What symbol for projection?
pi
What is OR in relational algebra?
union
What is AND in relational algebra?
intersection
What does \ mean in relational difference?
Set difference in one set but not the other
Symbol for natural join in relational algebra?
*
What is the difference between theta and natural join?
theta works with less than and greater than natural join only works with =
symbol for theta join?
bowtie
bowtie symbol?
theta join
inner vs outer join?
inner join doesn’t add extra values
left outer join
preserve left operand tuples even if it doesn’t match the join conditions (leave as NULL)
Cartesian product?
combines each combination of tuples!
Divide By in relational algebra?
For A divided by B Return rows of A that match all of B must match up with every single value in divisor
p looking operator for relational algebra?
rename
symbol for range expression?
∈ t∈R means that t is a tuple of relation R
attribute value?
t.A`
symbol for comparison operator in relational calculus
theta
list of atoms for relational calculus?
range expression, attribute, comparing two constants
predicate
an atom is a predicate
Selection in tuple calculus
{r | r ∈ Regular User)
projection in tuple calculus
{r.Email, r.BirthYear, r.Sex | r ∈ RegularUser and r.HomeTown = ‘Atlanta’)}
union in tuple calculus
{s.City | there exists(r ∈ RegularUser)(s.City = r.CurrentCity) or there exists (t ∈ RegularUser)(s.City = t.HomeTown)
basic idea of tuple calculus?
instead of a series of operations, you’re describing the result (there exists something that fulfills these rules)
intersection in tuple calculus
{s.City | there exists(r ∈ RegularUser)(s.City = r.CurrentCity) and there exists (t ∈ RegularUser)(s.City = t.HomeTown)
set difference in tuple calculus
“and not”
natural join in tuple calculus
r.Year = s.Year and t.Email = r.Email and t.Year = r.Year r ranges over regular user s ranges over major 60s events t is output
cartesian product in tuple calculus
{r, s | r ranges over regular user and s ranges over user interests} remember this creates a combination of all combinations in both tables {t | ∃ p ∈ r ∃ q ∈ s (t[A] = p[A] ∧ t[B] = p[B] ∧ t[C] = q[C] ∧ t[D] = q[D])}
divide by in tuple calculus
find email for all users with at least all of the interests of user 1 written manually just like cartesian product there must be a tuple that matches all of these other tuples {t | ∃ p ∈ r ∀q ∈ s (p[B] = q[B] ⇒ t[A] = p[A])}