Final Take 2.1 Flashcards
How does quality of of an E-R diagram affect normalization process?
A data model that effectively captures the entities and the relationships between those entities will require little to no modification when transposed into relations.
If a data model is done poorly, a number of changes to the relations will be required to remove anomalies and reduce data redundancy.
Relationship Class
Denotes a “connection” between entity classes.
There can be multiple relationships between entity classes.
Relationships should be named with a “verb phrase”
One phrase – grammar modified to reflect each direction EMPLOYEE has DEPARTMENT; DEPARTMENT has EMPLOYEE
EMPLOYEE mentors another EMPLOYEE; EMPLOYEE is mentored by another EMPLOYEE
Two phrases – specific for each direction
Relationship Entity
Denotes a “connection” between entity instances.
Relationships Degree
the degree of a relationship is the number of entity classes that participate in the relationship
Recursive Relationship
A relationship among entities of the same type.
The same entity class participates more than once in different roles.
A relationship between instances of the same entity class.
A recursive relationship will always have a degree of 1.
Binary Relationship
A relationship between exactly two entities.
A binary relationship will always have a degree of 2
Maximum Cardinality
The maximum number of entity instances that can occur on one side of a relationship.
The maximum cardinalities from each side come together to form the cardinality ratio, which is also known as the relationship type.
One-to-One (1:1)
One-to-Many; Many-to-One (1:N; N:1)
Many-to-Many (M:N)
If a specific maximum is known, you can use the number.
To determine the maximum cardinalities, you must ask a question from each perspective.
§ Can “ENTITY_A” have more than one “ENTITY_B”?
§ Can “ENTITY_B” have more than one “ENTITY_A”?
If an answer is ‘NO, there is at most one’, then the maximum cardinality is “one” for that perspective: Denoted by a single line going into the entity that is restricted to a maximum of one.
If an answer is ‘YES, there can be more than one’, then the maximum cardinality is “many” for that
perspective: Denoted by a crow’s foot going into the entity that can have many.
Minimum Cardinality
The minimum number of entity instances that have to occur on one side of a relationship.
To determine the minimum cardinalities, you must ask a question from each perspective.
Does “ENTITY_A” have to have an “ENTITY_B”?
Does “ENTITY_B” have to have an “ENTITY_A”?
If an answer is ‘NO, there doesn’t have to be one’, then the minimum cardinality is zero (optional) for
that perspective: Denoted by an oval placed before the maximum cardinality for the entity that is not required to participate in the relationship.
If an answer is ‘YES, there must be one”, then the minimum cardinality is one for that perspective: Denoted by a perpendicular line to the relationship, placed before the maximum cardinality for the
entity that is required to participate in the relationship.
Business Rules
Restrictions on the model
that reflect the business
activities being modeled.
Such rules are the basis for
determining the appropriate
elements to use in the E-R
diagram.
Has-A vs IS-A
‘IS-A’ is a relationship where one object is a subtype of another. A Car is a Vehicle, a Dog is an Animal; this relationship is manifested through inheritance. The ‘HAS-A’ relationship is where one object contains or uses another object
Supertype vs subtype
Denotes an “IS-A” relationship
Also known as Generalization Hierarchies
Supertype – A type of entity class that contains attributes that are common
across all specializations, A generalization
Subtype – A type of entity class that “inherits” attributes of its supertype as well as containing attributes of its own - A specialization
Strong vs Weak Entity
Strong entity – can exist independently of all other entities.
Supertypes. Denoted by: Square
Weak entity – can’t exist on its own; must have another entity to support it.Subtypes, Do not have their own identifier. Denoted by: round or double lines
When is a supertype/subtype relationships necessary?
There are attributes that apply to some, but not all, instances of an
entity.
There is a relationship that pertains to a defined subset of instances for
an entity.
Implications of having a relationship at supertype vs subtype level
Relationships at the supertype level indicate that instances of the
supertypes, including all subtypes, can participate in the relationship.
Relationships at the subtype level indicate only instances of that
subtype can participate in the relationship.
Completeness Constraints
The minimum number of subtype instances that a supertype instance must belong to.
To determine the type of completeness constraint, you must ask a question.
Must every instance of the supertype belong to a subtype?
- If an answer is ‘NO, an instance of the supertype doesn’t have to belong to any of the subtype(s)’, then
there is partial specialization. Denoted by a single line from the supertype to the supertype/subtype relationship indicator.
-If an answer is ‘YES, every instance of the supertype must belong to a subtype’, then there is total
specialization. Denoted by double lines from the supertype to the supertype/subtype relationship indicator.
Disjointedness Constraints
The maximum number of subtype instances that a supertype instance can belong to.
To determine the type of disjointedness constraint, you must ask a question.
- Can an instance of the supertype belong to more than one subtype?
- If an answer is ‘NO, an instance of the supertype may only belong to at most one subtype’, then the
disjoint rule is in effect. Denoted by a ‘d’ inside the supertype/subtype relationship indicator. - If an answer is ‘YES, an instance of the supertype can belong to more than one subtype’, then the
overlap rule is in effect. Denoted by an ‘o’ inside the supertype/subtype relationship indicator
Subtype discriminator
A discriminator is an attribute, or a group of attributes, in a supertype that will indicate any subtypes that an entity instance belongs to.
Disjoint Rule: A single attribute can be used as a discriminator in the supertype. The value for the discriminator will indicate which subtype, if any, an entity instance of the supertype belongs to.
Overlap Rule: For each subtype an attribute will need to be used as a discriminator in the supertype. The value of each attribute will indicate which subtypes an entity instance belongs to.
Transposing Process
- The name of each entity becomes the name of a “relation”.
- Attributes of the entity become attributes of the “relation”.
- The identifier of the entity becomes the proposed primary key of the “relation”, denoted by an underline.
- For a relationship, create a primary key (pk)/foreign key (fk) connection based on the type of relationship.
- Add a reference statement for the foreign key
a. Notation: RELATION_X(foreign key) mei RELATION(primary key) - Repeat steps 4 and 5 until all relationships have been implemented.
Foreign key placement for different types of relations
a. 1:1 – Examine the minimum cardinalities
i. If the minimum cardinalities are the same, then examine whether a simple or composite fk scenario exists, go with simple fk
ii. If the minimum cardinalities are different, go in the direction that avoids null values for the fk.
b. 1:N (N:1) – Take the pk from the relation on the “one” side and add it as a foreign key to the relation on the “many” side.
c. N:M – Create a new intersection relation.
i. The intersection relation will have a foreign key from each entity involved in the relationship.
ii. The fks will form a composite pk.
d. Supertype/Subtype
i. The pk of the supertype becomes a fk in each subtype.
ii. The fk in each subtype becomes the pk.
Participation Constraints
Total participation (existence dependency) – an entity can exist only if it participates in a specific
relationship.
Weak entities always have total participation
Partial participation – an entity can exist without participating in a specific relationship.
Identifying vs non-identifying relationship
Identifying Relationship
A solid relationship line that denotes that
the foreign key will need to help form a
primary key.
Non-Identifying Relationship
A dashed relationship line that denotes
that the foreign key will not be needed to
form a primary key.
Lookup Table
Also known as Reference table
Used to minimize storage of repetitive values
Ensures sure that only certain values are allowed
Helpful to create drop-down list of options in front-end interface
Typical use:
Store a code in the “main” table
Store the code and full value in the lookup table
Implementing a Foreign Key and Referential Integrity
Implementing a foreign key enforces referential integrity: can’t add a value to a foreign key, unless that value exists for the foreign key back in it’s “native” table.
Two step process in a CREATE TABLE statement:
Add an attribute specification for each foreign key attribute
Datatype must be consistent
Refer to E-R diagram to determine if a NOT NULL constraint is needed for the foreign key attribute(s)
Add a foreign key constraint for each foreign key
Syntax:
CONSTRAINT table_attr(s)_fk FOREIGN KEY (attr(s)) REFERENCES native_table(attr(s))
For 1:1 or 1:N (N:1) recursive relationships
Foreign key constraint will reference the same table’s primary key
For supertype/subtype relationships
Foreign key constraint in subtype will reference supertype
Transactions
SQL provides us with the concept of a “transaction”.
A transaction is a logical unit of work.
Can be composed of one or more statements
Statements can be
SQL
§ Data Definition Language (DDL)
§ Data Manipulation Language (DML)
§ Programming elements – if supported by DBMS
Processed by Transaction Control Language (TCL) statements
§ COMMIT = save
§ ROLLBACK = undo/cancel
COMMIT
The COMMIT statement defines the end of the transaction and writes the changes to the database
In MySQL, an implicit commit is performed when a DDL statement is executed with an in-process transaction
The ACID properties guarantee that either all the changes between the “BEGIN” and “COMMIT” statements get done, or they don’t get done at all
Transaction Control Language
The database is automatically restored to its prior state if there is a problem
No code or other intervention by programmer
Rollback
End/Cancel
Transaction Control Language
Properties of a Transaction
Atomic
Consistency
Isolation
Durability
Atomicity
Transaction is considered an atomic unit. A transaction is resolved by either all changes being made or none of the changes being made, a partial commit is not allowed.
If a failure occurs in the middle of a transaction, all partial results must be undone.
Consistency
A transaction must leave the database in a consistent state.
A transaction must not violate database integrity constraints
Isolation
Concurrent transactions will not negatively impact one another.
Record locking – can restrict access to records that are part of an in-process transaction.
An in-process transaction will not reveal results to others until it is committed or rolled-back
Durability
Committed data is permanently recorded.
If a failure occurs, DBMS, needs to be able to restore committed data.
Relational Algebra
Similar to sets
Some operations are similar to corresponding set operations
Can be represented by Venn diagrams
Relations can be manipulated using operations
Procedural
Query languages are based on relational algebra (SELECT statement)
Consists of variables – representing relations.
Operators – represent the actions we can take with these relations.
The output must meet the characteristics of a relation.
Theoretical Relational Algebra Operators
Three Classes:
Operations to remove parts of a relation:
Selection – only tuples that meet stated criteria appear in the resulting relation
Projection – only specified attributes appear in the resulting relation
Set based: Union, Intersection, Difference
Operations that join tuples in two relations:
Product Joins
Projection
The projection of a relation produces a new relation with selected columns from the original relation
Limits the attributes that will appear in the resulting relation to those that are specified
Can be used to rearrange the order of columns in the resulting relation
Duplicate tuples (rows) from result set are removed
Notation variations:
RELATION[attr1, attr2, attr4]
Select cause of a SELECT statement
Selection
Can be used to specify condition(s) that must be met in order for a tuple to appear in the result set
Condition can include: =, <, >, ≤, ≥, Negation: ¬, OR: ∨, AND:
Notation variations:
RELATION WHERE condition_to_be_met
Example: EMP WHERE dept# = 3
Where Cause of a select statement
SQL and Relational Algebra
When attempting a relational algebra using SQL, the DBMS may not produce a relation as duplicate rows can be in the result set.
Can use the DISTINCT key word as part of SELECT clause of SELECT statement to remove duplicate rows from a result set.
Not needed for every SELECT statement written!
Only those where you are asked to produce a relational algebra operation using SQL.
Format: SELECT DISTINCT .
Union
The union of two relations produces a resulting relation with the all of the tuples from each relation.
Relations must be Union Compatible.
Remember, the resulting relation is ALWAYS a relation and must meet the characterics of a relation (i.e. duplicate tuples are removed)
Notation variations:
§ A ∪ B
§ A + B
§ A UNION B
Is commutative
Union Compatible
Each relation has the same number of attributes
The attributes must be in the same order in the two relations, which means that corresponding attributes must come from the same physical and logical domain
Commutative
The order that the relations are specified will not impact the resulting relation
Note that not all relational algebra operations are commutative
Difference
Result is all of the tuples from the first relation that do not appear in the second relation.
Relations must be union compatible
Is NOT cummutative
§ Order matters
§ Notation:
A – B
A DIFFERENCE B
Subqueries
Combining Queries
You can have many nested subqueries
The subquery should be in parentheses
The subquery can be part of the WHERE clause of the parent query
The datatype being returned by the subquery must match the field referenced in the WHERE clause
Any projection of attributes to the user can only include attributes from the outermost query
Intersection
The intersection of two relations is a relation that contains only the tuples that appear in both relations.
Relations must be union compatible.
Is commutative
A INTERSECT B = B INTERSECT A
Notation variations:
§ A ∩ B
§ A INTERSECT B
Product
The set of tuples formed by combining each tuple in one relation with each of the tuples in another relation.
Also called:
§ Cartesian product
§ Cross product
Notation variations:
§ A x B
§ A PRODUCT B
§ By itself, not very useful, but is used as part of other operators
Algebra Joins
Equijoin
Natural join -Also includes a projection
Outer join (also known as a directional join)
§ Left outer join
§ Right outer join
§ Full outer join
Equijoin
In an equijoin, specified attributes are matched using an equality test. All attributes will appear in the resulting relation
selects tuples where the “ON”
attribute(s) has non-null values in both relations (matches only)
Natural Join
In a natural join, specified attributes are matched using an equality test.
Only one of the “joining” attributes appears in the final result
selects tuples where the “ON”
attribute(s) has non-null values in both relations (matches only)
Left Outer Join
LEFT OUTER JOIN selects tuples where the “ON” attribute(s) has non-null values in the first relation (left side)
Gets all tuples from the relation that appears on the left-hand side of the join and any matching rows from the relation on the right side.
In case there are no matching tuples from the right side, NULL will appear for “right side” attributes
Right Outer Join
Selects tuples where the “ON” attribute(s) has non-null values in the second relation (right side)
Right Outer Join: Gets all tuples from the relation on the right side of the statement and any matching tuples from the relation on the left side.
In case there are no matching tuples on the left side, attribute values for “left side” are
NULL.
Full Outer Join
FULL OUTER JOIN selects tuples where the “ON” attribute has non-null values in either relation
You see all tuples from both relations
§ Where the tuples match based on the join criteria, the values from both relations will appear.
§ NULL values will appear for attributes from either relation where the join conditions were not
met
Having Clause
§ Instead of using the WHERE clause, SQL provides us with the HAVING clause
§ Use the HAVING clause to place restrictions according to the results of aggregate functions
It’s not a case of one or the other – a WHERE clause and a HAVING clause can be in the same statement
§ The WHERE clause applies to the raw, ungrouped data and is evaluated before a GROUP BY clause executes
§ The HAVING clause applies to the grouped data and is applied after a GROUP BY clause executes
SELECT CLAUSE order and which are optional
SELECT attribute(s)
FROM tableName(s)
WHERE condition(s) <- optional
GROUP BY groupList <- optional
HAVING haveCondition <- optional
ORDER BY orderList; <- optional
Highly Abstract Tables
Highly abstract tables collapse common fields into a tag/value relationship
n Tempo, Mood, and Genre are all “attributes”
n Create another weak entity that has an “attribute” field and a value field
n Similar in concept to key/value pair
Highly Abstract Tables Advantages and Disadvantages
Benefit:
§ Very flexible
§ Easy to handle unforeseen information
Downside
§ Queries get difficult
Union compatible
both output compatible data types for each column, return the same number of columns,
and the order of the columns match up
Join combo
Product selection projection
Union compatible
Union, difference, intersection