Exam 2 Flashcards
Will the Database Application Develop Methodology work for general application development?
No
Database Application Development Methodology - Assumptions that must hold
- Business processes are well-designed
- Documents (input to our output from apps on the db) are known
- Tasks (the processing of those documents) are known
- System boundary is known
- One database schema unifying all views can be designed
Troubles with creating one database schema to unify views?
Can be difficult because: Interests, goals, power, politics
Question: is that a problem with…
Problems with methodology?
Problems with Organization?
Answer: Organization.
Because Oraganization = an entity created to pursue a shared set of goals
Typical waterfall software development steps?
- Business Process design
- Analysis
- Specification
- Design
- Implementation
- Testing
- Operation
- Maintenance
Software dev steps that are unique to database methodology?
<those>bold></those>
- Business Process design
- Analysis
- Specification
- Design
- Implementation
- Testing
- Operation
- Maintenance
In database application method, which is first?
Data is First
In contrast, other software dev, process often first
What are the steps of databse application development?

Information Flow Diagram (IFD) shapes
- Elipses = Tasks
- Rectangles = documents (input and output screens)
- Arrows = information flow (NOT CONTROL FLOW)
- Broken line = system boundary
Never connect 2 tasks
Never connect 2 documents
What goes into the database?
What comes out of the database?
- Everything in the database must come from somewhere
- Everything on the input documents must go somewhere
- Everything in the database must be used for something
- Everythingo n the output documents must come from somewhere
Lecture 17 - Lesson 3
Constraints - how to include them?
There are constraints from the data format.
There are constraints from the EER diagram.
Any other constraints are those that have to be coded into the program.
Data formats - How to get?
Beg, borrow and steal from existing sources.
Rules for Task Decomposition
- Lookup very different vs. insert, delete and update (why? because different “locks” for lookups)
- If task has many different of these, should likely be decomposed
- How many schema constructs are involved (requires many locks)
- Would want to decompose task when many schemas
- Are enabling conditions consistent across tasks?
- (easier to run now, when there are resources, so run what you can)
- Are frequencies consistent across tasks?
- (if some have high and some have low frequencies, split them)
- Reason is because you might want to index high frequency things
- Is consistency essential?
- (Depends on task, money transfer yes)
- (ACID transaction properties?)
- Is mother task control needed or not?
- Sometimes you do, others you don’t
Web-based apps features
- Traditionally, almost stateless
- Must have some state, like user email
- May need some click stream history
- Advent of things like AJAX allow more state which allows web-based apps to act more like traditional apps
Traditional Apps (as opposed to web-based)
- Much easier to manage local state separate from the DB
- Lots of changes can be collected before submitting them to the DB
- Means you don’t need a round trip to the DB everytime a minor thing is changed
- Supports better control of ACID transactions execution
*
Formatting for abstract code
BoldUnderline: Task definition
Bold: Task call
Italics: Button names
Relational Model Terms
Row = Tuple
Column header = attribute
Table = relation
Data type = domain
Domain (definition)
Set of atomic values (i.e. indivisible)
Often specified by a data type
Domain Components
Comprise a name, datatype and format
Logical definitions = e.g. “set of valid 10-digit us phone numbers”
Datatype or format = (ddd) ddd-dddd where d = decimal
Unit types (optional) = kg
Relational Scheme
Name = relation name
Attributes are name of a domain in the schema (i.e. column)
Degree = number o attributes
Dom(attribute) = domain of the attribute
Relation
Set of rows
Each row has n-tuples
n = degree and number of attributes
Schema = intension
Actual rows = extension (i.e. state)
Domains can be same in attributes?
Yes (just data type).
But attributes play different *roles* in the relation
Difference between relation (mathematically defined) and the table or flat file (actual content)
- Order of tuples does not matter in relation, but exists in a table or flat file
- (therefore tables with different orders are considered identicle)
- Driven because relation = set of n-tuples
- Sets don’t have order in math
- Technically, n-tuple described as ordered list from math, but in reality, order of attributes shouldn’t matter
- Can re-define a tuple as a set to make it mathematically not care about order
- Makes two differently orderd tuples identicle
- Requires “dictionary” format (name:value) which is called *self-describing data*
-
Are tuple valus mult-valued or atomic?
Must be atomic
Called flat relational model
Assumption was a lot of relational theory
Assumption called first normal form
Null value types
Value unknown (i.e. not sure if exists)
Value not available (i.e. know it exists, but don’t have it)
Attribute does not apply (value undefined)
Interpretation of a relation
Can be assertion that an entity or relationship has attributes.
- Each tuple then is a fact
- Can be facts about entities or relationships (hard to tell each)
Alternatively, can be a predicate. The values are values that satisfy the predicate. Useful for logical programming.
What is the closed world assumption?
Only true facts of the universe are those present in a extension (state).
Any other combincation would make a predicate false.
Useful for relational calculus.
Types of constraints
- Implicit constraints or inherent model-based constraints = Those inherent in the data model
- Schema-based constraints or explicit constraints = Those that can be expressed direclty in the schema (typically specified in the DDL)
- Application-based or semantic or business rules = Constraints that cannot be explicity in the schema or data model and must be enforced in the application
Types of schema- based constraints
- Domain constraints
- Key constraints
- Constraints on NULL
- Entity integrity
- Referential integrity
Domain Constraint
Specifies each attribute is atomic and of a specific domain
Domain could be a standard type or could be specified as a sub-set or even enumerated as explicit, possible values
Definition of relation
A set of tuples
Key constraint
That any given tuple is has a unique set of values.
This follows from fact that a relation is a set of tuples (and sets don’t have duplicates)
Superkey of a relational scheme
The subset of attributes in a tuple that would establish the uniqueness constraint
Every relation has one default superkey: the set of all its attributes
Key in a relational scheme
Like a superkey but if you remove any attribute, it is no longer a superkey (i.e. doesn’t have redundancy like a superkey).
Basically rules are:
- All its values are unique for a tuple
- Can’t remove any values and still have uniquness constraint hold
Is a key a superkey?
Yes. A key is a superkey but not vice versa.
Reason is a key removes all redundant attributes.
What if there is more than one key in a schema?
Both keys are candidate keys
Usually one key is designated the primary key
The other candidate keys are called unique keys
Constraints on attributes
Must be unique to a tuple
Must specify which values can or cannot be NULL
Key must be time invariant (i.e. as new rows added)
How do you translate the following from an EER diagram to a relational database:
- Entity
- Attribute
- Identifier/key
- Composite attribute

How does the relational model handle multi-valued attributes?

How do you map 2 entities (from an EER diagram) into the relational model?
You put the primary key for one entity as a column in the other entity.
You only choose one to do this to.
It doesn’t matter which UNLESS, there is a mandatory participation relationship (see other card)

How do you show a relationship with 2 entities in a relational database, when one has a mandatory participation relationship with the other?
This is better because it means every ET2 definitively has a relationship with an ET1, whereas if you had it the other way, that would not have to be the case.

How do you map a 1 to N relationship between entities in a relational model?
You have to map each instance of the many relationship to a given key of the 1 relationship.
The reason is because, if you did the other way, the 1 relationship would have a field that has to map to many instances of the N relationship and that doesn’t work (i.e. one column mapping to many primary keys).

How do you show a M to N relationship in a relational model?
You can’t simply add the primary key for one entity as a field to another entity’s table.
Instead, you have to create a new table (R, say) that has both the primary keys from each entity.
The primary key in R is a combo of the two primary keys.

How do you represent an identifying relationship betwen a strong entity ET1 and many a weak entity ET2, in a relational model?
ET2 can’t exist without an ET1 (per the ownership relationship).
So the ET2 table is actually identified by an ET1 key in the first column and ET2 keys in the second.
This is done for all the many ET2 instances.
Together, each of these form key for an instance in ET2.

How do you represent a disjoint relationship between an entity ET and its subclasses ET1 and ET2, if ET has a mandatory relationship to its subclasses?
Because the subclasses might have their own attributes that the superclass doesn’t have, we don’t do a table for ET.
We do a table for ET1 and another for ET2. Both have all the attributes of ET.
They also then have the unique attribute to their subclass.

What is “a foreign key onto ET”?
When there is a second table for ET1, say, with the same key as ET, we say the reference from ET1 is “a foreign key onto ET”.
We also draw an arrow from ET1 to ET.

How do we handle a mandatory overlapping relationship between a superclass and its subclass, in a relational model?
Recall that with an overlapping (assuming a mandatory participation), the superclass is also a member of one subclass, the other or both.
There are two options for representing this (second preferred).
First, you can have one table that includes a field (column) for each of the overlapping subclass entities. Then you include another column that specifies the “Type” of relationship (e.g. it has one, the other or both entities)
Second, you can create three tables: ET, ET1 and ET2, each with the same primary key + the attributes from the subclasses. It just means that the instance shows up in all tables, which Prof was OK with
Lesson 6: Lecture 9

How do we show NON-mandatory relationship between superclass and subclasses that have overlap?
Same as with mandatory relationship to overlapping subclasses.
Only difference is the key from ET might not have any instances in ET1 or ET2.

How do you representa NON-mandatory, disjoint relationship in a relatoinal model?
Similar to the non-mandatory for overlapping relationship.
As far as I can tell, though, you’d have to use business logic to ensure the instance doesn’t show up in either disjoint table.

How do you represent a union type in a relational model (with ET, ET1 and ET2)?
Create an “artificial identifier” in the relation ET, possibly comprising an element of ET1 or ET2.
Then have that identifier show up in ET1 or ET2, whichever one it is associated with, like any other relationship.
These will be foreign keys to the ET identifier in ET.

What is “Closed Algebra”
For every rational number you give you get a rational number back.
You get the same kind as the operands you started.
Helps you formulate “high level models” based on previous thoughts (i.e. previous calculations)
What are the relational algebra set operators?
Set operators:
R U S = union
R Π S = intersection
R \ S = set difference
R x S = Cartesian Product
π A1, A2, … An (R) = project (think of as eliminating columns)
σexpression (R) = selection (think of as eliminating rows)

What are the relational algebra join (or construct) operators?
Join Operators:
R * S = natural join
R [something] S = outer joins (left, right, full)
R [something] S = theta-join (similar to universal quantification in relational calculus)
R [something] S = divideby
[greek p] = rename (lets you somehow combine joins and set operators)

How does a selection operation work?

Select… all the tuples from relation R… that satisfy this expression
No expression selects all tuples (i.e. rows)
The usual symbols work to form expressions

How do composite expressions work in selections?
Similar to other math and code:
They can use AND and OR, parentheses or NOT.

How does projection work?

Projection returns the attributes listed for the given expression or relation.
Basically creates a “view”
NOTE: This is an example of “closed” algebra. It takes a “relation”, does “selection” on that relation in the parentheses, that yields a relation, which then we do projection on.
Every step is a relation so you can keep apply operations to them.

Relations are or are not sets?
Relations are sets!!
So if a view only shows a unique row once.

How does the union operator work?
Union puts the results of relation in one attribute or result (e.g. home town U current city => cities)
It is the two circles of a Venn + the overlap.
NOTE: The operands of the operator need to be type compatible (i.e. same number of attributes and pairwise same types)
For example, with a union of two projections, the projections have to have only one attribute each.

How does the Intersection operator work?
Intersection finds the just the overlapping items between two attributes and returns them as a relational attribute.
In the Venn diagram, it is just the overlap.

Conceptually, how does set difference work?
Basically includes all of one set MINUS the items of overlap from another set.

Mechanically, how does set difference work?
Basically takes attributes of a relation, keeps the main relation MINUS the items that are part of the other relation.
Returns a view of the items that remain.

How does a natural join operator work?
Combines the two tables (i.e. doesn’t just keep the attributes queried… it MATCHES attributes in the tuples but gets the whole tuples that matchs and combines them.
Only includes tuples where attributes of the join have matching names (i.e. the same values)
- For example, if the relations each of a year attribute, those are compared
- If they ALSO have a sex attribute, that too would be compared
Has all attributes of both relations BUT only keeps only one copy of the matched attribute
It is an “inner” join because only includes tuples that match.

What is a theta join operator?
Brings the attributes from both relations together in one view.
Does so by matching attributes based on a comparison expression (theta).
The attributes actually have different names (i.e. “BirthYear” vs. “EventYear”)
Also an “inner join” because it only includes tuples that match.

How does a (Left) Outer Join work?
Combines the attributes of both relations.
Will, in particular, combine attributes that have a match in attributes (of the same attribute name I THINK?).
This is the “inner” part.
It will still include tuples that didn’t have matching attributes and the matched attribute will be NULL
These are the “outer” part
It is a “NATURAL” join ( think because same named attribute).
Is a special case of the theta join

How does the cartesian product work?
Combines ALL tuples in the two relations.
For example (the image): user 4 in RU is combined with all users in UsserInterests

How is Cartesian Product userful?
See the example image.
It is basically creating a table that has all possible combination of user and interests and then subtracting out anything the user is already signed up to.
The result could then be used to email the user what’s remaining.

How does the divideby operator work?

The divideby basically takes the items on the left and then, if ALL values of an attribute on the right are part of a match, it grabs the OTHER attribute from there.
In other words, it is checking the 3 values that are the attribute for “Interest” and extracting any email + interest combo that has that.
Then is keeping only the email part of that resulting relation.

How does the rename operator work?

Can beused to control natural join, theta join and others.
Lets you create a relation that replaces the name of attributes.
See example: main point is the expression, comma separated, where REPLACEMENT precededs ORIGINAL

What is relational calculus?
Relational Algebra is procedural in nature (it’s operator based)… you take relations and join them, project them, do a selection. You say what to do step by step.
In calculus, you simply describe the result that you want.
Is relational calculus higher-level and more expressive than relational algebra?
With respect to data-retrieval horsepower, they are equivalent.
SQL is mostly based on tuple calculus.
Are queries in relational calculus procedural or declarative in nature?
They are declarative. Relational algebra is procedural.
How to construct a relational calculus predicate?
See image
Also, mention in lecture about a “bound” variable: Lesson 10, lecture 20 (near the end)

How do you write a selection in relational calculus?
In example, expression reads: find an r such that r is a member of RegularUser

How do you do a composite expression in relational calculus?
See example
Basically reads like regular logic

How do you do a projection in relational calculus?
See example

How does union work in relational calculus?
Example is creating an attribute s.City.
And basically saying for every tuple in Regular user, s.City should equla current city OR same for hometown.
Then returns that as a set.

How does intersection work in relational calculus?
In the example, the AND basically makes it so that only cities in both sides work.

How does set difference work in relational calculus?
See example

How does a natural join work in relational calculus?
See example
Interesting that the year are a comparison but the others are an assignment.

How does cartesian product work in relational calculus?
See image.
Basically they just do (s,r | USER AND INTERESTS).
SEEMS like (based on lecture 30) that the part of this that means ALL of Regular User and ALL of UserInterests are mapped, is actually the s, r part.

Tricky Cartesian Product in relational calclus
See image and look at lesson 10, lecture 30

How does divideby work in relational calculus?
see example and lecture 31
