Databases Flashcards
Entity-relationship model
Graphical description of a database, represented in UML notation. An entity type is a group of objects with the same properties and independent existence. An entity occurrence is a uniquely identifiable instance of an entity type.
Candidate and primary keys
A primary key is one which has been chosen to uniquely identify an entity occurrence, a candidate key is one that could.
Simple and composite keys
A candidate key that consists of one attribute
A candidate key that consists of many attributes
Advantages of EER model over ER
Avoids describing similar concepts more than once
Can have relations that include a subclass but not the superclass
More semantic information in the design (e.g. manager IS-A member of staff)
How does the EER model work?
A subclass is a subgrouping of occurrences of an entity type, which need to be represented separately. A superclass is an entity type that has two or more distinct subclasses. Example: Staff contains Manager and Secretary.
All attributes of the superclass are also attributes of the subclasses, and a subclass has additional attributes than its superclass.
Specialisation
Top-down process of maximising the differences between entity occurrences by identifying their distinguishing characteristics. Given superclasses, it leads to identifying subclasses.
Generalisation
Bottom-up process of minimising the differences between entity occurrences, by identifying their common characteristics.
Participation constraint
Determines whether every member in the superclass must participate as a member of one of the subclasses (either mandatory or optional)
Disjoint constraint
Determines whether a member of a superclass can be a member of one or more subclasses (or=disjoint=only one subclass, and=non-disjoint=multiple subclasses)
Structured data
This is data represented in a strict format, such as the relational data model. The database management system checks to ensure that the data follows the structures and referential constraints specified in the schema.
Semi-structured data
This is data that may be irregular or incomplete and may have a structure that changes rapidly/unpredictably. It may have some structure, but not all parts may have the same structure and each data object can have different attributes that are not known in advance. We end up with such data when we collect data ad-hoc. Information that usually belongs to the schema is now mixed in with the data itself
Advantages of XML
Simple standard, human-legible, extensible (can define own tags), platform-independent, separation of content and presentation (allows custom view of data), allows repeated attributes
Well-formed XML
Well-formed: single root element, matching tags (properly nested), initial XML declaration including XML version number, encoding, and standalone status (DTD or no DTD)
Type-valid XML
Well-formed and the elements must follow the pre-defined structure defined in the DTD.
Schema-valid XML document
Must be well-formed and conforms to an XML schema
DTD attribute syntax
<!ATTLIST BAR topic (sushi | sports) “sushi”>
This means that a BAR has an attribute “topic” which can either be “sushi” or “sports” but the default is sushi
XQuery syntax
FOR $variableName in <XPath>
WHERE <condition>
RETURN {<expression>}</expression></condition></XPath>
Write “doc(“bib.xml”)/bib/book[year > 1995]/title” in XQuery
FOR $x IN doc(“bib.xml”)/bib/book
WHERE $x/year > 1995
RETURN {$x/title}
XQuery LET statement
Binds collection variables, meaning it returns one value consisting of a list of values (e.g. LET $S := /STAFFLIST/STAFF)
XQuery for finding all books by the coauthors of the book “Database Theory”
FOR $x IN bib/book [title/text() = “Database Theory”] / author
$y IN bib/book [author/text() = $x] / title
RETURN <answer> { $y/text() } </answer>
Existential qualifier
WHERE SOME $p IN <collection> SATISFIES <condition>
WHERE EVERY $p IN <collection> SATISFIES <condition></condition></collection></condition></collection>
What is a transaction?
An action carried out by a single user/program which reads or updates the database
What is referential integrity
A property of a database stating that all its references are valid according to the EER diagram
What are the ACID properties of transactions?
Atomicity: a transaction is either performed entirely or not at all
Consistency: a transaction must transform the database from one consistent state to another
Isolation: Transactions execute independently (partial effects of incomplete transactions should not be visible to other transactions)
Durability: The effects of a committed transaction are permanently recorded
What is concurrency control?
The process of managing simultaneous operations on the DB without having them interfere with each other
What is the lost update problem?
An apparently successfully completed update operation by one user is overridden by another
What is the uncommitted dependency problem?
A transaction is allowed to see the intermediate results of another transaction before it has committed
What is the inconsistent analysis problem?
A transaction read some values while they are being updated by another transaction
What is a schedule?
A sequence of operations from a set of n concurrent transactions such that the order of the operations in each transaction Ti is preserved in the schedule. A serial schedule is one where the operations of any two transactions are not interleaved (they all happen one after the other)
How do we check whether a non-serial schedule is #conflict serialisable#?
Construct the precedence graph: a directed graph G = (N, E) with a set of nodes N and a set of directed edges E, with:
- A node for each transaction
- A directed edge from Ti to Tj whenever Tj reads a value of an item written by Ti, or Tj writes a value into an item after it has been read/written by Ti (Ti “pokes” Tj for reading something it wrote, or overwriting something it touched)
An edge from Ti to Tj means that in the equivalent serial schedule, Ti must appear before Tj
A schedule is serialisable if and only if there is no directed cycle in the precedence graph
What does it mean if two schedules are view equivalent?
For each data item x, if transaction Ti reads the initial value of x in S1, then Ti reads the initial value of x also in S2
For each data item x, if the last write operation on x in S1 was done by Ti, then T1 also performs the last write operation on x in S2
For a read operation on data item x by transaction Ti in S1, if the value of x read by Ti was written by transaction Tk, then Ti must also read the value of x produced by Tj in S2.
First read, last write, if Ti reads a value written by Tj somewhere then it does everywhere
View equivalent schedules return the same results
What does it mean if a schedule is view serialisable?
It’s view equivalent to a serial schedule, meaning that conflict serialisable implies view serialisable
Conservative/pessimistic methods for concurrency control
Actively avoid conflicts, delay or restart transactions when they are in conflict.
Optimistic methods for concurrency control
Assume that transactions are rarely in conflict, check for conflicts just before the transaction commits.
Shared lock vs exclusive lock
Shared lock = transaction is only allowed to read some data item (not write), and any other transaction can also only read this item (not write)
Exclusive lock = transaction is allowed to read and write the same data item, while any other transaction has no access
When does the DBMS grant a lock?
When either the item is not currently locked, or a shared lock is being requested on a share-locked item.
A transaction holds a lock until it explicitly releases it, either during its execution or when it terminates. The effects of a write operation are only made visible when the lock is released. Sometimes locks may be upgraded/downgraded.
What is two-phase locking?
A system where for every transaction, there is a growing phase (acquire all needed locks, no unlocks), and shrinking phase (release all locks, no new locks). If we use this protocol, then the schedule will always be conflict serialisable. Problems can still occur if locks are released early.
What is the cascading rollback problem?
Where a transaction T1 causes a failure and a rollback must be performed. Other transactions dependent on T1’s actions must also be rolled back.
Rigorous and strict two-phase locking
Rigorous: we release all locks at the end of every transaction (when it commits), allowing us to serialise the transactions in the order they commit
Strict: the DBMS releases all write locks at the end of every transaction, but read locks can be released earlier.
What is a deadlock?
A problem where two or more transactions wait for each other to give up locks
Dealing with deadlocks
Timeouts: requests for locks time out after a maximum period of time, and then the transaction rolls back and restarts
Choosing the victim: it’s better to abort a short transaction, one which has made few changes, and one which doesn’t have much data left to update. This can lead to starvation (one process always being chosen as the victim). The common solution is to store how many times each transaction has been aborted, and then use different selection criteria when an upper limit is reached.
Deadlock detection
Construct the wait-for graph G = (N, E), where the nodes N are the transactions and there is a directed edge from Ti to Tj when transaction Ti is waiting for a lock that is kept by transaction Tj (Ti pokes Tj to ask it to remove the lock)
A deadlock exists if and only if the wait-for graph contains a directed cycle. Once a deadlock is detected, the DBMS rolls back the transaction.
Recoverable schedules
A recoverable schedule is one that ensures that once a transaction T is committed, it should never be necessary to roll it back.
A schedule S is recoverable if no transaction T commits unless all the transactions that it reads commit first.
And if Tj reads Ti then the commit operation of Ti appears before the commit operation of Tj.
What does it mean when a transaction T “reads from” a transaction T’ in schedule S
Some data item x is first written by T’, and then read by T.
When is a schedule cascadeless?
If Tj reads Ti then the commit operation of Ti appears before the read operation of Tj. Cascadeless implies recoverable.