Databases Flashcards

1
Q

Entity-relationship model

A

Graphical description of a database, represented in UML notation. An entity type is a group of objects with the same properties and independent existence. An entity occurrence is a uniquely identifiable instance of an entity type.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Candidate and primary keys

A

A primary key is one which has been chosen to uniquely identify an entity occurrence, a candidate key is one that could.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Simple and composite keys

A

A candidate key that consists of one attribute
A candidate key that consists of many attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Advantages of EER model over ER

A

Avoids describing similar concepts more than once
Can have relations that include a subclass but not the superclass
More semantic information in the design (e.g. manager IS-A member of staff)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How does the EER model work?

A

A subclass is a subgrouping of occurrences of an entity type, which need to be represented separately. A superclass is an entity type that has two or more distinct subclasses. Example: Staff contains Manager and Secretary.

All attributes of the superclass are also attributes of the subclasses, and a subclass has additional attributes than its superclass.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Specialisation

A

Top-down process of maximising the differences between entity occurrences by identifying their distinguishing characteristics. Given superclasses, it leads to identifying subclasses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Generalisation

A

Bottom-up process of minimising the differences between entity occurrences, by identifying their common characteristics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Participation constraint

A

Determines whether every member in the superclass must participate as a member of one of the subclasses (either mandatory or optional)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Disjoint constraint

A

Determines whether a member of a superclass can be a member of one or more subclasses (or=disjoint=only one subclass, and=non-disjoint=multiple subclasses)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Structured data

A

This is data represented in a strict format, such as the relational data model. The database management system checks to ensure that the data follows the structures and referential constraints specified in the schema.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Semi-structured data

A

This is data that may be irregular or incomplete and may have a structure that changes rapidly/unpredictably. It may have some structure, but not all parts may have the same structure and each data object can have different attributes that are not known in advance. We end up with such data when we collect data ad-hoc. Information that usually belongs to the schema is now mixed in with the data itself

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Advantages of XML

A

Simple standard, human-legible, extensible (can define own tags), platform-independent, separation of content and presentation (allows custom view of data), allows repeated attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Well-formed XML

A

Well-formed: single root element, matching tags (properly nested), initial XML declaration including XML version number, encoding, and standalone status (DTD or no DTD)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Type-valid XML

A

Well-formed and the elements must follow the pre-defined structure defined in the DTD.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Schema-valid XML document

A

Must be well-formed and conforms to an XML schema

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

DTD attribute syntax

A

<!ATTLIST BAR topic (sushi | sports) “sushi”>

This means that a BAR has an attribute “topic” which can either be “sushi” or “sports” but the default is sushi

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

XQuery syntax

A

FOR $variableName in <XPath>
WHERE <condition>
RETURN {<expression>}</expression></condition></XPath>

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Write “doc(“bib.xml”)/bib/book[year > 1995]/title” in XQuery

A

FOR $x IN doc(“bib.xml”)/bib/book
WHERE $x/year > 1995
RETURN {$x/title}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

XQuery LET statement

A

Binds collection variables, meaning it returns one value consisting of a list of values (e.g. LET $S := /STAFFLIST/STAFF)

20
Q

XQuery for finding all books by the coauthors of the book “Database Theory”

A

FOR $x IN bib/book [title/text() = “Database Theory”] / author
$y IN bib/book [author/text() = $x] / title
RETURN <answer> { $y/text() } </answer>

21
Q

Existential qualifier

A

WHERE SOME $p IN <collection> SATISFIES <condition>
WHERE EVERY $p IN <collection> SATISFIES <condition></condition></collection></condition></collection>

22
Q

What is a transaction?

A

An action carried out by a single user/program which reads or updates the database

23
Q

What is referential integrity

A

A property of a database stating that all its references are valid according to the EER diagram

24
Q

What are the ACID properties of transactions?

A

Atomicity: a transaction is either performed entirely or not at all
Consistency: a transaction must transform the database from one consistent state to another
Isolation: Transactions execute independently (partial effects of incomplete transactions should not be visible to other transactions)
Durability: The effects of a committed transaction are permanently recorded

25
Q

What is concurrency control?

A

The process of managing simultaneous operations on the DB without having them interfere with each other

26
Q

What is the lost update problem?

A

An apparently successfully completed update operation by one user is overridden by another

27
Q

What is the uncommitted dependency problem?

A

A transaction is allowed to see the intermediate results of another transaction before it has committed

28
Q

What is the inconsistent analysis problem?

A

A transaction read some values while they are being updated by another transaction

29
Q

What is a schedule?

A

A sequence of operations from a set of n concurrent transactions such that the order of the operations in each transaction Ti is preserved in the schedule. A serial schedule is one where the operations of any two transactions are not interleaved (they all happen one after the other)

30
Q

How do we check whether a non-serial schedule is #conflict serialisable#?

A

Construct the precedence graph: a directed graph G = (N, E) with a set of nodes N and a set of directed edges E, with:

  • A node for each transaction
  • A directed edge from Ti to Tj whenever Tj reads a value of an item written by Ti, or Tj writes a value into an item after it has been read/written by Ti (Ti “pokes” Tj for reading something it wrote, or overwriting something it touched)

An edge from Ti to Tj means that in the equivalent serial schedule, Ti must appear before Tj

A schedule is serialisable if and only if there is no directed cycle in the precedence graph

31
Q

What does it mean if two schedules are view equivalent?

A

For each data item x, if transaction Ti reads the initial value of x in S1, then Ti reads the initial value of x also in S2

For each data item x, if the last write operation on x in S1 was done by Ti, then T1 also performs the last write operation on x in S2

For a read operation on data item x by transaction Ti in S1, if the value of x read by Ti was written by transaction Tk, then Ti must also read the value of x produced by Tj in S2.

First read, last write, if Ti reads a value written by Tj somewhere then it does everywhere

View equivalent schedules return the same results

32
Q

What does it mean if a schedule is view serialisable?

A

It’s view equivalent to a serial schedule, meaning that conflict serialisable implies view serialisable

33
Q

Conservative/pessimistic methods for concurrency control

A

Actively avoid conflicts, delay or restart transactions when they are in conflict.

34
Q

Optimistic methods for concurrency control

A

Assume that transactions are rarely in conflict, check for conflicts just before the transaction commits.

35
Q

Shared lock vs exclusive lock

A

Shared lock = transaction is only allowed to read some data item (not write), and any other transaction can also only read this item (not write)

Exclusive lock = transaction is allowed to read and write the same data item, while any other transaction has no access

36
Q

When does the DBMS grant a lock?

A

When either the item is not currently locked, or a shared lock is being requested on a share-locked item.

A transaction holds a lock until it explicitly releases it, either during its execution or when it terminates. The effects of a write operation are only made visible when the lock is released. Sometimes locks may be upgraded/downgraded.

37
Q

What is two-phase locking?

A

A system where for every transaction, there is a growing phase (acquire all needed locks, no unlocks), and shrinking phase (release all locks, no new locks). If we use this protocol, then the schedule will always be conflict serialisable. Problems can still occur if locks are released early.

38
Q

What is the cascading rollback problem?

A

Where a transaction T1 causes a failure and a rollback must be performed. Other transactions dependent on T1’s actions must also be rolled back.

39
Q

Rigorous and strict two-phase locking

A

Rigorous: we release all locks at the end of every transaction (when it commits), allowing us to serialise the transactions in the order they commit

Strict: the DBMS releases all write locks at the end of every transaction, but read locks can be released earlier.

40
Q

What is a deadlock?

A

A problem where two or more transactions wait for each other to give up locks

41
Q

Dealing with deadlocks

A

Timeouts: requests for locks time out after a maximum period of time, and then the transaction rolls back and restarts

Choosing the victim: it’s better to abort a short transaction, one which has made few changes, and one which doesn’t have much data left to update. This can lead to starvation (one process always being chosen as the victim). The common solution is to store how many times each transaction has been aborted, and then use different selection criteria when an upper limit is reached.

42
Q

Deadlock detection

A

Construct the wait-for graph G = (N, E), where the nodes N are the transactions and there is a directed edge from Ti to Tj when transaction Ti is waiting for a lock that is kept by transaction Tj (Ti pokes Tj to ask it to remove the lock)
A deadlock exists if and only if the wait-for graph contains a directed cycle. Once a deadlock is detected, the DBMS rolls back the transaction.

43
Q

Recoverable schedules

A

A recoverable schedule is one that ensures that once a transaction T is committed, it should never be necessary to roll it back.
A schedule S is recoverable if no transaction T commits unless all the transactions that it reads commit first.
And if Tj reads Ti then the commit operation of Ti appears before the commit operation of Tj.

44
Q

What does it mean when a transaction T “reads from” a transaction T’ in schedule S

A

Some data item x is first written by T’, and then read by T.

45
Q

When is a schedule cascadeless?

A

If Tj reads Ti then the commit operation of Ti appears before the read operation of Tj. Cascadeless implies recoverable.

46
Q
A