All Flashcards
Get Smarter Yo
What is a database?
A shared collection of logically related data designed to meet needs of organisation
What is a system catalog (metadata)?
Provides description of data to enable program-data independence
What defines logically related data?
Comprises of entities, attributes, and relationships of an organisations information
What is a Database Management System?
Software system enabling users to define, create, maintain and control access to a database
What control services does a DBMS provide? (4 total)
A security system
An integrity system
A concurrency control system
A user-accessible catalog
What services does a DBMS provide? (10 total)
Data storage, retrieval and update User-accessible catalog Transaction support Concurrency control services Recovery services Authorisation services Support for Data Communication Integrity Services Services to promote data independence Utility services
What advantages does a DBMS provide? (14 total)
Control of data redundancy Data consistency More information from same amount of data Sharing data Improved data integrity Improved security Enforcement of standards Economy of scale Balance conflicting requirements Improved data accessibility and responsiveness Increased productivity Improved maintenance through data independence Increased concurrency Improved backup and recovery services
What disadvantages does a DBMS bring with it? (7 total)
Complexity Size Cost Additional hardware costs Cost of conversion Performance Higher impact of failure
What is a DB Application Program?
A computer program that interacts with a DB by issuing an appropriate request to the DBMS
What does a Data Definition Language do?
A DDL permits specification of data types, structures and any data constraints
What does a Data Manipulation Language do?
A DML is a general enquiry facility, a query language, of the data, providing basic manipulation operations on data in the DB
What defines a procedural DML do?
Allows users to tell system exactly how to manipulate the data
What defines a Non-Procedural DML?
User states what data is needed, not how to retrieve it
What is a data model?
An integrated collection of concepts for describing data, relationships between data, and constraints on the data in an organisation
What components make up a data model? (3 total)
Structural part
Manipulative part
Possibly a set of integrity rules
What is the purpose of a data model?
Provides a way to represent data in an understandable way
What categories of data model are there? (3 total)
Object Based
Record Based
Physical
What is a system catalog? ( 2 points)
A repository of information, or metadata, describing the data in the DB and a fundamental component of any DBMS
What is middleware?
Software that mediates with other software and allows for communication between disparate applications in a heterogeneous system
When is middleware needed?
When distributed systems become too complex to manage efficiently without a common interface
What is a Web Service?
A software system designed to support interoperable machine-to-web service machine interaction over a network
What does XML stand for?
Extensible markup language
What does SOAP stand for?
Simple object access protocol
What does SOAP do?
Platform and language independent communication protocol for exchanging structured information over the internet and based on XML.
What does WSDL stand for?
Web services description language
What does WSDL do?
Protocol based on XML and used to describe and locate web services
What does UDDI stand for?
Universal discovery, description and integration
What does UDDI do?
Platform independent, XML based registry protocol for businesses to list themselves on the internet
What is a distributed database?
A logically interrelated collection of shared data, physically distributed over a computer network
What is the cloud computing?
A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
What is a transaction?
An action or series of actions carried out by a user or applications, which reads or updates DB contents, forming a logical unit of work on the DB
What do each of the ACID properties stand for, and what do they mean?
Atomicity
Transactions treated as single unit
Consistency
Transactions only bring DB from one valid state to another
Isolation
Ensures concurrent execution of trans leaves DB in same state as if they were executed sequentially
Main goal of concurrency control
Durability
Guarantees transaction remains committed, even after system failure
What are the objectives of a Three-Level Architecture? (6 total)
- All users should be able to access the same data
- A users view is immune to changes made in other views
- Users should not need to know physical DB storage details
- DBA should be able to change DB storage structures without affecting users’ views
- Internal structure of DB should be unaffected by changes to physical aspects of storage
- DBA should be able to change conceptual structure of DB without affecting users’
What is the external level and what does it describe?
- Users view of the DB
* Describes part of the DB that is relevant to a particular user
What is the conceptual level and what does it describe?
- Community view of the DB
* Describes what data is stored in DB and relationships among data
What is the internal level and what does it describe?
- Physical representation of the DB on a computer
* Describes how the data is stored in the DB
What does logical data independence provide? (3 total)
- Immunity of external schemas to changes in conceptual schema
- Conceptual schema changes
- Shouldn’t require changes to external schema or rewrites of application programs
What does physical data independence provide? (3 total)
- Immunity of conceptual schema to changes in internal schema
- Internal schema changes
- Shouldn’t require changes to conceptual or external schemas
What defines an object based data model? (4 total)
- Entity relationship
- Schematic
- Functional
- Object-Oriented
What types of record based data model exist? (3 total)
- Relational Data Model
- Network Data Model
- Hierarchy Data Model
What information does a system catalog store? (5 total)
- Names, types and sizes of data items
- Constraints on the data
- Names of authorised users
- Data items accessible by a user and the type of access
- Usage statistics
In a multi-user DBMS architecture, what defines teleprocessing? (3 total)
- One computer with a single CPU and a number of terminals
- Processing performed within the same physical computer
- Terminals are ‘dumb’, incapable of functioning on their own, cabled to central computer
In a multi-user DBMS architecture, what defines a file-server architecture? (4 total)
- Processing distributed about the network, typically a local area network
- File server connected to several workstations across a network
- DB resides on the file-server
- DBMS and applications run on each workstation
What are the disadvantages of a file-server mulit-user DBMS architecture? (3 total)
o Significant network traffic
o Copy of DBMS on each workstation
o Concurrency, recovery, and integrity control more complex
What defines each of the tiers in a two-tier client server architecture?
- Tier 1: Client manages UI and runs applications
* Tier 2: Server holds DB and DBMS
What are the advantages of a two-tier client server? (5 total)
o Wider access to existing DBs o Increased performance o Possible reduction in hardware costs o Reduction in communication costs o Increased consistency
What makes a three-tier client server better than a two-tier server DBMS architecture?
Much more scalable compared to the two-tier client server system
What are the advantages of a three-tier client server? (5 total)
o ‘Thin’ client requiring less expensive hardware
o Application maintenance centralised
o Easier to modify or replace one tier without affecting others
o Separating business logic from DB functions makes it easier to implement load balancing
o Maps quite naturally to Web environment
What defines an n-tier architecture?
Application servers host API to expose business logic and business processes for use by other applications
What do you gain from extending a three-tier architecture to n-tier?
Additional tiers provide more flexibility and scalability
What defines a web service and service-oriented architecture? (3 total)
- Shares business logic, data, and processes through a programmatic interface across a network
- Developers can add web service to a web page, or to an executable program, to offer specific functionality to users
- Use technologies such as: XML, SOAP, WSDL, UDDI
What three factors can describe a distributed database in greater detail? (3 total)
- Software system that permits the management of the distributed DB and makes the distribution transparent to users
- Consists of a single logical DB split into a number of fragments, each stored on one or more computers under the control of separate DBMS, with the computers connected by a network
- Each site capable of independently processing user requests that require access to local data and is also capable of processing data stored on other computers in the network
In cloud computing, what is an on-demand self-service?
Consumers can obtain, configure an deploy cloud services without help from the provider
In cloud computing, what is a broad network access service?
Accessible from anywhere, from any standardised platform
What subclasses of cloud computing SaaS’s exist and how do they differ?
DaaS and DBaaS, differing only in data management
What defines a DBaaS? (3 total)
- Offers full DB functionality to application developers
- Management layer that provides continuous monitoring and configuration of the DB
- Spares developers from the ongoing DB administration tasks
What are the advantages of a DBaaS? (4 total)
o Optimised scaling
o High availability
o Multi-tenancy
o Effective resource allocation in the cloud
What defines a DaaS? (3 total)
- Services enable data definition in the cloud and subsequent querying
- Doesn’t implement typical DBMS interfaces but instead data is accessed via common APIs
- Enables organisation with valuable data to offer access to others
What components make up a DBMS and what are their functions? (14 total)
Query processor
• Transforms queries into a series of low-level instructions directed to the DB manager
DB Manager
• Interfaces with user-submitted application programs and queries
• The DM examines external and conceptual schemas to determine conceptual records to satisfy the request
• DM then paces call to file manager to perform request
File Manager
• Manipulates underlying storage files and manages allocation of storage space on disk
• Establishes and maintains list of structures and indexes defined in the internal schema
DML Pre-processor
• Converts DML statements into standard function calls in host language.
• Must interact with query processor to generate appropriate code
DDL Compiler
• Converts DDL statements into a set of tables containing metadata
• Tables then stored in system catalog while control information is stored in data file headers
Catalog Manager
• Manages access to and maintains the system catalog
Authorisation Control
• Confirms whether the user has the necessary permission to carry out the required operation
Command Processor
• Control passed to this component once user authority is confirmed
Integrity Checker
• Ensures requested operation satisfies all necessary integrity constraints for an operation that changes the DB
Query optimiser
• Determines optimal strategy for query execution
Transaction manager
• Performs required processing of operations received from transactions.
Scheduler
• Ensures concurrent operations on DB proceed without conflicting with one another
• It controls the relative order in which transaction operations are executed
Recovery manager
• Ensures the DB remains in a consistent state in the presence of failures
• It is responsible for transaction commit and abort.
Buffer manager
• Responsible for the transfer of data between main memory and secondary storage, such as disk and tape.
What does a transaction do?
Transforms DB from one consistent state to another, although consistency may be violated during a transaction
What defines a successful transaction?
Transaction is committed to the DB, reaching a new consistent state
What defines a transaction failure?
Transaction aborts, and the DB must be restored to a consistent state before it is started via roll-back
Can a transaction be aborted?
No
Can an aborted transaction be restarted?
Yes, provided it is rolled back
What is concurrency control in the context of transaction management? (2 total)
• Process of managing simultaneous operations on the DB without having them interfere with one another• • Prevents interference when two or more users are accessing DB simultaneously and at least one is updating data
What is the objective of concurrency control?
Objective is to schedule transactions to avoid interference
What limitation does the serial running of transactions bring with it?
Limits degree of concurrency or parallelism in system
What does the lost update problem refer to?
Successfully completed update is overwritten by another user
What does the uncommitted dependency problem refer to?
Occurs when one transaction can see intermediate results of another transaction before it is committed
What does the inconsistent analysis problem refer to?
Occurs when transaction reads several values but second transaction updates some of them during execution of first
What does serialisability do?
Identifies those executions of transactions guaranteed to ensure consistency and aims to find non-serial schedules that allow transactions to execute concurrently without interfering with one another
What is a schedule?
Sequence of reads/writes by set of concurrent transactions
What is a serial schedule?
Schedule where operations of each transaction are executed consecutively without any interleaved operations from other transactions
What is the main issue with a serial schedule?
No guarantee that results of all serial executions of a given set of transactions will be identical
What is a non-serial schedule?
Operations from set of concurrent transactions are interleaved
What defines a serialised schedule?
A schedule is serialisable if it is non-serial in nature and identical to some serial schedule
What are the three rules to consider when identifying potentially conflicting transactions?
- If two transactions only read a data item, they do not conflict and order is not important
- If two transactions either read or write separate data items, they do not conflict and order is none important
- If one transaction writes a data item and the other reads or writes, order of execution is important
What does a conflict serialisable schedule do?
Orders any conflicting operations in same way as some serial execution
How do we test for serialisability? (workflow, 4 stages)
Create a precedence graph as follows:
- Create a node for each transaction
- Add a directed edge from Ti to Tj if Tj reads the value of an item written by Ti
- Add a directed edge from Ti to Tj if Tj writes the value into an item after it has been read by Ti
If the precedence graph contains a cycle, then the schedule is not conflict serialisable
What defines view serialisability and how does it differ from conflict serialisability? (4 total)
- Schedule is view serialisable if it is view equivalent to a serial schedule
- It is a less stringent definition of schedule equivalence than conflict serialisability
- Every conflict serialisable schedule is view serialisable, although the converse is not true
- Any view serialisable schedule that is not conflict serialisable contains one or more blind writes
What algorithmic time complexity class does testing the serialisability of a schedule fall under?
Testing if a schedule is serialisable is NPC
When are two schedules view equivalent? (3 total)
- For each data item x, if Ti reads initial value of x in S1, Ti must also read initial value of x in S2.
- For each read on x by Ti in S1,if value read by Ti is written by Tj, Ti must also read value of x produced by Tj in S2.
- For each data item x, if last write on x performed by Ti in S1, same transaction must perform final write on x in S2.
How might we examine the recoverability of transactions within a schedule? (2 total)
- If transactions fail, atomicity requires effects of transaction to be rolled back
- Durability states that once a transaction commits, its changes cannot be undone without running another compensating transaction
What defines a recoverability schedule?
A schedule where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj
What is the conservative approach to concurrency control?
Delay transactions in case they conflict with other transactions
What is the optimistic approach to concurrency control?
Assume a conflict is rare and only check for conflicts at commit
What is locking in the context of concurrency control?
Locks are used to deny access of other transactions, preventing incorrect updates. Common approach
What are the two forms of lock in concurrency control, and what are their actions?
Shared lock, used for reading data items
Exclusive lock, used for writing data items
What are the 5 rules of locking in concurrency control?
- If a transaction has a shared lock on item, it can read but not update said item
- If transaction has exclusive lock on an item, it can both read and write said item
- Reads cannot conflict, so more than one transaction can hold shared locks simultaneously
- Exclusive locks gives transaction exclusive access to an item
- Some systems allow transactions to upgrade read locks to exclusive locks, or downgrade exclusive to shared locks
Whats the main issue with an incorrect locking schedule and what effect does this have on ACID principles?
Transactions release locks too soon
Results in the loss of total isolation and atomicity
How can we guarantee serialisability whilst locking?
Through the addition of a protocol concerning the positioning of lock and unlock operations in every transaction
What is two-phase locking (2PL) and what are each of the phases?
Transaction follows 2PL if all locking operations precede first unlock operation in the transaction
Growing phase: acquires all locks but cannot release any
Shrinking Phase: releases locks but cannot acquire any new locks
What can be said about a schedule following 2PL?
The schedule is serialisable
What issues can arise through the use of 2PL?
Problems can occur with interpretation of when locks can be released.
What is Cascading Rollback? (2 total)
Occurs when transaction aborts with other transactions dependent on said transaction
Results in initial transaction and all subsequent dependent ones rolling back, cascading
How can cascade rollback be prevented?
2PL, leaving the release of all locks till the end of the transaction
How can 2PL be used to implement concurrency control with index structures?
- Could treat each page of index as a data item and apply 2PL
- As indexes frequently accesses, particular at higher levels, this may lead to lock contention
Explain the key points on index traversal in the context of concurrency control with index Structures. (3 total)
- Search path starts from root and moves down to leaves but search never moves back up tree. Thus, once a lower‐level node has been accessed, higher‐level nodes in that path will not be used again.
- When new index value is inserted into a leaf, if node is not full, insertion will not cause changes to higher‐level nodes.
- Only have to exclusively lock leaf node in such a case, and only exclusively lock higher‐ level nodes if node is full and has to be split
Explain the concurrency lock strategy for searches. (2 total)
o Obtain shared locks on nodes starting at root and proceeding downwards along required path.
o Release lock on node once lock has been obtained on the child node.
Explain the concurrency lock strategy for insertions. (1 total)
Conservative approach would be to obtain exclusive locks on all nodes as we descend tree to the leaf node to be modified.
Explain the optimistic approach lock strategy for concurrency control. (3 total)
o Obtain shared locks on all nodes as we descend to leaf node to be modified, where obtain exclusive lock.
o If leaf node has to split, upgrade shared lock on parent to exclusive lock.
o If this node also has to split, continue to upgrade locks at next higher level.
What is a deadlock and how do they arise?
An impasse that may result when two or more transactions are each waiting for locks held by the other to be released.
How do we break a deadlock?
Abort one or more of the transactions
Why should a deadlock be transparent to a user?
So that the DBMS can restart transactions
Why can’t a DBMS restart an aborted transaction in practice? When would it be possible to do this?
Because it is unaware of transaction logic, even if it was aware of the transaction history.
Possible if there is no user input or the input is not a function of the DB’s state
What are the three methods for handling deadlocks?
Timeouts
Deadlock Prevention
Deadlock Detection and Recovery
Describe the action of a deadlock timeout. (3 total)
- Transaction requesting lock will only wait for a system-defined period of time
- If lock is not granted within this period, lock request times out
- DBMS assumes transaction deadlocked and it aborts, automatically restarting system
Describe the action of deadlock prevention. (2 total)
- DBMS looks ahead to see if deadlocks will occur and never allows it to occur
- Transactions ordered using transaction timestamps: wait-die or wound-wait
What are the two methods of ordering transactions using timestamps in deadlock prevention? Describe their actions.
Wait-Die:
Only an older transaction can wait for younger one
Otherwise transaction is aborted and restarted with same timestamp
Wound-Wait:
Only a younger transaction can wait for an older one
If older transaction requests lock held by younger one, younger aborted
What defines deadlock detection and recovery?
DBMS allows deadlock to occur but recognises it and breaks it
What is the purpose of a wait-for graph and how is one constructed? (2 steps)
A wait-for graph is used to show transaction dependencies with a deadlock existing iff the wait-for graph contains a cycle
- Create a node for each transaction
- Create an edge from Ti to Tj if Tj waiting to lock an item locked by Tj
What three considerations must recovery from deadlock detection consider?
o The choice in deadlock victim
o How far to roll a transaction back
o Avoiding starvation
What is a timestamp?
A unique identifier created by a DBMS that indicates relative starting time of a transaction
What are the two methods for timestamp generation?
o Using system clock at time transaction started
o Incrementing a logical counter every time a new transaction starts
How are transactions ordered in timestamping and what effect does this have in the context of conflicts?
Transactions are ordered globally so that older transactions get priority in the event of conflict.
Conflicts are resolved by rolling back and restarting the transaction
Are deadlocks possible during timestamping?
No, no locks are implemented during timestamping
How are read and writes organised during timestamping?
- Read/write proceeded only if last update on data item was carried out by an older transaction
- Otherwise, transaction requesting read/write is restarted and given a new timestamp
What is a read-timestamp?
Timestamp of last transaction to read a data item
What is a write-timestamp?
Timestamp of last transaction to write a data item
What benefit does multi-version timestamp ordering offer over conventional timestamping methodologies? (8 total)
- Versioning implemented to increase concurrency
- Basic timestamp ordering protocol assumes only one version of data item exists
- Basic timestamping only one transaction can access data item at a time
- Multi-version can allow multiple transactions to read and write different versions of the same data item
- Ensures each transaction sees consistent set of versions for all data items it accesses
- Each write operation creates a new version of the data item whilst retaining old version
- System selects the correct version of data item when read request arrives
- Versions can be deleted when they are no longer required
What assumption are optimistic techniques based upon?
Based on the assumption that conflict is rare and more efficient to let transactions proceed without delays than ensure serialisability
What happens at transaction commit during optimistic techniques? (2 total)
A check is made to determine whether conflict has occurred or not
If there is a conflict, transaction must be rolled back and restarted
What key benefit is gained from using optimistic techniques in the context of transaction management?
Potential for greater concurrency than traditional protocols
What are the three phases of optimistic techniques and what are each of their actions? One phase has two variants
Read phase
• Extends from start until immediately before commit
• Transaction reads values from database and stores them in local variables
• Updates applied to a local copy of the data
Validation phase
- Read only transaction:
• Checks that data read still holds current values
• If no interference has occurred, transaction committed, else it is aborted and restarted
- Update transaction:
• Checks transaction leaves DB in a consistent state with serialisability maintained
Write Phase
• Updates made to local copy are applied to the DB.
How does granularity effect a system? (3 total)
The coarser the data item, the lower the degree of concurrency achievable
The finer the data item, the more reliant the system becomes on locking
Optimal size therefore determined by the type of transaction
Describe the action of a granularity hierarchy and explain how a DBMS comes into play. (4 total)
- Granularity of locks can be represented in a hierarchal structure
- Root node represents the entire DB
- When a node is locked, all of its children are also locked
- DBMS should check hierarchal path before granting lock
How is a Database restored?
Through the process of database recovery, restoring a DB to a correct state in the event of failure
What two points explain the need for recovery control?
- Volatile storage does not survive system crashes
- Stable storage represents information that has been replicated in several non-volatile storage media with independent failure modes
List the types of failure than can lead to a need for database recovery (6 total)
- System crashes resulting in the loss of main memory
- Media failures, resulting in the loss of parts of secondary storage
- Application software errors
- Natural physical disasters
- Carelessness or unintentional destruction of data or facilities
- Sabotage
Describe the purpose and action of the recovery manager during failure? (3 total)
- Recovery manager responsible for atomicity and durability
- Failure occurring between commit and DB buffers being flushed to secondary storage requires recovery manager to redo transactions to ensure durability
- Transaction not being committed at failure time, recovery manager has to undo any effects of the transaction for atomicity
What is the difference between a partial and global undo?
Partial undo’s result in a single transaction being undone, whereas global undos result in all transactions being undone.
List the 4 main recovery facilities and their actions.
Backup mechanism: makes periodic backup copies of the DB
Logging facilities: keep track of current state of transactions and DB changes
Checkpoint facility: enables updates to DB in progress to be made permanent
Recovery manager: allows DBMS to restore DB to consistent state following failure
Describe a log file (4 total)
- Contains information regarding all updates to a DB
- Transaction record contains transaction identifier, type of log record, log-management information etc…
- Log files may be duplexed or triplexed
- Sometimes split into two separate random-access files
Whats the main issue that can arise through the use of a log file?
Potential bottleneck arises which is critical in determining overall performance
What is a checkpoint, what does it record and how is it used at the point of failure?
- A checkpoint is a point of synchronisation between DB and log file, with all buffers forcibly written to disk
- Checkpoint record is created containing identifiers of all active transactions
- When failure occurs, redo all transactions that committed since the checkpoint and undo all transactions active at time of crash
What recovery technique is used if the DB is damaged?
Restore last backup of DB and reapply updates of committed transactions using log file
What recovery technique is used if the DB is only inconsistent? (3 total)
- Need to undo changes that caused inconsistency
- May also need to redo some transactions to ensure updates reach secondary storage
- Do not need backup, but can restore DB using before and after images in the log file
What is meant by the term deferred update in the context of a recovery technique? (3 total)
- Updates not written to the DB until after a transaction has reached its commit point
- If transaction fails before commit, it will not have modified DB and so no undoing of changes are required
- May be necessary to redo updates of committed transaction as their effect may not have reached DB
What is meant by the term immediate update in the context of a recovery technique? (4 total)
- Updates applied to DB as they occur
- Need to redo updates of committed transactions following a failure
- May need to undo effects of transactions that had not committed at time of failure
- If no transaction commit record in log, transaction was active at failure and undone
- Undo operations are performed in reverse order in which they were written to the log
What is the write-ahead log protocol related to and what does it state?
Relates to immediate update recovery technique and states that it is essential that log records are written before write to DB
Describe how shadow paging works. (5 total)
- Two page tables during life of a transaction: the current page and the shadow page tables
- Both are the same at the start of a transaction
- Shadow page never changed thereafter, used to restore DB in event of failure
- During transaction, current page table records all updates to the DB
- When transaction completes, current page table becomes the shadow page table
What are standard transaction models good for? (5 total)
o Data that has many types, each with small number of instances.
o Designs may be very large.
o Design is not static but evolves through time.
o Updates are far‐reaching.
o Cooperative engineering.
What problems can arise from long duration transactions in standard transaction models? (4 total)
More susceptible to failure
Requiring a minimisation in the amount of work lost.
May access large number of data items
Concurrency limited if data inaccessible for long periods.
Deadlock more likely.
Cooperative use of shared data items restricted by traditional protocols.
List some examples of advanced transaction models. (5 total)
o Nested Transaction Model o Sagas o Multi‐level Transaction Model o Dynamic Restructuring o Workflow Models
Describe the action of a nested transaction model (9 total)
- Transaction viewed as hierarchy of sub-transactions.
- Top‐level transaction can have number of child transactions.
- Each child can also have nested transactions.
- Transactions have to commit from bottom upwards.
- Transaction abort at a level doesn’t have to affect transaction in progress at higher level.
- Parent allows to perform its own recovery
- Updates of committed sub-transactions at intermediate levels are visible only within scope of their immediate parents.
- Commit of sub-transaction is conditionally subject to commit or abort of its superiors.
- Top‐level transactions conform to traditional ACID properties of flat transaction.
In a nested transaction model, how might a parent transaction perform its own recovery? (4 total)
o Retry sub-transaction.
o Ignore failure, in which case sub-transaction non‐vital.
o Run contingency sub-transaction.
o Abort.
What does Moss’s proposal state, and what model is it related to?
In the nested transaction model, the proposal states that only leaf‐level sub-transactions perform database operations.
What advantages are gained through the use of a nested transaction model? (4 total)
Modularity
o Transaction can be decomposed into number of sub-transactions for purposes of concurrency and recovery
Finer level granularity for concurrency control and recovery
Intra-transaction parallelism
Intra-transaction recovery control
How can nested transactions be emulated?
Using savepoints
What is a savepoint in the context of advanced transaction models?
An identifiable point in flat transaction representing some partially consistent state
Describe the action of savepoints in nested transaction models? (2 total)
- During execution of transaction, user can establish savepoint
- User can use this to roll transaction back
How do savepoints and nested transaction models differ?
Unlike nested transactions, savepoints don’t support intra‐transaction parallelism.
What is the definition of Relational Algebra?
RA operations work on one or more relations to define another without changing the original relations
In RA, what is meant by the closure property?
Expressions are nested, just as in arithmetic
In RA, what does the Selection operation do?
Works on a single relation R to define a relation that contains only tuples of R that satisfy the specified condition
σpredicate (R)
In RA, what does the Projection operation do?
Works on a single relation R to define a relation that contains a vertical subset of R.
Extracts values of specified attributes and eliminating duplicates
Πcol1,…,col n (R)
In RA, what does the Union operation do? What compatibility constraint must be maintained?
The union of two relations R and S defines a relation containing all the tuples of R, S, or both R and S with duplicates removed
R and S must be union compatible
R∪S
In RA, what does the Set Difference operation do? What compatibility constraint must be maintained?
Defines a relation consisting of the tuples that are in R but not in S
R and S must be union compatible
R-S
In RA, what does the Intersection operation do? What compatibility constraint must be maintained?
Defines a relation consisting of the set of all tuples that are in both R and S
R and S must be union-compatible
R∩S
In RA, what does the Cartesian Product operation do?
Defines a relation that is the concatenation of every tuple in relation R with every tuple of relation S
R×S
What two RA operations can be combined to emulate a join?
Selection with Cartesian Product
Which form of operation is the most challenging to implement efficiently? What effect does this have on RDBMSs?
Join operations
Causes RDBMS to have intrinsic performance issues
In RA, what does the Theta Join operation do?
Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S
R ⋈F S
In RA, what does the Natural Join operation do?
An Equijoin of the two relations R and S over all common attributes x
One occurrence of each common attribute is eliminated from the result
R⋈S
In RA, what does the Natural Join operation do? What two forms of Natural Join exist and explain there actions?
Used to display rows in the result that do not have matching values in the join column
Left join is a join in which tuples from R that do not have matching values in common columns of S are also included in result relation
R⋊S
Right join is a join in which tuples from S that do not have matching values in common columns of R are also included in result relation
R⋉S
In RA, what does the Semi Join operation do?
Defines a relation that contains the tuples of R that participate in the join of R with S
R ⊳F S
In RA, what does the Division operation do?
Defines a relation over the attributes C that consists of set of tuples from R that match combination of every tuple in S
R÷S
In RA, what do Aggregate Operations do?
Applies aggregate function, AL, to R to define a relation over the aggregate list
SQUIGGLEY-F subscript AL (R) where AL contains one or more pairs
In RA, what do Grouping Operations do? (3 total)
- Groups tuples of R by grouping attributes, GA
- Then applies aggregate function list, AL, to define a new relation
- Resulting relation contains the grouping attributes, GA, along with results of each of the aggregate functions
[subscript GA SQUIGGLEY-F subscript AL] (R) where AL contains one or more pairs
What are the two main techniques for query optimisation?
o Heuristic rules that order operations in a query
o Comparing different strategies based on relative costs, and selecting one that minimises resource usage
What is the dominant cost in query processing for centralised DBMSs?
Disk accesses
Define Query Processing
The activities involved in retrieving data from the DB
What are the two main aims of Query Processing?
Transform query written in high-level language into correct and efficient execution strategy expressed in low-level language
Execute strategy to retrieve required data
What is query optimisation?
The activity of choosing an efficient execution strategy for processing a query
Why is Query Optimisation necessary?
As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimises resource usage
What two factors are reduced through Query Optimisation?
Reduce total execution time of a query
Reduce response time of query
What are the four phases of Query Optimisation?
o Decomposition, consisting of parsing and validation
o Optimisation
o Code generation
o Execution
How/when can the first three phases of Query Optimisation be carried out? (2 total)
o Dynamically every time query is run
o Statically when query is first submitted
List an advantage and disadvantage of Dynamic query Optimisation.
Advantages arise from fact that information is up to date
Disadvantages is performance of query affected, time may limit finding optimum strategy
Name an advantage and disadvantage of Static query Optimisation.
Advantages are removal of runtime overhead, and more time to find optimum strategy
Disadvantages arise form fact that chosen execution strategy may no longer be optimal when query is run
How can the disadvantages of Dynamic and Static Query Optimisation be combatted?
By adopting a hybrid approach
Define the process of Query Decomposition
Transformation of a high-level query into RA query and checks syntactic and semantic correctness
What are the five stages of Query Decomposition?
Analysis
Normalisation
Semantic Analysis
Simplification
Query Restructuring
Describe the Analysis phase of Query Decomposition
- Analyse query lexically and syntactically using compiler techniques
- Verify relations and attributes exist
- Verify operations are appropriate for object type
- Query then transformed into some internal representation more suitable for processing
- Query tree chosen
Describe the workflow for constructing a Query Tree (3 total)
- Leaf node created for each base relation
- Non-leaf node created for each intermediate relation produced by RA operation
- Root of tree represents query result
-> Sequence is directed from leaves to root
What happens during the Normalisation phase of Query Decomposition?
Query converted into a normalised form for easier manipulation
What two forms can the predicate be converted into during the Normalisation phase of Query Decompositon?
o Conjunctive Normal form
o Disjunctive Normal Form
What happens during the Semantic Analysis phase of Query Decomposition?
Following normalisation, normalised queries that are incorrectly formulated or contradictory are rejected
What defines an incorrectly formulated query?
Its components do not contribute to generation of result
What defines a contradictory query?
Its predicate cannot be satisfied by any tuple
How can we test for incorrectly formulated queries?
workflow with 3 stages and a summary
Through the construction of a Relation Connection Graph
- Create node for each relation and node for result
- Create edges between two nodes that represent a join
- Add edges between nodes that represent projection
If not connected, query is incorrectly formulated
What is the purpose of the Simplification stage of Query Decomposition? (3 total)
- Detects redundant qualifications
- Eliminates common sub-expressions
- Transforms query to semantically equivalent but more easily/efficiently computed form
What considerations and assumptions are made during Query Decomposition?
Typically access restrictions, view definitions and integrity constraints are considered
Assumption that user has appropriate access privileges
What are the rules of Heuristic Processing strategies? (5 total)
Perform selection operations as early as possible:
o Keep predicates on same relation together
Combine Cartesian Product with subsequent selection whose predicate represents join condition into a join operation
Use Associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive selection operations are executed first
Perform projection as early as possible:
o Keep projection attributes on same relation together
Compute common expressions once:
o If common expression appears more than once and result not too large, store result and reuse it when require
o Useful when querying views, as same expression is used to construct view each time
What problems are potentially faced by a DBMS when storing statistics? (2 total)
If stats updated every time tuple is changed, performance would be impacted
DBMS may update stats on a periodic basis or whenever system idle
READ CHAPTERS 5 THROUGH 7
UNDERSTAND MATHEMATICS AND RA