Transaction Management/transactSQL Flashcards
What are transactions and what causes complications to arise?
A series of queries. The complication arises from that many might happen at the same time (concurrency) and that computers can fail (partial execution)
What can concurrency lead to?
- Inconsistent data in the database
- (for example two people being able to book the same seat in a cinema or the following person overriding the first person’s booking)
What are transactions?
- They state that a group of SQL statements must be executed together so that no conflicts arise
- Transactions mean that specific events have to happen before another event
- meaning that we can’t for example have two people book the same seat as we’ll have a transaction that means once the seat has been picked, another person can’t pick it.
how do we ensure serialisable behaviour?
- By telling a DBMS that a sequence of SQL statements forms a transaction.
- It then knows that the transaction is to be executed as if in isolation from all other transactions
What is partial execution?
- Partial execution is when only part of a transaction gets carried out
- This could happen due to a failure or a power outage
How do DBMSs deal with partial execution?
- Transactions make sure that the whole set of instructions in a transaction is carried out otherwise the actions done are dropped/not permanently implemented. Either none or all of the commands are done.
How do we state in SQL that a transaction should be executed as a whole or not at all?
Using the keywords ‘START TRANSACTION’ and ‘COMMIT’
What do we do if we have to abort a transaction due to partial execution due to a failure?
Use the keyword ROLLBACK - if something fails and you don’t want any of the transaction to happen.
Why can we ignore operations such a calculations when looking at transactions?
- Because the OS does those operations
What does read(X) do?
- reads a database item X into a program variable (also named x for simplicity)
- finds address of disk block that contains item x
- copies that disk block into a buffer in main memory
- copy item x from buffer to program variable
What does write(x)
- finds address of disk block that contains item x
- copies that disk block into a buffer in main memory
- the program copies the new value into the correct location of item x in the buffer
- either immediately or some time after it writes the item in the bugger to the correct disk block
What is a schedule?
- a bunch of transactions in a specific order
What are the two main types of schedules?
- Serial schedules: executes transactions one after another
- Concurrent schedules: interleaves operations from different transactions, while still preserving that the operation in each transaction happen in the right order
What do each of the symbols represent?
Sn = id of the schedule
ri(x) = read(x) in transaction i
wi(x) = write(x) in transaction i
ci = commit in transaction i
ai = abort (“rollback”) in transaction i
Does order matter in serial schedules?
Yes because the data stored in shared variables between transactions can be different depending on transaction order
what is true about serial and concurrent schedules?
- All serial schedules are concurrent schedules
- But not all concurrent schedules are serial schedules
When is something not a concurrent schedule?
- When the order of the SQL statements inside the same transaction are executed out of order
Why can it be a problem that two transactions are not isolated from each other?
How can we solve this?
- issue of concurrent access
- if two transaction are accessing the same item then the outcome can be inconsistent with the real world.
- can solve this by undoing the second transaction
What does it mean for a transaction should be atomic?
the transaction is indivisible
Why can it be a problem if there’s a failure in the middle of a transaction?
How can we solve this?
- issue of partial execution
- leads to inconsistencies in the real world
- can solve by just undoing the transaction when the computer comes back online
What is durability?
That any later changes did not disappear after having finished based on something someone else did in the database
- so one transaction did not incorrectly overwrite an item that another one wrote to previously
What do we use to make ensure that we’re never in the situation of having errors made due to a failure (partial execution) and concurrent access?
ACID
What are the properties of ACID?
- A: Atomicity C: Consistency I: Isolation D: Durability
What is atomicity?
A transaction is an atomic unit of processing
- an indivisible unit of execution
- executed in it’s entirety or not at all
Deals with failure by aborting a transaction
- we undo the word done up to the error point
- system recreates state of database before start of aborted transaction
Commit - no errors, entire transaction is executed, system is updated appropriately
What is consistency?
- correct execution of a transaction takes the database from one consistent state to another
- when transactions don’t violate any constraints
- database accurately reflects state of real world
- if we have to abort then the database is not in a consistent state and we have to recreate the consistent state (which is the state of the database before the aborted transaction started)
Why are serial schedules consistent?
- A serializable schedule always leaves the database in a consistent state. A serial schedule is always a serializable schedule because, in a serial Schedule, a transaction only starts when the other transaction has finished execution
What are serialisable schedules?
schedules that are equivalent to serial schedules
(though there are multiple definitions of serialisability)
What is isolation?
- a particular transaction should see itself as the only transaction in the database
- the levels of isolation refer to how isolated a transaction should be from other transactions operations (such as modifying items used in both)
What are the levels of isolation
- Read uncommitted
- read committed
- Repeatable read
- serialisable
What is the read uncommitted isolation level?
- No isolation at all, you can read data which has not been committed
What is the read committed isolation level?
- every item you read must have been committed before you can see it
What is the repeatable read isolation level?
- every item you read must have been committed before you can see it
- if you read the same thing twice in a transaction, you must get the same return value
What is the serialisable isolation level?
- Has all the levels above
What is a serialisable schedule
- A non-serial schedule is called a serializable schedule if it can be converted to its equivalent serial schedule
- You have serial schedule TA then TB, and item X = 15 after both transactions have run
- if you have some of TA then TB then TA then TB and item X still = 15, this schedule is not serial but it serialisable
What is durability?
- Once a transaction commits and changes the database, then these changes cannot be lost because of failure
- the effect of a transaction on the database should not be lost after the commit point
- we REDO the transaction if there are any problems after the update
- durability means we can deal with media failure
What are some relational DBMS Components
- User/application
- transaction manager
- logging and recovery
- concurrency control
- Buffers
- Storage
What is transaction management formed of?
- Concurrency control
- Logging and recovery
Which components of transaction management satisfy ACID?
A - via recovery control (logging and recovery)
C - via scheduler - concurrency control
I - via scheduler - concurrency control
D - via recovery control (logging and recovery)
Advantages of serial schedules?
- maintains consistency
- maintains isolation
Advantages of concurrent schedules (non serial)?
- more efficient in multi-user environments (transactions don’t have to wait for others to finish before starting)
What do concurrent schedules not guarantee?
Consistency or isolation
How come concurrent schedules don’t guarantee consistency or isolation?
- Because they can accidentally overwrite values in the buffer
How much concurrency can we allow while satisfying Isolation and Consistency?
A schedule S is serializable if there is a serial schedule S’ that has the same effect as S on every initial database state.
What do seralisable schedules guarantee?
- consistency and correctness
Why is serialisability hard to test?
- because it depends on reads, writes, commits and non-database operations
- non-database operations can be complex
How much concurrency can we allow while satisfying Isolation and Consistency … while being able to check it fast?
by having conflict seralisability
What is a conflict in a schedule?
- a pair of operations from different transactions that cannot be swapped without changing the behavior of at least one transaction
What can be defined as a conflict in a schedule?
A pair of operations from different transactions that access the same item and at least one of them is a write operation
How can we tell if a schedule is conflict serialisable?
- If it is conflict equivalent to a serial schedule
How can we tell if a schedule is conflict-equivalent?
- two schedules S and S’ are conflict-equivalent if S can be obtained from S’ by swapping any number of (1) consecutive (2) non conflicting operations from (3) different transactions
Describe which schedules fit into other schedules
- All serial, conflict-serialised and serialised schedules are concurrent schedules
- All serial and conflict-serialised schedules are serialisable
- all serial schedules are conflict serialisable
How do we quickly show that something is not conflict-serialisable
- we create a precedence graph with each transaction as a node and each conflict as a link between nodes
- if there is a cycle within this graph then the schedule is not conflict-serialisable
How do we quickly show that something is not conflict-serialisable
- we create a precedence graph with each transaction as a node and each conflict is a link between nodes (only if op1 appears before op2) and we set the nodes out in order T1,T2,T3
- if there is a cycle within this graph then the schedule is not conflict-serialisable
why are cycles in the precedence graph bad?
- They means that a contradiction has arisen (which could cause a deadlock)
What are the steps of finding a serial schedule from a precedence graph?
- Find a transaction with only outgoing edges
- you put that first in the schedule and remove the transaction from the graph
- you repeat this process until there’s no nodes left.
How does transaction scheduling work in a DBMS?
- The scheduler gets fed operations and it can either execute them or delay them happening.
How can we enforce conflict serialisability?
Using locks with a simple locking mechanism
- a transaction has to lock an item before it accesses it
- locks are requested from and granted by the scheduler
- each item is locked by at most one transaction
- each lock must eventually be released
What are the symbols for locking and unlocking an item(x)
l1(X) for locking
u1(x) for unlocking
What are the rules for locking in a schedule
Every lock must be followed by an unlock
An item(x) must be unlocked by a transaction before being locked again by a different transaction
Why may not every schedule with simple locking may be serialisable?
There’s only one lock so other transactions have to wait to run whilst the first transaction has locked an item it wants to use