Data Intensive Ch7 - Transactions Flashcards
Why we need transactions - what can go wrong
- Database software/hardware can fail in the middle of a write
- The app can crash in the middle of a series of operations (e.g. due to running out of memory)
- A network interruption can cut off the app from the DB
- Several clients can write to the DB at the same time and overwrite each other's writes
- A client may read a partial, unfinished update
- Race conditions between concurrent client writes can cause bugs
Transaction
Mechanism to simplify handling “what can go wrong” scenarios
A way for an application to group several reads & writes into one logical unit - all operations are executed as a single operation which either
- succeeds (commit)
- fails (abort, rollback)
If it failed - the app can safely retry
The app need not worry about partial failure
“A mechanism that means we don't have to worry - e.g. when we must perform a series of operations such as subtracting an amount from account A and adding it to account B - that a catastrophe will happen in the middle”
“Simplify the programming model for apps accessing db”
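A minimal sketch of the account A -> account B transfer in SQL (table and column names are hypothetical):

  BEGIN;
  -- both updates succeed together or not at all
  UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
  UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
  COMMIT;
  -- on any error, issue ROLLBACK instead and neither update survives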
ACID
Set of safety guarantees provided by transactions
Now largely a marketing term, because one implementation might not equal another (the meaning of isolation is especially ambiguous)
Atomicity
Atomic - cannot be broken down into smaller parts
NOT about concurrency (whether another client sees a half-finished transaction is isolation's concern)!
It's about “all or nothing” - the ability to abort a transaction on error and have all its writes discarded
Abortability would be a better term than atomicity
Consistency
App-specific notion of the DB being in a “good state”
Idea - the data has certain statements (INVARIANTS) that must always be true (e.g. credits and debits always balance)
It's the app's responsibility to define its invariants; the DB only stores the data and enforces simple constraints
It's actually a property of the application
Joe Hellerstein's remark - the C in ACID was tossed in to make the acronym work
Isolation
Concurrency issues arise if several clients access the same record at the same time
Concurrently executing transactions cannot “step on each other’s toes”
Serializability - each transaction can pretend it's the only one running on the entire DB; the DB ensures that on commit the final result is the same as if the transactions had actually run one after another, even if in reality they ran concurrently
In practice it's expensive, so weaker isolation levels are used
Durability (p. 226)
Promise the committed transaction will not lose any data even if failure occurs
Usually involves a write-ahead log (so data can be recovered in case the data structures on disk are corrupted)
In a replicated DB it usually means the value was replicated to N nodes
Perfect durability is not possible - what if all disks and backups are destroyed at the same time?
Dirty read (p. 229)
Violation of isolation
One trx reads another’s uncommitted writes
- Trx updates multiple objects - a dirty read means another trx may see only some of them updated (a user sees a new unread email but not the updated counter)
- Trx aborts and its writes are rolled back - a dirty read means another trx may have seen data that was never actually committed
Example: listing unread emails and storing counter of unread emails separately
Why retrying an aborted transaction is not a perfect error handling mechanism
- Trx succeeds but the network fails to ACK - the client thinks it failed, and the retry causes the trx to be performed twice (requires app-level de-duplication, e.g. unique ids for trxs)
- if the error was due to overload, retrying will only add fuel to the fire (use exponential backoff and limit retries)
- some retries are pointless (e.g. a constraint violation)
- if the trx has side effects outside of the db, they can happen even if the trx was aborted (e.g. sending emails)
- if the client process fails during retrying, the data it was trying to write is lost anyway…
Why concurrency bugs are hard to find by testing (so you'd better use appropriate isolation levels)
They are rare - you need to be unlucky with the timing. Hard to repro.
It’s hard to reason about concurrency
Weak isolation levels
Non-serializable isolation levels used in practice when serializable is too costly performance-wise
Read committed
The most basic trx isolation lvl
No dirty reads (see only committed data)
No dirty writes (can only overwrite committed data)
Default setting in Postgres and many other DBs
Implementation of read committed:
Against dirty writes - most commonly row-lvl locks held until the trx is committed/aborted; other trxs wanting to write must wait
Most DBs use MVCC as they support snapshot-isolation anyway
Against dirty reads - one option is the same lock: acquire it and release it right after reading (a read can't happen while the object is dirty)
This doesn't work well - short, read-only trxs can get stuck waiting for a long-running trx to complete;
so most DBs instead return the old committed value until the trx holding the lock commits.
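A sketch of two concurrent sessions under read committed (hypothetical accounts table):

  -- Session 1
  BEGIN;
  UPDATE accounts SET balance = 0 WHERE id = 'A';  -- not committed yet

  -- Session 2, meanwhile
  SELECT balance FROM accounts WHERE id = 'A';     -- returns the old committed value, not 0

  -- Session 1
  COMMIT;

  -- Session 2
  SELECT balance FROM accounts WHERE id = 'A';     -- now returns 0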
No dirty writes
For concurrent updates to the same object we can assume the later write overwrites the earlier one. But what if the earlier write is part of a trx that hasn't committed yet? If the later write overwrites it anyway, that's a dirty write. Prevention - delay the later write until the trx holding the first one has committed or aborted
- trx updating multiple objects - Alice and Bob buy a car, which requires 2 db writes. Bob wins at the listings table, Alice at invoices. (see Figure 7-5, p. 236)
- Read committed does not prevent race condition for counter increment
Actually, fun fact: in PG read committed reads the latest committed value
https://www.postgresql.org/files/developer/concurrency.pdf
What is read skew
An example of a nonrepeatable read.
Can happen at the read committed isolation lvl.
Cannot be tolerated for:
- backups
We make a copy of the entire DB. During the process writes are still accepted. It's not acceptable to have some parts of the db in the old state and some in the new. After a restore, such inconsistencies would become permanent.
- Analytical queries, integrity checks
Must see consistent database snapshot or will report invalid results
For the above: snapshot isolation is recommended
Example:
Alice has 2 accounts, 500 each, 1000 total.
Some trx transfers 100 from one to another.
Alice is unlucky enough to read the balance of one account before the transfer trx starts and the other's after it's done. So she sees 900 dollars total (or 1100, depending on which account is read first - the “from” or the “to” one).
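In PostgreSQL, snapshot isolation is what the REPEATABLE READ level provides; a sketch of how it fixes Alice's reads (hypothetical schema):

  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT balance FROM accounts WHERE id = 'checking';  -- 500
  -- the transfer trx commits here
  SELECT balance FROM accounts WHERE id = 'savings';   -- still 500, same snapshot
  COMMIT;  -- both reads came from one consistent snapshot, total is 1000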
How to implement snapshot isolation lvl?
Dirty writes:
prevented with write locks (as in read committed)
Dirty reads:
DB keeps several different committed versions of an object/row
Transactions see consistent db state at point of time snapshot was taken
Multi version concurrency control MVCC
MVCC
Multi version concurrency control
How it's implemented in PG:
1. Transaction is given unique, always-increasing ID
2. When trx writes to DB data is tagged with ID of writer
3. created_by internal field for each row in table
4. deleted_by internal field for each row in table for deletes - initially empty; when a delete happens, the ID of the requesting trx is put there; garbage collection removes rows marked for deletion once they are inaccessible to any trx
5. Update is translated into delete & create
(see Figure 7-7, p. 240)
Trivia: Postgres transaction IDs
Unique, always-increasing 32-bit integers; they overflow after ~4 billion transactions; PG's vacuum process cleans up old IDs to prevent the overflow from affecting the data
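These internals are visible in PG - every table has hidden system columns corresponding to the created_by/deleted_by fields described above (accounts is a hypothetical table):

  SELECT xmin, xmax, * FROM accounts;  -- xmin = ID of the creating trx,
                                       -- xmax = ID of the deleting/locking trx (0 if none)
  SELECT txid_current();               -- the current trx's own ID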
Snapshot isolation visibility rules for a consistent snapshot
When a trx starts - the db makes a list of all concurrent trxs which are in progress.
Their writes are ignored, even if those trxs later commit.
Writes made by aborted trx are ignored.
Writes made by trx with greater trx id are ignored.
Other writes visible.
If a row was deleted by a not-yet-committed trx or by one that started later - the row is still visible to the running trx
If a row was created by a not-yet-committed trx or by one that started later - the row is not visible
Dealing with index updates with MVCC
- Index points to all versions of an object
Index query filters out versions not visible to the current trx
Garbage collection removes no-longer-visible row versions and their index entries
Optimizations exist - if different versions of an object fit on the same page, write them there -> no need to update the idx
- Append-only/copy-on-write B-Trees
Creates a copy of each modified page
Parent pages up to the root are copied and updated
Pages not affected are left untouched
Every write trx creates a new B-tree root
Background process for compaction/GC is required
Lost updates
Concurrent write-write conflict
Read value -> modify it -> write back
Incrementing a counter
- Easy fix - atomic write operations
“Keywords” dataset example
-> Adding an element to a list in a JSON document (which a jsonb column essentially is)
Lost updates prevention
Atomic write operators
UPDATE … SET x = x + 1 WHERE …; usually works
Explicit locking
Select for update
Auto detection of lost updates
Allow trxs to execute in parallel; abort any trx whose commit would cause a lost update. The client must retry the read-modify-write cycle
PG's snapshot isolation (repeatable read) does this
Compare-and-set
The update only happens if the value it was based on (the compare value) is still the current value
Conflict resolution, replication
Siblings (multiple conflicting versions returned to client)
Last write wins
Auto-merging (increment counters, Riak)
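Sketches of the single-node prevention techniques above, against a hypothetical counters table:

  -- 1. Atomic write operator: the DB locks the row, no read-modify-write in the app
  UPDATE counters SET value = value + 1 WHERE key = 'page_views';

  -- 2. Explicit locking: FOR UPDATE holds a lock on the row until commit
  BEGIN;
  SELECT value FROM counters WHERE key = 'page_views' FOR UPDATE;
  -- app computes the new value...
  UPDATE counters SET value = 42 WHERE key = 'page_views';
  COMMIT;

  -- 3. Compare-and-set: the update applies only if the value hasn't changed;
  -- 0 rows affected means someone got there first - retry the cycle
  UPDATE counters SET value = 42 WHERE key = 'page_views' AND value = 41;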
Write skew
Example:
2 doctors on call.
Both take leave at the same time.
A consistency check “min 1 doc on call” won't work, as each trx's check is based on its own snapshot
Other examples:
- booking a room (select bookings for the room in a given period of time and insert a booking if none exists)
- Changing a username (a unique username claimed by 2 users concurrently)
Not a lost update or dirty write - it modifies 2 separate rows
Trxs read some objects and modify some of them (different trxs, different objects)
Usually there are no multi-object constraints to catch this (they can be simulated with a trigger or a materialized view)
Serializable isolation level is one remedy
An explicit lock on the rows involved is the second best
Generalization pattern (see the sketch after this list):
- a select is made to check some condition on multiple matching rows
- app code makes a business-logic decision based on the result
- an insert/update/delete follows if the app goes ahead
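A sketch of the doctors example under snapshot isolation (hypothetical schema); Alice and Bob each run this concurrently, with their own name in the UPDATE:

  BEGIN;
  SELECT count(*) FROM doctors WHERE on_call = true AND shift_id = 1234;  -- both see 2
  -- app checks count >= 2 and goes ahead:
  UPDATE doctors SET on_call = false WHERE name = 'alice' AND shift_id = 1234;
  COMMIT;
  -- the two trxs touch different rows, so nothing conflicts: both commit
  -- and zero doctors are left on call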
Phantom row skew
A write skew problem where the select returns no results and in the meantime a new row that would have been returned is added.
What's a phantom:
one trx's write changes the result of a search query of another.
Snapshot isolation prevents phantoms in read-only queries; with read-write queries we have write-skew potential
Issues with isolation levels
Hard to understand and INCONSISTENTLY IMPLEMENTED in different DBs (“repeatable read” is ambiguous)
In large apps it's difficult to tell if running a given trx at a given isolation level is safe (other business processes happen concurrently and might screw it up)
Static analysis may help but it’s meh
Testing for concurrency - hard and nondeterministic
Serializable isolation is the only foolproof option
Serializable isolation level
Strongest lvl
Trxs running in parallel are guaranteed to have the same end result as if they were running serially, one after another
Performance-wise a costly implementation - otherwise there would be no need for weak isolation levels ;)
Implementation techniques:
- literal serial execution
- two phase locking
- optimistic concurrency control techniques (serializable snapshot isolation)
Serializable - literal serial execution implementation
Single-threaded execution loop
Wasn't feasible until recently - RAM became cheap enough to keep the entire active dataset in memory
OLTP trxs are usually short; OLAP queries are long but can run on a consistent snapshot outside the serial loop (snapshot isolation!)
No interactive multi-statement transactions -> too much time spent on network communication and waiting -> throughput would be terrible
App sends entire trx code as stored procedure
Stored procedures are meh though:
- each DB has its own syntax
- code running in a db is hard to debug, and harder to monitor and test
- a db is more perf-sensitive than a web server -> a badly written stored procedure could impact all app instances
+ some modern DBs use general-purpose langs like Java instead
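A sketch of the transfer from earlier as a stored procedure (PostgreSQL PL/pgSQL, hypothetical schema) - the whole trx executes inside the DB with no mid-trx network round trips:

  CREATE PROCEDURE transfer(from_id text, to_id text, amount numeric)
  LANGUAGE plpgsql AS $$
  BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = from_id;
    UPDATE accounts SET balance = balance + amount WHERE id = to_id;
  END;
  $$;

  CALL transfer('A', 'B', 100);  -- one round trip for the whole transaction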
Other idea - partitioning
One single-threaded worker per partition, assuming the stored procedure doesn't need cross-partition data -> cross-partition trxs are possible but add extra overhead
Summary:
- trxs must be small and fast or one slow trx will stall all trx processing
- the active dataset must fit in memory
- write throughput must be low enough for a single CPU core, or you need partitioning
- cross-partition trxs are possible but severely limited
Serializable - 2 phase locking implementation
Writers block writers and readers
Readers also block writers
(Snapshot isolation assumes readers never block writers, writers never block readers)
Lock for each object in db
Shared or exclusive mode
When a trx reads - it acquires the lock in shared mode
- when another trx holds the lock in exclusive mode - the shared lock must wait
When a trx writes - it acquires the lock in exclusive mode
- waits until any existing locks (of either mode) are released
When a trx reads and then writes, it can upgrade its lock from shared to exclusive (same rules as acquiring an exclusive lock)
After acquiring, a lock is held until the end of the transaction
- hence “2-phase” - first acquire (while executing), then release (at commit/abort)
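PostgreSQL doesn't implement serializability with 2PL, but its explicit row locks illustrate the two modes (and, like 2PL, they are held until the trx ends; accounts is a hypothetical table):

  BEGIN;
  SELECT * FROM accounts WHERE id = 'A' FOR SHARE;  -- shared lock: other readers proceed,
                                                    -- writers (UPDATE / FOR UPDATE) must wait
  UPDATE accounts SET balance = 0 WHERE id = 'A';   -- upgrades to an exclusive lock
  COMMIT;                                           -- phase 2: all locks released here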
2 phase locking - why it’s not a silver bullet?
Many cons like:
- deadlocks can happen (the DB detects them and aborts one of the trxs)
- performance issues - overhead of acquiring and releasing all those locks
- unstable latencies - no one knows how long a trx will have to wait for another
- performance issues - limited concurrency
- hence limited throughput and high latencies
What’s a predicate lock
Similar to a shared/exclusive lock
It is acquired not per object but for a set of objects matching some criteria (a predicate)
Think of bookings within a given time period
Applies even to objects that do not yet exist in the db - so it prevents phantoms
Cons:
low performance - if there are many active locks, checking each trx against all matching locks is time-consuming
Index-range locks
Simplified predicate locks that match a greater (superset of) objects
Instead of locking a bookable room in a given time period - lock the whole room for any time and date, or lock all rooms in the given time period
Implemented via indices, e.g. on room id or on the start/end time of a booking
Lower overhead than predicate locks, but they lock a bigger range of objects than needed
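For the booking example, the predicate is just the search condition (hypothetical schema) - a predicate lock attaches to this WHERE clause and covers rows that don't exist yet, while an index-range lock approximates it by locking e.g. everything with room_id = 123:

  SELECT * FROM bookings
  WHERE room_id = 123
    AND end_time   > '2024-01-01 12:00'
    AND start_time < '2024-01-01 13:00';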
Serializable Snapshot Isolation
Optimistic concurrency control technique
Instead of blocking anything potentially dangerous, trxs continue, hoping that all will turn out all right.
When a trx commits, the db checks for any potential isolation violations - if found, ABORT
Performs poorly when there's high contention (lots of aborted trxs) - lots of wasted throughput, and retries make an already overloaded system worse
Low contention - performs better than pessimistic methods
Based on snapshot isolation, with an added algorithm for detecting serialization conflicts among writes and determining which trx needs to be aborted
Decisions based on an outdated premise - as with doctors on call
More predictable latency
No waiting for locks
Not limited to throughput of single CPU
Consider rate of aborts
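In PostgreSQL, SSI is what the SERIALIZABLE level runs; a sketch using the doctors example - one of the two concurrent trxs fails at commit with a serialization failure (SQLSTATE 40001) and the client must retry:

  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM doctors WHERE on_call = true AND shift_id = 1234;
  UPDATE doctors SET on_call = false WHERE name = 'alice' AND shift_id = 1234;
  COMMIT;  -- may fail: could not serialize access due to read/write dependencies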
Pessimistic concurrency mechanism
Based on the principle that if anything can go wrong it’s better to wait until it’s safe again
The indicator is a lock held on some object the trx is trying to access - wait until it's released
Serial execution - pessimistic to the extreme, total mutual exclusion
Decisions based on an outdated premise - detecting query result change
Detecting stale MVCC reads
tracking when a trx ignores another trx's writes due to MVCC visibility rules
If any of the ignored writes have been committed by the time the trx commits - abort it
Don't abort at read time, as the db does not know yet:
- if the trx is going to do any writes
- if the other trx that made the write will commit or abort
Detecting writes that affect prior reads
A trx modifies data after another trx has read it
If the reading trx commits first, the writing trx must be aborted