Data Engineering & Databases Flashcards
What does a Database do?
Primarily 2 things: stores and retrieves data
What are the 2 main philosophies for primary data storage
- Log-structured
2. Update in place
Log
An append only sequence of records
Index
Additional metadata to help locate specific data
SSTable
Sorted String Table
- A file in which data is sorted by key
LSM-tree
Log-structured Merge Tree
Fundamental idea
- keep a cascade of SSTables that are merged in the background
- turns random writes into sequential writes
What is the DB write path when using SSTable / LSM-tree
- Write to a memtable - in-memory balanced tree
- Also write to unsorted log for crash recovery
- When the memtable gets big enough (several MB) disk as SSTable file
- While SSTable is written, writes continue on new memtable
- Background merge & compaction to combine SSTable files
What is the DB read path for a DB implemented with SSTables / LSM-trees
- Look for key in memtable
- Look in most recent disk segment
- Look in next older disk segment
- (Sped up by using Bloom filters to determine if it doesn’t exists, which are the most expensive reads)
LSM-tree Compaction & Merge Strategies
Size-tiered - newer, smaller SSTables merged into older, larger SSTables
(HBase, supported by Cassandra, Scylla)
Levels - key range split into smaller SSTables, older data is moved into separate levels
(Used by LevelDB, RocksDB, supported by Cassandra, Scylla)
B-Tree
Standard index implementation in almost all relational DBs (and many non-relational)
- Break DB into fixed-size pages (trad 4KB), and read/write one page at a time
- Each page contains keys and references to child pages (using on-disk “pointers”)
- 1st page is the root (search always begins here)
- Each child has a continuous range of keys, keys between the references indicate boundaries of those ranges
- Leaf pages contain individual keys and values inline (or in references pages)
Branching factor
The number of references to child pages in one B-tree page
- Typical factor is several hundred
- Very large DBs can be stored with few levels (256 TB stored in 4 levels of 4KB w branching factor of 500)
WAL
Write-ahead Log
- Append-only file where every B -tree modification is written before its written to the tree (in case of failure during hardware write, which could cause an orphan page with no parent)
Writes/Reads for LSM-tree vs B-tree
LSM-tree typically faster for writes
B-tree typically fasters for reads