Designing Data-Intensive Applications Book Flashcards
Are B-trees faster at reading or writing?
Reading
Are LSM-trees faster at reading or writing?
Writing
Reads are typically slower on LSM-trees because
they have to check several different data structures and SSTables at different stages of compaction.
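A minimal sketch of why that read path costs more, assuming plain dicts stand in for the memtable and the on-disk SSTables (the name `lsm_get` and the sample data are hypothetical, not taken from any real engine):

```python
def lsm_get(key, memtable, sstables):
    """Return (value, structures_checked), searching newest data first.

    `memtable` and `sstables` are plain dicts standing in for the real
    in-memory table and on-disk segments (an assumption of this sketch).
    """
    checked = 1
    if key in memtable:              # newest writes live in memory
        return memtable[key], checked
    for segment in sstables:         # then segments, newest to oldest
        checked += 1
        if key in segment:
            return segment[key], checked
    return None, checked             # a missing key touches everything


memtable = {"b": "3"}
sstables = [{"a": "2"}, {"a": "1", "c": "9"}]  # ordered newest first

print(lsm_get("b", memtable, sstables))  # found in the memtable
print(lsm_get("c", memtable, sstables))  # found only in the oldest segment
print(lsm_get("z", memtable, sstables))  # absent: every structure is checked
```

The second element of each result counts how many structures were consulted; it grows with the number of unmerged segments.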
throughput
operational capacity, i.e. the amount of work done per unit of time
empirically.
based on experiment and observation rather than theory
B-Trees: Write Path
B-Trees write every piece of data at least twice: once to the write-ahead log (WAL) and once to the tree page itself.
LSM-Trees: Write Path
LSM-Trees rewrite data multiple times due to compaction and merging of SSTables.
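As a rough illustration of that rewriting, the sketch below merges two segments into one. It is a simplification: segments are in-memory lists of `(key, value)` pairs, and the hypothetical `compact` helper builds a dict instead of streaming a merge sort the way a real engine would.

```python
def compact(segments):
    """Merge segments (ordered oldest to newest) into one sorted segment.

    Each segment is a list of (key, value) pairs. When a key appears in
    several segments, the value from the newest segment wins.
    """
    merged = {}
    for segment in segments:      # oldest first
        merged.update(segment)    # newer values overwrite older ones
    return sorted(merged.items())


older = [("a", 1), ("b", 2)]
newer = [("b", 20), ("c", 3)]
compacted = compact([older, newer])
print(compacted)
```

Note that `("a", 1)` was already on disk in the older segment and is written out again in the compacted one, which is the repeated rewriting the card describes.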
Write Amplification in LSM-Trees
Write amplification in LSM-Trees means one write to the database results in multiple writes to disk over its lifetime.
Performance Bottleneck in Write-Heavy Applications
The rate at which the database can write to disk can be the bottleneck.
Write Amplification Impact
The more a storage engine writes to disk for each logical write, the fewer writes per second it can handle within the available disk bandwidth.
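A back-of-the-envelope version of that relationship, with made-up numbers (the bandwidth, record size, and amplification factors below are assumptions for illustration, not measurements):

```python
def max_logical_writes_per_sec(disk_bandwidth_bytes, record_bytes, write_amplification):
    """Upper bound on logical writes per second for a given disk bandwidth.

    Each logical write of `record_bytes` costs
    `record_bytes * write_amplification` bytes of physical disk writes.
    """
    return disk_bandwidth_bytes / (record_bytes * write_amplification)


# Hypothetical disk: 100 MB/s of write bandwidth, 1 KB records.
print(max_logical_writes_per_sec(100_000_000, 1_000, 2))   # amplification of 2
print(max_logical_writes_per_sec(100_000_000, 1_000, 10))  # amplification of 10
```

Under these assumptions, raising write amplification from 2 to 10 cuts the ceiling from 50,000 to 10,000 logical writes per second.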
LSM-Trees Write Throughput Compared with B-Trees
LSM-Trees generally sustain higher write throughput than B-Trees.
Sequential Writes vs. Random Writes
Sequential writes in LSM-Trees are faster than the random-access writes required by B-Trees, especially on magnetic hard drives.
Reason for Higher Throughput
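LSM-Trees sometimes have lower write amplification (depending on engine configuration and workload), and they write compact SSTable files sequentially instead of overwriting several pages in the tree.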
Batching Writes
LSM-Trees accumulate many writes in memory and then flush them to disk in one go, reducing constant disk access.
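The batching idea can be sketched as a toy class. `TinyLSM`, `flush_threshold`, and the `disk_writes` counter are hypothetical names for this example, and the "disk" is just a Python list:

```python
class TinyLSM:
    """Toy write path: buffer writes in memory, flush them as one sorted batch."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memtable = {}       # in-memory buffer of recent writes
        self.segments = []       # each flush appends one sorted segment
        self.disk_writes = 0     # number of flushes, not number of keys

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # One sequential write of a sorted batch instead of one write per key.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.disk_writes += 1


db = TinyLSM(flush_threshold=3)
for i in range(6):
    db.put(f"key{i}", i)
print(db.disk_writes, len(db.segments))
```

Six `put` calls result in only two flushes here; a real engine would also log each write for crash recovery, which this sketch leaves out.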
Minimizing Disk Access
LSM-Trees organize data in larger batches, reducing how often they need to access the disk.
LSM-Trees Disk Space
LSM-Trees often produce smaller files on disk due to better compression.
B-Trees Disk Space
B-Trees leave some disk space unused due to fragmentation.
Read Speed and Compaction at High Write Throughput
At high write throughput, compaction may not keep up with incoming writes, leading to more unmerged segments and slower reads.
B-Trees Transactional Semantics
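B-Trees have an advantage because each key exists in exactly one place in the index, so transaction isolation can use locks on ranges of keys attached directly to the tree.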
Log-Structured Storage Engines
Log-structured storage engines may have multiple copies of the same key in different segments, complicating transaction isolation and lock management.
B-Trees Popularity
B-Trees are deeply integrated into many databases.
LSM-Trees Popularity
LSM-Trees are gaining popularity in new data stores due to their write performance benefits.
Databases using B-Trees
MySQL (InnoDB), PostgreSQL, SQLite
Databases using LSM-Trees
Apache Cassandra, RocksDB, LevelDB
What is a Secondary Index?
An index on columns other than the primary key, enabling efficient joins and searches on non-primary-key fields.
What does “Storing Values within the Index” mean?
Indexes can store actual row values or references to the rows stored elsewhere (heap files).
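The two options on this card can be contrasted in a small sketch. The table contents and the names `heap_file`, `reference_index`, and `value_index` are invented for illustration:

```python
# Rows stored in no particular order, as in a heap file.
heap_file = [
    {"id": 7, "name": "Ada"},
    {"id": 3, "name": "Grace"},
]

# Option 1: the index maps each key to a reference (here, a list position).
reference_index = {row["id"]: pos for pos, row in enumerate(heap_file)}

# Option 2: the index stores the row value itself.
value_index = {row["id"]: row for row in heap_file}


def lookup_via_reference(key):
    position = reference_index[key]   # step 1: find the reference
    return heap_file[position]        # step 2: follow it to the heap file


def lookup_via_value(key):
    return value_index[key]           # single step: the row is in the index


print(lookup_via_reference(3))
print(lookup_via_value(3))
```

Both lookups return the same row; the difference is the extra hop from the index to the heap file in the first case.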
What is a Clustered Index?
Stores the actual row data within the index, minimizing lookup steps for read-heavy workloads.
What is a Covering Index?
Includes some of the table’s columns within the index itself, satisfying some queries without accessing the table.
What are Multi-Column Indexes?
They combine several columns into one index key, enabling efficient queries on multiple columns simultaneously.
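One way to picture a combined key is a sorted list of tuples searched with the standard `bisect` module. The `(last_name, first_name)` layout, the row ids, and the helper names below are assumptions made for this sketch:

```python
import bisect

# Index entries: ((last_name, first_name), row_id), kept sorted by key.
entries = sorted([
    (("Hopper", "Grace"), 2),
    (("Lovelace", "Ada"), 1),
    (("Hopper", "Alan"), 3),
])
keys = [key for key, _ in entries]


def find_exact(last_name, first_name):
    """Look up one row id using both columns of the key."""
    i = bisect.bisect_left(keys, (last_name, first_name))
    if i < len(keys) and keys[i] == (last_name, first_name):
        return entries[i][1]
    return None


def find_by_last_name(last_name):
    """Use only the first column: a prefix scan over the sorted keys."""
    i = bisect.bisect_left(keys, (last_name,))
    row_ids = []
    while i < len(keys) and keys[i][0] == last_name:
        row_ids.append(entries[i][1])
        i += 1
    return row_ids


print(find_exact("Hopper", "Grace"))
print(find_by_last_name("Hopper"))
```

Because the entries are sorted by last name first, this layout helps with lookups by last name or by both columns, but it gives no shortcut for finding rows by first name alone.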
What are Multi-Dimensional Indexes used for?
They support querying several columns at once, which is especially useful for geospatial data (for example, a range of latitudes and longitudes).
What are Full-Text Search and Fuzzy Indexes?
They support searching for similar rather than exact keys, for example to handle typos and synonyms.
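To make "similar keys" concrete, the sketch below matches words within a given edit distance (Levenshtein distance). It scans the whole vocabulary linearly, so it illustrates the matching rule only, not how a search engine indexes terms; `fuzzy_search` and the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + cost,         # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def fuzzy_search(term, vocabulary, max_distance=1):
    """Return vocabulary words within `max_distance` edits of `term`."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_distance]


vocabulary = ["database", "dataset", "index"]
print(fuzzy_search("databse", vocabulary))   # a typo for "database"
```

With `max_distance=1`, the misspelled "databse" matches "database" (one missing letter) but not "dataset", which is two edits away.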