DDI Ch 3 - Storage & Retrieval Flashcards
There is a big difference between storage engines that are optimized for transactional workloads and those that are optimized for analytics.
List the two types of storage engines we are studying first
Log-structured storage engines and page-oriented storage engines (such as B-trees).
For writes, it’s hard to beat the performance of what?
Appending to a file, because that’s the simplest possible write operation.
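A minimal sketch of why appends are so cheap, loosely modelled on the book's two-function bash "database" (`db_set`/`db_get`). The Python port, the comma-separated record format, and the file name are assumptions for illustration. Writes are a single append, but reads have to scan the whole file, and the last record for a key wins:

```python
import os
import tempfile
from typing import Optional


def db_set(path: str, key: str, value: str) -> None:
    # Every write is a single append to the end of the log file.
    # Assumes keys contain no commas and neither keys nor values contain newlines.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{key},{value}\n")


def db_get(path: str, key: str) -> Optional[str]:
    # Reads scan the entire log (O(n) in the number of records);
    # later appends supersede earlier ones, so the last match wins.
    result = None
    if not os.path.exists(path):
        return result
    with open(path, encoding="utf-8") as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                result = v
    return result


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "database.log")
    db_set(log_path, "42", "San Francisco")
    db_set(log_path, "42", "San Jose")
    print(db_get(log_path, "42"))  # San Jose
```

Old values are never overwritten in place; they just become unreachable once a newer record for the same key is appended.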
Our first trade-off in storage systems
Well-chosen indexes speed up read queries, but every index slows down writes.
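To see the trade-off concretely, here is a hypothetical in-memory `Table` class (not taken from any real database) that keeps one hash index per indexed column. Each `insert` must update every index, so write cost grows with the number of indexes, while `find` on an indexed column avoids a full scan:

```python
from collections import defaultdict
from typing import Any, Dict, List


class Table:
    """Hypothetical table: rows in a list plus one hash index per indexed column."""

    def __init__(self, indexed_columns: List[str]) -> None:
        self.rows: List[Dict[str, Any]] = []
        # column name -> {column value -> list of row ids}
        self.indexes = {col: defaultdict(list) for col in indexed_columns}

    def insert(self, row: Dict[str, Any]) -> int:
        row_id = len(self.rows)
        self.rows.append(row)  # the write itself
        # Extra work on every write: one update per index.
        for col, index in self.indexes.items():
            index[row[col]].append(row_id)
        return row_id

    def find(self, col: str, value: Any) -> List[Dict[str, Any]]:
        if col in self.indexes:
            # Indexed read: jump straight to the matching row ids.
            return [self.rows[i] for i in self.indexes[col].get(value, [])]
        # Unindexed read: fall back to scanning every row.
        return [r for r in self.rows if r[col] == value]
```

Indexing a second column would speed up lookups on it but add another dictionary update to every `insert`, which is why indexes are chosen to fit the query workload rather than added everywhere.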
What type of index is a useful building block for more complex indexes?
Indexes for key-value data.
Bitcask
The default storage engine in Riak.
Bitcask offers what?
High-performance reads and writes of key-value data, subject to the requirement that all the keys fit in the available RAM (since the hash map is kept entirely in memory).
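A rough sketch of the idea behind a Bitcask-style hash index. This is not Riak's code; the class name `HashIndexedLog` and the `key,value` record format are made up for illustration. Data is only ever appended to a file, and an in-memory dict maps each key to the byte offset of its latest record, so a read costs one seek plus one read. That dict is why every key has to fit in RAM:

```python
import os
from typing import Dict, Optional


class HashIndexedLog:
    """Hypothetical Bitcask-style store: append-only data file + in-memory hash index."""

    def __init__(self, path: str) -> None:
        self.path = path
        # key -> byte offset of the key's latest record; lives entirely in RAM.
        # Sketch only: the index is NOT rebuilt from an existing file on startup.
        self.index: Dict[str, int] = {}
        open(path, "ab").close()  # create the data file if it does not exist yet

    def set(self, key: str, value: str) -> None:
        # Assumes keys contain no commas and values contain no newlines.
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # the new record starts at the current end of file
            f.write(record)
        self.index[key] = offset  # point the key at its newest record

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # a single seek straight to the record
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.partition(",")[2]
```

Writes stay as cheap as a plain append (plus one dict update), and reads no longer scan the file.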
How long does one disk seek take (on a hard disk drive)?
Roughly 4 to 15 ms [https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics#:~:text=The%20fastest%20high%2Dend%20server,to%2Dtrack%20and%20full%20stroke.]
OLTP stands for
Online Transaction Processing
OLAP stands for
Online Analytical Processing
Disk bandwidth
The total number of bytes transferred divided by the total time between the first request for service and the completion of the last transfer.
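The definition above as a small calculation. The `Transfer` record and its field names are assumptions for illustration. The function divides total bytes by the span from the earliest request to the latest completion, so any idle or queueing time inside that span counts against the bandwidth:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transfer:
    request_time_s: float     # when the request for service was issued
    completion_time_s: float  # when this transfer finished
    num_bytes: int


def disk_bandwidth(transfers: List[Transfer]) -> float:
    """Bytes per second over the whole batch of transfers."""
    total_bytes = sum(t.num_bytes for t in transfers)
    first_request = min(t.request_time_s for t in transfers)
    last_completion = max(t.completion_time_s for t in transfers)
    return total_bytes / (last_completion - first_request)


if __name__ == "__main__":
    # Three 1 MB transfers; first request at 0.0 s, last completion at 0.5 s.
    batch = [
        Transfer(0.0, 0.2, 1_000_000),
        Transfer(0.1, 0.35, 1_000_000),
        Transfer(0.3, 0.5, 1_000_000),
    ]
    print(disk_bandwidth(batch))  # 6000000.0, i.e. 6 MB/s
```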
The bottleneck for OLTP vs. OLAP
OLTP = disk seek time; OLAP = disk bandwidth
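A back-of-envelope model of why the bottlenecks differ, assuming (hypothetically) 10 ms per seek, which sits inside the 4 to 15 ms range above, and 100 MB/s of sequential throughput. The model is deliberately crude: total time is seeks times seek time plus bytes divided by bandwidth, and the workload sizes are invented for illustration:

```python
from typing import Tuple

SEEK_TIME_S = 0.010    # assumed: 10 ms per random seek
BANDWIDTH_BPS = 100e6  # assumed: 100 MB/s sequential throughput


def io_breakdown(num_seeks: int, bytes_read: float) -> Tuple[float, float]:
    """Return (time spent seeking, time spent transferring) in seconds."""
    return num_seeks * SEEK_TIME_S, bytes_read / BANDWIDTH_BPS


if __name__ == "__main__":
    # OLTP-style: 1,000 point lookups of ~100-byte rows, one random seek each.
    oltp_seek, oltp_xfer = io_breakdown(1_000, 1_000 * 100)
    # OLAP-style: one sequential scan over 10 GB.
    olap_seek, olap_xfer = io_breakdown(1, 10e9)
    print(f"OLTP: seek {oltp_seek:.3f} s, transfer {oltp_xfer:.3f} s")
    print(f"OLAP: seek {olap_seek:.3f} s, transfer {olap_xfer:.3f} s")
```

With these numbers the OLTP workload spends about 10 s seeking and about 1 ms transferring, while the OLAP scan spends 10 ms seeking and about 100 s transferring, so each is dominated by a different term.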
Typical dataset size of OLTP systems vs. OLAP systems
OLTP = GB to TB; OLAP = TB to PB
Apache Lucene
Apache Lucene is a free and open-source search engine software library, originally written entirely in Java by Doug Cutting.
When is a storage engine like Bitcask well suited?
When the value for each key is updated frequently but there are not too many distinct keys (e.g., the key might be the URL of a cat video, and the value the number of times it has been played, incremented every time someone hits the play button). There are many writes per key, yet all the keys still fit in memory.
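A sketch of the play-counter workload against a Bitcask-style layout, using a Python list to stand in for the append-only data file. The URLs and helper names are hypothetical. Ten thousand plays produce ten thousand log records, but the in-memory index only ever holds one entry per video:

```python
from typing import Dict, List, Tuple

log: List[Tuple[str, int]] = []  # stands in for the append-only data file
index: Dict[str, int] = {}       # key -> position of its latest record (kept in RAM)


def record_play(url: str) -> None:
    # Read the current count via the index, then append the new count;
    # old records are never overwritten in place.
    current = log[index[url]][1] if url in index else 0
    log.append((url, current + 1))
    index[url] = len(log) - 1


def plays(url: str) -> int:
    return log[index[url]][1] if url in index else 0


if __name__ == "__main__":
    urls = [  # hypothetical cat-video URLs
        "https://example.com/cat-video-1",
        "https://example.com/cat-video-2",
        "https://example.com/cat-video-3",
    ]
    for i in range(10_000):
        record_play(urls[i % len(urls)])
    print(len(log), len(index))  # 10000 3
    print(plays(urls[0]))        # 3334
```

The log grows with every write, while the index grows only with the number of distinct keys, which is why this workload fits the keys-in-RAM requirement.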