Big Data Lecture 06 Wide Column Stores Flashcards
What are the issues with RDBMS (relational database management systems) and how to solve them?
<ul><li>We hit the storage limit of one machine --> scale up!</li><li>It is too slow --> cluster, replicate: scale out! (Difficult, very high maintenance costs.)</li><li>Solution: HBase, a distributed database system!</li></ul>
How do wide-column stores compare to other methods? What is the only downside?
Wide-column stores score well (green) in every column of this comparison:<br></br><img></img><br></br>The only downside: items must be small, at most around 10 MB each for optimal performance.<br></br><br></br>*Random access means being able to access any piece of data directly, without reading everything sequentially.
Who was the pioneer of Wide-Column Stores?
Google, with ‘Bigtable’.
What is the motivation for making huge tables? How does it improve on RDBMS? Why don’t we replace RDBMS with it?
What is related stays together (the data is denormalized), because join operations at query time are very expensive.<br></br><br></br>An RDBMS is good at updating data, which is much harder in a WCS.
How is each row identified?
It has a unique row ID, by which the records are sorted.
What are column families?
They are the remnants of the tables that were joined into one wide table: the columns of a family are related, so they are stored together.<br></br><br></br>Column families must be specified up front, but columns can be added within a family later.
What are the possible types of values in WCS?
The values are not typed; they are just byte arrays. Some utility functions for reading the data are provided.
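A minimal sketch of what "untyped byte values" means in practice: the application, not the store, decides how to turn values into bytes and back. The helper names below (`encode_int`, `decode_int`, `encode_str`, `decode_str`) are hypothetical and only illustrate the idea; they are not the utility functions of any particular store.

```python
import struct

def encode_int(value: int) -> bytes:
    # Hypothetical helper: 8-byte big-endian signed integer.
    return struct.pack(">q", value)

def decode_int(raw: bytes) -> int:
    # The reader must know that these bytes were written as an integer.
    return struct.unpack(">q", raw)[0]

def encode_str(value: str) -> bytes:
    # Hypothetical helper: text is stored as UTF-8 bytes.
    return value.encode("utf-8")

def decode_str(raw: bytes) -> str:
    return raw.decode("utf-8")

# The store only ever sees the bytes; the type lives in the application.
cell = encode_int(42)
```

Because the store keeps no type information, reading a cell with the wrong decoder gives garbage or an error rather than a type check.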
What is the optimal size of an object in WCS?
<= 10 MB per cell, but the content can be anything (text, image, webpage, JSON…)
What queries can be executed?
<ul><li>get (per rowID),</li><li>put (inserting per rowID, can also overwrite, can be partial, e.g. per column family),</li><li>scan (linearly scan certain rows and columns),</li><li>delete (per rowID).</li></ul>
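The four operations above can be sketched with a toy in-memory table. This is an illustration under simplifying assumptions (a single machine, no versions or timestamps, and a hypothetical class name `ToyWideColumnTable`); it is not the API of HBase or any other real system.

```python
from bisect import bisect_left, insort

class ToyWideColumnTable:
    """Toy model: rows sorted by row ID, cells keyed by (family, column)."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.row_ids = []              # row IDs, kept sorted
        self.rows = {}                 # row_id -> {(family, column): bytes}

    def put(self, row_id, family, column, value: bytes):
        # Partial insert: writes one cell, overwriting any previous value.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_id not in self.rows:
            insort(self.row_ids, row_id)
            self.rows[row_id] = {}
        self.rows[row_id][(family, column)] = value

    def get(self, row_id):
        # Returns a copy of all cells of one row (empty if the row is absent).
        return dict(self.rows.get(row_id, {}))

    def scan(self, start, stop, family=None):
        # Rows with start <= row_id < stop, optionally restricted to one family.
        lo = bisect_left(self.row_ids, start)
        hi = bisect_left(self.row_ids, stop)
        for row_id in self.row_ids[lo:hi]:
            cells = self.rows[row_id]
            if family is not None:
                cells = {k: v for k, v in cells.items() if k[0] == family}
            yield row_id, dict(cells)

    def delete(self, row_id):
        # Removes the whole row if it exists.
        if row_id in self.rows:
            del self.rows[row_id]
            self.row_ids.pop(bisect_left(self.row_ids, row_id))
```

Keeping the row IDs sorted is what makes a range scan cheap: the scan only needs to find the start position and then walk forward.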
What do we refer to as the key-value model in general?
Any kind of map in which a key points to a value (e.g. implemented with a hash table).<br></br>
What is column oriented storage?
The values of each column are physically stored together on disk.
What are examples of wide-column store databases?
Google’s Bigtable,<br></br>Apache HBase,<br></br>Apache Cassandra.
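As a small illustration of the difference, the sketch below flattens the same table in two ways: row by row, and column by column. The sample data and the function names `to_row_layout` and `to_column_layout` are made up for this example.

```python
# Hypothetical sample table with three columns.
rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]
columns = ["id", "name", "age"]

def to_row_layout(rows, columns):
    # Row-oriented: all values of one row sit next to each other.
    return [row[c] for row in rows for c in columns]

def to_column_layout(rows, columns):
    # Column-oriented: all values of one column sit next to each other.
    return [row[c] for c in columns for row in rows]

row_layout = to_row_layout(rows, columns)        # [1, "Alice", 30, 2, "Bob", 25]
column_layout = to_column_layout(rows, columns)  # [1, 2, "Alice", "Bob", 30, 25]
```

With the column layout, reading a single column (for example all ages) touches one contiguous run of values instead of skipping through every row.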
How are tables stored in WCS?
In regions: the table is cut both by rows (ranges of row IDs, minimum inclusive and maximum exclusive) and by columns.
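The row part of this partitioning can be sketched as a lookup over sorted split points: each region covers the row IDs from its start key (inclusive) up to the next region's start key (exclusive). The split points and the function name `region_for` are assumptions made for the example, not values or calls from a real system.

```python
from bisect import bisect_right

# Hypothetical start keys of four regions; "" marks the start of the table.
# Region i covers region_starts[i] <= row_id < region_starts[i + 1].
region_starts = ["", "g", "n", "t"]

def region_for(row_id: str) -> int:
    # Index of the last region whose start key is <= row_id.
    return bisect_right(region_starts, row_id) - 1
```

For instance, `region_for("g")` falls into region 1 because the minimum is inclusive, while a row ID just below `"g"`, such as `"fz"`, still belongs to region 0.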
What manages files underneath the WCS?
HDFS!
What is the architecture of HBase?
An HMaster node (which handles DDL operations) coordinates the processes on the RegionServers (which handle DML operations); all of these can run on the same machine. Each RegionServer is an HDFS client.<br></br><br></br><img></img>
What is DDL and DML?
Data Definition Language (e.g. create a table with specified columns),<br></br>Data Manipulation Language (e.g. put this value into this row and column).