HBase Architecture Flashcards
What are the four basic events that can potentially destroy data locality?
1 - The HBase balancer decides to move a region to balance data sizes across RegionServers.
2 - A RegionServer dies. All its regions need to be relocated to another server.
3 - A table is disabled and re-enabled.
4 - A cluster is stopped and restarted.
What metric measures HFile data locality?
HFile locality index = ( Total number of HDFS blocks that can be retrieved locally by the region server ) / ( Total number of HDFS blocks for all HFiles )
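The ratio above can be computed directly once the two block counts are known. A minimal sketch in Python; the function name and the choice to report 0.0 when there are no blocks are assumptions made for illustration, not HBase behavior:

```python
def locality_index(local_blocks: int, total_blocks: int) -> float:
    """Fraction of HFile HDFS blocks the RegionServer can read locally."""
    if total_blocks == 0:
        # Assumption: with no blocks at all, report 0.0 instead of dividing by zero.
        return 0.0
    return local_blocks / total_blocks


# Example: 75 of 100 blocks are local to the RegionServer.
print(locality_index(75, 100))
```

A value of 1.0 means every block is local; each of the four events listed earlier can push the value toward 0.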
What is the hierarchy of objects on a RegionServer?
Table (HBase table)
Region (Regions for the table)
Store (Store per ColumnFamily for each Region for the table)
MemStore (MemStore for each Store for each Region for the table)
StoreFile (StoreFiles for each Store for each Region for the table)
Block (Blocks within a StoreFile within a Store for each Region for the table)
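The containment hierarchy above can be pictured as nested data structures. A rough sketch in Python; every class and field name here is hypothetical and chosen only to mirror the list, not taken from the HBase code base:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoreFile:
    # A StoreFile is composed of blocks.
    blocks: List[bytes] = field(default_factory=list)


@dataclass
class Store:
    # One MemStore and zero or more StoreFiles per Store.
    memstore: Dict[bytes, bytes] = field(default_factory=dict)
    store_files: List[StoreFile] = field(default_factory=list)


@dataclass
class Region:
    # One Store per column family, keyed here by family name.
    stores: Dict[str, Store] = field(default_factory=dict)


@dataclass
class Table:
    regions: List[Region] = field(default_factory=list)


# A table with one region and one column family that has not flushed yet.
table = Table(regions=[Region(stores={"cf": Store()})])
```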
How are regions assigned to RegionServers when HBase starts?
1 - The Master invokes the AssignmentManager upon startup.
2 - The AssignmentManager looks at the existing region assignments in META.
3 - If the region assignment is still valid (i.e., if the RegionServer is still online) then the assignment is kept.
4 - If the assignment is invalid, then the LoadBalancerFactory is invoked to assign the region. The DefaultLoadBalancer will randomly assign the region to a RegionServer.
5 - When the RegionServer opens the region, META is updated with the RegionServer assignment (if needed) and the RegionServer start code (the start time of the RegionServer process).
How are regions assigned to RegionServers when a RegionServer fails?
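The keep-or-reassign decision in steps 3 and 4 can be sketched as follows. This is a toy model in Python, not the AssignmentManager itself: the function name, the dictionary standing in for META, and the injectable random source are all assumptions made for illustration.

```python
import random


def assign_regions(meta, online_servers, rng=random):
    """Return a region -> server mapping following the startup steps.

    meta: dict of region name -> last known server (stand-in for META).
    online_servers: set of servers that are currently online.
    """
    candidates = sorted(online_servers)
    assignments = {}
    for region, server in meta.items():
        if server in online_servers:
            # Step 3: the existing assignment is still valid, so keep it.
            assignments[region] = server
        else:
            # Step 4: the assignment is invalid, so pick a server at random.
            assignments[region] = rng.choice(candidates)
    return assignments


meta = {"region-a": "rs1", "region-b": "rs2"}
print(assign_regions(meta, {"rs1", "rs3"}))
```

In this example `region-a` stays on `rs1`, while `region-b` moves because `rs2` is no longer online.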
1 - The regions immediately become unavailable because the RegionServer is down.
2 - The Master will detect that the RegionServer has failed.
3 - The region assignments will be considered invalid and will be re-assigned just like the startup sequence.
What is a Store?
A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
What is the MemStore?
The MemStore holds in-memory modifications to the Store. Modifications are KeyValues. When a flush is requested, the current MemStore is moved to a snapshot and then cleared. HBase continues to serve edits from the new MemStore and the backing snapshot until the flusher reports that the flush succeeded. At that point the snapshot is discarded.
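The snapshot handoff during a flush can be illustrated with a small model. This is a simplified sketch in Python under the assumption that only one flush runs at a time; the class and method names are hypothetical and do not correspond to the HBase implementation.

```python
class MemStoreSketch:
    """Toy model of a MemStore with an active map and a flush snapshot."""

    def __init__(self):
        self.active = {}    # receives new edits
        self.snapshot = {}  # edits currently being flushed

    def put(self, key, value):
        self.active[key] = value

    def get(self, key):
        # Reads check the active MemStore first, then the snapshot.
        if key in self.active:
            return self.active[key]
        return self.snapshot.get(key)

    def start_flush(self):
        # Move the current contents to the snapshot and start a fresh map.
        self.snapshot = self.active
        self.active = {}
        return self.snapshot

    def finish_flush(self):
        # The flusher reported success, so the snapshot can be let go.
        self.snapshot = {}


store = MemStoreSketch()
store.put("row1", "v1")
store.start_flush()
store.put("row2", "v2")
print(store.get("row1"), store.get("row2"))
```

While the flush is in progress both `row1` (in the snapshot) and `row2` (in the new MemStore) remain readable. After `finish_flush()`, `row1` would be served from the new StoreFile, which this sketch does not model.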
What is a StoreFile (HFile)?
StoreFiles are where your data lives.
What are Blocks?
StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis. Compression happens at the block level within StoreFiles.
What is contained in ROOT?
-ROOT- keeps track of where the .META. table is.
Key format: .META. region key (.META.,,1)
Contains:
info: regioninfo (serialized HRegionInfo instance of .META.)
info: server (server:port of the RegionServer holding .META.)
info: serverstartcode (start-time of the RegionServer process holding .META.)
What is contained in META?
The .META. table keeps a list of all regions in the system.
Key format: Region key of the format ([table],[region start key],[region id])
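The key format above can be assembled with a one-line helper. A minimal sketch in Python; the function name is hypothetical, and real region keys are byte arrays, so the string handling here is a simplification.

```python
def meta_row_key(table: str, start_key: str, region_id: int) -> str:
    """Build a key of the form [table],[region start key],[region id]."""
    return f"{table},{start_key},{region_id}"


# The first region of a table has an empty start key.
print(meta_row_key("users", "", 1))
print(meta_row_key("users", "m", 2))
```

With the table name `.META.`, an empty start key, and region id 1, the same helper yields `.META.,,1`, the key shown for the -ROOT- entry.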
What is Block Cache?
The Block Cache is an LRU (least recently used) cache that contains three levels of block priority to allow for scan-resistance and in-memory ColumnFamilies:
- Single access priority
- Multi access priority
- In-memory access priority
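How a block ends up in one of the three priority levels can be sketched as a small decision function. This is an illustrative model in Python, assuming a block starts at single access priority when first loaded, is promoted to multi access priority when accessed again, and belongs to the in-memory level when its ColumnFamily is configured as in-memory. The function and parameter names are hypothetical.

```python
def block_priority(family_in_memory: bool, access_count: int) -> str:
    """Return the priority level a cached block would fall into."""
    if family_in_memory:
        # Blocks of in-memory ColumnFamilies get their own level.
        return "in-memory"
    if access_count <= 1:
        # First load: a one-off scan only ever fills this level.
        return "single"
    # Accessed again after the first load: promoted.
    return "multi"


print(block_priority(False, 1))
print(block_priority(False, 4))
print(block_priority(True, 1))
```

Keeping one-time reads in their own level is what makes the cache scan-resistant: a large scan competes for space with other single-access blocks before it can displace frequently used ones.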
Besides your data what else is stored in the Block Cache?
- Catalog tables: -ROOT- and .META.
- HFile indexes
- Keys
- Bloom Filters
In what order are the contents of an HBase table sorted?
By row key, then column family, then column qualifier, then timestamp (newest timestamp first).
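The sort order can be reproduced with an ordinary tuple sort. A minimal sketch in Python, assuming cells are represented as (row, family, qualifier, timestamp) tuples of bytes and an integer, and that timestamps sort newest first; the tuple layout and names are illustrative only.

```python
def sort_key(cell):
    """Order by row, family, qualifier ascending, then timestamp descending."""
    row, family, qualifier, timestamp = cell
    # Negating the timestamp puts the newest version first.
    return (row, family, qualifier, -timestamp)


cells = [
    (b"row2", b"cf", b"a", 5),
    (b"row1", b"cf", b"b", 1),
    (b"row1", b"cf", b"a", 1),
    (b"row1", b"cf", b"a", 3),
]

for cell in sorted(cells, key=sort_key):
    print(cell)
```

Python compares bytes values lexicographically, byte by byte, which is why bytes are used here instead of strings.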
Does disabling block caching improve scan performance when you perform a full table scan of your data?
Yes. When you disable block caching, you free up memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won’t fit into the cache.